
Key: QB-3399
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Minor
Assignee: Unassigned
Reporter: Todd Scholl
Votes: 3
Watchers: 2

QuickBuild

Parallel Steps causing a double trigger of script on node

Created: 05/Jun/19 02:16 PM   Updated: 03/Mar/20 02:05 PM
Component/s: None
Affects Version/s: 8.0.32
Fix Version/s: 9.0.39

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
File Attachments: 1. full-log(1).txt (359 kb)
2. full-log.txt (379 kb)
3. full-log_build_id_19747626.txt (359 kb)
4. full-log_build_id_19781143.txt (379 kb)
5. full-log_build_id_19781143_show_steps.txt (763 kb)

Environment:
System Date and Time 2019-06-05 09:49:12
Operating System Linux 4.4.114-94.14-default, amd64
JVM Java HotSpot(TM) 64-Bit Server VM 1.8.0_131, Oracle Corporation
QuickBuild Version 8.0.32 - Thu Jan 03 21:16:14 EST 2019
Total Heap Memory 5.33 GB
Used Heap Memory 696.15 MB


Description
Support,
We have an issue with a job we set up to trigger a script across 8 servers using a parallel step sequence. Sometimes the job runs fine, but about twice a week 2 servers in the parallel steps fail because the script was run twice. We are mystified as to the cause, since on other days the job runs fine.

What we know:
1. The steps trigger a ksh script whose first action is to create a log with a time stamp.
2. We know from that time stamp that the second fire comes about 1 to 4 seconds after the first.
3. There are no repeat or retry parameters set up for this job.
4. When the job is set up as sequential, the issue does not occur.
5. No second step appears in the build log; the script is simply fired a second time with the exact same arguments.

I will attach the build logs for the 2 failed jobs we have. Let me know what else you need.
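
One possible stopgap for the double trigger described above is to make the launched script refuse a second start with identical arguments inside a short window. The sketch below is an assumption and not something QuickBuild provides: it takes an atomic lock file keyed on the argument list, so only the first of two near-simultaneous launches proceeds. The names `acquire_once`, `lock_path`, `LOCK_DIR`, and the 60-second window are all hypothetical.

```python
import hashlib
import os
import tempfile
import time

# Hypothetical lock directory; any path local to the target server would do.
LOCK_DIR = os.path.join(tempfile.gettempdir(), "qb-step-locks")


def lock_path(args, lock_dir=LOCK_DIR):
    """Map an argument list to a lock file name that is stable across launches."""
    key = hashlib.sha256("\0".join(args).encode("utf-8")).hexdigest()
    return os.path.join(lock_dir, key + ".lock")


def acquire_once(args, window_seconds=60.0, lock_dir=LOCK_DIR):
    """Return True only for the first launch with `args` inside the window."""
    os.makedirs(lock_dir, exist_ok=True)
    path = lock_path(args, lock_dir)

    # Drop a stale lock left behind by an earlier, legitimate run.
    try:
        if time.time() - os.path.getmtime(path) > window_seconds:
            os.remove(path)
    except FileNotFoundError:
        pass

    # O_CREAT | O_EXCL fails if the file already exists, so two launches
    # that arrive together cannot both create the lock.
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```

A wrapper around the real script could call `acquire_once(sys.argv[1:])` and exit without doing any work when it returns `False`. This only masks the symptom on the target server; it does not explain why the step fires twice.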

Change by Todd Scholl [05/Jun/19 02:38 PM]
Field Original Value New Value
Attachment full-log.txt [ 10774 ]

Change by Todd Scholl [05/Jun/19 02:39 PM]
Attachment full-log(1).txt [ 10775 ]

Robin Shen [05/Jun/19 11:46 PM]
Thanks for reporting. Will investigate the issue.

Robin Shen [07/Jun/19 03:06 AM]
Can you please send a backup of your database (it can be taken from the QB admin page) and let me know which build and step to check?

Robin Shen [07/Jun/19 03:06 AM]
Backup can be sent to [robin AT pmease DOT com]

Todd Scholl [07/Jun/19 02:53 PM]
Builds to look at:
https://automation.pfsweb.com:8810/build/19747626
https://automation.pfsweb.com:8810/build/19781143

The job is one parallel composition with multiple steps all doing the same thing on different boxes.

Robin Shen [07/Jun/19 02:57 PM]
As a vendor outside of your company, I cannot access your internal server.

Robin Shen [10/Jun/19 08:19 AM]
Thanks for sending the backup. I have restored it successfully. For the failed build logs, please let me know the corresponding build ids; also, please tick the "show steps" option when downloading the build log.

Todd Scholl [12/Jun/19 02:28 PM]
Build logs re-attached with build ids referenced.

Change by Todd Scholl [12/Jun/19 02:28 PM]
Attachment full-log_build_id_19781143.txt [ 10778 ]
Attachment full-log_build_id_19747626.txt [ 10779 ]

Todd Scholl [12/Jun/19 02:30 PM]
The second build has rolled out of history (more than 30 days), but I have re-downloaded the build log for 19781143 with show steps.

Change by Todd Scholl [12/Jun/19 02:30 PM]
Attachment full-log_build_id_19781143_show_steps.txt [ 10780 ]

Robin Shen [14/Jun/19 02:23 AM]
Thanks for sending the log. I checked carefully against the code but still cannot find the reason. Would it be possible for you to upgrade to QB9, which has additional debug logging for this issue? It can be downloaded from:
https://build.pmease.com/build/4898

After upgrading to this version, please edit the general settings of the configuration in question to use debug logging, then reproduce the issue and send the build log (still with the "show steps" option ticked).

Mathieu Hidrio [02/Aug/19 09:39 AM]
Just to let you know that we have the exact same issue.
It was already discussed on the forum here https://support.pmease.com/PMEase/QuickBuild/topics/3996/steps-executed-even-if-previous-one-is-failed?1 and by email with Robin.

We're using QB 8.20 but we can't upgrade to QB 9.0 for the moment.
We have no idea why and when this happens and it's really a pain :(

"Happy" to see that we're not alone, and I hope a fix will be found soon.

Dennis Morand [22/Aug/19 07:58 PM]
I have the same issue and I'm running QB9.0.11

Rafael Pallares [12/Sep/19 09:24 AM]
Hi,

Can you increase the priority of this issue?
It's very time-consuming for us and we don't have any workaround.

Our only hope was to upgrade to QB 9, but if that doesn't fix it....

We are working with a MariaDB database. I don't think it is the root cause, but it may be.
@Dennis Morand, what DB are you using?

It's probably a concurrency issue while reserving resources.

Thanks,
Rafael

Dennis Morand [12/Sep/19 11:35 AM]
I am using Microsoft SQL

Robin Shen [12/Sep/19 10:40 PM]
According to several reports here, it seems that this issue occurs when the step is enclosed in a parallel container step. Is that right? If so, have you specified the concurrent workers in the advanced settings of the parallel step?

Upgrading to QB9 will not solve the issue, but it will give us more clues when the issue happens. After upgrading to QB9, please modify "conf/log4j.properties" on the agent running the PARENT step of the duplicated step to add the line below:

log4j.logger.com.pmease.quickbuild.grid.GridImpl=DEBUG

When the issue happens, please send me the agent log on that node.
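
For anyone applying this on several agents, the edit can be scripted. The following is a minimal sketch, not a PMEase-provided tool: it appends the logger line quoted above to a properties file only if the line is not already there. The function name `ensure_debug_logger` is hypothetical, and the location of `conf/log4j.properties` depends on where the agent is installed.

```python
from pathlib import Path

# Logger line quoted in the comment above.
DEBUG_LINE = "log4j.logger.com.pmease.quickbuild.grid.GridImpl=DEBUG"


def ensure_debug_logger(properties_file):
    """Append DEBUG_LINE to the file unless it is already present.

    Returns True if the file was changed, False if it was left alone.
    """
    path = Path(properties_file)
    text = path.read_text(encoding="utf-8")
    if any(line.strip() == DEBUG_LINE for line in text.splitlines()):
        return False
    # Add a newline first if the existing file does not end with one.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    path.write_text(text + separator + DEBUG_LINE + "\n", encoding="utf-8")
    return True
```

It would be called with the path to the agent's own `conf/log4j.properties`; running it twice leaves a single copy of the line.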

Mathieu Hidrio [19/Sep/19 01:04 PM]
Indeed, in our case the problem always happens in parallel steps.
The number of workers is set to 0. We have never used this feature until now.

Change by Steve Luo [03/Mar/20 01:54 AM]
Status Open [ 1 ] Resolved [ 5 ]
Assignee Robin Shen [ robinshine ]
Resolution Fixed [ 1 ]

Change by Steve Luo [03/Mar/20 01:54 AM]
Fix Version/s 9.0.39 [ 11932 ]

Robin Shen [03/Mar/20 01:59 AM]
Finally got this issue solved in 9.0.39:
https://build.pmease.com/build/5118

Sorry for taking so long.

Rafael Pallares [03/Mar/20 02:05 PM]
Hurrah!
I no longer thought this would ever happen.

Thanks for that fix!

But now we have to upgrade ... :)