
[QB-3399] Parallel Steps causing a double trigger of script on node
Created: 05/Jun/19  Updated: 03/Mar/20

Status: Resolved
Project: QuickBuild
Component/s: None
Affects Version/s: 8.0.32
Fix Version/s: 9.0.39

Type: Bug Priority: Minor
Reporter: Todd Scholl Assigned To: Unassigned
Resolution: Fixed Votes: 3
Remaining Estimate: Unknown Time Spent: Unknown
Original Estimate: Unknown
Environment: System Date and Time 2019-06-05 09:49:12
Operating System Linux 4.4.114-94.14-default, amd64
JVM Java HotSpot(TM) 64-Bit Server VM 1.8.0_131, Oracle Corporation
QuickBuild Version 8.0.32 - Thu Jan 03 21:16:14 EST 2019
Total Heap Memory 5.33 GB
Used Heap Memory 696.15 MB

File Attachments: full-log(1).txt, full-log.txt, full-log_build_id_19747626.txt, full-log_build_id_19781143.txt, full-log_build_id_19781143_show_steps.txt

 Description   
Support,
We have had an issue with a job we set up to trigger a script across 8 servers using a parallel step sequence. Sometimes the job runs fine, but about twice a week 2 servers in the parallel steps fail because the script was run twice. We are mystified as to the cause, since on other days the job runs fine.

What we know:
1. The steps trigger a ksh script that creates a log with a time stamp as its first step.
2. We know from that time stamp that the 2nd fire is about 1 - 4 seconds after the first.
3. There are no repeat parameters or retry parameters set up for this job.
4. When the job is set up as sequential, the issue does not occur.
5. There is no second step running in the build log; it simply fires the script off a second time with the exact same arguments.
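
Until the cause is known, the second fire could at least be detected from inside the triggered script. The following is a minimal Python sketch of that idea, not part of QuickBuild and not the ksh script described above. It assumes a lock directory on the target server that is keyed by build id and step name, and it relies on directory creation being atomic so that only the first of two near-simultaneous invocations proceeds. The names `acquire_run_lock` and `lock_root`, and the step name `server-1`, are hypothetical.

```python
import os
import tempfile


def acquire_run_lock(lock_root: str, build_id: str, step_name: str) -> bool:
    """Return True for the first caller, False for any duplicate invocation.

    os.mkdir either creates the directory or raises FileExistsError, so two
    invocations with the same build id and step name cannot both succeed.
    """
    lock_dir = os.path.join(lock_root, f"{build_id}-{step_name}.lock")
    try:
        os.mkdir(lock_dir)
        return True
    except FileExistsError:
        return False


if __name__ == "__main__":
    # Hypothetical values; a real script would take them from its arguments.
    root = tempfile.mkdtemp()
    first = acquire_run_lock(root, "19747626", "server-1")
    second = acquire_run_lock(root, "19747626", "server-1")
    print(first, second)
```

A script guarded this way would exit early when the function returns False, which would turn a silent double run into a visible, logged event.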

I will attach the build logs for the 2 failed jobs we have. Let me know what else you need.

 Comments   
Comment by Robin Shen [ 05/Jun/19 11:46 PM ]
Thanks for reporting. Will investigate the issue.
Comment by Robin Shen [ 07/Jun/19 03:06 AM ]
Can you please send a backup of your database (it can be taken from the QB admin page) and let me know which build and step to check?
Comment by Robin Shen [ 07/Jun/19 03:06 AM ]
Backup can be sent to [robin AT pmease DOT com]
Comment by Todd Scholl [ 07/Jun/19 02:53 PM ]
Builds to look at:
https://automation.pfsweb.com:8810/build/19747626
https://automation.pfsweb.com:8810/build/19781143

The job is one parallel composition with multiple steps all doing the same thing on different boxes.
Comment by Robin Shen [ 07/Jun/19 02:57 PM ]
As a vendor outside of your company, I cannot access your internal server.
Comment by Robin Shen [ 10/Jun/19 08:19 AM ]
Thanks for sending the backup. I have restored it successfully. For the failed build log, please let me know the corresponding build id, and please check the option "show steps" when downloading the build log.
Comment by Todd Scholl [ 12/Jun/19 02:28 PM ]
Build logs re-attached with build ids referenced.
Comment by Todd Scholl [ 12/Jun/19 02:30 PM ]
Second build has rolled out of history (more than 30 days), but I have re-downloaded the build log for 19781143 with show steps.
Comment by Robin Shen [ 14/Jun/19 02:23 AM ]
Thanks for sending the log. I checked the code carefully but still cannot find the cause. Would it be possible for you to upgrade to QB9, which has additional debug logging for this issue? It can be downloaded from:
https://build.pmease.com/build/4898

After upgrading to this version, please edit the general settings of the configuration in question to use debug logging, then reproduce the issue and send the build log (still with the "show steps" option ticked).
Comment by Mathieu Hidrio [ 02/Aug/19 09:39 AM ]
Just to let you know that we have the exact same issue.
Already discussed on the forum here https://support.pmease.com/PMEase/QuickBuild/topics/3996/steps-executed-even-if-previous-one-is-failed?1 and by email with Robin.

We're using QB 8.20 but we can't upgrade to QB 9.0 for the moment.
We have no idea why and when this happens and it's really a pain :(

"Happy" to see that we're not alone and I hope a fix will be found soon.
Comment by Dennis Morand [ 22/Aug/19 07:58 PM ]
I have the same issue and I'm running QB9.0.11
Comment by Rafael Pallares [ 12/Sep/19 09:24 AM ]
Hi,

Can you increase the priority of this issue?
It's very time consuming for us and we don't have any workaround.

Our only hope was to upgrade to QB 9, but if that doesn't fix it...

We are working with a MariaDB database. I don't think it is the root cause, but it may be.
@Dennis Morand, what DB are you using?

It's probably a concurrency issue while reserving resources.

Thanks,
Rafael
Comment by Dennis Morand [ 12/Sep/19 11:35 AM ]
I am using Microsoft SQL
Comment by Robin Shen [ 12/Sep/19 10:40 PM ]
According to several reports here, it seems that this issue occurs when the step is enclosed in a parallel container step. Is that right? If so, have you specified the number of concurrent workers in the advanced settings of the parallel step?

Upgrading to QB9 will not solve the issue, but it will give us more clues when the issue happens. After upgrading to QB9, please modify "conf/log4j.properties" on the agent running the PARENT step of the duplicated step to add the line below:

log4j.logger.com.pmease.quickbuild.grid.GridImpl=DEBUG

When the issue happens, please send me the agent log on that node.
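
If the same change has to be made on several agents, the edit could be scripted. Below is a minimal Python sketch, not an official PMEase tool: it assumes the properties file is plain text and that appending the logger line shown above is sufficient. The helper name `ensure_logger_line` is hypothetical. It adds the line only if an identical line is not already present, so it is safe to run more than once.

```python
from pathlib import Path

# Logger line quoted from the instructions above.
LOGGER_LINE = "log4j.logger.com.pmease.quickbuild.grid.GridImpl=DEBUG"


def ensure_logger_line(properties_file: Path, line: str = LOGGER_LINE) -> bool:
    """Append `line` to the file unless an identical line is already present.

    Returns True if the file was modified, False if nothing was changed.
    """
    text = properties_file.read_text(encoding="utf-8")
    if line in (existing.strip() for existing in text.splitlines()):
        return False
    # Keep the existing content intact and make sure the new entry
    # starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    properties_file.write_text(text + separator + line + "\n", encoding="utf-8")
    return True
```

Usage would be along the lines of `ensure_logger_line(Path("conf/log4j.properties"))`, run from the agent's installation directory (an assumption about where the relative path resolves).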
Comment by Mathieu Hidrio [ 19/Sep/19 01:04 PM ]
Indeed, in our cases the problem always happens in parallel steps.
The number of workers is set to 0. We never used this feature till now.
Comment by Robin Shen [ 03/Mar/20 01:59 AM ]
Finally got this issue solved in 9.0.39:
https://build.pmease.com/build/5118

Sorry for taking so long.
Comment by Rafael Pallares [ 03/Mar/20 02:05 PM ]
Hurrah!
I no longer thought this would ever happen.

Thanks for that fix!

But now we have to upgrade ... :)
Generated at Sat Apr 20 04:55:31 UTC 2024 using JIRA 189.