
Yet Another Non-Deterministic BizTalk Zombie Pattern

Victor Fehlberg’s article on Zombies has prompted me to blog about a similar problem I recently encountered with instances that have ‘completed without consuming all of their messages’ and propose yet another BizTalk Zombie Pattern.

The TechNet article on Zombies in BizTalk 2006 details three types of Zombie messages:

  • Terminate control messages;
  • Parallel listen receives; and
  • Sequential convoys with non-deterministic endpoints.

Victor’s article deals with the common pattern that falls into the third category – a while loop surrounding a listen with one branch having a receive and the other having a delay shape, followed by a construct which sets a variable to indicate that the while loop should stop. This is non-deterministic since the delay could be triggered, but a message could still be delivered.
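To make the race concrete, here is a toy Python model of that loop. It is only a sketch, not BizTalk code: the `ConvoyInstance` class, the `shutdown_lag` parameter (the assumed gap between the delay branch firing and the instance's subscription actually being removed) and the timings are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConvoyInstance:
    """Toy model of a while loop around listen(receive | delay). Hypothetical."""
    timeout: float        # length of the delay branch
    shutdown_lag: float   # assumed gap between the delay firing and the subscription going away
    received: list = field(default_factory=list)
    zombies: list = field(default_factory=list)

    def run(self, arrivals):
        """arrivals: time-ordered (time, message) pairs routed to this instance."""
        last_activity = 0.0
        stop_decided_at = None
        for t, msg in arrivals:
            # The delay branch fires once the instance has been idle for `timeout`.
            if stop_decided_at is None and t - last_activity >= self.timeout:
                stop_decided_at = last_activity + self.timeout
            if stop_decided_at is None:
                # Receive branch wins: the message is consumed and the loop continues.
                self.received.append(msg)
                last_activity = t
            elif t < stop_decided_at + self.shutdown_lag:
                # The loop has decided to stop, but the subscription is still live:
                # the message is routed to the instance and never consumed.
                self.zombies.append(msg)
            # Later arrivals no longer match this instance (not modelled here).
        return self


result = ConvoyInstance(timeout=10.0, shutdown_lag=1.0).run(
    [(2.0, "a"), (5.0, "b"), (15.5, "c"), (30.0, "d")]
)
print(result.received)  # ['a', 'b']
print(result.zombies)   # ['c']
```

Message "c" arrives half a time unit after the delay has fired but before the instance has gone away, which is exactly the window in which a zombie is created.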

[Image: Non-Deterministic Orchestration - Creates Zombie]

But I believe I’ve discovered yet another non-deterministic pattern that will always produce zombies; it doesn’t quite fit the pattern described above or on TechNet, but it does use sequential convoys and, in my opinion, is also non-deterministic.

The branching non-deterministic zombie pattern

Consider the orchestration in the image above: we have a sequential convoy (albeit without the while loop, but you get the idea) which can either branch and collect the next message in the convoy (down the ELSE branch) or terminate the orchestration (down the TERMINATE branch).

If a second message is received and correlated to this particular orchestration, but the workflow logic has determined that we will go down the TERMINATE branch, the second message will never be received by the orchestration, causing a zombie. Every. Single. Time. In my opinion, this is non-deterministic in the truest of senses: at the start of the orchestration you cannot determine whether the ELSE or TERMINATE branch will be traversed!
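The branching behaviour can be sketched in a few lines of Python. Again, this is a toy model under stated assumptions rather than anything BizTalk exposes: `run_branching_convoy` and the `take_terminate_branch` callable (standing in for whatever workflow logic picks the branch) are hypothetical names.

```python
def run_branching_convoy(correlated_messages, take_terminate_branch):
    """Toy model of the branching convoy; not BizTalk code.

    correlated_messages: messages routed to one instance, in arrival order.
    take_terminate_branch: hypothetical callable standing in for the branch decision.
    Returns a (consumed, zombies) pair.
    """
    first, *rest = correlated_messages
    consumed = [first]  # the activating receive always consumes the first message
    if take_terminate_branch(first):
        # TERMINATE branch: the instance ends without another receive,
        # so everything else that was correlated to it is left unconsumed.
        return consumed, rest
    # ELSE branch: one further receive collects the next convoy message.
    consumed.extend(rest[:1])
    return consumed, rest[1:]


print(run_branching_convoy(["m1", "m2"], lambda m: True))   # (['m1'], ['m2'])
print(run_branching_convoy(["m1", "m2"], lambda m: False))  # (['m1', 'm2'], [])
```

The same two messages produce a zombie or a clean completion depending only on a decision made after the convoy has already started, which is the heart of the problem.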

Is there a good reason for using this sort of pattern?

I’ve been trying to think why a sequential convoy would require this kind of branching functionality and I honestly can’t think of a good reason, unless the messages being delivered to the orchestration were controlled in some way. 

More to the point, I’m surprised that the XLANG/S compiler even lets you create an orchestration with the potential for damage that this pattern presents!

If you are using this kind of pattern, I’d be interested to know how (and why).

And in case you’re wondering, a very similar pattern was noticed by our testing team, who encountered missing messages on our UAT environment; on closer inspection of the live environment, we had just over 10,500 suspended zombie instances... Ouch.

Debugging the problem took half a day and made me realise that we really do need a good tool to help search for subscriptions based on the context properties in a message, rather than the out-of-the-box Admin Console Subscription Viewer.

Sample project

If you’re interested in trying out the pattern for yourself, download the sample project and test messages.

1 Response to “Yet Another Non-Deterministic BizTalk Zombie Pattern”


  1. Gregory Van de Wiele

    Joe Duffy, who is an expert on all things multi-threading and locking and regularly writes for MSDN Magazine, once wrote:

    13. A race condition or deadlock in library code is always a bug.

    http://www.bluebytesoftware.com/blog/PermaLink,guid,f8404ab3-e3e6-4933-a5bc-b69348deedba.aspx

    BizTalk is the library in this case and this problem should have been better addressed by MS.

    A workaround for certain scenarios much like Victor’s: disable the receive location (RL) from code in the orchestration, wait twice the cache interval (I forget exactly why) and asynchronously start an orchestration that re-enables the RL.

    Another workaround that I used for a singleton: set a service window around midnight for the RL. Inside the delay branch of the listen shape you can check the time, and if you are inside the service window (allow a margin of 1 or 2 minutes either side) you can more or less safely recycle. That solved the problem for me. I guess it is not bulletproof either, but it worked for us.

    Gregory


Nick Heppleston’s BizTalk Blog