
Windows Workflow, child workflows and parallel loops

I recently needed to build a fairly complex workflow to process a large volume of records as quickly as possible: 500k records in an hour. A scheduled process triggers the initial workflow, which looks for scheduled work. When work is found, a call is made out to another system to load the data we need per order (so this can be 1 to X order records). Each order record can have 1 to X customer records we need to process. Along the way there are a few user approvals we wait for. In trying to create a Workflow process for this we ran into a few challenges and learned a few lessons. Hopefully this helps some others out there.

At a high level here is what we put together. 


The idea here is that the Management Workflow is the workflow fired off by the schedule. It can then fire off the Processing Workflow. The green workflows can have multiple instances created by their parent.

The Processing Workflow is responsible for finding all the customers that need to be processed for a given order.

The Send workflow is responsible for packaging up all the data for a given customer (calls to CMS system or any other external system) and sending that data to the delivery system.

With these processes able to spin up multiple instances, load balanced via IIS, we are able to process thousands of records a second.

Since these are Windows Workflow Services hosted in IIS we load balance new workflow instances across servers. This approach allows us to scale both out and up to deal with demand. There are some things to keep in mind when setting Windows Workflow up this way though.

Duplex Communication

The first thing we needed in this model was a way for the Processing workflow to spin up child (Send) workflows and to have those child workflows report back on their status.

We used an approach called out on MSDN: Parent-Child Workflow Pattern Using Durable Duplex. Based on that code we were able to create our setup and get the parent-child relationship working. To make this work you need to get your WCF bindings configured correctly and make sure your workflow objects are set up with Callback Correlation correctly. The easiest way to get started is to download the example and really review it.


One of the main issues we ran into once we had this set up and running was throttling. In each workflow we had a couple of Parallel ForEach loops and some async activities. This was one of those situations where we actually slowed ourselves down by trying to go too fast. The workflow would spin up so many parallel branches and connections to other servers that we exhausted our active connections, causing an app pool reset.

To solve this we had to do a couple of things. First, realize that just because you can run in parallel does not mean you should. We reduced the number of Parallel ForEach loops we had, or manually throttled how many branches could be created at one time (I wish the Workflow designer gave you a setting for this). Second, we realized that in production there is some tuning that needs to be done on WCF (these are all Windows Workflow services, so they run on WCF). There is a great blog post, Tweaking your WCF apps for high throughput workloads, that helped. It calls out how to troubleshoot this issue and the config changes you can make for MaxConcurrentCalls, MaxConcurrentInstances and MaxConcurrentSessions.
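For reference, those three settings live on the standard WCF serviceThrottling behavior element. The fragment below is only a sketch of where they go in web.config; the numbers are placeholders I picked for illustration, not the values we used or a recommendation, so tune them from your own load tests.

```xml
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <behavior>
        <!-- Placeholder values for illustration only; tune from load tests. -->
        <serviceThrottling maxConcurrentCalls="64"
                           maxConcurrentInstances="200"
                           maxConcurrentSessions="200" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
```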

MaxConcurrentInstances (when you follow the steps in the blog) will stop additional instances from spinning up. But we also needed to limit how many items we passed into the parallel loop so it did not spin up too much at a time. We did this by using a While activity that simply picks a certain number of items from our collection and passes those to a Parallel ForEach. As long as the While still has items to pick, it keeps going back to the parallel loop.
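The while-plus-parallel batching idea is not specific to Workflow, so here is a rough sketch of it in Python rather than in workflow XAML. The function name `process_in_batches`, the `worker` callable and the `batch_size` parameter are all made up for illustration; in our workflow the equivalent pieces are a While activity wrapped around a Parallel ForEach.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_in_batches(
    items: Iterable[T], worker: Callable[[T], R], batch_size: int
) -> List[R]:
    """Process items in parallel, but never more than batch_size at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    pending = list(items)
    results: List[R] = []

    # Outer "while": keep going as long as there is work left to pick.
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]

        # Inner "parallel for each": run only this batch concurrently and
        # wait for all of it to finish before picking the next batch.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            results.extend(pool.map(worker, batch))

    return results
```

The trade-off of this simple form is that each batch waits for its slowest item before the next batch starts, which is the price of keeping a hard cap on how much is in flight.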


Now that you have these workflows up and running, how do you keep track of what they are doing, what state they are in, and whether the child workflows are healthy? If the child workflows are calling back to the parent, what do you do when one dies on you?

I am sure there are a number of ways to tackle this, but here is what we did. We used Windows Workflow Foundation tracking and created a custom tracking service so we can log the data we need about each workflow to our database. This lets us track each workflow state we care about, so we can now log when a workflow goes into an Aborted or UnhandledException state. Keep in mind that if you track everything, this will capture A LOT of data and write A LOT to the database. In production it is important to create a custom tracking profile that limits what is logged to only what is required.

Now that your workflows are logging tracking information, you can have them start to monitor that data. In our case the parent workflow keeps track of the workflows it spins up. For each workflow it spins up, it goes into a Pick activity. One pick branch is triggered when the child calls back. The other pick branch has a heartbeat (a delay followed by a check on the child's workflow state). If the child workflow is found to be in a bad state (based on the state logged by the tracker), a flag is set to mark that workflow as unhealthy and the pick branch's action is triggered. If the child workflow is healthy, the heartbeat loops and waits for a configured amount of time before firing off again. If the child workflow stays healthy and calls back, the first pick branch's action is triggered and we go on our merry way.
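To make the two pick branches concrete, here is a small Python sketch of the same logic outside of Workflow. Everything in it is a stand-in: `callback_received` plays the part of the child's callback, `get_child_state` is a hypothetical lookup against whatever your tracking writes to the database, and the state names simply mirror the Aborted and UnhandledException states mentioned above.

```python
import threading
from typing import Callable

# States treated as "bad" in this sketch (mirrors the states we log).
UNHEALTHY_STATES = {"Aborted", "UnhandledException"}


def wait_for_child(
    callback_received: threading.Event,
    get_child_state: Callable[[], str],
    heartbeat_seconds: float,
) -> str:
    """Return "completed" when the child calls back, or "unhealthy" when a
    heartbeat check finds the child in a bad state."""
    while True:
        # Branch 1: the child called back before the heartbeat delay elapsed.
        if callback_received.wait(timeout=heartbeat_seconds):
            return "completed"

        # Branch 2: the delay elapsed, so check the state logged by tracking.
        if get_child_state() in UNHEALTHY_STATES:
            return "unhealthy"

        # Child is still healthy: loop and wait for the next heartbeat.
```

The parent would run one of these waits per child it started and treat an "unhealthy" result the same way the flag in our pick branch is treated.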

Now you have an approach to creating a parent-child workflow and monitoring the state of the children. Next up is moving this solution to a load balanced environment (post in progress).

