Sitecore EXM 9.1 Performance and Scale

When working with Sitecore EXM, the one question everyone seems to have is what level of performance you can get out of it. As with most things, the answer is "it depends". However, there are a number of things to think through and adjust to get a high rate of sending. Sitecore Hacker has a good blog post on scaling EXM. As I spent time trying to scale my own instance, I wanted to break things down a little more and provide concrete examples of the steps I took to tune performance and the results I have seen.

Let's break down some specifics about the architecture to help you understand where you might stand. I am running in AWS with a dedicated Content Management server, a dedicated dispatch server, a dedicated xConnect server and, of course, a dedicated database server. Here are the specifications for each.

Content Management: 16 GB RAM, 2.3 GHz 4 core processor.
Dedicated Dispatch (DDS): 16 GB RAM, 3.0 GHz 8 core processor.
xConnect: 4 GB RAM, 2.3 GHz 2 core processor.
Database: 16 GB RAM, 2.3 GHz 4 core processor.

The dedicated dispatch server has the best specs of all the servers as it will be doing the most work. As always, you can scale out or up depending on how quickly you need to render your emails and inject them into your MTA.

Optimizing the page

The first thing to do is make sure your page renders as quickly as it can. The Sitecore debugger is the tool for this. See the Sitecore Friday Best Practices video for an overview of what it is and how to use it. Can you use the debugger with an EXM email? You sure can! All the debugger wants is a site that has allowDebug set to true (see line 7 of the XML config below) and the ID of an item to render in debug mode. In the URL below, the parts you need to provide are the CM host, the site name and the item ID. The key here is that the item ID is the ID of the email root. You don't have to provide sc_site if you only have one site.

<sites>
  <site name="mysite" patch:after="site[@name='modules_website']"
        targetHostName="rhino.acme.com"
        enableTracking="true" virtualFolder="/" physicalFolder="/"
        rootPath="/sitecore/content/mysite"
        startItem="/home" database="web" domain="extranet"
        allowDebug="true" cacheHtml="true" htmlCacheSize="50MB" registryCacheSize="0"
        viewStateCacheSize="0" xslCacheSize="25MB" filteredItemsCacheSize="10MB"
        enablePreview="true" enableWebEdit="true" enableDebugger="true"
        disableClientData="false" cacheRenderingParameters="true"
        renderingParametersCacheSize="10MB" />
</sites>

http://<CM URL>/?sc_debug=1&sc_prof=1&sc_trace=1&sc_ri=1&sc_site=<SITE NAME>&sc_itemid={54C39350-8A72-41ED-AF8A-EB0A40043A8E}

Now load the page, find your hot spots and start tuning the code. Tuning is highly dependent on what your page needs to do. Be aware, though, that you need to think of this in terms of both performance (how quickly things run for a single request) and load (how quickly things run when many requests hit at once). Many times you can get a single request running well, but under load other bottlenecks cause it to slow down. Here are a few I have found:

Sitecore queries: Are you doing queries with //* in them?
Solr searches: While a single request may be fast, under load these start to create contention and queuing. Do you really need to do the search every time, or can you cache the result?
Personalization rules: Again, these depend on what the rule is doing and how performant the code for the rule is. I have found that a small handful of these is not overly impactful.
Caching: rendering cache and HTML caching. As always, caching as much as you can helps.

Message Generation and Emulation

Once you have the performance of the page optimized, it is time to move on to sending. I recommend using emulation mode to break down the sending process a little. This lets you keep the Mail Transfer Agent (MTA) integration out of the picture and focus on how well message generation alone performs. Without emulation, the MTA integration takes up some thread processing time and can introduce additional latency.

There are a couple of settings to be aware of when using emulation mode.

<!--
 The minimum amount of time to emulate a single sending (milliseconds). 
-->
<setting name="MtaEmulation.MinSendTime" value="200" patch:source="Sitecore.EmailExperience.ContentManagement.config"/>
<!--
 The maximum amount of time to emulate a single sending (milliseconds). 
-->
<setting name="MtaEmulation.MaxSendTime" value="400" patch:source="Sitecore.EmailExperience.ContentManagement.config"/>
<!--  The probability of a connection fail (%). -->
<setting name="MtaEmulation.FailProbability" value="0.01" patch:source="Sitecore.EmailExperience.ContentManagement.config"/>

These help you control what the "expected" latency of working with your MTA is. This is why emulation mode is helpful. You are able to rule out any unexpected latency or errors from the MTA integration and just see what the speed and CPU impact of message generation is.
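
As a rough sanity check on these values, the emulated send time alone puts a ceiling on what each sending thread can push through. The sketch below is back-of-the-envelope arithmetic, not anything EXM exposes: the function name is made up for illustration, and it assumes a thread sends one message at a time and does nothing but wait on the emulated send (generation time is ignored), so real numbers will be lower.

```python
def send_rate_bounds(min_send_ms, max_send_ms, threads=1):
    """Return (lowest, highest) sends per second allowed by send latency alone.

    Assumption: each thread sends one message at a time and spends all of
    its time in the emulated send, so generation cost is ignored.
    """
    slowest = threads * 1000.0 / max_send_ms  # every send takes the maximum
    fastest = threads * 1000.0 / min_send_ms  # every send takes the minimum
    return slowest, fastest


# MtaEmulation.MinSendTime = 200, MtaEmulation.MaxSendTime = 400
per_thread = send_rate_bounds(200, 400)
ten_threads = send_rate_bounds(200, 400, threads=10)
print(per_thread)   # (2.5, 5.0)
print(ten_threads)  # (25.0, 50.0)
```

So with the values shown, a single thread tops out somewhere between 2.5 and 5 emulated sends per second before any generation work is counted.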

Now it is time to start doing some sends and playing with configuration settings. The objective is to find the right combination of settings to keep your CPU at about 80-90% utilization. Here are the settings you will be working with.

NumberThreads: The number of threads you can use for sending messages. Increasing this increases CPU usage. It can be set significantly higher on dedicated dispatch (DDS) servers (on the order of 100+ threads).
MaxGenerationThreads: Specifies how many sending threads can generate messages at the same time. The value should be no less than 1. This can also be pushed to 100+ on DDS. Default value: Environment.ProcessorCount * 2.
DispatchEnqueueBatchSize: The number of recipients enqueued in the dispatch queue at a time. Increasing this will consume more CPU and RAM, but can increase the speed of adding large contact lists to the queue.
DispatchEnqueueThreadsNumber: The number of threads that add recipient batches to the dispatch queue. Increasing this will consume more CPU.
EXM.DispatchBatchSize: The number of contacts that each dispatch thread will attempt to process at a time.

From Sitecore's EXM performance tuning guide:
"By default, four threads run in parallel and each thread queues 300 contacts at a time:
DispatchEnqueueBatchSize = default value 300.
DispatchEnqueueThreadsNumber = default value 4."

So these two settings are focused on getting contacts queued up. These settings are utilized in the QueueMessage pipeline processor.

"The message task runner starts multiple dispatch tasks according to the following settings specified in the Sitecore.EmailExperience.ContentManagement.config file:
NumberThreads = default value 10.
MaxGenerationThreads = default value is the number of processors on the current machine * 2.
The MaxGenerationThreads setting limits the number of dispatch tasks that can run concurrently. For example:
If the values for NumberThreads and MaxGenerationThreads are both set to 16, 16 dispatch tasks process concurrently.
If MaxGenerationThreads is set to 8 and NumberThreads is set to 16, only 8 of the dispatch tasks process concurrently while the other 8 tasks are blocked and wait to be processed."
These settings are used by the message task runner.
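
The two examples from the guide boil down to a simple min(). Here is a minimal sketch of that arithmetic. The function name is my own, and it only models what the quoted text describes, not EXM's actual scheduling code.

```python
def dispatch_concurrency(number_threads, max_generation_threads):
    """Split dispatch tasks into (running, blocked) as the guide describes."""
    # MaxGenerationThreads caps how many of the NumberThreads tasks run at once.
    running = min(number_threads, max_generation_threads)
    blocked = number_threads - running
    return running, blocked


print(dispatch_concurrency(16, 16))  # (16, 0)
print(dispatch_concurrency(16, 8))   # (8, 8)
```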

"The dispatch task processes contacts in batches according to the EXM.DispatchBatchSize setting in the Sitecore.EmailExperience.ContentManagement.config file. The default batch size is 100."

Each generation thread grabs a batch of contacts sized by the EXM.DispatchBatchSize setting. So if you have 1000 contacts, 10 generation threads and a batch size of 100, all threads will have work to do. Put another way, batch size * generation threads = the number of contacts needed to keep every thread busy. The key thing to understand here is that if your list of contacts is smaller than this, you will have threads that have no work to do. If you keep these settings but have only 500 contacts, the first 5 generation threads will grab a batch of 100 each, and at that point there are no more batches for the other 5 threads to grab.
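
The same reasoning as a few lines of arithmetic. This is only a sketch of the first round of batches under the simplified model described above (one batch per thread). The function name is hypothetical and nothing here calls into EXM.

```python
import math


def batch_distribution(contacts, batch_size, generation_threads):
    """Return (busy_threads, idle_threads) for the first round of batches."""
    batches = math.ceil(contacts / batch_size)  # a partial batch still needs a thread
    busy = min(batches, generation_threads)
    return busy, generation_threads - busy


print(batch_distribution(1000, 100, 10))  # (10, 0)
print(batch_distribution(500, 100, 10))   # (5, 5)
```

If the second number is not zero, the list is too small for the thread count and batch size you have configured.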

This is really a matter of trial and error, trying different combinations to get your CPU utilization to the right place. So far, with the specs above, we have landed on the following:

NumberThreads = 100
MaxGenerationThreads = 90 (when this is the same as NumberThreads you may see issues with your sending UI status not getting updated).
DispatchEnqueueBatchSize = 1000
DispatchEnqueueThreadsNumber = 10
EXM.DispatchBatchSize = 200

Turn Emulation Off

Once you have the performance you want with all these tuning steps, it is time to turn off emulation mode and see what integration with the MTA does. If you are using Sitecore Cloud (SparkPost) you don't really have any control over this. For this effort, I have been integrating with a custom MTA, so we wrote our own provider. The key with this is really API latency. The longer your API calls take, the fewer emails per second you can send. So squeeze everything you can out of your load balancers, network, and servers.

The Results

After all this, here is the throughput we landed on (this is still an ongoing effort, so I will try to keep this updated with any changes).

Again, the server specs are:
Content Management: 16 GB RAM, 2.3 GHz 4 core processor.
Dedicated Dispatch (DDS): 16 GB RAM, 3.0 GHz 8 core processor.
xConnect: 4 GB RAM, 2.3 GHz 2 core processor.
Database: 16 GB RAM, 2.3 GHz 4 core processor.

With this, we are getting ~38 emails a second.
Generate: min: 00:00:00.1249820; avg: 00:00:01.0874550;
Send: min: 00:00:00.2656099; avg: 00:00:00.4044960;
Process: min: 00:00:00.4218560; avg: 00:00:01.4919510;

Generate is how long it is taking to generate an email. We have some pretty intensive logic happening for email generation (multiple controller renderings that run Solr or Sitecore queries or external data queries). Send tells you how long it is taking to talk to your MTA. Process is the total of the two: the average Generate and Send times add up to the average Process time.
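
To make those numbers a little more tangible, here is a small sketch that checks the averages and turns the send rate into an hourly figure. The parsing helper is my own. The last step uses Little's law (items in flight = arrival rate * time per item), which is a technique I am bringing in for a rough estimate. It assumes a steady send rate and says nothing about how EXM schedules its threads.

```python
def timespan_seconds(text):
    """Convert an 'hh:mm:ss.fffffff' string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


generate_avg = timespan_seconds("00:00:01.0874550")
send_avg = timespan_seconds("00:00:00.4044960")
process_avg = timespan_seconds("00:00:01.4919510")

# Average Process time is the sum of the Generate and Send averages.
print(abs(generate_avg + send_avg - process_avg) < 1e-9)  # True

emails_per_second = 38  # approximate figure reported above
print(emails_per_second * 3600)  # 136800 emails per hour

# Little's law estimate: messages in flight = rate * average time per message.
in_flight = emails_per_second * process_avg
print(round(in_flight, 1))  # 56.7
```

Because ~38 emails a second is itself approximate, treat the in-flight figure as a ballpark for how many messages are being worked on at any moment.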

Hopefully that gives a little perspective on how EXM performs and scales. From here, you can scale your DDS machines up or out (assuming your license covers it) or get your email to generate faster.


Toad's Code
