Server Side Sync – Polling interval

I have been battling with getting server-side sync and forward mailboxes working the way I want over the last couple of weeks. Googling hasn't helped me a lot in this regard, so in this post I will explain my understanding of the server-side sync process.

The biggest hurdle I faced in understanding the server-side sync process is how little detail there is in the mailbox sync errors. I am not talking about CRM tracing, just the standard OOB logging in the application (like workflow errors). A lot of the time I have seen unknown errors and mailboxes getting disabled seemingly at random, with no detailed log in the application to indicate what the issue could be.

I used the following query in LINQPad with the CRM driver installed.

from m in MailboxSet
where m.EmailAddress == "email@email.com"
select new {
    m.EmailAddress,
    PostponeMailboxProcessingUntil = m.PostponeMailboxProcessingUntil.Value.ToLocalTime(),
    m.ProcessedTimes,
    ProcessingLastAttemptedOn = m.ProcessingLastAttemptedOn.Value.ToLocalTime(),
    ReceivingPostponedUntil = m.ReceivingPostponedUntil.Value.ToLocalTime(),
    PostponeSendingUntil = m.PostponeSendingUntil.Value.ToLocalTime(),
    m.NoEmailCount,
    m.TransientFailureCount,
    m.LastSuccessfulSyncCompletedOn,
    m.LastSyncError,
    m.LastSyncErrorCode,
    m.LastSyncErrorCount,
    m.LastSyncErrorMachineName,
    m.LastSyncErrorOccurredOn,
    m.LastSyncStartedOn,
    m.ItemsFailedForLastSync,
    m.ItemsProcessedForLastSync
}

Here is a sample result.

[Image: sample result of the mailbox query]

Based on my tests, here is how I think it works for incoming email (the same applies to outbound email). I am assuming that the mailbox is not disabled and the email address is approved:

  1. The async service starts processing the mailbox at the time specified in the PostponeMailboxProcessingUntil field
  2. The async service processes the inbound emails and creates email records in CRM
  3. In order to set PostponeMailboxProcessingUntil and ReceivingPostponedUntil for the next run, the async service looks at the NoEmailCount field. NoEmailCount is incremented every time the async service attempts to process email; if the attempt did create emails in CRM, the counter is reset to zero (see the sketch below)
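Below is a minimal sketch, in C#, of how I imagine this scheduling could work, based purely on my observations. To be clear: NoEmailCount is a real mailbox field, but ComputeNextPollDelay and the exact backoff steps are hypothetical; the actual async service logic is not documented anywhere I could find.

// Hypothetical sketch of the observed backoff; not the actual product code.
static TimeSpan ComputeNextPollDelay(int noEmailCount)
{
    // The delay appears to climb with each idle attempt and cap at ~5 minutes
    var seconds = Math.Min(60 * (noEmailCount + 1), 300);
    return TimeSpan.FromSeconds(seconds);
}

// After each run, the service stamps the mailbox with the next run time,
// something like:
// PostponeMailboxProcessingUntil = DateTime.UtcNow + ComputeNextPollDelay(NoEmailCount);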

Based on my tests, I believe that once no email has been sent or received for 15 minutes, the polling period settles at around 5 minutes. Here are my test results.

[Image: polling interval test results]

As you can see from my tests, the polling interval gradually climbs when there are no emails to process and, after about 15 minutes of inactivity (no email records created in CRM and no emails sent from CRM), hovers around 5 minutes.

If you have a queue set up in CRM and forward an email to the queue, 5 minutes is a very long time to wait for the async service to pick up the email from Exchange. I saw a couple of posts about how this polling interval can be changed by modifying the "Minimum Polling Interval" field.

[Image: Mailbox form showing the "Minimum Polling Interval" field]

The recommended value for this field is "0", and even when I changed it to 1 minute, I did not see it influence "ProcessingLastAttemptedOn". I also tried to change "PostponeMailboxProcessingUntil" and "ReceivingPostponedUntil" on the mailbox through the SDK, but since they are read-only fields, they cannot be changed. All the tests were carried out on on-premises CRM 2015, so I cannot confirm whether the behaviour is the same in CRM 2016 or Dynamics 365.
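For reference, here is roughly what I attempted through the SDK. This is just a sketch: service is an IOrganizationService and mailboxId is the id of the mailbox row; the attribute names are the actual mailbox attributes.

// Sketch of the attempted update; the platform will not persist these
// values because the attributes are read-only (not valid for update).
var mailbox = new Entity("mailbox") { Id = mailboxId };
mailbox["postponemailboxprocessinguntil"] = DateTime.UtcNow.AddMinutes(1);
mailbox["receivingpostponeduntil"] = DateTime.UtcNow.AddMinutes(1);
service.Update(mailbox); // the new values never take effect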

EDIT (18/11/2016): This 5 minute polling interval seems to be a “by-design” behavior. Thank you Andre Margono for sending this info. Here is what the MSDN article says:

When synchronization by using server-side synchronization occurs, the process is dynamic and unique for each user’s mailbox. The synchronization algorithm ensures that mailboxes are synced according to dynamic parameters such as the number of email messages and the activity within the mailbox. Normally, email synchronization occurs every 5 minutes. When a mailbox has many email messages, the interval can be reduced dynamically to 2 minutes. If the mailbox is less active, the interval can be increased up to 12 minutes. Generally speaking, you can assume that a mailbox will be synced at least once every 12 minutes. Note that you can’t manually synchronize records through server-side synchronization and when you track email (Track button), this occurs immediately.

tl;dr: If your CRM application has not sent or received any email in 15 minutes, the polling interval will eventually settle at around 5 minutes. When an email is received, the async service will start processing more frequently, but the polling period will slowly climb during inactivity until it reaches 5 minutes again.

Redux: RetrieveMultiple on large datasets

In trying to optimise RetrieveMultiple performance, I thought I had hit a sweet spot using TPL, with RetrieveMultiple requests spanning pages in parallel, but after talking to @maustinjones I have now come to the conclusion that it is probably not the best approach. The datasets I tested on, contact and account, are large, but not significantly large. After looking at the SQL trace logs, I have decided to stick with the paging cookie approach when dealing with large datasets.

In order to understand why the paging cookie approach is more efficient, it is important to understand how your FetchXml is translated into SQL.

Scenario:
Retrieve all records from the account entity. Total number of records: 10,177.

Approach 1 – TPL with RetrieveMultiple parallelised across pages

[Image: SQL trace for the TPL approach]

As you can see, the SQL that is generated is not exactly optimal. There are only 10,177 records in total, but 45,533 records (5001 + 10001 + 10177 + 10177 + 10177) are retrieved across 5 requests. It looks like each page query is expressed as a TOP over the full ordered result set, so every page re-reads all the rows of the pages before it. The correct number of records (10,177) is returned only after further in-memory processing.

Approach 2 – PagingCookie

[Image: SQL trace for the paging cookie approach]

10,179 records (5001 + 5001 + 177) are retrieved across three requests, and further processing cuts down the result set to the correct count of 10,177.

Random Notes:
Looking at how the paging cookie translates to SQL reminds me of this post from the PFE Dynamics guys -> Dangers of Guid.NewGuid. I believe the entity primary key is generated using NEWSEQUENTIALID, which is why it is not advisable to insert records directly into the MSCRM database. This is also probably how the "> uniqueidentifier" query is optimised: since the Guids are sequential, the clustered index on the primary key can be used efficiently.


Conclusion:
So, compared to the TPL approach, the paging cookie approach:

  1. Reduces the number of queries sent to the database
  2. Respects the count size specified in the fetchxml
  3. Does less in-memory processing
  4. Reduces the chances of encountering a database lock
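To make the comparison concrete, here is a minimal sketch of a paging cookie loop, assuming service is an IOrganizationService; the full test code further down includes a similar loop.

// Minimal paging cookie loop (sketch). Passing the cookie back lets the
// server seek straight past where the previous page ended (primary key
// greater than the last Guid) instead of re-reading the earlier pages.
var query = new QueryExpression("account")
{
    ColumnSet = new ColumnSet("name"),
    PageInfo = new PagingInfo { PageNumber = 1, Count = 5000 }
};
EntityCollection page;
do
{
    page = service.RetrieveMultiple(query);
    // process page.Entities here
    query.PageInfo.PageNumber++;
    query.PageInfo.PagingCookie = page.PagingCookie;
} while (page.MoreRecords);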

In spite of all these observations, and understanding how efficient the paging cookie queries are, I still cannot comprehend how TPL is faster, even though it queries the database so inefficiently.

Credit: A big thank you to @maustinjones for helping me understand the performance pitfalls of using TPL in this scenario.

RetrieveMultiple performance on large datasets

EDIT (21/05/15): After DMing with @maustinjones, I have updated the post to include stats for paged RetrieveMultiple as well.

EDIT (22/05/15): Updated the post to reflect the correct behaviour of no-lock when parallelising. Thanks @maustinjones. Please also refer to the follow-up post -> https://dreamingincrm.com/2015/05/22/redux-retrievemultiple-on-large-datasets/ on why paging cookie is the recommended approach.

EDIT (18/07/16): Please watch this -> https://www.youtube.com/watch?v=-N20I5mo7RU video from the Dynamics CRM product group, which goes into much more detail about performance techniques to optimise RetrieveMultiple queries.

EDIT (25/01/17): Eric Hagen has written a series of posts covering RetrieveMultiple performance and internals. Reading them is time well spent -> https://community.dynamics.com/crm/b/dynamicscrmsupportblog/archive/2017/01/11/retrievemultiple-performance

MSCRM limits the maximum result size to 5,000 records when you use a RetrieveMultiple request. In order to overcome this and retrieve all the records, you have to use paging. The PFE Dynamics team has also released an open-source library, the PFE Xrm Core Library, that utilises the Task Parallel Library (TPL).

There is also ExecuteMultipleRequest, which you can use to send a bunch of requests in one go and process the responses together. I just wanted to document my findings about the performance of these options:

  1. Just using Parallel.ForEach
  2. ExecuteMultipleRequest
  3. PFE Xrm Core
  4. Paged RetrieveMultiple

Run 1 (Batch size 1000):
I have not included paged RetrieveMultiple in this run, as it is too slow and I am too impatient to wait for it to complete.

Run 2 (Batch size 5000):

Run 3 (Batch size 5000):

Observations:

  1. Reducing the page size causes a drop in performance. I got better performance with 5,000 records per page than with 1,000
  2. Just using Parallel.ForEach is faster than PFE Xrm Core (perhaps some overhead in instantiating the service?)
  3. ExecuteMultipleRequest is significantly slower than Parallel.ForEach

Best Practice tip from PFE Dynamics:

If you read the source code for the ParallelServiceProxy class, the help text for the RetrieveMultiple method actually has this note:

IMPORTANT!! This approach should only be used if multiple queries for varying entity types are required or the result set can't be expressed in a single query. In the latter case, leverage NoLock=true where possible to reduce database contention.

So ideally, you should use the RetrieveMultiple method in ParallelProxy to retrieve records from multiple entities. But in my case I am retrieving from a single entity, hence I am using the no-lock option in the fetchxml.

I DMed with @maustinjones, and here is an important point to consider before choosing a technique:

Query pages shouldn’t be retrieved in parallel. only separate queries altogether

When you use TPL to submit a bunch of RetrieveMultiple requests in one go, there is a chance of encountering database lock issues, even if you use no-lock. So it is OK to parallelise queries that fetch from different entities, but for paging through a single query the PagingCookie is the recommended approach. A sketch of the parallel-queries pattern follows.
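Here is a sketch of the recommended pattern: hand ParallelProxy several distinct queries, one per entity, rather than pages of the same query. manager is a PFE Xrm Core OrganizationServiceManager, as in the test code at the end of this post.

// Parallelise separate queries (one per entity), not pages of one query.
IDictionary<string, QueryBase> queries = new Dictionary<string, QueryBase>
{
    { "accounts", new FetchExpression("<fetch no-lock='true'><entity name='account'><attribute name='name' /></entity></fetch>") },
    { "contacts", new FetchExpression("<fetch no-lock='true'><entity name='contact'><attribute name='fullname' /></entity></fetch>") }
};
var resultsByQuery = manager.ParallelProxy.RetrieveMultiple(queries, true,
    (pair, ex) => Console.WriteLine("{0} threw {1}", pair.Key, ex.Message));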

To be honest, I did not experience any lock issues during the tests, but standard CRM load was not simulated either. I am not sure what would have happened if users had been viewing contacts or running reports on contacts while these tests were running. Database contention and locking are a concern only in scenarios where no-lock is not feasible.

TPL just blitzes the other techniques in these tests, but I'll have to test further to see whether lock contention issues arise on much larger datasets. Do consider PFE Xrm Core if you are looking to use RetrieveMultiple across multiple entities; it is written by the PFE Dynamics guys and is open-source.

Best Practice tips from msdn regarding TPL:

This MSDN article is a must-read if you are thinking of using TPL -> https://msdn.microsoft.com/en-us/library/dd997392%28v=vs.110%29.aspx

I had performance issues with TPL because I had a Console.WriteLine inside the lambda for ForEach, and it basically killed the performance. The article explains why, so it is quite a useful read.
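In my case the fix was simply to keep the parallel body cheap. If you need progress numbers, a sketch like the following (an interlocked counter, logged after the loop) avoids serialising all the threads on the console lock:

// Count with Interlocked inside the loop and log once at the end,
// instead of calling Console.WriteLine from every iteration.
var processed = 0;
Parallel.ForEach(queries, query =>
{
    // ... RetrieveMultiple for this page ...
    System.Threading.Interlocked.Increment(ref processed);
});
Console.WriteLine("Processed {0} pages", processed);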

Note about the code:
I wrote this quickly just to test the performance, not with production quality in mind. If you read the code, you can see I am sending 20 requests at a time inside an infinite loop to retrieve all the pages, and I exit the loop when any one of the pages returns no result. I haven't tried ExecuteMultipleRequest inside Parallel.ForEach; it would be interesting to see how that performs.

Code: Here is the code I tested with, for reference.

using Microsoft.Crm.Sdk.Messages; // FetchXmlToQueryExpressionRequest
using Microsoft.Pfe.Xrm; // PFE Xrm Core - OrganizationServiceManager
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Messages;
using Microsoft.Xrm.Sdk.Query;
using Microsoft.Xrm.Tooling.Connector;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
using System.Xml.Linq;

namespace Playground
{
    class Program
    {
        static void Main(string[] args)
        {
            RetrieveAllPages(@"<fetch count='5000' no-lock='true' page='{0}' >
                                  <entity name='account' >
                                    <attribute name='name' />
                                    <order attribute='accountid' />
                                    <filter>
                                        <condition attribute='statecode' operator='eq' value='0' />
                                    </filter>
                                   </entity>
                                </fetch>");

            RetrieveAllPages(@"<fetch count='5000' no-lock='true' page='{0}' >
                                  <entity name='contact' >
                                    <attribute name='fullname' />
                                    <order attribute='contactid' />
                                    <filter>
                                      <condition attribute='statecode' operator='eq' value='0' />
                                    </filter>
                                  </entity>
                                </fetch>");
        }

        static void RetrieveAllPages(string fetchXml)
        {
            Console.WriteLine("Entity: {0}\n", XElement.Parse(fetchXml).Element("entity").Attribute("name").Value);
            var organisationSvcManager = new OrganizationServiceManager(new Uri("CRMURL"),
                "[username]", "[password]", "[domain]");
            var crmSvc = new CrmServiceClient(new NetworkCredential("[username]", "[password]", "[domain]"), Microsoft.Xrm.Tooling.Connector.AuthenticationType.AD, "CRMURL", "80", "[Org]");
            var isDone = false; // shared flag across the parallel loop; worst case one extra empty page is fetched
            int pageStart = 1, pageEnd = 20;
            if (crmSvc.IsReady)
            {
                var stopWatch = new Stopwatch();
                stopWatch.Start();
                var results = new List<Entity>();

                while (!isDone)
                {
                    IDictionary<string, QueryBase> queries = new Dictionary<string, QueryBase>();
                    //Console.WriteLine("Page {0} to {1}", pageStart, pageEnd);
                    for (int i = pageStart; i <= pageStart + 19; i++)
                    {
                        queries.Add(i.ToString(), new FetchExpression(string.Format(fetchXml, i)));
                    }
                    pageStart = pageEnd + 1;
                    pageEnd = pageStart + 19;
                    Parallel.ForEach(queries,
                        (query) =>
                        {
                            if (!isDone)
                            {
                                var pageResults = crmSvc.OrganizationServiceProxy.RetrieveMultiple(query.Value).Entities;
                                if (!pageResults.Any())
                                {
                                    isDone = true;
                                    return;
                                }
                                // List<T> is not thread-safe; synchronise access to the shared list
                                lock (results)
                                {
                                    results.AddRange(pageResults);
                                }
                            }
                        });
                }

                Console.WriteLine(results.Count);
                stopWatch.Stop();
                Console.WriteLine("Parallel: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
                Console.WriteLine(results.Count / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");

                var resultCount = 0;
                pageStart = 1;
                pageEnd = 20;
                stopWatch.Restart();
                isDone = false;
                while (!isDone)
                {
                    //Console.WriteLine("Page {0} to {1}", pageStart, pageEnd);
                    var executeMultipleRequest = new ExecuteMultipleRequest
                    {
                        Requests = new OrganizationRequestCollection(),
                        Settings = new ExecuteMultipleSettings() { ContinueOnError = true, ReturnResponses = true }
                    };

                    for (var i = pageStart; i <= pageStart + 19; i++)
                    {
                        executeMultipleRequest.Requests.Add(new RetrieveMultipleRequest()
                        {
                            Query = new FetchExpression(string.Format(fetchXml, i))
                        });
                    }
                    var executeMultipleResponses = ((ExecuteMultipleResponse)crmSvc.OrganizationServiceProxy.Execute(executeMultipleRequest)).Responses.ToList();
                    executeMultipleResponses.ForEach(x =>
                    {
                        var executeMultipleResultEntities = ((RetrieveMultipleResponse) x.Response).EntityCollection.Entities;
                        if (!executeMultipleResultEntities.Any())
                        {
                            isDone = true;
                            return;
                        }
                        resultCount = resultCount + executeMultipleResultEntities.Count;
                    });
                    pageStart = pageEnd + 1;
                    pageEnd = pageStart + 19;
                }
                Console.WriteLine(resultCount);
                stopWatch.Stop();
                Console.WriteLine("ExecuteMultiple: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
                Console.WriteLine(resultCount / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n"); // use resultCount, not the parallel run's results

                IDictionary<string, QueryBase> entityQuery = new Dictionary<string, QueryBase>();
                entityQuery.Add("result", new FetchExpression(fetchXml));
                stopWatch.Restart();
                var queryResult = organisationSvcManager.ParallelProxy.RetrieveMultiple(entityQuery, true,
                    (pair, exception) => Console.WriteLine("{0} throwed {1}", pair.Key, exception.Message));
                stopWatch.Stop();
                Console.WriteLine(queryResult.Values.First().Entities.Count);
                Console.WriteLine("PFE.Xrm: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
                Console.WriteLine(queryResult.Values.First().Entities.Count / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");

                resultCount = 0;
                var pagedResults = new EntityCollection();
                var fetchToQuery = new FetchXmlToQueryExpressionRequest {FetchXml = string.Format(fetchXml, 1)};
                var retrieveQuery = ((FetchXmlToQueryExpressionResponse)crmSvc.OrganizationServiceProxy.Execute(fetchToQuery)).Query;
                retrieveQuery.PageInfo = new PagingInfo {PageNumber = 1};
                stopWatch.Restart();
                do
                {
                    pagedResults = crmSvc.OrganizationServiceProxy.RetrieveMultiple(retrieveQuery);
                    resultCount += pagedResults.Entities.Count;
                    retrieveQuery.PageInfo.PageNumber++;
                    retrieveQuery.PageInfo.PagingCookie = pagedResults.PagingCookie;
                } while (pagedResults.MoreRecords);
                Console.WriteLine(resultCount);
                stopWatch.Stop();
                Console.WriteLine("Paged RetrieveMultiple: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
                Console.WriteLine(resultCount / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");
            }
        }
    }
}