RetrieveMultiple performance on large datasets

EDIT (21/05/15): After DMing with @maustinjones, I have added stat for paged RetrieveMultiple as well. I have updated the post to include this.

EDIT (22/05/15): Updated post to reflect correct behaviour of no-lock when parallelising. Thanks @maustinjones. Please also refer to the follow up post -> https://dreamingincrm.com/2015/05/22/redux-retrievemultiple-on-large-datasets/ on why paging cookie is the recommended approach.

EDIT (18/07/16): Please watch this -> https://www.youtube.com/watch?v=-N20I5mo7RU video from the Dynamics CRM product group, which goes into much more detail about performance techniques to optimise RetrieveMultiple queries.

EDIT (25/01/17): Eric Hagen has written a series of post covering RetrieveMultiple performance and internals. Reading these posts is time well spent -> https://community.dynamics.com/crm/b/dynamicscrmsupportblog/archive/2017/01/11/retrievemultiple-performance

MSCRM limits the maximum result size to 5000 records, when you use the RetrieveMultiple request. In order to overcome this and retrieve all the records, you’ll have to use paging. PFEDynamics team have also released an open-source library called PFE Xrm Core Library that utilises Task Parallel Library.

There is also ExecuteMultipleRequest that you can use to send bunch of requests in one go, and process the responses. This just wanted to document by findings, about the performance of these options:

  1. Just using Parallel.ForEach
  2. ExecuteMultipleRequest
  3. PFE Xrm Core
  4. Paged RetrieveMultiple

Run 1 (Batch size 1000):
I have not included the paged RetrieveMultiple in this scenario, as it is too slow, and I am too impatient to wait for it to complete.

Run 2 (Batch size 5000):

Run 3 (Batch size 5000):

Observations:

  1. Reducing the page size causes a drop in performance. I got better performance with 5000 records in one page, than 1000 records
  2. Just using Parallel.ForEach is faster than PFE Xrm Core (some overhead in instantiating the service perhaps?)
  3. ExecuteMultipleRequest is significantly slower than Parallel.ForEach

Best Practice tip from PFE Dynamics:

If you read the source code for the ParallelServiceProxy class, the help text for the RetrieveMultiple method actually has this note:

IMPORTANT!! This approach should only be used if multiple queries for varying entity types are required or the result set can’t be expressed in a single query. In the latter case,
leverage NoLock=true where possible to reduce database contention.

So ideally, you should be using RetrieveMultiple method in ParallelProxy, to retrieve records from multiple entities. But, in my case I am retrieving from the same entity, hence I am using the no-lock option the fetchxml.

DMed with @maustinjones, and here is an important point to consider, before choosing a technique. 

Query pages shouldn’t be retrieved in parallel. only separate queries altogether

When you use TPL to submit a bunch of RetrieveMultiple requests in one go, there is a chance of encountering database lock issues, even though you use no-lock. So, it is OK to parallelise queries that fetch from different entities, but using PagingCookie is the recommended approach.

To be honest, I did not experience any lock issues during the test, but standard CRM load was not simulated during the test. I am not sure what would have happened, if users were viewing contacts/running reports on contacts, and I am running these tests at the same time.
Database contentention and locking is applicable only in scenarios, where no-lock is not feasible.

TPL just blitzes the other techniques in these tests, but I’ll have to test this further to see if lock contention issues arise, in much larger datasets. Do consider PFE Xrm Core, if you are looking to use RetrieveMultiple from multiple entities. It is written by the PFE Dynamics guys and is open-source.

Best Practice tips from msdn regarding TPL:

This article in msdn is a must read if you are thinking of using TPL -> https://msdn.microsoft.com/en-us/library/dd997392%28v=vs.110%29.aspx.

I had performance issues with TPL, as I had Console.WriteLine inside the lambda for ForEach, and it basically killed the performance. The article told me why, and so it is quite an useful read.

Note about the code:
I quickly wrote this just to test the performance, and not production quality in mind. If you read the code, you can see I am sending 20 requests at a time, inside an infinite loop to retrieve all the pages. I exit the loop when any one of the pages, returns no result. I haven’t tried ExecuteMultipleRequest inside the Parallel.ForEach. It would be interesting to see what the performance will be in that case.

Code: Here is the code, I tested with, for reference.

using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Messages;
using Microsoft.Xrm.Sdk.Query;
using Microsoft.Xrm.Tooling.Connector;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
using System.Xml.Linq;

namespace Playground
{
    class Program
    {
        static void Main(string[] args)
        {
            RetrieveAllPages(@"<fetch count='5000' no-lock='true' page='{0}' >
                                  <entity name='account' >
                                    <attribute name='name' />
                                    <order attribute='accountid' />
                                    <filter>
                                        <condition attribute='statecode' operator='eq' value='0' />
                                    </filter>
                                   </entity>
                                </fetch>");

            RetrieveAllPages(@"<fetch count='5000' no-lock='true' page='{0}' >
                                  <entity name='contact' >
                                    <attribute name='fullname' />
                                    <order attribute='contactid' />
                                    <filter>
                                      <condition attribute='statecode' operator='eq' value='0' />
                                    </filter>
                                  </entity>
                                </fetch>");
        }

        static void RetrieveAllPages(string fetchXml)
        {
            Console.WriteLine("Entity: {0}\n", XElement.Parse(fetchXml).Element("entity").Attribute("name").Value);
            var organisationSvcManager = new OrganizationServiceManager(new Uri("CRMURL"),
                "[username]", "[password]", "[domain]");
            var crmSvc = new CrmServiceClient(new NetworkCredential("[username]", "[password]", "[domain]"), Microsoft.Xrm.Tooling.Connector.AuthenticationType.AD, "CRMURL", "80", "[Org]");
            var isDone = false;
            int pageStart = 1, pageEnd = 20;
            if (crmSvc.IsReady)
            {
                var stopWatch = new Stopwatch();
                stopWatch.Start();
                var results = new List<Entity>();

                while (!isDone)
                {
                    IDictionary<string, QueryBase> queries = new Dictionary<string, QueryBase>();
                    //Console.WriteLine("Page {0} to {1}", pageStart, pageEnd);
                    for (int i = pageStart; i <= pageStart + 19; i++)
                    {
                        queries.Add(i.ToString(), new FetchExpression(string.Format(fetchXml, i)));

                    };
                    pageStart = pageEnd + 1;
                    pageEnd = pageStart + 19;
                    var threadNum = 1;
                    Parallel.ForEach(queries,
                        (query) =>
                        {
                            if (!isDone)
                            {
                                var pageResults = crmSvc.OrganizationServiceProxy.RetrieveMultiple(query.Value).Entities;
                                if (!pageResults.Any())
                                {
                                    isDone = true;
                                    return;
                                }
                                results.AddRange(pageResults);
                            }
                        });
                }

                Console.WriteLine(results.Count);
                stopWatch.Stop();
                Console.WriteLine("Parallel: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
                Console.WriteLine(results.Count / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");

                var resultCount = 0;
                pageStart = 1;
                pageEnd = 20;
                stopWatch.Restart();
                isDone = false;
                while (!isDone)
                {
                    //Console.WriteLine("Page {0} to {1}", pageStart, pageEnd);
                    var executeMultipleRequest = new ExecuteMultipleRequest
                    {
                        Requests = new OrganizationRequestCollection(),
                        Settings = new ExecuteMultipleSettings() { ContinueOnError = true, ReturnResponses = true }
                    };

                    for (var i = pageStart; i <= pageStart + 19; i++)
                    {
                        executeMultipleRequest.Requests.Add(new RetrieveMultipleRequest()
                        {
                            Query = new FetchExpression(string.Format(fetchXml, i))
                        });
                    };
                    var executeMultipleResponses = ((ExecuteMultipleResponse)crmSvc.OrganizationServiceProxy.Execute(executeMultipleRequest)).Responses.ToList();
                    executeMultipleResponses.ForEach(x =>
                    {
                        var executeMultipleResultEntities = ((RetrieveMultipleResponse) x.Response).EntityCollection.Entities;
                        if (!executeMultipleResultEntities.Any())
                        {
                            isDone = true;
                            return;
                        }
                        resultCount = resultCount + executeMultipleResultEntities.Count;
                    });
                    pageStart = pageEnd + 1;
                    pageEnd = pageStart + 19;
                }
                Console.WriteLine(resultCount);
                stopWatch.Stop();
                Console.WriteLine("ExecuteMultiple: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
                Console.WriteLine(results.Count / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");

                IDictionary<string, QueryBase> entityQuery = new Dictionary<string, QueryBase>();
                entityQuery.Add("result", new FetchExpression(fetchXml));
                stopWatch.Restart();
                var queryResult = organisationSvcManager.ParallelProxy.RetrieveMultiple(entityQuery, true,
                    (pair, exception) => Console.WriteLine("{0} throwed {1}", pair.Key, exception.Message));
                stopWatch.Stop();
                Console.WriteLine(queryResult.Values.First().Entities.Count);
                Console.WriteLine("PFE.Xrm: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
                Console.WriteLine(queryResult.Values.First().Entities.Count / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");

                resultCount = 0;
                var pagedResults = new EntityCollection();
                var fetchToQuery = new FetchXmlToQueryExpressionRequest {FetchXml = string.Format(fetchXml, 1)};
                var retrieveQuery = ((FetchXmlToQueryExpressionResponse)crmSvc.OrganizationServiceProxy.Execute(fetchToQuery)).Query;
                retrieveQuery.PageInfo = new PagingInfo {PageNumber = 1};
                stopWatch.Restart();
                do
                {
                    pagedResults = crmSvc.OrganizationServiceProxy.RetrieveMultiple(retrieveQuery);
                    resultCount += pagedResults.Entities.Count;
                    retrieveQuery.PageInfo.PageNumber++;
                    retrieveQuery.PageInfo.PagingCookie = pagedResults.PagingCookie;
                } while (pagedResults.MoreRecords);
                Console.WriteLine(resultCount);
                stopWatch.Stop();
                Console.WriteLine("Paged RetrieveMultiple: Executed in {0} sec", stopWatch.ElapsedMilliseconds / 1000);
                Console.WriteLine(resultCount / (stopWatch.ElapsedMilliseconds / 1000) + " per sec\n");
            }
        }
    }
}
Advertisements

3 thoughts on “RetrieveMultiple performance on large datasets

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s