Redux: RetrieveMultiple on large datasets

In trying to optimise, RetrieveMultiple performance I thought I had hit a sweet-spot with using TPL, with RetrieveMultiple request spanning across pages, but after talking to @maustinjones, I now have come to the conclusion, that is probably not the best approach. The dataset size I tested on: contact and account, are large, but not significantly large. After looking at the SQL Trace logs, I have decided to stick with the pagingcookie approach, while dealing with large datasets.

In order to understand, why the pagingcookie approach is more efficient, it is important to understand, how your FetchXml is translated into SQL.

Scenario:
Retrieve all records from account. Total Number of records is 10,177

Approach 1 – TPL with RetrieveMultiple parallised across pages As you can see, the SQL query that is generated, is not exactly optimal. There are only 10,177 records in total, but 45,533 records (5001 + 10001 + 10177 + 10177 + 10177) are retrieved across 5 requests. The correct number of records (10,177) are returned only after further in-memory processing.

Approach 2 – PagingCookie

10,179 records (5001+5001+177) are retrieved across three requests, and further processing cuts down the result record set to the correct count of 10,177.

Random Notes:
Looking at how paging cookie translated to SQL, reminds me of this post from PFE Dynamics guys -> Dangers of Guid.NewGuid. I believe that the entity primary key is generated using NEWSEQUENTIALID, and hence it is not advisable to insert records directly into the MSCRM database. This is also probably how “> uniqueidentifier” query is optimised, as the Guid is in sequence, and the clustered index on the primary key can be efficiently used in this query.

Conclusion:
So, compared to the TPL approach, paging cookie approach

Reduces the number of queries sent to the database
Respects the count size specified in the fetchxml
Does less in-memory processing
Reduces the chances of encountering a database lock

In spite of all these obversations, and understanding how efficient the pagingcookie queries are, I still cannot comprehend how TPL is faster, even though it inefficiently queries the database.

Credit: Big thank you to @maustinjones in helping me understand the performance pitfills when using TPL, in this scenario.

One comment

Dreaming in CRM & Power Platform

My thoughts on Power Platform, JavaScript, C# and Software Development. Sometimes, I post when I am half-asleep

Redux: RetrieveMultiple on large datasets

One comment

Leave a comment Cancel reply

Redux: RetrieveMultiple on large datasets

Share this:

Related

One comment

Leave a comment Cancel reply