In trying to optimise, RetrieveMultiple performance I thought I had hit a sweet-spot with using TPL, with RetrieveMultiple request spanning across pages, but after talking to @maustinjones, I now have come to the conclusion, that is probably not the best approach. The dataset size I tested on: contact and account, are large, but not significantly large. After looking at the SQL Trace logs, I have decided to stick with the pagingcookie approach, while dealing with large datasets.
In order to understand, why the pagingcookie approach is more efficient, it is important to understand, how your FetchXml is translated into SQL.
Retrieve all records from account. Total Number of records is 10,177
Approach 1 – TPL with RetrieveMultiple parallised across pages As you can see, the SQL query that is generated, is not exactly optimal. There are only 10,177 records in total, but 45,533 records (5001 + 10001 + 10177 + 10177 + 10177) are retrieved across 5 requests. The correct number of records (10,177) are returned only after further in-memory processing.
Approach 2 – PagingCookie
10,179 records (5001+5001+177) are retrieved across three requests, and further processing cuts down the result record set to the correct count of 10,177.
Looking at how paging cookie translated to SQL, reminds me of this post from PFE Dynamics guys -> Dangers of Guid.NewGuid. I believe that the entity primary key is generated using NEWSEQUENTIALID, and hence it is not advisable to insert records directly into the MSCRM database. This is also probably how “> uniqueidentifier” query is optimised, as the Guid is in sequence, and the clustered index on the primary key can be efficiently used in this query.
So, compared to the TPL approach, paging cookie approach
- Reduces the number of queries sent to the database
- Respects the count size specified in the fetchxml
- Does less in-memory processing
- Reduces the chances of encountering a database lock
In spite of all these obversations, and understanding how efficient the pagingcookie queries are, I still cannot comprehend how TPL is faster, even though it inefficiently queries the database.
Credit: Big thank you to @maustinjones in helping me understand the performance pitfills when using TPL, in this scenario.
[…] Redux: RetrieveMultiple on large datasets […]