Publish a Duplicate Detection Rule: Behind the scenes

When you import a solution you get a nice progress bar that you can use to approximately estimate the finish time. But when you publish a duplicate detection rule, you get nothing. All you can basically do is hit the publish and check back 10-15 minutes later.

Sometimes, it might sit in the publishing state longer that you expect. It can be frustating to just wait, without actually knowing whether the process has frozen or it is actually doing something. The underlying entity DuplicateRule is not searcheable from advanced find, and so there is no other information available except the window saying “Publishing”.

I did some research into what actually happens under the covers when you hit publish. When you hit publish CRM has to:

  1. Create new matchcode tables for the base entity and the matching entity
  2. Populate the matchcode tables with the duplicate criteria specified in the rule

For some reason that I haven’t been able to figure out so far, it seems the matchcode table count is always actual entity count + 1. There is a matchcode record with ObjectId 00000000-0000-0000-0000-000000000000 in each matchcode table. I have no idea why this record exists, but my theory is that this is some sort of bit mask. I maybe totally wrong on this though and this could just be some sort of sequence.

I wrote the script below to confirm the duplicate rules are indeed doing something.

declare @duplicaterules table (rownum int identity(1,1), name varchar(200),matchcodetable varchar(200),entity varchar(200))
declare @i int,@matchcodetable varchar(200),@entity varchar(200),@name varchar(200),@max int
declare @result varchar(max),@query nvarchar(max),@matchcodecount int,@entityrowcount int
insert into @duplicaterules
select name,MatchingEntityMatchCodeTable,MatchingEntityName from DuplicateRule where statecode=0
select @max=@@ROWCOUNT,@i=1
	select @matchcodetable=matchcodetable,@entity=entity,@name=name from @duplicaterules where rownum=@i
	set @query=N'select @matchcodecountOUT=count(*) from ' + @matchcodetable + ' where ObjectId<>''00000000-0000-0000-0000-000000000000'''
	exec sp_executesql @query,N'@matchcodecountOUT int OUTPUT',@matchcodecountOUT=@matchcodecount OUTPUT
	set @query=N'select @entityrowcountOUT=count(*) from ' + @entity
	exec sp_executesql @query,N'@entityrowcountOUT int OUTPUT',@entityrowcountOUT=@entityrowcount OUTPUT
	print 'Duplicate Rule '+@name+' is '+CAST(CAST(@matchcodecount*100.0/@entityrowcount as numeric(5,2)) as varchar(6))+'% complete: '
	print 'Entity: '+@entity+', Matchcode Table: '+@matchcodetable+', Matchcodes: '+CAST(@matchcodecount as varchar(20))+', Entity Rows: '+CAST(@entityrowcount as varchar(20))+CHAR(10)
	set @i=@i+1

When you run the script in SSMS on the MSCRM database, you will see a output that looks something like this.


So now you can quickly check the status in the database, if you are little impatient. Once again, sorry CRMOnline, but I like having access to the actual database.

EDIT (22/09/15): Updated SQL Script slightly to produce the percentages correctly. The previous script was always displaying integer. I also exclude matchcode row with objectid 00000000-0000-0000-0000-000000000000

SQL Script: Entity row count

Run the script below on the MSCRM database to quickly get a count of records, for real and customisable entities (not logical).

declare @query nvarchar(max)
set @query=''
select @query=@query+'select count(*) Records,'''+Label+''' Entity, '+ cast(IsCustomEntity as varchar(1)) +' as IsCustomEntity from '+name+' union all ' from dbo.LocalizedLabelAsIfPublishedView a
inner join dbo.EntityAsIfPublishedLogicalView b on a.ObjectId=b.EntityId
where a.ObjectColumnName='LocalizedName'
and IsLogicalEntity=0 and IsCustomizable=1
set @query='select Entity,Records, case when IsCustomEntity=''0'' then ''No'' else ''Yes'' end as CustomEntity from ('+left(@query,len(@query)-10)+') s where s.Records>0 order by Entity'
exec sp_executesql @query

Gotcha: fn_UTCToLocalTime and fn_LocalTimeToUTC

There are not many scenarios where you would directly operate against an entity base table in CRM. There two scenarios that I can think of are:

  1. Reporting
  2. Data Migration/Integration

Filtered view is the recommended approach when it comes to writing reports for OnPremise CRM server. But if the performance is not really upto the mark, you are forced to either optimise the query by creating indexes/statistics against the correct tables, or directly operate against the base tables.

CRM 2015 Update 1 has introduced the option to create a datetime field as Timezone independent, but in all versions prior to this one, you’ll have to take care of the UTC/Local and Local/UTC conversion, when it comes to datetime fields. CRM does this using fn_UTCToLocalTime and fn_LocalTimeToUTC UDFs that are defined in the MSCRM database.

When you are connected into the SQL database as an Active Directory user, who is not a user in CRM, both these functions will return null. The reason CRM needs to know who the user is to find out the Timezone preferences that are stored again the user’s profile. This is needed to perform the UTC/Local conversion.

In order to understand why this happens, you’ll have to understand how CRM finds out who the current user is.

CRM first tries to match the Active Directory login name of the user against the DomainName column of SystemUser table in MSCRM database. If it can’t find a match, it then tries to match the SystemUserId in the SystemUser table with the context_info on the connection.

Query to understand the behaviour

--user has access to mscrm db, but not a user in CRM
EXECUTE AS user ='CONTOSO\Max.Power'
select dbo.fn_UTCToLocalTime(CreatedOn) from

DECLARE @systemuserid varbinary(128);
select @systemuserid=SystemUserId FROM dbo.systemuser where DomainName='CONTOSO\Alan.Smith'

--impersonate the context of an user who in CRM
SET context_info @systemuserid
select dbo.fn_UTCToLocalTime(CreatedOn) from

--reset context

When you run this query in SSMS, you’ll get something similar to this.

Query Result
Query Result

As you can see the first fn_UTCToLocalTime returned null, but the second one doesn’t, as a different user is impersonated using the context_info. Please be mindful of this behaviour when you are directly working against the base tables.