

$sourcecheck = "SELECT SUM($checksum_sret) AS CheckSumValue FROM $checksum_sourcedatabase.dbo.$article WITH(NOLOCK)" $checksum_dret = Get-CheckSumColumn -readserver $checksum_destinationserver -readquery $checksum_destination -options 0

$checksum_sret = Get-CheckSumColumn -readserver $checksum_sourceserver -readquery $checksum_source -options 0 $scon.ConnectionString = "Data Source=$readserver Initial Catalog=$distributiondb Integrated Security=true Connection Timeout=0 " INNER JOIN MSdistribution_agents da ON da.publisher_id = s.publisher_id INNER JOIN master.sysservers srv ON srv.srvid = s.publisher_id INNER JOIN master.sysservers ss ON s.subscriber_id = ss.srvid INNER JOIN MSsubscriptions s ON s.publisher_db = dist.publisher_db INNER JOIN MSpublications p ON a.publication_id = p.publication_id ) AS dist ON dist.publication_id = a.publication_id , 'SELECT TOP 1 COLUMN_NAME FROM ' + s.subscriber_db + '.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = ''' + a.article + ''' AND DATA_TYPE IN (''numeric'',''int'',''decimal'',''float'')' AS CheckSumQueryDestination , 'SELECT TOP 1 COLUMN_NAME FROM ' + s.publisher_db + '.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = ''' + a.article + ''' AND DATA_TYPE IN (''numeric'',''int'',''decimal'',''float'')' AS CheckSumQuerySource , $destination_serverĭENSE_RANK() OVER (ORDER BY (dist.publisher + ' ' + s.publisher_db + ' ' + a.article + ' ' + ss.srvname + ' ' + s.subscriber_db + ' ' + a.destination_object)) ID Write-Warning "Get-CheckSumColumn ($options)" $scon.ConnectionString = "Data Source=$readserver Initial Catalog=master Integrated Security=true " Those cautions being written, we can use an aggregate function - like SUM - to perform a check if we suspect there may be an issue with a particular table. Theīottom line: this approach will - from a statistical standpoint - possibly create false alerts, and you'll unintentionally be training people to ignore them, which is far worse than relying on mismatches or the command and transaction backlog (both of which together are precise and can be used with averages and deviations overall).įor ETL flows, this script will help since most ETL flows occur at specific times and should finish at specific times in these cases, the below script makes sense to use. In an example where I measure a sum after the Hang Seng closes - so in theory, replication should have finished, I could just as easily measure the RowCount property instead of performing an expensive sum. You will have to not only know the average and deviation, but you'd need to know these values for different points of a load. Is that a cause for concern? This entire approach could open a statistical nightmare because in any given moment with replication, it could be off by 100. Even then, you should be very careful about how you will approach it and be willing to accept the reality that you may see some delays in your data.įor instance, suppose I take the sum of two columns and the destination is different by 100, and we know replication is still running. I highly advise against using aggregates for regular analysis of data involved in replication, unless you know specific down times where these will not interrupt any replication or reporting process.
#Aggregate date up to a date sql code#
No perfect way exists for monitoring replication flow, but one must be aware of the advantages and disadvantages of every approach, as their approach may create a problem of latency (I've seen many replication reports from developers that do this and when they complain about latency, I point out that bad code guarantees latency).

Imagine I used a hard count, where I actually performed a COUNT(*), locking the table - not only could I possibly delay replication in my check, the timing of one may finish before the other - creating the possibility of false alerts. A few thousand tables can be checked within two minutes, unless we see network issues where it struggles to connect to either the source or destination server (and this would be a red flag for replication too - if that script struggles to connect to a server, replication will as well). NET's RowCount property for a reason: it's fast and the measurement of the source and destination will be similar in time. In the tip Using PowerShell for SQL Server Replication Tracking, I use. We can use an aggregate function on a numeric column to double verify, but let's first look at our row count report because, depending on how we're running it, we may be guaranteeing some problems.
