Both queries below open a distributed transaction and update a row in a remote table, yet internally they behave quite differently:
UPDATE C
SET FIELD1 = 'TEST'
FROM OPENQUERY([REMOTESERVER], 'SELECT * FROM TESTDB.DBO.TB_TEST WHERE ID_TEST = 3528051') C
UPDATE [REMOTESERVER].TESTDB.DBO.TB_TEST
SET FIELD1 = 'TEST'
WHERE ID_TEST = 3528051
Mousing over both execution plans shows that the second query has an estimated execution cost approximately 660% higher than the first one.
It appears that the first query uses a seek predicate to locate the row to be updated on the remote server, so data transfer between the SQL Server instances is limited to the row(s) satisfying the search condition. The second query, by contrast, scans the whole remote table, transfers its entire contents over the OLE DB connection, and only then filters the results to find the matching rows.
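A quick way to compare what each form asks the remote server to do is to capture the estimated plans and inspect the remote operators (a minimal sketch; the exact operator names and plan attributes can vary by version and provider):

SET SHOWPLAN_XML ON;
GO
-- Capture the estimated plan without executing the statement
UPDATE [REMOTESERVER].TESTDB.DBO.TB_TEST
SET FIELD1 = 'TEST'
WHERE ID_TEST = 3528051;
GO
SET SHOWPLAN_XML OFF;
GO

In the resulting plan XML, the statement text attached to the Remote Query / Remote Scan operator shows whether the WHERE clause was sent to the remote side or evaluated locally after the transfer.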
My main question is... why? Why is the engine so primitive that it doesn't submit the predicate to the linked server and generate a physical operation accordingly?
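For what it's worth, I can force the work to happen remotely by shipping the whole statement to the other server myself, but that only sidesteps the question (a sketch; it assumes the linked server has the 'rpc out' option enabled):

-- One-time setup, if RPC is not already enabled for the linked server
EXEC sp_serveroption N'REMOTESERVER', 'rpc out', 'true';

-- Run the parameterized UPDATE entirely on the remote server;
-- the ? placeholder is bound to the value passed after the statement
EXEC ('UPDATE TESTDB.DBO.TB_TEST SET FIELD1 = ''TEST'' WHERE ID_TEST = ?;', 3528051) AT [REMOTESERVER];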
Secondly, for a table containing 6 million rows of approximately 1 KB each, the latter query simply will not run; it takes almost a minute just to generate its execution plan. Neither server is under heavy load, there is no resource contention, and the transactions operate under either read uncommitted or snapshot isolation, which rules out locking as the performance issue. Even with a full table scan and a transfer over the network, given the size of the table and the network speed, this operation shouldn't take more than a minute to complete.
Capturing events in SQL Profiler for the remote session opened by the linked server, I noticed an endless stream of executions of the sp_cursorfetch stored procedure. The parameters differ for each execution, which happens about 100 times per second:
exec sp_cursorfetch 180150007,16,16402,100
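If the last parameter (100) is the number of rows per fetch, the arithmetic lines up with what I'm seeing: 6,000,000 rows / 100 rows per fetch = 60,000 round trips, and at roughly 100 calls per second that is about 10 minutes of cursor fetching alone, before any update work happens.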
Searching the internet, I found that when running DML statements remotely, data is queried on a row-by-row basis, which is much slower than a plain SELECT. Basically, we are facing not only a very inefficient querying strategy, which scans and transfers the whole table regardless of predicates, but one that also fetches the data in small cursor batches instead of streaming it in bulk.
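The only remoting knob I know of that can change predicate pushdown is the linked server's 'collation compatible' option; my predicate is on an integer column, so I doubt it applies here, but for completeness this is what I checked (a sketch, assuming sufficient rights on the local server):

-- Report the current configuration of the linked server
EXEC sp_helpserver N'REMOTESERVER';

-- Declaring the collations compatible allows more comparisons to be
-- evaluated remotely (mostly relevant for character-column predicates)
EXEC sp_serveroption N'REMOTESERVER', 'collation compatible', 'true';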
Is this really the intended behavior? The performance of these queries is so poor that I believe we are looking at a SQL Server bug.