Comparing / Synchronizing floats - differ in last bit(s)?
kipb7
Posts: 20
We have some float data that I wish to synch between two machines (same SQL Server 2008, same Windows 7, same Xeon hardware).
I synchronized the table but then after a refresh compare, it showed as different again. Looking at the values, I see they differ in the lowest bit. Is there an option to "round" floats before comparison, or else sync in a way that preserves the full binary precision?
Here's an example script on SQL 2008 showing the problem of float to string to float giving a different value.
DECLARE @xs VARCHAR(30), @xb VARBINARY(30), @xd DECIMAL(30,20), @xf FLOAT(53)
SET @xf = 0.85195833333333266000
SELECT @xs = CONVERT(VARCHAR(30),@xf), @xd = CONVERT(DECIMAL(30,20), @xf)
SELECT @xb = CONVERT(VARBINARY(30), @xf)
PRINT @xf
PRINT @xs
PRINT @xd
PRINT @xb
DECLARE @xf2 FLOAT(53)
SET @xf2 = CONVERT(FLOAT, @xd)
PRINT @xf-@xf2
PRINT CONVERT(VARBINARY, @xf2)
Output:
0.851958
0.851958
0.85195833333333254000
0x3FEB433E1F671523
1.11022e-016
0x3FEB433E1F671522
I synchronized the table but then after a refresh compare, it showed as different again. Looking at the values, I see they differ in the lowest bit. Is there an option to "round" floats before comparison, or else sync in a way that preserves the full binary precision?
Here's an example script on SQL 2008 showing the problem of float to string to float giving a different value.
DECLARE @xs VARCHAR(30), @xb VARBINARY(30), @xd DECIMAL(30,20), @xf FLOAT(53)
SET @xf = 0.85195833333333266000
SELECT @xs = CONVERT(VARCHAR(30),@xf), @xd = CONVERT(DECIMAL(30,20), @xf)
SELECT @xb = CONVERT(VARBINARY(30), @xf)
PRINT @xf
PRINT @xs
PRINT @xd
PRINT @xb
DECLARE @xf2 FLOAT(53)
SET @xf2 = CONVERT(FLOAT, @xd)
PRINT @xf-@xf2
PRINT CONVERT(VARBINARY, @xf2)
Output:
0.851958
0.851958
0.85195833333333254000
0x3FEB433E1F671523
1.11022e-016
0x3FEB433E1F671522
Comments
Thanks for your post. There isn't any way to round floats pre comparison, but there's nothing to stop you creating a view and casting the datatypes to round the value, and then comparing the views. Assuming you have also excluded the underlying tables, syncing the views will update the underlying base tables
HTH!
Pete
Red Gate Software Ltd
+44 (0)870 160 0037 ext. 8569
1 866 RED GATE ext. 8569
Now I wonder how the SQL Server value is converted to string to use in the script. In the programming language Python, they have two ways to show a string from a float - human readable (str) and suitable for re-creating the original value (repr).
Perhaps the Red Gate "float-to-string" logic is not quite right for this application -- there is another "decimal" representation of a given float that, when converted to binary, will give the original binary value.
Do a Google search for: burger indiana.edu float
"Printing Floating-Point Numbers Quickly and Accurately"
It is the 'Accurately' that may be the issue here.
Thanks for the help!
SQL's parsing of floats is buggy:
DECLARE @f1 FLOAT, @f2 FLOAT
SET @f1 = 0.8519583333333327
SET @f2 = 0.85195833333333271
IF @f1>@f2 PRINT 'Greater' --True - prints
(Tested in SQL 2008 and SQL 2005)
SQL Data Compare synch scripts could work around this defect by changing
UPDATE TABLEX SET [Val]=0.85 WHERE [KEYX]=3
to
UPDATE TABLEX SET [Val]=0.85E0 WHERE [KEYX]=3
The E0 worked in all 10,000 cases I had where SQL Data Compare did not correctly synchronize, none of which had E in the float already.
Having SQL Data Compare Synchronization so it actually synchronizes would save me a lot of time in working around this.
Here is my report to Microsoft:
--Some floats parse inconsistently:
DECLARE @A FLOAT, @B FLOAT
SET @A = 0.8519583333333327
SET @B = 0.85195833333333271
IF @A>@B PRINT 'Greater' --prints
PRINT @A-@B --positive (expect 0 or negative)
One can work around this either by appending 'E0' onto the end of the float or by using CONVERT(float,'0.85195833333333271'). The numbers above are equal using either. These also helped with about 10,000 other such cases.
Here is the reply:
"The behavior you are seeing is by design. When you specify a floating point value without the scientific notation, it is actually considered a decimal value not float. And you are then converting the decimal value to float which can result in rounding errors depending on the value.
"In your case, the value for @A is represented by decimal(16, 16) and @B is represented by decimal(17, 17). As you discovered, you
should use the scientific notation for specifying the floating point value.
"Btw, you can determine the type of a constant by doing something like below:
declare @v sql_variant = 0.8519583333333327;
select sql_variant_property(@v, 'BaseType') ,
sql_variant_property(@v,'Precision') ,
sql_variant_property(@v, 'Scale');
-- Umachandar, SQL Programmability Team"
Thanks for all your posts on this, I'll be sure to log all this in our systems for consideration in future releases of the tool.
Thanks again!
Pete
Red Gate Software Ltd
+44 (0)870 160 0037 ext. 8569
1 866 RED GATE ext. 8569
Previously, it was suggested that appending e0 onto the end of a float would make SQL process it as a float instead of as a decimal. Here is a case where that does not work, and it may be an error in how Red Gate generates the decimal value for a float. Also create this table on another database - but without the INSERT.
Now use SQL Data Compare and see what value is generated:
INSERT INTO [dbo].[FloatTest] ([PK], [Fraction]) VALUES (1, 0.0768922324036398)
--This rounds to 16 places, it seems, or 15 significant figures.
As in the original report, this is the incorrect binary value for the float. Here's the new part: adding on e0 is ALSO INCORRECT:
INSERT INTO [dbo].[FloatTest] ([PK], [Fraction]) VALUES (1, 0.0768922324036398e0)
So, would it be possible to get Red Gate Data Compare's Synchronize to synchronize floats correctly? We have been manually appending e0 onto floats for some months now when synchronizing. Now we see this is not enough.
We have fairly recently purchased several licences for SQL Data Compare Professional v9 because we needed a supported version for a server installation to run as as part of a batch process.
On the same platform with the same source and destinations SQL Data Compare v9 sees differences that v7 (7.1.0.245 standard) successfully doesn't (as far as we are concerned there aren't any differences).
This is a very serious setback to this project and I ask
1. How can we emulate the behaviour of v7 when using v9 (and quite quickly).
2. If v7 can get it right then obviously there is a way because Redgate have done it before. v7 can still do it on the same platforms and database engine versions (same servers and dbs ) so it's not an environment or MS thing.
3. We have paid for support so hopefully this will result in some more positive posts than the ones I've read so far.
many thanks in advance...
m.
- Use a view on each side that rounds the floats and then sync the views (I haven't done this but it was suggested by RedGate) instead of sync'ing the tables. The tables won't be exactly the same but they will look the same to RG.
- Use decimal(20,16) etc. instead of float - but this requires that you know the range of your values in advance, and that you can stand the possibly larger data size. We've switched a lot of our floats to decimal to make sync easier and some other benefits.
- Use a non-RedGate solution to sync the tables having floats. We've done this, in SQL 2008 using the MERGE statement over the network and it runs much faster than RG so we use it for the bulk of the sync and then run RG to see if we missed anything. This only works if SQL Server A can read data from SQL Server B, whereas RG only requires that the client machine running SQL Data Compare be able to see both A and B. (We use checksum of each end's table to know if it is worth doing the merge.)
- Wave your hands and yell about how floats are only "approximate" and how can you expect them to blah blah and you shouldn't be using floats anyway and every programmer knows.... This is what we got so far including from Microsoft, whose sync tools have the same problems.
- Use RedGate SQL Data Compare 9 as-is. This requires that you tolerate your floats being very close and not exact. When you re-compare, they'll look again like a sync is required, though the specific values look to be identical when printed in decimal. It is likely that this is OK for your application. "do sync; trust result."
I hope you'll keep pushing RG to come up with a way to sync floats correctly. It's not easy as they need to have a way for A to output a string that B will convert to the identical binary value inside. I spent a few hours at it and couldn't come up with one that worked in every case, where the string was generated by a SQL expression. Converting binary float inside RG to an 18- or 20-digit decimal float will probably work. 17 significant digits isn't enough for all floats and that's all SQL Server will do.