Version 3 of DiffEngineX (the compare Excel xlsx workbooks software utility) has been released and is available for download.
This is a paid upgrade. If you purchased within the past two years, you are eligible for a discounted upgrade price. Click here to purchase an upgrade license.
If you purchased more than two years ago or are upgrading from version 1, click here to purchase at the regular upgrade price.
By paying for an upgrade you are helping us produce the next generation of our software products.
Version 2.29 of DiffEngineX was released today. It has better error handling when the two Microsoft Excel spreadsheets being compared contain cells whose text includes invalid characters. The problem this release fixes only occurred when Microsoft Excel 2007 (rather than Excel 2003, 2010 or 2013) was installed.
We have recently released a new style Office app for Excel 2013 and Word 2013. It will not work with Office 2010 or earlier.
In Excel 2013, it finds the differences between the text in two selected cells. It fetches each cell's value rather than its formula, but a formula can be manually copied and pasted into the app if desired. It is potentially useful for reporting the differences between two long formulae or simply lengthy blocks of cell text.
In Word 2013, it reports the plain text differences between two different document selections.
Although it runs inside Office, it can act as a simple clipboard diff tool.
Please note it uses new Microsoft technology and only works in the desktop versions of Excel 2013 and Word 2013. It could potentially be rather useful in Word Web App once Microsoft updates it to support Office apps.
It can be found in the Microsoft Store.
Of course, if you want a full-featured Excel diff tool that compares all the formulae, values, comments, names and VBA macros for practically all versions of Excel, there is always DiffEngineX.
In the screenshot above we can see that the text of the paragraph has been colored blue because it has been moved up in the document. Additionally, the sentences have been reordered. To indicate this, the background color alternates between light blue and yellow. Not many compare utilities are capable of this. At some point we may release this as a stand-alone utility running outside of Microsoft Office.
DiffEngineX is an executable (*.exe) file that compares Excel workbooks. It is usually run from its user-interface, but it can also be invoked with command line arguments. As such, multiple Excel file comparisons can be automated in a *.bat or *.cmd file. Alternatively, DiffEngineX can simply be called from the Windows Command Prompt.
The simplest invocation of DiffEngineX is given below.
"C:\Program Files\Florencesoft\DiffEngineX\DiffEngineX" /inbook1:"a.xlsx" /inbook2:"b.xlsx" /report:r.xlsx
If you are using Excel 2003 or earlier, please replace .xlsx with .xls.
It is also possible to call DiffEngineX from Microsoft .NET code using Microsoft's Process class.
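The same approach works from any language that can start a process. Below is a minimal Python sketch, assuming the default install path shown in the invocation above. The helper names build_command and compare are hypothetical, and the sketch assumes DiffEngineX reads its arguments after standard Windows command line parsing. The meaning of each exit code is not shown here; it is documented in the help file.

```python
import subprocess

# Install path taken from the invocation above; adjust it if DiffEngineX lives elsewhere.
DIFFENGINEX = r"C:\Program Files\Florencesoft\DiffEngineX\DiffEngineX"

def build_command(book1, book2, report, exe=DIFFENGINEX):
    """Return the argument list for one comparison (hypothetical helper)."""
    return [exe, f"/inbook1:{book1}", f"/inbook2:{book2}", f"/report:{report}"]

def compare(book1, book2, report):
    """Run one comparison and return the process exit code."""
    # Passing a list lets subprocess quote any paths that contain spaces.
    completed = subprocess.run(build_command(book1, book2, report))
    return completed.returncode
```

Calling compare("a.xlsx", "b.xlsx", "r.xlsx") in a loop over several workbook pairs would be the scripted equivalent of a *.bat file.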
A full list of DiffEngineX exit codes and command line arguments is found in our detailed help file.
Finding out how your document has changed from one version to another is an important topic, particularly if several different authors have modified it. This is especially the case for Microsoft Excel documents, which are used for financial planning and complex calculations and often contain Visual Basic for Applications macros. You may want to change one value between otherwise identical Microsoft Excel workbooks and see if there are any significant numeric differences. DiffEngineX is ideal for this as it can ignore differences below a user specified value or percentage change.
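To make the idea of ignoring small numeric differences concrete, here is a minimal Python sketch. The function name, its parameters and the exact rule (a change is ignored if it falls at or below either threshold) are assumptions made for illustration; they are not taken from DiffEngineX itself.

```python
def is_significant(old, new, abs_tol=0.0, pct_tol=0.0):
    """Return True if the change from old to new exceeds the thresholds.

    abs_tol is an absolute amount, pct_tol a percentage of the old value.
    Both names and the rule itself are illustrative assumptions.
    """
    change = abs(new - old)
    if change <= abs_tol:
        return False  # too small in absolute terms (or no change at all)
    if old != 0 and change / abs(old) * 100.0 <= pct_tol:
        return False  # too small relative to the original value
    return True
```

With pct_tol=1.0, a cell that moves from 200 to 201 would be ignored, while a move from 200 to 210 would be reported.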
If you want to compare plain text files, you have at your disposal a vast array of tools, including Microsoft WinDiff, which is included as part of Visual Studio. Microsoft Word has a built-in facility to compare an original with a revised document. However, from Office 95 to Office 2010, Excel has had no intrinsic functionality to find out how one spreadsheet differs from another.
Excel documents are not straightforward to compare. The spreadsheet cells can either contain a formula (which performs a calculation based on the value of other cells) or a constant (text, number, date...). If it contains a formula, there is a choice of whether the formula itself should be compared or its calculated value (i.e. =6*7 or 42). Not only do the visible cells have to be compared, but also defined names, cell comments and the Visual Basic for Applications (VBA) macros.
Some spreadsheets contain data, Visual Basic macros and formulae. Other people use Excel to import and visualize database data. If using a difference algorithm to compare the latter, it is important to realize the data needs to be sorted first. (Excel's built-in sort functionality is under its Data tab.) Diff algorithms will not reorder rows to find similarities; they are limited to inserting blanks to line up the similarities.
What is a difference algorithm?
There isn't just one difference algorithm, but rather a family of them. They all have the aim of finding the longest, in-order run of similarities between two strings of letters, lines of source code or rows. The aim is always to report the minimum number of differences.
Difference algorithms are used in the field of biology to compare protein and DNA sequences for similarities. Often different life processes will use very similar protein sequences with the only differences between them being small insertions and deletions. A difference algorithm will describe the changes necessary to convert one sequence into another.
Standard difference algorithms do have drawbacks. If a run of characters is moved out-of-sequence it won't be recognized as being shared between two documents. Consider the case below in which the string of characters "two three" is compared against "three two". The algorithm matches the longest, in-order run of characters and so correctly spots "three" is common between the two strings, but fails to realize "two" has only been moved back.
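The "two three" case can be reproduced with Python's standard difflib module. This is only a sketch: difflib is one member of the family of difference algorithms and is not necessarily the one DiffEngineX uses, and the helper name in_order_matches is made up for this example.

```python
from difflib import SequenceMatcher

def in_order_matches(a, b):
    """Return the substrings that difflib pairs up, in order, between a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # The final matching block is a zero-length sentinel, so skip empty blocks.
    return [a[m.a:m.a + m.size] for m in matcher.get_matching_blocks() if m.size]

print(in_order_matches("two three", "three two"))  # prints ['three']
```

Only "three" is reported as shared. Once it has been matched, "two" lies before it in one string and after it in the other, so no in-order pairing of "two" remains possible.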
If you are comparing database rows imported into Excel, sorting them first will ensure that no row shared by the two sets of data is missed. If you have to carry out this step, make sure you save your changes to the file system (File->Save) before opening the workbooks in DiffEngineX.
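The effect of sorting can be illustrated with a small Python sketch that again uses difflib as a stand-in for a standard diff algorithm. The sample rows and the helper name matched_row_count are invented for the example.

```python
from difflib import SequenceMatcher

def matched_row_count(rows_a, rows_b):
    """Count rows that an in-order diff pairs up between two row lists."""
    matcher = SequenceMatcher(None, rows_a, rows_b, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())

# The same three rows, stored in a different order in each sheet.
rows_a = [("Stuart", 31), ("Bob", 42), ("Robin", 27)]
rows_b = [("Bob", 42), ("Robin", 27), ("Stuart", 31)]

unsorted_matches = matched_row_count(rows_a, rows_b)                # 2 of 3 rows
sorted_matches = matched_row_count(sorted(rows_a), sorted(rows_b))  # 3 of 3 rows
```

Without sorting, the "Stuart" row is reported as a difference even though it exists in both sheets. After sorting, every row is paired.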
Microsoft Excel Workbooks
DiffEngineX uses a standard diff algorithm to align the similar rows and columns of worksheets before comparing the cells (and any attached comments). A more sophisticated algorithm is used to compare the Visual Basic macros embedded in the spreadsheets, in that lines of code moved up or down (but not changed) will be recognized.
Names have their definitions simply compared against each other for equality. A diff algorithm is not needed here.
DiffEngineX can optionally highlight in red the precise characters that differ between two spreadsheet cells.
Row Alignment by DiffEngineX Alters Cell References of Differences in Workbook Copies
One of the principles of DiffEngineX is not to alter the workbooks being compared. It does however need to insert blank rows in order to align similar rows. Similar rows need to be aligned as they won't be recognized as identical if they have different Excel row numbers. This is why DiffEngineX automatically creates copies of your workbooks and compares those instead.
However the process of row alignment alters the workbook copies. Rows are shifted down and so each difference will have a different cell reference as compared to the original workbook. The difference report produced by DiffEngineX lists each pair of cell differences against their Excel cell reference.
It seems hardly fair to give a cell reference for a difference that refers to a temporary workbook (altered by blank row insertion) produced during the process of comparison instead of the real, unaltered originals.
The solution is that DiffEngineX gives cell references with respect to both the altered workbook copies and the unaltered originals. DiffEngineX provides an option to hyperlink each reported difference back to its corresponding spreadsheet cell.
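The bookkeeping involved can be sketched in a few lines of Python. This is an illustration of the idea only, not DiffEngineX's implementation: it uses difflib to decide where blanks go, and the function name, the BLANK marker and the sample rows are all assumptions.

```python
from difflib import SequenceMatcher

BLANK = None  # stands in for an inserted blank row

def align_with_row_map(rows_a, rows_b):
    """Pad both row lists with blanks so that equal rows share an index.

    Returns (aligned_a, aligned_b, orig_a, orig_b), where orig_x[i] is the
    1-based row number in the original sheet for aligned row i, or None
    for an inserted blank.
    """
    aligned_a, aligned_b, orig_a, orig_b = [], [], [], []
    matcher = SequenceMatcher(None, rows_a, rows_b, autojunk=False)
    for _tag, a1, a2, b1, b2 in matcher.get_opcodes():
        for k in range(max(a2 - a1, b2 - b1)):
            ia, ib = a1 + k, b1 + k
            in_a, in_b = ia < a2, ib < b2
            aligned_a.append(rows_a[ia] if in_a else BLANK)
            aligned_b.append(rows_b[ib] if in_b else BLANK)
            orig_a.append(ia + 1 if in_a else None)
            orig_b.append(ib + 1 if in_b else None)
    return aligned_a, aligned_b, orig_a, orig_b

sheet_a = ["Bob", "Stuart", "Zoe"]
sheet_b = ["Bob", "Alice", "Stuart", "Zoe"]
aligned_a, aligned_b, orig_a, orig_b = align_with_row_map(sheet_a, sheet_b)
```

Here "Stuart" sits in row 3 of the aligned copy of the first sheet, but orig_a records that it came from row 2 of the original, so a report can quote both references.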
A free 30 day trial of our software to find the differences between Excel spreadsheets is available.
Checking Use Alignment Plus on DiffEngineX's main user-interface changes from one row/column alignment algorithm to another.
When Align Rows is selected, blank rows are inserted into copies of your Excel spreadsheets in order to align similar rows. When the similarities are paired off with the same row numbers, the minimum number of differences can then be reported. If you have imported database rows into Excel, make sure you use Excel to sort the worksheets first (then Save them from the File menu) before using DiffEngineX to compare them. This is because DiffEngineX will not reorder rows in order to match them up.
What is the difference between the two different row alignment algorithms?
before_a.xlsx and before_b.xlsx show two sheets before row alignment. We can see both contain "Bob" and "Stuart" rows. The row in between them differs.
Use Alignment Plus Is Unchecked
rowalign_a.xlsx and rowalign_b.xlsx show the results of row alignment when Use Alignment Plus is unchecked. DiffEngineX has inserted a blank, yellow row in order to align the "Bob" and "Stuart" rows. The two different rows ("Robin" and "Peter") both end up in row 3 and have been colored because they are different. DiffEngineX has done its job: it has aligned similarities and reported the minimum number of differences.
Use Alignment Plus Is Checked
For those interested in database data, it is desirable that unmatched rows ("Robin" and "Peter") don't end up with the same row numbers, i.e. "Robin" is paired against a blank row in the sheet it is compared against, and vice versa for "Peter". Checking Use Alignment Plus ensures these results are obtained. We can see "Robin" is colored red for a deleted row and "Peter" is colored green for a newly added row in rowalign_plus_a.xlsx and rowalign_plus_b.xlsx.
To summarize, Align Rows aligns similar rows. When Use Alignment Plus is checked, unmatched rows are explicitly paired with blank rows.
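The difference between the two modes can be sketched in Python. This is a simplified illustration rather than DiffEngineX's own algorithm: difflib finds the matching rows, the alignment_plus flag and BLANK marker are hypothetical names, and the sample sheets are made up rather than copied from the example workbooks.

```python
from difflib import SequenceMatcher

BLANK = ""  # stands in for an inserted blank row

def align(rows_a, rows_b, alignment_plus=False):
    """Return aligned (row_a, row_b) pairs; BLANK marks an inserted row."""
    pairs = []
    matcher = SequenceMatcher(None, rows_a, rows_b, autojunk=False)
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        only_a, only_b = rows_a[a1:a2], rows_b[b1:b2]
        if tag == "equal":
            pairs.extend(zip(only_a, only_b))
        elif alignment_plus:
            # Every unmatched row gets a blank partner on the other sheet.
            pairs.extend((row, BLANK) for row in only_a)
            pairs.extend((BLANK, row) for row in only_b)
        else:
            # Unmatched rows share row numbers; pad only the shorter side.
            height = max(len(only_a), len(only_b))
            only_a = only_a + [BLANK] * (height - len(only_a))
            only_b = only_b + [BLANK] * (height - len(only_b))
            pairs.extend(zip(only_a, only_b))
    return pairs

sheet_a = ["Bob", "Robin", "Stuart"]
sheet_b = ["Bob", "Peter", "Stuart"]
standard = align(sheet_a, sheet_b)
plus = align(sheet_a, sheet_b, alignment_plus=True)
```

In standard, "Robin" and "Peter" share a row and would be reported as one changed row. In plus, each sits opposite a blank, which corresponds to one deleted row and one added row.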