Finding out how your document has changed from one version to another is an important topic, especially if it has modifications made to it by several different authors. This is especially the case for Microsoft Excel documents, which are used for financial planning, complex calculations and often contain Visual Basic for Applications macros. You may want to change one value between otherwise identical Microsoft Excel workbooks and see if there are any significant numeric differences. DiffEngineX is ideal for this as it can ignore differences below a user specified value or percentage change.
If you want to compare a plain text file, you have at your disposal a vast array of tools, including Microsoft WinDiff, which is included as part of Visual Studio. Microsoft Word has a built-in facility to compare an original with a revised document. However from Office '95 to Office 2010, Excel has no intrinsic functionality to find out how one spreadsheet differs from another one.
Excel documents are not straightforward to compare. The spreadsheet cells can either contain a formula (which performs a calculation based on the value of other cells) or a constant (text, number, date...). If it contains a formula, there is a choice of whether the formula itself should be compared or its calculated value (i.e. =6*7 or 42). Not only do the visible cells have to be compared, but also defined names, cell comments and the Visual Basic for Applications (VBA) macros.
Some spreadsheets contain data, Visual Basic macros and formulae. Other people use Excel to import and visualize imported database data. If using a difference algorithm to compare the latter, it is important to realize the data needs to be in sorted order first. (Excel's built-in sort functionality is under its Data tab.) Diff algorithms will not reorder rows to find similarities, but are limited to the insertion of blanks to line up the similarities.
What is a difference algorithm?
There isn't just one difference algorithm, but rather a family of them. They all have the aim of finding the longest, in-order run of similarities between two strings of letters, lines of source code or rows. The aim is always to report the minimum number of differences.
Difference algorithms are used in the field of biology to compare protein and DNA sequences for similarities. Often different life processes will use very similar protein sequences with the only differences between them being small insertions and deletions. A difference algorithm will describe the changes necessary to convert one sequence into another.
Standard difference algorithms do have drawbacks. If a run of characters is moved out-of-sequence it won't be recognized as being shared between two documents. Consider the case below in which the string of characters "two three" is compared against "three two". The algorithm matches the longest, in-order run of characters and so correctly spots "three" is common between the two strings, but fails to realize "two" has only been moved back.
If you are comparing database rows imported into Excel, pre-sorting them first will ensure that no similar row between two sets of data is missed. If you have to carry out this step, make sure you save your changes to the file system (File->Save) before opening the workbooks in DiffEngineX.
Microsoft Excel Workbooks
DiffEngineX uses a standard diff algorithm to align the similar rows and columns of worksheets before comparing the cells (and any attached comments). A more sophisticated algorithm is used to compare the Visual Basic macros embedded in the spreadsheets, in so much as lines of code moved up or down (but not changed) will be recognized.
Names have their definitions simply compared against each other for equality. A diff algorithm is not needed here.
DiffEngineX can optionally highlight in red the precise characters that differ between two spreadsheet cells.
Row Alignment by DiffEngineX Alters Cell References of Differences in Workbook Copies
One of the principles of DiffEngineX is not to alter the workbooks being compared. It does however need to insert blank rows in order to align similar rows. Similar rows need to be aligned as they won't be recognized as identical if they have different Excel row numbers. This is why DiffEngineX automatically creates copies of your workbooks and compares those instead.
However the process of row alignment alters the workbook copies. Rows are shifted down and so each difference will have a different cell reference as compared to the original workbook. The difference report produced by DiffEngineX lists each pair of cell differences against their Excel cell reference.
It seems hardly fair to give a cell reference for a difference that refers to a temporary workbook (altered by blank row insertion) produced during the process of comparison instead of the real, unaltered originals.
The solution is that DiffEngineX gives cell references with respect to both the altered workbook copies and the unaltered originals. DiffEngineX provides an option to hyperlink each reported difference back to its corresponding spreadsheet cell.
A free 30 day trial of our software to find the differences between Excel spreadsheets is available.
Checking Use Alignment Plus on DiffEngineX's main user-interface changes from one row/column alignment algorithm to another.
When Align Rows is selected, blank rows are inserted into copies of your Excel spreadsheets in order to align similar rows. When the similarities are paired off with the same row numbers, the minimum number of differences can then be reported. If you have imported database rows in Excel, make sure you use Excel to sort the worksheets first (then Save them from the File menu) before using DiffEngineX to compare them. This is because DiffEngineX will not re-order rows in order to match them up.
What is the difference between the two different row alignment algorithms?
before_a.xlsx and before_b.xlsx show two sheets before row alignment. We can see both contain "Bob" and "Stuart" rows. The row in between them differs.
Use Alignment Plus Is Unchecked
rowalign_a.xlsx and rowalign_b.xlsx show the results of row alignment when Use Alignment Plus is unchecked. DiffEngineX has inserted a blank, yellow row in order to align the "Bob" and "Stuart" rows. The two different rows ("Robin" and "Peter") both end up in row 3 and have been colored because they are different. DiffEngineX has done its job: it has aligned similarities and reported the minimum number of differences.
Use Alignment Plus Is Checked
For those interested in database data, there is the desire that unmatched rows ("Robin" and "Peter") don't end up with the same rows numbers i.e. "Robin" is paired against a blank row in the sheet it is compared against and vice-versa for "Peter". Checking Use Alignment Plus ensures these results are obtained. We can see "Robin" is colored red for a deleted row and "Peter" is colored green for a newly added row in rowalign_plus_a.xlsx and rowalign_plus_b.xlsx.
To summarize, Align Rows aligns similar rows. When Use Alignment Plus is checked, unmatched rows are explicitly paired with blank rows.
Excel add-ins (.xla and .xlam) must first be converted to workbooks (.xls and .xlsm) before using DiffEngineX to compare them. The steps to achieve this are described below.
- Use File->Open to open the file in Excel.
- Click the Developer tab and then the Visual Basic button. (Excel 2003: Tools->Macro->Visual Basic Editor.)
- In the Project Explorer window in the Visual Basic Editor, you will see an entry similar to VBAProjectName(Add-in file name *.xla). Click the plus sign to the left of this. You may need to enter a password at this point.
- Expand Microsoft Excel Objects and click ThisWorkbook.
- From the Tools menu select VBAProjectProperties... .
- Select the Protection tab. Uncheck "Lock project for viewing". Clear both the password and confirm password boxes.
- In the Properties window change the value of IsAddin from True to False. (Make sure ThisWorkbook is selected in the Project Explorer.)
- Select "Close and Return to Microsoft Excel" from the File menu.
- For Excel 2007 and later select File->Save As and then select Excel Macro-Enabled Workbook (*.xlsm). For Excel 2003, select File->Save As and then Microsoft Office Excel Workbook (*.xls).
Once your Excel add-ins (*.xla and *.xlam) are converted to Excel workbooks (*.xlsm or for Excel 2003 *.xls), you can use DiffEngineX to compare them. Click here for instructions on how to do this.
Free 30-Day Trial Software Download
A free 30 day trial of our software to find the differences between Excel spreadsheets is available.
About DiffEngineX's Unique Advantages
DiffEngineX is one of the few Excel workbook compare utilities that generates its reports as Excel workbooks & it is FAST! DiffEngineX generates three workbooks. Two are color highlighted copies of your original spreadsheets - color is used to mark the differences. The third is a cell-by-cell listing of all the differences. Optionally each difference listed in the difference report can be hyperlinked to both the original and colored workbooks. Functionality on the Extras dialog box allows matching rows to be hidden. This allows you to immediately see the differences on the color highlighted workbook copies. With Excel's built-in functionality to allow you to view the two copies side-by-side, you can easily see all the differences in their original format.
Other compare utilities generate their reports in a Windows user interface control. With a large number of differences, performance can be sluggish. Having reports in Excel workbook format is a huge advantage.
Although there are a large number of Excel compare utilities, they mainly only compare the spreadsheet cells. DiffEngineX will also compare the Excel comments, defined names (named ranges) and VBA (Visual Basic for Applications) macros.
Unlike the majority of Excel compare tools, DiffEngineX will ignore numeric differences below a specified threshold. If you are comparing two financial models, you may not want to have reported all the tiny, insignificant differences. DiffEngineX can round numbers up to a certain number of decimal places before comparison or ignore differences below a certain value or percentage.
To date DiffEngineX is the only Excel compare utility we have seen that can optionally highlight Excel differences at the character level. You have the choice between seeing character level differences on either automatically made copies of your workbooks (Main User Interface->Highlight Character Level Differences) or on the difference report (Options->Color in red...).
Spreadsheets used for financial modelling often have large blocks of equivalent formulae. If the block's formula changes, the difference report can end up reporting the same difference tens to hundreds of times. DiffEngineX has the option to group adjacent like changes together and to report the change to a range of cells.
Obviously DiffEngineX allows you to choose whether you compare workbook formulae or their calculated values. White space and the differences between upper and lower-case letters can be optionally ignored.
By default, hidden cells and filtered rows are made visible in the workbook copies before a comparison takes place.
The Extras dialog allows you to choose what colors are used to highlight cell differences and to optionally remove existing workbook color allowing the differences to be clearly shown.
Your original spreadsheets are left unaltered by DiffEngineX as any modifications are always made to automatically made workbook copies.
DiffEngineX has a large number of users and is a fast, scalable and reliable software application.
If you cast your mind back at least six years ago, there were only a small handful of software utilities to compare Excel workbooks. Typically even for average sized workbooks they took at least ten minutes. Worse than that some of them even failed to spot all the differences. All these Excel compare utilities were written as Excel add-ins, meaning they ran inside Microsoft Excel and were invoked from its menu system.
We did not want to do the exact same thing again. We decided to write DiffEngineX as an desktop application running outside of Microsoft Excel. This way it would be a little easier to invoke it from the command line with arguments or from source control systems.
With the benefit of time, we realized an almost arbitrary decision was fortuitous. When Microsoft released new versions of their runtimes and libraries, Excel add-ins broke. When Microsoft released 64-bit operating systems, Excel add-ins broke. DiffEngineX runs outside Excel and it kept on working despite all Microsoft's changes. DiffEngineX has been the compare utility that people turn to when everything else fails to work.
DiffEngineX was one of the first Excel compare utilities to report all the differences between two medium sized Excel workbooks within tens of seconds rather than tens of minutes.
The early versions of DiffEngineX were only intended as a minimal implementation of the classic diff algorithm i.e. software to point out the similarities between two sequences of information. Over time with customer feedback DiffEngineX has grown into the application it is today.
We have found several academic articles referencing DiffEngineX.
DiffEngineX is a true difference tool for Excel workbooks, which correctly handles row/column insertions and deletions and their effects on formulae. It offers two different row alignment algorithms. Use Alignment Plus not only aligns similar rows, but makes sure unmatched rows are always paired with a blank row (in the other workbook) such that deletions and additions can be easily color highlighted and inspected. "Options->Hide Matching Rows, but show 4 rows on either side as context" is a powerful way to see all the differences at once with a little context on either side. DiffEngineX remains unique in its scalability.