private static int _CalculateMinimumCellEditDistance(List<List<int>> matrix, string oldString, int oldCharacterIndex, string newString, int newCharacterIndex)
{
var oldChar = oldString[oldCharacterIndex];
var newChar = newString[newCharacterIndex];
var localEditDistance = oldChar == newChar ? 0 : 1;
var leftArrayValue = localEditDistance + _GetCellValue(matrix, oldCharacterIndex - 1, newCharacterIndex);
var topLeftArrayValue = localEditDistance + _GetCellValue(matrix, oldCharacterIndex - 1, newCharacterIndex - 1);
var topArrayValue = localEditDistance + _GetCellValue(matrix, oldCharacterIndex, newCharacterIndex - 1);
var minEditDistance = Math.Min(Math.Min(leftArrayValue, topLeftArrayValue), topArrayValue);
return minEditDistance;
}
That makes perfect sense if you don't think about the left and top edges. Here we have to introduce a lil finagling to calculate those cells appropriately. Basically the following code means that if you're in the top row looking up the edit distance in column number oldCharacterIndex, the only way to have gotten there is to have made (oldCharacterIndex + 1) count deletions. By the same token, if we're in the left-most column looking for the value in row number newCharacterIndex, the only way to have gotten there is to have made (newCharacterIndex + 1) count insertions.
private static int _GetCellValue(List<List<int>> matrix, int oldCharacterIndex, int newCharacterIndex)
{
if (oldCharacterIndex < 0 && newCharacterIndex < 0)
return 0;
if (oldCharacterIndex < 0)
return newCharacterIndex + 1;
if (newCharacterIndex < 0)
return oldCharacterIndex + 1;
return matrix[oldCharacterIndex][newCharacterIndex];
}
Now I'd like to point out, I'm 100% certain that this is **A** "right" way but my code might have bugs so take it with a grain of salt. The main purpose of this post is to show another way of tackling a different dynamic programming problem and the thought process I went through.
Once I've scored each pair of characters I then set out to traverse the given matrix by going to the smallest edits possible. I also created a class to handle tracking the deletes and edits that would be needed along the way. This class can be used to play back my changes to tell someone how they could change the old string into the newer one. Also, since dynamic programming calls for us to work through the problem backwards, this class reorders the steps to read correctly.
Now, once again we create a matrix of how well each cell performs according to the above method. In our context a cell's value should essentially be thought of as the number of steps required to convert the character at the cell's column index into the character at cell's row index. In my calculation I'm also taking into account shifting the character into the appropriate position. That means that every cell has a been scored in such a way that they all solved their own version of the problem as though they were the only portion of it. What's beautiful about this is that by bringing these results together we can find the optimal string editing just by following the smallest values backwards from the lower right cell of our matrix. Basically you start at what has to be your end point and assume that you took the optimal path there from the previous three optional cells (the cell immediately to the left, to the top-left, and to the top). If you have a tie between top or left just choose whichever you like. In the case of a tie between the top-left and one of the others though I default to the top-left since that's roughly half the number of cells (and thus string edit steps) we'll need to process.
Quick Example
To help pull together all of this here's an edit distance matrix from my program for diffing "bcdeffghi" and "abcdefghij". This string is useful because it exercises all 3 diff operations (insert, swap, and delete):
b c d e f f g h i
a 1 2 3 4 5 6 7 8 9
b 1 2 3 4 5 6 7 8 9
c 2 1 2 3 4 5 6 7 8
d 3 2 1 2 3 4 5 6 7
e 4 3 2 1 2 3 4 5 6
f 5 4 3 2 1 2 3 4 5
g 6 5 4 3 2 2 2 3 4
h 7 6 5 4 3 3 3 2 3
i 8 7 6 5 4 4 4 3 2
j 9 8 7 6 5 5 5 4 3
The lower right hand corner (3 in this case) is our optimal edit distance. This means the string on the top of the table can be transformed into the string on the left of the table with exactly three separate steps.
As always here's the link to my code (It's in the StringMatching folder along with the pattern matcher):