Get Levenshtein DistanceFinding the longest unique sub-string in a stringShortest path navigation across a grid using Best First SearchExtracting information from spreadsheet into 12 tables (one per month)Tabulate schedule/calendar for database importCreate quantile slices from dataframe columnsMinimum number of basic operations to convert str1 into str2Message classification with Levenshtein DistanceExtracting the first two components of a version numberFilling a string at minimal costString reversing x times
Linear Path Optimization with Two Dependent Variables
High voltage LED indicator 40-1000 VDC without additional power supply
What is a clear way to write a bar that has an extra beat?
Convert two switches to a dual stack, and add outlet - possible here?
Uncaught TypeError: 'set' on proxy: trap returned falsish for property Name
When a company launches a new product do they "come out" with a new product or do they "come up" with a new product?
Revoked SSL certificate
Operational amplifier as a comparator at high frequency
What typically incentivizes a professor to change jobs to a lower ranking university?
Client team has low performances and low technical skills: we always fix their work and now they stop collaborate with us. How to solve?
Can I ask the recruiters in my resume to put the reason why I am rejected?
Do I have a twin with permutated remainders?
Why does Kotter return in Welcome Back Kotter?
Watching something be written to a file live with tail
How does one intimidate enemies without having the capacity for violence?
Filter any system log file by date or date range
meaning of に in 本当に?
how to check a propriety using r studio
Today is the Center
How can bays and straits be determined in a procedurally generated map?
How much of data wrangling is a data scientist's job?
Why doesn't Newton's third law mean a person bounces back to where they started when they hit the ground?
Roll the carpet
Mutually beneficial digestive system symbiotes
Get Levenshtein Distance
Finding the longest unique sub-string in a stringShortest path navigation across a grid using Best First SearchExtracting information from spreadsheet into 12 tables (one per month)Tabulate schedule/calendar for database importCreate quantile slices from dataframe columnsMinimum number of basic operations to convert str1 into str2Message classification with Levenshtein DistanceExtracting the first two components of a version numberFilling a string at minimal costString reversing x times
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
$begingroup$
Logic based of this site which had a VB example I used as a base.
function Get-LevenshteinDistance
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
Was working on a PPCG question where I had to roll my own LD calculator. Since I saw future value in this I made a more professional looking version. Since it is iterating over every character in a string for every other character in the other string it really starts to show its performance issues when they start to get over 50 chars in size.
The model it uses supports deletions, insertions and substitutions. If I wanted to change the weight of substitutions I suppose I would need to edit this line and support a parameter for it?
$cost=if($compareCharacter -eq $differenceCharacter)0else1
Never really played with multidimensional arrays before and I am not sure how I could improve on this. I know there are a few things I could do to make the code more brief and still functional however I opted for this solution as I liked the improved readability.
performance strings powershell edit-distance
$endgroup$
add a comment |
$begingroup$
Logic based of this site which had a VB example I used as a base.
function Get-LevenshteinDistance
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
Was working on a PPCG question where I had to roll my own LD calculator. Since I saw future value in this I made a more professional looking version. Since it is iterating over every character in a string for every other character in the other string it really starts to show its performance issues when they start to get over 50 chars in size.
The model it uses supports deletions, insertions and substitutions. If I wanted to change the weight of substitutions I suppose I would need to edit this line and support a parameter for it?
$cost=if($compareCharacter -eq $differenceCharacter)0else1
Never really played with multidimensional arrays before and I am not sure how I could improve on this. I know there are a few things I could do to make the code more brief and still functional however I opted for this solution as I liked the improved readability.
performance strings powershell edit-distance
$endgroup$
$begingroup$
itteratingshould probably beiterating
$endgroup$
– yuri
May 30 '17 at 12:08
$begingroup$
@yuri probably :)
$endgroup$
– Matt
May 30 '17 at 12:08
add a comment |
$begingroup$
Logic based of this site which had a VB example I used as a base.
function Get-LevenshteinDistance
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
Was working on a PPCG question where I had to roll my own LD calculator. Since I saw future value in this I made a more professional looking version. Since it is iterating over every character in a string for every other character in the other string it really starts to show its performance issues when they start to get over 50 chars in size.
The model it uses supports deletions, insertions and substitutions. If I wanted to change the weight of substitutions I suppose I would need to edit this line and support a parameter for it?
$cost=if($compareCharacter -eq $differenceCharacter)0else1
Never really played with multidimensional arrays before and I am not sure how I could improve on this. I know there are a few things I could do to make the code more brief and still functional however I opted for this solution as I liked the improved readability.
performance strings powershell edit-distance
$endgroup$
Logic based of this site which had a VB example I used as a base.
function Get-LevenshteinDistance
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
Was working on a PPCG question where I had to roll my own LD calculator. Since I saw future value in this I made a more professional looking version. Since it is iterating over every character in a string for every other character in the other string it really starts to show its performance issues when they start to get over 50 chars in size.
The model it uses supports deletions, insertions and substitutions. If I wanted to change the weight of substitutions I suppose I would need to edit this line and support a parameter for it?
$cost=if($compareCharacter -eq $differenceCharacter)0else1
Never really played with multidimensional arrays before and I am not sure how I could improve on this. I know there are a few things I could do to make the code more brief and still functional however I opted for this solution as I liked the improved readability.
performance strings powershell edit-distance
performance strings powershell edit-distance
edited May 30 '17 at 12:08
Matt
asked May 30 '17 at 11:53
MattMatt
2,127826
2,127826
$begingroup$
itteratingshould probably beiterating
$endgroup$
– yuri
May 30 '17 at 12:08
$begingroup$
@yuri probably :)
$endgroup$
– Matt
May 30 '17 at 12:08
add a comment |
$begingroup$
itteratingshould probably beiterating
$endgroup$
– yuri
May 30 '17 at 12:08
$begingroup$
@yuri probably :)
$endgroup$
– Matt
May 30 '17 at 12:08
$begingroup$
itterating should probably be iterating$endgroup$
– yuri
May 30 '17 at 12:08
$begingroup$
itterating should probably be iterating$endgroup$
– yuri
May 30 '17 at 12:08
$begingroup$
@yuri probably :)
$endgroup$
– Matt
May 30 '17 at 12:08
$begingroup$
@yuri probably :)
$endgroup$
– Matt
May 30 '17 at 12:08
add a comment |
1 Answer
1
active
oldest
votes
$begingroup$
Here is my development series (in parentheses: approx fast ratio compared with the previous, tested using strings cca 100 characters in size, see the comparison table below):
Get-LevenshteinDistanceOrig(=1×, for comparison): the original function body (Copy&Paste);Get-LevenshteinDistanceOrig+(>3×, still insufficient): removed all time-expensiveWrite-Verboseoutput;Get-LevenshteinDistanceTry1(>50×, a substantial improvement): time-expensive pipeline toMeasure-Objectreplaced by[math]::Min(n,m)static method (see .NET system.math class). The following are only minor advancement steps:Get-LevenshteinDistanceTry2(~1×..2×): slight algorithm renovation:- instead calculating the cost (
$costvariable): if compared characters are equal then computing and comparing values$cellAbove,$cellLeftand$cellUpperLeftis useless as we already know the necessary value; - added: if strings are equal the LD is known (zero) and we can stop;
- instead calculating the cost (
Get-LevenshteinDistance(~2×): PoSH array implementation: instead of two-dimensional comparison matrix used a rectangular jagged array.
Comparison table: .CR164518test.ps1 | Format-Table -AutoSize
cmdlet similarity runtime (ms) lengths LD
------ ---------- ------------ ------- --
Get-LevenshteinDistanceOrig equal 6829.1053 106 106 0
Get-LevenshteinDistanceOrig+ equal 2010.4538 106 106 0
Get-LevenshteinDistanceTry1 equal 35.8835 106 106 0
Get-LevenshteinDistanceTry2 equal .1795 106 106 0
Get-LevenshteinDistance equal .1539 106 106 0
Get-LevenshteinDistanceOrig stochastic 6556.4117 106 102 79
Get-LevenshteinDistanceOrig+ stochastic 1932.6051 106 102 79
Get-LevenshteinDistanceTry1 stochastic 33.9023 106 102 79
Get-LevenshteinDistanceTry2 stochastic 28.3165 106 102 79
Get-LevenshteinDistance stochastic 13.4852 106 102 79
Get-LevenshteinDistanceOrig similar 6640.5884 106 102 4
Get-LevenshteinDistanceOrig+ similar 2023.8179 106 102 4
Get-LevenshteinDistanceTry1 similar 31.5843 106 102 4
Get-LevenshteinDistanceTry2 similar 14.9307 106 102 4
Get-LevenshteinDistance similar 8.4234 106 102 4
Get-LevenshteinDistanceOrig different 6613.5267 106 102 106
Get-LevenshteinDistanceOrig+ different 1943.7630 106 102 106
Get-LevenshteinDistanceTry1 different 35.1824 106 102 106
Get-LevenshteinDistanceTry2 different 27.1371 106 102 106
Get-LevenshteinDistance different 13.8924 106 102 106
Column explanation:
cmdlet- function namesimilarity- in brief "like" similarity of input stringsruntime (ms)-TotalMillisecondslengths- lengths of input strings, space delimitedLD- the Levenshtein Distance of input strings
Comparison script 164518test.ps1:
Function TestLD
param ([string]$Similarity = '')
$aux = 0
Write-Progress -Activity "LD info ($Similarity strings)" `
-CurrentOperation 'Start' -PercentComplete $aux
$cmdletTails = 'Orig','Orig+','Try1','Try2',''
foreach ( $cmdletTail in $cmdletTails )
$cmdlet = "Get-LevenshteinDistance$cmdletTail"
$scriptBlock = $LevenshteinDistance = & $cmdlet `
-CompareString $strC -DifferenceString $strD
$TimeSpan = Measure-Command -Expression $scriptBlock
$aux += [int](100 / $cmdletTails.Count)
Write-Progress -Activity "LD info ($Similarity strings)" `
-CurrentOperation $cmdlet -PercentComplete $aux
[PSCustomObject]@
'cmdlet' = $cmdlet
'similarity' = $Similarity
'runtime (ms)' = $($TimeSpan.TotalMilliseconds.
ToString('#####.###0',$cultureinfo).
PadLeft(10))
'lengths' = "$($strC.Length) $($strD.Length)"
'LD' = $LevenshteinDistance
. D:PShellCR164518.ps1 # reload the functions
$cultureinfo = [cultureinfo]::InvariantCulture # my one is different
$strC='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat leo ut turpis viverra lacinia'
$strD=$strC
TestLD -Similarity 'equal'
$strD='Cras efficitur nec orci et posuere. Suspendisse potenti. Quisque blandit auctor purus id facilisis est'
TestLD -Similarity 'stochastic'
$strC='a' * $strC.Length
$strD='a' * $strD.Length
TestLD -Similarity 'similar'
$strD='x' * $strD.Length
TestLD -Similarity 'different'
The main script 164518.ps1 (functions are defined here in the reverse order):
function Get-LevenshteinDistance
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# If strings are equal the LD is known and we can stop
if ($CompareString -ceq $DifferenceString) return 0
# Build the comparison matrix as a (rectangular) jagged array
# $comparisonMatrix = New-Object 'object[]' ($differenceStringLength+1)
$comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1))
for($index=0; $index -le $differenceStringLength; $index++)
# Create row
# $comparisonMatrix[$index]=New-Object 'object[]' ($compareStringLength+1)
$comparisonMatrix[$index]=[System.Array]::CreateInstance([System.Object], ($compareStringLength+1))
# Populate the first column
$comparisonMatrix[$index][0]=$index
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0][$index]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1] # multiple use => to a variable
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list (a variable for the only use)
# $differenceCharacter = $differenceString[$rowIndex-1] # the only use
# Calculate the cost
if($compareCharacter -ceq $differenceString[$rowIndex-1])
$comparisonMatrix[$rowIndex][$columnIndex] = $comparisonMatrix[($rowIndex -1)][($columnIndex -1)]
else
# $cost = 1 # unneeded variable for the only use
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1)][$columnIndex] + 1
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex][($columnIndex-1)] + 1
# # The cell diagonally above and to the left plus the cost ↓
$cellUpperLeft = $comparisonMatrix[($rowIndex-1)][($columnIndex-1)] + 1
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
$comparisonMatrix[$rowIndex][$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength][$compareStringLength]
function Get-LevenshteinDistanceTry2
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# If strings are equal the LD is known and we can stop
if ($CompareString -ceq $DifferenceString) return 0
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1),($compareStringLength+1))
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
# Calculate the cost
if($compareCharacter -ceq $differenceCharacter)
$comparisonMatrix[$rowIndex,$columnIndex] = $comparisonMatrix[($rowIndex -1),($columnIndex -1)]
else Measure-Object -Minimum).Minimum
$comparisonMatrix[$rowIndex,$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceTry1
#[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
##Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
##Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
# Calculate the cost
$cost=if($compareCharacter -ceq $differenceCharacter)0else1
##Write-Verbose "Cost: $cost"
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
##Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
##Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"
# The cell diagonally above and to the left plus the cost
$cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
##Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
# $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceOrig+
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceOrig
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
# Calculate the cost
$cost=if($compareCharacter -ceq $differenceCharacter)0else1
Write-Verbose "Cost: $cost"
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"
# The cell diagonally above and to the left plus the cost
$cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
$comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
Please note case sensitivity of the above functions. For case insensitive Levenshtein Distance:
permanently: change-ceqto-eqin their two occurrences, or
ad hoc, in a particular case: use.ToUpper()or.ToLower()functions, e.g. as
Get-LevenshteinDistance -CompareString $strC.ToUpper() -DifferenceString $strD.ToUpper()
# or
Get-LevenshteinDistance -CompareString $strC.ToLower() -DifferenceString $strD.ToLower()
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e)
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom))
StackExchange.using('gps', function() StackExchange.gps.track('embedded_signup_form.view', location: 'question_page' ); );
$window.unbind('scroll', onScroll);
;
$window.on('scroll', onScroll);
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f164518%2fget-levenshtein-distance%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Here is my development series (in parentheses: approx fast ratio compared with the previous, tested using strings cca 100 characters in size, see the comparison table below):
Get-LevenshteinDistanceOrig(=1×, for comparison): the original function body (Copy&Paste);Get-LevenshteinDistanceOrig+(>3×, still insufficient): removed all time-expensiveWrite-Verboseoutput;Get-LevenshteinDistanceTry1(>50×, a substantial improvement): time-expensive pipeline toMeasure-Objectreplaced by[math]::Min(n,m)static method (see .NET system.math class). The following are only minor advancement steps:Get-LevenshteinDistanceTry2(~1×..2×): slight algorithm renovation:- instead calculating the cost (
$costvariable): if compared characters are equal then computing and comparing values$cellAbove,$cellLeftand$cellUpperLeftis useless as we already know the necessary value; - added: if strings are equal the LD is known (zero) and we can stop;
- instead calculating the cost (
Get-LevenshteinDistance(~2×): PoSH array implementation: instead of two-dimensional comparison matrix used a rectangular jagged array.
Comparison table: .CR164518test.ps1 | Format-Table -AutoSize
cmdlet similarity runtime (ms) lengths LD
------ ---------- ------------ ------- --
Get-LevenshteinDistanceOrig equal 6829.1053 106 106 0
Get-LevenshteinDistanceOrig+ equal 2010.4538 106 106 0
Get-LevenshteinDistanceTry1 equal 35.8835 106 106 0
Get-LevenshteinDistanceTry2 equal .1795 106 106 0
Get-LevenshteinDistance equal .1539 106 106 0
Get-LevenshteinDistanceOrig stochastic 6556.4117 106 102 79
Get-LevenshteinDistanceOrig+ stochastic 1932.6051 106 102 79
Get-LevenshteinDistanceTry1 stochastic 33.9023 106 102 79
Get-LevenshteinDistanceTry2 stochastic 28.3165 106 102 79
Get-LevenshteinDistance stochastic 13.4852 106 102 79
Get-LevenshteinDistanceOrig similar 6640.5884 106 102 4
Get-LevenshteinDistanceOrig+ similar 2023.8179 106 102 4
Get-LevenshteinDistanceTry1 similar 31.5843 106 102 4
Get-LevenshteinDistanceTry2 similar 14.9307 106 102 4
Get-LevenshteinDistance similar 8.4234 106 102 4
Get-LevenshteinDistanceOrig different 6613.5267 106 102 106
Get-LevenshteinDistanceOrig+ different 1943.7630 106 102 106
Get-LevenshteinDistanceTry1 different 35.1824 106 102 106
Get-LevenshteinDistanceTry2 different 27.1371 106 102 106
Get-LevenshteinDistance different 13.8924 106 102 106
Column explanation:
cmdlet- function namesimilarity- in brief "like" similarity of input stringsruntime (ms)-TotalMillisecondslengths- lengths of input strings, space delimitedLD- the Levenshtein Distance of input strings
Comparison script 164518test.ps1:
Function TestLD
param ([string]$Similarity = '')
$aux = 0
Write-Progress -Activity "LD info ($Similarity strings)" `
-CurrentOperation 'Start' -PercentComplete $aux
$cmdletTails = 'Orig','Orig+','Try1','Try2',''
foreach ( $cmdletTail in $cmdletTails )
$cmdlet = "Get-LevenshteinDistance$cmdletTail"
$scriptBlock = $LevenshteinDistance = & $cmdlet `
-CompareString $strC -DifferenceString $strD
$TimeSpan = Measure-Command -Expression $scriptBlock
$aux += [int](100 / $cmdletTails.Count)
Write-Progress -Activity "LD info ($Similarity strings)" `
-CurrentOperation $cmdlet -PercentComplete $aux
[PSCustomObject]@
'cmdlet' = $cmdlet
'similarity' = $Similarity
'runtime (ms)' = $($TimeSpan.TotalMilliseconds.
ToString('#####.###0',$cultureinfo).
PadLeft(10))
'lengths' = "$($strC.Length) $($strD.Length)"
'LD' = $LevenshteinDistance
. D:PShellCR164518.ps1 # reload the functions
$cultureinfo = [cultureinfo]::InvariantCulture # my one is different
$strC='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat leo ut turpis viverra lacinia'
$strD=$strC
TestLD -Similarity 'equal'
$strD='Cras efficitur nec orci et posuere. Suspendisse potenti. Quisque blandit auctor purus id facilisis est'
TestLD -Similarity 'stochastic'
$strC='a' * $strC.Length
$strD='a' * $strD.Length
TestLD -Similarity 'similar'
$strD='x' * $strD.Length
TestLD -Similarity 'different'
The main script 164518.ps1 (functions are defined here in the reverse order):
function Get-LevenshteinDistance
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# If strings are equal the LD is known and we can stop
if ($CompareString -ceq $DifferenceString) return 0
# Build the comparison matrix as a (rectangular) jagged array
# $comparisonMatrix = New-Object 'object[]' ($differenceStringLength+1)
$comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1))
for($index=0; $index -le $differenceStringLength; $index++)
# Create row
# $comparisonMatrix[$index]=New-Object 'object[]' ($compareStringLength+1)
$comparisonMatrix[$index]=[System.Array]::CreateInstance([System.Object], ($compareStringLength+1))
# Populate the first column
$comparisonMatrix[$index][0]=$index
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0][$index]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1] # multiple use => to a variable
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list (a variable for the only use)
# $differenceCharacter = $differenceString[$rowIndex-1] # the only use
# Calculate the cost
if($compareCharacter -ceq $differenceString[$rowIndex-1])
$comparisonMatrix[$rowIndex][$columnIndex] = $comparisonMatrix[($rowIndex -1)][($columnIndex -1)]
else
# $cost = 1 # unneeded variable for the only use
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1)][$columnIndex] + 1
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex][($columnIndex-1)] + 1
# # The cell diagonally above and to the left plus the cost ↓
$cellUpperLeft = $comparisonMatrix[($rowIndex-1)][($columnIndex-1)] + 1
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
$comparisonMatrix[$rowIndex][$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength][$compareStringLength]
function Get-LevenshteinDistanceTry2
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# If strings are equal the LD is known and we can stop
if ($CompareString -ceq $DifferenceString) return 0
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1),($compareStringLength+1))
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
# Calculate the cost
if($compareCharacter -ceq $differenceCharacter)
$comparisonMatrix[$rowIndex,$columnIndex] = $comparisonMatrix[($rowIndex -1),($columnIndex -1)]
else Measure-Object -Minimum).Minimum
$comparisonMatrix[$rowIndex,$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceTry1
#[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
##Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
##Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
# Calculate the cost
$cost=if($compareCharacter -ceq $differenceCharacter)0else1
##Write-Verbose "Cost: $cost"
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
##Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
##Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"
# The cell diagonally above and to the left plus the cost
$cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
##Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
# $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceOrig+
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceOrig
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
# Calculate the cost
$cost=if($compareCharacter -ceq $differenceCharacter)0else1
Write-Verbose "Cost: $cost"
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"
# The cell diagonally above and to the left plus the cost
$cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
$comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
Please note case sensitivity of the above functions. For case insensitive Levenshtein Distance:
permanently: change-ceqto-eqin their two occurrences, or
ad hoc, in a particular case: use.ToUpper()or.ToLower()functions, e.g. as
Get-LevenshteinDistance -CompareString $strC.ToUpper() -DifferenceString $strD.ToUpper()
# or
Get-LevenshteinDistance -CompareString $strC.ToLower() -DifferenceString $strD.ToLower()
$endgroup$
add a comment |
$begingroup$
Here is my development series (in parentheses: approx fast ratio compared with the previous, tested using strings cca 100 characters in size, see the comparison table below):
Get-LevenshteinDistanceOrig(=1×, for comparison): the original function body (Copy&Paste);Get-LevenshteinDistanceOrig+(>3×, still insufficient): removed all time-expensiveWrite-Verboseoutput;Get-LevenshteinDistanceTry1(>50×, a substantial improvement): time-expensive pipeline toMeasure-Objectreplaced by[math]::Min(n,m)static method (see .NET system.math class). The following are only minor advancement steps:Get-LevenshteinDistanceTry2(~1×..2×): slight algorithm renovation:- instead calculating the cost (
$costvariable): if compared characters are equal then computing and comparing values$cellAbove,$cellLeftand$cellUpperLeftis useless as we already know the necessary value; - added: if strings are equal the LD is known (zero) and we can stop;
- instead calculating the cost (
Get-LevenshteinDistance(~2×): PoSH array implementation: instead of two-dimensional comparison matrix used a rectangular jagged array.
Comparison table: .CR164518test.ps1 | Format-Table -AutoSize
cmdlet similarity runtime (ms) lengths LD
------ ---------- ------------ ------- --
Get-LevenshteinDistanceOrig equal 6829.1053 106 106 0
Get-LevenshteinDistanceOrig+ equal 2010.4538 106 106 0
Get-LevenshteinDistanceTry1 equal 35.8835 106 106 0
Get-LevenshteinDistanceTry2 equal .1795 106 106 0
Get-LevenshteinDistance equal .1539 106 106 0
Get-LevenshteinDistanceOrig stochastic 6556.4117 106 102 79
Get-LevenshteinDistanceOrig+ stochastic 1932.6051 106 102 79
Get-LevenshteinDistanceTry1 stochastic 33.9023 106 102 79
Get-LevenshteinDistanceTry2 stochastic 28.3165 106 102 79
Get-LevenshteinDistance stochastic 13.4852 106 102 79
Get-LevenshteinDistanceOrig similar 6640.5884 106 102 4
Get-LevenshteinDistanceOrig+ similar 2023.8179 106 102 4
Get-LevenshteinDistanceTry1 similar 31.5843 106 102 4
Get-LevenshteinDistanceTry2 similar 14.9307 106 102 4
Get-LevenshteinDistance similar 8.4234 106 102 4
Get-LevenshteinDistanceOrig different 6613.5267 106 102 106
Get-LevenshteinDistanceOrig+ different 1943.7630 106 102 106
Get-LevenshteinDistanceTry1 different 35.1824 106 102 106
Get-LevenshteinDistanceTry2 different 27.1371 106 102 106
Get-LevenshteinDistance different 13.8924 106 102 106
Column explanation:
cmdlet- function namesimilarity- in brief "like" similarity of input stringsruntime (ms)-TotalMillisecondslengths- lengths of input strings, space delimitedLD- the Levenshtein Distance of input strings
Comparison script 164518test.ps1:
Function TestLD
param ([string]$Similarity = '')
$aux = 0
Write-Progress -Activity "LD info ($Similarity strings)" `
-CurrentOperation 'Start' -PercentComplete $aux
$cmdletTails = 'Orig','Orig+','Try1','Try2',''
foreach ( $cmdletTail in $cmdletTails )
$cmdlet = "Get-LevenshteinDistance$cmdletTail"
$scriptBlock = $LevenshteinDistance = & $cmdlet `
-CompareString $strC -DifferenceString $strD
$TimeSpan = Measure-Command -Expression $scriptBlock
$aux += [int](100 / $cmdletTails.Count)
Write-Progress -Activity "LD info ($Similarity strings)" `
-CurrentOperation $cmdlet -PercentComplete $aux
[PSCustomObject]@
'cmdlet' = $cmdlet
'similarity' = $Similarity
'runtime (ms)' = $($TimeSpan.TotalMilliseconds.
ToString('#####.###0',$cultureinfo).
PadLeft(10))
'lengths' = "$($strC.Length) $($strD.Length)"
'LD' = $LevenshteinDistance
. D:PShellCR164518.ps1 # reload the functions
$cultureinfo = [cultureinfo]::InvariantCulture # my one is different
$strC='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat leo ut turpis viverra lacinia'
$strD=$strC
TestLD -Similarity 'equal'
$strD='Cras efficitur nec orci et posuere. Suspendisse potenti. Quisque blandit auctor purus id facilisis est'
TestLD -Similarity 'stochastic'
$strC='a' * $strC.Length
$strD='a' * $strD.Length
TestLD -Similarity 'similar'
$strD='x' * $strD.Length
TestLD -Similarity 'different'
The main script 164518.ps1 (functions are defined here in the reverse order):
function Get-LevenshteinDistance
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# If strings are equal the LD is known and we can stop
if ($CompareString -ceq $DifferenceString) return 0
# Build the comparison matrix as a (rectangular) jagged array
# $comparisonMatrix = New-Object 'object[]' ($differenceStringLength+1)
$comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1))
for($index=0; $index -le $differenceStringLength; $index++)
# Create row
# $comparisonMatrix[$index]=New-Object 'object[]' ($compareStringLength+1)
$comparisonMatrix[$index]=[System.Array]::CreateInstance([System.Object], ($compareStringLength+1))
# Populate the first column
$comparisonMatrix[$index][0]=$index
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0][$index]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1] # multiple use => to a variable
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list (a variable for the only use)
# $differenceCharacter = $differenceString[$rowIndex-1] # the only use
# Calculate the cost
if($compareCharacter -ceq $differenceString[$rowIndex-1])
$comparisonMatrix[$rowIndex][$columnIndex] = $comparisonMatrix[($rowIndex -1)][($columnIndex -1)]
else
# $cost = 1 # unneeded variable for the only use
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1)][$columnIndex] + 1
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex][($columnIndex-1)] + 1
# # The cell diagonally above and to the left plus the cost ↓
$cellUpperLeft = $comparisonMatrix[($rowIndex-1)][($columnIndex-1)] + 1
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
$comparisonMatrix[$rowIndex][$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength][$compareStringLength]
function Get-LevenshteinDistanceTry2
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# If strings are equal the LD is known and we can stop
if ($CompareString -ceq $DifferenceString) return 0
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1),($compareStringLength+1))
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
# Calculate the cost
if($compareCharacter -ceq $differenceCharacter)
$comparisonMatrix[$rowIndex,$columnIndex] = $comparisonMatrix[($rowIndex -1),($columnIndex -1)]
else Measure-Object -Minimum).Minimum
$comparisonMatrix[$rowIndex,$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceTry1
#[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
##Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
##Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
# Calculate the cost
$cost=if($compareCharacter -ceq $differenceCharacter)0else1
##Write-Verbose "Cost: $cost"
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
##Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
##Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"
# The cell diagonally above and to the left plus the cost
$cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
##Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
# $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceOrig+
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceOrig
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
# Calculate the cost
$cost=if($compareCharacter -ceq $differenceCharacter)0else1
Write-Verbose "Cost: $cost"
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"
# The cell diagonally above and to the left plus the cost
$cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
$comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
Please note case sensitivity of the above functions. For case insensitive Levenshtein Distance:
permanently: change-ceqto-eqin their two occurrences, or
ad hoc, in a particular case: use.ToUpper()or.ToLower()functions, e.g. as
Get-LevenshteinDistance -CompareString $strC.ToUpper() -DifferenceString $strD.ToUpper()
# or
Get-LevenshteinDistance -CompareString $strC.ToLower() -DifferenceString $strD.ToLower()
$endgroup$
add a comment |
$begingroup$
Here is my development series (in parentheses: approx fast ratio compared with the previous, tested using strings cca 100 characters in size, see the comparison table below):
Get-LevenshteinDistanceOrig(=1×, for comparison): the original function body (Copy&Paste);Get-LevenshteinDistanceOrig+(>3×, still insufficient): removed all time-expensiveWrite-Verboseoutput;Get-LevenshteinDistanceTry1(>50×, a substantial improvement): time-expensive pipeline toMeasure-Objectreplaced by[math]::Min(n,m)static method (see .NET system.math class). The following are only minor advancement steps:Get-LevenshteinDistanceTry2(~1×..2×): slight algorithm renovation:- instead calculating the cost (
$costvariable): if compared characters are equal then computing and comparing values$cellAbove,$cellLeftand$cellUpperLeftis useless as we already know the necessary value; - added: if strings are equal the LD is known (zero) and we can stop;
- instead calculating the cost (
Get-LevenshteinDistance(~2×): PoSH array implementation: instead of two-dimensional comparison matrix used a rectangular jagged array.
Comparison table: .CR164518test.ps1 | Format-Table -AutoSize
cmdlet similarity runtime (ms) lengths LD
------ ---------- ------------ ------- --
Get-LevenshteinDistanceOrig equal 6829.1053 106 106 0
Get-LevenshteinDistanceOrig+ equal 2010.4538 106 106 0
Get-LevenshteinDistanceTry1 equal 35.8835 106 106 0
Get-LevenshteinDistanceTry2 equal .1795 106 106 0
Get-LevenshteinDistance equal .1539 106 106 0
Get-LevenshteinDistanceOrig stochastic 6556.4117 106 102 79
Get-LevenshteinDistanceOrig+ stochastic 1932.6051 106 102 79
Get-LevenshteinDistanceTry1 stochastic 33.9023 106 102 79
Get-LevenshteinDistanceTry2 stochastic 28.3165 106 102 79
Get-LevenshteinDistance stochastic 13.4852 106 102 79
Get-LevenshteinDistanceOrig similar 6640.5884 106 102 4
Get-LevenshteinDistanceOrig+ similar 2023.8179 106 102 4
Get-LevenshteinDistanceTry1 similar 31.5843 106 102 4
Get-LevenshteinDistanceTry2 similar 14.9307 106 102 4
Get-LevenshteinDistance similar 8.4234 106 102 4
Get-LevenshteinDistanceOrig different 6613.5267 106 102 106
Get-LevenshteinDistanceOrig+ different 1943.7630 106 102 106
Get-LevenshteinDistanceTry1 different 35.1824 106 102 106
Get-LevenshteinDistanceTry2 different 27.1371 106 102 106
Get-LevenshteinDistance different 13.8924 106 102 106
Column explanation:
cmdlet- function namesimilarity- in brief "like" similarity of input stringsruntime (ms)-TotalMillisecondslengths- lengths of input strings, space delimitedLD- the Levenshtein Distance of input strings
Comparison script 164518test.ps1:
Function TestLD
param ([string]$Similarity = '')
$aux = 0
Write-Progress -Activity "LD info ($Similarity strings)" `
-CurrentOperation 'Start' -PercentComplete $aux
$cmdletTails = 'Orig','Orig+','Try1','Try2',''
foreach ( $cmdletTail in $cmdletTails )
$cmdlet = "Get-LevenshteinDistance$cmdletTail"
$scriptBlock = $LevenshteinDistance = & $cmdlet `
-CompareString $strC -DifferenceString $strD
$TimeSpan = Measure-Command -Expression $scriptBlock
$aux += [int](100 / $cmdletTails.Count)
Write-Progress -Activity "LD info ($Similarity strings)" `
-CurrentOperation $cmdlet -PercentComplete $aux
[PSCustomObject]@
'cmdlet' = $cmdlet
'similarity' = $Similarity
'runtime (ms)' = $($TimeSpan.TotalMilliseconds.
ToString('#####.###0',$cultureinfo).
PadLeft(10))
'lengths' = "$($strC.Length) $($strD.Length)"
'LD' = $LevenshteinDistance
. D:PShellCR164518.ps1 # reload the functions
$cultureinfo = [cultureinfo]::InvariantCulture # my one is different
$strC='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat leo ut turpis viverra lacinia'
$strD=$strC
TestLD -Similarity 'equal'
$strD='Cras efficitur nec orci et posuere. Suspendisse potenti. Quisque blandit auctor purus id facilisis est'
TestLD -Similarity 'stochastic'
$strC='a' * $strC.Length
$strD='a' * $strD.Length
TestLD -Similarity 'similar'
$strD='x' * $strD.Length
TestLD -Similarity 'different'
The main script 164518.ps1 (functions are defined here in the reverse order):
function Get-LevenshteinDistance
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# If strings are equal the LD is known and we can stop
if ($CompareString -ceq $DifferenceString) return 0
# Build the comparison matrix as a (rectangular) jagged array
# $comparisonMatrix = New-Object 'object[]' ($differenceStringLength+1)
$comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1))
for($index=0; $index -le $differenceStringLength; $index++)
# Create row
# $comparisonMatrix[$index]=New-Object 'object[]' ($compareStringLength+1)
$comparisonMatrix[$index]=[System.Array]::CreateInstance([System.Object], ($compareStringLength+1))
# Populate the first column
$comparisonMatrix[$index][0]=$index
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0][$index]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1] # multiple use => to a variable
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list (a variable for the only use)
# $differenceCharacter = $differenceString[$rowIndex-1] # the only use
# Calculate the cost
if($compareCharacter -ceq $differenceString[$rowIndex-1])
$comparisonMatrix[$rowIndex][$columnIndex] = $comparisonMatrix[($rowIndex -1)][($columnIndex -1)]
else
# $cost = 1 # unneeded variable for the only use
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1)][$columnIndex] + 1
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex][($columnIndex-1)] + 1
# # The cell diagonally above and to the left plus the cost ↓
$cellUpperLeft = $comparisonMatrix[($rowIndex-1)][($columnIndex-1)] + 1
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
$comparisonMatrix[$rowIndex][$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength][$compareStringLength]
function Get-LevenshteinDistanceTry2
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# If strings are equal the LD is known and we can stop
if ($CompareString -ceq $DifferenceString) return 0
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1),($compareStringLength+1))
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
# Calculate the cost
if($compareCharacter -ceq $differenceCharacter)
$comparisonMatrix[$rowIndex,$columnIndex] = $comparisonMatrix[($rowIndex -1),($columnIndex -1)]
else Measure-Object -Minimum).Minimum
$comparisonMatrix[$rowIndex,$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceTry1
#[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
##Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
##Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
# Calculate the cost
$cost=if($compareCharacter -ceq $differenceCharacter)0else1
##Write-Verbose "Cost: $cost"
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
##Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
##Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"
# The cell diagonally above and to the left plus the cost
$cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
##Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
# $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceOrig+
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceOrig
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
# Calculate the cost
$cost=if($compareCharacter -ceq $differenceCharacter)0else1
Write-Verbose "Cost: $cost"
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"
# The cell diagonally above and to the left plus the cost
$cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
$comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
Please note case sensitivity of the above functions. For case insensitive Levenshtein Distance:
permanently: change-ceqto-eqin their two occurrences, or
ad hoc, in a particular case: use.ToUpper()or.ToLower()functions, e.g. as
Get-LevenshteinDistance -CompareString $strC.ToUpper() -DifferenceString $strD.ToUpper()
# or
Get-LevenshteinDistance -CompareString $strC.ToLower() -DifferenceString $strD.ToLower()
$endgroup$
Here is my development series (in parentheses: approx fast ratio compared with the previous, tested using strings cca 100 characters in size, see the comparison table below):
Get-LevenshteinDistanceOrig(=1×, for comparison): the original function body (Copy&Paste);Get-LevenshteinDistanceOrig+(>3×, still insufficient): removed all time-expensiveWrite-Verboseoutput;Get-LevenshteinDistanceTry1(>50×, a substantial improvement): time-expensive pipeline toMeasure-Objectreplaced by[math]::Min(n,m)static method (see .NET system.math class). The following are only minor advancement steps:Get-LevenshteinDistanceTry2(~1×..2×): slight algorithm renovation:- instead calculating the cost (
$costvariable): if compared characters are equal then computing and comparing values$cellAbove,$cellLeftand$cellUpperLeftis useless as we already know the necessary value; - added: if strings are equal the LD is known (zero) and we can stop;
- instead calculating the cost (
Get-LevenshteinDistance(~2×): PoSH array implementation: instead of two-dimensional comparison matrix used a rectangular jagged array.
Comparison table: .CR164518test.ps1 | Format-Table -AutoSize
cmdlet similarity runtime (ms) lengths LD
------ ---------- ------------ ------- --
Get-LevenshteinDistanceOrig equal 6829.1053 106 106 0
Get-LevenshteinDistanceOrig+ equal 2010.4538 106 106 0
Get-LevenshteinDistanceTry1 equal 35.8835 106 106 0
Get-LevenshteinDistanceTry2 equal .1795 106 106 0
Get-LevenshteinDistance equal .1539 106 106 0
Get-LevenshteinDistanceOrig stochastic 6556.4117 106 102 79
Get-LevenshteinDistanceOrig+ stochastic 1932.6051 106 102 79
Get-LevenshteinDistanceTry1 stochastic 33.9023 106 102 79
Get-LevenshteinDistanceTry2 stochastic 28.3165 106 102 79
Get-LevenshteinDistance stochastic 13.4852 106 102 79
Get-LevenshteinDistanceOrig similar 6640.5884 106 102 4
Get-LevenshteinDistanceOrig+ similar 2023.8179 106 102 4
Get-LevenshteinDistanceTry1 similar 31.5843 106 102 4
Get-LevenshteinDistanceTry2 similar 14.9307 106 102 4
Get-LevenshteinDistance similar 8.4234 106 102 4
Get-LevenshteinDistanceOrig different 6613.5267 106 102 106
Get-LevenshteinDistanceOrig+ different 1943.7630 106 102 106
Get-LevenshteinDistanceTry1 different 35.1824 106 102 106
Get-LevenshteinDistanceTry2 different 27.1371 106 102 106
Get-LevenshteinDistance different 13.8924 106 102 106
Column explanation:
cmdlet- function namesimilarity- in brief "like" similarity of input stringsruntime (ms)-TotalMillisecondslengths- lengths of input strings, space delimitedLD- the Levenshtein Distance of input strings
Comparison script 164518test.ps1:
Function TestLD
param ([string]$Similarity = '')
$aux = 0
Write-Progress -Activity "LD info ($Similarity strings)" `
-CurrentOperation 'Start' -PercentComplete $aux
$cmdletTails = 'Orig','Orig+','Try1','Try2',''
foreach ( $cmdletTail in $cmdletTails )
$cmdlet = "Get-LevenshteinDistance$cmdletTail"
$scriptBlock = $LevenshteinDistance = & $cmdlet `
-CompareString $strC -DifferenceString $strD
$TimeSpan = Measure-Command -Expression $scriptBlock
$aux += [int](100 / $cmdletTails.Count)
Write-Progress -Activity "LD info ($Similarity strings)" `
-CurrentOperation $cmdlet -PercentComplete $aux
[PSCustomObject]@
'cmdlet' = $cmdlet
'similarity' = $Similarity
'runtime (ms)' = $($TimeSpan.TotalMilliseconds.
ToString('#####.###0',$cultureinfo).
PadLeft(10))
'lengths' = "$($strC.Length) $($strD.Length)"
'LD' = $LevenshteinDistance
. D:PShellCR164518.ps1 # reload the functions
$cultureinfo = [cultureinfo]::InvariantCulture # my one is different
$strC='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat leo ut turpis viverra lacinia'
$strD=$strC
TestLD -Similarity 'equal'
$strD='Cras efficitur nec orci et posuere. Suspendisse potenti. Quisque blandit auctor purus id facilisis est'
TestLD -Similarity 'stochastic'
$strC='a' * $strC.Length
$strD='a' * $strD.Length
TestLD -Similarity 'similar'
$strD='x' * $strD.Length
TestLD -Similarity 'different'
The main script 164518.ps1 (functions are defined here in the reverse order):
function Get-LevenshteinDistance
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# If strings are equal the LD is known and we can stop
if ($CompareString -ceq $DifferenceString) return 0
# Build the comparison matrix as a (rectangular) jagged array
# $comparisonMatrix = New-Object 'object[]' ($differenceStringLength+1)
$comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1))
for($index=0; $index -le $differenceStringLength; $index++)
# Create row
# $comparisonMatrix[$index]=New-Object 'object[]' ($compareStringLength+1)
$comparisonMatrix[$index]=[System.Array]::CreateInstance([System.Object], ($compareStringLength+1))
# Populate the first column
$comparisonMatrix[$index][0]=$index
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0][$index]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1] # multiple use => to a variable
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list (a variable for the only use)
# $differenceCharacter = $differenceString[$rowIndex-1] # the only use
# Calculate the cost
if($compareCharacter -ceq $differenceString[$rowIndex-1])
$comparisonMatrix[$rowIndex][$columnIndex] = $comparisonMatrix[($rowIndex -1)][($columnIndex -1)]
else
# $cost = 1 # unneeded variable for the only use
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1)][$columnIndex] + 1
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex][($columnIndex-1)] + 1
# # The cell diagonally above and to the left plus the cost ↓
$cellUpperLeft = $comparisonMatrix[($rowIndex-1)][($columnIndex-1)] + 1
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
$comparisonMatrix[$rowIndex][$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength][$compareStringLength]
function Get-LevenshteinDistanceTry2
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# If strings are equal the LD is known and we can stop
if ($CompareString -ceq $DifferenceString) return 0
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1),($compareStringLength+1))
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
# Calculate the cost
if($compareCharacter -ceq $differenceCharacter)
$comparisonMatrix[$rowIndex,$columnIndex] = $comparisonMatrix[($rowIndex -1),($columnIndex -1)]
else Measure-Object -Minimum).Minimum
$comparisonMatrix[$rowIndex,$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceTry1
#[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
##Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
##Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
# Calculate the cost
$cost=if($compareCharacter -ceq $differenceCharacter)0else1
##Write-Verbose "Cost: $cost"
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
##Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
##Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"
# The cell diagonally above and to the left plus the cost
$cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
##Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
# $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceOrig+
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
function Get-LevenshteinDistanceOrig
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)
# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"
# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index
# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index
# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]
# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]
Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
# Calculate the cost
$cost=if($compareCharacter -ceq $differenceCharacter)0else1
Write-Verbose "Cost: $cost"
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"
# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"
# The cell diagonally above and to the left plus the cost
$cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"
# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
$comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft
# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]
Please note case sensitivity of the above functions. For case insensitive Levenshtein Distance:
permanently: change-ceqto-eqin their two occurrences, or
ad hoc, in a particular case: use.ToUpper()or.ToLower()functions, e.g. as
Get-LevenshteinDistance -CompareString $strC.ToUpper() -DifferenceString $strD.ToUpper()
# or
Get-LevenshteinDistance -CompareString $strC.ToLower() -DifferenceString $strD.ToLower()
answered 7 mins ago
JosefZJosefZ
251210
251210
add a comment |
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e)
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom))
StackExchange.using('gps', function() StackExchange.gps.track('embedded_signup_form.view', location: 'question_page' ); );
$window.unbind('scroll', onScroll);
;
$window.on('scroll', onScroll);
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f164518%2fget-levenshtein-distance%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e)
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom))
StackExchange.using('gps', function() StackExchange.gps.track('embedded_signup_form.view', location: 'question_page' ); );
$window.unbind('scroll', onScroll);
;
$window.on('scroll', onScroll);
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e)
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom))
StackExchange.using('gps', function() StackExchange.gps.track('embedded_signup_form.view', location: 'question_page' ); );
$window.unbind('scroll', onScroll);
;
$window.on('scroll', onScroll);
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e)
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom))
StackExchange.using('gps', function() StackExchange.gps.track('embedded_signup_form.view', location: 'question_page' ); );
$window.unbind('scroll', onScroll);
;
$window.on('scroll', onScroll);
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
$begingroup$
itteratingshould probably beiterating$endgroup$
– yuri
May 30 '17 at 12:08
$begingroup$
@yuri probably :)
$endgroup$
– Matt
May 30 '17 at 12:08