Get Levenshtein DistanceFinding the longest unique sub-string in a stringShortest path navigation across a grid using Best First SearchExtracting information from spreadsheet into 12 tables (one per month)Tabulate schedule/calendar for database importCreate quantile slices from dataframe columnsMinimum number of basic operations to convert str1 into str2Message classification with Levenshtein DistanceExtracting the first two components of a version numberFilling a string at minimal costString reversing x times

Linear Path Optimization with Two Dependent Variables

High voltage LED indicator 40-1000 VDC without additional power supply

What is a clear way to write a bar that has an extra beat?

Convert two switches to a dual stack, and add outlet - possible here?

Uncaught TypeError: 'set' on proxy: trap returned falsish for property Name

When a company launches a new product do they "come out" with a new product or do they "come up" with a new product?

Revoked SSL certificate

Operational amplifier as a comparator at high frequency

What typically incentivizes a professor to change jobs to a lower ranking university?

Client team has low performances and low technical skills: we always fix their work and now they stop collaborate with us. How to solve?

Can I ask the recruiters in my resume to put the reason why I am rejected?

Do I have a twin with permutated remainders?

Why does Kotter return in Welcome Back Kotter?

Watching something be written to a file live with tail

How does one intimidate enemies without having the capacity for violence?

Filter any system log file by date or date range

meaning of に in 本当に?

how to check a propriety using r studio

Today is the Center

How can bays and straits be determined in a procedurally generated map?

How much of data wrangling is a data scientist's job?

Why doesn't Newton's third law mean a person bounces back to where they started when they hit the ground?

Roll the carpet

Mutually beneficial digestive system symbiotes



Get Levenshtein Distance


Finding the longest unique sub-string in a stringShortest path navigation across a grid using Best First SearchExtracting information from spreadsheet into 12 tables (one per month)Tabulate schedule/calendar for database importCreate quantile slices from dataframe columnsMinimum number of basic operations to convert str1 into str2Message classification with Levenshtein DistanceExtracting the first two components of a version numberFilling a string at minimal costString reversing x times






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








2












$begingroup$


Logic based of this site which had a VB example I used as a base.



function Get-LevenshteinDistance
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)

# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength

# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index


# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index


# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]

# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum


# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]



Was working on a PPCG question where I had to roll my own LD calculator. Since I saw future value in this I made a more professional looking version. Since it is iterating over every character in a string for every other character in the other string it really starts to show its performance issues when they start to get over 50 chars in size.



The model it uses supports deletions, insertions and substitutions. If I wanted to change the weight of substitutions I suppose I would need to edit this line and support a parameter for it?



$cost=if($compareCharacter -eq $differenceCharacter)0else1


Never really played with multidimensional arrays before and I am not sure how I could improve on this. I know there are a few things I could do to make the code more brief and still functional however I opted for this solution as I liked the improved readability.










share|improve this question











$endgroup$











  • $begingroup$
    itterating should probably be iterating
    $endgroup$
    – yuri
    May 30 '17 at 12:08










  • $begingroup$
    @yuri probably :)
    $endgroup$
    – Matt
    May 30 '17 at 12:08

















2












$begingroup$


Logic based of this site which had a VB example I used as a base.



function Get-LevenshteinDistance
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)

# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength

# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index


# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index


# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]

# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum


# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]



Was working on a PPCG question where I had to roll my own LD calculator. Since I saw future value in this I made a more professional looking version. Since it is iterating over every character in a string for every other character in the other string it really starts to show its performance issues when they start to get over 50 chars in size.



The model it uses supports deletions, insertions and substitutions. If I wanted to change the weight of substitutions I suppose I would need to edit this line and support a parameter for it?



$cost=if($compareCharacter -eq $differenceCharacter)0else1


Never really played with multidimensional arrays before and I am not sure how I could improve on this. I know there are a few things I could do to make the code more brief and still functional however I opted for this solution as I liked the improved readability.










share|improve this question











$endgroup$











  • $begingroup$
    itterating should probably be iterating
    $endgroup$
    – yuri
    May 30 '17 at 12:08










  • $begingroup$
    @yuri probably :)
    $endgroup$
    – Matt
    May 30 '17 at 12:08













2












2








2





$begingroup$


Logic based of this site which had a VB example I used as a base.



function Get-LevenshteinDistance
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)

# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength

# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index


# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index


# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]

# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum


# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]



Was working on a PPCG question where I had to roll my own LD calculator. Since I saw future value in this I made a more professional looking version. Since it is iterating over every character in a string for every other character in the other string it really starts to show its performance issues when they start to get over 50 chars in size.



The model it uses supports deletions, insertions and substitutions. If I wanted to change the weight of substitutions I suppose I would need to edit this line and support a parameter for it?



$cost=if($compareCharacter -eq $differenceCharacter)0else1


Never really played with multidimensional arrays before and I am not sure how I could improve on this. I know there are a few things I could do to make the code more brief and still functional however I opted for this solution as I liked the improved readability.










share|improve this question











$endgroup$




Logic based of this site which had a VB example I used as a base.



function Get-LevenshteinDistance
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)

# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength

# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index


# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index


# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]

# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum


# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]



Was working on a PPCG question where I had to roll my own LD calculator. Since I saw future value in this I made a more professional looking version. Since it is iterating over every character in a string for every other character in the other string it really starts to show its performance issues when they start to get over 50 chars in size.



The model it uses supports deletions, insertions and substitutions. If I wanted to change the weight of substitutions I suppose I would need to edit this line and support a parameter for it?



$cost=if($compareCharacter -eq $differenceCharacter)0else1


Never really played with multidimensional arrays before and I am not sure how I could improve on this. I know there are a few things I could do to make the code more brief and still functional however I opted for this solution as I liked the improved readability.







performance strings powershell edit-distance






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited May 30 '17 at 12:08







Matt

















asked May 30 '17 at 11:53









MattMatt

2,127826




2,127826











  • $begingroup$
    itterating should probably be iterating
    $endgroup$
    – yuri
    May 30 '17 at 12:08










  • $begingroup$
    @yuri probably :)
    $endgroup$
    – Matt
    May 30 '17 at 12:08
















  • $begingroup$
    itterating should probably be iterating
    $endgroup$
    – yuri
    May 30 '17 at 12:08










  • $begingroup$
    @yuri probably :)
    $endgroup$
    – Matt
    May 30 '17 at 12:08















$begingroup$
itterating should probably be iterating
$endgroup$
– yuri
May 30 '17 at 12:08




$begingroup$
itterating should probably be iterating
$endgroup$
– yuri
May 30 '17 at 12:08












$begingroup$
@yuri probably :)
$endgroup$
– Matt
May 30 '17 at 12:08




$begingroup$
@yuri probably :)
$endgroup$
– Matt
May 30 '17 at 12:08










1 Answer
1






active

oldest

votes


















0












$begingroup$

Here is my development series (in parentheses: approx fast ratio compared with the previous, tested using strings cca 100 characters in size, see the comparison table below):




  • Get-LevenshteinDistanceOrig  (= , for comparison): the original function body (Copy&Paste);


  • Get-LevenshteinDistanceOrig+ (> , still insufficient): removed all time-expensive Write-Verbose output;


  • Get-LevenshteinDistanceTry1  (> 50×, a substantial improvement): time-expensive pipeline to Measure-Object replaced by [math]::Min(n,m) static method (see .NET system.math class). The following are only minor advancement steps:


  • Get-LevenshteinDistanceTry2  (~ ..): slight algorithm renovation:

    • instead calculating the cost ($cost variable): if compared characters are equal then computing and comparing values $cellAbove, $cellLeft and $cellUpperLeft is useless as we already know the necessary value;

    • added: if strings are equal the LD is known (zero) and we can stop;



  • Get-LevenshteinDistance      (~ ): PoSH array implementation: instead of two-dimensional comparison matrix used a rectangular jagged array.

Comparison table: .CR164518test.ps1 | Format-Table -AutoSize



cmdlet similarity runtime (ms) lengths LD
------ ---------- ------------ ------- --
Get-LevenshteinDistanceOrig equal 6829.1053 106 106 0
Get-LevenshteinDistanceOrig+ equal 2010.4538 106 106 0
Get-LevenshteinDistanceTry1 equal 35.8835 106 106 0
Get-LevenshteinDistanceTry2 equal .1795 106 106 0
Get-LevenshteinDistance equal .1539 106 106 0
Get-LevenshteinDistanceOrig stochastic 6556.4117 106 102 79
Get-LevenshteinDistanceOrig+ stochastic 1932.6051 106 102 79
Get-LevenshteinDistanceTry1 stochastic 33.9023 106 102 79
Get-LevenshteinDistanceTry2 stochastic 28.3165 106 102 79
Get-LevenshteinDistance stochastic 13.4852 106 102 79
Get-LevenshteinDistanceOrig similar 6640.5884 106 102 4
Get-LevenshteinDistanceOrig+ similar 2023.8179 106 102 4
Get-LevenshteinDistanceTry1 similar 31.5843 106 102 4
Get-LevenshteinDistanceTry2 similar 14.9307 106 102 4
Get-LevenshteinDistance similar 8.4234 106 102 4
Get-LevenshteinDistanceOrig different 6613.5267 106 102 106
Get-LevenshteinDistanceOrig+ different 1943.7630 106 102 106
Get-LevenshteinDistanceTry1 different 35.1824 106 102 106
Get-LevenshteinDistanceTry2 different 27.1371 106 102 106
Get-LevenshteinDistance different 13.8924 106 102 106


Column explanation:




  • cmdlet       - function name


  • similarity   - in brief "like" similarity of input strings


  • runtime (ms) - TotalMilliseconds


  • lengths      - lengths of input strings, space delimited


  • LD           - the Levenshtein Distance of input strings

Comparison script 164518test.ps1:



Function TestLD 
param ([string]$Similarity = '')

$aux = 0
Write-Progress -Activity "LD info ($Similarity strings)" `
-CurrentOperation 'Start' -PercentComplete $aux
$cmdletTails = 'Orig','Orig+','Try1','Try2',''
foreach ( $cmdletTail in $cmdletTails )
$cmdlet = "Get-LevenshteinDistance$cmdletTail"
$scriptBlock = $LevenshteinDistance = & $cmdlet `
-CompareString $strC -DifferenceString $strD
$TimeSpan = Measure-Command -Expression $scriptBlock
$aux += [int](100 / $cmdletTails.Count)
Write-Progress -Activity "LD info ($Similarity strings)" `
-CurrentOperation $cmdlet -PercentComplete $aux
[PSCustomObject]@
'cmdlet' = $cmdlet
'similarity' = $Similarity
'runtime (ms)' = $($TimeSpan.TotalMilliseconds.
ToString('#####.###0',$cultureinfo).
PadLeft(10))
'lengths' = "$($strC.Length) $($strD.Length)"
'LD' = $LevenshteinDistance





. D:PShellCR164518.ps1 # reload the functions
$cultureinfo = [cultureinfo]::InvariantCulture # my one is different
$strC='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat leo ut turpis viverra lacinia'
$strD=$strC
TestLD -Similarity 'equal'
$strD='Cras efficitur nec orci et posuere. Suspendisse potenti. Quisque blandit auctor purus id facilisis est'
TestLD -Similarity 'stochastic'
$strC='a' * $strC.Length
$strD='a' * $strD.Length
TestLD -Similarity 'similar'
$strD='x' * $strD.Length
TestLD -Similarity 'different'


The main script 164518.ps1 (functions are defined here in the reverse order):



function Get-LevenshteinDistance
param(
[string]$CompareString,
[string]$DifferenceString
)

# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length

# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# If strings are equal the LD is known and we can stop
if ($CompareString -ceq $DifferenceString) return 0

# Build the comparison matrix as a (rectangular) jagged array
# $comparisonMatrix = New-Object 'object[]' ($differenceStringLength+1)
$comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1))
for($index=0; $index -le $differenceStringLength; $index++)
# Create row
# $comparisonMatrix[$index]=New-Object 'object[]' ($compareStringLength+1)
$comparisonMatrix[$index]=[System.Array]::CreateInstance([System.Object], ($compareStringLength+1))
# Populate the first column
$comparisonMatrix[$index][0]=$index

# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0][$index]=$index


# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1] # multiple use => to a variable

# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list (a variable for the only use)
# $differenceCharacter = $differenceString[$rowIndex-1] # the only use

# Calculate the cost
if($compareCharacter -ceq $differenceString[$rowIndex-1])
$comparisonMatrix[$rowIndex][$columnIndex] = $comparisonMatrix[($rowIndex -1)][($columnIndex -1)]
else
# $cost = 1 # unneeded variable for the only use
# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1)][$columnIndex] + 1

# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex][($columnIndex-1)] + 1

# # The cell diagonally above and to the left plus the cost ↓
$cellUpperLeft = $comparisonMatrix[($rowIndex-1)][($columnIndex-1)] + 1

# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
$comparisonMatrix[$rowIndex][$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)



# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength][$compareStringLength]


function Get-LevenshteinDistanceTry2
param(
[string]$CompareString,
[string]$DifferenceString
)

# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length

# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength
# If strings are equal the LD is known and we can stop
if ($CompareString -ceq $DifferenceString) return 0
# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
# $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1),($compareStringLength+1))
# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index

# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index


# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]

# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]

# Calculate the cost
if($compareCharacter -ceq $differenceCharacter)
$comparisonMatrix[$rowIndex,$columnIndex] = $comparisonMatrix[($rowIndex -1),($columnIndex -1)]
else Measure-Object -Minimum).Minimum
$comparisonMatrix[$rowIndex,$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)



# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]


function Get-LevenshteinDistanceTry1
#[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)

# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength

# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index


# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index


# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]

# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]

##Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
##Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
# Calculate the cost
$cost=if($compareCharacter -ceq $differenceCharacter)0else1
##Write-Verbose "Cost: $cost"

# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
##Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"

# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
##Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"

# The cell diagonally above and to the left plus the cost
$cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
##Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"

# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
# $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft

# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]


function Get-LevenshteinDistanceOrig+
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)

# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength

# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index


# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index


# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]

# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum


# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]


function Get-LevenshteinDistanceOrig
[cmdletbinding()]
param(
[string]$CompareString,
[string]$DifferenceString
)

# Collect the string lengths
$compareStringLength = $CompareString.Length
$differenceStringLength = $DifferenceString.Length
Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

# If either of the strings are empty the LD is known and we can stop
if($compareStringLength -eq 0)return $differenceStringLength
if($differenceStringLength -eq 0)return $compareStringLength

# Build the comparison matrix and populate the first column and first row.
$comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

# Populate the first row
for($index=0; $index -le $compareStringLength; $index++)
$comparisonMatrix[0,$index]=$index


# Populate the first column
for($index=0; $index -le $differenceStringLength; $index++)
$comparisonMatrix[$index,0]=$index


# Calculate the Levenshtein distance by working through each position in the matrix.
# Working the columns
for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
# Cycle each character in the list
$compareCharacter = $compareString[$columnIndex-1]

# Working the rows
for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
# Cycle each character in the list
$differenceCharacter = $differenceString[$rowIndex-1]

Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
# Calculate the cost
$cost=if($compareCharacter -ceq $differenceCharacter)0else1
Write-Verbose "Cost: $cost"

# The cell immediately above plus 1
$cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"

# The cell immediately to the left plus 1
$cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"

# The cell diagonally above and to the left plus the cost
$cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"

# Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
$comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft

# The cacluated LD will now be in the bottom right of the matrix.
return $comparisonMatrix[$differenceStringLength,$compareStringLength]



Please note case sensitivity of the above functions. For case insensitive Levenshtein Distance:




  • permanently: change -ceq to -eq in their two occurrences, or


  • ad hoc, in a particular case: use .ToUpper() or .ToLower() functions, e.g. as

Get-LevenshteinDistance -CompareString $strC.ToUpper() -DifferenceString $strD.ToUpper()
# or
Get-LevenshteinDistance -CompareString $strC.ToLower() -DifferenceString $strD.ToLower()




share









$endgroup$













    Your Answer





    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "196"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f164518%2fget-levenshtein-distance%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0












    $begingroup$

    Here is my development series (in parentheses: approx fast ratio compared with the previous, tested using strings cca 100 characters in size, see the comparison table below):




    • Get-LevenshteinDistanceOrig  (= , for comparison): the original function body (Copy&Paste);


    • Get-LevenshteinDistanceOrig+ (> , still insufficient): removed all time-expensive Write-Verbose output;


    • Get-LevenshteinDistanceTry1  (> 50×, a substantial improvement): time-expensive pipeline to Measure-Object replaced by [math]::Min(n,m) static method (see .NET system.math class). The following are only minor advancement steps:


    • Get-LevenshteinDistanceTry2  (~ ..): slight algorithm renovation:

      • instead calculating the cost ($cost variable): if compared characters are equal then computing and comparing values $cellAbove, $cellLeft and $cellUpperLeft is useless as we already know the necessary value;

      • added: if strings are equal the LD is known (zero) and we can stop;



    • Get-LevenshteinDistance      (~ ): PoSH array implementation: instead of two-dimensional comparison matrix used a rectangular jagged array.

    Comparison table: .CR164518test.ps1 | Format-Table -AutoSize



    cmdlet similarity runtime (ms) lengths LD
    ------ ---------- ------------ ------- --
    Get-LevenshteinDistanceOrig equal 6829.1053 106 106 0
    Get-LevenshteinDistanceOrig+ equal 2010.4538 106 106 0
    Get-LevenshteinDistanceTry1 equal 35.8835 106 106 0
    Get-LevenshteinDistanceTry2 equal .1795 106 106 0
    Get-LevenshteinDistance equal .1539 106 106 0
    Get-LevenshteinDistanceOrig stochastic 6556.4117 106 102 79
    Get-LevenshteinDistanceOrig+ stochastic 1932.6051 106 102 79
    Get-LevenshteinDistanceTry1 stochastic 33.9023 106 102 79
    Get-LevenshteinDistanceTry2 stochastic 28.3165 106 102 79
    Get-LevenshteinDistance stochastic 13.4852 106 102 79
    Get-LevenshteinDistanceOrig similar 6640.5884 106 102 4
    Get-LevenshteinDistanceOrig+ similar 2023.8179 106 102 4
    Get-LevenshteinDistanceTry1 similar 31.5843 106 102 4
    Get-LevenshteinDistanceTry2 similar 14.9307 106 102 4
    Get-LevenshteinDistance similar 8.4234 106 102 4
    Get-LevenshteinDistanceOrig different 6613.5267 106 102 106
    Get-LevenshteinDistanceOrig+ different 1943.7630 106 102 106
    Get-LevenshteinDistanceTry1 different 35.1824 106 102 106
    Get-LevenshteinDistanceTry2 different 27.1371 106 102 106
    Get-LevenshteinDistance different 13.8924 106 102 106


    Column explanation:




    • cmdlet       - function name


    • similarity   - in brief "like" similarity of input strings


    • runtime (ms) - TotalMilliseconds


    • lengths      - lengths of input strings, space delimited


    • LD           - the Levenshtein Distance of input strings

    Comparison script 164518test.ps1:



    Function TestLD 
    param ([string]$Similarity = '')

    $aux = 0
    Write-Progress -Activity "LD info ($Similarity strings)" `
    -CurrentOperation 'Start' -PercentComplete $aux
    $cmdletTails = 'Orig','Orig+','Try1','Try2',''
    foreach ( $cmdletTail in $cmdletTails )
    $cmdlet = "Get-LevenshteinDistance$cmdletTail"
    $scriptBlock = $LevenshteinDistance = & $cmdlet `
    -CompareString $strC -DifferenceString $strD
    $TimeSpan = Measure-Command -Expression $scriptBlock
    $aux += [int](100 / $cmdletTails.Count)
    Write-Progress -Activity "LD info ($Similarity strings)" `
    -CurrentOperation $cmdlet -PercentComplete $aux
    [PSCustomObject]@
    'cmdlet' = $cmdlet
    'similarity' = $Similarity
    'runtime (ms)' = $($TimeSpan.TotalMilliseconds.
    ToString('#####.###0',$cultureinfo).
    PadLeft(10))
    'lengths' = "$($strC.Length) $($strD.Length)"
    'LD' = $LevenshteinDistance





    . D:PShellCR164518.ps1 # reload the functions
    $cultureinfo = [cultureinfo]::InvariantCulture # my one is different
    $strC='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat leo ut turpis viverra lacinia'
    $strD=$strC
    TestLD -Similarity 'equal'
    $strD='Cras efficitur nec orci et posuere. Suspendisse potenti. Quisque blandit auctor purus id facilisis est'
    TestLD -Similarity 'stochastic'
    $strC='a' * $strC.Length
    $strD='a' * $strD.Length
    TestLD -Similarity 'similar'
    $strD='x' * $strD.Length
    TestLD -Similarity 'different'


    The main script 164518.ps1 (functions are defined here in the reverse order):



    function Get-LevenshteinDistance
    param(
    [string]$CompareString,
    [string]$DifferenceString
    )

    # Collect the string lengths
    $compareStringLength = $CompareString.Length
    $differenceStringLength = $DifferenceString.Length

    # If either of the strings are empty the LD is known and we can stop
    if($compareStringLength -eq 0)return $differenceStringLength
    if($differenceStringLength -eq 0)return $compareStringLength
    # If strings are equal the LD is known and we can stop
    if ($CompareString -ceq $DifferenceString) return 0

    # Build the comparison matrix as a (rectangular) jagged array
    # $comparisonMatrix = New-Object 'object[]' ($differenceStringLength+1)
    $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1))
    for($index=0; $index -le $differenceStringLength; $index++)
    # Create row
    # $comparisonMatrix[$index]=New-Object 'object[]' ($compareStringLength+1)
    $comparisonMatrix[$index]=[System.Array]::CreateInstance([System.Object], ($compareStringLength+1))
    # Populate the first column
    $comparisonMatrix[$index][0]=$index

    # Populate the first row
    for($index=0; $index -le $compareStringLength; $index++)
    $comparisonMatrix[0][$index]=$index


    # Calculate the Levenshtein distance by working through each position in the matrix.
    # Working the columns
    for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
    # Cycle each character in the list
    $compareCharacter = $compareString[$columnIndex-1] # multiple use => to a variable

    # Working the rows
    for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
    # Cycle each character in the list (a variable for the only use)
    # $differenceCharacter = $differenceString[$rowIndex-1] # the only use

    # Calculate the cost
    if($compareCharacter -ceq $differenceString[$rowIndex-1])
    $comparisonMatrix[$rowIndex][$columnIndex] = $comparisonMatrix[($rowIndex -1)][($columnIndex -1)]
    else
    # $cost = 1 # unneeded variable for the only use
    # The cell immediately above plus 1
    $cellAbove = $comparisonMatrix[($rowIndex-1)][$columnIndex] + 1

    # The cell immediately to the left plus 1
    $cellLeft = $comparisonMatrix[$rowIndex][($columnIndex-1)] + 1

    # # The cell diagonally above and to the left plus the cost ↓
    $cellUpperLeft = $comparisonMatrix[($rowIndex-1)][($columnIndex-1)] + 1

    # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
    $comparisonMatrix[$rowIndex][$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)



    # The cacluated LD will now be in the bottom right of the matrix.
    return $comparisonMatrix[$differenceStringLength][$compareStringLength]


    function Get-LevenshteinDistanceTry2
    param(
    [string]$CompareString,
    [string]$DifferenceString
    )

    # Collect the string lengths
    $compareStringLength = $CompareString.Length
    $differenceStringLength = $DifferenceString.Length

    # If either of the strings are empty the LD is known and we can stop
    if($compareStringLength -eq 0)return $differenceStringLength
    if($differenceStringLength -eq 0)return $compareStringLength
    # If strings are equal the LD is known and we can stop
    if ($CompareString -ceq $DifferenceString) return 0
    # Build the comparison matrix and populate the first column and first row.
    $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
    # $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1),($compareStringLength+1))
    # Populate the first row
    for($index=0; $index -le $compareStringLength; $index++)
    $comparisonMatrix[0,$index]=$index

    # Populate the first column
    for($index=0; $index -le $differenceStringLength; $index++)
    $comparisonMatrix[$index,0]=$index


    # Calculate the Levenshtein distance by working through each position in the matrix.
    # Working the columns
    for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
    # Cycle each character in the list
    $compareCharacter = $compareString[$columnIndex-1]

    # Working the rows
    for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
    # Cycle each character in the list
    $differenceCharacter = $differenceString[$rowIndex-1]

    # Calculate the cost
    if($compareCharacter -ceq $differenceCharacter)
    $comparisonMatrix[$rowIndex,$columnIndex] = $comparisonMatrix[($rowIndex -1),($columnIndex -1)]
    else Measure-Object -Minimum).Minimum
    $comparisonMatrix[$rowIndex,$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)



    # The cacluated LD will now be in the bottom right of the matrix.
    return $comparisonMatrix[$differenceStringLength,$compareStringLength]


    function Get-LevenshteinDistanceTry1
    #[cmdletbinding()]
    param(
    [string]$CompareString,
    [string]$DifferenceString
    )

    # Collect the string lengths
    $compareStringLength = $CompareString.Length
    $differenceStringLength = $DifferenceString.Length
    ##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
    ##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

    # If either of the strings are empty the LD is known and we can stop
    if($compareStringLength -eq 0)return $differenceStringLength
    if($differenceStringLength -eq 0)return $compareStringLength

    # Build the comparison matrix and populate the first column and first row.
    $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

    # Populate the first row
    for($index=0; $index -le $compareStringLength; $index++)
    $comparisonMatrix[0,$index]=$index


    # Populate the first column
    for($index=0; $index -le $differenceStringLength; $index++)
    $comparisonMatrix[$index,0]=$index


    # Calculate the Levenshtein distance by working through each position in the matrix.
    # Working the columns
    for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
    # Cycle each character in the list
    $compareCharacter = $compareString[$columnIndex-1]

    # Working the rows
    for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
    # Cycle each character in the list
    $differenceCharacter = $differenceString[$rowIndex-1]

    ##Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
    ##Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
    # Calculate the cost
    $cost=if($compareCharacter -ceq $differenceCharacter)0else1
    ##Write-Verbose "Cost: $cost"

    # The cell immediately above plus 1
    $cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
    ##Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"

    # The cell immediately to the left plus 1
    $cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
    ##Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"

    # The cell diagonally above and to the left plus the cost
    $cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
    ##Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"

    # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
    # $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft

    # The cacluated LD will now be in the bottom right of the matrix.
    return $comparisonMatrix[$differenceStringLength,$compareStringLength]


    function Get-LevenshteinDistanceOrig+
    [cmdletbinding()]
    param(
    [string]$CompareString,
    [string]$DifferenceString
    )

    # Collect the string lengths
    $compareStringLength = $CompareString.Length
    $differenceStringLength = $DifferenceString.Length
    ##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
    ##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

    # If either of the strings are empty the LD is known and we can stop
    if($compareStringLength -eq 0)return $differenceStringLength
    if($differenceStringLength -eq 0)return $compareStringLength

    # Build the comparison matrix and populate the first column and first row.
    $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

    # Populate the first row
    for($index=0; $index -le $compareStringLength; $index++)
    $comparisonMatrix[0,$index]=$index


    # Populate the first column
    for($index=0; $index -le $differenceStringLength; $index++)
    $comparisonMatrix[$index,0]=$index


    # Calculate the Levenshtein distance by working through each position in the matrix.
    # Working the columns
    for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
    # Cycle each character in the list
    $compareCharacter = $compareString[$columnIndex-1]

    # Working the rows
    for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum


    # The cacluated LD will now be in the bottom right of the matrix.
    return $comparisonMatrix[$differenceStringLength,$compareStringLength]


    function Get-LevenshteinDistanceOrig
    [cmdletbinding()]
    param(
    [string]$CompareString,
    [string]$DifferenceString
    )

    # Collect the string lengths
    $compareStringLength = $CompareString.Length
    $differenceStringLength = $DifferenceString.Length
    Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
    Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

    # If either of the strings are empty the LD is known and we can stop
    if($compareStringLength -eq 0)return $differenceStringLength
    if($differenceStringLength -eq 0)return $compareStringLength

    # Build the comparison matrix and populate the first column and first row.
    $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

    # Populate the first row
    for($index=0; $index -le $compareStringLength; $index++)
    $comparisonMatrix[0,$index]=$index


    # Populate the first column
    for($index=0; $index -le $differenceStringLength; $index++)
    $comparisonMatrix[$index,0]=$index


    # Calculate the Levenshtein distance by working through each position in the matrix.
    # Working the columns
    for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
    # Cycle each character in the list
    $compareCharacter = $compareString[$columnIndex-1]

    # Working the rows
    for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
    # Cycle each character in the list
    $differenceCharacter = $differenceString[$rowIndex-1]

    Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
    Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
    # Calculate the cost
    $cost=if($compareCharacter -ceq $differenceCharacter)0else1
    Write-Verbose "Cost: $cost"

    # The cell immediately above plus 1
    $cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
    Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"

    # The cell immediately to the left plus 1
    $cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
    Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"

    # The cell diagonally above and to the left plus the cost
    $cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
    Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"

    # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
    $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft

    # The cacluated LD will now be in the bottom right of the matrix.
    return $comparisonMatrix[$differenceStringLength,$compareStringLength]



    Please note case sensitivity of the above functions. For case insensitive Levenshtein Distance:




    • permanently: change -ceq to -eq in their two occurrences, or


    • ad hoc, in a particular case: use .ToUpper() or .ToLower() functions, e.g. as

    Get-LevenshteinDistance -CompareString $strC.ToUpper() -DifferenceString $strD.ToUpper()
    # or
    Get-LevenshteinDistance -CompareString $strC.ToLower() -DifferenceString $strD.ToLower()




    share









    $endgroup$

















      0












      $begingroup$

      Here is my development series (in parentheses: approx fast ratio compared with the previous, tested using strings cca 100 characters in size, see the comparison table below):




      • Get-LevenshteinDistanceOrig  (= , for comparison): the original function body (Copy&Paste);


      • Get-LevenshteinDistanceOrig+ (> , still insufficient): removed all time-expensive Write-Verbose output;


      • Get-LevenshteinDistanceTry1  (> 50×, a substantial improvement): time-expensive pipeline to Measure-Object replaced by [math]::Min(n,m) static method (see .NET system.math class). The following are only minor advancement steps:


      • Get-LevenshteinDistanceTry2  (~ ..): slight algorithm renovation:

        • instead calculating the cost ($cost variable): if compared characters are equal then computing and comparing values $cellAbove, $cellLeft and $cellUpperLeft is useless as we already know the necessary value;

        • added: if strings are equal the LD is known (zero) and we can stop;



      • Get-LevenshteinDistance      (~ ): PoSH array implementation: instead of two-dimensional comparison matrix used a rectangular jagged array.

      Comparison table: .CR164518test.ps1 | Format-Table -AutoSize



      cmdlet similarity runtime (ms) lengths LD
      ------ ---------- ------------ ------- --
      Get-LevenshteinDistanceOrig equal 6829.1053 106 106 0
      Get-LevenshteinDistanceOrig+ equal 2010.4538 106 106 0
      Get-LevenshteinDistanceTry1 equal 35.8835 106 106 0
      Get-LevenshteinDistanceTry2 equal .1795 106 106 0
      Get-LevenshteinDistance equal .1539 106 106 0
      Get-LevenshteinDistanceOrig stochastic 6556.4117 106 102 79
      Get-LevenshteinDistanceOrig+ stochastic 1932.6051 106 102 79
      Get-LevenshteinDistanceTry1 stochastic 33.9023 106 102 79
      Get-LevenshteinDistanceTry2 stochastic 28.3165 106 102 79
      Get-LevenshteinDistance stochastic 13.4852 106 102 79
      Get-LevenshteinDistanceOrig similar 6640.5884 106 102 4
      Get-LevenshteinDistanceOrig+ similar 2023.8179 106 102 4
      Get-LevenshteinDistanceTry1 similar 31.5843 106 102 4
      Get-LevenshteinDistanceTry2 similar 14.9307 106 102 4
      Get-LevenshteinDistance similar 8.4234 106 102 4
      Get-LevenshteinDistanceOrig different 6613.5267 106 102 106
      Get-LevenshteinDistanceOrig+ different 1943.7630 106 102 106
      Get-LevenshteinDistanceTry1 different 35.1824 106 102 106
      Get-LevenshteinDistanceTry2 different 27.1371 106 102 106
      Get-LevenshteinDistance different 13.8924 106 102 106


      Column explanation:




      • cmdlet       - function name


      • similarity   - in brief "like" similarity of input strings


      • runtime (ms) - TotalMilliseconds


      • lengths      - lengths of input strings, space delimited


      • LD           - the Levenshtein Distance of input strings

      Comparison script 164518test.ps1:



      Function TestLD 
      param ([string]$Similarity = '')

      $aux = 0
      Write-Progress -Activity "LD info ($Similarity strings)" `
      -CurrentOperation 'Start' -PercentComplete $aux
      $cmdletTails = 'Orig','Orig+','Try1','Try2',''
      foreach ( $cmdletTail in $cmdletTails )
      $cmdlet = "Get-LevenshteinDistance$cmdletTail"
      $scriptBlock = $LevenshteinDistance = & $cmdlet `
      -CompareString $strC -DifferenceString $strD
      $TimeSpan = Measure-Command -Expression $scriptBlock
      $aux += [int](100 / $cmdletTails.Count)
      Write-Progress -Activity "LD info ($Similarity strings)" `
      -CurrentOperation $cmdlet -PercentComplete $aux
      [PSCustomObject]@
      'cmdlet' = $cmdlet
      'similarity' = $Similarity
      'runtime (ms)' = $($TimeSpan.TotalMilliseconds.
      ToString('#####.###0',$cultureinfo).
      PadLeft(10))
      'lengths' = "$($strC.Length) $($strD.Length)"
      'LD' = $LevenshteinDistance





      . D:PShellCR164518.ps1 # reload the functions
      $cultureinfo = [cultureinfo]::InvariantCulture # my one is different
      $strC='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat leo ut turpis viverra lacinia'
      $strD=$strC
      TestLD -Similarity 'equal'
      $strD='Cras efficitur nec orci et posuere. Suspendisse potenti. Quisque blandit auctor purus id facilisis est'
      TestLD -Similarity 'stochastic'
      $strC='a' * $strC.Length
      $strD='a' * $strD.Length
      TestLD -Similarity 'similar'
      $strD='x' * $strD.Length
      TestLD -Similarity 'different'


      The main script 164518.ps1 (functions are defined here in the reverse order):



      function Get-LevenshteinDistance
      param(
      [string]$CompareString,
      [string]$DifferenceString
      )

      # Collect the string lengths
      $compareStringLength = $CompareString.Length
      $differenceStringLength = $DifferenceString.Length

      # If either of the strings are empty the LD is known and we can stop
      if($compareStringLength -eq 0)return $differenceStringLength
      if($differenceStringLength -eq 0)return $compareStringLength
      # If strings are equal the LD is known and we can stop
      if ($CompareString -ceq $DifferenceString) return 0

      # Build the comparison matrix as a (rectangular) jagged array
      # $comparisonMatrix = New-Object 'object[]' ($differenceStringLength+1)
      $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1))
      for($index=0; $index -le $differenceStringLength; $index++)
      # Create row
      # $comparisonMatrix[$index]=New-Object 'object[]' ($compareStringLength+1)
      $comparisonMatrix[$index]=[System.Array]::CreateInstance([System.Object], ($compareStringLength+1))
      # Populate the first column
      $comparisonMatrix[$index][0]=$index

      # Populate the first row
      for($index=0; $index -le $compareStringLength; $index++)
      $comparisonMatrix[0][$index]=$index


      # Calculate the Levenshtein distance by working through each position in the matrix.
      # Working the columns
      for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
      # Cycle each character in the list
      $compareCharacter = $compareString[$columnIndex-1] # multiple use => to a variable

      # Working the rows
      for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
      # Cycle each character in the list (a variable for the only use)
      # $differenceCharacter = $differenceString[$rowIndex-1] # the only use

      # Calculate the cost
      if($compareCharacter -ceq $differenceString[$rowIndex-1])
      $comparisonMatrix[$rowIndex][$columnIndex] = $comparisonMatrix[($rowIndex -1)][($columnIndex -1)]
      else
      # $cost = 1 # unneeded variable for the only use
      # The cell immediately above plus 1
      $cellAbove = $comparisonMatrix[($rowIndex-1)][$columnIndex] + 1

      # The cell immediately to the left plus 1
      $cellLeft = $comparisonMatrix[$rowIndex][($columnIndex-1)] + 1

      # # The cell diagonally above and to the left plus the cost ↓
      $cellUpperLeft = $comparisonMatrix[($rowIndex-1)][($columnIndex-1)] + 1

      # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
      $comparisonMatrix[$rowIndex][$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)



      # The cacluated LD will now be in the bottom right of the matrix.
      return $comparisonMatrix[$differenceStringLength][$compareStringLength]


      function Get-LevenshteinDistanceTry2
      param(
      [string]$CompareString,
      [string]$DifferenceString
      )

      # Collect the string lengths
      $compareStringLength = $CompareString.Length
      $differenceStringLength = $DifferenceString.Length

      # If either of the strings are empty the LD is known and we can stop
      if($compareStringLength -eq 0)return $differenceStringLength
      if($differenceStringLength -eq 0)return $compareStringLength
      # If strings are equal the LD is known and we can stop
      if ($CompareString -ceq $DifferenceString) return 0
      # Build the comparison matrix and populate the first column and first row.
      $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
      # $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1),($compareStringLength+1))
      # Populate the first row
      for($index=0; $index -le $compareStringLength; $index++)
      $comparisonMatrix[0,$index]=$index

      # Populate the first column
      for($index=0; $index -le $differenceStringLength; $index++)
      $comparisonMatrix[$index,0]=$index


      # Calculate the Levenshtein distance by working through each position in the matrix.
      # Working the columns
      for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
      # Cycle each character in the list
      $compareCharacter = $compareString[$columnIndex-1]

      # Working the rows
      for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
      # Cycle each character in the list
      $differenceCharacter = $differenceString[$rowIndex-1]

      # Calculate the cost
      if($compareCharacter -ceq $differenceCharacter)
      $comparisonMatrix[$rowIndex,$columnIndex] = $comparisonMatrix[($rowIndex -1),($columnIndex -1)]
      else Measure-Object -Minimum).Minimum
      $comparisonMatrix[$rowIndex,$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)



      # The cacluated LD will now be in the bottom right of the matrix.
      return $comparisonMatrix[$differenceStringLength,$compareStringLength]


      function Get-LevenshteinDistanceTry1
      #[cmdletbinding()]
      param(
      [string]$CompareString,
      [string]$DifferenceString
      )

      # Collect the string lengths
      $compareStringLength = $CompareString.Length
      $differenceStringLength = $DifferenceString.Length
      ##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
      ##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

      # If either of the strings are empty the LD is known and we can stop
      if($compareStringLength -eq 0)return $differenceStringLength
      if($differenceStringLength -eq 0)return $compareStringLength

      # Build the comparison matrix and populate the first column and first row.
      $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

      # Populate the first row
      for($index=0; $index -le $compareStringLength; $index++)
      $comparisonMatrix[0,$index]=$index


      # Populate the first column
      for($index=0; $index -le $differenceStringLength; $index++)
      $comparisonMatrix[$index,0]=$index


      # Calculate the Levenshtein distance by working through each position in the matrix.
      # Working the columns
      for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
      # Cycle each character in the list
      $compareCharacter = $compareString[$columnIndex-1]

      # Working the rows
      for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
      # Cycle each character in the list
      $differenceCharacter = $differenceString[$rowIndex-1]

      ##Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
      ##Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
      # Calculate the cost
      $cost=if($compareCharacter -ceq $differenceCharacter)0else1
      ##Write-Verbose "Cost: $cost"

      # The cell immediately above plus 1
      $cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
      ##Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"

      # The cell immediately to the left plus 1
      $cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
      ##Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"

      # The cell diagonally above and to the left plus the cost
      $cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
      ##Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"

      # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
      # $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft

      # The cacluated LD will now be in the bottom right of the matrix.
      return $comparisonMatrix[$differenceStringLength,$compareStringLength]


      function Get-LevenshteinDistanceOrig+
      [cmdletbinding()]
      param(
      [string]$CompareString,
      [string]$DifferenceString
      )

      # Collect the string lengths
      $compareStringLength = $CompareString.Length
      $differenceStringLength = $DifferenceString.Length
      ##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
      ##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

      # If either of the strings are empty the LD is known and we can stop
      if($compareStringLength -eq 0)return $differenceStringLength
      if($differenceStringLength -eq 0)return $compareStringLength

      # Build the comparison matrix and populate the first column and first row.
      $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

      # Populate the first row
      for($index=0; $index -le $compareStringLength; $index++)
      $comparisonMatrix[0,$index]=$index


      # Populate the first column
      for($index=0; $index -le $differenceStringLength; $index++)
      $comparisonMatrix[$index,0]=$index


      # Calculate the Levenshtein distance by working through each position in the matrix.
      # Working the columns
      for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
      # Cycle each character in the list
      $compareCharacter = $compareString[$columnIndex-1]

      # Working the rows
      for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum


      # The cacluated LD will now be in the bottom right of the matrix.
      return $comparisonMatrix[$differenceStringLength,$compareStringLength]


      function Get-LevenshteinDistanceOrig
      [cmdletbinding()]
      param(
      [string]$CompareString,
      [string]$DifferenceString
      )

      # Collect the string lengths
      $compareStringLength = $CompareString.Length
      $differenceStringLength = $DifferenceString.Length
      Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
      Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

      # If either of the strings are empty the LD is known and we can stop
      if($compareStringLength -eq 0)return $differenceStringLength
      if($differenceStringLength -eq 0)return $compareStringLength

      # Build the comparison matrix and populate the first column and first row.
      $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

      # Populate the first row
      for($index=0; $index -le $compareStringLength; $index++)
      $comparisonMatrix[0,$index]=$index


      # Populate the first column
      for($index=0; $index -le $differenceStringLength; $index++)
      $comparisonMatrix[$index,0]=$index


      # Calculate the Levenshtein distance by working through each position in the matrix.
      # Working the columns
      for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
      # Cycle each character in the list
      $compareCharacter = $compareString[$columnIndex-1]

      # Working the rows
      for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
      # Cycle each character in the list
      $differenceCharacter = $differenceString[$rowIndex-1]

      Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
      Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
      # Calculate the cost
      $cost=if($compareCharacter -ceq $differenceCharacter)0else1
      Write-Verbose "Cost: $cost"

      # The cell immediately above plus 1
      $cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
      Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"

      # The cell immediately to the left plus 1
      $cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
      Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"

      # The cell diagonally above and to the left plus the cost
      $cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
      Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"

      # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
      $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft

      # The cacluated LD will now be in the bottom right of the matrix.
      return $comparisonMatrix[$differenceStringLength,$compareStringLength]



      Please note case sensitivity of the above functions. For case insensitive Levenshtein Distance:




      • permanently: change -ceq to -eq in their two occurrences, or


      • ad hoc, in a particular case: use .ToUpper() or .ToLower() functions, e.g. as

      Get-LevenshteinDistance -CompareString $strC.ToUpper() -DifferenceString $strD.ToUpper()
      # or
      Get-LevenshteinDistance -CompareString $strC.ToLower() -DifferenceString $strD.ToLower()




      share









      $endgroup$















        0












        0








        0





        $begingroup$

        Here is my development series (in parentheses: approx fast ratio compared with the previous, tested using strings cca 100 characters in size, see the comparison table below):




        • Get-LevenshteinDistanceOrig  (= , for comparison): the original function body (Copy&Paste);


        • Get-LevenshteinDistanceOrig+ (> , still insufficient): removed all time-expensive Write-Verbose output;


        • Get-LevenshteinDistanceTry1  (> 50×, a substantial improvement): time-expensive pipeline to Measure-Object replaced by [math]::Min(n,m) static method (see .NET system.math class). The following are only minor advancement steps:


        • Get-LevenshteinDistanceTry2  (~ ..): slight algorithm renovation:

          • instead calculating the cost ($cost variable): if compared characters are equal then computing and comparing values $cellAbove, $cellLeft and $cellUpperLeft is useless as we already know the necessary value;

          • added: if strings are equal the LD is known (zero) and we can stop;



        • Get-LevenshteinDistance      (~ ): PoSH array implementation: instead of two-dimensional comparison matrix used a rectangular jagged array.

        Comparison table: .CR164518test.ps1 | Format-Table -AutoSize



        cmdlet similarity runtime (ms) lengths LD
        ------ ---------- ------------ ------- --
        Get-LevenshteinDistanceOrig equal 6829.1053 106 106 0
        Get-LevenshteinDistanceOrig+ equal 2010.4538 106 106 0
        Get-LevenshteinDistanceTry1 equal 35.8835 106 106 0
        Get-LevenshteinDistanceTry2 equal .1795 106 106 0
        Get-LevenshteinDistance equal .1539 106 106 0
        Get-LevenshteinDistanceOrig stochastic 6556.4117 106 102 79
        Get-LevenshteinDistanceOrig+ stochastic 1932.6051 106 102 79
        Get-LevenshteinDistanceTry1 stochastic 33.9023 106 102 79
        Get-LevenshteinDistanceTry2 stochastic 28.3165 106 102 79
        Get-LevenshteinDistance stochastic 13.4852 106 102 79
        Get-LevenshteinDistanceOrig similar 6640.5884 106 102 4
        Get-LevenshteinDistanceOrig+ similar 2023.8179 106 102 4
        Get-LevenshteinDistanceTry1 similar 31.5843 106 102 4
        Get-LevenshteinDistanceTry2 similar 14.9307 106 102 4
        Get-LevenshteinDistance similar 8.4234 106 102 4
        Get-LevenshteinDistanceOrig different 6613.5267 106 102 106
        Get-LevenshteinDistanceOrig+ different 1943.7630 106 102 106
        Get-LevenshteinDistanceTry1 different 35.1824 106 102 106
        Get-LevenshteinDistanceTry2 different 27.1371 106 102 106
        Get-LevenshteinDistance different 13.8924 106 102 106


        Column explanation:




        • cmdlet       - function name


        • similarity   - in brief "like" similarity of input strings


        • runtime (ms) - TotalMilliseconds


        • lengths      - lengths of input strings, space delimited


        • LD           - the Levenshtein Distance of input strings

        Comparison script 164518test.ps1:



        Function TestLD 
        param ([string]$Similarity = '')

        $aux = 0
        Write-Progress -Activity "LD info ($Similarity strings)" `
        -CurrentOperation 'Start' -PercentComplete $aux
        $cmdletTails = 'Orig','Orig+','Try1','Try2',''
        foreach ( $cmdletTail in $cmdletTails )
        $cmdlet = "Get-LevenshteinDistance$cmdletTail"
        $scriptBlock = $LevenshteinDistance = & $cmdlet `
        -CompareString $strC -DifferenceString $strD
        $TimeSpan = Measure-Command -Expression $scriptBlock
        $aux += [int](100 / $cmdletTails.Count)
        Write-Progress -Activity "LD info ($Similarity strings)" `
        -CurrentOperation $cmdlet -PercentComplete $aux
        [PSCustomObject]@
        'cmdlet' = $cmdlet
        'similarity' = $Similarity
        'runtime (ms)' = $($TimeSpan.TotalMilliseconds.
        ToString('#####.###0',$cultureinfo).
        PadLeft(10))
        'lengths' = "$($strC.Length) $($strD.Length)"
        'LD' = $LevenshteinDistance





        . D:PShellCR164518.ps1 # reload the functions
        $cultureinfo = [cultureinfo]::InvariantCulture # my one is different
        $strC='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat leo ut turpis viverra lacinia'
        $strD=$strC
        TestLD -Similarity 'equal'
        $strD='Cras efficitur nec orci et posuere. Suspendisse potenti. Quisque blandit auctor purus id facilisis est'
        TestLD -Similarity 'stochastic'
        $strC='a' * $strC.Length
        $strD='a' * $strD.Length
        TestLD -Similarity 'similar'
        $strD='x' * $strD.Length
        TestLD -Similarity 'different'


        The main script 164518.ps1 (functions are defined here in the reverse order):



        function Get-LevenshteinDistance
        param(
        [string]$CompareString,
        [string]$DifferenceString
        )

        # Collect the string lengths
        $compareStringLength = $CompareString.Length
        $differenceStringLength = $DifferenceString.Length

        # If either of the strings are empty the LD is known and we can stop
        if($compareStringLength -eq 0)return $differenceStringLength
        if($differenceStringLength -eq 0)return $compareStringLength
        # If strings are equal the LD is known and we can stop
        if ($CompareString -ceq $DifferenceString) return 0

        # Build the comparison matrix as a (rectangular) jagged array
        # $comparisonMatrix = New-Object 'object[]' ($differenceStringLength+1)
        $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1))
        for($index=0; $index -le $differenceStringLength; $index++)
        # Create row
        # $comparisonMatrix[$index]=New-Object 'object[]' ($compareStringLength+1)
        $comparisonMatrix[$index]=[System.Array]::CreateInstance([System.Object], ($compareStringLength+1))
        # Populate the first column
        $comparisonMatrix[$index][0]=$index

        # Populate the first row
        for($index=0; $index -le $compareStringLength; $index++)
        $comparisonMatrix[0][$index]=$index


        # Calculate the Levenshtein distance by working through each position in the matrix.
        # Working the columns
        for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
        # Cycle each character in the list
        $compareCharacter = $compareString[$columnIndex-1] # multiple use => to a variable

        # Working the rows
        for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
        # Cycle each character in the list (a variable for the only use)
        # $differenceCharacter = $differenceString[$rowIndex-1] # the only use

        # Calculate the cost
        if($compareCharacter -ceq $differenceString[$rowIndex-1])
        $comparisonMatrix[$rowIndex][$columnIndex] = $comparisonMatrix[($rowIndex -1)][($columnIndex -1)]
        else
        # $cost = 1 # unneeded variable for the only use
        # The cell immediately above plus 1
        $cellAbove = $comparisonMatrix[($rowIndex-1)][$columnIndex] + 1

        # The cell immediately to the left plus 1
        $cellLeft = $comparisonMatrix[$rowIndex][($columnIndex-1)] + 1

        # # The cell diagonally above and to the left plus the cost ↓
        $cellUpperLeft = $comparisonMatrix[($rowIndex-1)][($columnIndex-1)] + 1

        # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
        $comparisonMatrix[$rowIndex][$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)



        # The cacluated LD will now be in the bottom right of the matrix.
        return $comparisonMatrix[$differenceStringLength][$compareStringLength]


        function Get-LevenshteinDistanceTry2
        param(
        [string]$CompareString,
        [string]$DifferenceString
        )

        # Collect the string lengths
        $compareStringLength = $CompareString.Length
        $differenceStringLength = $DifferenceString.Length

        # If either of the strings are empty the LD is known and we can stop
        if($compareStringLength -eq 0)return $differenceStringLength
        if($differenceStringLength -eq 0)return $compareStringLength
        # If strings are equal the LD is known and we can stop
        if ($CompareString -ceq $DifferenceString) return 0
        # Build the comparison matrix and populate the first column and first row.
        $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
        # $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1),($compareStringLength+1))
        # Populate the first row
        for($index=0; $index -le $compareStringLength; $index++)
        $comparisonMatrix[0,$index]=$index

        # Populate the first column
        for($index=0; $index -le $differenceStringLength; $index++)
        $comparisonMatrix[$index,0]=$index


        # Calculate the Levenshtein distance by working through each position in the matrix.
        # Working the columns
        for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
        # Cycle each character in the list
        $compareCharacter = $compareString[$columnIndex-1]

        # Working the rows
        for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
        # Cycle each character in the list
        $differenceCharacter = $differenceString[$rowIndex-1]

        # Calculate the cost
        if($compareCharacter -ceq $differenceCharacter)
        $comparisonMatrix[$rowIndex,$columnIndex] = $comparisonMatrix[($rowIndex -1),($columnIndex -1)]
        else Measure-Object -Minimum).Minimum
        $comparisonMatrix[$rowIndex,$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)



        # The cacluated LD will now be in the bottom right of the matrix.
        return $comparisonMatrix[$differenceStringLength,$compareStringLength]


        function Get-LevenshteinDistanceTry1
        #[cmdletbinding()]
        param(
        [string]$CompareString,
        [string]$DifferenceString
        )

        # Collect the string lengths
        $compareStringLength = $CompareString.Length
        $differenceStringLength = $DifferenceString.Length
        ##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
        ##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

        # If either of the strings are empty the LD is known and we can stop
        if($compareStringLength -eq 0)return $differenceStringLength
        if($differenceStringLength -eq 0)return $compareStringLength

        # Build the comparison matrix and populate the first column and first row.
        $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

        # Populate the first row
        for($index=0; $index -le $compareStringLength; $index++)
        $comparisonMatrix[0,$index]=$index


        # Populate the first column
        for($index=0; $index -le $differenceStringLength; $index++)
        $comparisonMatrix[$index,0]=$index


        # Calculate the Levenshtein distance by working through each position in the matrix.
        # Working the columns
        for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
        # Cycle each character in the list
        $compareCharacter = $compareString[$columnIndex-1]

        # Working the rows
        for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
        # Cycle each character in the list
        $differenceCharacter = $differenceString[$rowIndex-1]

        ##Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
        ##Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
        # Calculate the cost
        $cost=if($compareCharacter -ceq $differenceCharacter)0else1
        ##Write-Verbose "Cost: $cost"

        # The cell immediately above plus 1
        $cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
        ##Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"

        # The cell immediately to the left plus 1
        $cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
        ##Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"

        # The cell diagonally above and to the left plus the cost
        $cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
        ##Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"

        # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
        # $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft

        # The cacluated LD will now be in the bottom right of the matrix.
        return $comparisonMatrix[$differenceStringLength,$compareStringLength]


        function Get-LevenshteinDistanceOrig+
        [cmdletbinding()]
        param(
        [string]$CompareString,
        [string]$DifferenceString
        )

        # Collect the string lengths
        $compareStringLength = $CompareString.Length
        $differenceStringLength = $DifferenceString.Length
        ##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
        ##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

        # If either of the strings are empty the LD is known and we can stop
        if($compareStringLength -eq 0)return $differenceStringLength
        if($differenceStringLength -eq 0)return $compareStringLength

        # Build the comparison matrix and populate the first column and first row.
        $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

        # Populate the first row
        for($index=0; $index -le $compareStringLength; $index++)
        $comparisonMatrix[0,$index]=$index


        # Populate the first column
        for($index=0; $index -le $differenceStringLength; $index++)
        $comparisonMatrix[$index,0]=$index


        # Calculate the Levenshtein distance by working through each position in the matrix.
        # Working the columns
        for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
        # Cycle each character in the list
        $compareCharacter = $compareString[$columnIndex-1]

        # Working the rows
        for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum


        # The cacluated LD will now be in the bottom right of the matrix.
        return $comparisonMatrix[$differenceStringLength,$compareStringLength]


        function Get-LevenshteinDistanceOrig
        [cmdletbinding()]
        param(
        [string]$CompareString,
        [string]$DifferenceString
        )

        # Collect the string lengths
        $compareStringLength = $CompareString.Length
        $differenceStringLength = $DifferenceString.Length
        Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
        Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

        # If either of the strings are empty the LD is known and we can stop
        if($compareStringLength -eq 0)return $differenceStringLength
        if($differenceStringLength -eq 0)return $compareStringLength

        # Build the comparison matrix and populate the first column and first row.
        $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

        # Populate the first row
        for($index=0; $index -le $compareStringLength; $index++)
        $comparisonMatrix[0,$index]=$index


        # Populate the first column
        for($index=0; $index -le $differenceStringLength; $index++)
        $comparisonMatrix[$index,0]=$index


        # Calculate the Levenshtein distance by working through each position in the matrix.
        # Working the columns
        for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
        # Cycle each character in the list
        $compareCharacter = $compareString[$columnIndex-1]

        # Working the rows
        for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
        # Cycle each character in the list
        $differenceCharacter = $differenceString[$rowIndex-1]

        Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
        Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
        # Calculate the cost
        $cost=if($compareCharacter -ceq $differenceCharacter)0else1
        Write-Verbose "Cost: $cost"

        # The cell immediately above plus 1
        $cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
        Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"

        # The cell immediately to the left plus 1
        $cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
        Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"

        # The cell diagonally above and to the left plus the cost
        $cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
        Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"

        # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
        $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft

        # The cacluated LD will now be in the bottom right of the matrix.
        return $comparisonMatrix[$differenceStringLength,$compareStringLength]



        Please note case sensitivity of the above functions. For case insensitive Levenshtein Distance:




        • permanently: change -ceq to -eq in their two occurrences, or


        • ad hoc, in a particular case: use .ToUpper() or .ToLower() functions, e.g. as

        Get-LevenshteinDistance -CompareString $strC.ToUpper() -DifferenceString $strD.ToUpper()
        # or
        Get-LevenshteinDistance -CompareString $strC.ToLower() -DifferenceString $strD.ToLower()




        share









        $endgroup$



        Here is my development series (in parentheses: approx fast ratio compared with the previous, tested using strings cca 100 characters in size, see the comparison table below):




        • Get-LevenshteinDistanceOrig  (= , for comparison): the original function body (Copy&Paste);


        • Get-LevenshteinDistanceOrig+ (> , still insufficient): removed all time-expensive Write-Verbose output;


        • Get-LevenshteinDistanceTry1  (> 50×, a substantial improvement): time-expensive pipeline to Measure-Object replaced by [math]::Min(n,m) static method (see .NET system.math class). The following are only minor advancement steps:


        • Get-LevenshteinDistanceTry2  (~ ..): slight algorithm renovation:

          • instead calculating the cost ($cost variable): if compared characters are equal then computing and comparing values $cellAbove, $cellLeft and $cellUpperLeft is useless as we already know the necessary value;

          • added: if strings are equal the LD is known (zero) and we can stop;



        • Get-LevenshteinDistance      (~ ): PoSH array implementation: instead of two-dimensional comparison matrix used a rectangular jagged array.

        Comparison table: .CR164518test.ps1 | Format-Table -AutoSize



        cmdlet similarity runtime (ms) lengths LD
        ------ ---------- ------------ ------- --
        Get-LevenshteinDistanceOrig equal 6829.1053 106 106 0
        Get-LevenshteinDistanceOrig+ equal 2010.4538 106 106 0
        Get-LevenshteinDistanceTry1 equal 35.8835 106 106 0
        Get-LevenshteinDistanceTry2 equal .1795 106 106 0
        Get-LevenshteinDistance equal .1539 106 106 0
        Get-LevenshteinDistanceOrig stochastic 6556.4117 106 102 79
        Get-LevenshteinDistanceOrig+ stochastic 1932.6051 106 102 79
        Get-LevenshteinDistanceTry1 stochastic 33.9023 106 102 79
        Get-LevenshteinDistanceTry2 stochastic 28.3165 106 102 79
        Get-LevenshteinDistance stochastic 13.4852 106 102 79
        Get-LevenshteinDistanceOrig similar 6640.5884 106 102 4
        Get-LevenshteinDistanceOrig+ similar 2023.8179 106 102 4
        Get-LevenshteinDistanceTry1 similar 31.5843 106 102 4
        Get-LevenshteinDistanceTry2 similar 14.9307 106 102 4
        Get-LevenshteinDistance similar 8.4234 106 102 4
        Get-LevenshteinDistanceOrig different 6613.5267 106 102 106
        Get-LevenshteinDistanceOrig+ different 1943.7630 106 102 106
        Get-LevenshteinDistanceTry1 different 35.1824 106 102 106
        Get-LevenshteinDistanceTry2 different 27.1371 106 102 106
        Get-LevenshteinDistance different 13.8924 106 102 106


        Column explanation:




        • cmdlet       - function name


        • similarity   - in brief "like" similarity of input strings


        • runtime (ms) - TotalMilliseconds


        • lengths      - lengths of input strings, space delimited


        • LD           - the Levenshtein Distance of input strings

        Comparison script 164518test.ps1:



        Function TestLD 
        param ([string]$Similarity = '')

        $aux = 0
        Write-Progress -Activity "LD info ($Similarity strings)" `
        -CurrentOperation 'Start' -PercentComplete $aux
        $cmdletTails = 'Orig','Orig+','Try1','Try2',''
        foreach ( $cmdletTail in $cmdletTails )
        $cmdlet = "Get-LevenshteinDistance$cmdletTail"
        $scriptBlock = $LevenshteinDistance = & $cmdlet `
        -CompareString $strC -DifferenceString $strD
        $TimeSpan = Measure-Command -Expression $scriptBlock
        $aux += [int](100 / $cmdletTails.Count)
        Write-Progress -Activity "LD info ($Similarity strings)" `
        -CurrentOperation $cmdlet -PercentComplete $aux
        [PSCustomObject]@
        'cmdlet' = $cmdlet
        'similarity' = $Similarity
        'runtime (ms)' = $($TimeSpan.TotalMilliseconds.
        ToString('#####.###0',$cultureinfo).
        PadLeft(10))
        'lengths' = "$($strC.Length) $($strD.Length)"
        'LD' = $LevenshteinDistance





        . D:PShellCR164518.ps1 # reload the functions
        $cultureinfo = [cultureinfo]::InvariantCulture # my one is different
        $strC='Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat leo ut turpis viverra lacinia'
        $strD=$strC
        TestLD -Similarity 'equal'
        $strD='Cras efficitur nec orci et posuere. Suspendisse potenti. Quisque blandit auctor purus id facilisis est'
        TestLD -Similarity 'stochastic'
        $strC='a' * $strC.Length
        $strD='a' * $strD.Length
        TestLD -Similarity 'similar'
        $strD='x' * $strD.Length
        TestLD -Similarity 'different'


        The main script 164518.ps1 (functions are defined here in the reverse order):



        function Get-LevenshteinDistance
        param(
        [string]$CompareString,
        [string]$DifferenceString
        )

        # Collect the string lengths
        $compareStringLength = $CompareString.Length
        $differenceStringLength = $DifferenceString.Length

        # If either of the strings are empty the LD is known and we can stop
        if($compareStringLength -eq 0)return $differenceStringLength
        if($differenceStringLength -eq 0)return $compareStringLength
        # If strings are equal the LD is known and we can stop
        if ($CompareString -ceq $DifferenceString) return 0

        # Build the comparison matrix as a (rectangular) jagged array
        # $comparisonMatrix = New-Object 'object[]' ($differenceStringLength+1)
        $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1))
        for($index=0; $index -le $differenceStringLength; $index++)
        # Create row
        # $comparisonMatrix[$index]=New-Object 'object[]' ($compareStringLength+1)
        $comparisonMatrix[$index]=[System.Array]::CreateInstance([System.Object], ($compareStringLength+1))
        # Populate the first column
        $comparisonMatrix[$index][0]=$index

        # Populate the first row
        for($index=0; $index -le $compareStringLength; $index++)
        $comparisonMatrix[0][$index]=$index


        # Calculate the Levenshtein distance by working through each position in the matrix.
        # Working the columns
        for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
        # Cycle each character in the list
        $compareCharacter = $compareString[$columnIndex-1] # multiple use => to a variable

        # Working the rows
        for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
        # Cycle each character in the list (a variable for the only use)
        # $differenceCharacter = $differenceString[$rowIndex-1] # the only use

        # Calculate the cost
        if($compareCharacter -ceq $differenceString[$rowIndex-1])
        $comparisonMatrix[$rowIndex][$columnIndex] = $comparisonMatrix[($rowIndex -1)][($columnIndex -1)]
        else
        # $cost = 1 # unneeded variable for the only use
        # The cell immediately above plus 1
        $cellAbove = $comparisonMatrix[($rowIndex-1)][$columnIndex] + 1

        # The cell immediately to the left plus 1
        $cellLeft = $comparisonMatrix[$rowIndex][($columnIndex-1)] + 1

        # # The cell diagonally above and to the left plus the cost ↓
        $cellUpperLeft = $comparisonMatrix[($rowIndex-1)][($columnIndex-1)] + 1

        # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
        $comparisonMatrix[$rowIndex][$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)



        # The cacluated LD will now be in the bottom right of the matrix.
        return $comparisonMatrix[$differenceStringLength][$compareStringLength]


        function Get-LevenshteinDistanceTry2
        param(
        [string]$CompareString,
        [string]$DifferenceString
        )

        # Collect the string lengths
        $compareStringLength = $CompareString.Length
        $differenceStringLength = $DifferenceString.Length

        # If either of the strings are empty the LD is known and we can stop
        if($compareStringLength -eq 0)return $differenceStringLength
        if($differenceStringLength -eq 0)return $compareStringLength
        # If strings are equal the LD is known and we can stop
        if ($CompareString -ceq $DifferenceString) return 0
        # Build the comparison matrix and populate the first column and first row.
        $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)
        # $comparisonMatrix = [System.Array]::CreateInstance([System.Object], ($differenceStringLength+1),($compareStringLength+1))
        # Populate the first row
        for($index=0; $index -le $compareStringLength; $index++)
        $comparisonMatrix[0,$index]=$index

        # Populate the first column
        for($index=0; $index -le $differenceStringLength; $index++)
        $comparisonMatrix[$index,0]=$index


        # Calculate the Levenshtein distance by working through each position in the matrix.
        # Working the columns
        for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
        # Cycle each character in the list
        $compareCharacter = $compareString[$columnIndex-1]

        # Working the rows
        for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
        # Cycle each character in the list
        $differenceCharacter = $differenceString[$rowIndex-1]

        # Calculate the cost
        if($compareCharacter -ceq $differenceCharacter)
        $comparisonMatrix[$rowIndex,$columnIndex] = $comparisonMatrix[($rowIndex -1),($columnIndex -1)]
        else Measure-Object -Minimum).Minimum
        $comparisonMatrix[$rowIndex,$columnIndex] = [math]::Min([math]::Min($cellAbove,$cellLeft),$cellUpperLeft)



        # The cacluated LD will now be in the bottom right of the matrix.
        return $comparisonMatrix[$differenceStringLength,$compareStringLength]


        function Get-LevenshteinDistanceTry1
        #[cmdletbinding()]
        param(
        [string]$CompareString,
        [string]$DifferenceString
        )

        # Collect the string lengths
        $compareStringLength = $CompareString.Length
        $differenceStringLength = $DifferenceString.Length
        ##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
        ##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

        # If either of the strings are empty the LD is known and we can stop
        if($compareStringLength -eq 0)return $differenceStringLength
        if($differenceStringLength -eq 0)return $compareStringLength

        # Build the comparison matrix and populate the first column and first row.
        $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

        # Populate the first row
        for($index=0; $index -le $compareStringLength; $index++)
        $comparisonMatrix[0,$index]=$index


        # Populate the first column
        for($index=0; $index -le $differenceStringLength; $index++)
        $comparisonMatrix[$index,0]=$index


        # Calculate the Levenshtein distance by working through each position in the matrix.
        # Working the columns
        for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
        # Cycle each character in the list
        $compareCharacter = $compareString[$columnIndex-1]

        # Working the rows
        for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
        # Cycle each character in the list
        $differenceCharacter = $differenceString[$rowIndex-1]

        ##Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
        ##Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
        # Calculate the cost
        $cost=if($compareCharacter -ceq $differenceCharacter)0else1
        ##Write-Verbose "Cost: $cost"

        # The cell immediately above plus 1
        $cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
        ##Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"

        # The cell immediately to the left plus 1
        $cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
        ##Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"

        # The cell diagonally above and to the left plus the cost
        $cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
        ##Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"

        # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
        # $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft

        # The cacluated LD will now be in the bottom right of the matrix.
        return $comparisonMatrix[$differenceStringLength,$compareStringLength]


        function Get-LevenshteinDistanceOrig+
        [cmdletbinding()]
        param(
        [string]$CompareString,
        [string]$DifferenceString
        )

        # Collect the string lengths
        $compareStringLength = $CompareString.Length
        $differenceStringLength = $DifferenceString.Length
        ##Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
        ##Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

        # If either of the strings are empty the LD is known and we can stop
        if($compareStringLength -eq 0)return $differenceStringLength
        if($differenceStringLength -eq 0)return $compareStringLength

        # Build the comparison matrix and populate the first column and first row.
        $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

        # Populate the first row
        for($index=0; $index -le $compareStringLength; $index++)
        $comparisonMatrix[0,$index]=$index


        # Populate the first column
        for($index=0; $index -le $differenceStringLength; $index++)
        $comparisonMatrix[$index,0]=$index


        # Calculate the Levenshtein distance by working through each position in the matrix.
        # Working the columns
        for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
        # Cycle each character in the list
        $compareCharacter = $compareString[$columnIndex-1]

        # Working the rows
        for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++) select -ExpandProperty Minimum


        # The cacluated LD will now be in the bottom right of the matrix.
        return $comparisonMatrix[$differenceStringLength,$compareStringLength]


        function Get-LevenshteinDistanceOrig
        [cmdletbinding()]
        param(
        [string]$CompareString,
        [string]$DifferenceString
        )

        # Collect the string lengths
        $compareStringLength = $CompareString.Length
        $differenceStringLength = $DifferenceString.Length
        Write-Verbose "Comparision String: '$CompareString' with length '$compareStringLength'"
        Write-Verbose "Difference String: '$DifferenceString' with length '$differenceStringLength'"

        # If either of the strings are empty the LD is known and we can stop
        if($compareStringLength -eq 0)return $differenceStringLength
        if($differenceStringLength -eq 0)return $compareStringLength

        # Build the comparison matrix and populate the first column and first row.
        $comparisonMatrix = New-Object 'object[,]' ($differenceStringLength+1),($compareStringLength+1)

        # Populate the first row
        for($index=0; $index -le $compareStringLength; $index++)
        $comparisonMatrix[0,$index]=$index


        # Populate the first column
        for($index=0; $index -le $differenceStringLength; $index++)
        $comparisonMatrix[$index,0]=$index


        # Calculate the Levenshtein distance by working through each position in the matrix.
        # Working the columns
        for($columnIndex=1; $columnIndex -le $compareStringLength; $columnIndex++)
        # Cycle each character in the list
        $compareCharacter = $compareString[$columnIndex-1]

        # Working the rows
        for($rowIndex=1; $rowIndex -le $differenceStringLength; $rowIndex++)
        # Cycle each character in the list
        $differenceCharacter = $differenceString[$rowIndex-1]

        Write-Verbose "Matrix location: [$rowIndex, $columnIndex]"
        Write-Verbose "Compare character: $compareCharacter - Difference character: $differenceCharacter"
        # Calculate the cost
        $cost=if($compareCharacter -ceq $differenceCharacter)0else1
        Write-Verbose "Cost: $cost"

        # The cell immediately above plus 1
        $cellAbove = $comparisonMatrix[($rowIndex-1), $columnIndex] + 1
        Write-Verbose "Cell Above: [$($rowIndex-1), $columnIndex] + 1 = $cellAbove"

        # The cell immediately to the left plus 1
        $cellLeft = $comparisonMatrix[$rowIndex,($columnIndex-1)] + 1
        Write-Verbose "Cell Left: [$rowIndex,$($columnIndex-1)] + 1 = $cellLeft"

        # The cell diagonally above and to the left plus the cost
        $cellUpperLeft = $comparisonMatrix[($rowIndex-1),($columnIndex-1)] + $cost
        Write-Verbose "Cell Upper Left: [$($rowIndex-1),$($columnIndex-1)] + cost($cost) = $cellUpperLeft"

        # Select minumum of the of the last 3 cells calculations and assign it to the current matrix position.
        $comparisonMatrix[$rowIndex,$columnIndex] = $cellAbove,$cellLeft,$cellUpperLeft

        # The cacluated LD will now be in the bottom right of the matrix.
        return $comparisonMatrix[$differenceStringLength,$compareStringLength]



        Please note case sensitivity of the above functions. For case insensitive Levenshtein Distance:




        • permanently: change -ceq to -eq in their two occurrences, or


        • ad hoc, in a particular case: use .ToUpper() or .ToLower() functions, e.g. as

        Get-LevenshteinDistance -CompareString $strC.ToUpper() -DifferenceString $strD.ToUpper()
        # or
        Get-LevenshteinDistance -CompareString $strC.ToLower() -DifferenceString $strD.ToLower()





        share











        share


        share










        answered 7 mins ago









        JosefZJosefZ

        251210




        251210



























            draft saved

            draft discarded
















































            Thanks for contributing an answer to Code Review Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f164518%2fget-levenshtein-distance%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            名間水力發電廠 目录 沿革 設施 鄰近設施 註釋 外部連結 导航菜单23°50′10″N 120°42′41″E / 23.83611°N 120.71139°E / 23.83611; 120.7113923°50′10″N 120°42′41″E / 23.83611°N 120.71139°E / 23.83611; 120.71139計畫概要原始内容臺灣第一座BOT 模式開發的水力發電廠-名間水力電廠名間水力發電廠 水利署首件BOT案原始内容《小檔案》名間電廠 首座BOT水力發電廠原始内容名間電廠BOT - 經濟部水利署中區水資源局

            格濟夫卡 參考資料 导航菜单51°3′40″N 34°2′21″E / 51.06111°N 34.03917°E / 51.06111; 34.03917ГезівкаПогода в селі 编辑或修订