`
xitong
  • 浏览: 6188711 次
文章分类
社区版块
存档分类
最新评论

字符串编辑距离(Edit Distance)分析和源代码

 
阅读更多

1. 字符串编辑距离定义

定义:字符串编辑距离指的是字符串s1变为s2最少需要多少次替换,插入,删除操作。

Ed(s1,s2) = minimum # of operations (insertion, deletion,substitution) to change s1 to s2
Example:

s1: Tom Hanks

s2: Ton Hank

ed(s1, s2) = 2


2. 字符串编辑距离求解

动态规划求解,时间复杂度O(mn)

A = a1 a2...am
B = b1 b2...bn
Di,j = edit distance of a1 a2 ... ai and b1 b2... bj.

Di,0 = i, 0<=i <= m

D0,j = j, 0 <= j<= n

Di,j = min{Di-1, j + 1,

Di, j-1 + 1,

Di-1, j-1 + ti, j}, 1 <= i <=m, 1 <=j <= n

where ti, j = 0 if ai = bj and ti, j = 1 if ai != bj.

2. 源程序

int getMin(int a, int b, int c)
{
	int min1 = (a<=b)?a:b;
	int min2 =(min1<=c)?min1:c;
	return min2;
}

int getED(const char*str1, const char*str2) {
	int m = strlen(str1);
	int n = strlen(str2);
	int **dis = new int*[m+1];
	for(int i=0; i<m+1; i++) {
		dis[i] = new int[n+1];
	}
	for(int i=0; i<=m; i++) {
		dis[i][0]=i;
	}
	for(int j=0; j<=n; j++) {
		dis[0][j]=j;
	}
	for(int i=1; i<=m; i++) {
		for(int j=1; j<=n; j++) {	
			int tij=(str1[i-1]==str2[j-1])?0:1;		
			dis[i][j] = getMin(dis[i-1][j]+1, dis[i][j-1]+1, dis[i-1][j-1]+tij);
		}
	}
	return dis[m][n];
}



分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics