by José Abreu, Juan Rico-Juan

Abstract:

This paper presents a new fast algorithm for computing an approximation to the median of two character strings, each representing a 2D shape, and applies it to a new classification scheme designed to lower the error rate. The median string is obtained by applying certain edit operations from the minimum-cost edit sequence to one of the original strings. The new dataset editing scheme relaxes the deletion criterion of the Wilson Editing Procedure: not all instances misclassified by their nearest neighbors are pruned. Instead, an artificial instance is added to the dataset in the expectation that the misclassified instance will be classified correctly in the future. The new artificial instance is the median of the misclassified sample and its same-class nearest neighbor. Experiments on two widely used datasets of handwritten characters show that this preprocessing scheme can reduce the classification error in about 78% of trials.
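
The idea of building a median from part of a minimum-cost edit sequence can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes unit-cost Levenshtein edits and simply applies the first half of the edit operations to the first string, whereas the abstract does not state the paper's cost function or its rule for selecting operations. The names `edit_script`, `distance`, and `approximate_median` are hypothetical.

```python
def edit_script(a: str, b: str) -> list[tuple[str, int, int]]:
    """Minimum-cost edit script turning a into b (unit costs, an assumption).

    Each operation is (kind, i, j): kind is "match", "sub", "del" or "ins",
    i is a position in a, and j is a position in b.
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost, d[i - 1][j] + 1, d[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover one optimal script.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        diag = 0 if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else 1
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + diag:
            ops.append(("match" if diag == 0 else "sub", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", i - 1, j))
            i -= 1
        else:
            ops.append(("ins", i, j - 1))
            j -= 1
    ops.reverse()
    return ops


def distance(a: str, b: str) -> int:
    """Edit distance: the number of non-match operations in an optimal script."""
    return sum(1 for kind, _, _ in edit_script(a, b) if kind != "match")


def approximate_median(a: str, b: str) -> str:
    """Apply the first half of the edits from a to b, leaving the rest undone."""
    ops = edit_script(a, b)
    budget = sum(1 for kind, _, _ in ops if kind != "match") // 2
    out = []
    for kind, i, j in ops:
        if kind == "match":
            out.append(a[i])
        elif budget > 0:
            budget -= 1
            if kind in ("sub", "ins"):
                out.append(b[j])  # take the symbol from b
            # an applied deletion emits nothing
        elif kind in ("sub", "del"):
            out.append(a[i])  # edit not applied: keep the symbol from a
        # an unapplied insertion emits nothing
    return "".join(out)
```

Under these assumptions the result lies on a shortest edit path between the two inputs: its distance to the first string is half the total distance (rounded down) and its distance to the second string is the remainder. In the editing scheme described above, such a string would be computed from a misclassified sample and its same-class nearest neighbor and then added to the training set.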

Reference:

A new editing scheme based on a Fast two-String median computation applied to OCR (José Abreu, Juan Rico-Juan), Chapter in Structural, Syntactic, and Statistical Pattern Recognition (Edwin Hancock, Richard Wilson, Terry Windeatt, Ilkay Ulusoy, Francisco Escolano, eds.), Springer Berlin / Heidelberg, volume 6218, 2010. (10.1007/978-3-642-14980-1_74)

Bibtex Entry:

@InCollection{AbreuSalas2010,
Title = {A new editing scheme based on a Fast two-String median computation applied to OCR},
Author = {José Abreu and Juan Rico-Juan},
Booktitle = {Structural, Syntactic, and Statistical Pattern Recognition},
Publisher = {Springer Berlin / Heidelberg},
Year = {2010},
Editor = {Hancock, Edwin and Wilson, Richard and Windeatt, Terry and Ulusoy, Ilkay and Escolano, Francisco},
Note = {10.1007/978-3-642-14980-1_74},
Pages = {748--756},
Series = {Lecture Notes in Computer Science},
Volume = {6218},
Abstract = {This paper presents a new fast algorithm for computing an approximation to the median of two character strings, each representing a 2D shape, and applies it to a new classification scheme designed to lower the error rate. The median string is obtained by applying certain edit operations from the minimum-cost edit sequence to one of the original strings. The new dataset editing scheme relaxes the deletion criterion of the Wilson Editing Procedure: not all instances misclassified by their nearest neighbors are pruned. Instead, an artificial instance is added to the dataset in the expectation that the misclassified instance will be classified correctly in the future. The new artificial instance is the median of the misclassified sample and its same-class nearest neighbor. Experiments on two widely used datasets of handwritten characters show that this preprocessing scheme can reduce the classification error in about 78% of trials.},
Affiliation = {Universidad de Matanzas, Cuba},
Doi = {http://dx.doi.org/10.1007/978-3-642-14980-1_74},
ISBN = {978-3-642-14979-5},
Keyword = {Computer Science},
Keywords = {OK},
Owner = {2010_LNCS_SSPR},
Url = {http://link.springer.com/chapter/10.1007%2F978-3-642-14980-1_74}
}