rolisz's site

Character segmentation overfitting

I'm working on a project about OCR on receipts, and today, while working on character segmentation, I made a pretty stupid mistake that led to my model overfitting almost perfectly (in some cases I got 100% classification accuracy).

I already had my own letter data (with the help of my parents, I labeled 7000 letters, along with their bounding boxes, from about 25 receipts), and my classifier for individual letters (a simple linear SVM) did pretty well: between 90% and 94% accuracy. For something obtained with almost no fiddling, that's pretty good, and good enough for my purposes. Also, it's pretty much impossible to tell apart 0 …
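The post doesn't say which features or which SVM library were used, so the following is only a minimal sketch of the general setup, under stated assumptions: each labeled letter is cut out of the receipt image by its bounding box, resized to a fixed grid of raw pixel values, and fed to a one-vs-rest linear SVM. The helper names (`crop_letter`, `train_linear_svm`, `predict`, `accuracy`), the 8x8 grid and the hyperparameters are all hypothetical. Instead of a library such as LIBLINEAR or scikit-learn, the SVM is trained with plain-Python Pegasos-style stochastic subgradient descent on the hinge loss, so the example stays self-contained.

```python
import random


def crop_letter(image, box, size=8):
    """Cut the (x, y, w, h) bounding box out of a grayscale image (a list of rows
    with values 0-255) and resample it to a size x size grid using
    nearest-neighbour sampling. Returns a flat list of features in [0, 1].
    (Hypothetical feature extraction: raw pixels, no deskewing or padding.)"""
    x, y, w, h = box
    features = []
    for r in range(size):
        src_row = y + (r * h) // size
        for c in range(size):
            src_col = x + (c * w) // size
            features.append(image[src_row][src_col] / 255.0)
    return features


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Train one binary linear SVM per class (one-vs-rest) by minimising the
    L2-regularised hinge loss with Pegasos-style stochastic subgradient steps.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (the bias is regularised too, which is a simplification)."""
    rng = random.Random(seed)
    models = {}
    for cls in sorted(set(y)):
        w = [0.0] * (len(X[0]) + 1)
        order = list(range(len(X)))
        t = 0
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                t += 1
                eta = 1.0 / (lam * t)              # decreasing step size
                xi = X[i] + [1.0]
                yi = 1.0 if y[i] == cls else -1.0  # current class vs. the rest
                margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
                w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser: shrink weights
                if margin < 1.0:                   # hinge loss active: step towards yi * xi
                    w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
        models[cls] = w
    return models


def predict(models, x):
    """Return the class whose linear score w . [x, 1] is highest."""
    xi = x + [1.0]
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], xi)))


def accuracy(models, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(models, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Toy usage: a 10 x 20 "receipt" holding a vertical and a horizontal stroke.
canvas = [[0] * 20 for _ in range(10)]
for r in range(8):
    canvas[r][3] = 255       # vertical stroke inside box (0, 0, 8, 8)
for c in range(10, 18):
    canvas[4][c] = 255       # horizontal stroke inside box (10, 0, 8, 8)

boxes = [(0, 0, 8, 8), (10, 0, 8, 8)]
labels = ["I", "-"]
X = [crop_letter(canvas, b) for b in boxes]
models = train_linear_svm(X, labels)
print(accuracy(models, X, labels))  # prints 1.0
```

The toy run scores 1.0 only because it is evaluated on the same two glyphs it was trained on. An accuracy figure like the 90-94% above is only meaningful when it is measured on letters the classifier did not see during training.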