I've been trying to understand how to configure Tesseract options for page segmentation. I tried passing tessedit_pageseg_mode: '1' (as a string), but text recognition halts. If I pass it as the number 1, the process completes, but the mode remains the default, SINGLE_BLOCK.
I'm currently running Tesseract.js version 1.0.19. I also tried 2.0.0-alpha.13, with identical results. Here is my code:
this.tesseract = Tesseract.create({
  workerPath: '../../assets/tesseract/worker.js',
  langPath: '../../assets/tesseract/trained-data',
  corePath: '../../assets/tesseract/core.js',
});
this.tesseract.recognize(this.image, {
  lang: 'eng',
  tessedit_pageseg_mode: '1'
})
  .progress((p) => {
    console.log('progress', p);
    this.ocrResult = p.status + ', Progress: ' + Math.round(p.progress * 100) + '%';
  })
  .then((data) => {
    console.log(data.psm);
    this.ocrResult = data.text;
  })
  .catch((err) => {
    console.error('Error occurred while recognizing text', err);
  });
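For reference, these are the numeric values I am working from, taken from Tesseract's PageSegMode enum. By this table '1' should select AUTO_OSD, whereas the result I get back reports SINGLE_BLOCK (6). The names PSM and psmName below are my own and not part of tesseract.js; this is only a lookup sketch and says nothing about how the library handles the option.

```javascript
// Values of Tesseract's PageSegMode enum (names as in the C++ API).
const PSM = Object.freeze({
  OSD_ONLY: 0,
  AUTO_OSD: 1,
  AUTO_ONLY: 2,
  AUTO: 3,
  SINGLE_COLUMN: 4,
  SINGLE_BLOCK_VERT_TEXT: 5,
  SINGLE_BLOCK: 6,
  SINGLE_LINE: 7,
  SINGLE_WORD: 8,
  CIRCLE_WORD: 9,
  SINGLE_CHAR: 10,
  SPARSE_TEXT: 11,
  SPARSE_TEXT_OSD: 12,
  RAW_LINE: 13,
});

// Hypothetical helper: return the enum name for a mode given as a
// number (1) or a numeric string ('1'); undefined if there is no match.
function psmName(mode) {
  const value = Number(mode);
  const entry = Object.entries(PSM).find(([, v]) => v === value);
  return entry ? entry[0] : undefined;
}
```

With this, psmName('1') and psmName(1) both resolve to the same mode name, which makes it easier to compare what I requested against what data.psm reports.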
Your assistance would be highly appreciated. Thank you!
Update: I managed to identify the problem. My previous code assigned the configured instance with window.Tesseract = Tesseract.create({... but then called Tesseract.recognize(.. directly, which downloaded the worker, language, and core files from the internet instead of using the file paths I had provided. The download was somehow being interrupted whenever I included the option tessedit_pageseg_mode: '1', producing this error:
Uncaught abort() at Error at Na (https://cdn.jsdelivr.net/gh/naptha/[email protected]/index.js:36:26)
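To make the mix-up concrete, here is a minimal sketch of the pattern I should have followed: keep the object returned by Tesseract.create and call recognize only on that object. createLocalRecognizer is a hypothetical helper name of my own, not part of tesseract.js, and the sketch assumes only the create and recognize calls already shown above.

```javascript
// Hypothetical helper (not part of tesseract.js): build the configured
// instance once and return a function that always recognizes through it.
function createLocalRecognizer(Tesseract, paths) {
  // Assumption: the object returned by create() is the one that uses
  // the custom workerPath, langPath, and corePath.
  const instance = Tesseract.create(paths);
  // Route every call through the instance, never through the bare
  // Tesseract export, so the custom paths cannot be bypassed by accident.
  return (image, options) => instance.recognize(image, options);
}
```

I would call it once with the same paths object as in my code above and then use the returned function in place of Tesseract.recognize.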
Therefore, I am now trying to use the local files by adding the file:///android_asset/www prefix (as described in this blog post) to the paths within the assets folder. However, this fails with the following message:
Failed to load file:///android_asset/www/assets/plugins/tesseract/worker.js: Cross origin requests are only supported for protocol schemes: http, data, chrome, https.