What is the correct way to specify Tesseract options for page segmentation?

Question

I've been trying to work out how to set Tesseract's page segmentation mode. When I pass tessedit_pageseg_mode: '1' (a string), text recognition stalls and never completes. When I pass the number 1 instead, recognition completes, but the reported mode is still the default, SINGLE_BLOCK.
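
For reference, Tesseract numbers its page segmentation modes from 0 to 13, so the value 1 corresponds to AUTO_OSD, while SINGLE_BLOCK is 6. A minimal lookup sketch follows, assuming the mode names from Tesseract's PageSegMode enum without the PSM_ prefix. PSM_NAMES and psmName are hypothetical helper names for illustration and are not part of tesseract.js.

```javascript
// Page segmentation mode names, indexed by their numeric value in Tesseract.
// PSM_NAMES and psmName are illustrative helpers, not tesseract.js API.
const PSM_NAMES = [
  'OSD_ONLY',               // 0
  'AUTO_OSD',               // 1
  'AUTO_ONLY',              // 2
  'AUTO',                   // 3
  'SINGLE_COLUMN',          // 4
  'SINGLE_BLOCK_VERT_TEXT', // 5
  'SINGLE_BLOCK',           // 6
  'SINGLE_LINE',            // 7
  'SINGLE_WORD',            // 8
  'CIRCLE_WORD',            // 9
  'SINGLE_CHAR',            // 10
  'SPARSE_TEXT',            // 11
  'SPARSE_TEXT_OSD',        // 12
  'RAW_LINE',               // 13
];

// Accepts a number or a numeric string and returns the matching mode name.
function psmName(value) {
  const text = String(value).trim();
  if (!/^\d+$/.test(text) || Number(text) >= PSM_NAMES.length) {
    throw new RangeError(`Unknown page segmentation mode: ${value}`);
  }
  return PSM_NAMES[Number(text)];
}
```

Comparing psmName('1') with the value logged from data.psm is one way to check whether the requested mode was actually applied.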

I'm currently using tesseract.js 1.0.19. I also tried 2.0.0-alpha.13, with identical results.

this.tesseract = Tesseract.create({
    workerPath: '../../assets/tesseract/worker.js',
    langPath: '../../assets/tesseract/trained-data',
    corePath: '../../assets/tesseract/core.js',
});
this.tesseract.recognize(this.image, {
    lang: 'eng',
    tessedit_pageseg_mode: '1'
})
.progress((p) => {
    console.log('progress', p);
    this.ocrResult = p.status + ', Progress: ' + Math.round(p.progress * 100) + '%';
})
.then((data) => {
    console.log(data.psm);
    this.ocrResult = data.text;
}).catch((err) => {
    console.error('Error occurred while recognizing text', err);
});

Your assistance would be highly appreciated. Thank you!

Update: I have identified the problem. My previous code was written like this:

window.Tesseract = Tesseract.create({...

but I was then calling Tesseract.recognize(.. directly, so the worker, language, and core files were downloaded from the internet instead of being loaded from the file paths I had provided. When I included the option tessedit_pageseg_mode: '1', that download was somehow interrupted:

Uncaught abort() at Error at Na (https://cdn.jsdelivr.net/gh/naptha/[email protected]/index.js:36:26)

So I am now trying to load the local files from the assets folder by prefixing the paths with file:///android_asset/www (as described in this blog post). However, that fails with this message:

Failed to load file:///android_asset/www/assets/plugins/tesseract/worker.js: Cross origin requests are only supported for protocol schemes: http, data, chrome, https.

javascript angular typescript ionic4 tesseract.js

Answer 1

Consider using http://localhost URLs instead of relative paths, as shown here:

this.tesseract = Tesseract.create({
    // Absolute http:// URLs avoid the file:// cross-origin restriction.
    workerPath: 'http://localhost/assets/tesseract/worker.js',
    langPath: 'http://localhost/assets/tesseract/trained-data',
    corePath: 'http://localhost/assets/tesseract/core.js',
});
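
One way to keep the three paths consistent, and to catch a file:// base before it reaches the worker loader, is to derive them from a single base URL. The sketch below assumes the same directory layout as the snippet above. buildTesseractPaths and ALLOWED_SCHEMES are hypothetical names, not part of tesseract.js, and the scheme list is copied from the error message in the question.

```javascript
// Schemes listed in the browser's cross-origin error message from the question.
const ALLOWED_SCHEMES = ['http:', 'https:', 'data:', 'chrome:'];

// Hypothetical helper: builds the workerPath, langPath, and corePath options
// from one base URL, assuming the assets/tesseract/ layout used above.
function buildTesseractPaths(baseUrl) {
  // A trailing slash makes relative resolution append to the base path.
  const base = baseUrl.endsWith('/') ? baseUrl : baseUrl + '/';
  const scheme = new URL(base).protocol;
  if (!ALLOWED_SCHEMES.includes(scheme)) {
    throw new Error(`Unsupported scheme "${scheme}" for worker files: ${baseUrl}`);
  }
  return {
    workerPath: new URL('assets/tesseract/worker.js', base).href,
    langPath: new URL('assets/tesseract/trained-data', base).href,
    corePath: new URL('assets/tesseract/core.js', base).href,
  };
}

// Possible usage, with the same create() call as in the answer:
// this.tesseract = Tesseract.create(buildTesseractPaths('http://localhost'));
```

Passing 'file:///android_asset/www' to this helper throws immediately, which surfaces the scheme problem at configuration time rather than as a failed worker load.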
