[SOLVED] Tesseract OCR?
location: linuxquestions.com - date: August 19, 2015
I've installed Tesseract OCR including Tesseract-gui from the Sourceforge tar packet.
I also installed tesseract3, Tesseract EQU and Tesseract OSD from the Synaptic repositories. (Did I do the right thing?)
After installation I rebooted and ran the GUI, which opened flawlessly. Unfortunately there are no instructions or tutorial,
so it's largely guesswork and trial and error on my part. I watched an old tutorial on YouTube which said the file to recognise
has to be a .tiff file. I followed the instructions carefully and all appeared to go well as the new text file popped up on my desktop as planned.
The problem is there was no script ... it was blank.
I attempted the procedure in several different ways, though not from the terminal (I don't feel confident using the terminal yet), and
I still couldn't get it to work.
Perhaps I have a necessary file missing, or I've confused the program by adding a component that isn't necessary. I don't know!
I would love to get this program working as it would
Use of tesseract ocr command
location: linuxquestions.com - date: December 16, 2013
I am using the tesseract 3.01 command to convert my screenshot to a text file, but the output text does not match the formatting shown in the screenshot image.
I am taking the screenshot with scrot (scrot version 0.8) like
[SOLVED] tesseract language problem
location: linuxquestions.com - date: May 28, 2014
tesseract-3.02: fresh install on Slackware-14.0 with English data. Leptonica is there, all fine. Returns this error
Building portable Tesseract OCR libraries in Linux
location: linuxexchange.com - date: June 4, 2015
Is there a way to build and use the Tesseract library and the corresponding Leptonica library (because Tesseract depends on Leptonica) as portable libraries, as can be done on Windows?
I compiled these libraries according to their instructions, but it seems that libtesseract.so.3.0.2 includes a fixed path to the Leptonica shared library:
$ ldd libtesseract.so.3.0.2
linux-vdso.so.1 => (0x00007fffbc5ff000)
**liblept.so.4 => /usr/local/lib/liblept.so.4 (0x00007fa8400fd000)**
libpng12.so.0 => /usr/lib64/libpng12.so.0 (0x00007fa83fcae000)
libjpeg.so.62 => /usr/lib64/libjpeg.so.62 (0x00007fa83fa5e000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa83f5e4000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fa83f2de000)
libm.so.6 => /lib64/libm.so.6 (0x00007fa83f059000)
libc.so.6 => /lib64/libc.so.6 (0x00007fa83ecc5000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fa83eaaf000)
It results in the OSError while running an application on a
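For readers checking a similar build, here is a minimal Python sketch that parses ldd output like the listing above and reports which dependencies resolve outside a chosen bundle directory. Note that ldd prints where the dynamic loader currently finds each library. The function name and the directory /opt/myapp/lib are hypothetical, chosen only for illustration.

```python
import re

# Matches lines such as "liblept.so.4 => /usr/local/lib/liblept.so.4 (0x...)".
# Entries with no resolved path (for example linux-vdso) do not match.
LDD_LINE = re.compile(r"^\s*(?P<name>\S+)\s+=>\s+(?P<path>/\S+)")

def resolved_outside(ldd_output, bundle_dir):
    """Return {library name: resolved path} for libraries not under bundle_dir."""
    prefix = bundle_dir.rstrip("/") + "/"
    outside = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match and not match.group("path").startswith(prefix):
            outside[match.group("name")] = match.group("path")
    return outside

# A shortened copy of the ldd listing from the post.
sample = """\
linux-vdso.so.1 => (0x00007fffbc5ff000)
liblept.so.4 => /usr/local/lib/liblept.so.4 (0x00007fa8400fd000)
libpng12.so.0 => /usr/lib64/libpng12.so.0 (0x00007fa83fcae000)
libc.so.6 => /lib64/libc.so.6 (0x00007fa83ecc5000)
"""

# /opt/myapp/lib is a hypothetical bundle directory.
for name, path in resolved_outside(sample, "/opt/myapp/lib").items():
    print(name, "->", path)
```

Any library reported here would have to be shipped with the application and made findable on the target machine, for example through LD_LIBRARY_PATH or an rpath set at link time.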
image processing to improve tesseract OCR accuracy
location: linuxexchange.com - date: January 1, 1970
I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for example that generated by fax machines - is especially difficult for tesseract to process - presumably all those jagged edges to the characters confound the shape-recognition algorithms.
What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and seen some small improvement, but I'm hoping that there is a more specific technique that would yield better results. Say a filter that was tuned to black and white images, which would smooth out irregular edges, followed by a filter which would increase the contrast to make the characters more distinct.
Any general tips for someone who is a novice at image processing?
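One approach sometimes tried for this kind of speckled, jagged input is a median filter followed by a threshold, rather than the Gaussian blur mentioned above. The sketch below is a pure-Python illustration on a 2D list of 0-255 grayscale values. The function names and the threshold of 128 are assumptions, and there is no guarantee that this improves tesseract's results on any particular document.

```python
from statistics import median

def median_filter(image):
    """Apply a 3x3 median filter; pixels beyond the border are clamped to the edge."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(int(median(window)))
        out.append(row)
    return out

def binarize(image, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[255 if value >= threshold else 0 for value in row] for row in image]

# A white 5x5 patch with one isolated black speck in the middle.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0
clean = binarize(median_filter(noisy))
```

The median filter removes isolated specks while keeping edges sharper than a blur would, and the threshold then restores a strict black-and-white image. In practice an image library would do this far faster than nested Python loops.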
[SOLVED] tesseract stopped. error without explanation.
location: linuxquestions.com - date: November 14, 2013
I like tesseract. I downloaded version 3.02, and started using it right away with excellent results. Then it stopped working, with the following error:
tesseract ltr-996h-p1.png ltr-996h-p1
Tesseract Open Source OCR Engine v3.02 with Leptonica
Cannot open input file: ltr-996h-p1.png
I was using the same programs to scan and prepare the png or tif files (gimp and xsane). There is no description of what the error means. The help feature doesn't help, and man tesseract is no help either, maybe because I don't understand their terminology.
If you know where I should go from here, please help me out. Or maybe you know where there is some really thorough and understandable documentation on tesseract.
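A few basic checks can narrow down a "Cannot open input file" message before digging further. The Python sketch below verifies that the path exists relative to the current directory, is readable, and starts with a PNG or TIFF signature. The function names are hypothetical. It does not check whether the installed Leptonica was built with support for that format, which is another possible cause.

```python
import os

# Leading bytes that identify PNG and TIFF (little- and big-endian) files.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"II*\x00": "tiff",
    b"MM\x00*": "tiff",
}

def sniff_format(header):
    """Return 'png' or 'tiff' if the header bytes match a known signature, else None."""
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None

def diagnose(path):
    """Return a short description of why an image file might fail to open."""
    if not os.path.isfile(path):
        return "missing: no such file relative to " + os.getcwd()
    if not os.access(path, os.R_OK):
        return "unreadable: check file permissions"
    with open(path, "rb") as handle:
        kind = sniff_format(handle.read(8))
    if kind is None:
        return "unknown: content is not PNG or TIFF"
    return "ok: looks like " + kind

print(diagnose("ltr-996h-p1.png"))
```

Running it from the same directory as the tesseract command shows whether the shell and the program agree on where the file is.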
How to create a TIFF file that's readable by tesseract OCR?
- date: March 25, 2012
I want to let tesseract OCR run over an image file, to scan the content.
The problem seems to be that tesseract not only requires TIFF, but it also requires the tiff file to be in a certain format.
With just a normal tiff file, I get:
:~/Desktop# tesseract crap.tif crap.txt
Tesseract Open Source OCR Engine
check_legal_image_size:Error:Only 1,2,4,5,6,8 bpp are supported:32
So far I have managed to find a workaround.
It consists of using GIMP, going to Image > Mode > Indexed, and selecting "Generate optimum palette" with "Maximum number of colors" set to 256.
Then I have to do one more trick before "Save As":
Going to Layer > Transparency > Remove Alpha Channel,
which will remove transparency, so that the saved TIFF stays at 8 bpp.
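The same idea, dropping the alpha channel and reducing the image to 8 bits per pixel, can be sketched outside GIMP. The Python example below composites RGBA pixels over a white background and converts them to 8-bit grayscale. This is a different reduction from GIMP's 256-colour indexed palette, and the function name is hypothetical. It only illustrates the arithmetic and does not write a TIFF file.

```python
def flatten_to_gray8(pixels, background=255):
    """Composite (r, g, b, a) pixels over a solid gray level; return 8-bit gray values."""
    gray = []
    for r, g, b, a in pixels:
        alpha = a / 255
        # ITU-R BT.601 luma weights for converting RGB to gray.
        luma = 0.299 * r + 0.587 * g + 0.114 * b
        gray.append(round(luma * alpha + background * (1 - alpha)))
    return gray

# Opaque black, opaque white, and a fully transparent pixel.
print(flatten_to_gray8([(0, 0, 0, 255), (255, 255, 255, 255), (0, 0, 0, 0)]))
```

Each output value fits in one byte, which corresponds to the 8 bpp case listed as supported in the error message above.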
Now the problem is my input image comes from C#, and is preprocessed with AFORGE.NET image analysis filters.
I have also found a .NET port of LibTiff, and an example of how to write an image with color palette here:
OCR for linux w/ ABBYY engine? Tesseract? Anybody?
location: ubuntuforums.com - date: February 3, 2007
I saw that ABBYY recently released its latest 8.0 OCR engine for Linux (SDK). Though I'm all for an open-source alternative, that's so far off that there needs to be something in the meantime, and FineReader is incredible. But are there any plans to use this to make a professional-level general user app? Or what about HP's Tesseract for that matter? I need high-quality scanning of articles, complete with page layout and picture/text/table recognition. Right now I'm working with ABBYY 8 in WINE 0.9.30, but there are plenty of bugs with it. Often I still boot up the almost-useless Windows partition just for that purpose (another would be printing, though my discovery of gtklp recently nixed 95% of that). Where's the killer OCR app for Linux?!
Apologies if I am way off base; don't even really know how an SDK works, though my understanding is that it is the command-line to the gui (in other words, the real stuff).
OCR with ocrfeeder and tesseract (or tesseractocr)
location: ubuntuforums.com - date: June 30, 2013
I'm running Kubuntu 13.04 (raring) on a Lenovo laptop. I have a scan of a page of typed text, in both pdf and tiff formats. I'm trying to use OCR (optical character recognition) to turn the image of that scan into the actual text.
From what I read, the best tool for this is ocrfeeder, together with tesseract. It appears that the correct procedure is to go into ocrfeeder, then select File / Import PDF.
Doing that, I get a screen that displays the scanned image, though for some reason the file listing to the left shows it as a jpg, not a pdf. I now go to Document/Recognize Document. What shows up as the text, when the recognition is done, is just a single capital A. It appears that tesseract is being used for the recognition.
What's particularly frustrating is that when I first started fiddling with this, I actually did get the text of the document to display, but I didn't know how to save it at that point. Now I can't get the text to display at all. The documentation on
[SOLVED] Ubuntu VS. Xubuntu
location: ubuntuforums.com - date: December 3, 2008
I understand that Xubuntu is basically a lighter version of Ubuntu.
But I was wondering, when it comes down to it, what will I be missing?
What type of things will be different as far as an average user is concerned? Any different interaction?
So far I love Ubuntu 8.10 and have everything working the way I want it to.
This is really just out of curiosity.