-
Using OCR to generate searchable text from scanned documents
Posted on April 25th, 2009
Digitizing a magazine article or a printed contract is a common need. We could either spend hours retyping the text and then correcting the misprints, or we could convert all the required material into digital format in a few minutes using a scanner and Optical Character Recognition (OCR) software. Although scanning pages can be an expensive and time-consuming undertaking, the benefits are huge.
OCR is the process of converting different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera, into text or word processing files that can be easily edited and stored. OCR is also a field of research in pattern recognition, artificial intelligence and machine vision. It has been used to enter data into a computer automatically for dissemination and processing, and it allows such materials to be stored in far less space than the hard copies occupy. OCR technology has had a huge impact on the way information is stored, shared and edited. Before Optical Character Recognition, if someone wanted to turn a book into a word processing file, each page had to be typed word for word.
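As a rough illustration of the pattern-recognition idea behind OCR, the sketch below matches small scanned glyph bitmaps against stored templates and picks the closest one. It is a toy under stated assumptions, not how any particular OCR product works: the 3x5 templates, the three-letter alphabet and the function names are all hypothetical, and real systems also handle segmentation, skew, fonts and language context.

```python
# Hypothetical 3x5 glyph templates: '#' is ink, '.' is blank paper.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def distance(a, b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognize_glyph(bitmap):
    """Return the template character whose bitmap is closest to the scan."""
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch], bitmap))


def recognize_line(glyphs):
    """Recognize a sequence of already segmented glyph bitmaps."""
    return "".join(recognize_glyph(g) for g in glyphs)


# A "scanned" line with a little noise, as a scanner might produce.
scanned = [
    ("#..", "#..", "#..", "#..", "##."),  # an L with one pixel missing
    ("###", ".#.", ".#.", ".#.", "###"),  # a clean I
    ("###", ".#.", ".#.", "##.", ".#."),  # a T with one stray speck
]

text = recognize_line(scanned)
print(text)
```

Choosing the nearest template lets the recognizer tolerate a few wrong pixels, which is why a slightly damaged scan can still come out as editable text.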
-
Page Description Languages – PDL
Posted on March 21st, 2009
In the early days of printing from computers, printers treated text as text and graphics as graphics, and a page was built from a combination of these separate entities. A page image could be created from raw text, such as the words “Hello World”, combined with escape codes and possibly embedded graphics. The trouble was that different programs used different file formats.
To overcome this problem, the Page Description Language (PDL) was developed. A PDL is a language that describes the graphical representation of ink and toner on sheets of paper (or on other output devices, such as monitors and phototypesetters) at a higher level than an actual output bitmap. Instead of sending raw text to the printer, the application creates a PDL output file and sends that to the printer. In essence, the PDL tells the printing device exactly how to handle text, graphics and pictures when reproducing the page layout created by the computer user. The ‘page’ can be of any size, color or resolution the printing device can handle.
With a PDL, application programmers can concentrate on making their programs output a description of the printable page in a standard PDL, while printing device developers can focus on making their devices literate in that PDL.
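To make the idea of a page description concrete, here is a minimal sketch that builds a one-page description in PostScript, one widely used PDL. The post excerpt does not name a specific PDL, so the choice of PostScript is an assumption, as are the helper name `hello_page`, the font and the coordinates (in points, measured from the lower-left corner of the page).

```python
def hello_page(text, x=72, y=720, font="Helvetica", size=24):
    """Build a one-page PostScript program that draws `text` at (x, y)."""
    # Backslashes and parentheses must be escaped inside a PostScript string.
    escaped = (
        text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    )
    return "\n".join(
        [
            "%!PS",                                        # PostScript header
            f"/{font} findfont {size} scalefont setfont",  # select the font
            f"{x} {y} moveto",                             # position the pen
            f"({escaped}) show",                           # draw the text
            "showpage",                                    # emit the page
            "",
        ]
    )


page = hello_page("Hello World")
print(page)
```

The application only states what should appear and where; turning that description into dots at the device's resolution is left to the printer's interpreter.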

