WO2007082536A1 - Mobile unit with camera and optical character recognition, optionally for conversion of imaged text into comprehensible speech - Google Patents

Mobile unit with camera and optical character recognition, optionally for conversion of imaged text into comprehensible speech

Info

Publication number
WO2007082536A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
mobile unit
image
database
Application number
PCT/DK2007/000020
Other languages
French (fr)
Inventor
Lars Ballieu Christensen
Flemming Ast
John Kristensen
Original Assignee
Motto S.A.
Priority date: 2006-01-17
Filing date: 2007-01-17
Priority claimed from LU91213B1
Application filed by Motto S.A.
Priority to EP07700159A (EP1979858A1)
Publication of WO2007082536A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/12: Detection or correction of errors, e.g. by rescanning the pattern
    • G06V30/127: Detection or correction of errors, e.g. by rescanning the pattern with the intervention of an operator
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/146: Aligning or centring of the image pick-up or image-field
    • G06V30/1463: Orientation detection or correction, e.g. rotation of multiples of 90 degrees
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/26: Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262: Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G06V30/268: Lexical context
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Abstract

A mobile unit with a computer and a camera, the computer being configured to receive captured images as digital data from the camera and to extract text in the captured images by an optical character recognition (OCR) routine and to convert the text from an image format into a text format. The mobile unit further comprises a text database with text words, and the computer is configured to compare the converted text with words in the text database and only to accept the converted text as resembling the imaged text in case of agreement with words in the database. The converted text may then be transformed into synthetic speech.

Description

Mobile unit with camera and optical character recognition, optionally for conversion of imaged text into comprehensible speech
FIELD OF THE INVENTION
The present invention relates to a mobile unit with a computer and a camera, the computer being configured to receive captured images as digital data from the camera, to extract text in the captured images by an optical character recognition (OCR) routine, and to convert the text from an image format into a text format, for example for subsequent conversion into comprehensible speech.
BACKGROUND OF THE INVENTION
Dyslexia can occur in different degrees of impaired ability to read and write. In the mild form, dyslexic persons may be able to read and even write, though having difficulty with correct spelling. Modern aids, such as spell checking in computers, have helped many dyslexic people to live without severe difficulties. However, in more pronounced cases, where the people are not able to read or understand written text in public space, the result may be an inability to move around and travel without the assistance of a person who can read. This lack of ability to read and write often creates frustration and reduced self-esteem, with aggressive or insecure behaviour in daily life.
Mobile telephones have achieved wide acceptance among dyslexic people, not only for speaking but also in connection with text messages, because the word processor has a built-in spelling aid. In patent application US 2001/0056342 by Piehn et al., a camera is disclosed that is able to transform the text in a taken image into synthetic speech. The camera also has software routines that translate text from one language to another.
Often, images taken with text are difficult to transform correctly into text files that can be converted into synthetic speech, because poor focus, image distortion or objects obscuring the text reduce the chances for correct character recognition. As a result, the synthetic speech may differ from the text in the image. Thus, there is still a need for improvements in connection with reading aids for dyslexic people. In particular, there is a need for technical improvements in the recognition of text from a captured image into text format.
DESCRIPTION / SUMMARY OF THE INVENTION
It is therefore the object of the invention to provide a mobile unit with a camera, sufficient image optimisation capabilities and optical character recognition (OCR), where the probability for a correct recognition of the text is increased, for example in order to increase the correctness of a subsequent speech generated from the image.
This object is achieved with a mobile unit with a computer and a camera, the computer being configured to receive captured images as digital data from the camera and to extract text in the captured images by an optical character recognition (OCR) routine and to convert the text from an image format into a text format. The mobile unit further comprises a text database with text words, and the computer is configured to compare the converted text with words in the text database and only to accept the converted text as resembling the imaged text in case of agreement with words in the database. As an option, the text is translated into synthetic speech using a text-to-speech engine.
The invention is preferably implemented in a mobile telephone having a camera and a generator of synthetic speech. However, the invention is of more general character and can be implemented in other mobile units, such as a PDA without mobile telephone.
The advantage of the mobile unit according to the invention is the additional routine of checking potential text extracted from images against words in a database. The term words also includes one-letter words and parts of longer words, such as syllables. When text has been extracted from an image, it may be that some of the characters in the text have been recognised erroneously, rendering the final text meaningless. This may happen in case of poor focus, in case of partly obscured text, in case the text is in a different language, or if the extraction routine misinterprets the image and considers part of the image as text even though there is no text in this part of the image, or at least no recognisable text. In case the converted text is accepted as resembling the text in the image, this may be indicated by the mobile unit.
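A minimal sketch of this acceptance check follows, assuming a simple set-based word database; the function names, the tokenisation and the all-words-must-match policy are illustrative assumptions, not taken verbatim from the patent.

```python
def accept_converted_text(converted: str, word_db: set[str]) -> bool:
    """Accept OCR output only if every token agrees with the database."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in converted.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return False  # nothing recognisable: reject rather than accept noise
    return all(t in word_db for t in tokens)

word_db = {"north", "exit", "street", "station"}  # stands in for the on-device text database
print(accept_converted_text("North Exit", word_db))  # True: accepted as resembling the imaged text
print(accept_converted_text("N0rth 3x1t", word_db))  # False: recognition is not accepted
```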
Thus, with the invention, the possibility that a user is confronted with meaningless text is drastically reduced.
In modern life in the industrialised world, it is customary to possess and use a mobile telephone on a daily basis. Modern mobile telephones comprise built-in cameras with a rather high optical resolution and zooming properties. Therefore, in a preferred embodiment of the invention, the camera according to the invention is implemented in a mobile telephone. This implies that a dyslexic person does not need to carry additional equipment apart from the mobile phone, which is carried along most of the time anyway. Also, use of a mobile phone for photographing text or signs would not be recognised as something remarkable. In addition, the mobile unit according to the invention may comprise a synthetic speech generator for submitting the extracted text to the user by a synthetic voice. The synthetic speech can be listened to through earphones, which already are widely used in connection with mobile phones. For convenience, the earphones may be wireless, for example by utilising Bluetooth technology. Thus, the dyslexic may use an important aid without the risk of being revealed as being disabled.
In addition, the computer may comprise routines that check whether the converted text as a whole makes sense, for example, whether the grammar is correct and whether the words are related to each other.
In certain cases, names of products or companies may imply words that are not found in the database, but with which the user nevertheless is familiar. In cases of missing acceptance, the mobile unit may be configured to request the user to indicate whether a phrase shall be accepted nevertheless, despite its missing counterpart in the database.
Furthermore, the mobile unit according to the invention may be configured, in case of missing acceptance, to amend the initially converted text slightly in order to make it fit existing words, letters, sequences of words, combined words, and/or parts of sentences in the database. The initially converted text and the amended converted text may be presented to the user as options among which the user may choose the apparently most correct version. For example, the mobile unit may present several possibilities and request the user to indicate whether there is a version which seems to resemble the text in the image. The selection is subsequently stored in the database.
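One way such amendment proposals could be generated is sketched below: for a token missing from the database, the closest database entries are offered so the user can choose among versions. Using difflib with a 0.7 similarity cutoff is an assumed matching policy; the patent does not prescribe a particular algorithm.

```python
import difflib

def propose_amendments(token: str, word_db: list[str], n: int = 3) -> list[str]:
    # return up to n database words close to the raw OCR token
    return difflib.get_close_matches(token.lower(), word_db, n=n, cutoff=0.7)

options = propose_amendments("stat1on", ["station", "nation", "caution"])
print(options)  # ['station'] would be offered alongside the raw OCR token
```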
In addition, the mobile unit may be configured to base such proposals or to base the amendment or acceptance of the text on earlier choices by the user.
In certain situations, the photographed text may contain special words such as technical terms. This may cause problems if these terms are specialised in certain fields where the meaning of the terms differs from the normal meaning of the word. For example, a word such as "leg" means a different thing in medical treatment than in mechanical fittings. In this case, in a further embodiment of the invention, the database may contain special technical dictionaries with such terms. The user may indicate whether the special dictionaries shall be used during the comparison with the converted text.
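A sketch of this option, assuming the comparison database is assembled from a general word list plus whichever special dictionaries the user has switched on; the dictionary names and contents are invented for illustration.

```python
GENERAL = {"leg", "table", "street"}
SPECIAL = {
    "medical": {"femur", "tibia"},
    "mechanical": {"flange", "bracket"},
}

def build_word_db(enabled: list[str]) -> set[str]:
    # merge the general database with the user-enabled technical dictionaries
    db = set(GENERAL)
    for name in enabled:
        db |= SPECIAL.get(name, set())
    return db

print("femur" in build_word_db(["medical"]))  # True only when the medical dictionary is enabled
```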
Special words or names may, among others, appear on traffic signs, which are among the most important items for the dyslexic to read. Therefore, in a further embodiment, the unit also comprises a database with traffic sign text, and the mobile unit is configured, upon specific request from a user of the mobile unit, to compare the extracted text from a captured image with the traffic sign text in the database.
Once the extracted text is converted, the user may want to store it for later use, for example to send it as an SMS or IP message to another user. Therefore, the mobile unit is configured upon conversion of the extracted text to request an action from a user of the mobile unit for storing the converted text in a database or data memory in text format.
If the extraction of text has worked in principle by using the OCR routine, but the converted text does not make sense according to the corresponding control routines in the computer, the user may wish to store the result in order to control the meaning at a later stage, possibly with help from others. Therefore, in a further embodiment, the mobile unit is configured in case of missing accept of the converted text to request an indication from a user as to whether the converted text is to be stored in the database or data memory.
In cameras of this kind, the optics are typically of a quality with image distortions near the image edge. This implies that the photographed text may be curved, making the recognition by the software more difficult. Therefore, in a further embodiment, the camera is configured with an image distortion correction routine to correct the image in such a way that distortions are reduced; especially, curved parts of the images are straightened out. For example, the camera may be configured to perform the distortion correction with an algorithm applied to every taken image.
The algorithm may be constructed such that the correction is performed in dependence on the performance of the optics. If the optics are known, the algorithm can be adjusted to perform the correction in a way specific to that type of optics.
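A sketch of such an optics-dependent correction: when the lens type is known, a fixed camera matrix and radial distortion coefficients can be applied to every captured frame. All numbers below are placeholders for an assumed lens, not calibration data from the patent.

```python
import cv2
import numpy as np

def undistort_frame(img: np.ndarray) -> np.ndarray:
    h, w = img.shape[:2]
    camera_matrix = np.array([[w, 0, w / 2],
                              [0, w, h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.array([-0.25, 0.08, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3 for the assumed optics
    return cv2.undistort(img, camera_matrix, dist_coeffs)  # straightens curved text near the edges
```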
Another difficulty with cheap mass-produced cameras, such as those in mobile telephones, is that the image resolution is not very high; this is partly due to the number of pixels in the CCD chip of the camera and partly due to the limited amount of memory available, which causes the software of the camera to store images in a lower-resolution format. If necessary, the number of pixels may be increased artificially in a software routine.
In order to increase the applicability of the apparatus according to the invention in cases where images of text are taken slightly out of focus, in a further embodiment, the camera according to the invention is configured, by suitable software routines, for example using Fourier analysis and/or high-pass filtering, to compensate for low resolution in the image due to defocusing. If application of the corresponding software routine does not result in a satisfactory image, a new image has to be taken.
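A sketch of the two software compensations just described: an artificial pixel increase by simple replication, and a Fourier-domain high-pass boost that sharpens edges in a greyscale image. The radius and boost factor are assumed tuning values, not figures from the patent.

```python
import numpy as np

def upscale2x(gray: np.ndarray) -> np.ndarray:
    # artificial increase of the number of pixels (nearest-neighbour replication)
    return np.repeat(np.repeat(gray, 2, axis=0), 2, axis=1)

def sharpen_via_fft(gray: np.ndarray, radius: int = 20, boost: float = 1.5) -> np.ndarray:
    f = np.fft.fftshift(np.fft.fft2(gray.astype(float)))
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    high = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 > radius ** 2
    f[high] *= boost  # amplifying high frequencies emphasises edges blurred by defocus
    out = np.real(np.fft.ifft2(np.fft.ifftshift(f)))
    return np.clip(out, 0, 255).astype(np.uint8)
```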
In order for the camera according to the invention and the text-to-speech translation program according to the invention to be as user friendly as possible, the invention may be based on a Windows® CE platform. This widely used platform for handheld units, for example mobile telephones or PDAs (personal digital assistants), is in structure very similar to the Windows® programs on stationary computers, which in turn are widely used as well. As dyslexic people often are familiar with the Windows® programs on stationary computers, an implementation of the OCR program in a mobile unit according to the invention is a further help for dyslexic people, in as much as they are not forced to learn a new platform, which is a much more tedious task for dyslexic people than for others.
In some cases, images may contain text passages that are partly obscured by objects such as dirt or rain on text boards or on traffic signs containing the text. In a further embodiment, the camera according to the invention includes a routine that corrects image obstructions due to such objects.
Sometimes text is written not horizontally on a text board or the like but vertically. This is especially true for many advertisements, for example names of shops and hotels on building walls. Commercially available OCR programs are able to recognise this, and such recognition is also implemented in the invention. This, however, requires that the letters are correctly oriented and placed one letter below the other. In case a text is at a large angle, for example 90 degrees, relative to the camera, commercially available OCR routines are typically programmed to try to recognise the letters as a vertical text, where one letter is placed below the other; in this case, the recognition will fail. According to the invention, the camera is therefore programmed to rotate the entire captured image successively by 90 degrees, 180 degrees and 270 degrees if a proper reading fails. Images may also be captured where the text is not truly horizontal but deviates by a certain angle from the horizontal. Commercially available software programs are configured to recognise letters nevertheless, and this is also implemented in the invention. However, when letters in the image deviate by angles of more than 40 degrees from the horizontal, proper letter recognition often fails, because in this case the software is configured to assume a vertical text instead. In order to solve this problem, the camera may be programmed to rotate the entire captured image successively by a certain angle, for example 30 degrees or 45 degrees, if a proper extraction fails. After each rotation about this predetermined angle, a new attempt at extraction is performed, until the image has been rotated by 360 degrees. Alternatively, the image is rotated in one, two or three 90 degree steps.
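A sketch of this rotation retry loop: attempt extraction, rotate the whole image by a fixed step, and retry until text is found or a full turn has been tried. A step of 90 degrees is shown; 30 or 45 degrees works the same way. pytesseract is an assumed OCR backend, as the invention only requires some OCR routine.

```python
from PIL import Image
import pytesseract  # assumed OCR backend

def extract_with_rotation(img: Image.Image, step: int = 90) -> str | None:
    for angle in range(0, 360, step):
        text = pytesseract.image_to_string(img.rotate(angle, expand=True)).strip()
        if text:  # a full system would also apply the database acceptance check here
            return text
    return None  # extraction failed at every tried orientation
```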
In a further embodiment, the mobile unit according to the invention is combined with route planner and/or navigation software, for example a commercial product such as TomTom®. The implemented program comprises routines that use the recognised text in a captured image, such as text indicative of a location, for example a road sign and a house number, in combination with the route planner and/or navigation software. For example, the dyslexic person may image a road sign and a building number and as a result receive a synthetic voice message explaining the way from the actual location to a certain other location, for example the home of the person. Alternatively or in addition, the route planner in a mobile unit according to the invention may be configured to show the location and the route on a map, or even to explain the route by means of buildings which the dyslexic finds along the route.
In addition, if the user images a name of a location, for example a street name, a GPS (Global Positioning System) signal receiver and location routine may be used for finding possible location names at the actual GPS location, for example in a digital name database or in a digital map. By comparison of the imaged name converted into text with the possible location names, the correct location name may be found quickly. In order to follow the dyslexic through the town, the mobile unit according to the invention may comprise a GPS receiver such that the dyslexic person can be guided to the desired location. For example, when visiting a location, the dyslexic may have photographed - for example from a separate tourist brochure - a number of names of locations to visit on a tour. The text recognition routine stores the location names in a memory, after which, on request, the location names are matched by a built-in route planner in the mobile unit such that a route is planned automatically and presented to the dyslexic on a map or by synthetic speech. The GPS system in the mobile unit keeps track of the actual location of the dyslexic, and the route planner guides the dyslexic along the planned route and back to the point of origin of the tour or to another final point of interest.
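A sketch of the GPS-assisted name lookup: candidate location names near the current GPS fix are fetched from a digital map or name database (stubbed here with invented street names) and matched against the imaged name converted by OCR. get_nearby_names is a hypothetical helper, and the fuzzy-matching policy is an assumption.

```python
import difflib

def get_nearby_names(lat: float, lon: float) -> list[str]:
    # hypothetical stand-in for a query against a digital name database or map
    return ["Rosenborggade", "Gothersgade", "Landemaerket"]

def resolve_location_name(ocr_text: str, lat: float, lon: float) -> str | None:
    candidates = get_nearby_names(lat, lon)
    hits = difflib.get_close_matches(ocr_text.lower(),
                                     [c.lower() for c in candidates], n=1, cutoff=0.6)
    return hits[0] if hits else None

print(resolve_location_name("Gothersgabe", 55.68, 12.58))  # 'gothersgade' despite one OCR error
```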
In a further embodiment, the mobile unit according to the invention may in the database comprise a number of dictionaries with different languages. One additional function may be the translation of imaged text, for example as disclosed in US patent application No. 2001/0056342.
Though the mobile unit is of high advantage for dyslexic people, it may also be of interest for non-disabled people. For example, the mobile unit may comprise a route planner but no synthetic speech. This would still be of high interest in certain cases, as illustrated in the following. A street sign, for example one with Chinese characters, may be imaged and the text extracted by using a Chinese character setup and a Chinese dictionary. In combination with the route planner, the location may be indicated in the display and a route proposed. A synthetic speech generator may be an additional convenience, but is not an absolutely necessary feature. The advantage would be that a person not familiar with Chinese characters would still be able to find his way through a town in China.
In a further embodiment, the mobile unit according to the invention comprises a microphone to record voice messages from the user. In addition, the mobile unit may comprise a routine for phonetic translation. Words and phrases are stored as audio files in a database and are translated into other languages. This means that the person using the apparatus according to the invention may speak into the microphone and have this speech translated into another language, either as an audio data file with the message spoken in another language or as a text file. The phonetic translation may be performed simultaneously with the speaking of the person.
In a further embodiment, the apparatus according to the invention can be used to simplify daily arrangements. For example, in connection with information providers, a system may be arranged where brochures and other information may be ordered by sending an SMS (Short Message Service) with a certain code from a mobile telephone to a preselected telephone number. For the dyslexic, this may be simplified by imaging the code, for example from an advertisement, and sending the converted code as characters/digits by SMS to the preselected telephone number.
SHORT DESCRIPTION OF THE DRAWINGS
The invention will be explained in more detail with reference to the drawing, where
FIG. 1 is a flow diagram describing the overall functioning of the mobile unit in a concrete embodiment according to the invention, FIG. 2 is an illustration of compensation for blurred images,
FIG. 3 is an illustration of image rotation,
FIG. 4 is an illustration of the cleanup effect,
FIG. 5 is an illustration of correction of curvature,
FIG. 6 is an illustration of angular compensation.
DETAILED DESCRIPTION / PREFERRED EMBODIMENT
The mobile unit according to the invention may be configured to have several modes. One of the modes is a combined image capture, text conversion and speech mode, in the following called the ATR/TTS mode, where the abbreviations refer to automated text recognition (ATR) and the text-to-speech (TTS) process. FIG. 1 is a flow diagram illustrating the overall functioning of the mobile unit according to the invention in a concrete embodiment of the invention within the ATR/TTS mode.
The ATR/TTS process is divided into three major phases:
- The Launch Phase,
- The Recognition Phase, and
- The Clean-up and Text-To-Speech Phase
Steps 1A, 1B, 1C and 1D, as illustrated in FIG. 1, belong to the Launch Phase. The ATR/TTS process is launched by one of four potential user events:
- The user has clicked on the camera release whilst the device is in ATR/TTS-mode (step 1A);
- the user has clicked the scalable on-screen release whilst the device is in ATR/TTS-mode (step 1B);
- the user has activated the on-screen release using a voice command whilst the device is in ATR/TTS-mode (step 1C); or
- the user has opened an image from the device store using the image browser (step 1D).
In either case, the ATR/TTS process is launched and will complete when one of the following states is reached:
- The ATR/TTS process has successfully recognised text within the image, has produced a speech file and has played the speech file (step 18); or
- The ATR/TTS process has failed to recognise text within the image and has played a pre-recorded error message back to the user (step 19).
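A minimal control-flow sketch of the whole process follows, with hypothetical stand-ins for the modules described below; only the phase ordering and the two completion states (steps 18 and 19) are taken from the text.

```python
def recognise(image: dict) -> str | None:
    """Recognition Phase stub (steps 2-15): return recognised text or None."""
    return image.get("text")  # pretend the image dict carries its recognised text

def clean_up(text: str) -> str:
    """Clean-up stub (step 16): drop non-printable characters."""
    return "".join(ch for ch in text if ch.isprintable()).strip()

def atr_tts(image: dict) -> None:
    text = recognise(image)
    if text is None:
        print("pre-recorded error message played back (step 19)")
        return
    speech = f"<synthetic speech for {clean_up(text)!r}>"  # text-to-speech (step 17)
    print(speech, "played back, control returned to the user (step 18)")

atr_tts({"text": "Exit 12"})
```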
The user interface used to control the ATR/TTS application is menu-driven. In order to facilitate the use of the menu by dyslexic persons, the menu may be based on images and/or sound indications such that each function in the menu has its own sound, for example a voice message reading the name of function. The user interface furthermore, may comprise icon-based menu items that can be activated by clicking on the display or by using a corresponding set of voice-commands entered into the mobile unit by the user through a microphone. The icon-based menu interface is scalable and may be resized to accommodate user preferences.
The Recognition Phase comprises steps 2-15 and step 19.
Steps 2 → 3: The ATR/TTS application will immediately attempt to recognise text within the image using the OCR (Optical Character Recognition) module (step 2). On success (step 3), the ATR/TTS process will resume at the Clean-up and Text-to-Speech Phase (steps 16 and 17). Otherwise (step 3), the ATR/TTS process will continue at step 4 to improve the image.
Step 4 → 5 → 6: Once the initial recognition has failed, the ATR/TTS process will attempt to improve the image quality using a variety of image manipulation techniques and technologies. First, the image is processed by the simulated autofocus module (step 4), resulting in a clearer image, as also illustrated in FIG. 2. This software routine may, for instance, use high-pass filtering and Fourier transformation in order to make optical edges sharper. In case the number of pixels in the image does not fulfil the requirements for handling by the OCR module, the number of pixels is subsequently increased artificially in order to match the requirements of the OCR module (steps 5 and 6).
Step 7 → 8 → 9 → 10: Once the image quality has been improved, the ATR/TTS process will make another attempt to recognise text within the image (step 7). On success (step 8), the ATR/TTS process will resume at the Clean-up and Text-to-Speech Phase (steps 16 and 17). Otherwise (step 8), the ATR/TTS process will continue through a succession of 90° image rotations (step 9), each time attempting to recognise text within the image (step 7), until the image has been rotated by a total of 90°, 180° and 270° (step 10). This is illustrated in more detail in FIG. 3, where the original image is not correctly oriented and the first rotation results in an image that is upside down.
Step 11: If the simulated autofocus and the increase in image resolution fail to render an image that can be successfully processed, the ATR/TTS process will attempt to increase the contrast between the text and the background, to the extent of dividing the image into two parts using a scaled binary threshold value: (1) text; and (2) everything else. This is illustrated further in FIG. 4, where spots in the image are removed to make the image clearer and increase the contrast. Furthermore, the ATR/TTS process will attempt to compensate for (a) any optical curving, which is illustrated in FIG. 5, and/or (b) any other optical distortions caused as a result of the image not being captured at an angle of 90°, which is illustrated in FIG. 6.
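A sketch of this contrast step: a binary threshold splits the image into text and everything else, and a median filter removes isolated spots as in FIG. 4. Otsu's method is an assumed way of choosing the scaled threshold; the text only specifies "a scaled binary threshold value".

```python
import cv2
import numpy as np

def binarise(gray: np.ndarray) -> np.ndarray:
    # split into text and background using an automatically scaled threshold
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.medianBlur(binary, 3)  # despeckle: small spots are cleaned away
```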
Step 12 → 13 → 14 → 15: Once the image quality has been improved, the ATR/TTS process will make another attempt to recognise text within the image (step 12). On success (step 13), the ATR/TTS process will resume at the Clean-up and Text-to-Speech Phase. Otherwise (step 13), the ATR/TTS process will continue through a succession of 90° image rotations (step 14), each time attempting to recognise text within the image (step 12), until the image has been rotated by a total of 90°, 180° and 270° (step 15).
Step 19: If this does not result in successful recognition, the ATR/TTS process is terminated with an error message (step 19).
The Clean-up and Text-To-Speech Phase comprises steps 16-18.
Step 16: Once text has been recognised in one of the recognition attempts, the text is passed on for clean-up (step 16). The clean-up task will remove non-printable characters and other characters not part of the current language setting; furthermore, the text will be matched against a database with common words and parts of words, and names of locations to increase the quality of the text.
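A sketch of this clean-up task: drop non-printable characters and characters outside the current language setting, then keep the tokens confirmed by the common-word database. The allowed alphabet and the database contents here are assumptions for illustration.

```python
import string

ALLOWED = set(string.ascii_letters + string.digits + " .,-")

def clean_up_text(raw: str, word_db: set[str]) -> str:
    filtered = "".join(ch for ch in raw if ch in ALLOWED)  # strip non-printables etc.
    kept = [w for w in filtered.split() if w.lower().strip(".,-") in word_db]
    return " ".join(kept)

print(clean_up_text("Ex\x07it 12\u00ad", {"exit", "12"}))  # 'Exit 12'
```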
After step 16, the text has been converted from the image format into a text format which can be used in other applications, for example as shown in step 17, where the cleaned-up text is passed on to the text-to-speech engine and the resulting synthetic speech is stored in an audio file.
Step 18: Finally, the audio file is played back and control passed back to the user (step 18). After step 16, the text file may be used in other applications as well, for example for translation into other languages, or for interaction with a route planner in order to display locations on a map, to show routes on a map, or to guide a person around the environment by synthetic voice.

Claims

1. A mobile unit with a computer and a camera, the computer being configured to receive captured images as digital data from the camera and to extract text in the captured images by an optical character recognition (OCR) routine and to convert the text from an image format into a text format, characterised in that the mobile unit further comprises a text database with text words, and wherein the computer is configured to compare the converted text with words in the text database and only to accept the converted text as resembling the imaged text in case of agreement with words in the database.
2. A mobile unit according to claim 1, wherein the mobile unit is configured to check in accordance with pre-programmed rules in computer routines whether the converted text as a whole implies a correct grammar according to these rules and whether the words are correctly related to each other according to these rules.
3. A mobile unit according to any preceding claim, wherein the mobile unit is configured in case of missing acceptance to amend the initially converted text to such a degree that there is achieved congruence to existing words, letters, sequences of words, combined words, and/or parts of sentences from the database in accordance with the pre-programmed rules.
4. A mobile unit according to claim 3, wherein the mobile unit is configured to present the initially converted text and the amended converted text to the user as options among which the user may choose the apparently most correct version.
5. A mobile unit according to any preceding claim, wherein the mobile unit also comprises a database with traffic sign text and geographical names, and where the mobile unit is configured upon specific request from a user of the mobile unit to compare the extracted text from a captured image with traffic sign text or geographical names in the database.
6. A mobile unit according to any preceding claim, wherein the mobile unit is configured upon conversion of the extracted text to request an action from a user of the mobile unit for storing the converted text in a database or data memory in text format.
7. A mobile unit according to claim 6, wherein the mobile unit is configured in case of missing accept of the converted text to request an indication from a user as to whether the converted text is to be stored in the database or data memory.
8. A mobile unit according to any preceding claim, wherein the mobile unit is configured in the case of missing recognition of text characters or in the case that the converted text does not correspond to any text in the text database, to request the capture of another image.
9. A mobile unit according to any preceding claim, wherein the mobile unit is configured to transform the converted text into synthetic speech.
10. A mobile unit according to any preceding claim, wherein the computer comprises a routine for correcting bending distortions of the captured image before text extraction.
11. A mobile unit according to any preceding claim, wherein the computer comprises a routine for checking whether objects obscure text in the captured image, and in the affirmative to replace the obscuring objects in the image by second objects resembling parts of text characters in order to restore the partly obscured text.
12. A mobile unit according to any preceding claim, wherein the computer comprises a routine for automatically rotating the image by a certain angle, for example 30, 45, or 90 degrees, if a first attempt of text extraction fails, and where the routine is configured, if a second attempt after the first rotation fails, to initiate successive rotation of the image until the text is extracted or until a preset maximum rotation has been performed.
13. A mobile unit according to any preceding claim, wherein the mobile unit comprises a route planner program and is configured for using the extracted and converted text in combination with the route planner for finding locations or routes to locations, and wherein the mobile unit is configured for indicating the location or route to the location.
14. A mobile unit according to claim 13, wherein the mobile unit is configured for indicating the location or route to the location by synthetic speech.
15. A mobile unit according to any preceding claim, wherein the mobile unit comprises a route planner program and a GPS signal receiver functionally connected to the route planner program.
16. A mobile unit according to any preceding claim, wherein the mobile unit comprises a telephone.
17. A mobile unit according to claim 16, wherein the mobile unit is configured to send converted text as SMS or IP messages upon indication by a user of the mobile unit.
PCT/DK2007/000020 2006-01-17 2007-01-17 Mobile unit with camera and optical character recognition, optionally for conversion of imaged text into comprehensible speech WO2007082536A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP07700159A EP1979858A1 (en) 2006-01-17 2007-01-17 Mobile unit with camera and optical character recognition, optionally for conversion of imaged text into comprehensible speech

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US75938706P 2006-01-17 2006-01-17
LU91213A LU91213B1 (en) 2006-01-17 2006-01-17 Mobile unit with camera and optical character recognition, optionally for conversion of imaged text into comprehensible speech
LU91213 2006-01-17
US60/759,387 2006-01-17

Publications (1)

Publication Number Publication Date
WO2007082536A1 (en)

Family

ID=39739724

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/DK2007/000020 WO2007082536A1 (en) 2006-01-17 2007-01-17 Mobile unit with camera and optical character recognition, optionally for conversion of imaged text into comprehensible speech

Country Status (2)

Country Link
EP (1) EP1979858A1 (en)
WO (1) WO2007082536A1 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1995015535A1 (en) * 1993-12-01 1995-06-08 Motorola Inc. Combined dictionary based and likely character string method of handwriting recognition
US20020156816A1 (en) * 2001-02-13 2002-10-24 Mark Kantrowitz Method and apparatus for learning from user self-corrections, revisions and modifications

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5235651A (en) * 1991-08-06 1993-08-10 Caere Corporation Rotation of images for optical character recognition
US5859929A (en) * 1995-12-01 1999-01-12 United Parcel Service Of America, Inc. System for character preserving guidelines removal in optically scanned text
US6219453B1 (en) * 1997-08-11 2001-04-17 At&T Corp. Method and apparatus for performing an automatic correction of misrecognized words produced by an optical character recognition technique by using a Hidden Markov Model based algorithm
US20040017482A1 (en) * 2000-11-17 2004-01-29 Jacob Weitman Application for a mobile digital camera, that distinguish between text-, and image-information in an image
US20030164819A1 (en) * 2002-03-04 2003-09-04 Alex Waibel Portable object identification and translation system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FUJISAWA H ET AL: "Information capturing camera and developmental issues", DOCUMENT ANALYSIS AND RECOGNITION, 1999. ICDAR '99. PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON BANGALORE, INDIA 20-22 SEPT. 1999, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 20 September 1999 (1999-09-20), pages 205 - 208, XP010351192, ISBN: 0-7695-0318-7 *
See also references of EP1979858A1 *
SOLON A ET AL: "Design of a tourist driven bandwidth determined multimodal mobile presentation system", MOBILITY AWARE TECHNOLOGIES AND APPLICATIONS. FIRST INTERNATIONAL WORKSHOP, MATA 2004. PROCEEDINGS (LECTURE NOTES IN COMPUTER SCIENCE VOL.3284) SPRINGER-VERLAG BERLIN, GERMANY, 2004, pages 331 - 338, XP002374595, ISBN: 3-540-23423-3 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2189926A1 (en) * 2008-11-21 2010-05-26 beyo GmbH Method for providing camera-based services using a portable communication device of a user and portable communication device of a user
US8988543B2 (en) 2010-04-30 2015-03-24 Nuance Communications, Inc. Camera based method for text input and keyword detection
US9589198B2 (en) 2010-04-30 2017-03-07 Nuance Communications, Inc. Camera based method for text input and keyword detection
EP2410465A1 (en) 2010-07-21 2012-01-25 beyo GmbH Camera based method for mobile communication devices for text detection, recognition and further processing
US9811171B2 (en) 2012-03-06 2017-11-07 Nuance Communications, Inc. Multimodal text input by a keyboard/camera text input module replacing a conventional keyboard text input module on a mobile device
US10078376B2 (en) 2012-03-06 2018-09-18 Cüneyt Göktekin Multimodel text input by a keyboard/camera text input module replacing a conventional keyboard text input module on a mobile device

Also Published As

Publication number Publication date
EP1979858A1 (en) 2008-10-15

Legal Events

121 (EP): The EPO has been informed by WIPO that EP was designated in this application.
NENP: Non-entry into the national phase. Ref country code: DE.
WWE: WIPO information: entry into national phase. Ref document number: 2007700159. Country of ref document: EP.