WO2015025330A1 - A system to enable user to interact with an electronic processing device using voice of the user - Google Patents
- Publication number
- WO2015025330A1 (PCT/IN2014/000499)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- task
- speech
- unit
- processing device
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- the invention also facilitates adding names of new websites to the speech recognition unit grammar, thereby providing the functionality to open the website of the user's choice.
- the user is allowed to add user defined words or phrases to the existing grammar.
- the system provides the additional feature to take dictation for writing in application development systems regardless of which development environment the user uses, on user-specific editors like Eclipse, Visual Studio, Xcode, SQL Server etc.
- This invention is designed to be installed on any Microsoft windows Vista/7/8 onwards, Windows phone 8 onwards, Mac-OS 10.5 onwards, iOS4 onwards and Android-OS 2.2 onwards to the latest versions of all the supported operating systems.
- This application is also equipped with a feature that minimizes extraneous sound. By doing so, the input provided to the speech recognition unit carries pure human voice with only partial residual sound at a much reduced intensity.
- the system includes speech filtering unit to filter out the human speech portion from the speech command.
- the invention is basically divided into 5 different sections:
- the system finds all the claimed folders and indexes them into the database. Once the folders are indexed, the user says the file name, e.g. "open document hello.txt"; the unit recognises the filename said and acquires the file path, after which the system opens the file hello.txt in the default text editor, i.e. Notepad.
- Open file "Name of any file": opens the specified file in its default application.
- Speech recognition unit containing grammars
- Storage Unit i.e. Database for the indexing of files and folders
Abstract
Disclosed is a system to enable user to interact with an electronic processing device using voice of the user, the system being inbuilt in the electronic processing device, the system comprising: a speech input unit to receive, from the user, speech command in respect of task to be performed in the electronic processing device; a storage unit for storing data and instructions, the data includes list of targets, the instructions include list of tasks and description about the tasks; speech recognition unit being connected to the speech input unit and the storage unit, the speech recognition unit configured to receive output of the speech input unit and convert the speech command into best proximal text, the text includes information about the task to be performed and target on which task being performed; a task processing and performing unit being connected to the speech recognition unit and the storage unit, the task processing and performing unit being configured to: receive, from the speech recognition unit, the text equivalent of the speech command of the user, compare the text with data and instructions stored in the storage unit, identify the task and the target from the text on the basis of comparison, and perform the task on the target; and a task output unit being connected to the task processing and performing unit and configured to realize the performance of the task on the target.
Description
A SYSTEM TO ENABLE USER TO INTERACT WITH AN ELECTRONIC PROCESSING DEVICE USING VOICE OF THE USER
Field of the Invention:
The present invention relates to interaction between a human and an electronic device. More specifically, it pertains to interaction between a human and an electronic device using voice commands given by the user. Even more particularly, it relates to a mechanism to enable a human to interact with an electronic device such as a computing device, mobile phone and the like using the human voice.
Prior Art:
Several patents exist on human-machine interaction. Some of them are discussed below for reference.
US Patent No. 8275617 discloses an interactive computer controlled display system with speech command input recognition and visual feedback. Implementations are provided for predetermining a plurality of speech commands for respectively initiating each of a corresponding plurality of system actions, in combination with implementations for providing, for each of said plurality of commands, an associated set of speech terms, each term having relevance to its associated command.
Also included are implementations for detecting speech command and speech terms. The system provides an implementation responsive to a detected speech command for displaying said command, and an implementation responsive to a detected speech term having relevance to one of said commands for displaying the relevant command. The system further comprehends an interactive implementation for selecting a displayed command to thereby initiate a system action; this selecting implementation is preferably through a speech command input. The system preferably displays the basic speech commands simultaneously along with the relevant commands.
US Patent Publication No. 20120232906 discloses that an electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command.
US Patent No. 71257531 discloses a synthesis of automated speech recognition (voice-to-text) technology and a knowledge-based analysis of the concepts and contexts of the free text therefrom, enabling a directed-vocabulary look-up index to be used in conjunction with the speech recognition technology, thus enabling medical dictation to be transcribed in
real time without elaborate training of the dictator or the speech recognition technology.
US Patent No. 8340975 discloses a self-contained wireless interactive speech recognition control device and system that integrates with automated systems and appliances to provide totally hands-free speech control capabilities for a given space. Preferably, each device comprises a programmable microcontroller having embedded speech recognition and audio output capabilities, a microphone, a speaker and a wireless communication system through which a plurality of devices can communicate with each other and with one or more system controllers or automated mechanisms. The device may be enclosed in a stand-alone housing or within a standard electrical wall box. Several devices may be installed in close proximity to one another to ensure hands-free coverage throughout the space. When two or more devices are triggered simultaneously by the same speech command, real time coordination ensures that only one device will respond to the command.
US Patent No. 6088671 discloses a method for use in recognizing continuous speech: signals are accepted corresponding to interspersed speech elements including text elements corresponding to text to be recognized and command elements corresponding to commands to be executed. The elements are recognized. The recognized elements are acted on in a manner which depends on whether they represent text or commands.
US Patent No. 8306819 discloses a speech-to-text conversion system which comprises at least one user terminal for recording speech, at least one automatic speech recognition processor to generate text from
a recorded speech file, and communication means operative to return a corresponding text file to a user, in which said at least one user terminal is remote from said at least one automatic speech recognition processor, and a server is provided remote from said at least one user terminal to control the transfer of recorded speech files to a selected automatic speech recognition processor.
US Patent No. 6173259 discloses a method for developing a voice user interface for a statistical semantic system. A set of semantic meanings is defined that reflects the semantic classification of a user input dialog. Then, a set of speech dialog prompts is automatically developed from an annotated transcription corpus for directing user inputs to corresponding final semantic meanings. The statistical semantic system may be a call-routing application where the semantic meanings are call routing destinations.
US Patent No. 8165887 discloses a synthesis of automated speech recognition (voice-to-text) technology and a knowledge-based analysis of the concepts and contexts of the free text therefrom, enabling a directed-vocabulary look-up index to be used in conjunction with the speech recognition technology, thus enabling medical dictation to be transcribed in real time without elaborate training of the dictator or the speech recognition technology. Thus, caregivers can create and review Computer-Based Patient Records in the necessary timeframe consistent with good patient care. The Computer-Based Patient Records can be linked to other applications such as prescription cross-checking, lab test results, payer regulations, etc.
US Patent No. 6728676 discloses an inventory method that can include assigning a tracking number to each movable item in a set of movable items and speaking the tracking number and a corresponding item description and condition to a speech enabled inventory application in a computing device. The speech enabled inventory application can speech-to-text convert the spoken tracking number and corresponding item description and condition. Moreover, the speech enabled inventory application can provide at least one statistically alternate recognized word for each of the spoken tracking number and corresponding item description and condition. One tracking number, one item description and one item condition can be selected from among the spoken and alternate tracking numbers, the spoken and alternate item descriptions, and the spoken and alternative item conditions. Finally, the speech enabled inventory application can store the selected tracking number, item description and condition in an inventory database.
The above mentioned mechanisms provide some sort of human-machine interaction. However, these mechanisms suffer from at least one of the following deficiencies:
i. They have limited interactive capabilities: they can only convert a voice command into text, but cannot, for example, move a file/folder from one location to another.
ii. They are not efficient enough to operate an electronic device on the basis of voice commands.
iii. Accuracy in speech recognition is average, as external noise is input to the speech recognition mechanism along with the voice command. There is no provision to remove such external noise.
iv. They cannot support dictation of programming languages.
Therefore, there is a need for a new and innovative mechanism that can improve human-machine interaction using the voice commands of the user.
Object of the invention:
The main objective of the present invention is to provide a system to perform various operations on the electronic processing device, including searching and opening files/folders and launching user-specified websites.
Another object of the invention is to provide a system to enable the user to interact with an electronic processing device using the user's voice, which system has optimum accuracy in speech recognition.
Still another object of the invention is to provide a system to enable the user to interact with an electronic processing device using the user's voice, which system can support dictation for writing in application development systems.
Yet another object of the invention is to provide a system to enable the user to interact with an electronic processing device using the user's voice, which system is flexible enough to allow any addition, deletion or modification to the existing system.
Yet another object of the invention is to provide a system to enable the user to interact with an electronic processing device using the user's voice, which system is user friendly and supports all operating systems.
Statement of the Invention:
In order to achieve the above discussed objects, the present invention provides a system to enable the user to interact with an electronic processing device using the user's voice, the system being inbuilt in the electronic processing device, the system comprising: a speech input unit to receive, from the user, a speech command in respect of a task to be performed in the electronic processing device; a storage unit for storing data and instructions, the data including a list of targets, the instructions including a list of tasks and descriptions of the tasks; a speech recognition unit connected to the speech input unit and the storage unit, the speech recognition unit being configured to receive the output of the speech input unit and convert the speech command into the best proximal text, the text including information about the task to be performed and the target on which the task is to be performed; a task processing and performing unit connected to the speech recognition unit and the storage unit, the task processing and performing unit being configured to:
- receive, from the speech recognition unit, the text equivalent of the speech command of the user,
- compare the text with data and instructions stored in the storage unit,
- identify the task and the target from the text on the basis of comparison, and
- perform the task on the target; and a task output unit being connected to the task processing and performing unit and configured to realize the performance of the task on the target.
Brief description of drawing:
Fig 1: Shows a block diagram of the system to enable the user to interact with an electronic processing device using the user's voice in accordance with the present invention.
Fig 2: Shows an exploded process block diagram of the system to enable the user to interact with an electronic processing device using the user's voice in accordance with the present invention.
Detailed Description of the invention:
The invention and its further features and advantages are explained in more detail on the basis of the exemplary embodiments schematically represented in the figures.
Referring to figure 1, the system to enable the user to interact with an electronic processing device using the user's voice in accordance with the present invention is illustrated. The system is inbuilt in the electronic processing device. The system includes a speech input unit, a storage unit, a speech recognition unit, a task processing and performing unit and a task output unit.
The speech input unit receives, from the user, a speech command in respect of the task to be performed in the electronic processing device. The speech input unit may be a microphone or a similar device.
The storage unit, i.e. memory, stores data and instructions. The data includes a list of targets; the instructions include a list of tasks and descriptions of the tasks. The target includes files, folders and web addresses. The files are mainly, but not limited to, audio, video, audio-visual, image or text files. The task includes at least any of the following activities on the target: opening, copying, moving, modifying, searching and deleting. The invention is not limited to these activities only and may be extended to other activities also.
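The storage unit's organisation described above can be sketched as plain data structures. This is an illustrative model only; the names, fields and sample entries are hypothetical, not taken from the patent.

```python
# Hypothetical sketch of the storage unit's contents: a list of targets
# (files, folders and web addresses) and a list of tasks with a short
# description of each. All names and paths are invented for illustration.

# Data: the list of targets, each with a type, a spoken name and a path.
targets = [
    {"type": "file", "name": "hello.txt", "path": "d:/docs/hello.txt"},
    {"type": "folder", "name": "docs", "path": "d:/docs"},
    {"type": "web", "name": "example", "path": "http://example.com"},
]

# Instructions: the list of tasks and a description of each task.
tasks = {
    "open": "open the target in its default application",
    "copy": "copy the target to a destination",
    "move": "move the target to a destination",
    "modify": "modify the target",
    "search": "search for the target",
    "delete": "delete the target",
}

def find_target(name):
    """Return the first stored target whose spoken name matches, else None."""
    for target in targets:
        if target["name"] == name:
            return target
    return None
```

A lookup such as `find_target("hello.txt")` would return the file entry with its stored path, which is the information the task processing unit needs.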
The speech recognition unit is connected to the speech input unit and the storage unit. The speech recognition unit is configured to receive the output of the speech input unit and convert the speech command into the best proximal text. The text includes information about the task to be performed and the target on which the task is to be performed.
The task processing and performing unit is connected to the speech recognition unit and the storage unit. The task processing and performing unit is configured to:
- receive, from the speech recognition unit, the text equivalent of the speech command of the user,
- compare the text with data and instructions stored in the storage unit,
- identify the task and the target from the text on the basis of comparison, and
- perform the task on the target.
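The four steps above can be sketched as a small function. This is a hedged illustration only: the task and target vocabularies, and the matching by simple word lookup, are assumptions standing in for the actual comparison against the storage unit.

```python
# Illustrative sketch of the task processing and performing unit's steps:
# receive the recognized text, compare it with stored data, identify the
# task and the target, and perform the task. Vocabularies are hypothetical.

TASKS = {"open", "copy", "move", "modify", "search", "delete"}
TARGETS = {"summer of 69": "d:/abc/Bryan Adams/Summer of 69.mp3"}

def process_command(text, perform):
    """Identify a task and target in `text`, then call perform(task, path)."""
    lowered = text.lower()
    # Compare the text with the stored task list (first matching word wins).
    task = next((word for word in lowered.split() if word in TASKS), None)
    # Compare the text with the stored target list.
    target = next((name for name in TARGETS if name in lowered), None)
    if task is None or target is None:
        return None  # nothing recognizable: no operation is performed
    # Perform the identified task on the resolved target path.
    return perform(task, TARGETS[target])

result = process_command("open summer of 69", lambda task, path: (task, path))
```

Here the `perform` callback stands in for the task output unit, which would actually carry out the operation on the device.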
The task processing and performing unit is further configured to add new tasks and targets in the storage unit and take dictation compatible with all programming languages.
The task output unit is connected to the task processing and performing unit and is configured to realize the performance of the task on the target.
Here the electronic processing device may include a user computer, laptop, smartphone or similar computing device. However, the invention is by no means limited to said electronic processing devices only. The disclosed invention is useful in executing operations, through human speech recognition, in Microsoft Windows XP/Vista/7/8 onwards, Windows Phone 8 onwards, Mac OS 10.5 onwards, iOS 4 onwards and Android OS 2.2 onwards, up to the latest versions of all the supported operating systems.
The invention provides the user the ability to operate the computer/smart-phone by giving vocal commands. It includes an enhanced feature to search and open any file or folder stored on the user's hard drive/storage memory. When a human voice command is given, the speech recognition unit scans the storage unit, which holds the list of files supported by the application. Each entry in this list consists of two items: the filename and its path. The list is generated by scanning the computer's/smart-phone's hard drive/storage memory for supported file extensions and recording the file names and their paths; the list also contains all folder names and their paths. To improve the results of speech recognition, this list is provided as a grammar to the speech recognition unit. When the user gives a speech command, the speech recognition unit scans the grammar for the desired result; when it finds that result, the human speech is converted to text. By examining that text, the system performs the operations requested by the user.
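The scan described above — walking the storage for supported extensions and recording filenames, folder names, and their paths — might look like the following sketch. It is a minimal illustration using Python's standard library; the function name and the dictionary-based index are assumptions, not part of the patent:

```python
import os

def build_index(root, extensions):
    """Scan a directory tree and collect (name -> path) maps for files whose
    extension is in the supported set, plus every folder name and its path."""
    files, folders = {}, {}
    for dirpath, dirnames, filenames in os.walk(root):
        for d in dirnames:
            folders[d] = os.path.join(dirpath, d)
        for f in filenames:
            stem, ext = os.path.splitext(f)
            if ext.lower() in extensions:
                # Index by the spoken name (filename without extension).
                files[stem] = os.path.join(dirpath, f)
    return files, folders
```

The `files` map keyed by the extension-free name is what would be handed to the speech recognition unit as a grammar, so a spoken "Summer of 69" resolves directly to its full path.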
Example 1:-
On a Windows 7 OS computer, the user wants to open a file called "Summer of 69.mp3" located at "d:/abc/Bryan Adams/Summer of 69.mp3". The user simply says "play Summer of 69", and the file gets opened in the default application set for that extension.
Initially, when the system is loading, the list of filenames is provided to the speech recognition unit. After the speech command is input, the speech recognition unit converts the speech into the best-suited text present in the grammar. That text is then analysed and examined against the list containing the filenames. The result of the examination is the actual name of the file the user wants to open, so the filename is now known. The only task remaining is to find the path for that filename and open the file using that path.
Example 2:-
Consider that, on a Windows 7 OS machine, the user wants to move a file called "Summer of 69.mp3", located at "d:/abc/Bryan Adams/Summer of 69.mp3", to the location "e:/xyz/Old Classics". All the user has to do is say "move Summer of 69 to Old Classics", and the file gets moved to "e:/xyz/Old Classics/Summer of 69.mp3".
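A "move X to Y" command like the one in Example 2 could be handled roughly as follows. This is a sketch under stated assumptions: the command grammar, the `files`/`folders` index maps, and the function name are all illustrative, not taken from the patent:

```python
import os
import re
import shutil

def handle_move(command, files, folders):
    """Parse 'move <file> to <folder>' and relocate the indexed file,
    keeping the filename index up to date afterwards."""
    m = re.match(r"move (.+) to (.+)", command, re.IGNORECASE)
    if not m:
        return None
    name, dest = m.group(1).strip(), m.group(2).strip()
    src = files[name]            # e.g. "d:/abc/Bryan Adams/Summer of 69.mp3"
    target_dir = folders[dest]   # e.g. "e:/xyz/Old Classics"
    new_path = shutil.move(src, target_dir)
    files[name] = new_path       # index must reflect the new location
    return new_path
```

With `files = {"Summer of 69": ...}` and `folders = {"Old Classics": ...}`, saying "move Summer of 69 to Old Classics" resolves both names through the index before the file is touched.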
The above examples briefly explain the basic functioning of the system. By making changes specific to an operating system, the system will work on other operating systems as well.
The system/device that the user intends to operate using voice commands receives the human voice input through a microphone. The human voice stream is then converted to text by the Microsoft Speech SDK. The speech recognition unit recognizes the text against the loaded grammars. Once the words are recognized, the respective operations are performed.
The invention also facilitates adding names of new websites to the speech recognition unit's grammar, thereby providing the functionality to open a website of the user's choice. The user is allowed to add user-defined words or phrases to the existing grammar. The system provides the additional feature of taking dictation for writing application code, regardless of which development environment the user works in, on editors such as Eclipse, Visual Studio, Xcode, SQL Server etc.
This invention is designed to be installed on any Microsoft Windows Vista/7/8 onwards, Windows Phone 8 onwards, Mac-OS 10.5 onwards, iOS4 onwards and Android-OS 2.2 onwards, up to the latest versions of all the supported operating systems.
This application is also equipped with a new feature that minimizes extraneous sound. By doing so, the input provided to the speech recognition unit carries pure human voice and only partially some other sound, at a much reduced intensity.
When the user gives a speech command, there is always a possibility of extraneous sound in the surroundings, so the full input to the speech recognition unit is human voice plus extraneous sound. Applying a frequency high-pass and low-pass filter to the microphone input, set so that only the human voice band passes, yields nearly pure human voice with minimal extraneous sound. This enables the speech recognition unit to achieve better speech recognition accuracy. For this purpose, the system includes a speech filtering unit to filter out the human speech portion from the speech command.
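The high-pass/low-pass pairing described above amounts to a band-pass filter around the speech band. The sketch below is a crude pure-Python illustration, not the patent's implementation: the cutoff frequencies (roughly 300–3400 Hz, the conventional telephone speech band) and the simple one-pole filter stages are assumptions chosen for clarity:

```python
import math

def band_pass(samples, rate, low_hz=300.0, high_hz=3400.0):
    """Crude speech band-pass: a one-pole high-pass stage removes rumble
    below low_hz, then a one-pole low-pass stage removes hiss above high_hz."""
    dt = 1.0 / rate
    # High-pass stage: y[i] = a * (y[i-1] + x[i] - x[i-1])
    rc = 1.0 / (2 * math.pi * low_hz)
    alpha = rc / (rc + dt)
    hp, prev_in, prev_out = [], samples[0], 0.0
    for x in samples:
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        hp.append(prev_out)
    # Low-pass stage: y[i] = y[i-1] + b * (x[i] - y[i-1])
    rc = 1.0 / (2 * math.pi * high_hz)
    beta = dt / (rc + dt)
    out, y = [], 0.0
    for x in hp:
        y += beta * (x - y)
        out.append(y)
    return out
```

Run over the microphone samples before recognition, this attenuates low-frequency rumble (fans, traffic) and high-frequency hiss while passing the voiced band largely intact.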
The invention is basically divided into 5 different sections:
- Sound filtering
- Handling of the operating system
- Opening, moving, adding, deleting and modifying files and folders
- Normal dictation
- Programming dictation
Further enhancements can be made independently on any or all of these sections.
Example 3:
When the user says "update directory information", the system finds the relevant folders and indexes them into the database. Once the folders are indexed, the user says the file name, e.g. "open document hello.txt"; the unit recognises the filename said, acquires the file path, and the system then opens the file hello.txt in the default text editor, i.e. Notepad.
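Example 3's two steps — indexing on "update directory information" and resolving a spoken filename to its path — can be sketched with a small SQLite-backed index. The schema and function names here are illustrative assumptions using only Python's standard library, not the patent's actual storage-unit design:

```python
import os
import sqlite3

def update_directory_information(db, root):
    """'Update directory information': (re)index every file under root so a
    later spoken filename can be resolved to its full path."""
    db.execute("CREATE TABLE IF NOT EXISTS files (name TEXT, path TEXT)")
    db.execute("DELETE FROM files")  # rebuild the index from scratch
    for dirpath, _dirnames, filenames in os.walk(root):
        for f in filenames:
            db.execute("INSERT INTO files VALUES (?, ?)",
                       (f, os.path.join(dirpath, f)))
    db.commit()

def resolve(db, name):
    """Look up the full path for a recognized filename, or None."""
    row = db.execute("SELECT path FROM files WHERE name = ?",
                     (name,)).fetchone()
    return row[0] if row else None
```

After `update_directory_information(db, root)`, a recognized "open document hello.txt" reduces to `resolve(db, "hello.txt")` followed by launching the default editor on the returned path.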
Following are examples of some of the tasks to be performed by human speech instructions:

Command | Action
---|---
Open website "Name of any website" | Opens the specified website in the default web browser
Modify (Update) Directory Information | Updates the list of directories stored on the hard drive/storage memory
Modify (Update) Files Information | Updates the list of files stored on the hard drive/storage memory
Modify (Update) Websites Information | Updates the list of websites
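Commands like those in the table above lend themselves to a simple phrase-prefix dispatcher. This is a minimal sketch; the handler registry and prefix-matching rule are assumptions for illustration, not the patent's mechanism:

```python
def dispatch(command, handlers):
    """Match a recognized command against registered phrase prefixes and run
    the corresponding handler, passing along any trailing argument text."""
    for prefix, handler in handlers.items():
        if command.lower().startswith(prefix):
            return handler(command[len(prefix):].strip())
    return None  # no registered command matched
```

A registry such as `{"open website": open_site, "modify (update) directory information": reindex}` then routes "Open website example.com" to `open_site("example.com")`.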
Now referring to figure 2, the detailed process block diagram associated with the system of the present invention is illustrated. The details of the blocks are as follows:
1. Starts or stops the speech input unit e.g. microphone
2. Microphone to capture the human voice input
3. Speech recognition unit containing grammars
4. Storage Unit i.e. Database for the indexing of files and folders
5. Selecting the stated modes of operation
6. Operating and managing the operating system
7. Normal Dictation or programming dictation input
8. Searching files/folders and opening files/folders, websites.
9. Perform the user requested operation
10. Write the text
11. Open the requested file/folder/web-site
12. Wait for next human voice command
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by those skilled in the relevant art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.
Claims
1. A system to enable user to interact with an electronic processing device using voice of the user, the system being inbuilt in the electronic processing device, the system comprising: a speech input unit to receive, from the user, speech command in respect of task to be performed in the electronic processing device; a storage unit for storing data and instructions, the data includes list of targets, the instructions include list of tasks and description about the tasks; speech recognition unit being connected to the speech input unit and the storage unit, the speech recognition unit configured to receive output of the speech input unit and convert the speech command into best proximal text, the text includes information about the task to be performed and target on which task being performed; a task processing and performing unit being connected to the speech recognition unit and the storage unit, the task processing and performing unit being configured to:
- receive, from the speech recognition unit, the text equivalent of the speech command of the user,
- compare the text with data and instructions stored in the storage unit,
- identify the task and the target from the text on the basis of comparison, and
- perform the task on the target; and a task output unit being connected to the task processing and performing unit and configured to realize the performance of the task on the target.
2. The system to enable user to interact with an electronic processing device using voice of the user as claimed in claim 1 wherein target includes files, folder and web address.
3. The system to enable user to interact with an electronic processing device using voice of the user as claimed in claim 2 wherein files are audio, video, audio-visual, image or text related.
4. The system to enable user to interact with an electronic processing device using voice of the user as claimed in any of the claims 1 to 3 wherein task includes at least any of the following activities on the target:
- opening,
- copying,
- moving,
- modifying,
- searching and
- deleting.
5. The system to enable user to interact with an electronic processing device using voice of the user as claimed in any of the claims 1 to 4
wherein the task processing and performing unit is further configured to add new tasks and targets in the storage unit.
6. The system to enable user to interact with an electronic processing device using voice of the user as claimed in any of the claims 1 to 5 wherein the task processing and performing unit is further configured to take dictation compatible with all programming languages.
7. The system to enable user to interact with an electronic processing device using voice of the user as claimed in any of the claims 1 to 6 further includes speech filtering unit to filter out the human speech portion from the speech command.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN2727MU2013 | 2013-08-21 | ||
IN2727/MUM/2013 | 2013-08-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015025330A1 true WO2015025330A1 (en) | 2015-02-26 |
Family
ID=51743525
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IN2014/000499 WO2015025330A1 (en) | 2013-08-21 | 2014-07-28 | A system to enable user to interact with an electronic processing device using voice of the user |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2015025330A1 (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6088671A (en) | 1995-11-13 | 2000-07-11 | Dragon Systems | Continuous speech recognition of text and commands |
US6173259B1 (en) | 1997-03-27 | 2001-01-09 | Speech Machines Plc | Speech to text conversion |
US20040044516A1 (en) * | 2002-06-03 | 2004-03-04 | Kennewick Robert A. | Systems and methods for responding to natural language speech utterance |
US6728676B1 (en) | 2000-10-19 | 2004-04-27 | International Business Machines Corporation | Using speech recognition to improve efficiency of an inventory task |
US6871179B1 (en) * | 1999-07-07 | 2005-03-22 | International Business Machines Corporation | Method and apparatus for executing voice commands having dictation as a parameter |
US20060206339A1 (en) * | 2005-03-11 | 2006-09-14 | Silvera Marja M | System and method for voice-enabled media content selection on mobile devices |
US7257531B2 (en) | 2002-04-19 | 2007-08-14 | Medcom Information Systems, Inc. | Speech to text system using controlled vocabulary indices |
US20070233725A1 (en) * | 2006-04-04 | 2007-10-04 | Johnson Controls Technology Company | Text to grammar enhancements for media files |
US20100169098A1 (en) * | 2007-05-17 | 2010-07-01 | Kimberly Patch | System and method of a list commands utility for a speech recognition command system |
US8165887B2 (en) | 2008-12-08 | 2012-04-24 | Nuance Communications, Inc. | Data-driven voice user interface |
US20120232906A1 (en) | 2008-10-02 | 2012-09-13 | Lindahl Aram M | Electronic Devices with Voice Command and Contextual Data Processing Capabilities |
US8275617B1 (en) | 1998-12-17 | 2012-09-25 | Nuance Communications, Inc. | Speech command input recognition system for interactive computer display with interpretation of ancillary relevant speech query terms into commands |
US8306819B2 (en) | 2009-03-09 | 2012-11-06 | Microsoft Corporation | Enhanced automatic speech recognition using mapping between unsupervised and supervised speech model parameters trained on same acoustic training data |
US8340975B1 (en) | 2011-10-04 | 2012-12-25 | Theodore Alfred Rosenberger | Interactive speech recognition device and system for hands-free building control |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10460735B2 (en) | 2014-07-18 | 2019-10-29 | Google Llc | Speaker verification using co-location information |
US9792914B2 (en) | 2014-07-18 | 2017-10-17 | Google Inc. | Speaker verification using co-location information |
US11942095B2 (en) | 2014-07-18 | 2024-03-26 | Google Llc | Speaker verification using co-location information |
US10147429B2 (en) | 2014-07-18 | 2018-12-04 | Google Llc | Speaker verification using co-location information |
US10986498B2 (en) | 2014-07-18 | 2021-04-20 | Google Llc | Speaker verification using co-location information |
US10134398B2 (en) | 2014-10-09 | 2018-11-20 | Google Llc | Hotword detection on multiple devices |
US11915706B2 (en) | 2014-10-09 | 2024-02-27 | Google Llc | Hotword detection on multiple devices |
US11557299B2 (en) | 2014-10-09 | 2023-01-17 | Google Llc | Hotword detection on multiple devices |
US10909987B2 (en) | 2014-10-09 | 2021-02-02 | Google Llc | Hotword detection on multiple devices |
US10593330B2 (en) | 2014-10-09 | 2020-03-17 | Google Llc | Hotword detection on multiple devices |
US10255920B2 (en) | 2016-02-24 | 2019-04-09 | Google Llc | Methods and systems for detecting and processing speech signals |
US10878820B2 (en) | 2016-02-24 | 2020-12-29 | Google Llc | Methods and systems for detecting and processing speech signals |
US11568874B2 (en) | 2016-02-24 | 2023-01-31 | Google Llc | Methods and systems for detecting and processing speech signals |
US10163443B2 (en) | 2016-02-24 | 2018-12-25 | Google Llc | Methods and systems for detecting and processing speech signals |
US9779735B2 (en) | 2016-02-24 | 2017-10-03 | Google Inc. | Methods and systems for detecting and processing speech signals |
US10163442B2 (en) | 2016-02-24 | 2018-12-25 | Google Llc | Methods and systems for detecting and processing speech signals |
US10249303B2 (en) | 2016-02-24 | 2019-04-02 | Google Llc | Methods and systems for detecting and processing speech signals |
US11887603B2 (en) | 2016-08-24 | 2024-01-30 | Google Llc | Hotword detection on multiple devices |
US11276406B2 (en) | 2016-08-24 | 2022-03-15 | Google Llc | Hotword detection on multiple devices |
US10714093B2 (en) | 2016-08-24 | 2020-07-14 | Google Llc | Hotword detection on multiple devices |
US10242676B2 (en) | 2016-08-24 | 2019-03-26 | Google Llc | Hotword detection on multiple devices |
US9972320B2 (en) | 2016-08-24 | 2018-05-15 | Google Llc | Hotword detection on multiple devices |
US10867600B2 (en) | 2016-11-07 | 2020-12-15 | Google Llc | Recorded media hotword trigger suppression |
US11798557B2 (en) | 2016-11-07 | 2023-10-24 | Google Llc | Recorded media hotword trigger suppression |
US11257498B2 (en) | 2016-11-07 | 2022-02-22 | Google Llc | Recorded media hotword trigger suppression |
US11087743B2 (en) | 2017-04-20 | 2021-08-10 | Google Llc | Multi-user authentication on a device |
US11727918B2 (en) | 2017-04-20 | 2023-08-15 | Google Llc | Multi-user authentication on a device |
US10522137B2 (en) | 2017-04-20 | 2019-12-31 | Google Llc | Multi-user authentication on a device |
US10497364B2 (en) | 2017-04-20 | 2019-12-03 | Google Llc | Multi-user authentication on a device |
US11238848B2 (en) | 2017-04-20 | 2022-02-01 | Google Llc | Multi-user authentication on a device |
US11721326B2 (en) | 2017-04-20 | 2023-08-08 | Google Llc | Multi-user authentication on a device |
US11244674B2 (en) | 2017-06-05 | 2022-02-08 | Google Llc | Recorded media HOTWORD trigger suppression |
US10395650B2 (en) | 2017-06-05 | 2019-08-27 | Google Llc | Recorded media hotword trigger suppression |
US11373652B2 (en) | 2018-05-22 | 2022-06-28 | Google Llc | Hotword suppression |
US10692496B2 (en) | 2018-05-22 | 2020-06-23 | Google Llc | Hotword suppression |
US11676608B2 (en) | 2021-04-02 | 2023-06-13 | Google Llc | Speaker verification using co-location information |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2015025330A1 (en) | A system to enable user to interact with an electronic processing device using voice of the user | |
US10169329B2 (en) | Exemplar-based natural language processing | |
JP6204982B2 (en) | Contextual query tuning using natural motion input | |
JP3962763B2 (en) | Dialogue support device | |
EP3528243A1 (en) | System for processing user utterance and controlling method thereof | |
KR20180121097A (en) | Voice data processing method and electronic device supporting the same | |
US20140019128A1 (en) | Voice Based System and Method for Data Input | |
KR20180117485A (en) | Electronic device for processing user utterance and method for operation thereof | |
US10664229B2 (en) | Search-based dynamic voice activation | |
US20160372110A1 (en) | Adapting voice input processing based on voice input characteristics | |
US20220148572A1 (en) | Server supported recognition of wake phrases | |
US11580969B2 (en) | Artificial intelligence device and method of operating artificial intelligence device | |
US10586528B2 (en) | Domain-specific speech recognizers in a digital medium environment | |
CN111954864A (en) | Automated presentation control | |
US20220148576A1 (en) | Electronic device and control method | |
KR102630662B1 (en) | Method for Executing Applications and The electronic device supporting the same | |
KR20180109465A (en) | Electronic device and method for screen controlling for processing user input using the same | |
US10565317B1 (en) | Apparatus for improving responses of automated conversational agents via determination and updating of intent | |
JP6069157B2 (en) | Information processing apparatus, control method, and program | |
CN110827822A (en) | Intelligent voice interaction method and device, travel terminal, equipment and medium | |
CN103631784B (en) | Page content retrieval method and system | |
US11620328B2 (en) | Speech to media translation | |
US20210055909A1 (en) | Systems and methods for voice activated interface | |
US20220301549A1 (en) | Electronic device and method for providing voice recognition service | |
CN108255917A (en) | Image management method, equipment and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14786367 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 14786367 Country of ref document: EP Kind code of ref document: A1 |