WO2015025330A1 - A system to enable user to interact with an electronic processing device using voice of the user - Google Patents


Info

Publication number
WO2015025330A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
task
speech
unit
processing device
Prior art date
Application number
PCT/IN2014/000499
Other languages
French (fr)
Inventor
Aaditya Kishore KALE
Aditya S. DESHPANDE
Original Assignee
Kale Aaditya Kishore
Priority date
Filing date
Publication date
Application filed by Kale Aaditya Kishore filed Critical Kale Aaditya Kishore
Publication of WO2015025330A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Definitions

  • the system/device which the user intends to operate using voice commands receives the human voice inputted through a microphone.
  • the human voice stream is then converted to text by the Microsoft Speech SDK.
  • the speech recognition unit recognizes the text using the loaded grammars. Once the words are recognized, the respective operations are performed.
  • the invention also facilitates adding the names of new websites to the speech recognition unit's grammar, thereby providing the functionality to open a website of the user's choice.
  • the user is allowed to add user-defined words or phrases to the existing grammar.
  • the system provides the additional feature of taking dictation for writing in application development systems, regardless of which development environment the user uses, on user-specific editors like Eclipse, Visual Studio, Xcode, SQL Server, etc.
  • this invention is designed to be installed on Microsoft Windows Vista/7/8 onwards, Windows Phone 8 onwards, Mac OS 10.5 onwards, iOS 4 onwards and Android OS 2.2 onwards, up to the latest versions of all the supported operating systems.
  • this application is also equipped with a new feature that minimizes extraneous sound. By doing so, the input provided to the speech recognition unit carries pure human voice and, partially, some other sound at a very reduced intensity.
  • the system includes a speech filtering unit to filter out the human speech portion from the speech command.
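The patent does not detail the filtering algorithm. One minimal interpretation of "some other sound but at very reduced intensity" is an amplitude noise gate that attenuates quiet (likely background) samples while passing the louder voice signal unchanged; the threshold and attenuation values below are illustrative assumptions, not figures from the patent.

```python
def noise_gate(samples, threshold=0.1, attenuation=0.05):
    """Pass loud (likely speech) samples through unchanged; attenuate
    quiet samples so background noise survives only at reduced intensity."""
    return [s if abs(s) >= threshold else s * attenuation for s in samples]

# Loud speech samples pass through; the quiet background tail is reduced
# to 5% of its original amplitude.
filtered = noise_gate([0.5, -0.4, 0.02, -0.01])
```

A real implementation would operate on audio frames and likely use spectral methods, but the gate captures the described behaviour in a few lines.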
  • the invention is basically divided into five different sections:
  • the system finds all the claimed folders and indexes them into the database. Once the folders are indexed, the user then says the file name, e.g. "open document hello.txt"; the unit recognizes the filename said and acquires the file path, after which the system opens the file hello.txt in the default text editor, i.e. Notepad.
  • Open file "Name of any file": opens the specified file in its default application.
  • Speech recognition unit containing grammars
  • Storage unit, i.e. database for the indexing of files and folders

Abstract

Disclosed is a system to enable user to interact with an electronic processing device using voice of the user, the system being inbuilt in the electronic processing device, the system comprising: a speech input unit to receive, from the user, speech command in respect of task to be performed in the electronic processing device; a storage unit for storing data and instructions, the data includes list of targets, the instructions include list of tasks and description about the tasks; speech recognition unit being connected to the speech input unit and the storage unit, the speech recognition unit configured to receive output of the speech input unit and convert the speech command into best proximal text, the text includes information about the task to be performed and target on which task being performed; a task processing and performing unit being connected to the speech recognition unit and the storage unit, the task processing and performing unit being configured to: receive, from the speech recognition unit, the text equivalent of the speech command of the user, compare the text with data and instructions stored in the storage unit, identify the task and the target from the text on the basis of comparison, and perform the task on the target; and a task output unit being connected to the task processing and performing unit and configured to realize the performance of the task on the target.

Description

A SYSTEM TO ENABLE USER TO INTERACT WITH AN ELECTRONIC PROCESSING DEVICE USING VOICE OF THE USER
Field of the Invention:
The present invention relates to interaction between humans and electronic devices. More specifically, it pertains to interaction between a human and an electronic device using voice commands given by the user. Even more particularly, it relates to a mechanism enabling a human to interact with an electronic device, such as a computing device or a mobile phone, using the human voice.
Prior Art:
Several patents exist on human-machine interaction. Some of them are discussed herein below for reference.
US Patent No. 8275617 discloses an interactive computer-controlled display system with speech command input recognition and visual feedback. Implementations are provided for predetermining a plurality of speech commands for respectively initiating each of a corresponding plurality of system actions, in combination with implementations for providing, for each of said plurality of commands, an associated set of speech terms, each term having relevance to its associated command.
Also included are implementations for detecting speech commands and speech terms. The system provides an implementation responsive to a detected speech command for displaying said command, and an implementation responsive to a detected speech term having relevance to one of said commands for displaying the relevant command. The system further comprehends an interactive implementation for selecting a displayed command to thereby initiate a system action; this selecting implementation is preferably through a speech command input. The system preferably displays the basic speech commands simultaneously along with the relevant commands.
US Patent Publication No. 20120232906 discloses an electronic device that may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command.
US Patent No. 71257531 discloses a synthesis of automated speech recognition (voice-to-text) technology and a knowledge-based analysis of the concepts and contexts of the free text therefrom, enabling a directed-vocabulary look-up index to be used in conjunction with the speech recognition technology, thus enabling medical dictation to be transcribed in real time without elaborate training of the dictator or the speech recognition technology.
US Patent No. 8340975 discloses a self-contained wireless interactive speech recognition control device and system that integrates with automated systems and appliances to provide totally hands-free speech control capabilities for a given space. Preferably, each device comprises a programmable microcontroller having embedded speech recognition and audio output capabilities, a microphone, a speaker and a wireless communication system through which a plurality of devices can communicate with each other and with one or more system controllers or automated mechanisms. The device may be enclosed in a stand-alone housing or within a standard electrical wall box. Several devices may be installed in close proximity to one another to ensure hands-free coverage throughout the space. When two or more devices are triggered simultaneously by the same speech command, real-time coordination ensures that only one device will respond to the command.
US Patent No. 6088671 discloses a method for use in recognizing continuous speech, in which signals are accepted corresponding to interspersed speech elements, including text elements corresponding to text to be recognized and command elements corresponding to commands to be executed. The elements are recognized and then acted on in a manner which depends on whether they represent text or commands.
US Patent No. 8306819 discloses a speech-to-text conversion system which comprises at least one user terminal for recording speech, at least one automatic speech recognition processor to generate text from a recorded speech file, and communication means operative to return a corresponding text file to a user, in which said at least one user terminal is remote from said at least one automatic speech recognition processor, and a server is provided remote from said at least one user terminal to control the transfer of recorded speech files to a selected automatic speech recognition processor.
US Patent No. 6173259 discloses a method for developing a voice user interface for a statistical semantic system. A set of semantic meanings is defined that reflects the semantic classification of a user input dialog. Then, a set of speech dialog prompts is automatically developed from an annotated transcription corpus for directing user inputs to corresponding final semantic meanings. The statistical semantic system may be a call-routing application where the semantic meanings are call-routing destinations.
US Patent No. 8165887 discloses a synthesis of automated speech recognition (voice-to-text) technology and a knowledge-based analysis of the concepts and contexts of the free text therefrom, enabling a directed-vocabulary look-up index to be used in conjunction with the speech recognition technology, thus enabling medical dictation to be transcribed in real time without elaborate training of the dictator or the speech recognition technology. Thus, caregivers can create and review Computer-Based Patient Records in the necessary timeframe consistent with good patient care. The Computer-Based Patient Records can be linked to other applications such as prescription cross-checking, lab test results, payer regulations, etc.
US Patent No. 6728676 discloses an inventory method that can include assigning a tracking number to each movable item in a set of movable items and speaking the tracking number and a corresponding item description and condition to a speech-enabled inventory application in a computing device. The speech-enabled inventory application can speech-to-text convert the spoken tracking number and corresponding item description and condition. Moreover, the speech-enabled inventory application can provide at least one statistically alternate recognized word for each of the spoken tracking number and corresponding item description and condition. One tracking number, one item description and one item condition can be selected from among the spoken and alternate tracking numbers, the spoken and alternate item descriptions, and the spoken and alternate item conditions. Finally, the speech-enabled inventory application can store the selected tracking number, item description and condition in an inventory database.
The above-mentioned mechanisms provide some sort of human-machine interaction. However, these mechanisms suffer from at least one of the following deficiencies:
i. They have limited interactive capabilities: they can only convert a voice command into text, but cannot, for example, move a file/folder from one location to another.
ii. They are not efficient enough to operate an electronic device on the basis of voice commands.
iii. Accuracy in speech recognition is only average, since external noise is inputted to the speech recognition mechanism along with the voice command, and there is no provision to remove such external noise.
iv. They cannot support dictation of programming languages.
Therefore, there is a need for a new and innovative mechanism which can improve human-machine interaction using the voice commands of the user.
Object of the invention:
The main objective of the present invention is to provide a system to perform various operations on the electronic processing device, including searching and opening files/folders and launching user-specified websites.
Another object of the invention is to provide a system to enable the user to interact with an electronic processing device using the voice of the user, which system has optimum accuracy in speech recognition.
Still another object of the invention is to provide such a system that can support dictation for writing in application development systems.
Yet another object of the invention is to provide such a system that is flexible enough to allow any addition, deletion or modification to the existing system.
Yet another object of the invention is to provide such a system that is user friendly and supports all operating systems.
Statement of the Invention:
In order to achieve the above-discussed objects, the present invention provides a system to enable the user to interact with an electronic processing device using the voice of the user, the system being inbuilt in the electronic processing device, the system comprising: a speech input unit to receive, from the user, a speech command in respect of a task to be performed in the electronic processing device; a storage unit for storing data and instructions, the data including a list of targets, the instructions including a list of tasks and descriptions of the tasks; a speech recognition unit connected to the speech input unit and the storage unit, the speech recognition unit being configured to receive the output of the speech input unit and convert the speech command into the best proximal text, the text including information about the task to be performed and the target on which the task is to be performed; a task processing and performing unit connected to the speech recognition unit and the storage unit, the task processing and performing unit being configured to:
- receive, from the speech recognition unit, the text equivalent of the speech command of the user,
- compare the text with the data and instructions stored in the storage unit,
- identify the task and the target from the text on the basis of the comparison, and
- perform the task on the target;
and a task output unit connected to the task processing and performing unit and configured to realize the performance of the task on the target.
Brief description of drawing:
Fig 1: Shows a block diagram of the system to enable the user to interact with an electronic processing device using the voice of the user in accordance with the present invention.
Fig 2: Shows an exploded process block diagram of the system to enable the user to interact with an electronic processing device using the voice of the user in accordance with the present invention.
Detailed Description of the invention:
The invention and its further features and advantages are explained in more detail on the basis of the exemplary embodiments schematically represented in the figures.
Referring to figure 1, the system to enable the user to interact with an electronic processing device using the voice of the user in accordance with the present invention is illustrated. The system is inbuilt in the electronic processing device. The system includes a speech input unit, a storage unit, a speech recognition unit, a task processing and performing unit and a task output unit.
The speech input unit receives, from the user, a speech command in respect of a task to be performed in the electronic processing device. The speech input unit may be a microphone or a similar device.
The storage unit, i.e. memory, stores data and instructions. The data includes the list of targets; the instructions include the list of tasks and descriptions of the tasks. A target may be a file, a folder or a web address. The files are mainly, but not limited to, audio, video, audio-visual, image or text files. A task includes at least any of the following activities on the target: opening, copying, moving, modifying, searching and deleting. The invention is not limited to these activities only and may be extended to other activities.
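The storage unit described above can be sketched as a simple in-memory structure. The names `Target` and `StorageUnit` and the exact task descriptions are illustrative assumptions, not terms from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class Target:
    """A target the user can act on: a file, folder, or web address."""
    name: str
    path: str
    kind: str = "file"  # "file", "folder", or "web"

@dataclass
class StorageUnit:
    """In-memory stand-in for the patent's storage unit: a list of
    targets plus the task list with short descriptions."""
    targets: list = field(default_factory=list)
    tasks: dict = field(default_factory=lambda: {
        "open": "open the target in its default application",
        "copy": "copy the target to a new location",
        "move": "move the target to a new location",
        "modify": "modify the target",
        "search": "search for the target",
        "delete": "delete the target",
    })

    def add_target(self, name, path, kind="file"):
        self.targets.append(Target(name, path, kind))

    def find_target(self, name):
        # Case-insensitive lookup, since spoken names carry no case.
        for t in self.targets:
            if t.name.lower() == name.lower():
                return t
        return None

storage = StorageUnit()
storage.add_target("hello.txt", "C:/docs/hello.txt")
```

A real implementation would persist this index in a database, as the patent's component list suggests.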
The speech recognition unit is connected to the speech input unit and the storage unit. It is configured to receive the output of the speech input unit and convert the speech command into the best proximal text. The text includes information about the task to be performed and the target on which the task is to be performed.
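The phrase "best proximal text" suggests picking the grammar entry closest to the recognizer's raw output. A minimal sketch using Python's standard `difflib` (a real system would do this matching inside the recognition engine itself; the grammar entries and cutoff are assumptions):

```python
import difflib

def best_proximal_text(raw_text, grammar, cutoff=0.5):
    """Return the grammar phrase closest to the raw recognizer output,
    or None when nothing in the grammar is similar enough."""
    matches = difflib.get_close_matches(raw_text.lower(),
                                        [g.lower() for g in grammar],
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

grammar = ["play summer of 69", "open document hello.txt", "open website"]
# A slightly misrecognized utterance still maps to the grammar entry.
print(best_proximal_text("play sumer of 69", grammar))  # → play summer of 69
```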
The task processing and performing unit is connected to the speech recognition unit and the storage unit. The task processing and performing unit is configured to:
- receive, from the speech recognition unit, the text equivalent of the speech command of the user,
- compare the text with the data and instructions stored in the storage unit,
- identify the task and the target from the text on the basis of the comparison, and
- perform the task on the target.
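The receive/compare/identify/perform steps can be sketched as a single dispatch function. The verb-first command shape, the target map, and the handler table are illustrative assumptions:

```python
def process_command(text, targets, handlers):
    """Identify the task verb and the target from recognized text, then
    perform the task: the patent's compare/identify/perform steps."""
    words = text.lower().split()
    if not words:
        return None
    verb = words[0]                    # e.g. "open", "play", "delete"
    target_name = " ".join(words[1:])  # e.g. "hello.txt"
    if verb not in handlers:
        return None                    # unknown task
    path = targets.get(target_name)    # compare text against stored targets
    if path is None:
        return None                    # unknown target
    return handlers[verb](path)        # perform the task on the target

targets = {"hello.txt": "C:/docs/hello.txt"}
handlers = {"open": lambda p: f"opening {p}"}
print(process_command("open hello.txt", targets, handlers))  # → opening C:/docs/hello.txt
```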
The task processing and performing unit is further configured to add new tasks and targets to the storage unit and to take dictation compatible with all programming languages.
The task output unit is connected to the task processing and performing unit and is configured to realize the performance of the task on the target.
Here the electronic processing device may include a user computer, a laptop, smart phones and similar computing devices. However, the invention is in no way limited to said electronic processing devices only. The disclosed invention is useful in executing operations in Microsoft Windows XP/Vista/7/8 onwards, Windows Phone 8 onwards, Mac OS 10.5 onwards, iOS 4 onwards and Android OS 2.2 onwards, up to the latest versions of all the supported operating systems, through human speech recognition.
The invention provides the user the ability to operate the computer/smart-phone by giving vocal commands. It includes an enhanced feature to search and open any file or folder stored on the hard drive/storage memory of the user. When a human voice command is given, the speech recognition unit scans the storage unit holding the list of files that are supported in the application. The list mentioned here consists of two items: the filename and its path. This list is generated by scanning the hard drive/storage memory of the computer/smart-phone based on the file extension and recording the file names and their paths. The list also contains all the folder names and their paths. To obtain better speech recognition results, this list is provided as a grammar to the speech recognition unit. When the user gives a speech command, the speech recognition unit scans the grammar for the desired result. When the speech recognition unit finds that result, the human speech is converted to text. Examining that text, the system performs the operations requested by the user.
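The scan described above could be sketched as follows; the extension set, the lowercasing and the function name are illustrative assumptions rather than details from the specification:

```python
import os

# Extensions assumed supported by the application (illustrative only).
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".jpg", ".png", ".txt", ".pdf"}

def build_index(root):
    """Scan root and return {spoken_name: full_path} for files and folders.

    File names are indexed by their stem (e.g. "Summer of 69" for
    "Summer of 69.mp3"), since that is what the user would say.
    """
    index = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for d in dirnames:
            index[d.lower()] = os.path.join(dirpath, d)
        for f in filenames:
            name, ext = os.path.splitext(f)
            if ext.lower() in SUPPORTED_EXTENSIONS:
                index[name.lower()] = os.path.join(dirpath, f)
    return index
```

The keys of such an index double as the grammar handed to the recognizer, which is what narrows recognition to names that actually exist on the device.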
Example 1:-
On a Windows 7 OS computer, if the user wants to open a file called "Summer of 69.mp3" which has the location "d:/abc/Bryan Adams/Summer of 69.mp3", the user will say "play Summer of 69" and the file will be opened in the default application set for that extension.
Initially, when the system is loading, the list of filenames is provided to the speech recognition unit. After the speech command is inputted, the speech recognition unit converts the speech into the best-suited text present in the grammar. That text is then analysed and examined against the list that contains the filenames. The result of the examination is the actual name of the file the user wants to open, so the filename is obtained. The only task remaining is to find the path for that filename and open the file using that path.
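The matching step of Example 1 might look like the following sketch, assuming an index mapping spoken names to paths as built during the scan; the verb list and function name are hypothetical choices:

```python
# Command verbs assumed to precede a file name (illustrative only).
COMMAND_VERBS = ("play", "open")

def resolve(text, index):
    """Map recognized text like 'play summer of 69' to (verb, path).

    Returns None when the text does not start with a known verb or the
    named file is absent from the index.
    """
    words = text.lower().split()
    if not words or words[0] not in COMMAND_VERBS:
        return None
    name = " ".join(words[1:])
    path = index.get(name)  # exact match against the indexed names
    return (words[0], path) if path else None
```

Once the path is resolved, the file can be handed to the operating system's default-application mechanism for that extension.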
Example 2:-
Consider that, on a Windows 7 OS machine, the user wants to move a file called "Summer of 69.mp3", which has the location "d:/abc/Bryan Adams/Summer of 69.mp3", to the location "e:/xyz/Old Classics". All the user has to do is say "move Summer of 69 to Old Classics" and the file will be moved to the location "e:/xyz/Old Classics/Summer of 69.mp3".
The above examples briefly explain the basic functioning of the system. By making changes specific to an operating system, the system will work on other operating systems as well.
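Example 2's move command could be parsed and executed along these lines; splitting the phrase on the first occurrence of the word "to" is an illustrative assumption, and both names must already be present in the index built from the storage scan:

```python
import os
import shutil

def handle_move(text, index):
    """Parse 'move <file> to <folder>' and move the file, as in Example 2.

    index maps lowercase spoken names to paths. Returns the new path on
    success, or None when the command does not match or a name is unknown.
    """
    words = text.lower().split()
    if words[:1] != ["move"] or "to" not in words:
        return None
    split = words.index("to")  # assumes "to" does not occur in the names
    src = index.get(" ".join(words[1:split]))
    dst = index.get(" ".join(words[split + 1:]))
    if src and dst and os.path.isdir(dst):
        return shutil.move(src, dst)
    return None
```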
The system/devices which the user intends to operate using voice commands receive the human voice inputted through a microphone. The human voice stream is then converted to text by the Microsoft Speech SDK. The speech recognition unit recognizes the text using the loaded grammars. Once the words are recognized, the respective operations are performed.
The invention also facilitates adding the names of new websites to the speech recognition unit grammar, thereby providing the functionality to open a website of the user's choice. The user is allowed to add user-defined words or phrases to the existing grammar. The system provides the additional feature of taking dictation for application development, regardless of which development environment the user uses, in user-specific editors such as Eclipse, Visual Studio, Xcode, SQL Server etc.
This invention is designed to be installed on Microsoft Windows Vista/7/8 onwards, Windows Phone 8 onwards, Mac OS 10.5 onwards, iOS 4 onwards and Android OS 2.2 onwards, up to the latest versions of all the supported operating systems.
This application is also equipped with a new feature that minimizes extraneous sound. By doing so, the input provided to the speech recognition unit carries pure human voice and only partially some other sound, at a much reduced intensity.
When the user gives a speech command, there is always a possibility of extraneous sound in the surroundings, so the full input to the speech recognition unit is the human voice plus extraneous sound. Applying frequency high-pass and low-pass filters, set to pass only the human voice, to the microphone input results in pure human voice with minimal extraneous sound. This enables the speech recognition unit to achieve better accuracy of speech recognition. For this purpose the system includes a speech filtering unit to filter out the human speech portion from the speech command.
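One way to realize the described high-pass/low-pass combination is a pair of first-order RC-style digital filters over raw samples. The cutoff frequencies below (roughly the telephone speech band) are assumptions, and a production system would more likely use the platform's audio processing APIs:

```python
import math

def band_pass(samples, rate, low_hz=80.0, high_hz=3400.0):
    """Attenuate energy outside the speech band: a first-order low-pass
    at high_hz followed by a first-order high-pass at low_hz."""
    if not samples:
        return []
    dt = 1.0 / rate
    # Low-pass stage: y[n] = y[n-1] + a * (x[n] - y[n-1])
    rc_lp = 1.0 / (2 * math.pi * high_hz)
    a = dt / (rc_lp + dt)
    lp, y = [], 0.0
    for x in samples:
        y += a * (x - y)
        lp.append(y)
    # High-pass stage: y[n] = b * (y[n-1] + x[n] - x[n-1])
    rc_hp = 1.0 / (2 * math.pi * low_hz)
    b = rc_hp / (rc_hp + dt)
    out, prev_y, prev_x = [], 0.0, lp[0]
    for x in lp:
        prev_y = b * (prev_y + x - prev_x)
        prev_x = x
        out.append(prev_y)
    return out
```

A mid-band tone passes through largely unchanged, while very low-frequency components such as hum or DC offset are strongly attenuated.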
The invention is basically divided into five different sections:
- Sound filtering
- Handling of the operating system
- Opening, moving, adding, deleting and modifying files and folders
- Normal dictation
- Programming dictation
Further enhancements can be made to any or all of these sections independently.
Example 3:
When the user says 'update directory information', the system finds all the claimed folders and indexes them into the database. Once the folders are indexed, the user then says the file name, e.g. 'open document hello.txt'; the unit recognises the filename said and acquires the file path, after which the system opens the file hello.txt in the default text editor, i.e. Notepad.
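The 'update directory information' step of Example 3 could be sketched with a small SQLite table; the table and column names, and the function names, are hypothetical:

```python
import os
import sqlite3

def update_directory_information(conn, root):
    """Re-scan root and rebuild a files(name, path) table, as in Example 3."""
    conn.execute("CREATE TABLE IF NOT EXISTS files (name TEXT, path TEXT)")
    conn.execute("DELETE FROM files")
    for dirpath, _dirnames, filenames in os.walk(root):
        for f in filenames:
            conn.execute("INSERT INTO files VALUES (?, ?)",
                         (f.lower(), os.path.join(dirpath, f)))
    conn.commit()

def find_path(conn, spoken_name):
    """Return the stored path for a file name said by the user, or None."""
    row = conn.execute("SELECT path FROM files WHERE name = ?",
                       (spoken_name.lower(),)).fetchone()
    return row[0] if row else None
```

With the path in hand, opening the file in its default application is a single call to the platform's shell (e.g. `os.startfile` on Windows).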
Following are examples of some of the tasks performed through human speech instructions:
Open file "Name of any file": Opens the specified file in its default application.
Open website "Name of any website": Opens the specified website in the default web browser.
Modify (Update) Directory Information: Updates the list of directories stored on the hard drive/storage memory.
Modify (Update) Files Information: Updates the list of files stored on the hard drive/storage memory.
Modify (Update) Websites Information: Updates the list of websites.
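Commands of this shape suggest a longest-prefix dispatcher: the recognized text is routed to the handler whose command phrase it starts with, and the remainder becomes the argument. The handler signatures here are hypothetical:

```python
def make_dispatcher(handlers):
    """Build a dispatch function from {command_phrase: handler(argument)}.

    Longer phrases are tried first, so "open website" takes precedence
    over a hypothetical bare "open" command.
    """
    def dispatch(text):
        lowered = text.lower().strip()
        for phrase in sorted(handlers, key=len, reverse=True):
            if lowered.startswith(phrase):
                return handlers[phrase](lowered[len(phrase):].strip())
        return None  # no command matched the recognized text
    return dispatch
```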
Now referring to figure 2, the detailed process block diagram associated with the system of the present invention is illustrated. The details of the blocks are as follows:
1. Starts or stops the speech input unit e.g. microphone
2. Microphone to capture the human voice input
3. Speech recognition unit containing grammars
4. Storage Unit i.e. Database for the indexing of files and folders
5. Selecting the stated modes of operation
6. Operating and managing the operating system
7. Normal Dictation or programming dictation input
8. Searching files/folders and opening files/folders, websites.
9. Perform the user requested operation
10. Write the text
11. Open the requested file/folder/web-site
12. Wait for the next human voice command

While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by those skilled in the relevant art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.
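The process blocks above could be tied together in a loop along the following lines; all four callables are assumptions injected by the caller (microphone capture, the speech filtering unit, the recognizer, and the command dispatcher) rather than parts of the specification:

```python
def run_loop(capture_audio, filter_speech, recognize, dispatch):
    """Sketch of the figure-2 cycle: capture, filter, recognize, dispatch,
    then wait for the next command (block 12)."""
    while True:
        audio = capture_audio()      # blocks 1-2: microphone input
        if audio is None:            # e.g. the user stopped the microphone
            break
        text = recognize(filter_speech(audio))  # filtering + block 3
        if text:
            dispatch(text)           # blocks 5-11: perform the operation
```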

Claims

We Claim:
1. A system to enable user to interact with an electronic processing device using voice of the user, the system being inbuilt in the electronic processing device, the system comprising: a speech input unit to receive, from the user, speech command in respect of task to be performed in the electronic processing device; a storage unit for storing data and instructions, the data includes list of targets, the instructions include list of tasks and description about the tasks; speech recognition unit being connected to the speech input unit and the storage unit, the speech recognition unit configured to receive output of the speech input unit and convert the speech command into best proximal text, the text includes information about the task to be performed and target on which task being performed; a task processing and performing unit being connected to the speech recognition unit and the storage unit, the task processing and performing unit being configured to:
- receive, from the speech recognition unit, the text equivalent of the speech command of the user,
- compare the text with data and instructions stored in the storage unit,
- identify the task and the target from the text on the basis of comparison, and
- perform the task on the target; and a task output unit being connected to the task processing and performing unit and configured to realize the performance of the task on the target.
2. The system to enable user to interact with an electronic processing device using voice of the user as claimed in claim 1 wherein target includes files, folder and web address.
3. The system to enable user to interact with an electronic processing device using voice of the user as claimed in claim 2 wherein files are audio, video, audio-visual, image or text related.
4. The system to enable user to interact with an electronic processing device using voice of the user as claimed in any of the claims 1 to 3 wherein task includes at least any of the following activities on the target:
- opening,
- copying,
- moving,
- modifying,
- searching and
- deleting.
5. The system to enable user to interact with an electronic processing device using voice of the user as claimed in any of the claims 1 to 4 wherein the task processing and performing unit is further configured to add new tasks and targets in the storage unit.
6. The system to enable user to interact with an electronic processing device using voice of the user as claimed in any of the claims 1 to 5 wherein the task processing and performing unit is further configured to take dictations compatible for all programming languages.
7. The system to enable user to interact with an electronic processing device using voice of the user as claimed in any of the claims 1 to 6 further includes speech filtering unit to filter out the human speech portion from the speech command.
PCT/IN2014/000499 2013-08-21 2014-07-28 A system to enable user to interact with an electronic processing device using voice of the user WO2015025330A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN2727MU2013 2013-08-21
IN2727/MUM/2013 2013-08-21

Publications (1)

Publication Number Publication Date
WO2015025330A1 (en) 2015-02-26

Family

ID=51743525

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2014/000499 WO2015025330A1 (en) 2013-08-21 2014-07-28 A system to enable user to interact with an electronic processing device using voice of the user

Country Status (1)

Country Link
WO (1) WO2015025330A1 (en)


Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6088671A (en) 1995-11-13 2000-07-11 Dragon Systems Continuous speech recognition of text and commands
US6173259B1 (en) 1997-03-27 2001-01-09 Speech Machines Plc Speech to text conversion
US20040044516A1 (en) * 2002-06-03 2004-03-04 Kennewick Robert A. Systems and methods for responding to natural language speech utterance
US6728676B1 (en) 2000-10-19 2004-04-27 International Business Machines Corporation Using speech recognition to improve efficiency of an inventory task
US6871179B1 (en) * 1999-07-07 2005-03-22 International Business Machines Corporation Method and apparatus for executing voice commands having dictation as a parameter
US20060206339A1 (en) * 2005-03-11 2006-09-14 Silvera Marja M System and method for voice-enabled media content selection on mobile devices
US7257531B2 (en) 2002-04-19 2007-08-14 Medcom Information Systems, Inc. Speech to text system using controlled vocabulary indices
US20070233725A1 (en) * 2006-04-04 2007-10-04 Johnson Controls Technology Company Text to grammar enhancements for media files
US20100169098A1 (en) * 2007-05-17 2010-07-01 Kimberly Patch System and method of a list commands utility for a speech recognition command system
US8165887B2 (en) 2008-12-08 2012-04-24 Nuance Communications, Inc. Data-driven voice user interface
US20120232906A1 (en) 2008-10-02 2012-09-13 Lindahl Aram M Electronic Devices with Voice Command and Contextual Data Processing Capabilities
US8275617B1 (en) 1998-12-17 2012-09-25 Nuance Communications, Inc. Speech command input recognition system for interactive computer display with interpretation of ancillary relevant speech query terms into commands
US8306819B2 (en) 2009-03-09 2012-11-06 Microsoft Corporation Enhanced automatic speech recognition using mapping between unsupervised and supervised speech model parameters trained on same acoustic training data
US8340975B1 (en) 2011-10-04 2012-12-25 Theodore Alfred Rosenberger Interactive speech recognition device and system for hands-free building control


Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10460735B2 (en) 2014-07-18 2019-10-29 Google Llc Speaker verification using co-location information
US9792914B2 (en) 2014-07-18 2017-10-17 Google Inc. Speaker verification using co-location information
US11942095B2 (en) 2014-07-18 2024-03-26 Google Llc Speaker verification using co-location information
US10147429B2 (en) 2014-07-18 2018-12-04 Google Llc Speaker verification using co-location information
US10986498B2 (en) 2014-07-18 2021-04-20 Google Llc Speaker verification using co-location information
US10134398B2 (en) 2014-10-09 2018-11-20 Google Llc Hotword detection on multiple devices
US11915706B2 (en) 2014-10-09 2024-02-27 Google Llc Hotword detection on multiple devices
US11557299B2 (en) 2014-10-09 2023-01-17 Google Llc Hotword detection on multiple devices
US10909987B2 (en) 2014-10-09 2021-02-02 Google Llc Hotword detection on multiple devices
US10593330B2 (en) 2014-10-09 2020-03-17 Google Llc Hotword detection on multiple devices
US10255920B2 (en) 2016-02-24 2019-04-09 Google Llc Methods and systems for detecting and processing speech signals
US10878820B2 (en) 2016-02-24 2020-12-29 Google Llc Methods and systems for detecting and processing speech signals
US11568874B2 (en) 2016-02-24 2023-01-31 Google Llc Methods and systems for detecting and processing speech signals
US10163443B2 (en) 2016-02-24 2018-12-25 Google Llc Methods and systems for detecting and processing speech signals
US9779735B2 (en) 2016-02-24 2017-10-03 Google Inc. Methods and systems for detecting and processing speech signals
US10163442B2 (en) 2016-02-24 2018-12-25 Google Llc Methods and systems for detecting and processing speech signals
US10249303B2 (en) 2016-02-24 2019-04-02 Google Llc Methods and systems for detecting and processing speech signals
US11887603B2 (en) 2016-08-24 2024-01-30 Google Llc Hotword detection on multiple devices
US11276406B2 (en) 2016-08-24 2022-03-15 Google Llc Hotword detection on multiple devices
US10714093B2 (en) 2016-08-24 2020-07-14 Google Llc Hotword detection on multiple devices
US10242676B2 (en) 2016-08-24 2019-03-26 Google Llc Hotword detection on multiple devices
US9972320B2 (en) 2016-08-24 2018-05-15 Google Llc Hotword detection on multiple devices
US10867600B2 (en) 2016-11-07 2020-12-15 Google Llc Recorded media hotword trigger suppression
US11798557B2 (en) 2016-11-07 2023-10-24 Google Llc Recorded media hotword trigger suppression
US11257498B2 (en) 2016-11-07 2022-02-22 Google Llc Recorded media hotword trigger suppression
US11087743B2 (en) 2017-04-20 2021-08-10 Google Llc Multi-user authentication on a device
US11727918B2 (en) 2017-04-20 2023-08-15 Google Llc Multi-user authentication on a device
US10522137B2 (en) 2017-04-20 2019-12-31 Google Llc Multi-user authentication on a device
US10497364B2 (en) 2017-04-20 2019-12-03 Google Llc Multi-user authentication on a device
US11238848B2 (en) 2017-04-20 2022-02-01 Google Llc Multi-user authentication on a device
US11721326B2 (en) 2017-04-20 2023-08-08 Google Llc Multi-user authentication on a device
US11244674B2 (en) 2017-06-05 2022-02-08 Google Llc Recorded media HOTWORD trigger suppression
US10395650B2 (en) 2017-06-05 2019-08-27 Google Llc Recorded media hotword trigger suppression
US11373652B2 (en) 2018-05-22 2022-06-28 Google Llc Hotword suppression
US10692496B2 (en) 2018-05-22 2020-06-23 Google Llc Hotword suppression
US11676608B2 (en) 2021-04-02 2023-06-13 Google Llc Speaker verification using co-location information

Similar Documents

Publication Publication Date Title
WO2015025330A1 (en) A system to enable user to interact with an electronic processing device using voice of the user
US10169329B2 (en) Exemplar-based natural language processing
JP6204982B2 (en) Contextual query tuning using natural motion input
JP3962763B2 (en) Dialogue support device
EP3528243A1 (en) System for processing user utterance and controlling method thereof
KR20180121097A (en) Voice data processing method and electronic device supporting the same
US20140019128A1 (en) Voice Based System and Method for Data Input
KR20180117485A (en) Electronic device for processing user utterance and method for operation thereof
US10664229B2 (en) Search-based dynamic voice activation
US20160372110A1 (en) Adapting voice input processing based on voice input characteristics
US20220148572A1 (en) Server supported recognition of wake phrases
US11580969B2 (en) Artificial intelligence device and method of operating artificial intelligence device
US10586528B2 (en) Domain-specific speech recognizers in a digital medium environment
CN111954864A (en) Automated presentation control
US20220148576A1 (en) Electronic device and control method
KR102630662B1 (en) Method for Executing Applications and The electronic device supporting the same
KR20180109465A (en) Electronic device and method for screen controlling for processing user input using the same
US10565317B1 (en) Apparatus for improving responses of automated conversational agents via determination and updating of intent
JP6069157B2 (en) Information processing apparatus, control method, and program
CN110827822A (en) Intelligent voice interaction method and device, travel terminal, equipment and medium
CN103631784B (en) Page content retrieval method and system
US11620328B2 (en) Speech to media translation
US20210055909A1 (en) Systems and methods for voice activated interface
US20220301549A1 (en) Electronic device and method for providing voice recognition service
CN108255917A (en) Image management method, equipment and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14786367

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14786367

Country of ref document: EP

Kind code of ref document: A1
