US20140172953A1 - Response Endpoint Selection - Google Patents

Response Endpoint Selection

Info

Publication number
US20140172953A1
US20140172953A1
Authority
US
United States
Prior art keywords
user
computer
computing device
response
recited
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/715,741
Other versions
US9271111B2
Inventor
Scott Ian Blanksteen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Amazon Technologies Inc
Original Assignee
Rawles LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Assigned to RAWLES LLC (assignment of assignors interest; assignor: BLANKSTEEN, SCOTT IAN)
Priority to US13/715,741 (US9271111B2)
Application filed by Rawles LLC
Priority to EP13861696.6A (EP2932371B1)
Priority to CN201380063208.1A (CN105051676B)
Priority to PCT/US2013/071488 (WO2014092980A1)
Priority to JP2015544158A (JP2016502192A)
Publication of US20140172953A1
Assigned to AMAZON TECHNOLOGIES, INC. (assignment of assignors interest; assignor: RAWLES LLC)
Priority to US15/049,914 (US10778778B1)
Publication of US9271111B2
Application granted
Priority to US17/016,769 (US20210165630A1)
Priority to US18/149,127 (US20230141659A1)
Status: Active
Adjusted expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21Monitoring or handling of messages
    • H04L67/42
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10Multimedia information
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/18Commands or executable codes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/14Session management
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/2866Architectures; Arrangements
    • H04L67/30Profiles
    • H04L67/306User profiles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/535Tracking the activity of the user
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02Services making use of location information
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02Services making use of location information
    • H04W4/029Location-based management or tracking services
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30Services specially adapted for particular environments, situations or purposes
    • H04W4/33Services specially adapted for particular environments, situations or purposes for indoor environments, e.g. buildings

Definitions

  • Homes, offices and other places are becoming more connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, and portable communication devices.
  • many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, gesture, and even through natural language input such as speech.
  • As computing devices evolve, users are expected to rely more and more on such devices to assist them in routine tasks. Today, it is commonplace for computing devices to help people buy tickets, shop for goods and services, check the weather, find and play entertainment, and so forth. However, with the growing ubiquity of computing devices, it is not uncommon for users to have many devices, such as a smartphone, e-book reader, a tablet, a computer, an entertainment system, and so forth.
  • One of the challenges for multi-device users is how to perform tasks effectively when working with multiple devices. Coordinating a task among multiple devices is non-trivial.
  • FIG. 1 illustrates an environment in which multiple computing devices, including voice controlled devices, are ubiquitous and coordinated to assist a person in handling routine tasks.
  • FIG. 2 shows a representative scenario of a person using the computing environment to assist with the task.
  • FIG. 2 includes a functional block diagram of select components of computing devices in the environment as well as remote cloud services accessible via a network.
  • FIG. 3 shows how devices are selected to engage the person during performance of the task.
  • FIG. 4 shows a block diagram of selected components of computing devices that may be used in the environment.
  • FIG. 5 is a flow diagram showing an illustrative process for aiding the person in performing a task, including receiving a request from the person via one device and delivering a response to the person via another device.
  • FIG. 6 is a flow diagram showing an illustrative process for determining a location of the person.
  • FIG. 7 is a flow diagram showing an illustrative process for determining a device to which to deliver the response to the person.
  • Described herein are techniques to leverage various computing devices to assist in routine tasks. As computing devices become ubiquitous in homes, offices, and other places, users are less likely to differentiate among them when thinking about and performing these routine tasks. The users will increasingly expect the devices to intelligently help, regardless of where the users are located and what the users might currently be doing. To implement this intelligence, a computing system is architected to organize task management across multiple devices with which the user may interact.
  • the computing system is constructed as a cloud service that uses a variety of implicit and explicit signals to determine presence of a user in a location and to decide which, if any, assistance or responses to provide to one or more devices within that location.
  • the signals may represent any number of indicia that can help ascertain the whereabouts of the user and how best to interact with the person at that time, and at that location.
  • Representative signals may include audio input (e.g., sound of a user's voice), how recently the user interacted with a device, presence of a mobile device associated with the user, visual recognition of the user, and so forth.
  • the user may ask the computing system, via a first device, to remind him at a future time to do a household chore or work task.
  • the computing system may then, at the future time, remind the user via a second device that is appropriate in the current circumstances for delivering that message.
  • the computing system understands who is making the request, determines when to provide the reminder to the user, ascertains where the user is when it is time to remind him, discovers which devices are available to deliver the reminder, and evaluates which of the available devices is best to deliver the reminder.
  • the computing system implements response functionality that includes intelligent selection of endpoint devices.
  • the various operations to implement this intelligence may be split among local devices and remote cloud computing systems.
  • different modules and functionality may reside locally in the devices proximal to the user, or remotely in the cloud servers. This disclosure provides one example implementation in which a significant portion of the response system resides in the remote cloud computing system.
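  • By way of a non-limiting illustration only, the flow described above can be pictured as a short orchestration sketch. All names below (Request, Task, handle_request, define_task, locate_user, select_endpoint, deliver) are hypothetical placeholders standing in for the modules discussed in this disclosure (the NLP modules, task handler, person location module, and endpoint device selector); this is a sketch, not an actual implementation.

      # Hypothetical orchestration of the response pipeline; each helper stands in for a
      # module discussed later in this disclosure (all names are illustrative only).
      from dataclasses import dataclass
      from datetime import datetime


      @dataclass
      class Request:
          audio_text: str          # interpreted speech (e.g., output of the NLP modules)
          user_id: str             # who made the request
          source_device_id: str    # which endpoint device heard it


      @dataclass
      class Task:
          content: str
          delivery_time: datetime
          target_user: str


      def define_task(req: Request) -> Task:
          # Stand-in for the task handler: decide what to say, when, and to whom.
          return Task("Don't forget to take out the garbage",
                      datetime(2013, 1, 1, 6, 30), req.user_id)


      def locate_user(user_id: str) -> str:
          return "kitchen"                      # stand-in for the person location module


      def select_endpoint(location: str) -> str:
          return f"voice-controlled-device-in-{location}"   # stand-in for device selection


      def deliver(device_id: str, content: str) -> None:
          print(f"[{device_id}] {content}")     # play or display the response


      def handle_request(req: Request) -> None:
          task = define_task(req)
          # ...later, when task.delivery_time arrives:
          location = locate_user(task.target_user)
          device = select_endpoint(location)
          deliver(device, task.content)


      handle_request(Request("remind me to take out the garbage tomorrow morning",
                             "user-104", "device-120-1"))
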
  • this disclosure describes the techniques in the context of local computing devices that are primarily voice operated, such as dedicated voice controlled devices. Receiving verbal requests and providing audible responses introduce some additional challenges, which the system described below is configured to address. However, use of voice controlled devices is not intended to be limiting as other forms of engaging the user (e.g., gesture input, typed input, visual output, etc.) may be used by the computing system.
  • FIG. 1 shows an illustrative architecture of a computing system 100 that implements response functionality with intelligent endpoint selection.
  • the system 100 is described in the context of users going about their normal routines and interacting with the computing system 100 throughout the day.
  • the computing system 100 is configured to receive requests given by users at respective times and locations, process those requests, and return responses at other respective times, to locations at which the users are present, and to appropriate endpoint devices.
  • a house 102 is a primary residence for a family of three users, including a first user 104 (e.g., adult male, dad, husband, etc.), a second user 106 (e.g., adult female, mom, wife, etc.), and a third user 108 (e.g., daughter, child, girl, etc.).
  • the house is shown with five rooms including a master bedroom 110 , a bathroom 112 , a child's bedroom 114 , a living room 116 , and a kitchen 118 .
  • the users 104 - 108 are located in different rooms in the house 102 , with the first user 104 in the master bedroom 110 , the second user 106 in the living room 116 , and the third user 108 in the child's bedroom 114 .
  • the computing system 100 includes multiple local devices or endpoint devices 120 ( 1 ), . . . , 120 (N) positioned at various locations to interact with the users. These devices may take on any number of form factors, such as laptops, electronic book (eBook) reader devices, tablets, desktop computers, smartphones, voice controlled devices, entertainment devices, augmented reality systems, and so forth.
  • the local devices include a voice controlled device 120 ( 1 ) residing in the bedroom 110 , a voice controlled device 120 ( 2 ) in the child's bedroom 114 , a voice controlled device 120 ( 3 ) in the living room 116 , a laptop 120 ( 4 ) in the living room 116 , and a voice controlled device 120 ( 5 ) in the kitchen 118 .
  • the computing system 100 may rely on other user-side devices found outside the home, such as in an automobile 122 (e.g., car phone, navigation system, etc.) or at the first user's office 124 (e.g., work computer, tablet, etc.) to convey information to the user.
  • Each of these endpoint devices 120 ( 1 )-(N) may receive input from a user and deliver responses to the same user or different users.
  • the input may be received in any number of ways, including as audio or verbal input, gesture input, and so forth.
  • the responses may also be delivered in any number of forms, including as audio output, visual output (e.g., pictures, UIs, videos, etc. depicted on the laptop 120 ( 4 ) or television 120 ( 9 )), haptic feedback (e.g., vibration of the smartphone 120 ( 6 ), etc.), and the like.
  • the computing system 100 further includes a remote computing system, such as cloud services 130 supported by a collection of network-accessible devices or servers 132 .
  • the cloud services 130 generally refer to a network-accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network, such as the Internet. Cloud services 130 may not require end-user knowledge of the physical location and configuration of the system that delivers the services. Common expressions associated with cloud services include “on-demand computing”, “software as a service (SaaS)”, “platform computing”, “network accessible platform”, and so forth.
  • the cloud services 130 coordinate request input and response output among the various local devices 120 ( 1 )-(N).
  • a user, such as the user 104 , may submit a request to the computing system 100 .
  • This request may be a verbal request, such as the user 104 speaking to the voice controlled device 120 ( 1 ) in the master bedroom 110 .
  • the user may say, “Please remind me to take out the garbage tomorrow morning.”
  • the voice controlled device 120 ( 1 ) is equipped with microphones to receive the audio input and a network interface to pass the request to the cloud services 130 .
  • the local device 120 ( 1 ) may optionally have natural language processing functionality to begin processing of the speech content.
  • the request is passed to the cloud services 130 over a network (not shown in FIG. 1 ) where the request is processed.
  • the request is parsed and interpreted.
  • the cloud services 130 determine that the user wishes to be reminded of the household chore to take out the garbage at a specified timeframe (i.e., tomorrow morning).
  • the cloud services 130 implements a task handler to define a task that schedules a reminder to be delivered to the user at the appropriate time (e.g., 7:00 AM).
  • the cloud services 130 determine where the target user who made the request, i.e., the first user 104 , is located.
  • the cloud services 130 may use any number of techniques to ascertain the user's whereabouts, such as polling devices in the area to get an audio, visual, or other biometric confirmation of presence, or locating a device that might be personal or associated with the user (e.g., smartphone 120 ( 6 )), or through other secondary indicia, such as the user's history of activity, receipt of other input from the user from a specific location, and so forth.
  • the cloud services 130 may then determine which local device is suitable to deliver the response to the user. In some cases, there may be only a single device and hence the decision is straightforward. However, in other situations, the user may be located in an area having multiple local devices, any one of which may be used to convey the response. In such situations, the cloud services 130 may evaluate the various candidate devices, and select the best or more appropriate device in the circumstances to deliver the response.
  • the computing system 100 provides a coordinated response system that utilizes ubiquitous devices available in the user's environment to receive requests and deliver responses.
  • the endpoint devices used for receipt of the request and delivery of the response may be different.
  • the devices need not be associated with the user in any way, but rather may be generic endpoint devices that are used as needed to interact with the user. To illustrate the flexibility of the computing system, the following discussion continues the earlier example of a user asking to be reminded to perform a household chore.
  • FIG. 2 illustrates select devices in the computing system 100 to show a representative scenario of a person using the computing environment to assist with the task.
  • two endpoint devices are shown, with a first endpoint device in the form of the voice controlled assistant 120 ( 1 ) residing in the bedroom 110 and the second endpoint device in the form of the voice controlled assistant 120 ( 5 ) residing in the kitchen 118 .
  • the endpoint devices 120 ( 1 ) and 120 ( 5 ) are coupled to communicate with the remote cloud services 130 via a network 202 .
  • the network 202 may be representative of any number of network types, such as wired networks (e.g., cable, LAN, etc.) and/or wireless networks (e.g., Bluetooth, RF, cellular, satellite, etc.).
  • Each endpoint or local device, as represented by the bedroom-based device 120 ( 1 ), is equipped with one or more processors 204 , computer-readable media 206 , one or more microphones 208 , and a network interface 210 .
  • the computer-readable media 206 may include volatile and nonvolatile memory, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • Local program modules 212 are shown stored in the media 206 for execution by the processor(s) 204 .
  • the local modules 212 provide basic functionality to receive and process audio input received via the microphones 208 .
  • the functionality may include filtering signals, analog-to-digital conversion, parsing sounds or words, and early analysis of the parsed sounds or words.
  • the local modules 212 may include a wake word recognition module to recognize wake words that are used to transition the voice controlled assistant 120 ( 1 ) to an awake state for receiving input from the user.
  • the local modules 212 may further include some natural language processing functionality to begin interpreting the voice input from the user. To continue the above example, suppose the user 104 makes a request to the voice controlled assistant 120 ( 1 ) in the bedroom 110 at a first time of 9:30 PM.
  • the request is for a reminder to perform a household chore in the morning.
  • the user 104 speaks a wake word to alert the device 120 ( 1 ) and then verbally gives the request, “Remind me to take out the garbage tomorrow morning” as indicated by the dialog bubble 213 .
  • the microphone(s) 208 receive the audio input and the local module(s) 212 process and recognize the wake word to initiate other modules.
  • the audio input may be parsed and partially analyzed, and/or packaged and sent via the interface 210 and network 202 to the cloud services 130 .
  • the cloud services 130 include one or more network-accessible devices, such as servers 132 .
  • the servers 132 may include one or more processors 214 and computer-readable media 216 .
  • the processor(s) 214 and the computer-readable media 216 of the servers 132 are physically separate from the processor(s) 204 and computer-readable media 206 of the device 120 ( 1 ), but may function jointly as part of a system that provides processing and memory in part on the device 120 and in part on the cloud services 130 .
  • These servers 132 may be arranged in any number of ways, such as server farms, stacks, and the like that are commonly used in data centers.
  • the servers 132 may store and execute any number of programs, data, applications, and the like to provide services to the user.
  • the servers 132 are shown to store and execute natural language processing (NLP) modules 218 , a task handler 220 , a person location module 222 , and various applications 224 .
  • the NLP modules 218 process the audio content received from the local device 120 ( 1 ) to interpret the request. If the local device is equipped with at least some NLP capabilities, the NLP modules 218 may take those partial results and complete the processing to interpret the user's verbal request.
  • the resulting interpretation is passed to the task handler 220 to handle the request.
  • the NLP modules 218 interpret the user's input as requesting a reminder to be scheduled and delivered at the appropriate time.
  • the task handler 220 defines a task to set a reminder to be delivered at a time period associated with “tomorrow morning”.
  • the task might include the contents (e.g., a reminder to “Don't forget to take out the garbage”), a time for delivery, and an expected location of delivery.
  • the delivery time and expected location may be ascertained from secondary indicia that the service 130 aggregates and searches. For instance, the task handler 220 may consult other indicia to better understand what “tomorrow morning” might mean for this particular user 104 .
  • One of the applications 224 may be a calendar that shows the user has a meeting at the office at 7:30 AM, and hence is expected to leave the house 102 by 7:00 AM. Accordingly, the task handler 220 may narrow the range of possible times to before 7:00 AM. The task handler 220 may further request activity history from a user profile application (another of the applications 224 ) to determine whether the user has a normal morning activity. Suppose, for example, that the user has shown a pattern of arising by 6:00 AM and having breakfast around 6:30 AM. From these additional indicia, the task handler 220 may decide an appropriate time to deliver the reminder to be around 6:30 AM on the next day.
  • the task handler 220 may further deduce that the user is likely to be in the kitchen at 6:30 AM the next day. From this analysis, the task handler 220 sets a task for this request.
  • a task is defined to deliver a reminder message at 6:30 AM on the next day to a target user 104 via an endpoint device proximal to the kitchen 118 . That is, the task might be structured as including data items of content, date/time, user identity, default endpoint device, and default location.
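  • As a non-limiting sketch, the task record and the narrowing of “tomorrow morning” described above might be represented as follows. The field names and the scheduling heuristic are assumptions for illustration; the disclosure does not prescribe a particular data model.

      # Illustrative reminder-task record and a toy heuristic for narrowing "tomorrow
      # morning" from a calendar lookup and an activity-history pattern, as described above.
      from dataclasses import dataclass
      from datetime import date, datetime, time, timedelta


      @dataclass
      class ReminderTask:
          content: str              # e.g., "Don't forget to take out the garbage"
          delivery_time: datetime
          user_id: str
          default_device: str       # e.g., the kitchen voice controlled device 120(5)
          default_location: str     # e.g., "kitchen"


      def resolve_tomorrow_morning(today: date,
                                   first_calendar_event: time,
                                   typical_breakfast: time,
                                   commute: timedelta = timedelta(minutes=30)) -> datetime:
          """Pick a morning delivery time no later than when the user must leave home."""
          tomorrow = today + timedelta(days=1)
          must_leave_by = datetime.combine(tomorrow, first_calendar_event) - commute
          preferred = datetime.combine(tomorrow, typical_breakfast)
          return min(preferred, must_leave_by)


      when = resolve_tomorrow_morning(date(2012, 12, 13),
                                      first_calendar_event=time(7, 30),  # office meeting
                                      typical_breakfast=time(6, 30))     # from activity history
      task = ReminderTask("Don't forget to take out the garbage",
                          when, "user-104", "device-120-5", "kitchen")
      print(task)   # delivery_time resolves to 6:30 AM on the next day
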
  • the cloud services 130 may return a confirmation to the user to be played by the first device 120 ( 1 ) that received the request while the user is still present.
  • the cloud services 130 might send a confirmation to be played by the bedroom device 120 ( 1 ), such as a statement “Okay Scott, I'll remind you”, as shown by dialog bubble 215 .
  • the user experience is one of a conversation with a computing system. The user casually makes a request and the system responds in conversation.
  • the statement may optionally include language such as “tomorrow at 6:30 am in the kitchen” to provide confirmation of the intent and an opportunity for the user to correct the system's understanding and plan.
  • the person location module 222 may further be used to help locate the user and an appropriate endpoint device when the time comes to deliver the response.
  • the task handler 220 might instruct the person location module 222 to help confirm a location of the user 104 as the delivery time of 6:30 AM approaches.
  • the person location module 222 may attempt to locate the user 104 by evaluating a location of a personal device that he carries, such as his smartphone 120 ( 6 ). Using information about the location of the smartphone 120 ( 6 ) (e.g., GPS, trilateration from cell towers, Wi-Fi base station proximity, etc.), the person location module 222 may be able to confirm that the user is indeed in the house 102 .
  • the person location module 222 may ask the local device 120 ( 5 ) to confirm that the target user 104 is in the kitchen 118 .
  • the person location module 222 may direct the local device 120 ( 5 ) to listen for voices and then attempt to confirm that one of them is the target user 104 .
  • the local device 120 ( 5 ) may provide a greeting to the target user, using the user's name, such as “Good morning Scott” as indicated by dialog bubble 226 . If the target user 104 is present, the user may answer “Good morning”, as indicated by the dialog bubble 228 .
  • the local device 120 ( 5 ) may be equipped with voice recognition functionality to identify the target user by capturing his voice in the environment.
  • the person location module 222 may request a visual image from the camera 120 ( 8 ) (See FIG. 1 ) in the kitchen to get a visual confirmation that the target user 104 is in the kitchen.
  • the task handler 220 engages an endpoint device to deliver the response.
  • the task handler 220 contacts the voice controlled assistant 120 ( 5 ) in the kitchen 118 to send the response.
  • the content from the reminder task is extracted and sent to the device 120 ( 5 ) for playback over the speaker.
  • the voice controlled assistant audibly emits the reminder, “Don't forget to take out the garbage” as indicated by the dialog bubble 230 .
  • the computing system 100 is capable of receiving user input from one endpoint or local device 120 , processing the user input, and providing a timely response via another endpoint or local device 120 .
  • the user need not remember which device he gave the request, or specify which device he receives the response. Indeed, it might be any number of devices. Instead, the user experience is enhanced by the ubiquity of the devices, and the user will merely assume that the computer-enabled assistant system intuitively listened to the request and provided a timely response.
  • the cloud services 130 may evaluate the various devices to find a best fit for the circumstances. Accordingly, one of the applications 224 may be an endpoint device selection module that attempts to identify the best local endpoint device for engaging the user.
  • One example scenario is provided next to illustrate possible techniques for ascertaining the best device.
  • FIG. 3 shows how local endpoint devices are selected to engage the target person during performance of the task.
  • four local endpoint devices 302 , 304 , 306 , and 308 are shown in four areas or zones A-D, respectively.
  • the zones A-D may represent different rooms, physical areas of a larger room, and so forth.
  • the target user 104 is in Zone D. But, he is not alone.
  • four other people are shown in the same zone D.
  • An endpoint device selector 310 is shown stored in the computer-readable media 216 for execution on the processor(s) 214 .
  • the endpoint device selector 310 is configured to identify available devices to engage the user 104 , and then analyze them to ascertain the most appropriate device in the circumstances.
  • any one of the four devices 302 - 308 may be identified as “available” devices that are sufficiently proximal to communicate with the user 104 .
  • There are many ways to determine available devices, such as detecting devices known to be physically in or near areas proximal to the user, finding devices that pick up audio input from the user (e.g., casual conversation in a room), devices associated with the user, user preferences, and so forth.
  • the endpoint device selector 310 next evaluates which of the available devices is most appropriate under the circumstances. There are several ways to make this evaluation. In one approach, a distance analysis may be performed to determine the distances between a device and the target person. As shown in FIG. 3 , the voice controlled assistant 308 is physically closest to the target user 104 at a distance D 1 and the voice controlled assistant 306 is next closest at a distance D 2 . Using distance, the endpoint device selector 310 may choose the closest voice controlled assistant 308 to deliver the response. However, physical proximity may not be the best in all circumstances.
  • audio characteristics in the environment surrounding the user 104 may be analyzed. For instance, the signal-to-noise ratios are measured at various endpoint devices 302 - 308 to ascertain which one is best at hearing the user to the exclusion of other noise.
  • the background volume may be analyzed to determine whether the user is in an area of significant background noise, such as the result of a conversation of many people or background audio from a television or appliance.
  • Still another possibility is to analyze echo characteristics of the area, as well as perhaps evaluate Doppler characteristics that might be introduced as the user is moving throughout one or more areas. That is, verbal commands from the user may reach different devices with more or less clarity and strength depending upon the movement and orientation of the user.
  • environment observations may be analyzed. For instance, a number of people in the vicinity may be counted based on data from cameras (if any) or recognition of distinctive voices.
  • a combination of physical proximity, sound volume-based determination, and/or visual observation may indicate that the closest endpoint device is actually physically separated from the target user by a structural impediment (e.g., the device is located on the other side of a wall in an adjacent room). In this case, even though the device is proximally the closest in terms of raw distance, the endpoint device selector 310 removes the device from consideration. These are but a few examples.
  • any one or more of these analyses may be performed to evaluate possible endpoint devices.
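  • The analyses above (distance, signal-to-noise ratio, background volume, head count, and structural impediments) could be combined in a simple scoring pass such as the following sketch. The weights and thresholds are hypothetical; the disclosure does not prescribe particular values.

      # Hypothetical scoring of candidate endpoint devices using the signals above.
      from dataclasses import dataclass


      @dataclass
      class Candidate:
          device_id: str
          distance_m: float        # estimated distance to the target user
          snr_db: float            # signal-to-noise ratio of the user's voice at the device
          background_db: float     # ambient volume near the device
          people_nearby: int       # head count from cameras or distinct voices
          blocked_by_wall: bool    # structural impediment between device and user


      def score(c: Candidate) -> float:
          if c.blocked_by_wall:
              return float("-inf")                    # remove from consideration
          s = 0.0
          s -= c.distance_m                           # closer is better
          s += 0.5 * c.snr_db                         # hears the user more clearly
          s -= 0.2 * c.background_db                  # noisy surroundings hurt
          s -= 2.0 * max(0, c.people_nearby - 1)      # crowded zones are less suitable
          return s


      candidates = [
          Candidate("assistant-308 (zone D)", 1.0, 8.0, 70.0, 5, False),
          Candidate("assistant-306 (zone C)", 4.0, 14.0, 45.0, 1, False),
          Candidate("assistant-304 (zone B)", 2.5, 10.0, 50.0, 1, True),
      ]
      print("selected endpoint:", max(candidates, key=score).device_id)
      # -> assistant-306 (zone C): farther than 308, but quieter and less crowded
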
  • the endpoint device selector 310 determines that the noise level and/or number of people in zone D are too high to facilitate effective communication with the target user 104 .
  • the endpoint selector 310 may direct the voice controlled assistant 306 in zone C to communicate with the target user 104 .
  • the assistant 306 may first attempt to get the user's attention by playing a statement to draw the user closer, such as “Scott, I have a reminder for you” as represented by the dialog bubble 312 .
  • the user 104 may move closer to the device 306 in zone C, thereby shrinking the distance D 2 to a more suitable length. For instance, the user 104 may move from a first location in zone D to a new location in zone C as shown by an arrow labeled “scenario A”. Thereafter, the task handler 220 may deliver the reminder to take out the garbage.
  • these techniques for identifying the most suitable device for delivering the response may aid in delivery of confidential or sensitive messages. For instance, suppose the target user 104 sets a reminder to pick up an anniversary gift for his wife. In this situation, the endpoint device selector 310 will evaluate the devices in and near the user's current location in an effort to identify a device that can deliver the reminder without the user's wife being present to hear the message. For instance, suppose the user 104 moves from zone D to zone A for a temporary period of time (as illustrated by an arrow labeled “scenario B”), thereby leaving the other people (and his wife) in zone D. Once the user is detected as being alone in zone A, the task handler 220 may direct the voice controlled assistant 302 to deliver the reminder response to the user. This is shown, for example, by the statement “Don't forget to pick up your wife's anniversary present” in dialog bubble 314 .
  • aspects of the system described herein may be further used to support real time communication between two people. For example, consider a scenario where one user wants to send a message to another user in real time.
  • the first user may provide a message for delivery to the second user.
  • the first user may speak a message to a first endpoint device, which sends the message to the cloud services for processing.
  • the cloud services may then determine a location of the second user and select a second endpoint device that is available and suitable for delivery of the message to the second user.
  • the message may then be presented to the second user via the second endpoint device.
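  • A minimal sketch of that person-to-person flow is shown below, reusing the same hypothetical helpers as the earlier orchestration sketch (redefined here as stubs so the example is self-contained).

      # Sketch of real-time person-to-person delivery: locate the recipient, pick a
      # suitable endpoint near them, and present the message (stub helpers below).
      def relay_message(sender_id: str, recipient_id: str, text: str) -> None:
          location = locate_user(recipient_id)     # person location module (hypothetical)
          device = select_endpoint(location)       # endpoint device selector (hypothetical)
          deliver(device, f"Message from {sender_id}: {text}")


      def locate_user(user_id: str) -> str:
          return "living room"                     # a real system would consult presence signals


      def select_endpoint(location: str) -> str:
          return f"voice-controlled-device-in-{location.replace(' ', '-')}"


      def deliver(device_id: str, content: str) -> None:
          print(f"[{device_id}] {content}")


      relay_message("user-104", "user-106", "Dinner is ready")
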
  • FIG. 4 shows selected functional components of devices 120 ( 1 )-(N) that may be used in the computing environment.
  • the devices may be implemented in any number of ways and form factors.
  • a device may be implemented as a standalone voice controlled device 120 ( 1 ) that is relatively simple in terms of functional capabilities with limited input/output components, memory, and processing capabilities.
  • the voice controlled device 120 ( 1 ) does not have a keyboard, keypad, or other form of mechanical input. Nor does it have a display or touch screen to facilitate visual presentation and user touch input.
  • the device 120 ( 1 ) may be implemented with the ability to receive and output audio, a network interface (wireless or wire-based), power, and processing/memory capabilities.
  • a limited set of one or more input components may be employed (e.g., a dedicated button to initiate a configuration, power on/off, etc.). Nonetheless, the primary and potentially only mode of user interaction with the device 120 ( 1 ) is through voice input and audible output.
  • the devices used in the system may also be implemented as a mobile device 120 ( 6 ) such as a smartphone or personal digital assistant.
  • the mobile device 120 ( 6 ) may include a touch-sensitive display screen and various buttons for providing input as well as additional functionality such as the ability to send and receive telephone calls.
  • Alternative implementations of the voice controlled device 120 may also include configuration as a computer, such as a laptop 120 ( 4 ).
  • the computer 120 ( 4 ) may include a keyboard, a mouse, a display screen, and any other hardware or functionality that is typically found on a desktop, notebook, netbook, or other personal computing devices.
  • the devices are merely examples and not intended to be limiting, as the techniques described in this disclosure may be used in essentially any device that has an ability to recognize speech input.
  • each of the devices 120 includes one or more processors 402 and computer-readable media 404 .
  • the computer-readable media 404 may include volatile and nonvolatile memory, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • Such memory includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, RAID storage systems, or any other medium which can be used to store the desired information and which can be accessed by a computing device.
  • the computer-readable media 404 may be implemented as computer-readable storage media (“CRSM”), which may be any available physical media accessible by the processor(s) 402 to execute instructions stored on the memory 404 .
  • CRSM may include random access memory (“RAM”) and Flash memory.
  • CRSM may include, but is not limited to, read-only memory (“ROM”), electrically erasable programmable read-only memory (“EEPROM”), or any other tangible medium which can be used to store the desired information and which can be accessed by the processor(s) 402 .
  • modules such as instructions, datastores, and so forth may be stored within the computer-readable media 404 and configured to execute on the processor(s) 402 .
  • a few example functional modules are shown as applications stored in the computer-readable media 404 and executed on the processor(s) 402 , although the same functionality may alternatively be implemented in hardware, firmware, or as a system on a chip (SOC).
  • An operating system module 406 may be configured to manage hardware and services within and coupled to the device 120 for the benefit of other modules.
  • a wake word recognition module 408 and a speech recognition module 410 may employ any number of conventional speech recognition techniques such as use of natural language processing and extensive lexicons to interpret voice input.
  • the speech recognition module 410 may employ general speech recognition techniques and the wake word recognition module may include speech or phrase recognition particular to the wake word.
  • the wake word recognition module 408 may employ a hidden Markov model that represents the wake word itself. This model may be created in advance or on the fly depending on the particular implementation.
  • the speech recognition module 410 may initially be in a passive state in which the speech recognition module 410 does not recognize or respond to speech.
  • the wake word recognition module 408 may recognize or respond to wake words. Once the wake word recognition module 408 recognizes or responds to a wake word, the speech recognition module 410 may enter an active state in which the speech recognition module 410 operates to detect any of the natural language commands for which it is programmed or to which it is capable of responding. In the particular implementation shown in FIG. 4 , the wake word recognition module 408 and the speech recognition module 410 are shown as separate modules; in other implementations, these modules may be combined.
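  • The passive/active hand-off between the wake word recognition module 408 and the speech recognition module 410 can be pictured as a small state machine. The sketch below substitutes simple string matching for a real acoustic model (such as the hidden Markov model mentioned above) and is illustrative only.

      # Toy state machine for the wake-word gate: speech recognition stays passive until
      # the wake word module reports a detection, then handles one command and resets.
      from typing import Optional

      PASSIVE, ACTIVE = "passive", "active"


      class WakeWordGate:
          def __init__(self, wake_word: str = "wakeup"):
              self.wake_word = wake_word.lower()
              self.state = PASSIVE

          def hears(self, utterance: str) -> Optional[str]:
              """Return recognized command text, or None while still passive."""
              if self.state == PASSIVE:
                  # Stand-in for the wake word recognition module 408 (e.g., a model
                  # trained on the wake word itself).
                  if self.wake_word in utterance.lower():
                      self.state = ACTIVE
                  return None
              # Stand-in for the speech recognition module 410 in its active state.
              command = utterance
              self.state = PASSIVE              # return to the passive state afterwards
              return command


      gate = WakeWordGate()
      print(gate.hears("random background chatter"))          # None
      print(gate.hears("wakeup"))                             # None (just woke up)
      print(gate.hears("remind me to take out the garbage"))  # the command text
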
  • Other local modules 412 may also be present on the device, depending upon the implementation and configuration of the device. These modules may include more extensive speech recognition techniques, filters and echo cancellation modules, speaker detection and identification, and so forth.
  • the voice controlled device 120 may also include a plurality of applications 414 stored in the computer-readable media 404 or otherwise accessible to the device 120 .
  • the applications 414 are a music player 416 , a movie player 418 , a timer 420 , and a personal shopper 422 .
  • the voice controlled device 120 may include any number or type of applications and is not limited to the specific examples shown here.
  • the music player 416 may be configured to play songs or other audio files.
  • the movie player 418 may be configured to play movies or other audio visual media.
  • the timer 420 may be configured to provide the functions of a simple timing device and clock.
  • the personal shopper 422 may be configured to assist a user in purchasing items from web-based merchants.
  • Datastores may also be stored locally on the media 404 , including a content database 424 and one or more user profiles 426 of users that have interacted with the device 120 .
  • the content database 424 stores various content that may be played or presented by the device, such as music, books, magazines, videos, and so forth.
  • the user profile(s) 426 may include user characteristics, preferences (e.g., user specific wake words), usage history, library information (e.g., music play lists), online purchase history, and other information specific to an individual user.
  • the voice controlled device 120 has input devices 428 and output devices 430 .
  • the input devices 428 may include a keyboard, keypad, mouse, touch screen, joystick, control buttons, etc.
  • one or more microphones 432 may function as input devices to receive audio input, such as user voice input.
  • the input devices 428 may further include a camera to capture images of user gestures.
  • the output devices 430 may include a display, a light element (e.g., LED), a vibrator to create haptic sensations, or the like.
  • one or more speakers 434 may function as output devices to output audio sounds.
  • a user may interact with the device 120 by speaking to it, and the microphone 432 captures the user's speech.
  • the device 120 can communicate back to the user by emitting audible statements through the speaker 434 . In this manner, the user can interact with the voice controlled device 120 solely through speech, without use of a keyboard or display.
  • the voice controlled device 120 might further include a wireless unit 436 coupled to an antenna 438 to facilitate a wireless connection to a network.
  • the wireless unit 436 may implement one or more of various wireless technologies, such as Wi-Fi, Bluetooth, RF, and so on.
  • a USB port 440 may further be provided as part of the device 120 to facilitate a wired connection to a network, or a plug-in network device that communicates with other wireless networks.
  • other forms of wired connections may be employed, such as a broadband connection.
  • the wireless unit 436 and USB 440 form two of many examples of possible interfaces used to connect the device 120 to the network 202 for interacting with the cloud services 130 .
  • the voice controlled device 120 ( 1 ) may include non-input control mechanisms, such as basic volume control button(s) for increasing/decreasing volume, as well as power and reset buttons. There may also be a simple light element (e.g., LED) to indicate a state such as, for example, when power is on.
  • the device 120 ( 1 ) may be implemented as an aesthetically appealing device with smooth and rounded surfaces, with one or more apertures for passage of sound waves.
  • the device 120 ( 1 ) may merely have a power cord and optionally a wired interface (e.g., broadband, USB, etc.). Once plugged in, the device may automatically self-configure, or with slight aid of the user, and be ready to use. As a result, the device 120 ( 1 ) may be generally produced at a low cost.
  • other I/O components may be added to this basic model, such as specialty buttons, a keypad, display, and the like.
  • FIG. 5 shows an example process 500 for aiding a person in performing a task, including receiving a request from the person via one device and delivering a response to the person via another device.
  • the process 500 may be implemented by the local endpoint devices 120 ( 1 )-(N) and server(s) 132 of FIG. 1 , or by other devices.
  • This process (along with the processes illustrated in FIGS. 6 and 7 ) is illustrated as a collection of blocks or actions in a logical flow graph. Some of the blocks represent operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer-executable instructions stored on one or more computer-readable media that, when executed by one or more processors, perform the recited operations.
  • computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types.
  • the order in which the operations are described is not intended to be construed as a limitation, and any number of the described blocks can be combined in any order or in parallel to implement the processes.
  • the blocks are arranged visually in FIG. 5 in columns beneath the endpoint devices 120 ( 1 )-(N) and server(s) 132 to illustrate that these devices of the system 100 may perform these operations. That is, actions defined by blocks arranged beneath the devices 120 ( 1 )-(N) may be performed by any one of the devices. In certain situations, part of the process, such as the request input part, may be performed by a first endpoint device and another part of the process, such as the response delivery part, may be performed by a second endpoint device, as illustrated by the dashed boxes about portions of the flow diagram. Similarly, actions defined by blocks arranged beneath the server(s) 132 may be performed by one or more server(s) 132 .
  • a first local endpoint device 120 receives speech input at the microphone(s) 208 / 434 .
  • the speech input may include a wake word to alert the device to intentional speech, or may be part of an ongoing discussion after the device is already awake and interacting with the user.
  • the speech input includes a request.
  • the speech recognition module 410 at the first local endpoint device 120 ( 1 ) attempts to discern whether the request in the speech input would benefit from knowing the identity of the person. Said another way, is the request general or more personal? If it is not personal (i.e., the “no” branch from 504 ) and person identity is not beneficial, the process 500 may proceed to some pre-processing of the speech input at 508 .
  • the speech input may be a question, “What is the weather today?” This request may be considered general in nature, and not personal, and hence the system need not remember who is making the request.
  • the user may make a personal request (i.e., the “yes” branch from 504 ) where person identity is beneficial, leading to an operation to identify the person at 506 .
  • the speech input is “please remind me to take out the garbage tomorrow morning” or “remind me to pick up my wife's anniversary present.” Both of these are examples of personal requests, with the latter having a higher degree of sensitivity in how the reminder is conveyed.
  • the person is identified through use of voice identification (e.g., person A is talking), interchange context (e.g., a male voice asks to take out the garbage while in the master bedroom), secondary visual confirmation, and so forth.
  • the first device 120 ( 1 ) may optionally pre-process the speech input prior to sending it to the server. For instance, the device may apply natural language processing to the input, or compression algorithms to compress the data prior to sending it over to the servers 132 , or even encryption algorithms to encrypt the audio data.
  • the speech input is passed to the servers 132 along with an identity of the first device 120 ( 1 ) and an identity of the person, if known from 506 .
  • the identity of the device 120 ( 1 ) may be a serial number, a registration number or the like, and is provided so that the task handler operating at the servers 132 knows from where the user request originated.
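  • One possible shape for what the first device sends to the servers 132 , covering the optional pre-processing (compression) and the device and person identifiers described above, is sketched below. The field names are assumptions; encryption, also mentioned in this disclosure, is omitted for brevity.

      # Hypothetical request payload assembled by the first endpoint device before sending
      # the speech input to the servers 132 (compression optional; encryption omitted here).
      import base64
      import json
      import zlib
      from typing import Optional


      def build_request(audio: bytes, device_id: str, person_id: Optional[str]) -> str:
          payload = {
              "device_id": device_id,   # so the task handler knows where the request originated
              "person_id": person_id,   # None if the request was not personal
              "audio": base64.b64encode(zlib.compress(audio)).decode("ascii"),
          }
          return json.dumps(payload)


      msg = build_request(b"remind me to take out the garbage tomorrow morning",
                          device_id="device-120-1", person_id="user-104")
      print(msg)
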
  • a response may be immediately returned to the first device 120 ( 1 ), such as a response containing the current weather information.
  • the identity of the first device 120 ( 1 ) may help confirm the identity of the user.
  • the user's use of the first device to make a particular request at a particular time of day may be recorded in the user's profile as a way to track habits or patterns in the user's normal course of the day.
  • this association may be used in selecting a location and endpoint device for delivery of responses to that identified user for a period of time shortly after receipt of the request, or for delivery of future responses.
  • the identity of the person may be determined by the servers 132 , rather than at the first device 120 ( 1 ).
  • the first device 120 ( 1 ) passes audio data representative of the speech input from the person, and the servers 132 use the audio data and possibly other indicia to identify the person.
  • the user may set a reminder for another person.
  • For example, a first user (e.g., the husband, Scott) may set a reminder to be delivered to a second user (e.g., his wife, Elyn).
  • In this case, the request includes an identity of another user, and the servers at the cloud services 130 determine who that person is based on the user profile data.
  • the servers 132 at the cloud services 130 process the speech input received from the first endpoint device 120 ( 1 ).
  • the processing may include decryption, decompression, and speech recognition.
  • the task handler 220 determines an appropriate response.
  • the task handler may consult any number of applications to generate the response. For instance, if the request is for a reminder to purchase airline tickets tomorrow, the task handler may involve a travel application as part of the solution of discovering airline prices when providing the reminder response tomorrow.
  • the cloud services 130 may also determine for whom the response is intended. The response is likely to be returned to the original requester, but in some cases, it can be delivered to another person (in which case the location determination would be made with respect to that other person).
  • an immediate confirmation may be optionally sent to indicate to the user that the request was received and will be handled. For instance, in response to a request for a reminder, the response might be “Okay Scott, I'll remind you.”
  • the servers 132 return the confirmation to the same endpoint device 120 ( 1 ) from which the request was received.
  • the first device 120 ( 1 ) receives and plays the confirmation so that the user experience is one of a conversation, where the computing system heard the request and acknowledged it.
  • the task handler 220 discerns from the request an appropriate time to respond to the request.
  • the user may use any number of ways to convey a desired response time. For instance, the user may ask for a reminder “before my company meeting” or “tomorrow morning” or at 5:00 PM on a date certain. Each of these has a different level of specificity. The last is straightforward, with the task handler 220 setting a response for 5:00 PM. With respect to the first two examples, the task handler 220 may attempt to discern what a phrase like “tomorrow morning” means depending upon the request.
  • If the request is a reminder to take out the garbage, the timeframe associated with “tomorrow morning” is likely the time when the user is expected to be home in the morning (e.g., say at 6:30 AM as discussed above). If the request is for a reminder to “meet with marketing”, the timeframe for “tomorrow morning” may be more like 9:00 AM or 10:00 AM. Finally, if the request is for “before my company meeting”, the task handler 220 may consult a calendar to see when the “company meeting” is scheduled and will set a reminder for a reasonable time period before that meeting is scheduled to start.
  • a location of the target person is determined in order to identify the place to which the response is to be timely sent. For instance, as the time for response approaches, the person location module 222 determines where the user may be located in order to deliver a timely response. There are many ways to make this determination. A more detailed discussion of this action is described below with reference to FIG. 6 . Further, the target user may be the initial requester or another person.
  • a device to which to send the response is determined.
  • an endpoint device selector 310 evaluates possible devices that might be available and then determines which endpoint device might be best in the circumstances to send the response. There are many techniques for evaluating possible devices and discerning the best fit. A more detailed discussion of this action is provided below with reference to FIG. 7 .
  • an appropriate response is timely sent to the best-fit device at the location of the target user.
  • the best-fit device is a different endpoint device, such as a second local device 120 ( 2 ), than the device 120 ( 1 ) from which the request was received.
  • the response is received and played (or otherwise manifested) for the target user.
  • the second device 120 receives the response, and plays it for the user who is believed to be in the vicinity.
  • the response may be in any form (e.g., audio, visual, haptic, etc.) and may include essentially any type of message, reminder, etc.
  • the response may be in an audio form, where it is played out through the speaker for the user to hear. With the continuing examples, the response may be “Don't forget to take out the garbage”, or “You have your company meeting in 15 minutes”.
  • the technique described above and illustrated in FIG. 5 is merely an example and implementations are not limited to this technique. Rather, other techniques for operating the devices 120 and servers 132 may be employed and the implementations of the system disclosed herein are not limited to any particular technique.
  • FIG. 6 shows a more detailed process for determining a location of the person, from act 520 of FIG. 5 .
  • an identity of the target person is received.
  • certain requests will include an identity of the person making the request, such as a unique user ID.
  • the person location module 222 might poll optical devices throughout an environment to attempt to visually locate the target person.
  • the optical devices such as cameras, may employ recognition software (e.g., facial recognition, feature recognition, etc.) to identify users.
  • “polling” refers to obtaining the optical information from the optical devices, which may involve actively requesting the information (e.g., a “pull” model) or receiving the information without request (e.g., a “push” model).
  • the person location module 222 may poll audio devices throughout the environment to gain voice confirmation that the target person is present. Audio tools may be used to evaluate audio input against pre-recorded vocal profiles to uniquely identify different people.
  • Another technique is to locate portable devices that may be associated with the target person, at 604 - 3 .
  • the person location module 222 may interact with location software modules that locate devices such as smartphones, tablets, or personal digital assistants via GPS data and/or cell tower trilateration data.
  • this technique may be used in cooperation with other approaches. For instance, this physical location data may help narrow a search for a person to a particular residence or office, and then polling audio or optical devices may be used to place the user in particular rooms or areas of the residence or office.
  • the person location module 222 may further consult with other applications in an effort to locate the user, such as a calendar application, at 604 - 4 .
  • the calendar application may specify where the user is scheduled to be located at a particular time. This is particularly useful when the user is in various meetings at the office. There are many other sources that may be consulted to provide other indicia of the target person's whereabouts, as represented by 604 -N.
  • the person location module 222 identifies multiple possible locations.
  • the possible locations may be optionally ranked. For instance, each location may be assigned a confidence score indicating how likely the user is to be located there.
  • Use of visual data may carry a very high confidence score, whereas audio data may carry a slightly lower one.
  • Use of a calendar item may have a significantly lower confidence score attached, as there is no guarantee that the user is following the schedule. A minimal sketch of this confidence-based ranking is provided after this list.
  • the person location module 222 may engage one or more local devices to interact with the target person to confirm his or her presence. For instance, suppose the person location module 222 initially believes the person is in a particular room. The person location module 222 may direct one of the devices in the room to engage the person, perhaps through asking a question (e.g., “Scott, do you need anything?”). If the person is present, the person may naturally respond (e.g., “No, nothing. Thanks”). The person location module 222 may then confirm that the target person is present.
  • a location is chosen for delivery of the response to the user.
  • the choice may be based on the ranked possible locations of action 606 and/or on confirmation through a quick interaction of action 608 .
  • FIG. 7 shows a more detailed process for determining an appropriate device to return the response, from action 522 of FIG. 5 .
  • the location of the target person is received. This may be determined from action 520, as illustrated in FIG. 6. Alternatively, the location of the target person may be known in advance, or the user may have informed the system of where he or she is located.
  • possible devices proximal to the location of the target person are discovered as being available to deliver the response to the person. For example, if the user is found to be located in a room of a home or office, the endpoint device selector 310 discovers whether one or more devices reside in that room. The selector 310 may consult the user's profile to see what devices are associated with the user, or may evaluate registration records that identify a residence or location in which the device is installed.
  • the available devices are evaluated to ascertain which might be the best device in the circumstances to return a response to the target person.
  • a distance from the endpoint device to the target person may be analyzed. If the endpoint device is equipped with depth sensors (e.g., time of flight sensors), the depth value may be used. If multiple devices are in a room, the timing difference of receiving verbal input from a user among the devices may be used to estimate the location of the person and which device might be closest.
  • the background volume in an environment containing the target person may be analyzed.
  • High background volume may impact the ability of the device to communicate with the target user. For instance, suppose a room has a first device located near an appliance and a second device located across the room. If the appliance is operating, the background volume for the first device may be much greater than the background volume for the second device, thereby suggesting that the second device might be more appropriate in this case to communicate with the user.
  • the signal-to-noise ratios (SNRs) of various available devices are analyzed. Devices with strong SNRs are given a preference over those with weaker SNRs.
  • echo characteristics of the environment may be analyzed.
  • a baseline reading is taken when the room is empty of humans and moving objects to get an acoustical map of the surrounding environment, including location of surfaces and other objects that might cause sound echo.
  • the echo characteristics may be measured at the time of engagement with humans, including the target user, to determine whether people or objects might change the acoustical map. Depending upon the outcome of these measurements, certain available devices may become more appropriate for delivering the response to the target user.
  • Doppler characteristics of the environment may be analyzed.
  • a user may be moving through an environment from one part of a room to another part of the room, or from room to room.
  • if the user is also speaking and conversing with the computing system 100, there may be changing acoustics that affect which devices are best to interact with the user, depending upon the direction of the user's movement and the orientation of the user's head when speaking.
  • the Doppler characteristics may therefore impact which device may be best for responding in a given set of circumstances.
  • the environment may be analyzed, such as how many people are in the room, or who in particular is in the room, and so forth.
  • visual data received from cameras or other optical devices may provide insights as to the number of people in the environment, or the identity of those people. This analysis may assist in determining which device is most appropriate to deliver a response. For instance, if a device is located in a room crowded with people, the system may determine that another device away from the crowd is a better choice.
  • There are many other types of analyses that may be applied to evaluate possible devices for providing the response, as represented by 706-M. For instance, another type of analysis is to review ownership or registration information to discover an association between the target user and personal devices. Devices that are more personal to the target user may receive a higher score.
  • the response is evaluated to determine whether there are any special criteria that might impact a decision of where to direct the response. For instance, in the scenario where the user asked for a reminder to pick up his wife's present, the response will include an element of privacy or sensitivity in that the system should not return a reminder to a location where the target person's wife may accidentally hear the reminder. Another example is where the user may be requesting information about a doctor appointment or personal financial data, which is not intended for general consumption. There are myriad examples of special criteria. Accordingly, at 708 , these criteria are evaluated and used in the decision making process of finding the best endpoint device under the circumstances.
  • the best endpoint device 120 is chosen. This decision may be based on scoring the various analyses 706-1 to 706-M, ranking the results, and applying any special criteria to the results. In this example, the device with the highest overall score is chosen.
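  • To make the location-ranking acts summarized above more concrete, the following Python fragment is a minimal, illustrative sketch and not part of the disclosed implementation. The names LocationCandidate, rank_locations, and choose_location, as well as the specific confidence values, are hypothetical; they merely show how visual, audio, portable-device, and calendar indicia might be scored, ranked, and optionally confirmed, in the spirit of acts 604 through 610.

    from dataclasses import dataclass

    # Hypothetical confidence weights: visual sightings rank above audio matches,
    # which rank above calendar entries, per the discussion above.
    SOURCE_CONFIDENCE = {"visual": 0.9, "audio": 0.75, "portable_device": 0.6, "calendar": 0.4}

    @dataclass
    class LocationCandidate:
        location: str           # e.g., "kitchen"
        source: str             # which indicium produced this candidate
        confidence: float = 0.0

    def rank_locations(candidates):
        """Assign a confidence score to each candidate and sort best-first."""
        for c in candidates:
            c.confidence = SOURCE_CONFIDENCE.get(c.source, 0.1)
        return sorted(candidates, key=lambda c: c.confidence, reverse=True)

    def choose_location(candidates, confirm=None):
        """Pick a delivery location, optionally confirming presence via a quick interaction."""
        for candidate in rank_locations(candidates):
            if confirm is None or confirm(candidate.location):
                return candidate.location
        return None

    # Example: a camera sighting in the kitchen outranks a calendar entry for the office.
    best = choose_location([
        LocationCandidate("office", "calendar"),
        LocationCandidate("kitchen", "visual"),
    ])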

Abstract

A computing system has multiple endpoint computing devices in local environments to receive verbal requests from various users and a central or remote system to process the requests. The remote system generates responses and uses a variety of techniques to determine where and when to return responses audibly to the users. For each request, the remote system understands who is making the request, determines when to provide the response to the user, ascertains where the user is when it is time to deliver the response, discovers which of the endpoint devices are available to deliver the response, and evaluates which of the available devices is best to deliver the response. The system then delivers the response to the best endpoint device for audible emission or other form of presentation to the user.

Description

    BACKGROUND
  • Homes, offices and other places are becoming more connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, gesture, and even through natural language input such as speech.
  • As computing devices evolve, users are expected to rely more and more on such devices to assist them in routine tasks. Today, it is commonplace for computing devices to help people buy tickets, shop for goods and services, check the weather, find and play entertainment, and so forth. However, with the growing ubiquity of computing devices, it is not uncommon for users to have many devices, such as a smartphone, e-book reader, a tablet, a computer, an entertainment system, and so forth. One of the challenges for multi-device users is how to perform tasks effectively when working with multiple devices. Coordinating a task among multiple devices is non-trivial.
  • Accordingly, there is a need for techniques to improve coordination of user activity in a ubiquitous computing device environment.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical components or features.
  • FIG. 1 illustrates an environment in which multiple computing devices, including voice controlled devices, are ubiquitous and coordinated to assist a person in handling routine tasks.
  • FIG. 2 shows a representative scenario of a person using the computing environment to assist with the task. FIG. 2 includes a functional block diagram of select components of computing devices in the environment as well as remote cloud services accessible via a network.
  • FIG. 3 shows how devices are selected to engage the person during performance of the task.
  • FIG. 4 shows a block diagram of selected components of computing devices that may be used in the environment.
  • FIG. 5 is a flow diagram showing an illustrative process for aiding the person in performing a task, including receiving a request from the person via one device and delivering a response to the person via another device.
  • FIG. 6 is a flow diagram showing an illustrative process for determining a location of the person.
  • FIG. 7 is a flow diagram showing an illustrative process for determining a device to which to deliver the response to the person.
  • DETAILED DESCRIPTION
  • Described herein are techniques to leverage various computing devices to assist in routine tasks. As computing devices become ubiquitous in homes, offices, and other places, users are less likely to differentiate among them when thinking about and performing these routine tasks. The users will increasingly expect the devices to intelligently help, regardless of where the users are located and what the users might currently be doing. To implement this intelligence, a computing system is architected to organize task management across multiple devices with which the user may interact.
  • In one implementation, the computing system is constructed as a cloud service that uses a variety of implicit and explicit signals to determine presence of a user in a location and to decide which, if any, assistance or responses to provide to one or more devices within that location. The signals may represent any number of indicia that can help ascertain the whereabouts of the user and how best to interact with the person at that time, and at that location. Representative signals may include audio input (e.g., sound of a user's voice), how recently the user interacted with a device, presence of a mobile device associated with the user, visual recognition of the user, and so forth.
  • As one example scenario, suppose a user wants to remember to do a simple household chore or work task. The user may ask the computing system, via a first device, to remind him at a future time to do the household chore or work task. The computing system may then subsequently, at the future time, remind the user via a second device that is appropriate in the current circumstances to deliver that message. In this case, the computing system understands who is making the request, determines when to provide the reminder to the user, ascertains where the user is when it is time to remind him, discovers which devices are available to deliver the reminder, and evaluates which of the available devices is best to deliver the reminder. In this manner, the computing system implements response functionality that includes intelligent selection of endpoint devices.
  • The various operations to implement this intelligence may be split among local devices and remote cloud computing systems. In various implementations, different modules and functionality may reside locally in the devices proximal to the user, or remotely in the cloud servers. This disclosure provides one example implementation in which a significant portion of the response system resides in the remote cloud computing system.
  • Further, this disclosure describes the techniques in the context of local computing devices that are primarily voice operated, such as dedicated voice controlled devices. Receiving verbal requests and providing audible responses introduce some additional challenges, which the system described below is configured to address. However, use of voice controlled devices is not intended to be limiting as other forms of engaging the user (e.g., gesture input, typed input, visual output, etc.) may be used by the computing system.
  • Illustrative Architecture
  • FIG. 1 shows an illustrative architecture of a computing system 100 that implements response functionality with intelligent endpoint selection. For discussion purposes, the system 100 is described in the context of users going about their normal routines and interacting with the computing system 100 throughout the day. The computing system 100 is configured to receive requests given by users at respective times and locations, process those requests, and return responses at other respective times, to locations at which the users are present, and to appropriate endpoint devices.
  • In this illustration, a house 102 is a primary residence for a family of three users, including a first user 104 (e.g., adult male, dad, husband, etc.), a second user 106 (e.g., adult female, mom, wife, etc.), and a third user 108 (e.g., daughter, child, girl, etc.). The house is shown with five rooms including a master bedroom 110, a bathroom 112, a child's bedroom 114, a living room 116, and a kitchen 118. The users 104-108 are located in different rooms in the house 102, with the first user 104 in the master bedroom 110, the second user 106 in the living room 116, and the third user 108 in the child's bedroom 114.
  • The computing system 100 includes multiple local devices or endpoint devices 120(1), . . . , 120(N) positioned at various locations to interact with the users. These devices may take on any number of form factors, such as laptops, electronic book (eBook) reader devices, tablets, desktop computers, smartphones, voice controlled devices, entertainment device, augmented reality systems, and so forth. In FIG. 1, the local devices include a voice controlled device 120(1) residing in the bedroom 110, a voice controlled device 120(2) in the child's bedroom 114, a voice controlled device 120(3) in the living room 116, a laptop 120(4) in the living room 116, and a voice controlled device 120(5) in the kitchen 118. Other types of local devices may also be leveraged by the computing system, such as a smartphone 120(6) of the first user 104, cameras 120(7) and 120(8), and a television screen 120(9). In addition, the computing system 100 may rely on other user-side devices found outside the home, such as in an automobile 122 (e.g., car phone, navigation system, etc.) or at the first user's office 124 (e.g., work computer, tablet, etc.) to convey information to the user.
  • Each of these endpoint devices 120(1)-(N) may receive input from a user and deliver responses to the same user or different users. The input may be received in any number of ways, including as audio or verbal input, gesture input, and so forth. The responses may also be delivered in any number of forms, including as audio output, visual output (e.g., pictures, UIs, videos, etc. depicted on the laptop 120(4) or television 120(9)), haptic feedback (e.g., vibration of the smartphone 120(6), etc.), and the like.
  • The computing system 100 further includes a remote computing system, such as cloud services 130 supported by a collection of network-accessible devices or servers 132. The cloud services 130 generally refer to a network-accessible platform implemented as a computing infrastructure of processors, storage, software, data access, and so forth that is maintained and accessible via a network, such as the Internet. Cloud services 130 may not require end-user knowledge of the physical location and configuration of the system that delivers the services. Common expressions associated with cloud services include "on-demand computing", "software as a service (SaaS)", "platform computing", "network accessible platform", and so forth.
  • The cloud services 130 coordinate request input and response output among the various local devices 120(1)-(N). At any one of the local devices 120(1)-(N), a user, such as the user 104, may enter a request for the computing system 100 to handle. This request may be a verbal request, such as the user 104 speaking to the voice controlled device 120(1) in the master bedroom 110. For instance, the user may say, “Please remind me to take out the garbage tomorrow morning.” The voice controlled device 120(1) is equipped with microphones to receive the audio input and a network interface to pass the request to the cloud services 130. The local device 120(1) may optionally have natural language processing functionality to begin processing of the speech content.
  • The request is passed to the cloud services 130 over a network (not shown in FIG. 1) where the request is processed. The request is parsed and interpreted. In this example, the cloud services 130 determine that the user wishes to be reminded of the household chore to take out the garbage at a specified timeframe (i.e., tomorrow morning). The cloud services 130 implements a task handler to define a task that schedules a reminder to be delivered to the user at the appropriate time (e.g., 7:00 AM). When that time arrives, the cloud services 130 determine where the target user who made the request, i.e., the first user 104, is located. The cloud services 130 may use any number of techniques to ascertain the user's whereabouts, such as polling devices in the area to get an audio, visual, or other biometric confirmation of presence, or locating a device that might be personal or associated with the user (e.g., smartphone 120(6)), or through other secondary indicia, such as the user's history of activity, receipt of other input from the user from a specific location, and so forth.
  • Once the user is located, the cloud services 130 may then determine which local device is suitable to deliver the response to the user. In some cases, there may be only a single device and hence the decision is straightforward. However, in other situations, the user may be located in an area having multiple local devices, any one of which may be used to convey the response. In such situations, the cloud services 130 may evaluate the various candidate devices, and select the best or more appropriate device in the circumstances to deliver the response.
  • In this manner, the computing system 100 provides a coordinated response system that utilizes ubiquitous devices available in the user's environment to receive requests and deliver responses. The endpoint devices used for receipt of the request and delivery of the response may be different. Moreover, the devices need not be associated with the user in any way, but may instead be generic endpoint devices that are used as needed to interact with the user. To illustrate the flexibility of the computing system, the following discussion continues the earlier example of a user asking to be reminded to perform a household chore.
  • FIG. 2 illustrates select devices in the computing system 100 to show a representative scenario of a person using the computing environment to assist with the task. In this example, two endpoint devices are shown, with a first endpoint device in the form of the voice controlled assistant 120(1) residing in the bedroom 110 and the second endpoint device in the form of the voice controlled assistant 120(5) residing in the kitchen 118. The endpoint devices 120(1) and 120(5) are coupled to communicate with the remote cloud services 130 via a network 202. The network 202 may be representative of any number of network types, such as wired networks (e.g., cable, LAN, etc.) and/or wireless networks (e.g., Bluetooth, RF, cellular, satellite, etc.).
  • Each endpoint or local device, as represented by the bedroom-based device 120(1), is equipped with one or more processors 204, computer-readable media 206, one or more microphones 208, and a network interface 210. The computer-readable media 206 may include volatile and nonvolatile memory, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • Local program modules 212 are shown stored in the media 206 for execution by the processor(s) 204. The local modules 212 provide basic functionality to receive and process audio input received via the microphones 208. The functionality may include filtering signals, analog-to-digital conversion, parsing sounds or words, and early analysis of the parsed sounds or words. For instance, the local modules 212 may include a wake word recognition module to recognize wake words that are used to transition the voice controlled assistant 120(1) to an awake state for receiving input from the user. The local modules 212 may further include some natural language processing functionality to begin interpreting the voice input from the user. To continue the above example, suppose the user 104 makes a request to the voice controlled assistant 120(1) in the bedroom 110 at a first time of 9:30 PM. The request is for a reminder to perform a household chore in the morning. In this example, the user 104 speaks a wake word to alert the device 120(1) and then verbally gives the request, "Remind me to take out the garbage tomorrow morning" as indicated by the dialog bubble 213. The microphone(s) 208 receive the audio input and the local module(s) 212 process and recognize the wake word to initiate other modules. The audio input may be parsed and partially analyzed, and/or packaged and sent via the interface 210 and network 202 to the cloud services 130.
  • The cloud services 130 include one or more network-accessible devices, such as servers 132. The servers 132 may include one or more processors 214 and computer-readable media 216. The processor(s) 214 and the computer-readable media 216 of the servers 132 are physically separate from the processor(s) 204 and computer-readable media 206 of the device 120(1), but may function jointly as part of a system that provides processing and memory in part on the device 120 and in part on the cloud services 130. These servers 132 may be arranged in any number of ways, such as server farms, stacks, and the like that are commonly used in data centers.
  • The servers 132 may store and execute any number of programs, data, applications, and the like to provide services to the user. In this example architecture, the servers 132 are shown to store and execute natural language processing (NLP) modules 218, a task handler 220, a person location module 222, and various applications 224. The NLP modules 218 process the audio content received from the local device 120(1) to interpret the request. If the local device is equipped with at least some NLP capabilities, the NLP modules 218 may take those partial results and complete the processing to interpret the user's verbal request.
  • The resulting interpretation is passed to the task handler 220 to handle the request. In our example, the NLP modules 218 interpret the user's input as requesting a reminder to be scheduled and delivered at the appropriate time. The task handler 220 defines a task to set a reminder to be delivered at a time period associated with “tomorrow morning”. The task might include the contents (e.g., a reminder to “Don't forget to take out the garbage”), a time for delivery, and an expected location of delivery. The delivery time and expected location may be ascertained from secondary indicia that the service 130 aggregates and searches. For instance, the task handler 220 may consult other indicia to better understand what “tomorrow morning” might mean for this particular user 104. One of the applications 224 may be a calendar that shows the user has a meeting at the office at 7:30 AM, and hence is expected to leave the house 102 by 7:00 AM. Accordingly, the task handler 220 may narrow the range of possible times to before 7:00 AM. The task handler 220 may further request activity history from a user profile application (another of the applications 224) to determine whether the user has a normal morning activity. Suppose, for example, that the user has shown a pattern of arising by 6:00 AM and having breakfast around 6:30 AM. From these additional indicia, the task handler 220 may decide an appropriate time to deliver the reminder to be around 6:30 AM on the next day. Separately, the task handler 220 may further deduce that the user is likely to be in the kitchen at 6:30 AM the next day. From this analysis, the task handler 220 sets a task for this request. In this example, a task is defined to deliver a reminder message at 6:30 AM on the next day to a target user 104 via an endpoint device proximal to the kitchen 118. That is, the task might be structured as including data items of content, date/time, user identity, default endpoint device, and default location. Once the request is understood and a task is properly defined, the cloud services 130 may return a confirmation to the user to be played by the first device 120(1) that received the request while the user is still present. For instance, in response to the request for a reminder 213, the cloud services 130 might send a confirmation to be played by the bedroom device 120(1), such as a statement “Okay Scott, I'll remind you”, as shown by dialog bubble 215. In this manner, the user experience is one of a conversation with a computing system. The user casually makes a request and the system responds in conversation. The statement may optionally include language such as “tomorrow at 6:30 am in the kitchen” to provide confirmation of the intent and an opportunity for the user to correct the system's understanding and plan.
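  • As a rough illustration of the task structure just described, the sketch below shows one way the data items (content, date/time, user identity, default endpoint device, default location) might be represented, and how calendar and activity-history indicia could narrow a vague timeframe such as "tomorrow morning". This is a minimal sketch under assumed names (Task, infer_delivery_time); the task handler 220 is not limited to this form.

    from dataclasses import dataclass
    from datetime import datetime, time, timedelta

    @dataclass
    class Task:
        content: str            # e.g., "Don't forget to take out the garbage"
        deliver_at: datetime    # resolved delivery time
        user_id: str            # identity of the target user
        default_device: str     # e.g., the kitchen assistant 120(5)
        default_location: str   # e.g., the kitchen 118

    def infer_delivery_time(now, leaves_home_at, usual_breakfast):
        """Narrow a "tomorrow morning" request using calendar and activity-history indicia.

        leaves_home_at: time the calendar implies the user must leave (e.g., 7:00 AM)
        usual_breakfast: time the activity history suggests (e.g., 6:30 AM)
        """
        tomorrow = (now + timedelta(days=1)).date()
        chosen = usual_breakfast if usual_breakfast < leaves_home_at else leaves_home_at
        return datetime.combine(tomorrow, chosen)

    deliver_at = infer_delivery_time(datetime(2012, 12, 14, 21, 30),
                                     leaves_home_at=time(7, 0),
                                     usual_breakfast=time(6, 30))
    task = Task("Don't forget to take out the garbage", deliver_at,
                user_id="user-104", default_device="120(5)", default_location="kitchen 118")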
  • The person location module 222 may further be used to help locate the user and an appropriate endpoint device when the time comes to deliver the response. Continuing the example, the task handler 220 might instruct the person location module 222 to help confirm a location of the user 104 as the delivery time of 6:30 AM approaches. Initially, the person location module 222 may attempt to locate the user 104 by evaluating a location of a personal device that he carries, such as his smartphone 120(6). Using information about the location of the smartphone 120(6) (e.g., GPS, trilateration from cell towers, Wi-Fi base station proximity, etc.), the person location module 222 may be able to confirm that the user is indeed in the house 102. Since the default assumption is that the user will be in the kitchen 118, the person location module 222 may ask the local device 120(5) to confirm that the target user 104 is in the kitchen 118. In one implementation, the person location module 222 may direct the local device 120(5) to listen for voices and then attempt to confirm that one of them is the target user 104. For instance, the local device 120(5) may provide a greeting to the target user, using the user's name, such as “Good morning Scott” as indicated by dialog bubble 226. If the target user 104 is present, the user may answer “Good morning”, as indicated by the dialog bubble 228. In an alternative implementation, the local device 120(5) may be equipped with voice recognition functionality to identify the target user by capturing his voice in the environment. As still another implementation, the person location module 222 may request a visual image from the camera 120(8) (See FIG. 1) in the kitchen to get a visual confirmation that the target user 104 is in the kitchen.
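  • The presence-confirmation exchange described in this paragraph can be summarized as a short sequence of checks. The sketch below is illustrative only; phone_locator and room_device are hypothetical interfaces standing in for the smartphone location lookup and the kitchen device 120(5), and the method names are invented for this example.

    def confirm_presence(user, expected_building, phone_locator, room_device):
        """Coarse-to-fine presence check: personal device location, then a spoken greeting."""
        # 1. Coarse check: is the user's smartphone reported at the expected building?
        if phone_locator.last_known_building(user.phone_id) != expected_building:
            return False
        # 2. Fine check: greet the user by name through the device in the expected room.
        room_device.say("Good morning " + user.first_name)
        reply = room_device.listen(seconds=5)
        # 3. Confirm the reply against the user's pre-recorded vocal profile.
        return reply is not None and room_device.voice_matches(reply, user.voice_profile)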
  • When the delivery time arrives, the task handler 220 engages an endpoint device to deliver the response. In this example, the task handler 220 contacts the voice controlled assistant 120(5) in the kitchen 118 to send the response. The content from the reminder task is extracted and sent to the device 120(5) for playback over the speaker. Here, at 6:30 AM, the voice controlled assistant audibly emits the reminder, “Don't forget to take out the garbage” as indicated by the dialog bubble 230.
  • As illustrated by this example, the computing system 100 is capable of receiving user input from one endpoint or local device 120, processing the user input, and providing a timely response via another endpoint or local device 120. The user need not remember which device he gave the request to, or specify which device should deliver the response. Indeed, it might be any number of devices. Instead, the user experience is enhanced by the ubiquity of the devices, and the user will merely assume that the computer-enabled assistant system intuitively listened to the request and provided a timely response.
  • In some situations, there may be multiple devices to choose from when delivering the reminder. In this situation, the cloud services 130 may evaluate the various devices to find a best fit for the circumstances. Accordingly, one of the applications 224 may be an endpoint device selection module that attempts to identify the best local endpoint device for engaging the user. One example scenario is provided next to illustrate possible techniques for ascertaining the best device.
  • FIG. 3 shows how local endpoint devices are selected to engage the target person during performance of the task. In this illustration, four local endpoint devices 302, 304, 306, and 308 are shown in four areas or zones A-D, respectively. The zones A-D may represent different rooms, physical areas of a larger room, and so forth. In this example, the target user 104 is in Zone D. But, he is not alone. In addition, four other people are shown in the same zone D.
  • An endpoint device selector 310 is shown stored in the computer-readable media 216 for execution on the processor(s) 214. The endpoint device selector 310 is configured to identify available devices to engage the user 104, and then analyze them to ascertain the most appropriate device in the circumstances. Suppose, for discussion purposes, that any one of the four devices 302-308 may be identified as an "available" device that is sufficiently proximal to communicate with the user 104. There are many ways to determine available devices, such as detecting devices known to be physically in or near areas proximal to the user, finding devices that pick up audio input from the user (e.g., casual conversation in a room), devices associated with the user, user preferences, and so forth.
  • The endpoint device selector 310 next evaluates which of the available devices is most appropriate under the circumstances. There are several ways to make this evaluation. In one approach, a distance analysis may be performed to determine the distances between a device and the target person. As shown in FIG. 3, the voice controlled assistant 308 is physically closest to the target user 104 at a distance D1 and the voice controlled assistant 306 is next closest at a distance D2. Using distance, the endpoint device selector 310 may choose the closest voice controlled assistant 308 to deliver the response. However, physical proximity may not be the best in all circumstances.
  • Accordingly, in another approach, audio characteristics in the environment surrounding the user 104 may be analyzed. For instance, the signal-to-noise ratios are measured at various endpoint devices 302-308 to ascertain which one is best at hearing the user to the exclusion of other noise. As an alternative, the background volume may be analyzed to determine whether the user is in an area of significant background noise, such as the result of a conversation of many people or background audio from a television or appliance. Still another possibility is to analyze echo characteristics of the area, as well as perhaps evaluate Doppler characteristics that might be introduced as the user is moving throughout one or more areas. That is, verbal commands from the user may reach different devices with more or less clarity and strength depending upon the movement and orientation of the user.
  • In still another approach, environment observations may be analyzed. For instance, a number of people in the vicinity may be counted based on data from cameras (if any) or recognition of distinctive voices. In yet another situation, a combination of physical proximity, sound volume-based determination, and/or visual observation may indicate that the closest endpoint device is actually physically separated from the target user by a structural impediment (e.g., the device is located on the other side of a wall in an adjacent room). In this case, even though the device is proximally the closest in terms of raw distance, the endpoint device selector 310 removes the device from consideration. These are but a few examples.
  • Any one or more of these analyses may be performed to evaluate possible endpoint devices. Suppose, for continuing discussion, that the endpoint device selector 310 determines that the noise level and/or number of people in zone D are too high to facilitate effective communication with the target user 104. As a result, instead of choosing the closest voice controlled assistant 308, the endpoint selector 310 may direct the voice controlled assistant 306 in zone C to communicate with the target user 104. In some instances, the assistant 306 may first attempt to get the user's attention by playing a statement to draw the user closer, such as “Scott, I have a reminder for you” as represented by the dialog bubble 312. In reaction to this message, the user 104 may move closer to the device 306 in zone C, thereby shrinking the distance D2 to a more suitable length. For instance, the user 104 may move from a first location in zone D to a new location in zone C as shown by an arrow labeled “scenario A”. Thereafter, the task handler 220 may deliver the reminder to take out the garbage.
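  • The selection logic in this scenario can be approximated as a weighted score per candidate device. The sketch below is a rough, assumption-laden rendering of what the endpoint device selector 310 might compute: the DeviceObservation fields, the weights, and the wall-separation penalty are invented for illustration rather than taken from the disclosure.

    from dataclasses import dataclass

    @dataclass
    class DeviceObservation:
        device_id: str
        distance_m: float          # estimated distance to the target user
        snr_db: float              # signal-to-noise ratio measured at the device
        background_db: float       # ambient/background volume
        people_nearby: int         # count from cameras or distinct voices
        behind_wall: bool = False  # structural impediment despite a short raw distance

    def score_device(obs):
        """Higher is better: prefer close, quiet, high-SNR devices with few bystanders."""
        if obs.behind_wall:
            return float("-inf")   # removed from consideration entirely
        return (obs.snr_db
                - 0.5 * obs.background_db
                - 2.0 * obs.distance_m
                - 1.0 * obs.people_nearby)

    def choose_endpoint(observations):
        return max(observations, key=score_device).device_id

    # In the zone C/D example, high background volume and a crowd in zone D can make
    # the farther assistant 306 outscore the physically closest assistant 308.
    best = choose_endpoint([
        DeviceObservation("assistant-308", distance_m=2.0, snr_db=8.0, background_db=20.0, people_nearby=4),
        DeviceObservation("assistant-306", distance_m=5.0, snr_db=14.0, background_db=5.0, people_nearby=0),
    ])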
  • In addition, these techniques for identifying the most suitable device for delivering the response may aid in delivery of confidential or sensitive messages. For instance, suppose the target user 104 sets a reminder to pick up an anniversary gift for his wife. In this situation, the endpoint device selector 310 will evaluate the devices in and near the user's current location in an effort to identify a device that can deliver the reminder without the user's wife being present to hear the message. For instance, suppose the user 104 moves from zone D to zone A for a temporary period of time (as illustrated by an arrow labeled “scenario B”), thereby leaving the other people (and his wife) in zone D. Once the user is detected as being alone in zone A, the task handler 220 may direct the voice controlled assistant 302 to deliver the reminder response to the user. This is shown, for example, by the statement “Don't forget to pick up your wife's anniversary present” in dialog bubble 314.
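  • One way to honor the sensitivity criterion in this example is to filter the candidate devices against a list of people who should not overhear the response. The helper below is purely illustrative; detect_people_near is a hypothetical callable standing in for the camera- or voice-based identification described earlier.

    def eligible_for_private_delivery(devices, excluded_people, detect_people_near):
        """Keep only devices in whose vicinity none of the excluded people are detected."""
        return [d for d in devices
                if not (detect_people_near(d) & set(excluded_people))]

    # Example: deliver the anniversary reminder only via a device where the user's
    # wife (user 106) is not detected, as in scenario B where zone A is empty of others.
    candidates = eligible_for_private_delivery(
        ["assistant-302", "assistant-308"],
        excluded_people=["user-106"],
        detect_people_near=lambda d: {"user-106"} if d == "assistant-308" else set())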
  • Aspects of the system described herein may be further used to support real time communication between two people. For example, consider a scenario where one user wants to send a message to another user in real time. In this scenario, the first user may provide a message for delivery to the second user. For instance, the first user may speak a message to a first endpoint device, which sends the message to the cloud services for processing. The cloud services may then determine a location of the second user and select a second endpoint device that is available and suitable for delivery of the message to the second user. The message may then be presented to the second user via the second endpoint device.
  • FIG. 4 shows selected functional components of devices 120(1)-(N) that may be used in the computing environment. As noted in FIG. 1, the devices may be implemented in any number of ways and form factors. In this example, a device may be implemented as a standalone voice controlled device 120(1) that is relatively simple in terms of functional capabilities with limited input/output components, memory, and processing capabilities. For instance, the voice controlled device 120(1) does not have a keyboard, keypad, or other form of mechanical input. Nor does it have a display or touch screen to facilitate visual presentation and user touch input. Instead, the device 120(1) may be implemented with the ability to receive and output audio, a network interface (wireless or wire-based), power, and processing/memory capabilities. In certain implementations, a limited set of one or more input components may be employed (e.g., a dedicated button to initiate a configuration, power on/off, etc.). Nonetheless, the primary and potentially only mode of user interaction with the device 120(1) is through voice input and audible output.
  • The devices used in the system may also be implemented as a mobile device 120(6) such as a smartphone or personal digital assistant. The mobile device 120(6) may include a touch-sensitive display screen and various buttons for providing input as well as additional functionality such as the ability to send and receive telephone calls. Alternative implementations of the voice controlled device 120 may also include configuration as a computer, such as a laptop 120(4). The computer 120(4) may include a keyboard, a mouse, a display screen, and any other hardware or functionality that is typically found on a desktop, notebook, netbook, or other personal computing device. The devices are merely examples and not intended to be limiting, as the techniques described in this disclosure may be used in essentially any device that has an ability to recognize speech input.
  • In the illustrated implementation, each of the devices 120 includes one or more processors 402 and computer-readable media 404. The computer-readable media 404 may include volatile and nonvolatile memory, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data. Such memory includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, RAID storage systems, or any other medium which can be used to store the desired information and which can be accessed by a computing device. The computer-readable media 404 may be implemented as computer-readable storage media ("CRSM"), which may be any available physical media accessible by the processor(s) 402 to execute instructions stored on the memory 404. In one basic implementation, CRSM may include random access memory ("RAM") and Flash memory. In other implementations, CRSM may include, but is not limited to, read-only memory ("ROM"), electrically erasable programmable read-only memory ("EEPROM"), or any other tangible medium which can be used to store the desired information and which can be accessed by the processor(s) 402.
  • Several modules such as instructions, datastores, and so forth may be stored within the computer-readable media 404 and configured to execute on the processor(s) 402. A few example functional modules are shown as applications stored in the computer-readable media 404 and executed on the processor(s) 402, although the same functionality may alternatively be implemented in hardware, firmware, or as a system on a chip (SOC).
  • An operating system module 406 may be configured to manage hardware and services within and coupled to the device 120 for the benefit of other modules. A wake word recognition module 408 and a speech recognition module 410 may employ any number of conventional speech recognition techniques such as use of natural language processing and extensive lexicons to interpret voice input. For example, the speech recognition module 410 may employ general speech recognition techniques and the wake word recognition module may include speech or phrase recognition particular to the wake word. In some implementations, the wake word recognition module 408 may employ a hidden Markov model that represents the wake word itself. This model may be created in advance or on the fly depending on the particular implementation. In some implementations, the speech recognition module 410 may initially be in a passive state in which the speech recognition module 410 does not recognize or respond to speech. While the speech recognition module 410 is passive, the wake word recognition module 408 may recognize or respond to wake words. Once the wake word recognition module 408 recognizes or responds to a wake word, the speech recognition module 410 may enter an active state in which the speech recognition module 410 operates to detect any of the natural language commands for which it is programmed or to which it is capable of responding. In the particular implementation shown in FIG. 4, the wake word recognition module 408 and the speech recognition module 410 are shown as separate modules; in other implementations, these modules may be combined.
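  • The passive-then-active behavior of the wake word recognition module 408 and the speech recognition module 410 can be pictured as a small two-state machine. The sketch below is a schematic stand-in rather than the modules' actual implementation; detect_wake_word and transcribe are hypothetical placeholders for the wake word detector (e.g., the hidden Markov model) and the general speech recognizer.

    class SpeechFrontEnd:
        """Remain passive until the wake word is heard, then recognize full commands."""

        def __init__(self, detect_wake_word, transcribe):
            self.detect_wake_word = detect_wake_word   # e.g., hidden Markov model scorer
            self.transcribe = transcribe               # general speech recognition
            self.active = False

        def on_audio(self, frame):
            if not self.active:
                # Passive state: ignore everything except the wake word.
                if self.detect_wake_word(frame):
                    self.active = True
                return None
            # Active state: hand the audio to the full recognizer (e.g., a spoken request).
            text = self.transcribe(frame)
            if text is not None:
                self.active = False   # return to the passive state after a command
            return text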
  • Other local modules 412 may also be present on the device, depending upon the implementation and configuration of the device. These modules may include more extensive speech recognition techniques, filters and echo cancellation modules, speaker detection and identification, and so forth.
  • The voice controlled device 120 may also include a plurality of applications 414 stored in the computer-readable media 404 or otherwise accessible to the device 120. In this implementation, the applications 414 are a music player 416, a movie player 418, a timer 420, and a personal shopper 422. However, the voice controlled device 120 may include any number or type of applications and is not limited to the specific examples shown here. The music player 416 may be configured to play songs or other audio files. The movie player 418 may be configured to play movies or other audio visual media. The timer 420 may be configured to provide the functions of a simple timing device and clock. The personal shopper 422 may be configured to assist a user in purchasing items from web-based merchants.
  • Datastores may also be stored locally on the media 404, including a content database 424 and one or more user profiles 426 of users that have interacted with the device 120. The content database 424 stores various content that may be played or presented by the device, such as music, books, magazines, videos, and so forth. The user profile(s) 426 may include user characteristics, preferences (e.g., user specific wake words), usage history, library information (e.g., music play lists), online purchase history, and other information specific to an individual user.
  • Generally, the voice controlled device 120 has input devices 428 and output devices 430. The input devices 428 may include a keyboard, keypad, mouse, touch screen, joystick, control buttons, etc. Specifically, one or more microphones 432 may function as input devices to receive audio input, such as user voice input. In some implementations, the input devices 428 may further include a camera to capture images of user gestures. The output devices 430 may include a display, a light element (e.g., LED), a vibrator to create haptic sensations, or the like. Specifically, one or more speakers 434 may function as output devices to output audio sounds.
  • A user may interact with the device 120 by speaking to it, and the microphone 432 captures the user's speech. The device 120 can communicate back to the user by emitting audible statements through the speaker 434. In this manner, the user can interact with the voice controlled device 120 solely through speech, without use of a keyboard or display.
  • The voice controlled device 120 might further include a wireless unit 436 coupled to an antenna 438 to facilitate a wireless connection to a network. The wireless unit 436 may implement one or more of various wireless technologies, such as Wi-Fi, Bluetooth, RF, and so on. A USB port 440 may further be provided as part of the device 120 to facilitate a wired connection to a network, or a plug-in network device that communicates with other wireless networks. In addition to the USB port 440, or as an alternative thereto, other forms of wired connections may be employed, such as a broadband connection. In this manner, the wireless unit 436 and USB 440 form two of many examples of possible interfaces used to connect the device 120 to the network 202 for interacting with the cloud services 130.
  • Accordingly, when implemented as the primarily-voice-operated device 120(1), there may be no input devices, such as navigation buttons, keypads, joysticks, keyboards, touch screens, and the like, other than the microphone(s) 432. Further, there may be no output device, such as a display, for text or graphical output. The speaker(s) 434 may be the main output device. In one implementation, the voice controlled device 120(1) may include non-input control mechanisms, such as basic volume control button(s) for increasing/decreasing volume, as well as power and reset buttons. There may also be a simple light element (e.g., LED) to indicate a state such as, for example, when power is on.
  • Accordingly, the device 120(1) may be implemented as an aesthetically appealing device with smooth and rounded surfaces, with one or more apertures for passage of sound waves. The device 120(1) may merely have a power cord and optionally a wired interface (e.g., broadband, USB, etc.). Once plugged in, the device may automatically self-configure, or do so with slight aid from the user, and be ready to use. As a result, the device 120(1) may be generally produced at a low cost. In other implementations, other I/O components may be added to this basic model, such as specialty buttons, a keypad, display, and the like.
  • Illustrative Processes
  • FIG. 5 shows an example process 500 for aiding a person in performing a task, including receiving a request from the person via one device and delivering a response to the person via another device. The process 500 may be implemented by the local endpoint devices 120(1)-(N) and server(s) 132 of FIG. 1, or by other devices. This process (along with the processes illustrated in FIGS. 6 and 7) is illustrated as a collection of blocks or actions in a logical flow graph. Some of the blocks represent operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer-executable instructions stored on one or more computer-readable media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described blocks can be combined in any order or in parallel to implement the processes.
  • For purposes of describing one example implementation, the blocks are arranged visually in FIG. 5 in columns beneath the endpoint devices 120(1)-(N) and server(s) 132 to illustrate that these devices of the system 100 may perform these operations. That is, actions defined by blocks arranged beneath the devices 120(1)-(N) may be performed by any one of the devices. In certain situations, part of the process, such as the request input part, may be performed by a first endpoint device and another part of the process, such as the response delivery part, may be performed by a second endpoint device, as illustrated by the dashed boxes about portions of the flow diagram. Similarly, actions defined by blocks arranged beneath the server(s) 132 may be performed by one or more server(s) 132.
  • At 502, a first local endpoint device 120(1) receives speech input at the microphone(s) 208/434. The speech input may include a wake word to alert the device to intentional speech, or may be part of an ongoing discussion after the device is already awake and interacting with the user. The speech input includes a request.
  • At 504, the speech recognition module 410 at the first local endpoint device 120(1) attempts to discern whether the request in the speech input would benefit from knowing the identity of the person. Said another way, is the request general or more personal? If it is not personal (i.e., the "no" branch from 504) and person identity is not beneficial, the process 500 may proceed to some pre-processing of the speech input at 508. For instance, the speech input may be a question, "What is the weather today?" This request may be considered general in nature, and not personal, and hence the system need not remember who is making the request. On the other hand, the user may make a personal request (i.e., the "yes" branch from 504) where person identity is beneficial, leading to an operation to identify the person at 506. For instance, suppose the speech input is "please remind me to take out the garbage tomorrow morning" or "remind me to pick up my wife's anniversary present." Both of these are examples of personal requests, with the latter having a higher degree of sensitivity in how the reminder is conveyed. In these situations, the person is identified through use of voice identification (e.g., person A is talking), interchange context (a male voice asks to take out the garbage while in the master bedroom), secondary visual confirmation, and so forth.
  • At 508, the first device 120(1) may optionally pre-process the speech input prior to sending it to the server. For instance, the device may apply natural language processing to the input, or compression algorithms to compress the data prior to sending it over to the servers 132, or even encryption algorithms to encrypt the audio data.
  • At 510, the speech input is passed to the servers 132 along with an identity of the first device 120(1) and an identity of the person, if known from 506. The identity of the device 120(1) may be a serial number, a registration number or the like, and is provided so that the task handler operating at the servers 132 knows from where the user request originated. In some cases, a response may be immediately returned to the first device 120(1), such as a response containing the current weather information. In some cases, the identity of the first device 120(1) may help confirm the identity of the user. Further, the user's use of the first device to make a particular request at a particular time of day may be recorded in the user's profile as a way to track habits or patterns in the user's normal course of the day. Further, when the person identity is associated with the first device 120(1), this association may be used in selecting a location and endpoint device for delivery of responses to that identified user for a period of time shortly after receipt of the request, or for delivery of future responses. It is also noted that in some implementations, the identity of the person may be determined by the servers 132, rather than at the first device 120(1). In such implementations, the first device 120(1) passes audio data representative of the speech input from the person, and the servers 132 use the audio data and possibly other indicia to identify the person.
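  • The hand-off at 510 can be thought of as a small structured payload: the audio (or partially processed) input together with the originating device identity and, when known, the person identity. The JSON-like sketch below is only a guess at shape; the field names are hypothetical and no particular wire format is implied by the disclosure.

    import base64
    import json

    def build_request_payload(device_id, audio_bytes, person_id=None, partial_nlp=None):
        """Package the speech input with device identity and optional person identity."""
        payload = {
            "device_id": device_id,        # e.g., a serial or registration number
            "audio": base64.b64encode(audio_bytes).decode("ascii"),
            "person_id": person_id,        # None if identity was not beneficial or not known
            "partial_nlp": partial_nlp,    # optional on-device pre-processing result
        }
        return json.dumps(payload)

    body = build_request_payload("device-120-1", b"<audio frames>", person_id="user-104")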
  • It is further noted that in some implementations, the user may set a reminder for another person. For instance, a first user (e.g., the husband Scott) may make a request for a second user (e.g., his wife, Elyn), such as "Please remind Elyn to pick up the prescription tomorrow afternoon". In this situation, the request includes an identity of another user, and the servers at the cloud services determine who that user might be based on the user profile data.
  • At 512, the servers 132 at the cloud services 130 process the speech input received from the first endpoint device 120(1). In one implementation, the processing may include decryption, decompression, and speech recognition. Once the audio data is parsed and understood, the task handler 220 determines an appropriate response. The task handler may consult any number of applications to generate the response. For instance, if the request is for a reminder to purchase airline tickets tomorrow, the task handler may involve a travel application as part of the solution of discovering airline prices when providing the reminder response tomorrow. In addition, the cloud services 130 may also determine to whom the response is to be directed. The response is likely to be returned to the original requester, but in some cases, it can be delivered to another person (in which case the location determination would be with respect to that second person).
  • At 514, an immediate confirmation may be optionally sent to indicate to the user that the request was received and will be handled. For instance, in response to a request for a reminder, the response might be "Okay Scott, I'll remind you." The servers 132 return the confirmation to the same endpoint device 120(1) from which the request was received. At 516, the first device 120(1) receives and plays the confirmation so that the user experience is one of a conversation, where the computing system heard the request and acknowledged it.
  • At 518, it is determined when to reply with a response. In one implementation, the task handler 220 discerns from the request an appropriate time to respond to the request. The user may use any number of ways to convey a desired reply time. For instance, the user may ask for a reminder "before my company meeting" or "tomorrow morning" or at 5:00 PM on a date certain. Each of these has a different level of specificity. The latter is straightforward, with the task handler 220 setting a response for 5:00 PM. With respect to the first two examples, the task handler 220 may attempt to discern what "tomorrow morning" may mean depending upon the request. If the request is for a reminder to "take out the garbage", the timeframe associated with "tomorrow morning" is likely the time when the user is expected to be home in the morning (e.g., say at 6:30 AM as discussed above). If the request is for a reminder to "meet with marketing", the timeframe for "tomorrow morning" may be more like 9:00 AM or 10:00 AM. Finally, if the request is for "before my company meeting", the task handler 220 may consult a calendar to see when the "company meeting" is scheduled and will set a reminder for a reasonable time period before that meeting is scheduled to start.
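  • Act 518 amounts to mapping a requested timeframe onto a concrete clock time, using the content of the request and calendar data where the phrase is vague. The sketch below is a simplified, hypothetical resolver; the phrase categories, the 6:30 AM and 9:00 AM defaults, and the fifteen-minute lead before a meeting are illustrative assumptions rather than the task handler's actual rules.

    from datetime import datetime, time, timedelta

    def resolve_reply_time(phrase, request_topic, now, meeting_start=None):
        """Map a vague timeframe onto a concrete delivery time."""
        tomorrow = (now + timedelta(days=1)).date()
        if phrase == "tomorrow morning":
            # A household chore suggests the at-home window; a work task suggests office hours.
            at = time(6, 30) if request_topic == "household" else time(9, 0)
            return datetime.combine(tomorrow, at)
        if phrase == "before my company meeting" and meeting_start is not None:
            return meeting_start - timedelta(minutes=15)   # a reasonable lead time
        raise ValueError("timeframe not understood: " + phrase)

    when = resolve_reply_time("tomorrow morning", "household", datetime(2012, 12, 14, 21, 30))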
  • At 520, a location of the target person is determined in order to identify the place to which the response is to be timely sent. For instance, as the time for response approaches, the person location module 222 determines where the user may be located in order to deliver a timely response. There are many ways to make this determination. A more detailed discussion of this action is described below with reference to FIG. 6. Further, the target user may be the initial requester or another person.
  • At 522, a device to which to send the response is determined. In one implementation, an endpoint device selector 310 evaluates possible devices that might be available and then determines which endpoint device might be best in the circumstances to send the response. There are many techniques for evaluating possible devices and discerning the best fit. A more detailed discussion of this action is provided below with reference to FIG. 7.
  • At 524, an appropriate response is timely sent to the best-fit device at the location of the target user. Suppose, for discussion purposes, that the best-fit device is an endpoint device different from the device 120(1) from which the request was received, such as a second local device 120(2).
  • At 526, the response is received and played (or otherwise manifested) for the target user. As shown in FIG. 5, the second device 120(2) receives the response, and plays it for the user who is believed to be in the vicinity. The response may be in any form (e.g., audio, visual, haptic, etc.) and may include essentially any type of message, reminder, etc. The response may be in an audio form, where it is played out through the speaker for the user to hear. With the continuing examples, the response may be “Don't forget to take out the garbage”, or “You have your company meeting in 15 minutes”.
  • The technique described above and illustrated in FIG. 5 is merely an example and implementations are not limited to this technique. Rather, other techniques for operating the devices 120 and servers 132 may be employed and the implementations of the system disclosed herein are not limited to any particular technique.
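To make this flow more concrete, the following is a minimal Python sketch of the server-side handling described at 512 and 514. The function names (decrypt, decompress, recognize, plan_task, send_to_device) are hypothetical placeholders assumed for illustration and are not part of the disclosed implementation.

```python
# Illustrative sketch only; all names are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Request:
    user_id: str        # identity of the requesting user (act 506)
    audio: bytes        # encrypted, compressed speech input
    source_device: str  # endpoint device 120(1) that captured the request

def handle_request(req, decrypt, decompress, recognize, plan_task, send_to_device):
    """Rough outline of acts 512-516: decode the audio, understand it, acknowledge it."""
    raw_audio = decompress(decrypt(req.audio))   # act 512: decryption and decompression
    text = recognize(raw_audio)                  # act 512: speech recognition
    task = plan_task(req.user_id, text)          # determine the response and target person
    send_to_device(req.source_device,            # act 514: optional immediate confirmation
                   "Okay, I'll take care of that.")
    return task                                  # scheduled for later delivery (acts 518-524)
```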
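The timeframe resolution described at 518 might similarly be sketched as follows. The heuristics, default times, and calendar call shown here are assumptions made for illustration; the disclosure does not prescribe these particular values or APIs.

```python
from datetime import datetime, timedelta, time

def resolve_reminder_time(request_text, now, calendar,
                          home_morning=time(6, 30), office_morning=time(9, 0),
                          lead=timedelta(minutes=15)):
    """Map phrasings such as 'at 5:00 PM', 'tomorrow morning', or
    'before my company meeting' to a concrete delivery time (act 518)."""
    text = request_text.lower()
    if "before my company meeting" in text:
        meeting = calendar.next_event("company meeting")  # hypothetical calendar lookup
        return meeting.start - lead
    if "tomorrow morning" in text:
        # A household chore fires when the user is expected to be home;
        # a work-related reminder fires at typical office hours.
        t = home_morning if "garbage" in text else office_morning
        return datetime.combine(now.date() + timedelta(days=1), t)
    if "5:00 pm" in text:
        return datetime.combine(now.date(), time(17, 0))
    return now + timedelta(hours=1)  # fallback when no timeframe is recognized
```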
  • FIG. 6 shows a more detailed process for determining a location of the person, from act 520 of FIG. 5. At 602, an identity of the target person is received. As noted above with respect to act 506, certain requests will include an identity of the person making the request, such as a unique user ID.
  • At 604, possible locations of the target person are determined. There are many ways to make this determination, several of which are presented as representative examples. For instance, at 604-1, the person location module 222 might poll optical devices throughout an environment to attempt to visually locate the target person. The optical devices, such as cameras, may employ recognition software (e.g., facial recognition, feature recognition, etc.) to identify users. As used herein, “polling” refers to obtaining the optical information from the optical devices, which may involve actively requesting the information (e.g., a “pull” model) or receiving the information without request (e.g., a “push” model). In another approach, at 604-2, the person location module 222 may poll audio devices throughout the environment to gain voice confirmation that the target person is present. Audio tools may be used to evaluate audio input against pre-recorded vocal profiles to uniquely identify different people.
  • Another technique is to locate portable devices that may be associated with the target person, at 604-3. For instance, the person location module 222 may interact with location software modules that locate devices such as smartphones, tablets, or personal digital assistants via GPS data and/or cell tower trilateration data. In some implementations, this technique may be used in cooperation with other approaches. For instance, this physical location data may help narrow a search for a person to a particular residence or office, and then polling audio or optical devices may be used to place the user in particular rooms or areas of the residence or office.
  • The person location module 222 may further consult with other applications in an effort to locate the user, such as a calendar application, at 604-4. The calendar application may specify where the user is scheduled to be located at a particular time. This is particularly useful when the user is in various meetings at the office. There are many other sources that may be consulted to provide other indicia of the target person's whereabouts, as represented by 604-N.
  • Suppose the person location module 222 identifies multiple possible locations. At 606, the possible locations may optionally be ranked. For instance, each location may be assigned a confidence score indicating how likely the user is to be located there. Visual data may be assigned a very high confidence score, whereas audio data may carry slightly less confidence. A calendar item may be assigned a significantly lower confidence score, as there is no guarantee that the user is following the schedule.
  • At 608, the person location module 222 may engage one or more local devices to interact with the target person to confirm his or her presence. For instance, suppose the person location module 222 initially believes the person is in a particular room. The person location module 222 may direct one of the devices in the room to engage the person, perhaps through asking a question (e.g., “Scott, do you need anything?”). If the person is present, the person may naturally respond (e.g., “No, nothing. Thanks”). The person location module 222 may then confirm that the target person is present.
  • At 610, a location is chosen for delivery of the response to the user. The choice may be based on the ranked possible locations of action 606 and/or on confirmation obtained through the quick interaction of action 608. An illustrative sketch of this ranking appears below.
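Purely for illustration, the candidate locations gathered in acts 604-610 might be ranked as in the following Python sketch. The signal sources and confidence weights are arbitrary assumed values, not figures taken from the disclosure.

```python
def rank_candidate_locations(signals):
    """Rank possible locations of the target person (acts 604-606).
    Each signal is a (location, source) pair; the source determines how
    strongly it implies presence. Weights are illustrative assumptions."""
    confidence = {"camera": 0.9, "voice": 0.7, "device_gps": 0.5, "calendar": 0.3}
    scores = {}
    for location, source in signals:
        scores[location] = max(scores.get(location, 0.0), confidence.get(source, 0.1))
    # Highest-confidence location first; a follow-up interaction (act 608) may
    # then confirm presence before delivery (act 610).
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Example: visual confirmation in the kitchen outranks a calendar entry for the office.
ranked = rank_candidate_locations([("kitchen", "camera"), ("office", "calendar")])
# ranked == [("kitchen", 0.9), ("office", 0.3)]
```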
  • FIG. 7 shows a more detailed process for determining an appropriate device to return the response, from action 522 of FIG. 5.
  • At 702, the location of the target person is received. This may be determined from action 520, as illustrated in FIG. 6. Alternatively, the location of the target person may already be known, or the user may have informed the system of where he or she is located.
  • At 704, possible devices proximal to the location of the target person are discovered as being available to deliver the response to the person. For example, if the user is found to be located in a room of a home or office, the endpoint device selector 310 discovers whether one or more devices reside in that room. The selector 310 may consult the user's profile to see which devices are associated with the user, or may evaluate registration records that identify the residence or location in which a device is installed.
  • At 706, the available devices are evaluated to ascertain which might be the best device in the circumstances to return a response to the target person. There are many approaches to make this determination, several of which are presented as representative examples. For instance, at 706-1, a distance from the endpoint device to the target person may be analyzed. If the endpoint device is equipped with depth sensors (e.g., time of flight sensors), the depth value may be used. If multiple devices are in a room, the difference in the times at which the devices receive the same verbal input from the user may be used to estimate the location of the person and which device might be closest (see the arrival-time sketch following the discussion of FIG. 7 below).
  • At 706-2, the background volume in an environment containing the target person may be analyzed. High background volume may impact the ability of the device to communicate with the target user. For instance, suppose a room has a first device located near an appliance and a second device located across the room. If the appliance is operating, the background volume for the first device may be much greater than the background volume for the second device, thereby suggesting that the second device might be more appropriate in this case to communicate with the user.
  • At 706-3, the signal-to-noise ratios (SNRs) of various available devices are analyzed. Devices with strong SNRs are given a preference over those with weaker SNRs.
  • At 706-4, echo characteristics of the environment may be analyzed. A baseline reading is taken when the room is empty of humans and moving objects to get an acoustical map of the surrounding environment, including location of surfaces and other objects that might cause sound echo. The echo characteristics may be measured at the time of engagement with humans, including the target user, to determine whether people or objects might change the acoustical map. Depending upon the outcome of these measurements, certain available devices may become more appropriate for delivering the response to the target user.
  • At 706-5, Doppler characteristics of the environment, particularly with respect to the target user's movement through the environment, may be analyzed. In some cases, a user may be moving through an environment from one part of a room to another part of the room, or from room to room. In these cases, if the user is also speaking and conversing with the computing system 100, there may be changing acoustics that affect which devices are best suited to interact with the user, depending upon the direction of the user's movement and the orientation of the user's head when speaking. The Doppler characteristics may therefore impact which device may be best for responding in a given set of circumstances.
  • At 706-6, the environment may be analyzed to determine, for example, how many people are in the room, who in particular is in the room, and so forth. In some implementations, visual data received from cameras or other optical devices may provide insights as to the number of people, or the identities of people, in the environment. This analysis may assist in determining which device is most appropriate to deliver a response. For instance, if a device is located in a room crowded with people, the system may determine that another device away from the crowd is more appropriate.
  • There are many other types of analyses that may be applied to evaluate possible devices for providing the response, as represented by 706-M. For instance, another type of analysis is to review ownership or registration information to discover an association between the target user and personal devices. Devices that are more personal to the target user may receive a higher score.
  • At 708, the response is evaluated to determine whether there are any special criteria that might impact a decision of where to direct the response. For instance, in the scenario where the user asked for a reminder to pick up his wife's present, the response will include an element of privacy or sensitivity in that the system should not return a reminder to a location where the target person's wife may accidentally hear the reminder. Another example is where the user may be requesting information about a doctor appointment or personal financial data, which is not intended for general consumption. There are myriad examples of special criteria. Accordingly, at 708, these criteria are evaluated and used in the decision making process of finding the best endpoint device under the circumstances.
  • At 710, the best endpoint device 120 is chosen. This decision may be based on scoring the various analyses 706-1 to 706-M, ranking the results, and applying any special criteria to the results. In this example, the device with the highest overall score is chosen. Illustrative sketches of the arrival-time comparison and of such scoring follow this discussion.
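As a first illustration, the closest-device estimate mentioned at 706-1 might look like the following Python sketch. It assumes the devices report roughly synchronized arrival times for the same utterance; that assumption and the function name are illustrative only.

```python
def closest_device_by_arrival_time(arrival_times):
    """Estimate which device is nearest the speaker (analysis 706-1).
    arrival_times maps a device ID to the time (in seconds) at which that
    device heard the same utterance; the earliest arrival suggests the
    shortest acoustic path, assuming roughly synchronized clocks."""
    return min(arrival_times, key=arrival_times.get)

# Example: device B heard the utterance about 5 ms before device A,
# suggesting device B is closer to the speaker.
nearest = closest_device_by_arrival_time({"device_a": 0.0162, "device_b": 0.0117})
# nearest == "device_b"
```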
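As a second illustration, the scoring and selection of acts 706-710 might be sketched as follows. The particular fields, weights, and privacy rule are assumptions made for illustration; the disclosure does not specify a scoring formula.

```python
def select_endpoint(candidates, response_is_private=False):
    """Combine per-device analyses into a score (analyses 706-1 to 706-M), apply
    special criteria such as privacy (act 708), and choose the highest-scoring
    device (act 710). Weights are illustrative assumptions."""
    best_id, best_score = None, float("-inf")
    for dev in candidates:
        if response_is_private and dev["people_nearby"] > 1:
            continue  # act 708: avoid devices where others may overhear the response
        score = (
            -1.0 * dev["distance_m"]             # closer devices preferred (706-1)
            - 0.5 * dev["background_db"] / 10.0  # quieter surroundings preferred (706-2)
            + 1.0 * dev["snr_db"] / 10.0         # stronger SNR preferred (706-3)
            - 0.3 * dev["people_nearby"]         # crowded rooms penalized (706-6)
        )
        if score > best_score:
            best_id, best_score = dev["id"], score
    return best_id

# Example: a quiet, empty study wins over a crowded, noisy kitchen for a private reminder.
devices = [
    {"id": "kitchen", "distance_m": 1.5, "background_db": 60, "snr_db": 20, "people_nearby": 3},
    {"id": "study",   "distance_m": 3.0, "background_db": 35, "snr_db": 25, "people_nearby": 0},
]
chosen = select_endpoint(devices, response_is_private=True)
# chosen == "study"
```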
  • CONCLUSION
  • Although the subject matter has been described in language specific to structural features, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features described. Rather, the specific features are disclosed as illustrative forms of implementing the claims.

Claims (38)

What is claimed is:
1. A computing system comprising:
a remote computing system;
multiple endpoint devices located in various locations local to one or more users, a first endpoint device comprising:
one or more processors;
computer-readable storage media storing computer-executable instructions;
at least one microphone to receive audio input from a user, the audio input containing a user request; and
an interface to transmit the user request to the remote computing system;
the remote computing system comprises one or more executable modules configured to produce a response to the user request, to determine when to deliver the response, to select a second endpoint device that is available to provide the response to the user, and to send the response to the second endpoint device; and
the second endpoint device comprising:
one or more processors;
computer-readable storage media storing computer-executable instructions; and
an interface to receive the response from the remote computing system; and
at least one speaker to output the response in audio form to the user.
2. The computing system as recited in claim 1, wherein the user request is selected from a group of requests comprising reminders, timers, alarms, calendar entries, directions, instructions, and reservations.
3. The computing system as recited in claim 1, wherein the remote computing system is configured to determine when to deliver the response by at least one of performing natural language understanding processing on the user request, using information from a calendar application, using information from a user profile associated with the user, or using information about events in an activity history associated with the user.
4. The computing system as recited in claim 1, wherein the first endpoint device further comprises a speech recognition module maintained in the one or more computer-readable storage media and executed by the one or more processors to convert a signal from the microphone representing the audio input of the user into text.
5. The computing system as recited in claim 1, wherein the one or more modules of the remote computing system are further configured to ascertain a location of the user prior to selecting the second endpoint device that is available at the location to provide the response to the user.
6. The computing system as recited in claim 1, further comprising a third endpoint device, wherein the one or more modules of the remote computing system are further configured to choose between the second and third endpoint devices to provide the response to the user.
7. The computing system as recited in claim 1, wherein the remote computing system is configured to ascertain the location of the user by receiving audio data from one or more of the endpoint devices.
8. The computing system as recited in claim 1, wherein the second endpoint device comprises a camera to capture images of an environment, the remote computing system being configured to ascertain the location of the user by receiving data derived from the images.
9. The computing system as recited in claim 1, wherein the remote computing system is configured to ascertain the location of the user by reviewing at least one of a calendar associated with the user or an activity history of the user.
10. The computing system as recited in claim 1, wherein the remote computing system is configured to select the second endpoint device by evaluating one or more of the endpoint devices using at least one analysis comprising:
a distance analysis to determine a distance of an endpoint device from the user;
a background analysis to determine a volume of background noise of an endpoint device;
a signal-to-noise ratio (SNR) analysis to determine an SNR at an endpoint device with respect to the user and background noise sources;
an echo analysis to determine echo characteristics of an environment in which an endpoint device resides;
a Doppler analysis to determine Doppler characteristics of audio input from the user relative to an endpoint device; and
an environment analysis to determine a number of people proximal to an endpoint device.
11. One or more computer-readable media having computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
receiving, from a first computing device, a request from a first user;
processing the request to generate a response;
determining a second user to receive the response;
selecting a second computing device; and
delivering the response to the second computing device for presentation of the response to the second user.
12. The one or more computer-readable media as recited in claim 11, wherein the request comprises one of a text format or an audio format.
13. The one or more computer-readable media as recited in claim 11, wherein the first user and the second user are the same person.
14. The one or more computer-readable media as recited in claim 11, wherein the first computing device and the second computing device are the same computing device.
15. The one or more computer-readable media as recited in claim 11, wherein the first computing device resides at a first location and the second computing device resides at a second location different from the first location.
16. The one or more computer-readable media as recited in claim 11, further comprising computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform an additional operation comprising determining a time to deliver the response to the second user.
17. The one or more computer-readable media as recited in claim 11, further comprising computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform an additional operation comprising determining a time to deliver the response to the second user based in part on performing natural language understanding on the request.
18. The one or more computer-readable media as recited in claim 11, further comprising computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform an additional operation comprising determining a time to deliver the response to the second user, wherein the time to deliver is based at least in part on a determination of a presence of the second user.
19. The one or more computer-readable media as recited in claim 11, wherein selecting a second computing device comprises ascertaining a location of the second user and selecting a second computing device available at the location.
20. The one or more computer-readable media as recited in claim 19, wherein ascertaining a location of the second user comprises determining a location of a device associated with the second user.
21. The one or more computer-readable media as recited in claim 11, wherein determining a second user comprises performing natural language understanding on the request.
22. The one or more computer-readable media as recited in claim 11, wherein selecting a second computing device comprises polling computing devices in environments associated with the second user to detect whether the second user is present.
23. The one or more computer-readable media as recited in claim 11, wherein selecting the second computing device comprises at least one of:
determining a distance of the second computing device from the user;
determining a volume of background noise of the second computing device;
measuring a signal-to-noise ratio at the second computing device with respect to the user and background noise sources;
determining echo characteristics of an environment in which the second computing device resides;
determining Doppler characteristics of audio input from the user relative to the second computing device; or
determining a number of people proximal to the second computing device.
24. A computer-implemented method comprising:
under control of one or more computer systems configured with executable instructions,
receiving a request;
processing the request to generate a response;
selecting a computing device to deliver the response; and
delivering the response to the selected computing device.
25. The computer-implemented method as recited in claim 24, wherein receiving a request comprises receiving the request from a first computing device and wherein delivering the response comprises sending the response to a second computing device different from the first computing device.
26. The computer-implemented method as recited in claim 24, wherein receiving a request comprises receiving, from a first computing device, a request originated by a first user and wherein selecting a computing device comprises selecting one of the first computing device or a second computing device to deliver the response to a second user different from the first user.
27. The computer-implemented method as recited in claim 24, wherein receiving a request comprises receiving audio input indicative of voice entry by the user into a first computing device and delivering the response comprises sending audio data for audio output to the user by a second computing device different from the first computing device.
28. The computer-implemented method as recited in claim 24, wherein selecting a computing device to deliver the response comprises ascertaining a location of a user to receive the response and selecting a computing device from among multiple computing devices available at the location.
29. The computer-implemented method as recited in claim 28, wherein ascertaining a location of a user comprises at least one of:
polling one or more optical devices for visual confirmation of the user;
polling one or more audio devices for voice confirmation of the user;
locating an electronic device associated with the user; or
reviewing a calendar associated with the user.
30. The computer-implemented method as recited in claim 24, wherein selecting the computing device comprises at least one of:
analyzing proximity of the computing device to a user;
analyzing volume of background noise of the computing device;
analyzing signal-to-noise ratio of the computing device with respect to a user and background noise sources;
analyzing echo characteristics of an environment in which the computing device resides;
analyzing Doppler characteristics of audio input from a user relative to the computing device; or
analyzing a number of people proximal to the computing device.
31. The computer-implemented method as recited in claim 24, further comprising determining a time to return the response.
32. The computer-implemented method as recited in claim 24, further comprising determining a time to return the response by, in part, performing natural language understanding on the request.
33. A computer-implemented method comprising:
under control of one or more computer systems configured with executable instructions,
obtaining a message for delivery to a user;
determining a location of the user;
selecting one of one or more available computing devices; and
delivering the message to the selected computing device for presentation to the user.
34. The computer-implemented method as recited in claim 33, further comprising determining a time to deliver the message to the user.
35. The computer-implemented method as recited in claim 33, wherein obtaining a message comprises receiving, from a first computing device, a message from a first user, and wherein delivering the message comprises delivering the message to a second computing device for presentation to a second user different from the first user.
36. The computer-implemented method as recited in claim 33, wherein determining a location of the user comprises at least one of:
polling one or more optical devices for visual confirmation of the user;
polling one or more audio devices for voice confirmation of the user;
locating an electronic device associated with the user; or
reviewing a calendar associated with the user.
37. The computer-implemented method as recited in claim 33, wherein selecting one of one or more available computing devices comprises determining multiple computing devices available at the location and choosing said one computing device from among the multiple computing devices available at the location.
38. The computer-implemented method as recited in claim 33, further comprising repeating the determining, the selecting, and the delivering to resend the message to the user.
US13/715,741 2012-12-14 2012-12-14 Response endpoint selection Active 2033-11-09 US9271111B2 (en)

Priority Applications (8)

Application Number Priority Date Filing Date Title
US13/715,741 US9271111B2 (en) 2012-12-14 2012-12-14 Response endpoint selection
EP13861696.6A EP2932371B1 (en) 2012-12-14 2013-11-22 Response endpoint selection
CN201380063208.1A CN105051676B (en) 2012-12-14 2013-11-22 Response endpoint selects
PCT/US2013/071488 WO2014092980A1 (en) 2012-12-14 2013-11-22 Response endpoint selection
JP2015544158A JP2016502192A (en) 2012-12-14 2013-11-22 Response endpoint selection
US15/049,914 US10778778B1 (en) 2012-12-14 2016-02-22 Response endpoint selection based on user proximity determination
US17/016,769 US20210165630A1 (en) 2012-12-14 2020-09-10 Response endpoint selection
US18/149,127 US20230141659A1 (en) 2012-12-14 2023-01-02 Response endpoint selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/715,741 US9271111B2 (en) 2012-12-14 2012-12-14 Response endpoint selection

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/049,914 Continuation US10778778B1 (en) 2012-12-14 2016-02-22 Response endpoint selection based on user proximity determination

Publications (2)

Publication Number Publication Date
US20140172953A1 true US20140172953A1 (en) 2014-06-19
US9271111B2 US9271111B2 (en) 2016-02-23

Family

ID=50932239

Family Applications (4)

Application Number Title Priority Date Filing Date
US13/715,741 Active 2033-11-09 US9271111B2 (en) 2012-12-14 2012-12-14 Response endpoint selection
US15/049,914 Active 2034-10-14 US10778778B1 (en) 2012-12-14 2016-02-22 Response endpoint selection based on user proximity determination
US17/016,769 Abandoned US20210165630A1 (en) 2012-12-14 2020-09-10 Response endpoint selection
US18/149,127 Pending US20230141659A1 (en) 2012-12-14 2023-01-02 Response endpoint selection

Family Applications After (3)

Application Number Title Priority Date Filing Date
US15/049,914 Active 2034-10-14 US10778778B1 (en) 2012-12-14 2016-02-22 Response endpoint selection based on user proximity determination
US17/016,769 Abandoned US20210165630A1 (en) 2012-12-14 2020-09-10 Response endpoint selection
US18/149,127 Pending US20230141659A1 (en) 2012-12-14 2023-01-02 Response endpoint selection

Country Status (5)

Country Link
US (4) US9271111B2 (en)
EP (1) EP2932371B1 (en)
JP (1) JP2016502192A (en)
CN (1) CN105051676B (en)
WO (1) WO2014092980A1 (en)

Cited By (156)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140280541A1 (en) * 2013-03-14 2014-09-18 T-Mobile Usa, Inc. Proximity-Based Device Selection for Communication Delivery
US20150088272A1 (en) * 2013-09-23 2015-03-26 Emerson Electric Co. Energy Management Based on Occupancy and Occupant Activity Level
US20150198939A1 (en) * 2014-01-13 2015-07-16 Barbara Ander System and Method for Alerting a User
US20150245160A1 (en) * 2014-02-24 2015-08-27 International Business Machines Corporation Techniques for Mobility-Aware Dynamic Service Placement in Mobile Clouds
US20150271038A1 (en) * 2014-03-20 2015-09-24 Fujitsu Limited Link-device selecting apparatus and method
US20150370884A1 (en) * 2014-06-24 2015-12-24 Google Inc. List accumulation and reminder triggering
US20160021494A1 (en) * 2014-07-18 2016-01-21 Lei Yang Systems and methods for adaptive multi-feature semantic location sensing
US9271111B2 (en) * 2012-12-14 2016-02-23 Amazon Technologies, Inc. Response endpoint selection
US20170278331A1 (en) * 2013-12-23 2017-09-28 Assa Abloy Inc. Method for utilizing a wireless connection to unlock an opening
US9818407B1 (en) * 2013-02-07 2017-11-14 Amazon Technologies, Inc. Distributed endpointing for speech recognition
WO2017197312A3 (en) * 2016-05-13 2017-12-21 Bose Corporation Processing speech from distributed microphones
US9858927B2 (en) * 2016-02-12 2018-01-02 Amazon Technologies, Inc Processing spoken commands to control distributed audio outputs
US9898250B1 (en) * 2016-02-12 2018-02-20 Amazon Technologies, Inc. Controlling distributed audio outputs to enable voice output
US20180061403A1 (en) * 2016-09-01 2018-03-01 Amazon Technologies, Inc. Indicator for voice-based communications
US20180061404A1 (en) * 2016-09-01 2018-03-01 Amazon Technologies, Inc. Indicator for voice-based communications
US20180068663A1 (en) * 2016-09-07 2018-03-08 Samsung Electronics Co., Ltd. Server and method for controlling external device
US20180067717A1 (en) * 2016-09-02 2018-03-08 Allomind, Inc. Voice-driven interface to control multi-layered content in a head mounted display
US20180137860A1 (en) * 2015-05-19 2018-05-17 Sony Corporation Information processing device, information processing method, and program
US20180152557A1 (en) * 2014-07-09 2018-05-31 Ooma, Inc. Integrating intelligent personal assistants with appliance devices
US10037679B1 (en) * 2017-01-27 2018-07-31 Bengi Crosby Garbage reminder system
US10083006B1 (en) 2017-09-12 2018-09-25 Google Llc Intercom-style communication using multiple computing devices
US10116796B2 (en) 2015-10-09 2018-10-30 Ooma, Inc. Real-time communications-based internet advertising
US10127227B1 (en) * 2017-05-15 2018-11-13 Google Llc Providing access to user-controlled resources by automated assistants
US10135976B2 (en) 2013-09-23 2018-11-20 Ooma, Inc. Identifying and filtering incoming telephone calls to enhance privacy
US20180342237A1 (en) * 2017-05-29 2018-11-29 Samsung Electronics Co., Ltd. Electronic apparatus for recognizing keyword included in your utterance to change to operating state and controlling method thereof
US10158584B2 (en) 2015-05-08 2018-12-18 Ooma, Inc. Remote fault tolerance for managing alternative networks for high quality of service communications
US20190043510A1 (en) * 2015-09-30 2019-02-07 Huawei Technologies Co., Ltd. Voice Control Processing Method and Apparatus
US10229687B2 (en) 2016-03-10 2019-03-12 Microsoft Technology Licensing, Llc Scalable endpoint-dependent natural language understanding
CN109479110A (en) * 2016-03-08 2019-03-15 优确诺股份有限公司 The system and method that dynamic creation individualizes exercise videos
US20190129938A1 (en) * 2017-10-31 2019-05-02 Baidu Usa Llc System and method for performing tasks based on user inputs using natural language processing
WO2019118367A1 (en) * 2017-12-12 2019-06-20 Amazon Technologies, Inc. Selective notification delivery based on user presence detections
JP2019114296A (en) * 2014-05-15 2019-07-11 ソニー株式会社 System and device
US10379808B1 (en) * 2015-09-29 2019-08-13 Amazon Technologies, Inc. Audio associating of computing devices
US20190311712A1 (en) * 2017-07-28 2019-10-10 Nuance Communications, Inc. Selection system and method
USD864466S1 (en) 2017-05-05 2019-10-22 Hubbell Incorporated Lighting fixture
US10469556B2 (en) 2007-05-31 2019-11-05 Ooma, Inc. System and method for providing audio cues in operation of a VoIP service
CN110447067A (en) * 2017-03-23 2019-11-12 夏普株式会社 It gives orders or instructions the control program of device, the control method of the device of giving orders or instructions and the device of giving orders or instructions
US20190348048A1 (en) * 2018-05-10 2019-11-14 International Business Machines Corporation Providing reminders based on voice recognition
US10515653B1 (en) * 2013-12-19 2019-12-24 Amazon Technologies, Inc. Voice controlled system
US20200034108A1 (en) * 2018-07-25 2020-01-30 Sensory, Incorporated Dynamic Volume Adjustment For Virtual Assistants
US10553098B2 (en) 2014-05-20 2020-02-04 Ooma, Inc. Appliance device integration with alarm systems
US10573321B1 (en) * 2018-09-25 2020-02-25 Sonos, Inc. Voice detection optimization based on selected voice assistant service
US10586540B1 (en) 2019-06-12 2020-03-10 Sonos, Inc. Network microphone device with command keyword conditioning
US10600291B2 (en) 2014-01-13 2020-03-24 Alexis Ander Kashar System and method for alerting a user
US10606555B1 (en) 2017-09-29 2020-03-31 Sonos, Inc. Media playback system with concurrent voice assistance
US10614807B2 (en) 2016-10-19 2020-04-07 Sonos, Inc. Arbitration-based voice recognition
US10621981B2 (en) 2017-09-28 2020-04-14 Sonos, Inc. Tone interference cancellation
US20200150982A1 (en) * 2018-11-12 2020-05-14 International Business Machines Corporation Determination and inititation of a computing interface for computer-initiated task response
US10692518B2 (en) 2018-09-29 2020-06-23 Sonos, Inc. Linear filtering for noise-suppressed speech detection via multiple network microphone devices
US10699711B2 (en) 2016-07-15 2020-06-30 Sonos, Inc. Voice detection by multiple devices
US10708677B1 (en) * 2014-09-30 2020-07-07 Amazon Technologies, Inc. Audio assemblies for electronic devices
US10714115B2 (en) 2016-06-09 2020-07-14 Sonos, Inc. Dynamic player selection for audio signal processing
US10743101B2 (en) 2016-02-22 2020-08-11 Sonos, Inc. Content mixing
US10769931B2 (en) 2014-05-20 2020-09-08 Ooma, Inc. Network jamming detection and remediation
US10771396B2 (en) 2015-05-08 2020-09-08 Ooma, Inc. Communications network failure detection and remediation
WO2020180008A1 (en) * 2019-03-06 2020-09-10 Samsung Electronics Co., Ltd. Method for processing plans having multiple end points and electronic device applying the same method
US10803859B1 (en) * 2017-09-05 2020-10-13 Amazon Technologies, Inc. Speech processing for public devices
US10818158B2 (en) 2014-05-20 2020-10-27 Ooma, Inc. Security monitoring and control
CN111971647A (en) * 2018-04-09 2020-11-20 麦克赛尔株式会社 Speech recognition apparatus, cooperation system of speech recognition apparatus, and cooperation method of speech recognition apparatus
US10847143B2 (en) 2016-02-22 2020-11-24 Sonos, Inc. Voice control of a media playback system
US10847174B2 (en) 2017-12-20 2020-11-24 Hubbell Incorporated Voice responsive in-wall device
US10847164B2 (en) 2016-08-05 2020-11-24 Sonos, Inc. Playback device supporting concurrent voice assistants
US10847178B2 (en) 2018-05-18 2020-11-24 Sonos, Inc. Linear filtering for noise-suppressed speech detection
US10853761B1 (en) 2016-06-24 2020-12-01 Amazon Technologies, Inc. Speech-based inventory management system and method
US10873819B2 (en) 2016-09-30 2020-12-22 Sonos, Inc. Orientation-based playback device microphone selection
US10871943B1 (en) 2019-07-31 2020-12-22 Sonos, Inc. Noise classification for event detection
US10880650B2 (en) 2017-12-10 2020-12-29 Sonos, Inc. Network microphone devices with automatic do not disturb actuation capabilities
US10878811B2 (en) 2018-09-14 2020-12-29 Sonos, Inc. Networked devices, systems, and methods for intelligently deactivating wake-word engines
US10880644B1 (en) 2017-09-28 2020-12-29 Sonos, Inc. Three-dimensional beam forming with a microphone array
US10891932B2 (en) 2017-09-28 2021-01-12 Sonos, Inc. Multi-channel acoustic echo cancellation
US10896671B1 (en) * 2015-08-21 2021-01-19 Soundhound, Inc. User-defined extensions of the command input recognized by a virtual assistant
US10909981B2 (en) * 2017-06-13 2021-02-02 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Mobile terminal, method of controlling same, and computer-readable storage medium
US10911368B2 (en) 2015-05-08 2021-02-02 Ooma, Inc. Gateway address spoofing for alternate network utilization
US10938389B2 (en) 2017-12-20 2021-03-02 Hubbell Incorporated Gesture control for in-wall device
US10959029B2 (en) 2018-05-25 2021-03-23 Sonos, Inc. Determining and adapting to changes in microphone performance of playback devices
US10970035B2 (en) 2016-02-22 2021-04-06 Sonos, Inc. Audio response playback
US11017789B2 (en) 2017-09-27 2021-05-25 Sonos, Inc. Robust Short-Time Fourier Transform acoustic echo cancellation during audio playback
US11024331B2 (en) 2018-09-21 2021-06-01 Sonos, Inc. Voice detection optimization using sound metadata
US11032211B2 (en) 2015-05-08 2021-06-08 Ooma, Inc. Communications hub
US11042355B2 (en) 2016-02-22 2021-06-22 Sonos, Inc. Handling of loss of pairing between networked devices
US20210210088A1 (en) * 2020-01-08 2021-07-08 Beijing Xiaomi Pinecone Electronics Co., Ltd. Speech interaction method and apparatus, device and storage medium
US11076035B2 (en) 2018-08-28 2021-07-27 Sonos, Inc. Do not disturb feature for audio notifications
US11080005B2 (en) 2017-09-08 2021-08-03 Sonos, Inc. Dynamic computation of system response volume
USD927433S1 (en) 2018-01-05 2021-08-10 Hubbell Incorporated Front panel of in-wall fan controller with indicator component
US11087023B2 (en) 2018-08-07 2021-08-10 Google Llc Threshold-based assembly of automated assistant responses
US11100923B2 (en) 2018-09-28 2021-08-24 Sonos, Inc. Systems and methods for selective wake word detection using neural network models
US11132991B2 (en) * 2019-04-23 2021-09-28 Lg Electronics Inc. Method and apparatus for determining voice enable device
US11132989B2 (en) 2018-12-13 2021-09-28 Sonos, Inc. Networked microphone devices, systems, and methods of localized arbitration
US11138975B2 (en) 2019-07-31 2021-10-05 Sonos, Inc. Locally distributed keyword detection
US11138969B2 (en) 2019-07-31 2021-10-05 Sonos, Inc. Locally distributed keyword detection
US11159880B2 (en) 2018-12-20 2021-10-26 Sonos, Inc. Optimization of network microphone devices using noise classification
US11171875B2 (en) 2015-05-08 2021-11-09 Ooma, Inc. Systems and methods of communications network failure detection and remediation utilizing link probes
US20210352059A1 (en) * 2014-11-04 2021-11-11 Huawei Technologies Co., Ltd. Message Display Method, Apparatus, and Device
US11175880B2 (en) 2018-05-10 2021-11-16 Sonos, Inc. Systems and methods for voice-assisted media content selection
EP3753015A4 (en) * 2018-02-13 2021-11-17 Roku, Inc. Trigger word detection with multiple digital assistants
US11183183B2 (en) 2018-12-07 2021-11-23 Sonos, Inc. Systems and methods of operating media playback systems having multiple voice assistant services
US11184969B2 (en) 2016-07-15 2021-11-23 Sonos, Inc. Contextualization of voice inputs
US11183181B2 (en) 2017-03-27 2021-11-23 Sonos, Inc. Systems and methods of multiple voice services
US11189286B2 (en) 2019-10-22 2021-11-30 Sonos, Inc. VAS toggle based on device orientation
US20210375267A1 (en) * 2020-05-30 2021-12-02 Jio Platforms Limited Method and system for smart interaction in a multi voice capable device environment
US11197096B2 (en) 2018-06-28 2021-12-07 Sonos, Inc. Systems and methods for associating playback devices with voice assistant services
US11200900B2 (en) 2019-12-20 2021-12-14 Sonos, Inc. Offline voice control
US11200894B2 (en) 2019-06-12 2021-12-14 Sonos, Inc. Network microphone device with command keyword eventing
US11200889B2 (en) 2018-11-15 2021-12-14 Sonos, Inc. Dilated convolutions and gating for efficient keyword spotting
US20220012470A1 (en) * 2017-02-14 2022-01-13 Microsoft Technology Licensing, Llc Multi-user intelligent assistance
US11258858B1 (en) 2020-11-24 2022-02-22 International Business Machines Corporation Multi-device connection management
US11265684B2 (en) * 2017-03-03 2022-03-01 Orion Labs, Inc. Phone-less member of group communication constellations
USD947137S1 (en) 2019-10-22 2022-03-29 Hubbell Incorporated Front panel of in-wall fan controller with indicator component
US20220100464A1 (en) * 2020-09-28 2022-03-31 Samsung Electronics Co., Ltd. Methods and systems for execution of voice commands
US11302328B2 (en) * 2017-11-02 2022-04-12 Hisense Visual Technology Co., Ltd. Voice interactive device and method for controlling voice interactive device
US11308958B2 (en) 2020-02-07 2022-04-19 Sonos, Inc. Localized wakeword verification
US11308962B2 (en) 2020-05-20 2022-04-19 Sonos, Inc. Input detection windowing
US11315556B2 (en) 2019-02-08 2022-04-26 Sonos, Inc. Devices, systems, and methods for distributed voice processing by transmitting sound data associated with a wake word to an appropriate device for identification
US11315071B1 (en) * 2016-06-24 2022-04-26 Amazon Technologies, Inc. Speech-based storage tracking
US20220139413A1 (en) * 2020-10-30 2022-05-05 Samsung Electronics Co., Ltd. Electronic apparatus and method of controlling the same
US11343614B2 (en) 2018-01-31 2022-05-24 Sonos, Inc. Device designation of playback and network microphone device arrangements
US11361756B2 (en) 2019-06-12 2022-06-14 Sonos, Inc. Conditional wake word eventing based on environment
US11380322B2 (en) 2017-08-07 2022-07-05 Sonos, Inc. Wake-word detection suppression
US11405430B2 (en) 2016-02-22 2022-08-02 Sonos, Inc. Networked microphone device control
US11410646B1 (en) * 2016-09-29 2022-08-09 Amazon Technologies, Inc. Processing complex utterances for natural language understanding
US11410638B1 (en) * 2017-08-30 2022-08-09 Amazon Technologies, Inc. Voice user interface for nested content
US11423899B2 (en) * 2018-11-19 2022-08-23 Google Llc Controlling device output according to a determined condition of a user
US11430434B1 (en) * 2017-02-15 2022-08-30 Amazon Technologies, Inc. Intelligent privacy protection mediation
US11432030B2 (en) 2018-09-14 2022-08-30 Sonos, Inc. Networked devices, systems, and methods for associating playback devices based on sound codes
US11436417B2 (en) * 2017-05-15 2022-09-06 Google Llc Providing access to user-controlled resources by automated assistants
EP4057165A1 (en) * 2021-03-11 2022-09-14 Deutsche Telekom AG Voice assistance control
US11474883B2 (en) * 2018-10-26 2022-10-18 International Business Machines Corporation Cognitive agent for persistent multi-platform reminder provision
US11482978B2 (en) 2018-08-28 2022-10-25 Sonos, Inc. Audio notifications
US11482224B2 (en) 2020-05-20 2022-10-25 Sonos, Inc. Command keywords with input detection windowing
US20220413796A1 (en) * 2012-12-31 2022-12-29 Apple Inc. Multi-user tv user interface
US11551700B2 (en) 2021-01-25 2023-01-10 Sonos, Inc. Systems and methods for power-efficient keyword detection
US11551686B2 (en) * 2019-04-30 2023-01-10 Samsung Electronics Co., Ltd. Home appliance and method for controlling thereof
US11556306B2 (en) 2016-02-22 2023-01-17 Sonos, Inc. Voice controlled media playback system
US11556307B2 (en) 2020-01-31 2023-01-17 Sonos, Inc. Local voice data processing
US11562740B2 (en) 2020-01-07 2023-01-24 Sonos, Inc. Voice verification for media playback
US11604831B2 (en) * 2018-06-08 2023-03-14 Ntt Docomo, Inc. Interactive device
US11609678B2 (en) 2016-10-26 2023-03-21 Apple Inc. User interfaces for browsing content from multiple content applications on an electronic device
US11641559B2 (en) 2016-09-27 2023-05-02 Sonos, Inc. Audio playback settings for voice interaction
US11646023B2 (en) 2019-02-08 2023-05-09 Sonos, Inc. Devices, systems, and methods for distributed voice processing
US11646025B2 (en) 2017-08-28 2023-05-09 Roku, Inc. Media system with multiple digital assistants
US11676590B2 (en) 2017-12-11 2023-06-13 Sonos, Inc. Home graph
US11683565B2 (en) 2019-03-24 2023-06-20 Apple Inc. User interfaces for interacting with channels that provide content that plays in a media browsing application
US11698771B2 (en) 2020-08-25 2023-07-11 Sonos, Inc. Vocal guidance engines for playback devices
US11720229B2 (en) 2020-12-07 2023-08-08 Apple Inc. User interfaces for browsing and presenting content
US11727919B2 (en) 2020-05-20 2023-08-15 Sonos, Inc. Memory allocation for keyword spotting engines
US11798553B2 (en) 2019-05-03 2023-10-24 Sonos, Inc. Voice assistant persistence across multiple network microphone devices
US11797606B2 (en) 2019-05-31 2023-10-24 Apple Inc. User interfaces for a podcast browsing and playback application
US11804227B2 (en) 2017-08-28 2023-10-31 Roku, Inc. Local and cloud speech recognition
US20230359973A1 (en) * 2022-05-04 2023-11-09 Kyndryl, Inc. Ad-hoc application development
US11843838B2 (en) 2020-03-24 2023-12-12 Apple Inc. User interfaces for accessing episodes of a content series
US11863837B2 (en) 2019-05-31 2024-01-02 Apple Inc. Notification of augmented reality content on an electronic device
US11899895B2 (en) 2020-06-21 2024-02-13 Apple Inc. User interfaces for setting up an electronic device
US11899519B2 (en) 2018-10-23 2024-02-13 Sonos, Inc. Multiple stage network microphone device with reduced power consumption and processing load
US11934640B2 (en) 2021-01-29 2024-03-19 Apple Inc. User interfaces for record labels
WO2024072994A1 (en) * 2022-09-30 2024-04-04 Google Llc Selecting a device to respond to device-agnostic user requests
US11962836B2 (en) 2020-03-24 2024-04-16 Apple Inc. User interfaces for a media browsing application

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106814639A (en) * 2015-11-27 2017-06-09 富泰华工业(深圳)有限公司 Speech control system and method
US10976998B2 (en) 2016-09-23 2021-04-13 Sony Corporation Information processing apparatus and information processing method for controlling a response to speech
US10332523B2 (en) 2016-11-18 2019-06-25 Google Llc Virtual assistant identification of nearby computing devices
US11277274B2 (en) 2017-10-12 2022-03-15 International Business Machines Corporation Device ranking for secure collaboration
JP7057647B2 (en) * 2017-11-17 2022-04-20 キヤノン株式会社 Voice control system, control method and program
US10747477B2 (en) 2017-11-17 2020-08-18 Canon Kabushiki Kaisha Print control system that transmit to a registered printing apparatus, a change instruction for changing a setting of the power of the registered printing apparatus, and related method
JP7071098B2 (en) 2017-11-20 2022-05-18 キヤノン株式会社 Voice control system, control method and program
US11121990B2 (en) * 2017-12-21 2021-09-14 International Business Machines Corporation Methods and systems for optimizing delivery of electronic communications
JP6928842B2 (en) * 2018-02-14 2021-09-01 パナソニックIpマネジメント株式会社 Control information acquisition system and control information acquisition method
WO2020116026A1 (en) * 2018-12-07 2020-06-11 ソニー株式会社 Response processing device, response processing method, and response processing program
CN110223686A (en) * 2019-05-31 2019-09-10 联想(北京)有限公司 Audio recognition method, speech recognition equipment and electronic equipment
US11388021B2 (en) 2019-07-23 2022-07-12 International Business Machines Corporation Intelligent virtual assistant notification rerouting
CN110457078B (en) 2019-08-09 2020-11-24 百度在线网络技术(北京)有限公司 Intelligent service method, device and equipment
CN110990236A (en) * 2019-10-08 2020-04-10 山东科技大学 SaaS software performance problem recognition method based on hidden Markov random field
CA3059032A1 (en) 2019-10-17 2021-04-17 The Toronto-Dominion Bank Homomorphic encryption of communications involving voice-enabled devices in a distributed computing environment
CA3059029A1 (en) 2019-10-17 2021-04-17 The Toronto-Dominion Bank Maintaining data confidentiality in communications involving voice-enabled devices in a distributed computing environment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5493692A (en) * 1993-12-03 1996-02-20 Xerox Corporation Selective delivery of electronic messages in a multiple computer system based on context and environment of a user
US5928325A (en) * 1997-02-24 1999-07-27 Motorola, Inc. Method of dynamically establishing communication of incoming messages to one or more user devices presently available to an intended recipient
US20050125541A1 (en) * 2003-12-04 2005-06-09 Randall Frank Integrating multiple communication modes
US7522608B2 (en) * 2005-11-01 2009-04-21 Microsoft Corporation Endpoint selection for a call completion response
US7673010B2 (en) * 2006-01-27 2010-03-02 Broadcom Corporation Multi user client terminals operable to support network communications
US7920679B1 (en) * 2006-02-02 2011-04-05 Sprint Communications Company L.P. Communication system and method for notifying persons of an emergency telephone call
US8166119B2 (en) * 2008-04-25 2012-04-24 T-Mobile Usa, Inc. Messaging device for delivering messages to recipients based on availability and preferences of recipients
US8484344B2 (en) * 2009-03-02 2013-07-09 International Business Machines Corporation Communicating messages to proximate devices on a contact list responsive to an unsuccessful call

Family Cites Families (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5255341A (en) * 1989-08-14 1993-10-19 Kabushiki Kaisha Toshiba Command input device for voice controllable elevator system
US5862321A (en) * 1994-06-27 1999-01-19 Xerox Corporation System and method for accessing and distributing electronic documents
JP3835771B2 (en) * 1996-03-15 2006-10-18 株式会社東芝 Communication apparatus and communication method
US6587835B1 (en) 2000-02-09 2003-07-01 G. Victor Treyz Shopping assistance with handheld computing device
US7084997B2 (en) * 2001-07-13 2006-08-01 Hewlett-Packard Development Company, L.P. Schedule-based printer selection
JP2003116175A (en) * 2001-10-03 2003-04-18 Ntt Docomo Inc Controller for notifying call out
US7099380B1 (en) * 2001-11-16 2006-08-29 Marvell International Ltd. Apparatus for antenna diversity for wireless communication and method thereof
US20040019603A1 (en) * 2002-05-29 2004-01-29 Honeywell International Inc. System and method for automatically generating condition-based activity prompts
US7720683B1 (en) 2003-06-13 2010-05-18 Sensory, Inc. Method and apparatus of specifying and performing speech recognition operations
US20050043940A1 (en) 2003-08-20 2005-02-24 Marvin Elder Preparing a data source for a natural language query
US7418392B1 (en) 2003-09-25 2008-08-26 Sensory, Inc. System and method for controlling the operation of a device by voice commands
US8180722B2 (en) * 2004-09-30 2012-05-15 Avaya Inc. Method and apparatus for data mining within communication session information using an entity relationship model
US7899468B2 (en) * 2005-09-30 2011-03-01 Telecommunication Systems, Inc. Location sensitive messaging
KR100678518B1 (en) * 2005-12-23 2007-02-02 아주대학교산학협력단 Smart scheduler capable of reflecting change of situation
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
JP2008228184A (en) * 2007-03-15 2008-09-25 Funai Electric Co Ltd Audio output apparatus
WO2009061820A1 (en) 2007-11-05 2009-05-14 Chacha Search, Inc Method and system of accessing information
CN101452697A (en) * 2007-11-29 2009-06-10 卢能晓 Environmental-protecting type horn of vehicle for self-regulating sound volume based on entironment noise
US8150967B2 (en) * 2009-03-24 2012-04-03 Yahoo! Inc. System and method for verified presence tracking
US8620846B2 (en) * 2010-01-21 2013-12-31 Telcordia Technologies, Inc. Method and system for improving personal productivity in home environments
US8332544B1 (en) * 2010-03-17 2012-12-11 Mattel, Inc. Systems, methods, and devices for assisting play
US20120223885A1 (en) 2011-03-02 2012-09-06 Microsoft Corporation Immersive display experience
US8737950B2 (en) * 2011-03-17 2014-05-27 Sony Corporation Verifying calendar information through proximate device detection
US20120259633A1 (en) * 2011-04-07 2012-10-11 Microsoft Corporation Audio-interactive message exchange
US20120297305A1 (en) * 2011-05-17 2012-11-22 Microsoft Corporation Presenting or sharing state in presence
US8954177B2 (en) * 2011-06-01 2015-02-10 Apple Inc. Controlling operation of a media device based upon whether a presentation device is currently being worn by a user
US8775103B1 (en) * 2011-06-17 2014-07-08 Amazon Technologies, Inc. Proximity sensor calibration and configuration
US9542956B1 (en) * 2012-01-09 2017-01-10 Interactive Voice, Inc. Systems and methods for responding to human spoken audio
US9438642B2 (en) * 2012-05-01 2016-09-06 Google Technology Holdings LLC Methods for coordinating communications between a plurality of communication devices of a user
US10250638B2 (en) * 2012-05-02 2019-04-02 Elwha Llc Control of transmission to a target device with a cloud-based architecture
US9460237B2 (en) * 2012-05-08 2016-10-04 24/7 Customer, Inc. Predictive 411
US9197848B2 (en) * 2012-06-25 2015-11-24 Intel Corporation Video conferencing transitions among a plurality of devices
US9015099B2 (en) * 2012-08-14 2015-04-21 Sri International Method, system and device for inferring a mobile user's current context and proactively providing assistance
US10028204B2 (en) * 2012-08-24 2018-07-17 Blackberry Limited Supporting device-to-device communication in a rich communication service context
US9436382B2 (en) * 2012-09-18 2016-09-06 Adobe Systems Incorporated Natural language image editing
US9264850B1 (en) * 2012-11-20 2016-02-16 Square, Inc. Multiple merchants in cardless payment transactions and multiple customers in cardless payment transactions
US20140164088A1 (en) * 2012-12-06 2014-06-12 Mark R. Rorabaugh Social network loyalty-reward system and method
US9271111B2 (en) * 2012-12-14 2016-02-23 Amazon Technologies, Inc. Response endpoint selection

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5493692A (en) * 1993-12-03 1996-02-20 Xerox Corporation Selective delivery of electronic messages in a multiple computer system based on context and environment of a user
US5928325A (en) * 1997-02-24 1999-07-27 Motorola, Inc. Method of dynamically establishing communication of incoming messages to one or more user devices presently available to an intended recipient
US20050125541A1 (en) * 2003-12-04 2005-06-09 Randall Frank Integrating multiple communication modes
US7522608B2 (en) * 2005-11-01 2009-04-21 Microsoft Corporation Endpoint selection for a call completion response
US8179899B2 (en) * 2005-11-01 2012-05-15 Microsoft Corporation Endpoint selection for a call completion response
US7673010B2 (en) * 2006-01-27 2010-03-02 Broadcom Corporation Multi user client terminals operable to support network communications
US7920679B1 (en) * 2006-02-02 2011-04-05 Sprint Communications Company L.P. Communication system and method for notifying persons of an emergency telephone call
US8166119B2 (en) * 2008-04-25 2012-04-24 T-Mobile Usa, Inc. Messaging device for delivering messages to recipients based on availability and preferences of recipients
US8484344B2 (en) * 2009-03-02 2013-07-09 International Business Machines Corporation Communicating messages to proximate devices on a contact list responsive to an unsuccessful call

Cited By (283)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10469556B2 (en) 2007-05-31 2019-11-05 Ooma, Inc. System and method for providing audio cues in operation of a VoIP service
US10778778B1 (en) 2012-12-14 2020-09-15 Amazon Technologies, Inc. Response endpoint selection based on user proximity determination
US9271111B2 (en) * 2012-12-14 2016-02-23 Amazon Technologies, Inc. Response endpoint selection
US11822858B2 (en) * 2012-12-31 2023-11-21 Apple Inc. Multi-user TV user interface
US20220413796A1 (en) * 2012-12-31 2022-12-29 Apple Inc. Multi-user tv user interface
US9818407B1 (en) * 2013-02-07 2017-11-14 Amazon Technologies, Inc. Distributed endpointing for speech recognition
US10499192B2 (en) * 2013-03-14 2019-12-03 T-Mobile Usa, Inc. Proximity-based device selection for communication delivery
US20140280541A1 (en) * 2013-03-14 2014-09-18 T-Mobile Usa, Inc. Proximity-Based Device Selection for Communication Delivery
US10728386B2 (en) 2013-09-23 2020-07-28 Ooma, Inc. Identifying and filtering incoming telephone calls to enhance privacy
US10135976B2 (en) 2013-09-23 2018-11-20 Ooma, Inc. Identifying and filtering incoming telephone calls to enhance privacy
US20150088272A1 (en) * 2013-09-23 2015-03-26 Emerson Electric Co. Energy Management Based on Occupancy and Occupant Activity Level
US10515653B1 (en) * 2013-12-19 2019-12-24 Amazon Technologies, Inc. Voice controlled system
US11501792B1 (en) 2013-12-19 2022-11-15 Amazon Technologies, Inc. Voice controlled system
US10878836B1 (en) * 2013-12-19 2020-12-29 Amazon Technologies, Inc. Voice controlled system
US20170278331A1 (en) * 2013-12-23 2017-09-28 Assa Abloy Inc. Method for utilizing a wireless connection to unlock an opening
US10078931B2 (en) * 2013-12-23 2018-09-18 Assa Abloy Inc. Method for utilizing a wireless connection to unlock an opening
US10600291B2 (en) 2014-01-13 2020-03-24 Alexis Ander Kashar System and method for alerting a user
US10274908B2 (en) * 2014-01-13 2019-04-30 Barbara Ander System and method for alerting a user
US20150198939A1 (en) * 2014-01-13 2015-07-16 Barbara Ander System and Method for Alerting a User
US9432794B2 (en) * 2014-02-24 2016-08-30 International Business Machines Corporation Techniques for mobility-aware dynamic service placement in mobile clouds
US10231102B2 (en) 2014-02-24 2019-03-12 International Business Machines Corporation Techniques for mobility-aware dynamic service placement in mobile clouds
US20150245160A1 (en) * 2014-02-24 2015-08-27 International Business Machines Corporation Techniques for Mobility-Aware Dynamic Service Placement in Mobile Clouds
US20150271038A1 (en) * 2014-03-20 2015-09-24 Fujitsu Limited Link-device selecting apparatus and method
US11216153B2 (en) 2014-05-15 2022-01-04 Sony Corporation Information processing device, display control method, and program
JP2019114296A (en) * 2014-05-15 2019-07-11 ソニー株式会社 System and device
US11693530B2 (en) 2014-05-15 2023-07-04 Sony Corporation Information processing device, display control method, and program
US11495117B2 (en) 2014-05-20 2022-11-08 Ooma, Inc. Security monitoring and control
US10818158B2 (en) 2014-05-20 2020-10-27 Ooma, Inc. Security monitoring and control
US11094185B2 (en) 2014-05-20 2021-08-17 Ooma, Inc. Community security monitoring and control
US10769931B2 (en) 2014-05-20 2020-09-08 Ooma, Inc. Network jamming detection and remediation
US11151862B2 (en) 2014-05-20 2021-10-19 Ooma, Inc. Security monitoring and control utilizing DECT devices
US11763663B2 (en) 2014-05-20 2023-09-19 Ooma, Inc. Community security monitoring and control
US10553098B2 (en) 2014-05-20 2020-02-04 Ooma, Inc. Appliance device integration with alarm systems
US11250687B2 (en) 2014-05-20 2022-02-15 Ooma, Inc. Network jamming detection and remediation
CN106663241A (en) * 2014-06-24 2017-05-10 谷歌公司 List accumulation and reminder triggering
RU2666462C2 (en) * 2014-06-24 2018-09-07 Гугл Инк. Accumulation of lists and activation of reminder
US10783166B2 (en) * 2014-06-24 2020-09-22 Google Llc List accumulation and reminder triggering
CN113807806A (en) * 2014-06-24 2021-12-17 谷歌有限责任公司 List accumulation and reminder triggering
WO2015200042A1 (en) * 2014-06-24 2015-12-30 Google Inc. List accumulation and reminder triggering
US11562005B2 (en) 2014-06-24 2023-01-24 Google Llc List accumulation and reminder triggering
US20150370884A1 (en) * 2014-06-24 2015-12-24 Google Inc. List accumulation and reminder triggering
US11330100B2 (en) * 2014-07-09 2022-05-10 Ooma, Inc. Server based intelligent personal assistant services
US11316974B2 (en) 2014-07-09 2022-04-26 Ooma, Inc. Cloud-based assistive services for use in telecommunications and on premise devices
US11315405B2 (en) 2014-07-09 2022-04-26 Ooma, Inc. Systems and methods for provisioning appliance devices
US20180152557A1 (en) * 2014-07-09 2018-05-31 Ooma, Inc. Integrating intelligent personal assistants with appliance devices
US20160021494A1 (en) * 2014-07-18 2016-01-21 Lei Yang Systems and methods for adaptive multi-feature semantic location sensing
US9807549B2 (en) * 2014-07-18 2017-10-31 Intel Corporation Systems and methods for adaptive multi-feature semantic location sensing
US10708677B1 (en) * 2014-09-30 2020-07-07 Amazon Technologies, Inc. Audio assemblies for electronic devices
US11399224B1 (en) 2014-09-30 2022-07-26 Amazon Technologies, Inc. Audio assemblies for electronic devices
US20210352059A1 (en) * 2014-11-04 2021-11-11 Huawei Technologies Co., Ltd. Message Display Method, Apparatus, and Device
US11171875B2 (en) 2015-05-08 2021-11-09 Ooma, Inc. Systems and methods of communications network failure detection and remediation utilizing link probes
US10158584B2 (en) 2015-05-08 2018-12-18 Ooma, Inc. Remote fault tolerance for managing alternative networks for high quality of service communications
US11646974B2 (en) 2015-05-08 2023-05-09 Ooma, Inc. Systems and methods for end point data communications anonymization for a communications hub
US10263918B2 (en) 2015-05-08 2019-04-16 Ooma, Inc. Local fault tolerance for managing alternative networks for high quality of service communications
US10911368B2 (en) 2015-05-08 2021-02-02 Ooma, Inc. Gateway address spoofing for alternate network utilization
US10771396B2 (en) 2015-05-08 2020-09-08 Ooma, Inc. Communications network failure detection and remediation
US11032211B2 (en) 2015-05-08 2021-06-08 Ooma, Inc. Communications hub
US20210050013A1 (en) * 2015-05-19 2021-02-18 Sony Corporation Information processing device, information processing method, and program
US10861449B2 (en) * 2015-05-19 2020-12-08 Sony Corporation Information processing device and information processing method
US20180137860A1 (en) * 2015-05-19 2018-05-17 Sony Corporation Information processing device, information processing method, and program
US10896671B1 (en) * 2015-08-21 2021-01-19 Soundhound, Inc. User-defined extensions of the command input recognized by a virtual assistant
US10379808B1 (en) * 2015-09-29 2019-08-13 Amazon Technologies, Inc. Audio associating of computing devices
US10777205B2 (en) * 2015-09-30 2020-09-15 Huawei Technologies Co., Ltd. Voice control processing method and apparatus
US20190043510A1 (en) * 2015-09-30 2019-02-07 Huawei Technologies Co., Ltd. Voice Control Processing Method and Apparatus
US10341490B2 (en) 2015-10-09 2019-07-02 Ooma, Inc. Real-time communications-based internet advertising
US10116796B2 (en) 2015-10-09 2018-10-30 Ooma, Inc. Real-time communications-based internet advertising
US20200013397A1 (en) * 2016-02-12 2020-01-09 Amazon Technologies, Inc. Processing spoken commands to control distributed audio outputs
US10878815B2 (en) * 2016-02-12 2020-12-29 Amazon Technologies, Inc. Processing spoken commands to control distributed audio outputs
US10262657B1 (en) * 2016-02-12 2019-04-16 Amazon Technologies, Inc. Processing spoken commands to control distributed audio outputs
US9858927B2 (en) * 2016-02-12 2018-01-02 Amazon Technologies, Inc. Processing spoken commands to control distributed audio outputs
US9898250B1 (en) * 2016-02-12 2018-02-20 Amazon Technologies, Inc. Controlling distributed audio outputs to enable voice output
US11137979B2 (en) 2016-02-22 2021-10-05 Sonos, Inc. Metadata exchange involving a networked playback system and a networked microphone system
US10970035B2 (en) 2016-02-22 2021-04-06 Sonos, Inc. Audio response playback
US10743101B2 (en) 2016-02-22 2020-08-11 Sonos, Inc. Content mixing
US11514898B2 (en) 2016-02-22 2022-11-29 Sonos, Inc. Voice control of a media playback system
US11184704B2 (en) 2016-02-22 2021-11-23 Sonos, Inc. Music service selection
US10764679B2 (en) 2016-02-22 2020-09-01 Sonos, Inc. Voice control of a media playback system
US11832068B2 (en) 2016-02-22 2023-11-28 Sonos, Inc. Music service selection
US11042355B2 (en) 2016-02-22 2021-06-22 Sonos, Inc. Handling of loss of pairing between networked devices
US11750969B2 (en) 2016-02-22 2023-09-05 Sonos, Inc. Default playback device designation
US11863593B2 (en) 2016-02-22 2024-01-02 Sonos, Inc. Networked microphone device control
US11513763B2 (en) 2016-02-22 2022-11-29 Sonos, Inc. Audio response playback
US10971139B2 (en) 2016-02-22 2021-04-06 Sonos, Inc. Voice control of a media playback system
US11405430B2 (en) 2016-02-22 2022-08-02 Sonos, Inc. Networked microphone device control
US11736860B2 (en) 2016-02-22 2023-08-22 Sonos, Inc. Voice control of a media playback system
US11212612B2 (en) 2016-02-22 2021-12-28 Sonos, Inc. Voice control of a media playback system
US11556306B2 (en) 2016-02-22 2023-01-17 Sonos, Inc. Voice controlled media playback system
US10847143B2 (en) 2016-02-22 2020-11-24 Sonos, Inc. Voice control of a media playback system
US11726742B2 (en) 2016-02-22 2023-08-15 Sonos, Inc. Handling of loss of pairing between networked devices
US11006214B2 (en) 2016-02-22 2021-05-11 Sonos, Inc. Default playback device designation
CN109479110A (en) * 2016-03-08 2019-03-15 优确诺股份有限公司 The system and method that dynamic creation individualizes exercise videos
US10229687B2 (en) 2016-03-10 2019-03-12 Microsoft Technology Licensing, Llc Scalable endpoint-dependent natural language understanding
WO2017197312A3 (en) * 2016-05-13 2017-12-21 Bose Corporation Processing speech from distributed microphones
US10714115B2 (en) 2016-06-09 2020-07-14 Sonos, Inc. Dynamic player selection for audio signal processing
US11133018B2 (en) 2016-06-09 2021-09-28 Sonos, Inc. Dynamic player selection for audio signal processing
US11545169B2 (en) 2016-06-09 2023-01-03 Sonos, Inc. Dynamic player selection for audio signal processing
US10853761B1 (en) 2016-06-24 2020-12-01 Amazon Technologies, Inc. Speech-based inventory management system and method
US11315071B1 (en) * 2016-06-24 2022-04-26 Amazon Technologies, Inc. Speech-based storage tracking
US11184969B2 (en) 2016-07-15 2021-11-23 Sonos, Inc. Contextualization of voice inputs
US11664023B2 (en) 2016-07-15 2023-05-30 Sonos, Inc. Voice detection by multiple devices
US10699711B2 (en) 2016-07-15 2020-06-30 Sonos, Inc. Voice detection by multiple devices
US10847164B2 (en) 2016-08-05 2020-11-24 Sonos, Inc. Playback device supporting concurrent voice assistants
US11531520B2 (en) 2016-08-05 2022-12-20 Sonos, Inc. Playback device supporting concurrent voice assistants
US10453449B2 (en) * 2016-09-01 2019-10-22 Amazon Technologies, Inc. Indicator for voice-based communications
US20180061404A1 (en) * 2016-09-01 2018-03-01 Amazon Technologies, Inc. Indicator for voice-based communications
US11264030B2 (en) 2016-09-01 2022-03-01 Amazon Technologies, Inc. Indicator for voice-based communications
US20220165268A1 (en) * 2016-09-01 2022-05-26 Amazon Technologies, Inc. Indicator for voice-based communications
US10580404B2 (en) * 2016-09-01 2020-03-03 Amazon Technologies, Inc. Indicator for voice-based communications
US20180061403A1 (en) * 2016-09-01 2018-03-01 Amazon Technologies, Inc. Indicator for voice-based communications
US20180067717A1 (en) * 2016-09-02 2018-03-08 Allomind, Inc. Voice-driven interface to control multi-layered content in a head mounted display
US10650822B2 (en) * 2016-09-07 2020-05-12 Samsung Electronics Co., Ltd. Server and method for controlling external device
US20180068663A1 (en) * 2016-09-07 2018-03-08 Samsung Electronics Co., Ltd. Server and method for controlling external device
US11482227B2 (en) 2016-09-07 2022-10-25 Samsung Electronics Co., Ltd. Server and method for controlling external device
US11641559B2 (en) 2016-09-27 2023-05-02 Sonos, Inc. Audio playback settings for voice interaction
US11410646B1 (en) * 2016-09-29 2022-08-09 Amazon Technologies, Inc. Processing complex utterances for natural language understanding
US10873819B2 (en) 2016-09-30 2020-12-22 Sonos, Inc. Orientation-based playback device microphone selection
US11516610B2 (en) 2016-09-30 2022-11-29 Sonos, Inc. Orientation-based playback device microphone selection
US11727933B2 (en) 2016-10-19 2023-08-15 Sonos, Inc. Arbitration-based voice recognition
US11308961B2 (en) 2016-10-19 2022-04-19 Sonos, Inc. Arbitration-based voice recognition
US10614807B2 (en) 2016-10-19 2020-04-07 Sonos, Inc. Arbitration-based voice recognition
US11609678B2 (en) 2016-10-26 2023-03-21 Apple Inc. User interfaces for browsing content from multiple content applications on an electronic device
US10037679B1 (en) * 2017-01-27 2018-07-31 Bengi Crosby Garbage reminder system
US20220012470A1 (en) * 2017-02-14 2022-01-13 Microsoft Technology Licensing, Llc Multi-user intelligent assistance
US11430434B1 (en) * 2017-02-15 2022-08-30 Amazon Technologies, Inc. Intelligent privacy protection mediation
US11265684B2 (en) * 2017-03-03 2022-03-01 Orion Labs, Inc. Phone-less member of group communication constellations
CN110447067A (en) * 2017-03-23 2019-11-12 夏普株式会社 It gives orders or instructions the control program of device, the control method of the device of giving orders or instructions and the device of giving orders or instructions
US11183181B2 (en) 2017-03-27 2021-11-23 Sonos, Inc. Systems and methods of multiple voice services
USD864466S1 (en) 2017-05-05 2019-10-22 Hubbell Incorporated Lighting fixture
CN110622126A (en) * 2017-05-15 2019-12-27 谷歌有限责任公司 Providing access to user-controlled resources through automated assistant
US11436417B2 (en) * 2017-05-15 2022-09-06 Google Llc Providing access to user-controlled resources by automated assistants
US10127227B1 (en) * 2017-05-15 2018-11-13 Google Llc Providing access to user-controlled resources by automated assistants
US10685187B2 (en) * 2017-05-15 2020-06-16 Google Llc Providing access to user-controlled resources by automated assistants
US20180342237A1 (en) * 2017-05-29 2018-11-29 Samsung Electronics Co., Ltd. Electronic apparatus for recognizing keyword included in your utterance to change to operating state and controlling method thereof
US10978048B2 (en) * 2017-05-29 2021-04-13 Samsung Electronics Co., Ltd. Electronic apparatus for recognizing keyword included in your utterance to change to operating state and controlling method thereof
US10909981B2 (en) * 2017-06-13 2021-02-02 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Mobile terminal, method of controlling same, and computer-readable storage medium
US11205421B2 (en) * 2017-07-28 2021-12-21 Cerence Operating Company Selection system and method
US20190311712A1 (en) * 2017-07-28 2019-10-10 Nuance Communications, Inc. Selection system and method
US11380322B2 (en) 2017-08-07 2022-07-05 Sonos, Inc. Wake-word detection suppression
US11900937B2 (en) 2017-08-07 2024-02-13 Sonos, Inc. Wake-word detection suppression
US11804227B2 (en) 2017-08-28 2023-10-31 Roku, Inc. Local and cloud speech recognition
US11646025B2 (en) 2017-08-28 2023-05-09 Roku, Inc. Media system with multiple digital assistants
US11410638B1 (en) * 2017-08-30 2022-08-09 Amazon Technologies, Inc. Voice user interface for nested content
US10803859B1 (en) * 2017-09-05 2020-10-13 Amazon Technologies, Inc. Speech processing for public devices
US11080005B2 (en) 2017-09-08 2021-08-03 Sonos, Inc. Dynamic computation of system response volume
US11500611B2 (en) 2017-09-08 2022-11-15 Sonos, Inc. Dynamic computation of system response volume
CN110741433A (en) * 2017-09-12 2020-01-31 谷歌有限责任公司 Intercom communication using multiple computing devices
US10083006B1 (en) 2017-09-12 2018-09-25 Google Llc Intercom-style communication using multiple computing devices
US11017789B2 (en) 2017-09-27 2021-05-25 Sonos, Inc. Robust Short-Time Fourier Transform acoustic echo cancellation during audio playback
US11646045B2 (en) 2017-09-27 2023-05-09 Sonos, Inc. Robust short-time fourier transform acoustic echo cancellation during audio playback
US10880644B1 (en) 2017-09-28 2020-12-29 Sonos, Inc. Three-dimensional beam forming with a microphone array
US11538451B2 (en) 2017-09-28 2022-12-27 Sonos, Inc. Multi-channel acoustic echo cancellation
US10891932B2 (en) 2017-09-28 2021-01-12 Sonos, Inc. Multi-channel acoustic echo cancellation
US11302326B2 (en) 2017-09-28 2022-04-12 Sonos, Inc. Tone interference cancellation
US11769505B2 (en) 2017-09-28 2023-09-26 Sonos, Inc. Echo of tone interferance cancellation using two acoustic echo cancellers
US10621981B2 (en) 2017-09-28 2020-04-14 Sonos, Inc. Tone interference cancellation
US11893308B2 (en) 2017-09-29 2024-02-06 Sonos, Inc. Media playback system with concurrent voice assistance
US11175888B2 (en) 2017-09-29 2021-11-16 Sonos, Inc. Media playback system with concurrent voice assistance
US10606555B1 (en) 2017-09-29 2020-03-31 Sonos, Inc. Media playback system with concurrent voice assistance
US11288039B2 (en) 2017-09-29 2022-03-29 Sonos, Inc. Media playback system with concurrent voice assistance
US20190129938A1 (en) * 2017-10-31 2019-05-02 Baidu Usa Llc System and method for performing tasks based on user inputs using natural language processing
US10747954B2 (en) * 2017-10-31 2020-08-18 Baidu Usa Llc System and method for performing tasks based on user inputs using natural language processing
US11302328B2 (en) * 2017-11-02 2022-04-12 Hisense Visual Technology Co., Ltd. Voice interactive device and method for controlling voice interactive device
US11451908B2 (en) 2017-12-10 2022-09-20 Sonos, Inc. Network microphone devices with automatic do not disturb actuation capabilities
US10880650B2 (en) 2017-12-10 2020-12-29 Sonos, Inc. Network microphone devices with automatic do not disturb actuation capabilities
US11676590B2 (en) 2017-12-11 2023-06-13 Sonos, Inc. Home graph
WO2019118367A1 (en) * 2017-12-12 2019-06-20 Amazon Technologies, Inc. Selective notification delivery based on user presence detections
US11545171B2 (en) 2017-12-20 2023-01-03 Hubbell Incorporated Voice responsive in-wall device
US11296695B2 (en) 2017-12-20 2022-04-05 Hubbell Incorporated Gesture control for in-wall device
US10938389B2 (en) 2017-12-20 2021-03-02 Hubbell Incorporated Gesture control for in-wall device
US10847174B2 (en) 2017-12-20 2020-11-24 Hubbell Incorporated Voice responsive in-wall device
USD927433S1 (en) 2018-01-05 2021-08-10 Hubbell Incorporated Front panel of in-wall fan controller with indicator component
US11343614B2 (en) 2018-01-31 2022-05-24 Sonos, Inc. Device designation of playback and network microphone device arrangements
US11689858B2 (en) 2018-01-31 2023-06-27 Sonos, Inc. Device designation of playback and network microphone device arrangements
US11935537B2 (en) 2018-02-13 2024-03-19 Roku, Inc. Trigger word detection with multiple digital assistants
EP3753015A4 (en) * 2018-02-13 2021-11-17 Roku, Inc. Trigger word detection with multiple digital assistants
US11664026B2 (en) 2018-02-13 2023-05-30 Roku, Inc. Trigger word detection with multiple digital assistants
EP3779667A4 (en) * 2018-04-09 2022-02-23 Maxell, Ltd. Speech recognition device, speech recognition device cooperation system, and speech recognition device cooperation method
US11810567B2 (en) 2018-04-09 2023-11-07 Maxell, Ltd. Speech recognition device, speech-recognition-device coordination system, and speech-recognition-device coordination method
CN111971647A (en) * 2018-04-09 2020-11-20 麦克赛尔株式会社 Speech recognition apparatus, cooperation system of speech recognition apparatus, and cooperation method of speech recognition apparatus
US11797263B2 (en) 2018-05-10 2023-10-24 Sonos, Inc. Systems and methods for voice-assisted media content selection
US10755717B2 (en) * 2018-05-10 2020-08-25 International Business Machines Corporation Providing reminders based on voice recognition
US11175880B2 (en) 2018-05-10 2021-11-16 Sonos, Inc. Systems and methods for voice-assisted media content selection
US20190348048A1 (en) * 2018-05-10 2019-11-14 International Business Machines Corporation Providing reminders based on voice recognition
US11715489B2 (en) 2018-05-18 2023-08-01 Sonos, Inc. Linear filtering for noise-suppressed speech detection
US10847178B2 (en) 2018-05-18 2020-11-24 Sonos, Inc. Linear filtering for noise-suppressed speech detection
US10959029B2 (en) 2018-05-25 2021-03-23 Sonos, Inc. Determining and adapting to changes in microphone performance of playback devices
US11792590B2 (en) 2018-05-25 2023-10-17 Sonos, Inc. Determining and adapting to changes in microphone performance of playback devices
US11604831B2 (en) * 2018-06-08 2023-03-14 Ntt Docomo, Inc. Interactive device
US11696074B2 (en) 2018-06-28 2023-07-04 Sonos, Inc. Systems and methods for associating playback devices with voice assistant services
US11197096B2 (en) 2018-06-28 2021-12-07 Sonos, Inc. Systems and methods for associating playback devices with voice assistant services
US20200034108A1 (en) * 2018-07-25 2020-01-30 Sensory, Incorporated Dynamic Volume Adjustment For Virtual Assistants
US10705789B2 (en) * 2018-07-25 2020-07-07 Sensory, Incorporated Dynamic volume adjustment for virtual assistants
US11455418B2 (en) 2018-08-07 2022-09-27 Google Llc Assembling and evaluating automated assistant responses for privacy concerns
US11087023B2 (en) 2018-08-07 2021-08-10 Google Llc Threshold-based assembly of automated assistant responses
US11790114B2 (en) 2018-08-07 2023-10-17 Google Llc Threshold-based assembly of automated assistant responses
US11314890B2 (en) 2018-08-07 2022-04-26 Google Llc Threshold-based assembly of remote automated assistant responses
US11822695B2 (en) 2018-08-07 2023-11-21 Google Llc Assembling and evaluating automated assistant responses for privacy concerns
US11563842B2 (en) 2018-08-28 2023-01-24 Sonos, Inc. Do not disturb feature for audio notifications
US11482978B2 (en) 2018-08-28 2022-10-25 Sonos, Inc. Audio notifications
US11076035B2 (en) 2018-08-28 2021-07-27 Sonos, Inc. Do not disturb feature for audio notifications
US11778259B2 (en) 2018-09-14 2023-10-03 Sonos, Inc. Networked devices, systems and methods for associating playback devices based on sound codes
US10878811B2 (en) 2018-09-14 2020-12-29 Sonos, Inc. Networked devices, systems, and methods for intelligently deactivating wake-word engines
US11432030B2 (en) 2018-09-14 2022-08-30 Sonos, Inc. Networked devices, systems, and methods for associating playback devices based on sound codes
US11551690B2 (en) 2018-09-14 2023-01-10 Sonos, Inc. Networked devices, systems, and methods for intelligently deactivating wake-word engines
US11024331B2 (en) 2018-09-21 2021-06-01 Sonos, Inc. Voice detection optimization using sound metadata
US11790937B2 (en) 2018-09-21 2023-10-17 Sonos, Inc. Voice detection optimization using sound metadata
US11031014B2 (en) * 2018-09-25 2021-06-08 Sonos, Inc. Voice detection optimization based on selected voice assistant service
US10573321B1 (en) * 2018-09-25 2020-02-25 Sonos, Inc. Voice detection optimization based on selected voice assistant service
US11727936B2 (en) 2018-09-25 2023-08-15 Sonos, Inc. Voice detection optimization based on selected voice assistant service
US10811015B2 (en) 2018-09-25 2020-10-20 Sonos, Inc. Voice detection optimization based on selected voice assistant service
US11790911B2 (en) 2018-09-28 2023-10-17 Sonos, Inc. Systems and methods for selective wake word detection using neural network models
US11100923B2 (en) 2018-09-28 2021-08-24 Sonos, Inc. Systems and methods for selective wake word detection using neural network models
US10692518B2 (en) 2018-09-29 2020-06-23 Sonos, Inc. Linear filtering for noise-suppressed speech detection via multiple network microphone devices
US11501795B2 (en) 2018-09-29 2022-11-15 Sonos, Inc. Linear filtering for noise-suppressed speech detection via multiple network microphone devices
US11899519B2 (en) 2018-10-23 2024-02-13 Sonos, Inc. Multiple stage network microphone device with reduced power consumption and processing load
US11474883B2 (en) * 2018-10-26 2022-10-18 International Business Machines Corporation Cognitive agent for persistent multi-platform reminder provision
US11226833B2 (en) * 2018-11-12 2022-01-18 International Business Machines Corporation Determination and initiation of a computing interface for computer-initiated task response
US11226835B2 (en) * 2018-11-12 2022-01-18 International Business Machines Corporation Determination and initiation of a computing interface for computer-initiated task response
US20200150982A1 (en) * 2018-11-12 2020-05-14 International Business Machines Corporation Determination and initiation of a computing interface for computer-initiated task response
US11741948B2 (en) 2018-11-15 2023-08-29 Sonos Vox France Sas Dilated convolutions and gating for efficient keyword spotting
US11200889B2 (en) 2018-11-15 2021-12-14 Sonos, Inc. Dilated convolutions and gating for efficient keyword spotting
US11423899B2 (en) * 2018-11-19 2022-08-23 Google Llc Controlling device output according to a determined condition of a user
US20220406307A1 (en) * 2018-11-19 2022-12-22 Google Llc Controlling device output according to a determined condition of a user
US11557294B2 (en) 2018-12-07 2023-01-17 Sonos, Inc. Systems and methods of operating media playback systems having multiple voice assistant services
US11183183B2 (en) 2018-12-07 2021-11-23 Sonos, Inc. Systems and methods of operating media playback systems having multiple voice assistant services
US11132989B2 (en) 2018-12-13 2021-09-28 Sonos, Inc. Networked microphone devices, systems, and methods of localized arbitration
US11538460B2 (en) 2018-12-13 2022-12-27 Sonos, Inc. Networked microphone devices, systems, and methods of localized arbitration
US11159880B2 (en) 2018-12-20 2021-10-26 Sonos, Inc. Optimization of network microphone devices using noise classification
US11540047B2 (en) 2018-12-20 2022-12-27 Sonos, Inc. Optimization of network microphone devices using noise classification
US11646023B2 (en) 2019-02-08 2023-05-09 Sonos, Inc. Devices, systems, and methods for distributed voice processing
US11315556B2 (en) 2019-02-08 2022-04-26 Sonos, Inc. Devices, systems, and methods for distributed voice processing by transmitting sound data associated with a wake word to an appropriate device for identification
US11264031B2 (en) 2019-03-06 2022-03-01 Samsung Electronics Co., Ltd. Method for processing plans having multiple end points and electronic device applying the same method
WO2020180008A1 (en) * 2019-03-06 2020-09-10 Samsung Electronics Co., Ltd. Method for processing plans having multiple end points and electronic device applying the same method
US11683565B2 (en) 2019-03-24 2023-06-20 Apple Inc. User interfaces for interacting with channels that provide content that plays in a media browsing application
US11132991B2 (en) * 2019-04-23 2021-09-28 Lg Electronics Inc. Method and apparatus for determining voice enable device
US11749277B2 (en) 2019-04-30 2023-09-05 Samsung Electronics Co., Ltd. Home appliance and method for controlling thereof
US11551686B2 (en) * 2019-04-30 2023-01-10 Samsung Electronics Co., Ltd. Home appliance and method for controlling thereof
US20230368790A1 (en) * 2019-04-30 2023-11-16 Samsung Electronics Co., Ltd. Home appliance and method for controlling thereof
US11798553B2 (en) 2019-05-03 2023-10-24 Sonos, Inc. Voice assistant persistence across multiple network microphone devices
US11863837B2 (en) 2019-05-31 2024-01-02 Apple Inc. Notification of augmented reality content on an electronic device
US11797606B2 (en) 2019-05-31 2023-10-24 Apple Inc. User interfaces for a podcast browsing and playback application
US11200894B2 (en) 2019-06-12 2021-12-14 Sonos, Inc. Network microphone device with command keyword eventing
US10586540B1 (en) 2019-06-12 2020-03-10 Sonos, Inc. Network microphone device with command keyword conditioning
US11361756B2 (en) 2019-06-12 2022-06-14 Sonos, Inc. Conditional wake word eventing based on environment
US11501773B2 (en) 2019-06-12 2022-11-15 Sonos, Inc. Network microphone device with command keyword conditioning
US11854547B2 (en) 2019-06-12 2023-12-26 Sonos, Inc. Network microphone device with command keyword eventing
US11138975B2 (en) 2019-07-31 2021-10-05 Sonos, Inc. Locally distributed keyword detection
US11714600B2 (en) 2019-07-31 2023-08-01 Sonos, Inc. Noise classification for event detection
US11138969B2 (en) 2019-07-31 2021-10-05 Sonos, Inc. Locally distributed keyword detection
US11710487B2 (en) 2019-07-31 2023-07-25 Sonos, Inc. Locally distributed keyword detection
US11354092B2 (en) 2019-07-31 2022-06-07 Sonos, Inc. Noise classification for event detection
US10871943B1 (en) 2019-07-31 2020-12-22 Sonos, Inc. Noise classification for event detection
US11551669B2 (en) 2019-07-31 2023-01-10 Sonos, Inc. Locally distributed keyword detection
US11862161B2 (en) 2019-10-22 2024-01-02 Sonos, Inc. VAS toggle based on device orientation
US11189286B2 (en) 2019-10-22 2021-11-30 Sonos, Inc. VAS toggle based on device orientation
USD947137S1 (en) 2019-10-22 2022-03-29 Hubbell Incorporated Front panel of in-wall fan controller with indicator component
US11200900B2 (en) 2019-12-20 2021-12-14 Sonos, Inc. Offline voice control
US11869503B2 (en) 2019-12-20 2024-01-09 Sonos, Inc. Offline voice control
US11562740B2 (en) 2020-01-07 2023-01-24 Sonos, Inc. Voice verification for media playback
US11798545B2 (en) * 2020-01-08 2023-10-24 Beijing Xiaomi Pinecone Electronics Co., Ltd. Speech interaction method and apparatus, device and storage medium
US20210210088A1 (en) * 2020-01-08 2021-07-08 Beijing Xiaomi Pinecone Electronics Co., Ltd. Speech interaction method and apparatus, device and storage medium
US11556307B2 (en) 2020-01-31 2023-01-17 Sonos, Inc. Local voice data processing
US11308958B2 (en) 2020-02-07 2022-04-19 Sonos, Inc. Localized wakeword verification
US11962836B2 (en) 2020-03-24 2024-04-16 Apple Inc. User interfaces for a media browsing application
US11843838B2 (en) 2020-03-24 2023-12-12 Apple Inc. User interfaces for accessing episodes of a content series
US11482224B2 (en) 2020-05-20 2022-10-25 Sonos, Inc. Command keywords with input detection windowing
US11727919B2 (en) 2020-05-20 2023-08-15 Sonos, Inc. Memory allocation for keyword spotting engines
US11694689B2 (en) 2020-05-20 2023-07-04 Sonos, Inc. Input detection windowing
US11308962B2 (en) 2020-05-20 2022-04-19 Sonos, Inc. Input detection windowing
US20210375267A1 (en) * 2020-05-30 2021-12-02 Jio Platforms Limited Method and system for smart interaction in a multi voice capable device environment
US11899895B2 (en) 2020-06-21 2024-02-13 Apple Inc. User interfaces for setting up an electronic device
US11698771B2 (en) 2020-08-25 2023-07-11 Sonos, Inc. Vocal guidance engines for playback devices
US20220100464A1 (en) * 2020-09-28 2022-03-31 Samsung Electronics Co., Ltd. Methods and systems for execution of voice commands
US20220139413A1 (en) * 2020-10-30 2022-05-05 Samsung Electronics Co., Ltd. Electronic apparatus and method of controlling the same
US11258858B1 (en) 2020-11-24 2022-02-22 International Business Machines Corporation Multi-device connection management
US11720229B2 (en) 2020-12-07 2023-08-08 Apple Inc. User interfaces for browsing and presenting content
US11551700B2 (en) 2021-01-25 2023-01-10 Sonos, Inc. Systems and methods for power-efficient keyword detection
US11934640B2 (en) 2021-01-29 2024-03-19 Apple Inc. User interfaces for record labels
EP4057165A1 (en) * 2021-03-11 2022-09-14 Deutsche Telekom AG Voice assistance control
US11961519B2 (en) 2022-04-18 2024-04-16 Sonos, Inc. Localized wakeword verification
US20230359973A1 (en) * 2022-05-04 2023-11-09 Kyndryl, Inc. Ad-hoc application development
WO2024072994A1 (en) * 2022-09-30 2024-04-04 Google Llc Selecting a device to respond to device-agnostic user requests
US11961521B2 (en) 2023-03-23 2024-04-16 Roku, Inc. Media system with multiple digital assistants

Also Published As

Publication number Publication date
EP2932371B1 (en) 2018-06-13
CN105051676A (en) 2015-11-11
US20210165630A1 (en) 2021-06-03
WO2014092980A1 (en) 2014-06-19
EP2932371A4 (en) 2016-08-03
CN105051676B (en) 2018-04-24
JP2016502192A (en) 2016-01-21
US10778778B1 (en) 2020-09-15
US20230141659A1 (en) 2023-05-11
EP2932371A1 (en) 2015-10-21
US9271111B2 (en) 2016-02-23

Similar Documents

Publication Publication Date Title
US20230141659A1 (en) Response endpoint selection
EP3622510B1 (en) Intercom-style communication using multiple computing devices
US11942085B1 (en) Naming devices via voice commands
US11810562B2 (en) Reducing the need for manual start/end-pointing and trigger phrases
US10609331B1 (en) Location based device grouping with voice control
US10127906B1 (en) Naming devices via voice commands
KR102498811B1 (en) Dynamic and/or context specific hotwords to invoke automated assistants
US10185544B1 (en) Naming devices via voice commands
JP2017516167A (en) Perform actions related to an individual's presence
JP7164615B2 (en) Selecting content to render on the assistant device display
US20210264910A1 (en) User-driven content generation for virtual assistant

Legal Events

Date Code Title Description
AS Assignment

Owner name: RAWLES LLC, DELAWARE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BLANKSTEEN, SCOTT IAN;REEL/FRAME:029474/0936

Effective date: 20121214

AS Assignment

Owner name: AMAZON TECHNOLOGIES, INC., WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:RAWLES LLC;REEL/FRAME:037103/0084

Effective date: 20151106

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8