WO2013176574A1 - Methods and systems for mapping pointing device on depth map - Google Patents

Methods and systems for mapping pointing device on depth map

Info

Publication number
WO2013176574A1
Authority
WO
WIPO (PCT)
Prior art keywords
pointing device
user
handheld pointing
depth map
motion data
Prior art date
Application number
PCT/RU2013/000188
Other languages
French (fr)
Inventor
Pavel Anatolievich Zaitsev
Alexander Vyacheslavovich ARGUTIN
Andrey Vladimirovich VALIK
Dmitry Aleksandrovich MOROZOV
Original Assignee
3Divi Company
Priority date
Filing date
Publication date
Priority claimed from US13/478,457 (US20130010207A1)
Application filed by 3Divi Company
Publication of WO2013176574A1


Classifications

    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/40 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment
    • A63F13/42 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle
    • A63F13/428 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle involving motion or position input signals, e.g. signals representing the rotation of an input controller or a player's arm motions sensed by accelerometers or gyroscopes
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/90 - Constructional details or arrangements of video game devices not provided for in groups A63F13/20 or A63F13/25, e.g. housing, wiring, connections or cabinets
    • A63F13/92 - Video game devices specially adapted to be hand-held while playing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/033 - Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/0346 - Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/20 - Input arrangements for video game devices
    • A63F13/21 - Input arrangements for video game devices characterised by their sensors, purposes or types
    • A63F13/211 - Input arrangements for video game devices characterised by their sensors, purposes or types using inertial sensors, e.g. accelerometers or gyroscopes
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/20 - Input arrangements for video game devices
    • A63F13/21 - Input arrangements for video game devices characterised by their sensors, purposes or types
    • A63F13/212 - Input arrangements for video game devices characterised by their sensors, purposes or types using sensors worn by the player, e.g. for measuring heart beat or leg activity
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/10 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
    • A63F2300/105 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals using inertial sensors, e.g. accelerometers, gyroscopes
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/10 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
    • A63F2300/1087 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera
    • A63F2300/1093 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera using visible light
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 - Methods for processing data by generating or executing the game program
    • A63F2300/6045 - Methods for processing data by generating or executing the game program for mapping control signals received from the input arrangement into game commands
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 - Methods for processing data by generating or executing the game program
    • A63F2300/66 - Methods for processing data by generating or executing the game program for rendering three dimensional images
    • A63F2300/6607 - Methods for processing data by generating or executing the game program for rendering three dimensional images for animating game characters, e.g. skeleton kinematics
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G - ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2370/00 - Aspects of data communication
    • G09G2370/10 - Use of a protocol of communication by packets in interfaces along the display data pipeline

Definitions

  • This disclosure relates generally to human-computer interfaces and, more particularly, to the technology for determining a location of a handheld pointing device, such as a remote controller for a game console, on a depth map generated by a gesture recognition control system.
  • gesture recognition technology which enables the users to interact with the computer naturally, using body language, without any mechanical devices.
  • the users can make inputs or generate commands using gestures or motions made by hands, arms, fingers, legs, and so forth. For example, using the concept of gesture recognition, it is possible to point a finger at the computer screen so that the pointer will move accordingly.
  • gesture recognition control systems also known as motion sensing input systems
  • a depth sensing camera which captures scene images in real time
  • a computing unit which interprets captured scene images so as to generate various commands based on identification of user gestures.
  • the gesture recognition control systems have very limited computation resources and also small resolution of the depth sensing camera so it is difficult to identify and track motions of relatively small objects such as handheld pointing devices.
  • the pointing devices may play an important role for human- computer interaction, especially, for gaming software applications.
  • the pointing devices may refer to controller wands or remote control devices enabling the users to generate specific commands by pressing dedicated buttons arranged thereon or by making predetermined gestures.
  • the computer can be controlled via the gesture recognition technology (i.e., by processing data related to motion and location of the handheld pointing devices) and also receipt of specific commands originated by pressing dedicated buttons.
  • the gesture recognition control systems when enabled, monitor and track all gestures performed by users with the help of handheld pointing devices.
  • a high resolution depth sensing camera and immoderate computational resources are used to enable the gesture recognition control systems to identify and track a motion of a relatively small handheld pointing device.
  • the present day gesture recognition control systems lack sufficient accuracy or generate unwanted latency when there are tracked gestures performed by relatively small handheld pointing devices.
  • the handheld pointing devices may include specific auxiliary devices, such as a lighting sphere, to facilitate their identification and tracking. Either one of these approaches is disadvantageous and increases costs of the gesture recognition control systems.
  • the present disclosure refers to gesture recognition control systems configured to identify various user gestures and generate corresponding control commands. More specifically, the technology disclosed herein can determine and track a current location of a handheld pointing device based upon comparison of user gestures captured by a depth sensing camera and motion data of a handheld pointing device acquired by a communication module. The present technology allows determining a current position of a handheld pointing device on a depth map using typical computational resources and without a necessity to use dedicated auxiliary devices such as a lighting sphere.
  • the gesture recognition control system includes a depth sensing camera, which is used for generation of a depth map, and also a computing unit configured to process the depth map in real time to identify a user, user gestures, one or more user body parts, a user skeleton, motion data associated with user gestures, orientation data associated with user gestures, generate one or more commands associated with the identified gestures, and so forth.
  • the gesture recognition control system further includes a communication module which may receive motion data of a handheld pointing device and optionally orientation data of the handheld pointing device.
  • the gesture recognition control system assigns a current location (coordinates) of the user hand to the handheld pointing device so that its exact location is determined and can be further tracked.
  • the gesture recognition control system may be operatively coupled to or integrated with a computer, display, game console, and so forth. Accordingly, the determined and tracked location of the handheld pointing device may be used to control a display screen, game, or any other software application running on the computer.
  • the present disclosure discloses various methods for determining and tracking a current location of handheld pointing device in real time and also corresponding systems that can be used to implement these methods.
  • a simplified summary of one or more aspects regarding these methods in order to provide a basic understanding of such aspects as a prelude to the more detailed description that is presented later.
  • An example method may comprise: determining one or more motions of one or more user hands on a depth map, generating motion data associated with the one or more motions of the one or more user hands, acquiring motion data of a handheld pointing device, determining that the motion of the handheld pointing device is associated with the one or more motions of the one or more user hands, and determining a current position of the handheld pointing device on the depth map.
  • the method may further comprise generating the depth map by capturing a series of images.
  • the method may further comprise generating a virtual skeleton of the user.
  • the virtual skeleton may comprise at least one virtual limb of the user.
  • the method may further comprise determining coordinates of the one or more user hands, wherein the coordinates are associated with the virtual skeleton, and generating motion data of the one or more user hands.
  • the determination that the motion of the handheld pointing device is associated with the one or more motions of the one or more user hands can comprise comparing motion data of the one or more user hands and motion data of the handheld pointing device.
  • the method may further comprise determining which hand is holding the handheld pointing device.
  • the method may further comprise selectively assigning the coordinates of the hand holding the handheld pointing device to the handheld pointing device.
  • the method may further comprise determining an orientation of the handheld pointing device based upon the coordinates of various virtual skeleton joints related to the hand holding the handheld pointing device.
  • the method may further comprise generating a vector associated with the orientation of the handheld pointing device.
  • the handheld pointing device can be selected from a group comprising: a cellular phone, a smart phone, a remote controller, a video game console, a handheld game console, a computer, and a tablet computer.
  • the motion data associated with a motion of the handheld pointing device can comprise one or more of acceleration data, velocity data, and inertial data.
  • the method may further comprise acquiring orientation data of the handheld pointing device.
  • the orientation data can be generated by one or more orientation sensors of the handheld pointing device.
  • the orientation data may comprise one or more of the following: a pitch angle, a roll angle, and a yaw angle.
  • the method may further comprise determining that the handheld pointing device is in active use by the user.
  • the handheld pointing device is in active use by the user when the handheld pointing device is held and moved by the user and when the user is identified on the depth map.
  • the method may further comprise identifying the user on the depth map.
  • the method may further comprise tracking motions of the one or more user hands.
  • FIG. 1 shows an example system environment for providing a real time human-computer interface.
  • FIG. 2 is a general illustration of a scene suitable for controlling an electronic device by recognition of gestures made by a user.
  • FIG. 3A shows a simplified view of an exemplary virtual skeleton associated with a user.
  • FIG. 3B shows a simplified view of an exemplary virtual skeleton associated with a user holding a handheld pointing device.
  • FIG. 4 shows an environment suitable for implementing methods for determining a position of a handheld pointing device.
  • FIG. 5 shows a simplified diagram of a handheld pointing device, according to an example embodiment.
  • FIG. 6 is a process flow diagram showing a method for determining a position of the handheld pointing device, according to an example embodiment.
  • FIG. 7 is a diagrammatic representation of an example machine in the form of a computer system within which a set of instructions for the machine to perform any one or more of the methodologies discussed herein is executed.
  • the techniques of the embodiments disclosed herein may be implemented using a variety of technologies.
  • the methods described herein may be implemented in software executing on a computer system or in hardware utilizing either a combination of microprocessors or other specially designed application-specific integrated circuits (ASICs), programmable logic devices, or various combinations thereof.
  • the methods described herein may be implemented by a series of computer-executable instructions residing on a storage medium such as a disk drive, or computer-readable medium.
  • the embodiments described herein relate to computer- implemented methods for determining and tracking a current location of a handheld pointing device.
  • one or more depth sensing cameras can be used to generate a depth map of a physical scene.
  • the depth map analysis and interpretation can be performed by a computing unit operatively coupled to or embedding the depth sensing camera.
  • Some examples of computing units may include a desktop computer, laptop computer, tablet computer, gaming console, audio system, video system, cellular phone, smart phone, personal digital assistant (PDA), set-top box (STB), television set, smart television system, or any other wired or wireless electronic device.
  • the computing unit may include or be operatively coupled to a communication unit which may communicate with various handheld pointing devices and, in particular, receive motion data of handheld pointing devices.
  • handheld pointing device refers to an input device or any other suitable remote controlling device which can be used for making an input.
  • Some examples of handheld pointing devices include a remote controller, cellular phone, smart phone, video game console, handheld game console, computer (e.g., a tablet computer), and so forth.
  • it may include various motion detectors, such as acceleration sensors, gyroscopes, or other detectors configured to measure velocity, momentum, and acceleration such as pitch, roll, and yaw (in other words, acceleration for X, Y, and Z movement in Cartesian axes), and/or orientation sensors to generate orientation data including pitch angles, roll angles, and yaw angles.
  • the handheld pointing device determines motion data, which includes velocities and/or acceleration levels, and transmits it to the computing unit over a wired or wireless network.
  • the computing unit interprets the depth map such that it may identify the user, generate a corresponding virtual skeleton of the user, which skeleton includes multiple "joints" and "bones," and determine that the user made a gesture using his hands or arms.
  • the coordinates of every joint can be determined by the computing unit, and thus every user hand/arm motion can be tracked, and corresponding motion data can be generated, which may include a velocity, acceleration, orientation, and so forth.
  • the computing unit compares motion data associated with the user's hand/arm gesture and motion data (and optionally orientation data) associated with movement of the handheld pointing device. When both of these motion data coincide or correspond to each other, the computing unit determines that the handheld pointing device is held by a corresponding arm or hand of the user. Since coordinates of the user's arm/hand are known and tracked, the same coordinates are then assigned to the handheld pointing device. Therefore, the handheld pointing device can be tied to the virtual skeleton of the user so that the current location of the handheld pointing device can be determined and further monitored. In other words, the handheld pointing device is mapped on the depth map.
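  • As an illustration of the comparison step just described, the minimal sketch below tests whether a tracked hand and the handheld pointing device are moving together. It is a sketch only, not the disclosed implementation: comparing acceleration magnitudes, using normalized cross-correlation at zero lag, and the 0.8 threshold are assumptions introduced here for clarity.

```python
# Hedged sketch: decide whether a hand tracked on the depth map and the handheld
# pointing device report corresponding motion. Inputs are assumed to be sampled
# over the same time window at the same rate.
import numpy as np

def motions_correspond(hand_accel, device_accel, threshold=0.8):
    """hand_accel, device_accel: arrays of shape (N, 3) of acceleration samples."""
    a = np.linalg.norm(np.asarray(hand_accel, dtype=float), axis=1)
    b = np.linalg.norm(np.asarray(device_accel, dtype=float), axis=1)
    a = (a - a.mean()) / (a.std() + 1e-9)   # zero mean, unit variance
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.mean(a * b)) >= threshold   # correlation at zero lag
```

  • Correlating magnitudes rather than raw vectors avoids having to align the camera and device coordinate frames before the comparison; when such a test passes, the hand's coordinates can be assigned to the device as described above.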
  • movements of the handheld pointing device may be further tracked in real time to identify particular user gestures causing the computing unit to generate corresponding control commands.
  • This approach can be used in various gaming and simulation/teaching software without a necessity to use immoderate computational resources, high resolution depth sensing cameras, or auxiliary devices (e.g., a lighting sphere) attached to the handheld pointing device to facilitate its identification on the depth map.
  • FIG. 1 shows an example system environment 100 for providing a real time human-computer interface.
  • the system environment 100 includes a gesture recognition control system 110, a display device 120, and an entertainment system 130.
  • the gesture recognition control system 110 is configured to capture various user gestures and user inputs, interpret them, and generate corresponding control commands, which are further transmitted to the entertainment system 130. Once the entertainment system 130 receives commands generated by the gesture recognition control system 110, the entertainment system performs certain actions depending on which software application is running. For example, the user may control a pointer on the display screen by making certain gestures.
  • the entertainment system 130 may refer to any electronic device such as a computer (e.g., a laptop computer, desktop computer, tablet computer, workstation, server), game console, television (TV) set, TV adapter, STB, smart television system, audio system, video system, cellular phone, smart phone, PDA, and so forth.
  • FIG. 2 is a general illustration of a scene 200 suitable for controlling an electronic device by recognition of gestures made by a user.
  • this figure shows a user 210 interacting with the gesture recognition control system 110 with the help of a handheld pointing device 220.
  • the gesture recognition control system 110 may include a depth sensing camera, a computing unit, and a communication unit, which can be stand-alone devices or embedded within a single housing (as shown).
  • the user and a corresponding environment, such as a living room, are located, at least in part, within the field of view of the depth sensing camera.
  • the gesture recognition control system 110 may be configured to capture a depth map of the scene in real time and further process the depth map to identify the user, determine one or more user gestures, determine one or more user body parts, and generate corresponding control commands.
  • the gesture recognition control system 110 may also determine specific motion data associated with user gestures, wherein the motion data may include coordinates of the user's hands or arms, and velocity and acceleration of the user's hands/arms.
  • the gesture recognition control system 110 may generate a virtual skeleton of the user as shown in FIG. 3 and described below in greater detail.
  • the handheld pointing device 220 may refer to a controller wand, remote control device (e.g., a gaming console remote controller), smart phone, cellular phone, PDA, tablet computer, or any other electronic device enabling the user 210 to generate specific commands by pressing dedicated buttons arranged thereon.
  • the handheld pointing device 220 is configured to determine its velocity, acceleration and/or orientation within the space with the help of embedded acceleration sensors, gyroscopes, or other motion sensors and/or orientation sensors.
  • the velocity, acceleration and/or orientation data can be transmitted to the gesture recognition control system 110 over a wireless or wired network.
  • a communication module which is configured to receive motion data (and optionally orientation data) associated with movements of handheld pointing device 220, may be embedded in the gesture recognition control system 110.
  • the gesture recognition control system 110 is also configured to determine the location of handheld pointing device 220 on the depth map by matching motion data associated with the gestures of one or more user's arms captured by the depth sensing camera and motion data (and optionally the orientation data) associated with movements of handheld pointing device 220 as received by the communication module. When the motions match each other, the gesture recognition control system 110 acknowledges that the handheld pointing device 220 is held in a particular hand of the user and then assigns coordinates of the user's hand to the handheld pointing device 220. In various embodiments, this technology can be used for determining that the handheld pointing device 220 is in "active use," which means that the handheld pointing device 220 is held by the user 210 who is located in the sensitive area of the depth sensing camera.
  • FIG. 3A shows a simplified view of an exemplary virtual skeleton 300 as can be generated by the gesture recognition control system 110 based upon the depth map.
  • the virtual skeleton 300 comprises a plurality of "bones" and "joints" 310 interconnecting the bones.
  • the bones and joints, in combination, represent the user 210 in real time so that every motion of the user's limbs is represented by corresponding motions of the bones and joints.
  • each of the joints 310 may be associated with certain coordinates in a three-dimensional (3D) space defining its exact location.
  • any motion of the user's limbs such as an arm, may be interpreted by a plurality of coordinates or coordinate vectors related to the corresponding joint(s) 310.
  • motion data can be generated for every limb movement. This motion data may include exact coordinates per period of time, velocity, direction, acceleration, orientation, and so forth.
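  • As a minimal illustration (an assumption for clarity; the disclosure does not spell out a particular procedure), the sketch below derives such motion data by finite differences from a joint's 3D coordinates tracked over successive depth-map frames; the 30 fps frame rate is an assumed default.

```python
# Hedged sketch: estimate a joint's velocity and acceleration from per-frame
# 3D positions taken from a sequence of depth maps.
import numpy as np

def joint_motion_data(positions, frame_rate=30.0):
    """positions: array of shape (N, 3), one (x, y, z) sample per depth-map frame."""
    p = np.asarray(positions, dtype=float)
    dt = 1.0 / frame_rate
    velocity = np.diff(p, axis=0) / dt              # shape (N-1, 3)
    acceleration = np.diff(velocity, axis=0) / dt   # shape (N-2, 3)
    return velocity, acceleration
```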
  • FIG. 3B shows a simplified view of exemplary virtual skeleton 300 associated with the user 210 holding the handheld pointing device 220.
  • once the gesture recognition control system 110 determines that the user 210 holds the handheld pointing device 220 and then determines the location (coordinates) of the handheld pointing device 220, a corresponding mark or label can be generated on the virtual skeleton 300.
  • the gesture recognition control system 110 can determine an orientation of the handheld pointing device 220 by analyzing the virtual skeleton 300 and/or by acquiring orientation data from the handheld pointing device 220.
  • the orientation of handheld pointing device 220 may be represented as a vector 320 as shown in FIG. 3B.
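  • One plausible way to estimate that vector from the virtual skeleton alone (an assumption made here; the disclosure does not fix a formula) is to use the forearm direction of the hand holding the device, i.e. a unit vector from the elbow joint to the wrist joint, as sketched below.

```python
# Hedged sketch: approximate the device's pointing direction (vector 320) from
# two virtual-skeleton joints; elbow_xyz and wrist_xyz are hypothetical inputs.
import numpy as np

def orientation_from_skeleton(elbow_xyz, wrist_xyz):
    """Return a unit vector pointing from the elbow joint towards the wrist joint."""
    v = np.asarray(wrist_xyz, dtype=float) - np.asarray(elbow_xyz, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("elbow and wrist coincide; orientation is undefined")
    return v / norm
```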
  • FIG. 4 shows an environment 400 suitable for implementing methods for determining a position of a handheld pointing device 220.
  • the gesture recognition control system 110 may comprise at least one depth sensing camera 410 configured to capture a depth map.
  • depth map refers to an image or image channel that contains information relating to the distance of the surfaces of scene objects from a depth sensing camera.
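  • For readers unfamiliar with depth maps, the sketch below back-projects a single depth-map pixel to a 3D point using a pinhole camera model; the intrinsic parameters fx, fy, cx, and cy are placeholders, since the disclosure does not specify the camera.

```python
# Hedged sketch: convert depth-map pixel (u, v) with range `depth` (e.g., metres)
# into a 3D point in the camera frame. The intrinsics are assumed values.
def depth_pixel_to_point(u, v, depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth
```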
  • the depth sensing camera 410 may include an infrared (IR) projector to generate modulated light, and also an IR camera to capture 3D images.
  • the gesture recognition control system 110 may optionally comprise a color video camera 420 to capture a series of 2D images in addition to 3D imagery created by the depth sensing camera 410.
  • the series of 2D images captured by the color video camera 420 may be used to facilitate identification of the user on the depth map and/or various gestures of the user.
  • the depth sensing camera 410 and the color video camera 420 can be either stand-alone devices or be encased within a single housing.
  • the gesture recognition control system 110 may also comprise a computing unit 430 for processing depth data and generating control commands for one or more electronic devices 460 (e.g., the entertainment system 130).
  • the computing unit 430 is also configured to implement steps of methods for determining a position of the handheld pointing device 220 as described herein.
  • the gesture recognition control system 110 also includes a communication module 440 configured to communicate with the handheld pointing device 220 and one or more electronic devices 460. More specifically, the communication module 440 is configured to receive motion data and orientation data from the handheld pointing device 220 and transmit control commands to one or more electronic devices 460.
  • the gesture recognition control system 110 may also include a bus 450 interconnecting the depth sensing camera 410, color video camera 420, computing unit 430, and communication module 440.
  • the aforementioned one or more electronic devices 460 can refer, in general, to any electronic device configured to trigger one or more predefined actions upon receipt of a certain control command.
  • Some examples of electronic devices 460 include, but are not limited to, computers (e.g., laptop computers, tablet computers), displays, audio systems, video systems, gaming consoles, entertainment systems, lighting devices, cellular phones, smart phones, TVs, and so forth.
  • the communication between the communication module 440 and the handheld pointing device 220 and/or one or more electronic devices 460 can be performed via a network (not shown).
  • the network can be a wireless or wired network, or a combination thereof.
  • the network may include the Internet, local intranet, PAN (Personal Area Network), LAN (Local Area Network), WAN (Wide Area Network), MAN (Metropolitan Area Network), virtual private network (VPN), storage area network (SAN), frame relay connection, Advanced Intelligent Network (AIN) connection, synchronous optical network (SONET) connection, digital T1, T3, E1 or E3 line, Digital Data Service (DDS) connection, DSL (Digital Subscriber Line) connection, Ethernet connection, ISDN (Integrated Services Digital Network) line, dial-up port such as a V.90, V.34 or V.34bis analog modem connection, cable modem, ATM (Asynchronous Transfer Mode) connection, or an FDDI (Fiber Distributed Data Interface) or CDDI (Copper Distributed Data Interface) connection.
  • communications may also include links to any of a variety of wireless networks including WAP (Wireless Application Protocol), GPRS (General Packet Radio Service), GSM (Global System for Mobile Communication), CDMA (Code Division Multiple Access) or TDMA (Time Division Multiple Access), cellular phone networks, Global Positioning System (GPS), CDPD (cellular digital packet data), RIM (Research in Motion, Limited) duplex paging network, Bluetooth radio, or an IEEE 802.11-based radio frequency network.
  • the network can further include or interface with any one or more of the following: RS-232 serial connection, IEEE-1394 (FireWire) connection, Fiber Channel connection, IrDA (infrared) port, SCSI (Small Computer Systems Interface) connection, USB (Universal Serial Bus) connection, or other wired or wireless, digital or analog interface or connection, mesh or Digi® networking.
  • FIG. 5 shows a simplified diagram of the handheld pointing device 220, according to an example embodiment.
  • the handheld pointing device 220 comprises one or more motion sensors 510, one or more orientation sensors 520 and also a communication module 530.
  • the handheld pointing device 220 may include additional modules (not shown), such as an input module, a computing module, a display, or any other modules, depending on the type of the handheld pointing device 220.
  • the motion sensors 510 and orientation sensors 520 may include gyroscopes, acceleration sensors, velocity sensors, and so forth.
  • the motion sensors 510 are configured to determine motion data which may include a velocity, momentum, and acceleration such as pitch, roll, and yaw (in other words, acceleration for X, Y, and Z movement in Cartesian axes) of the handheld pointing device 220.
  • the orientation sensors 520 may determine a relative orientation of the handheld pointing device 220.
  • the orientation sensors 520 may be configured to generate orientation data including one or more of the following: pitch angle, roll angle, and yaw angle related to the handheld pointing device 220.
  • motion data and optionally orientation data are then transmitted to the gesture recognition control system 110 with the help of the communication module 530.
  • the motion data and orientation data can be transmitted via the network as described above.
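  • Purely as an illustration of the kind of payload the handheld pointing device 220 might transmit (the disclosure does not define a wire format; the field names and the JSON encoding below are assumptions), a single motion/orientation sample could be packaged as follows.

```python
# Hedged sketch of one motion/orientation sample as the device might report it
# to the communication module; the encoding is an illustrative choice.
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class PointerSample:
    timestamp: float          # seconds since the epoch
    accel: tuple              # (ax, ay, az) from the acceleration sensor
    velocity: tuple           # (vx, vy, vz), if the device integrates velocity
    pitch: float              # orientation angles, e.g. in degrees
    roll: float
    yaw: float

    def to_bytes(self) -> bytes:
        return json.dumps(asdict(self)).encode("utf-8")

# Example: one sample ready to be sent over the network.
sample = PointerSample(time.time(), (0.1, -0.2, 9.8), (0.0, 0.0, 0.0), 5.0, -2.0, 90.0)
payload = sample.to_bytes()
```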
  • FIG. 6 is a process flow diagram showing a method 600 for determining a position of the handheld pointing device 220, according to an example embodiment.
  • the method 600 may be performed by processing logic that may comprise hardware (e.g., dedicated logic, programmable logic, and microcode), software (such as software run on a general-purpose computer system or a dedicated machine), or a combination of both.
  • the processing logic resides at the gesture recognition control system 110.
  • the method 600 can be performed by the units/devices discussed above with reference to FIG. 4. Each of these units or devices can comprise processing logic. It will be appreciated by one of ordinary skill in the art that examples of the foregoing units/devices may be virtual, and instructions said to be executed by a unit/device may in fact be retrieved and executed by a processor. The foregoing units/devices may also include memory cards, servers, and/or computer discs. Although various modules may be configured to perform some or all of the various steps described herein, fewer or more units may be provided and still fall within the scope of example embodiments.
  • the method 600 may commence at operation 610, with the depth sensing camera 410 generating a depth map by capturing a plurality of depth values of the scene in real time.
  • the depth map can be analyzed by the computing unit 430 to identify the user 210 on the depth map.
  • the computing unit 430 segments the depth data of the user 210 so as to generate a virtual skeleton of the user 210.
  • the computing unit 430 determines coordinates of at least one user's hand (user's arm or user's limb).
  • the coordinates of the at least one user's hand can be associated with the virtual skeleton as discussed above.
  • the computing unit 430 determines a motion of the at least one user's hand by processing a plurality of depth maps.
  • the computing unit 430 generates motion data of the at least one user's hand.
  • the computing unit 430 acquires motion data and optionally orientation data of the handheld electronic device 220 via the communication module 440.
  • the computing unit 430 compares the motion data (and optionally orientation data) of handheld electronic device 220 as acquired at operation 670 and the motion data of the at least one user's hand as generated at operation 660. If the motion data of handheld electronic device 220 correspond (or match or are relatively similar) to the motion data of the user's hand, the computing unit 430 selectively assigns the coordinates of the user's hand to the handheld pointing device 220 at operation 690.
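  • The sketch below puts operations 680 and 690 together: it scores each tracked hand against the device's motion data with a caller-supplied similarity function (for example, the correlation test sketched earlier), assigns the best-matching hand's coordinates to the device, and reports whether the device is in active use. The `hands` layout and the 0.8 threshold are assumptions made for illustration.

```python
# Hedged sketch of operations 680-690. `similarity` is any callable returning a
# score in [-1, 1] for a (hand_accel, device_accel) pair.
def assign_device_position(hands, device_accel, similarity, threshold=0.8):
    """hands: e.g. {"left": {"accel": ..., "coords": (x, y, z)}, "right": {...}}

    Returns (hand_label, coords, active_use); coords stay unknown and
    active_use is False when no hand's motion corresponds to the device's.
    """
    scores = {label: similarity(h["accel"], device_accel) for label, h in hands.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, None, False
    # Operation 690: the device inherits the coordinates of the hand holding it.
    return best, hands[best]["coords"], True
```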
  • the location of handheld pointing device 220 is determined on the depth map. Further, the location of handheld pointing device 220 can be tracked in real time so that various gestures can be interpreted for generation of corresponding control commands for one or more electronic devices 460.
  • the described technology can be used for determining that the handheld pointing device 220 is in active use by the user 210.
  • active use means that the user 210 is identified on the depth map (see operation 620) or, in other words, is located within the viewing area of depth sensing camera 410 when the handheld pointing device 220 is moved.
  • the method 600 may further include operations (not shown) in which the computing unit 430 generates a vector defining the current orientation of the handheld pointing device 220.
  • the orientation of handheld pointing device 220 may be represented as the vector 320 (see FIG. 3B).
  • the computing unit 430 generates the vector 320 by processing the orientation data of the handheld electronic device 220 acquired at operation 670, transforming the orientation data tied to the axis system of the handheld electronic device 220 into orientation data tied to the axis system of the gesture recognition control system 110.
  • the vector coordinates are calculated for the axis system associated with the gesture recognition control system 110 based upon the vector coordinates in the axis system of the handheld electronic device 220.
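  • A minimal sketch of that transformation is given below, assuming the device reports pitch, roll, and yaw in radians and that a calibration rotation from the device's axis system to the control system's axis system is available. The Z-Y-X Euler convention, the assumed forward axis of the device, and the parameter R_device_to_system are choices made here, not details given in the disclosure.

```python
# Hedged sketch: express the device's pointing direction (vector 320) in the
# gesture recognition control system's axes. R_device_to_system is an assumed
# calibration input; the disclosure does not describe how it would be obtained.
import numpy as np

def euler_to_matrix(pitch, roll, yaw):
    """Rotation matrix for Z-Y-X angles (yaw, then pitch, then roll), in radians."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return rz @ ry @ rx

def pointing_vector(pitch, roll, yaw, R_device_to_system=np.eye(3)):
    forward_in_device = np.array([0.0, 0.0, 1.0])   # assumed forward axis of the device
    v_device = euler_to_matrix(pitch, roll, yaw) @ forward_in_device
    return R_device_to_system @ v_device
```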
  • FIG. 7 shows a diagrammatic representation of a computing device for a machine in the example electronic form of a computer system 700, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
  • the machine operates as a standalone device, or can be connected (e.g., networked) to other machines.
  • the machine can operate in the capacity of a server, a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
  • the machine can be a personal computer (PC), tablet PC, STB, PDA, cellular telephone, portable music player (e.g., a portable hard drive audio device, such as a Moving Picture Experts Group Audio Layer 3 (MP3) player), web appliance, network router, switch, bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
  • the example computer system 700 includes a processor or multiple processors 702 (e.g., a central processing unit (CPU), graphics processing unit (GPU), or both), main memory 704 and static memory 706, which communicate with each other via a bus 708.
  • the computer system 700 can further include a video display unit 710 (e.g., a liquid crystal display (LCD) or cathode ray tube (CRT)).
  • the computer system 700 also includes at least one input device 712, such as an alphanumeric input device (e.g., a keyboard), pointer control device (e.g., a mouse), microphone, digital camera, video camera, and so forth.
  • the computer system 700 also includes a disk drive unit 714, signal generation device 716 (e.g., a speaker), and network interface device 718.
  • the disk drive unit 714 includes a computer-readable medium 720 that stores one or more sets of instructions and data structures (e.g., instructions 722) embodying or utilized by any one or more of the methodologies or functions described herein.
  • the instructions 722 can also reside, completely or at least partially, within the main memory 704 and/or within the processors 702 during execution by the computer system 700.
  • the main memory 704 and the processors 702 also constitute machine-readable media.
  • the instructions 722 can further be transmitted or received over the network 724 via the network interface device 718 utilizing any one of a number of well-known transfer protocols (e.g., Hyper Text Transfer Protocol (HTTP), CAN, Serial, and Modbus).
  • While the computer-readable medium 720 is shown in an example embodiment to be a single medium, the term "computer-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
  • the term "computer-readable medium" shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine, and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions.
  • the term "computer-readable medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media. Such media may also include, without limitation, hard disks, floppy disks, flash memory cards, digital video disks, random access memory (RAM), read only memory (ROM), and the like.
  • the example embodiments described herein may be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware.
  • the computer-executable instructions may be written in a computer programming language or may be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions may be executed on a variety of hardware platforms and for interfaces associated with a variety of operating systems.
  • computer software programs for implementing the present method may be written in any number of suitable programming languages such as, for example, C, C++, C#, Cobol, Eiffel, Haskell, Visual Basic, Java, JavaScript, Python, or other compilers, assemblers, interpreters, or other computer languages or platforms.

Abstract

Disclosed are methods for determining and tracking a current location of a handheld pointing device, such as a remote control for an entertainment system, on a depth map generated by a gesture recognition control system. The methods disclosed herein enable identifying a user's hand gesture and generating corresponding motion data. Further, the handheld pointing device may send motion data, such as acceleration or velocity, and/or orientation data such as pitch, roll, and yaw angles. The motion data of the user's hand gesture and the motion data (orientation data) received from the handheld pointing device are then compared, and if they correspond to each other, it is determined that the handheld pointing device is in active use by the user as it is held by a particular hand. Accordingly, a location of the handheld pointing device on the depth map can be determined.

Description

METHODS AND SYSTEMS FOR MAPPING POINTING DEVICE
ON DEPTH MAP
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a Continuation-in-Part of Russian Patent Application Serial No. 2011127116, filed on July 4, 2011, which is incorporated herein by reference in its entirety for all purposes.
TECHNICAL FIELD
This disclosure relates generally to human-computer interfaces and, more particularly, to the technology for determining a location of a handheld pointing device, such as a remote controller for a game console, on a depth map generated by a gesture recognition control system.
BACKGROUND
The approaches described in this section could be pursued, but are not necessarily approaches that have previously been conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Technologies associated with human-computer interaction have evolved over the last several decades. There are currently many various input devices and associated interfaces to enable computer users to control and provide data to their computers. Keyboards, pointing devices, joysticks, and touchscreens are just some examples of input devices that can be used to interact with various software products. One of the rapidly growing technologies in this field is the gesture recognition technology which enables the users to interact with the computer naturally, using body language, without any mechanical devices. In particular, the users can make inputs or generate commands using gestures or motions made by hands, arms, fingers, legs, and so forth. For example, using the concept of gesture recognition, it is possible to point a finger at the computer screen so that the pointer will move accordingly.
There currently exist various gesture recognition control systems (also known as motion sensing input systems) which, generally speaking, include a depth sensing camera, which captures scene images in real time, and a computing unit, which interprets captured scene images so as to generate various commands based on identification of user gestures. Typically, the gesture recognition control systems have very limited computation resources and also small resolution of the depth sensing camera so it is difficult to identify and track motions of relatively small objects such as handheld pointing devices.
Various handheld pointing devices may play an important role for human- computer interaction, especially, for gaming software applications. The pointing devices may refer to controller wands or remote control devices enabling the users to generate specific commands by pressing dedicated buttons arranged thereon or by making predetermined gestures. Accordingly, the computer can be controlled via the gesture recognition technology (i.e., by processing data related to motion and location of the handheld pointing devices) and also receipt of specific commands originated by pressing dedicated buttons.
Typically, the gesture recognition control systems, when enabled, monitor and track all gestures performed by users with the help of handheld pointing devices. However, to enable the gesture recognition control systems to identify and track a motion of a relatively small handheld pointing device, a high resolution depth sensing camera and immoderate computational resources are used. Moreover, the present day gesture recognition control systems lack sufficient accuracy or generate unwanted latency when there are tracked gestures performed by relatively small handheld pointing devices. Alternatively, the handheld pointing devices may include specific auxiliary devices, such as a lighting sphere, to facilitate their identification and tracking. Either one of these approaches is disadvantageous and increases costs of the gesture recognition control systems. In view of the foregoing, there is still a need for improvements of gesture recognition control systems that will enhance interaction effectiveness and reduce required computational resources.
SUMMARY
This summary is provided to introduce a selection of concepts in a simplified form that are further described in the Detailed Description below. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The present disclosure refers to gesture recognition control systems configured to identify various user gestures and generate corresponding control commands. More specifically, the technology disclosed herein can determine and track a current location of a handheld pointing device based upon comparison of user gestures captured by a depth sensing camera and motion data of a handheld pointing device acquired by a communication module. The present technology allows determining a current position of a handheld pointing device on a depth map using typical computational resources and without a necessity to use dedicated auxiliary devices such as a lighting sphere.
The gesture recognition control system includes a depth sensing camera, which is used for generation of a depth map, and also a computing unit configured to process the depth map in real time to identify a user, user gestures, one or more user body parts, a user skeleton, motion data associated with user gestures, orientation data associated with user gestures, generate one or more commands associated with the identified gestures, and so forth. The gesture recognition control system further includes a communication module which may receive motion data of a handheld pointing device and optionally orientation data of the handheld pointing device. Once motion data (and optionally the orientation data) of the handheld pointing device and motion data associated with a user gesture, such as a gesture of a user's arm, correspond to each other, the gesture recognition control system assigns a current location (coordinates) of the user hand to the handheld pointing device so that its exact location is determined and can be further tracked.
The gesture recognition control system may be operatively coupled to or integrated with a computer, display, game console, and so forth. Accordingly, the determined and tracked location of the handheld pointing device may be used to control a display screen, game, or any other software application running on the computer.
Thus, the present disclosure discloses various methods for determining and tracking a current location of handheld pointing device in real time and also corresponding systems that can be used to implement these methods. Below is provided a simplified summary of one or more aspects regarding these methods in order to provide a basic understanding of such aspects as a prelude to the more detailed description that is presented later.
According to an aspect, there is provided a method for determining a position of a handheld pointing device. An example method may comprise: determining one or more motions of one or more user hands on a depth map, generating motion data associated with the one or more motions of the one or more user hands, acquiring motion data of a handheld pointing device, determining that the motion of the handheld pointing device is associated with the one or more motions of the one or more user hands, and determining a current position of the handheld pointing device on the depth map.
According to various embodiments, the method may further comprise generating the depth map by capturing a series of images. The method may further comprise generating a virtual skeleton of the user. The virtual skeleton may comprise at least one virtual limb of the user. The method may further comprise determining coordinates of the one or more user hands, wherein the coordinates are associated with the virtual skeleton, and generating motion data of the one or more user hands. The determination that the motion of the handheld pointing device is associated with the one or more motions of the one or more user hands can comprise comparing motion data of the one or more user hands and motion data of the handheld pointing device.
According to further embodiments, the method may further comprise determining which hand is holding the handheld pointing device. The method may further comprise selectively assigning the coordinates of the hand holding the handheld pointing device to the handheld pointing device. The method may further comprise determining an orientation of the handheld pointing device based upon the coordinates of various virtual skeleton joints related to the hand holding the handheld pointing device.
According to further embodiments, the method may further comprise generating a vector associated with the orientation of the handheld pointing device. The handheld pointing device can be selected from a group comprising: a cellular phone, a smart phone, a remote controller, a video game console, a handheld game console, a computer, and a tablet computer. The motion data associated with a motion of the handheld pointing device can comprise one or more of acceleration data, velocity data, and inertial data.
According to further embodiments, the method may further comprise acquiring orientation data of the handheld pointing device. The orientation data can be generated by one or more orientation sensors of the handheld pointing device. The orientation data may comprise one or more of the following: a pitch angle, a roll angle, and a yaw angle.
According to further embodiments, the method may further comprise determining that the handheld pointing device is in active use by the user. The handheld pointing device is in active use by the user when the handheld pointing device is held and moved by the user and when the user is identified on the depth map. According to yet another embodiment, the method may further comprise identifying the user on the depth map. The method may further comprise tracking motions of the one or more user hands.
In further examples, the above method steps are stored on a nontransitory machine-readable medium comprising instructions, which perform the steps when implemented by one or more processors. In yet further examples, subsystems or devices can be adapted to perform the recited steps. Other features, examples, and embodiments are described below.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments are illustrated by way of example, and not by limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
FIG. 1 shows an example system environment for providing a real time human-computer interface.
FIG. 2 is a general illustration of a scene suitable for controlling an electronic device by recognition of gestures made by a user.
FIG. 3A shows a simplified view of an exemplary virtual skeleton associated with a user.
FIG. 3B shows a simplified view of an exemplary virtual skeleton associated with a user holding a handheld pointing device.
FIG. 4 shows an environment suitable for implementing methods for determining a position of a handheld pointing device.
FIG. 5 shows a simplified diagram of a handheld pointing device, according to an example embodiment.
FIG. 6 is a process flow diagram showing a method for determining a position of the handheld pointing device, according to an example embodiment.
FIG. 7 is a diagrammatic representation of an example machine in the form of a computer system within which a set of instructions for the machine to perform any one or more of the methodologies discussed herein is executed.
DETAILED DESCRIPTION
The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with example embodiments. These example embodiments, which are also referred to herein as "examples," are described in enough detail to enable those skilled in the art to practice the present subject matter. The embodiments can be combined, other embodiments can be utilized, or structural, logical, and electrical changes can be made without departing from the scope of what is claimed. The following detailed description is therefore not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents. In this document, the terms "a" and "an" are used, as is common in patent documents, to include one or more than one. In this document, the term "or" is used to refer to a nonexclusive "or," such that "A or B" includes "A but not B," "B but not A," and "A and B," unless otherwise indicated.
The techniques of the embodiments disclosed herein may be implemented using a variety of technologies. For example, the methods described herein may be implemented in software executing on a computer system or in hardware utilizing either a combination of microprocessors or other specially designed application-specific integrated circuits (ASICs), programmable logic devices, or various combinations thereof. In particular, the methods described herein may be implemented by a series of computer-executable instructions residing on a storage medium such as a disk drive, or computer-readable medium.
The embodiments described herein relate to computer-implemented methods for determining and tracking a current location of a handheld pointing device.
In general, one or more depth sensing cameras (and, optionally, video cameras) can be used to generate a depth map of a physical scene. The depth map analysis and interpretation can be performed by a computing unit operatively coupled to or embedding the depth sensing camera. Some examples of computing units may include a desktop computer, laptop computer, tablet computer, gaming console, audio system, video system, cellular phone, smart phone, personal digital assistant (PDA), set-top box (STB), television set, smart television system, or any other wired or wireless electronic device. The computing unit may include or be operatively coupled to a communication unit which may communicate with various handheld pointing devices and, in particular, receive motion data of handheld pointing devices.
The term "handheld pointing device," as used herein, refers to an input device or any other suitable remote controlling device which can be used for making an input. Some examples of handheld pointing devices include a remote controller, cellular phone, smart phone, video game console, handheld game console, computer (e.g., a tablet computer), and so forth. Regardless of what type of handheld pointing device is used, it may include various motion detectors, such as acceleration sensors, gyroscopes, or other detectors configured to measure velocity, momentum, and acceleration such as pitch, roll, and yaw (in other words, acceleration for X, Y, and Z movement in Cartesian axes), and/or orientation sensors to generate orientation data including pitch angles, roll angles, and yaw angles. In operation, the handheld pointing device determines motion data, which includes velocities and/or acceleration levels, and transmits it to the computing unit over a wired or wireless network.
The computing unit, in turn, interprets the depth map such that it may identify the user, generate a corresponding virtual skeleton of the user, which skeleton includes multiple "joints" and "bones," and determine that the user made a gesture using his hands or arms. The coordinates of every joint can be determined by the computing unit, and thus every user hand/arm motion can be tracked, and corresponding motion data can be generated, which may include a velocity, acceleration, orientation, and so forth.
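By way of a non-limiting illustration only, the following Python sketch shows one way joint coordinates taken from successive depth-map frames could be turned into hand motion data (velocity and acceleration). The SkeletonFrame structure, the hand_motion and hand_acceleration names, the joint naming, and the finite-difference scheme are assumptions of this sketch, not part of the described embodiments.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class SkeletonFrame:
    """Joint coordinates (meters, camera space) for one depth-map frame."""
    timestamp: float                 # seconds
    joints: Dict[str, Vec3]          # e.g. {"right_hand": (x, y, z), ...}

def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)

def hand_motion(prev: SkeletonFrame, curr: SkeletonFrame, joint: str = "right_hand") -> Vec3:
    """Finite-difference velocity (m/s) of one hand joint between two frames."""
    dt = curr.timestamp - prev.timestamp
    return _scale(_sub(curr.joints[joint], prev.joints[joint]), 1.0 / dt)

def hand_acceleration(f0: SkeletonFrame, f1: SkeletonFrame, f2: SkeletonFrame,
                      joint: str = "right_hand") -> Vec3:
    """Finite-difference acceleration (m/s^2) of one hand joint over three frames."""
    v01 = hand_motion(f0, f1, joint)
    v12 = hand_motion(f1, f2, joint)
    dt = f2.timestamp - f0.timestamp
    return _scale(_sub(v12, v01), 2.0 / dt)
```

In a sketch of this kind, the per-frame velocities and accelerations form the hand motion data stream that is later compared against the motion data reported by the handheld pointing device.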
Further, the computing unit compares motion data associated with the user's hand/arm gesture and motion data (and optionally orientation data) associated with movement of the handheld pointing device. When both of these motion data coincide or correspond to each other, the computing unit determines that the handheld pointing device is held by a corresponding arm or hand of the user. Since coordinates of the user's arm/hand are known and tracked, the same coordinates are then assigned to the handheld pointing device. Therefore, the handheld pointing device can be tied to the virtual skeleton of the user so that the current location of the handheld pointing device can be determined and further monitored. In other words, the handheld pointing device is mapped on the depth map.
Once the handheld pointing device is tied to the user, movements of the handheld pointing device may be further tracked in real time to identify particular user gestures that cause the computing unit to generate corresponding control commands. This approach can be used in various gaming and simulation/teaching software without requiring excessive computational resources, high-resolution depth sensing cameras, or auxiliary devices (e.g., a lighting sphere) attached to the handheld pointing device to facilitate its identification on the depth map. The technology described herein provides an easy and effective method for locating the handheld pointing device in the scene and tracking its motions.
Provided below is a detailed description of various embodiments related to methods and systems for determining a position of a handheld pointing device.
With reference now to the drawings, FIG. 1 shows an example system environment 100 for providing a real time human-computer interface. The system environment 100 includes a gesture recognition control system 110, a display device 120, and an entertainment system 130.
The gesture recognition control system 110 is configured to capture various user gestures and user inputs, interpret them, and generate corresponding control commands, which are further transmitted to the entertainment system 130. Once the entertainment system 130 receives commands generated by the gesture recognition control system 110, the entertainment system performs certain actions depending on which software application is running. For example, the user may control a pointer on the display screen by making certain gestures.
The entertainment system 130 may refer to any electronic device such as a computer (e.g., a laptop computer, desktop computer, tablet computer, workstation, server), game console, television (TV) set, TV adapter, STB, smart television system, audio system, video system, cellular phone, smart phone, PDA, and so forth. Although the figure shows that the gesture recognition control system 110 and the entertainment system 130 are separate and stand-alone devices, in some alternative embodiments, these systems can be integrated within a single device.
FIG. 2 is a general illustration of a scene 200 suitable for controlling an electronic device by recognition of gestures made by a user. In particular, this figure shows a user 210 interacting with the gesture recognition control system 110 with the help of a handheld pointing device 220.
The gesture recognition control system 110 may include a depth sensing camera, a computing unit, and a communication unit, which can be stand-alone devices or embedded within a single housing (as shown). Generally speaking, the user and a corresponding environment, such as a living room, are located, at least in part, within the field of view of the depth sensing camera.
More specifically, the gesture recognition control system 110 may be configured to capture a depth map of the scene in real time and further process the depth map to identify the user, determine one or more user gestures, determine one or more user body parts, and generate corresponding control commands. The gesture recognition control system 110 may also determine specific motion data associated with user gestures, wherein the motion data may include coordinates of the user's hands or arms, and velocity and acceleration of the user's hands/arms. For this purpose, the gesture recognition control system 110 may generate a virtual skeleton of the user as shown in FIG. 3A and described below in greater detail.
The handheld pointing device 220 may refer to a controller wand, remote control device (e.g., a gaming console remote controller), smart phone, cellular phone, PDA, tablet computer, or any other electronic device enabling the user 210 to generate specific commands by pressing dedicated buttons arranged thereon. The handheld pointing device 220 is configured to determine its velocity, acceleration, and/or orientation in space with the help of embedded acceleration sensors, gyroscopes, or other motion sensors and/or orientation sensors. The velocity, acceleration, and/or orientation data can be transmitted to the gesture recognition control system 110 over a wireless or wired network. Accordingly, a communication module, which is configured to receive motion data (and optionally orientation data) associated with movements of the handheld pointing device 220, may be embedded in the gesture recognition control system 110.
The gesture recognition control system 110 is also configured to determine the location of handheld pointing device 220 on the depth map by matching motion data associated with the gestures of one or more user's arms captured by the depth sensing camera and motion data (and optionally the orientation data) associated with movements of handheld pointing device 220 as received by the communication module. When the motions match each other, the gesture recognition control system 110 acknowledges that the handheld pointing device 220 is held in a particular hand of the user and then assigns coordinates of the user's hand to the handheld pointing device 220. In various embodiments, this technology can be used for determining that the handheld pointing device 220 is in "active use," which means that the handheld pointing device 220 is held by the user 210 who is located in the sensitive area of the depth sensing camera.

FIG. 3A shows a simplified view of an exemplary virtual skeleton 300 as can be generated by the gesture recognition control system 110 based upon the depth map. As shown in the figure, the virtual skeleton 300 comprises a plurality of "bones" and "joints" 310 interconnecting the bones. The bones and joints, in combination, represent the user 210 in real time so that every motion of the user's limbs is represented by corresponding motions of the bones and joints.
According to various embodiments, each of the joints 310 may be associated with certain coordinates in a three-dimensional (3D) space defining its exact location. Hence, any motion of the user's limbs, such as an arm, may be interpreted by a plurality of coordinates or coordinate vectors related to the corresponding joint(s) 310. By tracking user motions via the virtual skeleton model, motion data can be generated for every limb movement. This motion data may include exact coordinates per period of time, velocity, direction, acceleration, orientation, and so forth.
FIG. 3B shows a simplified view of the exemplary virtual skeleton 300 associated with the user 210 holding the handheld pointing device 220. In particular, when the gesture recognition control system 110 determines that the user 210 holds the handheld pointing device 220 and then determines the location (coordinates) of the handheld pointing device 220, a corresponding mark or label can be generated on the virtual skeleton 300.
According to various embodiments, the gesture recognition control system 110 can determine an orientation of the handheld pointing device 220 by analyzing the virtual skeleton 300 and/or by acquiring orientation data from the handheld pointing device 220. In this case, the orientation of handheld pointing device 220 may be represented as a vector 320 as shown in FIG. 3B.
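By way of a non-limiting illustration, the sketch below shows one hypothetical way the vector 320 could be approximated from the virtual skeleton alone, by taking the direction from the elbow joint to the hand joint of the arm holding the device. The function name, the joint inputs, and the forearm-direction approximation are assumptions of this sketch rather than the claimed method.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def pointing_vector(elbow: Vec3, hand: Vec3) -> Vec3:
    """Unit vector from the elbow joint to the hand joint, used here as a
    rough stand-in for the orientation vector of the held device."""
    d = (hand[0] - elbow[0], hand[1] - elbow[1], hand[2] - elbow[2])
    n = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
    if n == 0.0:
        raise ValueError("elbow and hand joints coincide")
    return (d[0] / n, d[1] / n, d[2] / n)
```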
FIG. 4 shows an environment 400 suitable for implementing methods for determining a position of a handheld pointing device 220. As shown in this figure, there is provided the gesture recognition control system 110, which may comprise at least one depth sensing camera 410 configured to capture a depth map. The term "depth map," as used herein, refers to an image or image channel that contains information relating to the distance of the surfaces of scene objects from a depth sensing camera. In various embodiments, the depth sensing camera 410 may include an infrared (IR) projector to generate modulated light, and also an IR camera to capture 3D images. In yet more example embodiments, the gesture recognition control system 110 may optionally comprise a color video camera 420 to capture a series of 2D images in addition to 3D imagery created by the depth sensing camera 410. The series of 2D images captured by the color video camera 420 may be used to facilitate identification of the user on the depth map and/or various gestures of the user. It should also be noted that the depth sensing camera 410 and the color video camera 420 can either be stand-alone devices or be encased within a single housing.
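By way of a non-limiting illustration, the following sketch treats a depth map as a two-dimensional array of distances and back-projects one pixel to a 3D camera-space point with a simple pinhole model. The intrinsic parameters (fx, fy, cx, cy) and the example resolution are assumed values for the sketch, not parameters of the depth sensing camera 410.

```python
import numpy as np

def depth_pixel_to_point(depth_map: np.ndarray, u: int, v: int,
                         fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project pixel (u, v) of a depth map (in meters) to a 3D camera-space
    point using a pinhole model with assumed intrinsics fx, fy, cx, cy."""
    z = float(depth_map[v, u])          # distance of the surface seen at (u, v)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Usage with made-up intrinsics for a 640x480 depth sensor:
# depth = np.random.uniform(0.5, 4.0, size=(480, 640))
# p = depth_pixel_to_point(depth, 320, 240, fx=575.8, fy=575.8, cx=319.5, cy=239.5)
```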
Furthermore, the gesture recognition control system 110 may also comprise a computing unit 430 for processing depth data and generating control commands for one or more electronic devices 460 (e.g., the entertainment system 130). The computing unit 430 is also configured to implement steps of methods for determining a position of the handheld pointing device 220 as described herein.
The gesture recognition control system 110 also includes a communication module 440 configured to communicate with the handheld pointing device 220 and one or more electronic devices 460. More specifically, the communication module 440 is configured to receive motion data and orientation data from the handheld pointing device 220 and transmit control commands to one or more electronic devices 460.
The gesture recognition control system 110 may also include a bus 450 interconnecting the depth sensing camera 410, color video camera 420, computing unit 430, and communication module 440.
The aforementioned one or more electronic devices 460 can refer, in general, to any electronic device configured to trigger one or more predefined actions upon receipt of a certain control command. Some examples of electronic devices 460 include, but are not limited to, computers (e.g., laptop computers, tablet computers), displays, audio systems, video systems, gaming consoles, entertainment systems, lighting devices, cellular phones, smart phones, TVs, and so forth.
The communication between the communication module 440 and the handheld pointing device 220 and/or one or more electronic devices 460 can be performed via a network (not shown). The network can be a wireless or wired network, or a combination thereof. For example, the network may include the Internet, local intranet, PAN (Personal Area Network), LAN (Local Area Network), WAN (Wide Area Network), MAN (Metropolitan Area Network), virtual private network (VPN), storage area network (SAN), frame relay connection, Advanced Intelligent Network (AIN) connection, synchronous optical network (SONET) connection, digital T1, T3, E1 or E3 line, Digital Data Service (DDS) connection, DSL (Digital Subscriber Line) connection, Ethernet connection, ISDN (Integrated Services Digital Network) line, dial-up port such as a V.90, V.34 or V.34bis analog modem connection, cable modem, ATM (Asynchronous Transfer Mode) connection, or an FDDI (Fiber Distributed Data Interface) or CDDI (Copper Distributed Data Interface) connection. Furthermore, communications may also include links to any of a variety of wireless networks including WAP (Wireless Application Protocol), GPRS (General Packet Radio Service), GSM (Global System for Mobile Communication), CDMA (Code Division Multiple Access) or TDMA (Time Division Multiple Access), cellular phone networks, Global Positioning System (GPS), CDPD (cellular digital packet data), RIM (Research in Motion, Limited) duplex paging network, Bluetooth radio, or an IEEE 802.11-based radio frequency network. The network can further include or interface with any one or more of the following: RS-232 serial connection, IEEE-1394 (Firewire) connection, Fiber Channel connection, IrDA (infrared) port, SCSI (Small Computer Systems Interface) connection, USB (Universal Serial Bus) connection, or other wired or wireless, digital or analog interface or connection, mesh or Digi® networking.
FIG. 5 shows a simplified diagram of the handheld pointing device 220, according to an example embodiment. As shown in the figure, the handheld pointing device 220 comprises one or more motion sensors 510, one or more orientation sensors 520 and also a communication module 530. In various alternative embodiments, the handheld pointing device 220 may include additional modules (not shown), such as an input module, a computing module, a display, or any other modules, depending on the type of the handheld pointing device 220.
The motion sensors 510 and orientation sensors 520 may include gyroscopes, acceleration sensors, velocity sensors, and so forth. In general, the motion sensors 510 are configured to determine motion data, which may include a velocity, momentum, and acceleration such as pitch, roll, and yaw (in other words, acceleration for X, Y, and Z movement in Cartesian axes) of the handheld pointing device 220. The orientation sensors 520 may determine a relative orientation of the handheld pointing device 220. In an example, the orientation sensors 520 may be configured to generate orientation data including one or more of the following: pitch angle, roll angle, and yaw angle related to the handheld pointing device 220. In operation, motion data and optionally orientation data are then transmitted to the gesture recognition control system 110 with the help of the communication module 530. The motion data and orientation data can be transmitted via the network as described above.
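By way of a non-limiting illustration, a handheld device of this kind might package one motion/orientation sample and push it to the control system over the network roughly as sketched below. The SensorSample fields, the JSON encoding, and the UDP transport are assumptions of this sketch, not the protocol of the communication module 530.

```python
import json
import socket
from dataclasses import dataclass, asdict
from typing import Tuple

@dataclass
class SensorSample:
    timestamp: float                     # seconds
    accel: Tuple[float, float, float]    # (ax, ay, az) in m/s^2 from the motion sensors
    gyro: Tuple[float, float, float]     # (gx, gy, gz) angular rates in rad/s
    pitch: float                         # orientation angles in degrees
    roll: float
    yaw: float

def send_sample(sample: SensorSample, host: str, port: int) -> None:
    """Serialize one motion/orientation sample as JSON and push it over UDP."""
    payload = json.dumps(asdict(sample)).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

# Hypothetical usage with a made-up receiver address:
# send_sample(SensorSample(12.34, (0.1, 9.7, 0.3), (0.0, 0.2, 0.0),
#                          pitch=5.0, roll=-2.0, yaw=88.0),
#             host="192.0.2.10", port=9000)
```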
FIG. 6 is a process flow diagram showing a method 600 for determining a position of the handheld pointing device 220, according to an example embodiment. The method 600 may be performed by processing logic that may comprise hardware (e.g., dedicated logic, programmable logic, and microcode), software (such as software run on a general-purpose computer system or a dedicated machine), or a combination of both. In one example embodiment, the processing logic resides at the gesture recognition control system 110.
The method 600 can be performed by the units/devices discussed above with reference to FIG. 4. Each of these units or devices can comprise processing logic. It will be appreciated by one of ordinary skill in the art that examples of the foregoing units/devices may be virtual, and instructions said to be executed by a unit/device may in fact be retrieved and executed by a processor. The foregoing units/devices may also include memory cards, servers, and/or computer discs. Although various modules may be configured to perform some or all of the various steps described herein, fewer or more units may be provided and still fall within the scope of example embodiments.
As shown in FIG. 6, the method 600 may commence at operation 610, with the depth sensing camera 410 generating a depth map by capturing a plurality of depth values of the scene in real time.
At operation 620, the depth map can be analyzed by the computing unit 430 to identify the user 210 on the depth map. At operation 630, the computing unit 430 segments the depth data of the user 210 so as to generate a virtual skeleton of the user 210.
At operation 640, the computing unit 430 determines coordinates of at least one user's hand (user's arm or user's limb). The coordinates of the at least one user's hand can be associated with the virtual skeleton as discussed above.
At operation 650, the computing unit 430 determines a motion of the at least one user's hand by processing a plurality of depth maps. At operation 660, the computing unit 430 generates motion data of the at least one user's hand. At operation 670, the computing unit 430 acquires motion data and optionally orientation data of the handheld pointing device 220 via the communication module 440.
At operation 680, the computing unit 430 compares the motion data (and optionally orientation data) of the handheld pointing device 220 as acquired at operation 670 with the motion data of the at least one user's hand as generated at operation 660. If the motion data of the handheld pointing device 220 correspond (or match, or are relatively similar) to the motion data of the user's hand, the computing unit 430 selectively assigns the coordinates of the user's hand to the handheld pointing device 220 at operation 690. Thus, the location of the handheld pointing device 220 is determined on the depth map. Further, the location of the handheld pointing device 220 can be tracked in real time so that various gestures can be interpreted for generation of corresponding control commands for one or more electronic devices 460.
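By way of a non-limiting illustration, the comparison at operation 680 could be approximated as a correlation test between the acceleration-magnitude time series of the hand and of the device, as in the sketch below. The magnitude-based comparison, the 0.8 threshold, and the function names are assumptions of this sketch rather than the claimed matching criterion.

```python
import numpy as np

def motions_match(hand_accel: np.ndarray, device_accel: np.ndarray,
                  threshold: float = 0.8) -> bool:
    """Decide whether two acceleration time series of shape (N, 3), sampled at the
    same rate and aligned in time, describe the same physical motion.

    Magnitudes are compared rather than raw vectors so the test does not depend on
    the different axis systems of the depth sensor and the device."""
    hand_mag = np.linalg.norm(hand_accel, axis=1)
    dev_mag = np.linalg.norm(device_accel, axis=1)
    # Remove constant offsets (e.g., gravity, sensor bias) before correlating.
    hand_mag -= hand_mag.mean()
    dev_mag -= dev_mag.mean()
    denom = np.linalg.norm(hand_mag) * np.linalg.norm(dev_mag)
    if denom == 0.0:
        return False
    correlation = float(np.dot(hand_mag, dev_mag) / denom)
    return correlation >= threshold

def assign_device_position(skeleton_joints: dict, hand_joint: str) -> tuple:
    """Once matched, the device simply inherits the coordinates of that hand joint."""
    return skeleton_joints[hand_joint]
```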
In various embodiments, the described technology can be used for determining that the handheld pointing device 220 is in active use by the user 210. As mentioned earlier, the term "active use" means that the user 210 is identified on the depth map (see operation 620) or, in other words, is located within the viewing area of depth sensing camera 410 when the handheld pointing device 220 is moved.
In addition, the method 600 may further include operations (not shown) in which the computing unit 430 generates a vector defining the current orientation of the handheld pointing device 220. The orientation of the handheld pointing device 220 may be represented as the vector 320 (see FIG. 3B). The computing unit 430 generates the vector 320 by processing the orientation data of the handheld pointing device 220 acquired at operation 670, transforming the orientation data tied to the axis system of the handheld pointing device 220 into orientation data tied to the axis system of the gesture recognition control system 110. In other words, the vector coordinates are calculated in the axis system associated with the gesture recognition control system 110 based upon the vector coordinates in the axis system of the handheld pointing device 220.
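By way of a non-limiting illustration, such an axis-system transformation could be sketched as follows. The Z-Y-X angle convention, the choice of the device's forward axis, and the assumed known (e.g., calibrated) device-to-system rotation are illustrative assumptions of this sketch only.

```python
import numpy as np

def rpy_to_matrix(roll: float, pitch: float, yaw: float) -> np.ndarray:
    """Rotation matrix from roll/pitch/yaw angles (radians), Z-Y-X convention."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def orientation_vector_in_system_frame(roll: float, pitch: float, yaw: float,
                                       device_to_system: np.ndarray) -> np.ndarray:
    """Express the device's forward axis in the control system's axis system.

    `device_to_system` is an assumed known rotation mapping coordinates from the
    device's reference frame into the gesture recognition system's frame."""
    forward_in_device_frame = rpy_to_matrix(roll, pitch, yaw) @ np.array([1.0, 0.0, 0.0])
    return device_to_system @ forward_in_device_frame
```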
FIG. 7 shows a diagrammatic representation of a computing device for a machine in the example electronic form of a computer system 700, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed. In example embodiments, the machine operates as a standalone device, or can be connected (e.g., networked) to other machines. In a networked deployment, the machine can operate in the capacity of a server, a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine can be a personal computer (PC), tablet PC, STB, PDA, cellular telephone, portable music player (e.g., a portable hard drive audio device, such as a Moving Picture Experts Group Audio Layer 3 (MP3) player), web appliance, network router, switch, bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that separately or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The example computer system 700 includes a processor or multiple processors 702 (e.g., a central processing unit (CPU), graphics processing unit (GPU), or both), main memory 704 and static memory 706, which communicate with each other via a bus 708. The computer system 700 can further include a video display unit 710 (e.g., a liquid crystal display (LCD) or cathode ray tube (CRT)). The computer system 700 also includes at least one input device 712, such as an alphanumeric input device (e.g., a keyboard), pointer control device (e.g., a mouse), microphone, digital camera, video camera, and so forth. The computer system 700 also includes a disk drive unit 714, signal generation device 716 (e.g., a speaker), and network interface device 718.
The disk drive unit 714 includes a computer-readable medium 720 that stores one or more sets of instructions and data structures (e.g., instructions 722) embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 722 can also reside, completely or at least partially, within the main memory 704 and/or within the processors 702 during execution by the computer system 700. The main memory 704 and the processors 702 also constitute machine-readable media.
The instructions 722 can further be transmitted or received over the network 724 via the network interface device 718 utilizing any one of a number of well-known transfer protocols (e.g., Hyper Text Transfer Protocol (HTTP), CAN, Serial, and Modbus).
While the computer-readable medium 720 is shown in an example embodiment to be a single medium, the term "computer-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "computer-readable medium" shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine, and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions. The term "computer-readable medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media. Such media may also include, without limitation, hard disks, floppy disks, flash memory cards, digital video disks, random access memory (RAM), read only memory (ROM), and the like.
The example embodiments described herein may be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware. The computer-executable instructions may be written in a computer programming language or may be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions may be executed on a variety of hardware platforms and for interfaces associated with a variety of operating systems. Although not limited thereto, computer software programs for implementing the present method may be written in any number of suitable programming languages such as, for example, C, C++, C#, Cobol, Eiffel, Haskell, Visual Basic, Java, JavaScript, Python, or other compilers, assemblers, interpreters, or other computer languages or platforms.
Thus, methods and systems for determining a position of a handheld pointing device have been described. Although embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes can be made to these example embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

CLAIMS

What is claimed is:
1. A computer-implemented method for determining a position of a handheld pointing device, the method comprising:
determining one or more motions of one or more user hands on a depth map; generating motion data associated with the one or more motions of the one or more user hands;
acquiring motion data of a handheld pointing device;
determining that the motion of the handheld pointing device is associated with the one or more motions of the one or more user hands; and
determining a current position of the handheld pointing device on the depth map.
2. The method of claim 1, further comprising generating the depth map by capturing a series of images.
3. The method of claim 1, further comprising generating a virtual skeleton of the user, the virtual skeleton comprising at least one virtual limb of the user.
4. The method of claim 3, further comprising:
determining coordinates of the one or more user hands, the coordinates to be associated with the virtual skeleton; and
generating motion data of the one or more user hands.
5. The method of claim 4, wherein determining that the motion of the handheld pointing device is associated with the one or more motions of the one or more user hands comprises comparing motion data of the one or more user hands and motion data of the handheld pointing device.
6. The method of claim 5, further comprising determining which hand is holding the pointing device.
7. The method of claim 6, further comprising selectively assigning the coordinates of the hand holding the handheld pointing device to the handheld pointing device.
8. The method of claim 7, further comprising determining an orientation of the handheld pointing device based upon the coordinates of various virtual skeleton joints related to the hand holding the handheld pointing device.
9. The method of claim 8, further comprising generating a vector associated with the orientation of the handheld pointing device.
10. The method of claim 1, further comprising acquiring orientation data of the handheld pointing device, wherein the orientation data comprises one or more of the following: a pitch angle, a roll angle, and a yaw angle.
11. The method of claim 1, wherein the motion data associated with a motion of the handheld pointing device comprises one or more of acceleration data, velocity data, and inertial data.
12. The method of claim 1, further comprising determining that the handheld pointing device is in active use by the user.
13. The method of claim 12, wherein the handheld pointing device is in active use by the user when the handheld pointing device is held and moved by the user and when the user is identified on the depth map.
14. The method of claim 1, further comprising identifying the user on the depth map.
15. The method of claim 1, further comprising tracking motions of the one or more user hands.
16. A system for determining a position of a handheld pointing device, the system comprising:
a depth sensing device configured to generate a depth map;
a communication module configured to acquire motion data of a handheld pointing device; and
a computing unit communicatively coupled to the depth sensing device, the computing unit configured to:
determine one or more motions of one or more user hands on a depth map;
generate motion data associated with the one or more motions of the one or more user hands;
determine that the motion of the handheld pointing device is associated with the one or more motions of the one or more user hands; and
determine a current position of the handheld pointing device on the depth map.
17. The system of claim 16, further comprising a video camera communicatively coupled to the computing unit, the video camera being configured to facilitate generation of the depth map.
18. The system of claim 16, wherein the computing unit is further configured to generate a virtual skeleton of the user, the virtual skeleton comprising at least one virtual hand of the user.
19. The system of claim 16, wherein the computing unit is further configured to: determine coordinates of the one or more user hands, the coordinates to be associated with the virtual skeleton;
generate motion data of the one or more user hands; and
compare motion data of the one or more user hands and motion data of the handheld pointing device.
20. A processor-readable nontransitory medium having instructions stored thereon, which when executed by one or more processors, cause the one or more processors to:
determine one or more motions of one or more user hands on a depth map;
generate motion data associated with the one or more motions of the one or more user hands;
acquire motion data of a handheld pointing device;
determine that the motion of the handheld pointing device is associated with the one or more motions of the one or more user hands; and
determine a current position of the handheld pointing device on the depth map.
PCT/RU2013/000188 2012-05-23 2013-03-12 Methods and systems for mapping pointing device on depth map WO2013176574A1 (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US13/478,457 US20130010207A1 (en) 2011-07-04 2012-05-23 Gesture based interactive control of electronic equipment
US13/478,378 2012-05-23
US13/478,457 2012-05-23
US13/478,378 US8823642B2 (en) 2011-07-04 2012-05-23 Methods and systems for controlling devices using gestures and related 3D sensor
US13/541,681 2012-07-04
US13/541,684 2012-07-04
US13/541,681 US8896522B2 (en) 2011-07-04 2012-07-04 User-centric three-dimensional interactive control environment
US13/541,684 US20130010071A1 (en) 2011-07-04 2012-07-04 Methods and systems for mapping pointing device on depth map

Publications (1)

Publication Number Publication Date
WO2013176574A1 true WO2013176574A1 (en) 2013-11-28

Family

ID=49624798

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2013/000188 WO2013176574A1 (en) 2012-05-23 2013-03-12 Methods and systems for mapping pointing device on depth map

Country Status (1)

Country Link
WO (1) WO2013176574A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486740A (en) * 2021-06-23 2021-10-08 深圳市加糖电子科技有限公司 Music switching system based on hand type recognition function

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060187196A1 (en) * 2005-02-08 2006-08-24 Underkoffler John S System and method for gesture based control system
US20070060336A1 (en) * 2003-09-15 2007-03-15 Sony Computer Entertainment Inc. Methods and systems for enabling depth and direction detection when interfacing with a computer program
US20080070684A1 (en) * 2006-09-14 2008-03-20 Mark Haigh-Hutchinson Method and apparatus for using a common pointing input to control 3D viewpoint and object targeting
US20100290698A1 (en) * 2007-06-19 2010-11-18 Prime Sense Ltd Distance-Varying Illumination and Imaging Techniques for Depth Mapping
US20110081969A1 (en) * 2005-08-22 2011-04-07 Akio Ikeda Video game system with wireless modular handheld controller


Similar Documents

Publication Publication Date Title
US20130010071A1 (en) Methods and systems for mapping pointing device on depth map
US11392212B2 (en) Systems and methods of creating a realistic displacement of a virtual object in virtual reality/augmented reality environments
US20140009384A1 (en) Methods and systems for determining location of handheld device within 3d environment
US10761612B2 (en) Gesture recognition techniques
US20150070274A1 (en) Methods and systems for determining 6dof location and orientation of head-mounted display and associated user movements
CN111694429A (en) Virtual object driving method and device, electronic equipment and readable storage
US9626801B2 (en) Visualization of physical characteristics in augmented reality
CN110457414A (en) Offline map processing, virtual objects display methods, device, medium and equipment
US20160054791A1 (en) Navigating augmented reality content with a watch
JP5807686B2 (en) Image processing apparatus, image processing method, and program
WO2017020766A1 (en) Scenario extraction method, object locating method and system therefor
US20120306850A1 (en) Distributed asynchronous localization and mapping for augmented reality
JP2013165366A (en) Image processing device, image processing method, and program
JP2015520471A (en) Fingertip location for gesture input
KR101470757B1 (en) Method and apparatus for providing augmented reality service
WO2014185808A1 (en) System and method for controlling multiple electronic devices
CN110568929B (en) Virtual scene interaction method and device based on virtual keyboard and electronic equipment
CN110473293A (en) Virtual objects processing method and processing device, storage medium and electronic equipment
Vokorokos et al. Motion sensors: Gesticulation efficiency across multiple platforms
CN108983954B (en) Data processing method, device and system based on virtual reality
WO2013176574A1 (en) Methods and systems for mapping pointing device on depth map
WO2015030623A1 (en) Methods and systems for locating substantially planar surfaces of 3d scene
KR101558094B1 (en) Multi-modal system using for intuitive hand motion and control method thereof
WO2023124113A1 (en) Interaction method and apparatus in three-dimensional space, storage medium, and electronic apparatus
CN115317907A (en) Multi-user virtual interaction method and device in AR application and AR equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 13793634; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 13793634; Country of ref document: EP; Kind code of ref document: A1)