WO1999035633A2 - Human motion following computer mouse and game controller - Google Patents

Human motion following computer mouse and game controller

Info

Publication number
WO1999035633A2
Authority
WO
WIPO (PCT)
Prior art keywords
user
controller
camera
computer
motion
Prior art date
Application number
PCT/US1999/000086
Other languages
French (fr)
Other versions
WO1999035633A3 (en)
Inventor
Robert D. Frey
Kevin Grealish
Curtis A. Vock
Charles M. Marshall
Original Assignee
The Video Mouse Group
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Video Mouse Group filed Critical The Video Mouse Group
Priority to AU22117/99A priority Critical patent/AU2211799A/en
Publication of WO1999035633A2 publication Critical patent/WO1999035633A2/en
Publication of WO1999035633A3 publication Critical patent/WO1999035633A3/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304Detection arrangements using opto-electronic means
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/10Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
    • A63F2300/1012Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals involving biosensors worn by the player, e.g. for measuring heart beat, limb activity
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/10Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
    • A63F2300/1087Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60Methods for processing data by generating or executing the game program
    • A63F2300/66Methods for processing data by generating or executing the game program for rendering three dimensional images
    • A63F2300/6661Methods for processing data by generating or executing the game program for rendering three dimensional images for changing the position of the virtual camera
    • A63F2300/6676Methods for processing data by generating or executing the game program for rendering three dimensional images for changing the position of the virtual camera by dedicated player input

Definitions

  • the primary human interfaces to today's computer are the keyboard, to enter textual information, and the mouse, to provide control over graphical information. These interfaces help users with word processing, presentation software, computer aided design packages, spreadsheet analyses, and other applications. These interfaces are also widely used for computer gaming entertainment, though they are often augmented or replaced by a joystick.
  • game complexity generally requires control of the (i) mouse and keyboard, or (ii) joystick and keyboard.
  • gaming applications usually require control in several axes of motion, including forward motion, reverse motion, left turn, right turn, left strafe (slide), right strafe, upward motion, and downward motion.
  • many games permit viewing (within the game environment) in directions different from that in which the vehicle (e.g., the car, or person, simulated within the game) is moving, including up, down, left and right.
  • One object of the invention is to offer alternative approaches to human-computer interfaces for those incapable of using standard devices (e.g., mouse, keyboard and joystick), for example due to disability.
  • Another object of the invention is to provide an alternative input device for laptop computers.
  • Laptop computers are often used in locations which do not allow the use of a mouse, such as in airplanes or during business meetings in which there is no room to operate the mouse. Through the use of either a clip-on camera or a camera built into the laptop display, the laptop user can control the mouse position or use the camera for teleconferencing while on the road.
  • Another object of the invention is to provide a means of human control of a graphical computer interface through the physical motion of the user in order to control the activity of a cursor in the manner usually accomplished with a computer mouse.
  • a further object of the invention is to provide additional degrees of freedom in the human computer interface in support of computer games and entertainment software.
  • Yet another object of the invention is to provide dual use of teleconferencing and video electronics with gaming and computer control systems.
  • cursor means a computer cursor associated with a computer screen.
  • Scene view means the view presented on a computer display to a user.
  • one scene view corresponds to the scene presented to a user during a computer game at any given moment in time.
  • the game might include displaying a scene whereby the user appears to be walking in a forest, and through trees.
  • a cursor might also be visible in the scene view as a mechanism for the user to select certain events or items on the scene (e.g., to open a door in a game, or to open a folder to access computer files).
  • camera refers to a solid state instrument used in imaging.
  • the camera also includes optical elements which refract light to form an image on the camera's detector elements (typically CCD or CMOS).
  • one camera of the invention derives from a video-conferencing camera used in conjunction with Internet communication.
  • the invention provides systems and methods to control computer cursor position (or, for example, the scene view or game position as displayed on the computer display) by motion of the user at the computer.
  • a camera rests on or near the computer, or is built into the computer, and connects therewith to collect "frames" of data corresponding to images of the user. These images provide information about user motion over time.
  • Software within the computer assesses these frames and algorithmically adjusts cursor motion (or scene view, or mouse button, or some other operation of the computer) based upon this motion.
  • the motion may be imparted by up-down or left-right motion of the user's head, by the user's hands, or by other motions presented to the video camera (such as discussed herein).
  • a close-up view of the user's facial features is used to impart a translation in the cursor (or scene view) even though the features in fact rotate with the user's head.
  • the rotation is used to generate a corresponding rotation in computer game scene imagery.
  • the invention also provides a human factors approach to cursor movement in which the user's rate of motion determines the relative motion of the cursor (or scene view).
  • the faster the user's head travels over a set distance, the farther the corresponding cursor movement over the same time period.
  • the camera is either (a) a visible light camera utilizing ambient lighting conditions or (b) a camera sensitive in another band, such as the near infrared ("IR") or the ultraviolet ("UV") spectrum.
  • the illumination preferably emanates from a source such as an IR lamp which is beyond human sensory perception.
  • the sensor is typically mounted facing the user so as to capture a picture of the user's face in the associated electromagnetic spectrum.
  • the lamp is typically integrated with the camera housing so as to facilitate production and ease of consumer set-up.
  • a system of the invention provides an IR camera (i.e., a camera which images infrared radiation) to image the user's face and to gauge the user's stress level associated with a game on the computer.
  • the system detects increased heat intensity on the user's face, forehead or other body part by the imagery of the IR camera. This information is fed back into the game processor to provide further enhancement to the game. In this manner, the system gauges the user's reaction to the game and modifies game speed or operation in a meaningful way.
  • Games of the invention are thus made and sold to users with varying intelligence, age and/or computer familiarity; and yet the system always "pushes the envelope" for any given user so as to make the game as interesting as possible, automatically.
  • images captured by the sensor are processed by a digital signal processor ("DSP") located either (a) in a PC card within the host computer or (b) in a housing integrated with the sensor.
  • sensor frames are sent to the PC card; and detected user motion (sometimes denoted herein as “difference information") is communicated to the user's operating system via a PCI (or USB or later standard) bus interface.
  • difference information commands are interpreted by a low overhead program resident at the user's main processor, which either updates the cursor position on the screen or provides motion information to the user's computer game (e.g., so as to change the scene view).
  • the DSP is contained within the camera housing; and frames are processed local to the camera to determine difference information. This information is then transmitted to the computer by a cable that connects to a bus port of the computer so that the host processor can make appropriate movements of the cursor or scene view.
  • the DSP is mounted in the camera housing such that the camera/signal processing subsystem produces signals which emulate the mouse via the mouse input connector.
  • frames of image data are sent directly to the host computer through the computer bus; and that image data is manipulated by the computer processor directly.
  • sensor data frames can be sent directly to the host processor for all processing needs, in which case the PC card and/or separate DSP are not required.
  • with current host processors, however, the update rates are likely too slow to be practical. Once GHz-class processors reach the market, a separate DSP may no longer be needed.
  • pixel format or pixel density of the camera drives the accuracy of the system.
  • Camera formats of 240 vertical by 320 horizontal generally provide satisfactory performance.
  • the number of pixels that may be utilized is determined by system cost factors. Greater numbers of pixels require more powerful DSPs (and thus more costly DSPs) in order to process the image sequences in real time. Current technology limits the processing density to a 64x64 window for consumer electronics. As prices are reduced, and power increases, the densities can increase to 128x128, 256x256 and so on.
  • Non-square pixel formats are also possible in accord with the invention, including a 64x128 detector array size.
  • the data transfer rate from the camera is 30 frames/second at 240x320 pixels per frame. Assuming eight bits per pixel, the digital data transfer rate is therefore 18.432 megabits/second. This is a fairly high transfer rate for consumer products using current technology. While the data transfer can be either analog or digital, the preferred method of image data transfer for this aspect is via a standard RS170 analog video interface.
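As a quick check of the arithmetic above, the quoted rate follows directly from the frame size, frame rate, and bit depth:

```python
# Verify the quoted video data rate: 320 x 240 pixels, 30 frames/s, 8 bits/pixel.
width, height = 320, 240          # pixels per frame
frame_rate = 30                   # frames per second
bits_per_pixel = 8

bits_per_second = width * height * frame_rate * bits_per_pixel
print(bits_per_second / 1e6, "megabits/second")   # 18.432
```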
  • a system of the invention defines two imaging zones (either within a single camera CCD or within multiple CCD cameras housed within a single housing).
  • One imaging zone covers the user's head; and the other covers the user's eyes.
  • This aspect includes processing means to process both zones whereby movement of the user's head provides one mechanism to control cursor movement (or scene view motion), and whereby the user's eyes provide another mechanism to control the movement.
  • this aspect increases the degrees of freedom in the control decision making of the system.
  • a user might look left or right within a game without moving his head; but by assessing movement of the user's eyes (or the pupils of those eyes), the scene view can be made to rotate or translate in the manner desired by the user.
  • a user might move his head for other reasons, and yet not move his eyes from a generally forward looking position; and this aspect can assess both movements (head and eyes) to select the most appropriate movement of the cursor or scene view, if any.
  • a system of the invention utilizes a camera with zoom optics to define the user's pupil and to make cursor or scene views move according to the pupil.
  • the system incorporates a neural net to "learn" about a user's eye movements so that more accurate movements are made, over time, in response to the user's eye movement.
  • a neural net is used to learn about other movements of the user to better specify cursor or scene view movement over time.
  • a system is provided with two CCD arrays (either within a single camera body or within two cameras). The arrays connect with the user's computer by the techniques discussed herein. One CCD array is used to image the user's head; and the other is used to image the user's body. Motion of the user is then evaluated for both head and body movement; and cursor or scene view movement is adjusted based upon both inputs.
  • a single CCD is used to image the user.
  • alternate frames are zoomed, electronically, so that one frame views the user's head, and the next frame views the user's eyes.
  • these separate frame sequences are processed separately and evaluated together to make the most appropriate cursor or scene view movement. If, for example, the system clocks at 30 Hz, then each of the two frame sequences operates at 15 Hz.
  • the advantage is that two movement information sets can be evaluated to invoke an appropriate movement in the cursor or scene view.
  • other frame rates can be used; and the two sequences (head or eyes) can run at different rates from one another.
  • the separate frame sequences can utilize other body parts, e.g., the head and the hand, to have two movement evaluations.
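One way to realize the alternate-frame scheme above is to demultiplex the 30 Hz stream into two 15 Hz sequences, one per zone. The following is a minimal sketch under that assumption; the zone names and the helper function are illustrative, not part of the patent.

```python
# Split a 30 Hz frame stream into two 15 Hz sub-streams, e.g. "head" frames
# (even indices) and "eye" frames (odd indices), each processed separately.
def demux_zones(frames):
    head_seq = frames[0::2]   # electronically zoomed to the head on even frames
    eye_seq = frames[1::2]    # zoomed to the eyes on odd frames
    return head_seq, eye_seq

# Example: 30 captured frames yield two 15-frame sequences (15 Hz each at 30 Hz input).
frames = [f"frame_{i}" for i in range(30)]
head_seq, eye_seq = demux_zones(frames)
assert len(head_seq) == len(eye_seq) == 15
```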
  • a separate camera or CCD array can be used to image other body parts, for example one camera for the head and one for the hand.
  • the invention also provides methods for shifting cursor or scene views in response to user movement.
  • the scene view shifts left or right when the user shifts left and right.
  • the scene view rotates when the user's head rotates. This last aspect can be modified so that such rotation occurs so long as the eyes do not also rotate (in this situation, the user's head rotates, indicating that she wishes the scene view to rotate; but the eyes do not, indicating that she still watches the game in play).
  • the scene view rotates in response to the user's hand rotation (i.e., a camera or at least a CCD array of the system is arranged to view the player's hand).
  • the invention provides a multi-zone player gaming system whereby the user of a particular computer game can select which zone operates to move the cursor or the scene view.
  • the system can include one zone corresponding to a view of the user's head, where frames of data are captured by the system by a camera. Another zone optionally corresponds to the user's hand. Another zone optionally corresponds to the user's eyes.
  • Each zone is covered by a camera, or by a CCD array coupled within the same housing, or by optical zoom zones within a single CCD, or by separate optical elements that image different portions of the CCD array.
  • two zones can be covered with a single CCD array (i.e., a camera) when the zones are the user's head and eyes.
  • the camera images the head, for one zone, and images the eyes in another zone, since the zones are optically aligned (or nearly so).
  • zones that are not optically aligned are instead covered by two cameras, or optionally by two CCD arrays with separate optics.
  • Zones in a single camera can also be identified by the computer by prompting the user for motion from corresponding body parts. For instance, the computer identifies the head zone by prompting the user to move his head. Then the computer identifies the foot zone by having the user move his foot. Once the zones are identified, the motion of each of these individual zones is tracked by the computer, and the regions of interest in the camera image related to the zones are moved as the targets in the zones move with respect to the camera.
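The prompted zone-identification step just described can be sketched as follows: capture frames while the user moves the requested body part, accumulate frame differences, and take the bounding box of the strongest motion as that zone's region of interest. The function name, threshold, and bounding-box approach are illustrative assumptions, not the patent's specific method.

```python
import numpy as np

def locate_zone(frames, motion_threshold=20):
    """Find the region of interest that moved while the user was prompted.

    frames: list of 2-D numpy arrays captured while the user moved one body part.
    Returns (row_min, row_max, col_min, col_max) bounding the detected motion.
    """
    # Accumulate absolute frame-to-frame differences over the prompt interval.
    motion = np.zeros_like(frames[0], dtype=np.float64)
    for prev, cur in zip(frames[:-1], frames[1:]):
        motion += np.abs(cur.astype(np.float64) - prev.astype(np.float64))

    rows, cols = np.where(motion > motion_threshold)
    if rows.size == 0:
        return None  # no motion detected; re-prompt the user
    return rows.min(), rows.max(), cols.min(), cols.max()

# Usage: zones["head"] = locate_zone(frames_while_user_moved_head), and so on
# for the foot, hand, etc.; subsequent tracking is restricted to each returned box.
```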
  • the invention provides a system, including a camera and edge detection processing subsystem, which isolates edges of the user's body, for example, the side of the head. These edges are used to move the cursor or scene view. For example, if the left edge of the head is imaged onto column X of one frame of the CCD within the camera, and yet the edge falls in column Y in the next frame, then a corresponding movement of the cursor or scene view is commanded by the system. For example, movement of the edge from one column to the next might correspond to ten screen pixels, or some other magnification. In one aspect, this magnification is selected by the user. Up and down motion can also be detected by similar edge detection.
  • edge movement in the up-down dimension is handled similarly (e.g., if the bottom edge of the chin moves from one row to the next, in adjacent frames, then a corresponding movement of the cursor or scene view is made; the magnification is again preferably set manually, with a default starting magnification).
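A minimal sketch of the edge-to-cursor mapping described in the two items above, assuming a simple horizontal-gradient edge locator and a user-selectable magnification (both are illustrative choices rather than the patent's exact processing):

```python
import numpy as np

def left_edge_column(frame, row):
    """Return the column of the strongest intensity change along one image row."""
    grad = np.abs(np.diff(frame[row, :].astype(np.float64)))
    return int(np.argmax(grad))

def cursor_dx(prev_frame, cur_frame, row, magnification=10):
    """Map edge movement between frames to screen pixels (e.g., 1 column -> 10 pixels)."""
    x_prev = left_edge_column(prev_frame, row)
    x_cur = left_edge_column(cur_frame, row)
    return (x_cur - x_prev) * magnification
```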
  • Other images can also serve to define edges.
  • a user's eyelash can be used to move the cursor (or scene view) up or downwards; though typically the eye blink is used to reset the cursor command cycle.
  • an optical matched filter is used to center image zones onto the appropriate images.
  • one aspect preferably utilizes 64x64 pixels as the image frame from which cursor motion is determined. Many cameras have, however, many more pixels. These 64x64 arrays are therefore preferably established through matched filtering.
  • an image of a standard pair of user's eyes is stored within memory (according to one aspect of the invention). This image field is cross-correlated with frames of data from the actual image from the camera to "center" the image at the desired point. With eyes, specifically, ideally the 64x64 sample array is centered so as to view both eyes within the 64x64 array.
  • a standard head image is stored within memory, according to one aspect, and correlated with the actual image to center the head view.
  • an appropriate frame size can be established from an image having more or fewer pixels, by redundantly allocating data into adjacent pixels or by eliminating intermediate pixels, or similar technique.
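The matched-filter centering described above can be sketched as a cross-correlation search: slide a stored "standard eyes" (or head) template over the camera frame and center the 64x64 processing window on the best match. The brute-force search and zero-mean scoring below are illustrative simplifications, not the patent's specific filter.

```python
import numpy as np

def center_window(frame, template, window=64):
    """Cross-correlate a stored template with the frame and return a 64x64
    processing window centered on the best-matching location."""
    th, tw = template.shape
    t = (template - template.mean()).astype(np.float64)
    best_score, best_rc = -np.inf, (0, 0)
    for r in range(frame.shape[0] - th + 1):
        for c in range(frame.shape[1] - tw + 1):
            patch = frame[r:r + th, c:c + tw].astype(np.float64)
            score = np.sum((patch - patch.mean()) * t)
            if score > best_score:
                best_score, best_rc = score, (r, c)
    cr, cc = best_rc[0] + th // 2, best_rc[1] + tw // 2  # center of best match
    half = window // 2
    r0 = int(np.clip(cr - half, 0, frame.shape[0] - window))
    c0 = int(np.clip(cc - half, 0, frame.shape[1] - window))
    return frame[r0:r0 + window, c0:c0 + window]
```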
  • a camera is provided which optically "zooms" to provide optimal imaging for a desired image zone.
  • the invention of one aspect takes an image of the user's head, determines the location of the user's eyes (such as by matched filtering), and optically zooms the image through movement of optics to provide an image of the eyes in the desired processing size format.
  • autofocus capability preferably operates in most of the aspects of the invention where imaging is a feature of the processing.
  • the camera utilizes a very small aperture which results in a very large depth of field. In such a situation, autofocus is not required or desired. The optical requirements for the lenses are also reduced.
  • game controllers can now include feedback corresponding to the user's actual movement.
  • the scene view can also be made to rotate, reflecting that movement.
  • a processing subsystem (connected with the camera) is used to make cursor movement (or scene view movement) correspond to user's motion.
  • This processing subsystem of another aspect further detects when the user twists his head, to add an additional dimension to the movement.
  • a system of the invention includes an IR detector which is used to determine when a person sweats or heats up (by imaging, for example, part of the user's head onto the IR detector); and then the system adjusts game speed in a way corresponding to this response.
  • a heartbeat sensor is tied to the person to sense increased excitement during a game and the system speeds or slows the game in a similar manner.
  • a heartbeat sensor can be constructed, in one aspect of the invention, by thermal imaging of the user's face, detecting blood flow oscillations indicative of heartbeat.
  • the heartbeat sensor is physically tied to the user, such as within the computer mouse or joystick.
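The thermal heartbeat measurement mentioned above amounts to tracking the mean intensity of a facial IR zone over time and finding the dominant oscillation in a plausible pulse band. A minimal sketch follows; the band limits (0.7 to 3 Hz, roughly 40 to 180 beats per minute) and the function name are assumptions.

```python
import numpy as np

def heartbeat_bpm(zone_means, frame_rate=30.0):
    """Estimate pulse rate from a sequence of mean IR intensities of a facial zone."""
    signal = np.asarray(zone_means, dtype=np.float64)
    signal -= signal.mean()                      # remove the steady-warmth (DC) term
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / frame_rate)
    band = (freqs >= 0.7) & (freqs <= 3.0)       # plausible heartbeat frequencies
    if not np.any(band):
        return None
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return peak_freq * 60.0                      # beats per minute
```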
  • a computer of the invention adapts to user control as selected by a particular user. For example, in the case of a handicapped person, a particular user might select certain hand-movements, e.g., a single finger up, to move the cursor up; and another finger down to move the cursor left.
  • a neural network is used to assist the processing system in establishing proper cursor movement.
  • the computer for example learns to print something by movement of the user's finger (or other body part).
  • tipping of the user's head is used to provide another degree of freedom in moving the cursor or adjusting the scene view.
  • a tilt of the head as imaged by the camera, can be set to command a rotation of the scene view.
  • a camera of the invention uses autozoom to move in and out of a given scene view.
  • the camera is first focused on the user's face in one frame; but in a subsequent frame the camera must focus closer to compensate for the fact that the user moved closer to the camera (typically, the camera is on the monitor, so this also means that the user moved closer to the scene view).
  • This autozoom is used, in one aspect, to make the scene view appear as if the user is "creeping" into the scene. By moving the scene in and out, the user will perceive that he is moving in or out of the scene view.
  • a camera images an object held by the user.
  • the object has a well-defined shape.
  • the system images the object and determines difference information corresponding to movement of the object.
  • rotating the object upside down results in difference information that is upside down; and then the scene view inverts by operation of the system.
  • twisting of the object rotates the scene view left or right, or rotates the scene in the direction of the twisting.
  • two cameras image the user: one camera pointed at the front of the user's face or hand, and the other pointed down at the top of the user's head or hand.
  • the front facing camera is used to detect rotational and linear translation in the up-down and left-right directions.
  • the top viewing camera determines front-back and left-right translation.
  • the front-back translation observed by the top camera is used to control forward and back motion in the user's 3-D view.
  • the top-sensed left-right translation controls the user's left-right slide, or strafe.
  • the top-sensed left-right motion is removed from the front view left-right translation, with the remaining front view measure representative of left-right twist. All of the front view up-down translation can be interpreted as up-down twist (a sketch of this decomposition follows below).
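Arithmetically, the decomposition above subtracts the top camera's left-right term from the front camera's left-right measurement to isolate twist, and reads the remaining axes directly. A minimal sketch with illustrative names:

```python
def decompose_motion(front_lr, front_ud, top_fb, top_lr):
    """Combine front-camera and top-camera shifts into game-control axes.

    front_lr, front_ud: left-right and up-down shifts seen by the front camera.
    top_fb, top_lr:     front-back and left-right shifts seen by the top camera.
    """
    return {
        "forward_back": top_fb,            # drives forward/back motion in the 3-D view
        "strafe": top_lr,                  # left-right slide (strafe)
        "twist_lr": front_lr - top_lr,     # front view left-right minus the slide term
        "twist_ud": front_ud,              # all front-view up-down read as up-down twist
    }

# Example: the user slides right 3 units while also twisting; the front camera sees
# 5 units of left-right shift, so 5 - 3 = 2 units are attributed to twist.
print(decompose_motion(front_lr=5, front_ud=1, top_fb=0, top_lr=3))
```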
  • Figure 1 illustrates one human computer interface system constructed according to the invention
  • Figure 1A shows an exemplary computer display illustrating cursor movement made through the system of Figure 1;
  • Figure 1B illustrates overlayed scene views, displayed in two moments of time on the display in Figure 1, of a shifting scene made in response to user movement captured by the camera of Figure 1;
  • Figure 1C shows an illustrative frame of data taken by the system of Figure 1;
  • Figure 2 illustrates selected functions for a printed circuit card used in the system of Figure 1;
  • Figure 3 illustrates an algorithm block diagram that preferably operates with the system of Figure 1;
  • Figure 4 illustrates one preferred algorithm process used in accord with the invention to determine and quantify body motion
  • Figure 5 shows one process of the invention for communicating body motion data to a host processor, in accord with the invention, for augmented control of cursor position or scene view;
  • Figure 5A shows a representative frame of data of a user taken by a camera of the invention, and further illustrates adding symbols to key body parts to facilitate processing;
  • Figure 6 illustrates a two camera imaging system for implementing the teachings of the invention
  • Figure 7 illustrates two positions of a user as captured by a camera of the invention
  • Figure 7A illustrates two positions of a scene view on a display as repositioned in response to movement of the user illustrated in Figure 7;
  • Figure 8 illustrates motion of a user - and specifically twisting of the user's head - as captured by a system of the invention
  • Figure 8A illustrates a first scene view corresponding to a representative computer display before the twisting
  • Figure 8B illustrates a second scene view corresponding to a rotation of the first scene view in response to the twisting by the user
  • Figure 8C shows processing features of the processing section of Figure 8
  • Figure 8D illustrates multiple image frames stored in memory for matched filtering with raw images acquired by the system of Figure 8;
  • Figure 9 illustrates a two camera system of the invention for collecting N zones of user movement and for repositioning the cursor or scene view as a function of the N movements;
  • Figure 9A illustrates a representative thermal image captured by the system of Figure 9;
  • Figure 9B illustrates process methodology for processing thermal images as a real time input to game processing speed, in accord with the invention;
  • Figure 10 illustrates another two camera system of the invention for targeting multiple image movement zones on a user, and further illustrating optional DSP processing at the camera section;
  • Figure 11 illustrates framing multiple movement zones with a single imaging array, in accord with the invention
  • Figure 12 illustrates framing a user's eyes in accord with the invention
  • Figure 12A shows a representative image frame of a user's eyes
  • Figure 13 illustrates one system of the invention, including zoom, neural nets, and autofocus to facilitate image capture;
  • Figures 14, 14A and 14B illustrate autofocus motion control in accord with the invention
  • Figures 15 and 15A illustrate one other motion detect system algorithm utilizing edge detection, in accord with the invention
  • Figure 16 illustrates one other motion detect system algorithm utilizing well-characterized object manipulations, in accord with the invention
  • Figure 17 illustrates one other motion detect system algorithm utilizing varied body motions, in accord with the invention.
  • Figure 18 illustrates a two camera system of the invention with one camera observing the user's face while the other observes the top of the user's head;
  • Figure 19 shows a blink detect system of the invention.
  • Figure 20 shows a re-calibration system constructed according to the invention.
  • Figure 1 illustrates, in a top view, certain major components of a human computer interface system 10 of the invention.
  • a user 12 of the system 10 sits facing a computer monitor 14 with display 14a.
  • a camera 16 is mounted on the computer monitor 14 facing the user 12.
  • the camera 16 is mounted in such a way that the user's face 12a is imaged within the camera's field of view 16a.
  • the camera 16 can alternatively image other locations, such as the user's hand, eyes, or other objects; so imaging of the user's face, in Figure 1, must be considered illustrative, rather than limiting.
  • the camera location can also reside at places other than on top of the monitor 14.
  • the camera 16 interfaces with a printed circuit card 18 mounted within the user's computer chassis 20 (which connects with the monitor 14 by common cabling 20a).
  • the camera 16 interfaces to the printed circuit card 18 via a camera interface cable 22.
  • the circuit card 18 also has processing section 18a, such as a digital signal processing (“DSP”) chip and software, to process images from the camera 16.
  • the camera 16 and card 18 capture frames of image data corresponding to user movement 25.
  • the processing section 18a algorithmically processes the image data to quantify that movement 25; and then communicates this information to the host processor 30 within the computer 20.
  • the host processor 30 then commands movement of the computer cursor in a corresponding movement 25a, Figure 1A (Figure 1A illustrates a representative front view of the display 14a, and also illustrates movement 25a of the cursor 26 moving within the display 14a in response to user movement 25).
  • Figure 1B illustrates an alternative (or supplemental) process whereby the scene view shifts in response to user movement 25.
  • Figure 1B illustrates a first scene view 35a, which generally corresponds to a forest prior to the user's movement 25; and an overlayed scene view 35b (shown in dotted line, for purposes of illustration) that is shifted by an amount 37 in response to the user's movement 25.
  • the shift 37 in the scene view 35 is accomplished by combined operation and processing of the processing section 18a and host CPU 30.
  • Figure 1C shows a representative frame 41 of data 43 as taken by the camera 16.
  • data 43 represents the user's face 12a taken at a given moment of time.
  • Subsequent frames are used to determine user motion 25 relative to the frame 41, as discussed herein.
  • the frame 41 is made up of the plurality of pixel data 45, as known in the art.
  • Figure 2 illustrates certain functions processed within the printed circuit board 18 of Figure 1.
  • a camera interface circuit 50 receives video data from the camera 16 through interface cable 22.
  • This video data can be RS170 format or digital, for example.
  • circuit 50 decodes the analog video data to determine video timing signals embedded in the analog data. These timing signals are used for control of the analog-to-digital (A/D) converter included in circuit 50 that converts analog pixel data into digital images.
  • the analog data is digitized to 6 bits, though a greater number of bits may be acceptable and/or required for the features discussed herein.
  • camera interface 50 accepts the digital data without additional quantization, although interface 50 can digitally pre-process the digital images if desired to acquire desired image features.
  • the frame difference electronics 52 receives digital data from the camera interface circuit 50.
  • the frame difference electronics 52 include a multiple frame memory, a subtraction circuit and a state machine controller/memory addresser to control data flow.
  • the frame memory holds previously digitized frame data.
  • the preferred implementation uses the frame just previous to the current frame, though an older frame which resides in the frame memory may be used.
  • the resulting difference is output to an N-frame video memory 54.
  • the new frame pixel data is then stored into the frame memory of the frame difference electronics 52.
  • the N frame video memory electronics 54 either receives differenced frames output by the frame difference electronics 52 (discussed above) or raw digitized frames from the camera interface 50. The choice of where the data derives from is made by software resident on the DSP 56.
  • the frame video memory 54 is sized to hold more than one full frame of video and up to N frames. The number of frames N is driven by hardware and software design.
  • the DSP 56 implements an algorithm discussed below. This algorithm determines the rate of head motion of the user in two dimensions.
  • the digital signal processor 56 also detects the eye blink of the user in order to emulate the click and double click action of a standard mouse button. In support of these functions, the DSP 56 commands the N frame video memory 54 to supply either the differenced frames or the raw digitized frames.
  • the digital signal processor thus preferably utilizes a supporting program memory 58 made up of electrically reprogrammable memory (EPROM) and data memory 59 including standard volatile random access memory (RAM).
  • the DSP 56 also interfaces to the PCI bus interface electronics 60 through which cursor and button emulation is passed to the user's main processor (e.g., the CPU 30, Figure 1).
  • the PCI interface 60 also passes raw digitized video to the main processor as an optional feature. Interface 60 also permits reprogramming of program memory 58, to allow for future software upgrades permitting additional features and performance.
  • the PCI interface electronics 60 thus provides an industry standard bus interface supporting the aforementioned communication path between the printed circuit card 18 and the user's main processor 30.
  • the printed circuit card 18 and camera 16 can provide compressed video to the user's main processor 30.
  • This compressed video supports using the system 10 in teleconferencing applications, providing dual use as either a human computer interface system 10 and/or a teleconferencing system in an economical solution to two distinct applications.
  • Figure 3 describes one preferred head motion block diagram algorithm 70 used in accord with the invention. Not all of the functions shown in Figure 3 are implemented in software in the DSP 56. This algorithm relies on the correlation of images from one frame to the next, and particularly relies on the use of frame-differenced images in the correlation process.
  • the frame differencing operation removes parts of the camera images that are unchanged from the previous frame. For example, room background (such as object 13, Figure 1) behind the user 12 is removed from the image. This greatly simplifies detection of feature motion. Even the user's face consists of regions of uniform illumination, such that even with facial motion these uniform regions (i.e., cheeks, forehead, chin) may also be removed.
  • the user's face 12a also contains dynamic features such as the nose, eyes, eyebrows and mouth, each of which typically has enough spatial detail to be evident in the differenced image. As the user moves his face with respect to room lighting, the shape and distribution of these features will change; but the frame rate of the camera 16 ensures that these features look similar from one frame to the next. The correlation process therefore operates to determine how these differenced features move from one frame to the next in order to determine user head motion 25.
  • the algorithm of block diagram 70, Figure 3 receives video images 72 of the user as imaged by camera 16 over time. Each received image is passed to both a frame memory 74 and a differencer 76. Though the preferred embodiment is to buffer a single frame in memory 74, the memory 74 may optionally store many frames, buffered such that the first frame input is the first frame output. The delayed frame is read from the frame memory 74 and subtracted from the current frame using the differencer 76. Frame output from the differencer 76 is provided to both a correlation process 78 and a difference frame memory 80.
  • frame memory 80 utilizes a single difference frame; however the difference frame memory 80 can hold many difference frames in sequence for a finite time period.
  • the delayed difference frame is read from the difference frame memory and provided to the correlation function 78. Difference frames are preferably selectable by system algorithms.
  • the correlation process 78 determines the best combination of row and column shifts in order to minimize the difference between the current difference frame and the delayed difference frame. The number of rows and columns required to align these difference images provides information as to the user's motion.
  • the best-fit function algorithm 82 determines the row and column shift to provide optimal alignment.
  • the best-fit function can consist of a peak detect algorithm. This algorithm may either be implemented in hardware or in software.
  • the best-fit function algorithm determines relative motion in rows and columns of the observed user's features.
  • the cursor update compute function algorithm 84 translates this measured motion into the position change required of the cursor (e.g., the cursor 26, Figure 1A). Typically, this is a non-linear process in which greater head motion moves the cursor a disproportionately greater distance. For example, a 1-pixel user motion can cause the cursor to move one screen pixel while a 10-pixel user motion may cause a 100-pixel screen cursor motion. However, these magnifications can be adjusted for the desired result (a simple gain curve of this kind is sketched below).
  • This algorithm may either be implemented in hardware or in software such as through an ASIC or FPGA.
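A minimal sketch of a non-linear gain of the kind described above; the power-law curve is an assumption chosen only because it reproduces the 1-pixel to 1-pixel and 10-pixel to 100-pixel example in the text:

```python
def cursor_step(user_motion_pixels, exponent=2.0):
    """Map measured user motion to cursor motion non-proportionally.

    With exponent=2, a 1-pixel user motion moves the cursor 1 screen pixel,
    while a 10-pixel user motion moves it 100 screen pixels, as in the text.
    """
    sign = 1 if user_motion_pixels >= 0 else -1
    return sign * int(round(abs(user_motion_pixels) ** exponent))

print(cursor_step(1))    # 1
print(cursor_step(10))   # 100
print(cursor_step(-4))   # -16
```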
  • Video cursor control 86 provides a user interface to enable and disable the operation of cursor control described above. This control is implemented, for example, through a combination of keystrokes on the user's keyboard (for example as connected to the host computer 20, Figure 1). Alternatively, cursor control is activated or deactivated by sensing the eye-blink of the user (or some other predetermined movement). In this alternative embodiment, an output signal 85 from the correlation section 78 is sent to the video enable section 86; and the output signal 85 corresponds to blink data from the user's face 12a ( Figure 1A). In another embodiment, the video cursor control section 86 activates or deactivates cursor control by recognizing voice commands.
  • a microphone 87 detects the user's voice and a voice recognition section 89 converts the voice to certain activate or deactivate signals.
  • the section 89 can be set to respond to "activate” as a voice command that will enable cursor control; and "deactivate” as a command that disables cursor control.
  • the functionality of the video cursor control 86 provides the user with the equivalent of a mouse pick-up, put-down action. After moving the cursor from left to right across the screen, the user de-activates motion-based cursor control in order to move his head back to the left; once the head is recentered, the user re-activates cursor control and continues to move the cursor about the screen.
  • the activation/deactivation of the mouse input is represented by the switch 90, such that the open position of the switch disables human motion control of the cursor and supplies a zero change input to the summation operation 92 in such conditions.
  • control of scene view may also be implemented by an algorithm such as shown in Figure 3. Specifically, a similar algorithm can provide movement of the current scene view, in accord with the invention.
  • the result of the cursor update compute function 84 is added to the known current cursor position at the summing operation 92.
  • This summation has an x component and a y component.
  • the result of the summation 92 is used to update the cursor position (or scene view) on the user's screen via the user's operating system. Cursor position may thus be controlled by both user motion as well as the motion imparted by another input device such as a standard computer mouse.
  • Figure 4 provides a detailed description of the preferred implementation of the algorithm described in functions 74, 76, 78, 80 and 82 of Figure 3.
  • Video data is received by the processing electronics in both a single frame memory 100 and a differencer 102.
  • the output of the frame memory 100 is also provided to the differencer 102 such that the previous frame is subtracted from the current frame.
  • This differenced frame is then processed by a two dimensional FFT 104.
  • the complex result of the FFT 104 is provided to a complex multiplier 106 and a complex memory 108.
  • the complex memory 108 is the size of the processed image, each location containing both a real and imaginary component of a complex number.
  • the previous FFT result contained in the complex memory 108 is provided to the conjugate operation 110.
  • the complex conjugate of each element is computed and provided to the complex multiplier 106. In this manner, the FFT of the previous frame difference is conjugated and multiplied against the FFT of the current difference image.
  • item 76 has similar functionality to item 102; item 78 has similar functionality to items 104, 106, 108, 110, 112; item 80 has similar functionality to item 108; and item 82 has similar functionality to item 114.
  • the two dimensional array of complex products output by the complex multiplier 106 is provided to a two dimensional inverse FFT operation 112. This operation creates an image of the correlation function between the latest pair of difference images.
  • the correlation image is processed by the peak detection function 114 in order to determine the shift required in aligning the two difference images.
  • the x-y magnitude of this shift is representative of the user's motion. This x-y magnitude is provided to the software used to update the cursor position as described in Figure 3.
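The Figure 4 pipeline (difference, FFT, conjugate multiply, inverse FFT, peak detect) amounts to cross-correlating successive difference frames and locating the correlation peak. A compact numpy sketch of that structure follows; the function name and the way state is passed between calls are illustrative, not the patent's implementation.

```python
import numpy as np

def motion_from_frames(prev_frame, cur_frame, prev_diff_fft):
    """One step of the Figure-4 style pipeline.

    Returns (row_shift, col_shift, cur_diff_fft); feed cur_diff_fft back in
    as prev_diff_fft on the next call.
    """
    diff = cur_frame.astype(np.float64) - prev_frame.astype(np.float64)  # differencer
    diff_fft = np.fft.fft2(diff)                                         # 2-D FFT
    if prev_diff_fft is None:
        return 0, 0, diff_fft

    # Conjugate-multiply the previous difference spectrum with the current one,
    # then inverse-transform to obtain the cross-correlation surface.
    corr = np.real(np.fft.ifft2(diff_fft * np.conj(prev_diff_fft)))

    # Peak detection: the location of the correlation peak gives the shift.
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    rows, cols = corr.shape
    row_shift = peak[0] if peak[0] <= rows // 2 else peak[0] - rows  # wrap negative shifts
    col_shift = peak[1] if peak[1] <= cols // 2 else peak[1] - cols
    return row_shift, col_shift, diff_fft
```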
  • Figure 5 shows an algorithm process 130 of the invention which applies motion correlation operations over sub-frames of the video image.
  • This allows motions of various body parts to convey input with specialized meaning to applications operating on the host computer.
  • motion of the hands, arms and legs provide for greater degrees of freedom for the user to interact with the host application (e.g., a game).
  • Commands of this type are useful in combative games where computer animated opponents fight under control of the user.
  • the hand, arm and leg motions of the user become punch, chop and kick commands to the computer after process 130.
  • This command mode can also be used in situations where the user does not have ready access to a keyboard, to augment cursor control of the previously described head position correlator.
  • Process 130 identifies the functions required to derive commands from general motions of the user's body.
  • the scene analyzer function 132 receives digitized video frames from the camera (e.g., the camera 16 of Figure 1) and identifies sub-frames within the video for tracking various parts of the user's body.
  • the frame difference function 134 and correlator function 136 provide similar functions as processes 74, 76 and 78 of Figure 3.
  • the correlation analyzer 138 receives correlated difference frames from the correlator function 136 and sub-frame definitions from the scene analyzer 132.
  • the correlation analyzer 138 applies a peak detection function to each sub-frame to identify the shift required to achieve best alignment of the two images.
  • the motion interpreter 140 receives motion vectors for each sub-frame from the correlation analyzer 138.
  • the motion interpreter 140 links the motion vector from each sub-frame with a particular body segment and passes this information onto the host interface 142.
  • the host interface 142 provides for communication with the host processor (e.g., CPU 30, Figure 1). It sends data packets to the host to identify detected body motions, their directions and their amplitudes.
  • the host interface 142 also receives instruction from the host as to which body segments to track, which it then passes along to the motion interpreter 140 and the scene analyzer 132.
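A minimal sketch of the sub-frame bookkeeping described in the items above: each tracked body segment has a rectangular sub-frame, motion is estimated within each sub-frame independently, and the results are packaged per segment for the host interface. The dictionary layout and the pluggable shift estimator (e.g., a correlator like the Figure 4 sketch earlier) are assumptions.

```python
import numpy as np

def subframe_motion(prev_frame, cur_frame, subframes, estimate_shift):
    """Estimate a motion vector for each tracked body segment.

    subframes: dict mapping segment name -> (row0, row1, col0, col1) sub-frame bounds.
    estimate_shift: function(prev_patch, cur_patch) -> (row_shift, col_shift),
                    e.g. a correlator like the Figure-4 sketch above.
    Returns a dict of data packets suitable for passing to the host interface.
    """
    packets = {}
    for segment, (r0, r1, c0, c1) in subframes.items():
        prev_patch = prev_frame[r0:r1, c0:c1]
        cur_patch = cur_frame[r0:r1, c0:c1]
        dr, dc = estimate_shift(prev_patch, cur_patch)
        packets[segment] = {
            "segment": segment,                       # e.g. "left_hand", "head"
            "direction": (np.sign(dr), np.sign(dc)),  # coarse direction of motion
            "amplitude": float(np.hypot(dr, dc)),     # magnitude of the motion vector
        }
    return packets
```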
  • the scene analyzer 132 first identifies the location of the user's body in the image and locates the position of various parts of the user's body such as hands, forearms, head, and legs.
  • the techniques and methods used to identify the user's body location and body part positions can be accomplished using techniques well known to those skilled in the art (by way of example, via matched filtering).
  • Body identification can also be augmented by marking different locations on the user's body with unique visual symbols. Unique symbols are assigned to key body joints such as elbows, shoulders, hands, neck, knees, and waist and are mounted on the body. See for example Figure 5A.
  • Figure 5A illustrates one frame 149 of data of an image of the user 150 as taken by a camera of the invention.
  • the image corresponds to a full body image of the user 150, including arms 151, legs 152, elbows 151a, hands 153, head 154, neck 155, ears 156, and forehead 157.
  • These parts 151-157 are identified by processes of the invention (e.g., spatial location in the image, by matched filtering or other image recognition technique), and the image is preferably marked with unique symbols (e.g., "X” for center of the face, "Y” for center of the hand 153, "T” for center of the user's foot, "Z” for body center, and "F” for forehead 157).
  • process 130 locates various body parts and preferably marks them with symbols to fill in connecting logic (e.g., the left wrist and left elbow symbol identify the location of the left forearm).
  • sub-frames surrounding each of the body segments identified by the host processor are generated.
  • a sub-frame is a generally regularly shaped region within the image that surrounds a particular body part.
  • the sub-frames are sized to center the subject body part in the sub-frame and to provide enough room around the body part to accommodate typical body motions.
  • One sub-frame 160 is shown in frame 149, Figure 5A, surrounding the user's foot "T".
  • the scene analyzer 132 will generally not operate on each frame of video since continuously changing the sub-frames adds unnecessary complication to the correlation analyzer 138. Instead, the scene analyzer 132 runs as a background process, updating the sub-frame locations periodically.
  • Figure 4 provides a detailed description of one algorithm which can be used to implement processes 134-138 of Figure 5.
  • the invention of one embodiment can thus track the motion of the user's body using symbols attached to key joints.
  • the position of the user's left lower arm, for example, can be determined by locating the unique symbols for the left hand and the left elbow.
  • unique symbols thus allow the processor to rapidly locate the relevant body parts. The algorithm (e.g., Figure 4) compares the position of those body parts in consecutive frames and determines how they moved (for example, using geometry). Once motion is determined, it is then passed to the host CPU where the motion is acted on as appropriate for the particular application.
  • Figure 6 illustrates a two camera system 200, constructed according to the invention.
  • the cameras 202a, 202b are arranged to view separate parts of the user: camera 202a images the user's face 204; and camera 202b images the user's hand 206.
  • the cameras 202 conveniently rest on top of the computer display 208 coupled to the host computer 210 by cabling 216.
  • the cameras 202 couple to the signal processing card 212 residing within the computer 210 by cabling 213.
  • motion of the user's head 204 and/or hand 206 is detected by the signal processing card 212, and difference information is communicated to the computer's CPU 210a via the computer bus 214.
  • This difference information corresponds to composite movement of the head 204 and hand 206; and is used by the CPU 210a to command movement of display items on the display 208 (for example, the display items can include the cursor or scene view as shown on the display 208 to the user).
  • Information shown on the display 208 is communicated from the computer 210 to the display 208 along standard cable 216.
  • Figures 7 and 7A illustrate how motion of a user's head is, for example, translated to motion of the cursor and/or scene view, in accord with the invention.
  • Figure 7 shows a representative image 220 of a user captured within a frame of data by a camera of the invention.
  • Figure 7 also shows a representative image 222 (in dotted outline, for clarity of illustration) of the user in a subsequent frame of data, indicating that the user moved "M" inches.
  • Figure 7A illustrates corresponding scene views on a computer display 224 that is coupled to processing algorithms of the invention (i.e., within a system that includes a camera that captures the images 220, 222 of Figure 7).
  • the display 224 illustratively shows a scene view that includes a road 224a that extends off into the distance, and a house 224b adjacent to the road 224a.
  • a computer cursor 224c is also illustratively shown on the display 224 as such a cursor is common even within computer games, providing a place for the user to select items (such as the road or house 224a, 224b) within the display 224.
  • the display 224 also shows, with dotted outlines 226, the scene view of road and house which are shown on the display 224 after motion by the user from 220 to 222, Figure 7 (the cursor 224c is for example repositioned to position 224c').
  • the repositioning of the scene view from 224a, 224b to 226 occurs immediately (typically less than 1/30 second, depending upon the camera) after the movement of the user of Figure 7 from 220 to 222.
  • the scene view is repositioned by x-pixels on the display 224, so that M/x corresponds to the magnification between user movement and scene view repositioning.
  • This magnification can be set by parameters within the system; and can also be set by the user, if desired, at the computer keyboard.
  • the rate at which the scene view moves the distance of x-pixels preferably occurs at the same rate as the rate of travel along distance M.
  • the magnification can be dependent on the rate of motion such that a larger displacement of x-pixels will occur for a given motion M if the rate of change of M is larger.
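A minimal sketch of the rate-dependent magnification just described: the scene shift in screen pixels grows with the speed of the user's motion M, not only its size. The gain constants are illustrative assumptions, not values from the patent.

```python
def scene_shift_pixels(m_inches, dt_seconds, base_gain=20.0, rate_gain=0.5):
    """Convert user displacement M (inches) over dt seconds into a scene shift x (pixels).

    The magnification x/M is base_gain plus a term proportional to the rate M/dt,
    so the same displacement produces a larger shift when performed quickly.
    """
    rate = abs(m_inches) / dt_seconds           # inches per second
    magnification = base_gain + rate_gain * rate
    return m_inches * magnification

# The same 2-inch movement shifts the scene farther when it takes 0.2 s than 1.0 s.
print(scene_shift_pixels(2.0, 1.0))   # 42.0 pixels
print(scene_shift_pixels(2.0, 0.2))   # 50.0 pixels
```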
  • Figure 8 illustrates a further motion that can be captured by a camera of the invention and processed to reposition a scene view, as shown in Figures 8A and 8B. More particularly, Figure 8 illustrates a camera 250 connected to a processing section 252 which converts user motion 254 to a corresponding repositioning of the computer scene view. As above, the user 256 is captured by the camera's field of view 258 and frames of data are captured by the processing section 252. In Figure 8, motion 254 corresponds to a twisting of the user's head; and processing section 252 detects this twisting and provides repositioning information to the host computer (not shown). Processing section 252 can also incorporate head-translation motion (e.g., illustrated in Figure 15A) into the scene view movement above; and can similarly reject translational movement, if desired, so that no scene motion is observed for translation of the user 256.
  • Figure 8A shows a representative scene view 260 on a display 262 coupled to the host computer.
  • Figure 8B illustrates repositioning of the scene view 260' after the processing section 252 detects motion 254 and updates the host computer with difference information (e.g., that information which the host computer uses to rotate or translate the scene view).
  • Figure 8A also illustrates the intent of the rotating scene view feature.
  • a person 260a is shown in the scene view 260, except that the person 260a is almost completely obscured by the edge 262a of the display 262.
  • when the user twists his head (motion 254), the scene view 260 is rotated in the corresponding direction, as shown by scene view 260' in Figure 8B, so that the person 260a' is completely visible within the scene view 260'.
  • Figure 8C illustrates further detail of the processing section 252.
  • Camera data such as frames of images of a user are input to the section 252 at data port 266.
  • the data are conditioned in the image conditioning section 268 (for example, to reduce correlated noise or other image artifacts).
  • the camera data is compared and correlated in the image correlation section 270, which compares the present frame image with a series of stored images from the image memory 272.
  • the present data image frame 249 is cross-correlated with each of the images within the image memory 272 to find a match.
  • These images correspond to a series of images of the user in known positions, as illustrated in Figure 8D.
  • various images are stored representing various known positions of the relevant part, here the user's head 256.
  • when the user faces the camera directly, the 0° stored memory image would provide the greatest cross-correlation value, indicating a matched image position. Accordingly, the scene view would adjust to a zero position. If, however, the image correlated to a -90° position, the scene would rotate to such a position. Other movements cause additional scene view motions, including tilt and tip of the head, as shown in the "0°, Down 45°" and "0°, Up 45°" images. These images cause the scene view to tilt down or up when the processing section 252 correlates the current frame to these images. As indicated, these images have no left or right component, though other images (not shown) can certainly include left, right and tip motion simultaneously.
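  • A minimal C++ sketch of this matching step follows, assuming the stored pose images and the current frame are available as equally sized grayscale arrays; the zero-mean correlation score and the names used are illustrative simplifications of the cross-correlation described above.

#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

struct StoredPose {
    std::string label;          // e.g., "0 deg", "-90 deg", "0 deg, Up 45"
    std::vector<float> pixels;  // reference image of the user in that pose
};

// Zero-mean correlation score between two equally sized images.
double correlate(const std::vector<float>& a, const std::vector<float>& b)
{
    const double meanA = std::accumulate(a.begin(), a.end(), 0.0) / a.size();
    const double meanB = std::accumulate(b.begin(), b.end(), 0.0) / b.size();
    double score = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        score += (a[i] - meanA) * (b[i] - meanB);
    return score;
}

// Returns the label of the stored pose that best matches the current frame;
// the caller would then rotate or tilt the scene view toward that pose.
std::string bestPose(const std::vector<float>& frame,
                     const std::vector<StoredPose>& memory)
{
    std::string best;
    double bestScore = -1e30;
    for (const StoredPose& p : memory) {
        const double s = correlate(frame, p.pixels);
        if (s > bestScore) { bestScore = s; best = p.label; }
    }
    return best;
}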
  • Figure 9 shows a system 300 constructed according to the invention and including a camera section 302 including an IR imager 304 and a camera 306, both of which view and capture frames of data from a user 308.
  • the IR imager 304 can include, for example, a microbolometer array (i.e., "uncooled" detectors known in the art) which produces a frame of data corresponding to the infrared energy emitted from the user, such as illustrated in Figure 9A.
  • Figure 9A shows a representative frame of IR image data 310, with zones 312 of relatively hot image data emitted from regions of the forehead, nose and mouth of the user 308.
  • the cameras 304, 306 send image data back to the signal processing section 314.
  • Data from the camera 306 is processed, if desired, as above, to determine difference information signal 322 used by a connected computer to reposition the cursor and/ or scene view.
  • Data from camera 304 is used to evaluate how large (or how hot) the zones 312 appear on the user during computer play.
  • the signal processing section 314 assesses the zones 312 for temperature and/or size over the course of a computer game and generates a "game speed control" signal 320 which is communicated to the user's computer (i.e., the computer used in conjunction with the system 300 of Figure 9).
  • the user's computer processes the signal 320 to increase or decrease the speed of a computer game in process on the computer.
  • the IR camera 304 can be used without the features of the invention which assess user movement. Rather, this aspect should be considered stand-alone, if desired, to provide active feedback into gaming speed based upon user temperature and/ or stress. Note that the camera 304 can also be used to detect heartbeat since the zones 312 generally pulse at the user's heartbeat, so that heartbeat rate can also be considered as a parameter used in the generation of the game speed control signal 320. Alternatively, a pulse rate can be determined by known pulse rate systems that are physically connected to the user 308.
  • An IR lamp 324 can be used in system 300 to illuminate the user 308, with IR radiation 324a, such that sufficient IR illumination reflects off of the user 308 whereby motion control of the cursor and/ or scene view can be made without the additional camera 306.
  • the lamp 324 can be, and preferably is, made integrally with the section 302 to facilitate production packaging.
  • An IR lamp 324 operating in the near-IR can also be used with visible cameras of the invention which typically respond to near-IR wavelengths.
  • certain camera systems now available incorporate six IR emitters around the lens to illuminate the object without distraction to the user who cannot see the near-IR emitted light. Such a camera is suitable for use with the invention.
  • Figure 9B shows process methodology of the invention to process thermal user images in accord with the preferred embodiment of the invention.
  • a system such as system 300 first acquires a thermal image map in process block 326. This image is compared to a reference image ("REF") in process block 327.
  • REF can either be a temperature of the user (i.e., a temperature of one hot spot of a non-stressed user, or the temperature of one hot spot of the user at an initial, pre-game condition) or an amount of the area 312, Figure 9A, of the user in a non-stressed condition or initial pre-game condition.
  • REF can be an image such as the frame 310 of Figure 9 A.
  • when the user heats up during play, the system 300 detects this change and determines whether the image map exceeds the REF condition, as illustrated in process block 328. Should the map exceed the REF condition, the system 300 communicates this to the host processor which in turn adjusts the gaming speed, as desired. If the map does not exceed the REF condition, then the next IR image frame is acquired at block 326.
  • System 300 and the process steps of Figure 9B are thus suitable to adjust gaming speed in real time, depending upon user stress level.
  • the gaming speed is increased automatically such that the image map exceeds the REF signal for greater than about 50% of the time, so that all users, regardless of their ability, are pushed in the particular game.
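  • The following C++ sketch illustrates one way the REF comparison of Figure 9B could drive game speed, under the assumption that REF is a hot-pixel count taken at a pre-game condition; the threshold values and the speed-adjustment steps are invented for the example and are not taken from the invention.

#include <cstddef>
#include <vector>

// Counts pixels whose temperature value exceeds `hotThreshold` (e.g., zones 312).
std::size_t hotArea(const std::vector<float>& irFrame, float hotThreshold)
{
    std::size_t count = 0;
    for (float t : irFrame)
        if (t > hotThreshold) ++count;
    return count;
}

// One pass of the Figure 9B loop: compare the current map against REF and
// nudge the game speed so that the user spends roughly half the time above REF.
double updateGameSpeed(const std::vector<float>& irFrame,
                       std::size_t refHotArea,
                       float hotThreshold,
                       double currentSpeed)
{
    const bool exceedsRef = hotArea(irFrame, hotThreshold) > refHotArea;
    // If the user is not yet stressed relative to REF, speed the game up a
    // little; if already beyond REF, ease off slightly. Step sizes are assumptions.
    if (!exceedsRef)
        return currentSpeed * 1.05;
    return currentSpeed * 0.98;
}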
  • multi-camera embodiments of the invention can be, and preferably are, incorporated into a common housing 338, such as shown in Figure 10.
  • cameras can also be made from detector arrays 340, processing electronics 342, and optics 344.
  • Each camera 340, 342, 344 is constructed to process the correct electromagnetic spectrum, e.g., IR (using, for example, germanium lenses 344 and microbolometer detectors 340).
  • Each camera has its own field of view 350a, 350b and focal distance 352a, 352b to image at least a part of the user. These fields of view 350 can overlap, to view the same area such as the user's face, or they can view separate locations, such as the user's head and hand.
  • Cameras of the invention can also include a DSP section 356 such as described above to process user motion data.
  • the DSP section 356 processes user motion data and sends difference information to the user's host computer.
  • the host computer thereafter repositions the cursor and/ or scene view based upon the difference information so that the user observes corresponding motion on the computer display, as described above. Accordingly, the DSP section need not reside within the computer so long as difference information is isolated and communicated to the host computer CPU.
  • Figure 11 illustrates frame capture by one camera of the invention to isolate zones of imaging according to expected motion patterns.
  • one frame 370 of data for example covers the user's eyes 371, corresponding to one image zone; and another frame 372 of data can cover the user's head 373, corresponding to another image zone.
  • the frames 370, 372 are 64x64 pixels each, or 256x256 (or higher powers of two) to provide FFT capability on the image within the frame.
  • a single camera can however provide both frames 370 and 372, in accord with the invention.
  • a dense CCD detector array (e.g., 480x740 pixels, 1000x1000 pixels, or higher) is used within the camera such that the whole array captures an image frame 376 of data, at least covering the available image format of the computer display 378.
  • a matched filter (or other image-location process) is applied to the frame 376 to locate the center 371a of the user's eyes (in the matched filtering process, an image data set of the user's eyes is stored in memory and correlated with the frame 376 such that a peak correlation is found at position 371a). Thereafter, a 64x64 array of data is centered about the eyes 371 to set the frame 370.
  • for the larger head zone, every other pixel is discarded so that, again, a 64x64 array is set for the frame 372 (alternatively, each adjacent pair of pixels is added and averaged to provide a single number, again reducing the total number of pixels to 64x64).
  • this process is reasonable since the width of the eyes is at least ½ the width of the user's face. Nevertheless, further compression can be obtained by utilizing every third pixel (or averaging three adjacent pixels) to obtain a larger image area in the frame 372. Note that the compression in the width and length dimensions need not be the same.
  • Framing of the information in Figure 11 can occur in several ways. Most cameras image at 30Hz so that image motion is smooth to the human eye. In one embodiment, one frame 370 is taken in between each frame 372, to minimize data throughput and processing; and yet to maintain dual processing of the two zones imaged in Figure 11. Alternatively, both frames 370, 372 are processed concurrently since frame 376 is typically the 30Hz frame.
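  • A C++ sketch of this two-zone framing follows; the brute-force template match, the simple grayscale image type, and the 2x2 averaging are assumptions used to keep the example self-contained, whereas a production system would typically use FFT-based correlation as described elsewhere herein.

#include <cstddef>
#include <utility>
#include <vector>

struct Image {
    std::size_t width = 0, height = 0;
    std::vector<float> px;                          // row-major grayscale
    float at(std::size_t x, std::size_t y) const { return px[y * width + x]; }
};

// Returns the top-left corner of the best template match (exhaustive search).
std::pair<std::size_t, std::size_t> matchTemplate(const Image& frame, const Image& tmpl)
{
    std::pair<std::size_t, std::size_t> best{0, 0};
    double bestScore = -1e30;
    for (std::size_t y = 0; y + tmpl.height <= frame.height; ++y)
        for (std::size_t x = 0; x + tmpl.width <= frame.width; ++x) {
            double score = 0.0;
            for (std::size_t ty = 0; ty < tmpl.height; ++ty)
                for (std::size_t tx = 0; tx < tmpl.width; ++tx)
                    score += frame.at(x + tx, y + ty) * tmpl.at(tx, ty);
            if (score > bestScore) { bestScore = score; best = {x, y}; }
        }
    return best;
}

// Cuts a size x size window (e.g., the 64x64 eye frame 370) at (x0, y0).
Image crop(const Image& frame, std::size_t x0, std::size_t y0, std::size_t size)
{
    Image out{size, size, std::vector<float>(size * size, 0.0f)};
    for (std::size_t y = 0; y < size && y0 + y < frame.height; ++y)
        for (std::size_t x = 0; x < size && x0 + x < frame.width; ++x)
            out.px[y * size + x] = frame.at(x0 + x, y0 + y);
    return out;
}

// Averages each 2x2 block, halving the resolution (e.g., to build the head frame 372).
Image downsample2x(const Image& frame)
{
    Image out{frame.width / 2, frame.height / 2, {}};
    out.px.resize(out.width * out.height);
    for (std::size_t y = 0; y < out.height; ++y)
        for (std::size_t x = 0; x < out.width; ++x)
            out.px[y * out.width + x] = 0.25f * (frame.at(2 * x, 2 * y) + frame.at(2 * x + 1, 2 * y) +
                                                 frame.at(2 * x, 2 * y + 1) + frame.at(2 * x + 1, 2 * y + 1));
    return out;
}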
  • Figure 11 also illustrates how framing can occur around the user's eyes 371 to acquire "blink" information to reset cursor control.
  • a blink detected by the user's eyes in frame 370 can be used to (a) disable or enable control of cursor or scene movement based upon user control, or (b) simulate pick-up and replacement of the computer mouse (i.e., reinitializing movement in a particular direction).
  • a system of the invention can disable human motion following control such as described herein.
  • Blinking can also be used to continue motion in a particular direction. For example, movement of the cursor can be made to follow movement of the user's head, as described above.
  • a blink can thus also serve to let the user reposition his head back to a normal starting position - without moving the cursor - so that further movement in the desired direction can be made.
  • Figure 12 illustrates a similar capture of a user's eyes 400, in accord with the invention.
  • a frame 402 can thus be acquired by a camera of the invention.
  • Figure 12A illustrates further detail of one representative frame 402, illustrating that the user's pupils 404 are also captured.
  • Figures 3 and 4 describe certain algorithms of the invention that are also applicable to motion of the user's pupils 404, as illustrated by left and right motion 406 and up and down motion 408. Accordingly, by zooming in on the user's eyes, another movement zone is created that causes repositioning of the cursor or the scene view based upon the movements 406, 408, much like the head movement described and illustrated in Figures 1-4.
  • Figures 1-4 and 12-12A can be combined within a two zone movement system so that, for example, both head motion and pupil motion can be evaluated for image motion.
  • the cursor and/ or scene view can be repositioned, therefore, based upon movements from both zones.
  • repositioning of items within the display (e.g., the cursor and/or scene view) can then be based upon an evaluation of both zones: if, for example, the user moves his head but not his eyes, he is focussed on the game and intends rotation of the scene view.
  • Other combinations are also possible.
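  • As one non-limiting example of such a combination, the following C++ sketch encodes the head-moves-but-eyes-do-not rule mentioned above; the thresholds, the action names, and the particular rule encoding are illustrative assumptions.

#include <cmath>

enum class Action { None, RotateScene, MoveCursor };

// headDx / eyeDx are per-frame horizontal displacements, in pixels, reported
// by the head zone and the eye (pupil) zone respectively.
Action decide(double headDx, double eyeDx,
              double headThreshold = 2.0, double eyeThreshold = 1.0)
{
    const bool headMoved = std::fabs(headDx) > headThreshold;
    const bool eyesMoved = std::fabs(eyeDx) > eyeThreshold;
    if (headMoved && !eyesMoved)
        return Action::RotateScene;   // user keeps watching the game: rotate the view
    if (!headMoved && eyesMoved)
        return Action::MoveCursor;    // glance without head motion: steer the cursor
    return Action::None;              // ambiguous or idle: leave the display alone
}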
  • Cameras of the invention can also include zoom optics which (a) reduce or enlarge the image frame captured by a particular camera, or which (b) provide autofocus capability.
  • Figure 13 shows one system 430 constructed according to the invention.
  • a camera 432 includes camera electronics 432a and a zoom attachment.
  • the system 430 can isolate the user's eyes, such as described herein, and command the camera, through the image interpretation and feedback electronics 434, to zoom in on the eyes.
  • the feedback electronics can also command motion of the camera to change its boresight alignment (i.e., to change where the camera image is centered) by commanding movement of the camera when resting on one or more linear drives 438, as known in the art.
  • processing section 440 operates to detect user motion and to communicate difference information to the user's computer, as described above.
  • the system 430 of Figure 13 can also be used to process user motion based upon motion towards and away from the camera.
  • Figure 14 illustrates such a system, including a camera 450 with autofocus capability to find the best focus 452 relative to a user 454 within the field of view 456.
  • the camera 450 provides a signal 450a to the image interpretation and feedback electronics 434, Figure 13, which indicates where the user is along the "z" axis from the camera 450 to the user 454.
  • This signal 450a is thus used much like the other motion signals described herein, to move the cursor and/ or scene view in response to such movements.
  • Figure 14A illustrates a representative scene view 462 when, for example, the user is at best focus 452.
  • the scene view 462 includes a house image 464 with a door 465.
  • when the user moves closer to the camera 450, the house and door 464', 465' of the repositioned scene view 462' enlarge.
  • Such a motion might reveal, for example, additional objects within the house, such as illustrated by object 466, Figure 14B.
  • the autofocus feature of the invention provides yet another degree of freedom in motion control, in accord with the invention.
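  • The following C++ sketch shows one way the autofocus distance signal could be mapped to a zoom factor for the scene view; the reference distance and the zoom limits are assumptions chosen for illustration.

#include <algorithm>

// zDistance: current user-to-camera distance reported by the autofocus, in inches.
// zReference: distance at which the scene is shown at unit magnification.
double sceneZoomFactor(double zDistance, double zReference = 24.0,
                       double minZoom = 0.5, double maxZoom = 4.0)
{
    // A closer user gives zReference / zDistance > 1, i.e., an enlarged house and door.
    const double zoom = zReference / std::max(zDistance, 1.0);
    return std::clamp(zoom, minZoom, maxZoom);
}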
  • Image data, manipulation, and human interface control can be improved, over time, by using neural net algorithms.
  • a neural net update section 435 can for example couple to the feedback electronics 434 so as to assimilate movement information and to improve data transmitted to the host computer, over time.
  • Use of neural nets is known in the art.
  • Figure 15 illustrates a frame of data 490 used in accord with the invention to implement a simplified left, right, up, down movement algorithm to control cursor movement and/ or scene view movement.
  • Frame 490 is captured by a camera of the invention; and preferably the camera incorporates autofocus, as described above, to provide a crisp image of the user 492 regardless of her position within the camera's field of view.
  • image frame 490 provides very sharp edges of the user's face, including a left edge 494a, right edge 494b, and chin 494c. These edges need only be approximately vertical or horizontal. Movement of the user results in movement of the edges 494, such as shown in Figure 15A.
  • Figure 15A shows that once such edges are acquired, they conveniently permit subsequent movement analysis and control of scene view and/or cursor position. Specifically, Figure 15A shows movement of the user's "edges" from 494a-c to 494a-c', indicating that the user moved left (as viewed from the camera's position) and that her chin raised slightly, indicating an upward tilt of the head. This information is assessed by the processing sections as discussed above and relayed to the host computer as difference information to augment or provide cursor and/or scene movement in response to the user's movement.
  • edge movements roughly correspond to movement along rows and columns of the detector array.
  • Detected movement from one row to another (or one column to another) can readily be converted to the actual motion of the user using the user's best-focus position and the focal length of the camera's lens. This information may then be used to set the magnification of movement of items in the computer display (e.g., cursor and/or scene view).
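  • A short C++ sketch of this conversion follows, assuming a simple pinhole-camera geometry; the pixel pitch, focal length, viewing distance and display magnification used in the example are illustrative values only.

#include <iostream>

// columnsMoved: how many detector columns the face edge (e.g., 494a) moved between frames.
// pixelPitchMm: physical width of one detector column on the sensor.
// focalLengthMm: focal length of the camera lens.
// userDistanceMm: user-to-camera distance, e.g., taken from the best-focus position.
double userShiftMm(int columnsMoved, double pixelPitchMm,
                   double focalLengthMm, double userDistanceMm)
{
    // Similar triangles: image shift / focal length = object shift / object distance.
    const double imageShiftMm = columnsMoved * pixelPitchMm;
    return imageShiftMm * userDistanceMm / focalLengthMm;
}

int main()
{
    // Example: 3 columns of 0.01 mm pitch, an 8 mm lens, and a user 600 mm away.
    const double shiftMm = userShiftMm(3, 0.01, 8.0, 600.0);
    const double pixelsPerMm = 2.0;   // display magnification chosen by the user
    std::cout << "user moved about " << shiftMm << " mm; shift the scene by "
              << shiftMm * pixelsPerMm << " pixels\n";
}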
  • Figure 16 illustrates an image of one object 500 used in accord with the invention to provide image manipulation in response to motion of the object.
  • the object 500 is held by the user 501 to manipulate motion of his cursor 502 and/ or scene view 504 on his computer display 506.
  • the object 500 is used because it exhibits an optical shape that is easily recognized through image correlation (such as matched filtering).
  • a camera 510 is used to image the object 500; and frames of data are sent to the frame processor 512.
  • the processor 512 determines image position - relative to a starting position - and thereafter communicates difference information to the user's computer 505 along data line 514.
  • the difference information is used by the computer's CPU and operating system to reposition items on the display 506 in response to motion of the object 500.
  • Almost any motion, including rotation, tilting and translation, can be accomplished with the object 500 relative to a start position.
  • This start position can be triggered by the user 501 at the start of a game by commanding that the camera 510 take a reference frame ("REF") that is stored in memory 513.
  • the user 501 commands that REF imagery be taken and stored through the keyboard 505a, connected to the computer 505, which in turn commands the processor 512 and camera 510 to take the reference frame REF.
  • Motion of the object 500 is thus made possible with enhanced accuracy by comparing subsequent frames of the object 500 with REF.
  • motion of rotation, tilt or translation are detected (for example, by using the techniques of Figures 2-4, 8-
  • the techniques of the invention permit control of the scene view and/ or cursor on a computer screen by motion of one or more parts of the user's body. Accordingly, as shown in Figure 17, complete motion of the user 598 can be replicated, in the invention, by correlated motion of an action figure 599 within a game.
  • user 598 is imaged by a camera 602 of the invention; and frames from the camera 602 are processed by process section 604, such as described herein.
  • the user 598 is captured and processed, in digital imagery, and annotated with appropriate user segments, e.g., segments 1-6 indicating the user's hands, feet, head and main body. Motion of the segments 1-6 is communicated to the host computer 606 from the process section 604.
  • the computer's operating system then updates the associated display 608 so that the action figure 599 (corresponding to an action figure within a computer game) moves like user 598. Accordingly, motion of the action figure 599 is commanded by the user 598 by performing stunts (e.g., striking and kicking) that he would like the action figure 599 to perform, such as to knock out an opponent within the display 608.
  • icons can be used to simplify image and motion recognition of user segments such as segments 1-6.
  • if the user wears a star-shaped object on her hand (e.g., on segment 1), that star symbol is more easily recognized by algorithms such as described herein to determine motion.
  • the hand of user 598 can be covered with a glove that has a "+" symbol on the glove. That "+" symbol can be used to more easily interpret user motion as compared to, for example, actually interpreting motion of the user's hand, which is rounded with five fingers.
  • user 598 can wear an article of clothing such as shirt 598a with a "+" symbol 598b; and the invention can be used to track the icon "+" 598b with great precision since it is a relatively easy object to track as compared to actual body parts. It should be apparent to those in the art that icons such as symbol 598b can be painted or pasted onto the individual too, to obtain similar results.
  • Figure 18 illustrates a two camera system 700 used to determine translation and rotation.
  • the forward viewing camera 702 observes the user's face 703 and determines the right-left (Δx1) and up-down (Δy1) translation of the user's face 703.
  • the top viewing camera 704 observes the top of the user's face or head 705 and determines the right-left (Δx2) and forward-backward (Δy2) motion of the user's face or head.
  • the two cameras 702, 704 are each processed through motion sensing algorithms 706 using the teachings above, and results are shown on the computer display 710.
  • the display 710 illustratively shows an image of the user; the image can be, for example, an action figure or other computer object (including the computer cursor), as desired, which follows the tracking motions Δx1, Δy1, Δx2, Δy2.
  • Δy2 can be directly applied to motion control of the user's forward and reverse motion (note that these motions are illustrated within a computer display 710 as processed by algorithms 706).
  • Δx1 can be directly applied to the user's left-right sideways or strafe motion;
  • Δy1 can be directly applied to control the user's up-down viewpoint, each as illustrated on display 710a.
  • the difference between Δx2 and Δx1 can be applied to control the user's left-right turn or viewpoint (a sketch of this combination appears below).
  • the techniques of Figure 18 can be further extended to front, side and top view cameras for complete motion detection.
  • the top camera determines the user's left-right, front-back motion while the front facing camera determines the user's rotational up-down, left-right motion.
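  • The following C++ sketch combines the two cameras' measures into game control axes using the assignments given above (Δx1 to strafe, Δy1 to up-down viewpoint, Δy2 to forward/reverse, and Δx2 - Δx1 to left-right turn); the structure and field names are assumptions for illustration, and the units and scaling are left to the host application.

struct FrontCameraDelta { double dx1, dy1; };   // right-left, up-down of the face
struct TopCameraDelta   { double dx2, dy2; };   // right-left, forward-backward of the head

struct GameControl {
    double strafe;      // left-right sideways motion
    double lookUpDown;  // up-down viewpoint
    double forward;     // forward-reverse motion
    double turn;        // left-right turn or viewpoint
};

GameControl combine(const FrontCameraDelta& f, const TopCameraDelta& t)
{
    GameControl c{};
    c.strafe     = f.dx1;          // direct mapping per the description above
    c.lookUpDown = f.dy1;
    c.forward    = t.dy2;
    c.turn       = t.dx2 - f.dx1;  // rotation left over after removing translation
    return c;
}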
  • Figure 19 describes an algorithm to detect user eye blink.
  • the video imagery is stored into a multiple frame buffer 800.
  • the algorithm selects the current frame and a frame from the frame buffer and differences these frames using the adder 802.
  • the difference frame consists of the pixel by pixel difference of the delayed frame and the current frame.
  • the difference frame includes motion information used by the algorithms of teachings above. It also contains information on the user eye blink.
  • the frames differenced by the adder 802 are separated temporally enough to ensure that one frame contains an image of the user's face with the eyes open and the other contains an image of the user's face with the eyes closed.
  • the difference image contains two strong features, one for each eye. These features are spatially separated by the distance between the user's eyes.
  • the blink detect function 808 inspects the image for this pair of strong features which are aligned horizontally and spaced within an expected distance based on the variation from one human face to another and the variation in seating distance expected from user to user.
  • the recognition of the blink features may be accomplished using a matched filter or by recognition of expected frequency peaks in the frequency domain at the expected spatial frequency for human eye separation.
  • the blink detect function 808 identifies the occurrence of a blink to a controlling function to either disable the cursor motion or take some other action.
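  • A C++ sketch of this blink detection follows, assuming grayscale frames stored as flat arrays; the feature threshold, the expected eye spacing, and the simple two-strongest-pixel search are illustrative stand-ins for the matched-filter or frequency-domain recognition described above.

#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Peak { std::size_t x = 0, y = 0; float value = 0.0f; };

// Pixel-by-pixel absolute difference of two equally sized frames.
std::vector<float> differenceFrame(const std::vector<float>& current,
                                   const std::vector<float>& delayed)
{
    std::vector<float> d(current.size());
    for (std::size_t i = 0; i < current.size(); ++i)
        d[i] = std::fabs(current[i] - delayed[i]);
    return d;
}

// Finds the two largest values in the difference image (ideally one per eye;
// real code would suppress pixels neighboring the first peak).
std::pair<Peak, Peak> twoStrongestFeatures(const std::vector<float>& diff, std::size_t width)
{
    Peak first, second;
    for (std::size_t i = 0; i < diff.size(); ++i) {
        const Peak p{i % width, i / width, diff[i]};
        if (p.value > first.value)       { second = first; first = p; }
        else if (p.value > second.value) { second = p; }
    }
    return {first, second};
}

// A blink is reported when the two features are nearly level and spaced like eyes.
bool blinkDetected(const std::vector<float>& current, const std::vector<float>& delayed,
                   std::size_t width, float featureThreshold = 30.0f,
                   double minEyeSpacing = 20.0, double maxEyeSpacing = 60.0,
                   double maxVerticalOffset = 6.0)
{
    const auto [a, b] = twoStrongestFeatures(differenceFrame(current, delayed), width);
    if (a.value < featureThreshold || b.value < featureThreshold) return false;
    const double dx = std::fabs(double(a.x) - double(b.x));
    const double dy = std::fabs(double(a.y) - double(b.y));
    return dy <= maxVerticalOffset && dx >= minEyeSpacing && dx <= maxEyeSpacing;
}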
  • Figure 20 illustrates a sound re-calibration system 800 constructed according to the invention.
  • a camera 802 is arranged to view a user, a part of a user (e.g., a hand), or an object through the camera's field of view 804.
  • a processing section 806 correlates frames of image data from camera 802 to induce movement of a scene view or cursor on the user's display 810.
  • the scene view or cursor is shown illustratively as a dot 808 on display 810; and movement 812 of the cursor 808 from position 808a to 808b represents a typical movement of the cursor or scene view 808 in response to movement within the field of view 804, as described above.
  • a re-calibration section 816 is used to reset the cursor or scene view 808 back to an initial position 808a, if desired.
  • section 816 is a microphone that responds to sound 818 generated from a sound event 818a, such as a snap of the user's fingers, or a particular word uttered by the user, to generate a signal for processing section 806 along signal line 816a; and section 806 processes the signal to move the cursor or scene view 808 back to original position 808a.
  • re-calibration section 816 can also correspond to a processing section within the processing hardware/ software of system 800 to, for example, respond to the blink of a user's eyes to cause movement of the cursor 808 back to position 808a.
  • the following Matlab source code provides non-limiting computer code suitable for use to control the cursor on a display such as described herein.
  • the Matlab source code thus provides an operational demonstration of the concepts described and claimed herein.
  • the Matlab source code is platform independent and needs only a sequence of input images. It includes a centroid operation on the correlation peak which is not included in the PC version (described below), providing a finer measurement of the motion in the image. More particularly, the centroid operation provides a refinement on locating the correlation peak.
  • the PC code discussed below uses the pixel location nearest the correlation peak, while the centroiding operation improves the resolution of the peak location to levels below a pixel.
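  • The following C++ sketch illustrates such a centroiding refinement, assuming the correlation surface is available as a flat row-major array and the integer peak location is already known; the window size is an assumption.

#include <cstddef>
#include <utility>
#include <vector>

// Intensity-weighted centroid over a small window around the integer peak,
// giving a sub-pixel estimate of the correlation-peak location.
std::pair<double, double> centroidAroundPeak(const std::vector<float>& correlation,
                                             std::size_t width, std::size_t height,
                                             std::size_t peakX, std::size_t peakY,
                                             int halfWindow = 2)
{
    double sum = 0.0, sumX = 0.0, sumY = 0.0;
    for (int dy = -halfWindow; dy <= halfWindow; ++dy)
        for (int dx = -halfWindow; dx <= halfWindow; ++dx) {
            const long x = long(peakX) + dx, y = long(peakY) + dy;
            if (x < 0 || y < 0 || x >= long(width) || y >= long(height)) continue;
            const double v = correlation[std::size_t(y) * width + std::size_t(x)];
            if (v <= 0.0) continue;       // ignore negative lobes of the correlation
            sum += v; sumX += v * x; sumY += v * y;
        }
    if (sum == 0.0) return {double(peakX), double(peakY)};    // fall back to the pixel peak
    return {sumX / sum, sumY / sum};                          // sub-pixel estimate
}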
%
% This following script file reads in a sequence of images of a computer user's face.
% It then processes the image sequence using the methods of difference
% frame correlation processing used for a human-computer interface.
% This code includes a centroiding operation and demonstrates the
% difference frame correlation approach.
  • the following PC source code, labeled videoMouseDlg.doc and videoMouseDSP.doc, provides non-limiting and nearly operable DSP code for control of the cursor, as described herein.
  • the code is not smooth; and there are other files required to compile this code to an executable, as will be apparent to those skilled in the art, including header files (*.h), resource files and compiler directives.
float complexMatrix1[FFTSIZE][FFTSIZEx2];       /* Input matrix */
float complexMatrix2[FFTSIZE][FFTSIZEx2];       /* Input matrix */
float correlationMatrix[FFTSIZE][FFTSIZEx2];
long  previousFrame[FFTSIZE][FFTSIZE];
float *p_localRam, *fPtr1, *fPtr2;
float correlationPeak;
float *block0 = (float *)BLOCK0, *mm1[FFTSIZE], *mm2[FFTSIZE], *mm3[FFTSIZE];

PROCESSINGINFO ImageInfo1;
PROCESSINGINFO ImageInfo2;
LONG lErrorStatus;

long peakRow;
long peakCol;
long row;
long col;
long pixel;
long index1;
long index2;
long index3;

lErrorStatus = P_SUCCESS;
ulPCData = APPLICATION_RUNNING;
lFifoStatus = P_EMPTY;

DDF_ISRSetIIOF0(P_INTERRUPT_USER_MASK, (VOID *) DBU_Appl_Interrupt);
G_lApplUserMaskIntCount = 0;    /* */
DDK_PKTSend(P_PACKET_USER_INTERFACE, &ulValue, 1L, P_WAITFORCOMPLETE);
DDK_PKTInterfaceStatus(P_PACKET_USER_INTERFACE, &lFifoStatus, &lOutputFifoStatus);
DDK_PKTInterfaceStatus(P_PACKET_USER_INTERFACE, &lFifoStatus, &lOutputFifoStatus);
}   /* End while. */

CVideomouseDlg& pcdd = *(reinterpret_cast<CVideomouseDlg*>(pclass));
CString dataString;

DPK_XCCPushOpcode(P_ID_USER_FUNCTION1, P_PCOUNT_USER_FUNCTION1);
DPK_XCCPushLong((unsigned long) pcdd.m_inputImageNumber2);    /* 2 */
DPK_XCCPushLong((unsigned long) pcdd.m_inputImageNumber1);    /* 1 */

detectx = detectx - FRAMESIZE;
detecty = FRAMESIZE - detecty;
detecty = -detecty;               // double multiplier
ptCursor.x -= (long) detectx;
ptCursor.y -= (long) detecty;

DPK_XCCSetWaitMode(P_WAIT_COMPLETE);
DPK_EndPCK();
AfxMessageBox("Exited Thread");

DPK_XCCSetWaitMode(P_WAIT_COMPLETE);
DPK_EndPCK();
AfxMessageBox("Exited Thread");

CAboutDlg::CAboutDlg() : CDialog(CAboutDlg::IDD)

CDialog::DoDataExchange(pDX);
//{{AFX_DATA_MAP(CAboutDlg)
//}}AFX_DATA_MAP

CDialog::DoDataExchange(pDX);
//{{AFX_DATA_MAP(CVideomouseDlg)
DDX_Control(pDX, IDC_FRAMENUMBER, m_frameNumber);
DDX_Control(pDX, IDC_AVERAGE, m_average);
//}}AFX_DATA_MAP

ON_WM_PAINT()
ON_WM_QUERYDRAGICON()
ON_BN_CLICKED(IDC_ENABLE, OnEnable)
ON_BN_CLICKED(IDC_STOP, OnStop)
//}}AFX_MSG_MAP

CDialog::OnInitDialog()

// IDM_ABOUTBOX must be in the system command range.
ASSERT((IDM_ABOUTBOX & 0xFFF0) == IDM_ABOUTBOX);
ASSERT(IDM_ABOUTBOX < 0xF000);

CDialog::OnSysCommand(nID, lParam);

CDialog::OnPaint()

DPK_XCCSetWaitMode(P_WAIT_COMPLETE);
m_status = DBF_SetGrabWindow(P_DEFAULT_QGS, 256, 128, 176, 128);

}
m_inputImageNumber2 = m_inputImageNumber1 + 1;

Abstract

A human motion following controller (10) is provided by the invention to augment motion of items (e.g., computer cursor or scene view) shown on a computer display. The display (14) is coupled to the computer (20) which controls positioning of the items through operating system controls. A camera (16) captures frames of data corresponding to a first image of at least part of a user (e.g., eyes, hands) at the computer display. Signal processing electronics (18) coupled to the camera (a) detects differences between successive frames of data corresponding to motion of the first image, and (b) communicates difference information to the computer to reposition display of the items through the operating system controls. The items are thus repositioned on the display by an amount corresponding to the motion of the first image.

Description

HUMAN MOTION FOLLOWING COMPUTER MOUSE AND GAME
CONTROLLER
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Related Applications
This application is a continuing application of commonly-owned and co-pending U.S. provisional application number 60/070,512, filed on January 6, 1998, and U.S. provisional application number 60/100,046, filed on September 11, 1998, each of which is incorporated herein by reference.
Background
The primary human interfaces to today's computer are the keyboard, to enter textual information, and the mouse, to provide control over graphical information. These interfaces help users with word processing, presentation software, computer aided design packages, spreadsheet analyses, and other applications. These interfaces are also widely used for computer gaming entertainment; though they are often augmented or replaced by a joystick.
In daily use of business software applications, control of cursor position on the screen requires that the user remove his/her hand from the keyboard in order to use the standard mechanical mouse. The use of the mouse introduces several issues. In a desk environment, the mouse requires that space be maintained on the desk area. The mouse cord must also remain free from obstruction to facilitate movement. Additionally, the use of the mouse is a major contributing factor to carpal-tunnel syndrome. It would be advantageous, therefore, to find an alternative to the mechanical mouse.
In computer gaming, game complexity generally requires control of the (i) mouse and keyboard, or (ii) joystick and keyboard. Further, gaming applications usually require control in several axes of motion, including forward motion, reverse motion, left turn, right turn, left strafe (slide), right strafe, upward motion, and downward motion. To further complicate game maneuvers and control, many games permit viewing (within the game environment) in directions different from that in which the vehicle (e.g., the car, or person, simulated within the game) is moving, including up, down, left and right. These many complexities of motion in fact increase or modify the complexity and enjoyment of the game.
Nevertheless, these complexities require that the user have utmost dexterity and control of his/her body. One object of the invention, therefore, is to offer alternative approaches to human-computer interfaces for those incapable of using standard devices (e.g., mouse, keyboard and joystick) such as due to disability.
Another object of the invention is to provide an alternative input device for laptop computers. Laptop computers are used in locations which do not allow the use of a mouse, in airplanes or during business meetings in which there is no room to operate the mouse. Through the use of either a clip on camera or a camera built into the laptop display, the laptop user can control the mouse position or use the camera for teleconferencing while on the road.
Other objects of the invention are to replace or augment existing human computer interfaces to facilitate enhanced gaming and/ or control within game environments.
In the prior art, certain systems exist which attempt to reduce the amount of physical interaction required with game controllers. However, such systems are prohibitively expensive to the general public as their costs are driven by techniques and algorithms which detect user head motion based upon a detectable target worn by the user. Other costly and cumbersome systems require the user to wear apparatus which emits or detects a signal. It is thus one other object of this invention to provide a system which detects user motion without the aid or augmentation of artificial devices placed on the user operator.
Another object of the invention is to provide a means of human control of a graphical computer interface through the physical motion of the user in order to control the activity of a cursor in the manner usually accomplished with a computer mouse.
A further object of the invention is to provide additional degrees of freedom in the human computer interface in support of computer games and entertainment software.
Yet another object of the invention is to provide dual use of teleconferencing and video electronics with gaming and computer control systems.
These and other objects will be apparent in the description which follows.
Summary of Invention
As used herein, "cursor" means a computer cursor associated with a computer screen. "Scene view" means the view presented on a computer display to a user. For example, one scene view corresponds to the scene presented to a user during a computer game at any given moment in time. The game might include displaying a scene whereby the user appears to be walking in a forest, and through trees. In another example, a cursor might also be visible in the scene view as a mechanism for the user to select certain events or items on the scene (e.g., to open a door in a game, or to open a folder to access computer files). As used herein, "camera" refers to a solid state instrument used in imaging.
Typically, the camera also includes optical elements which refract light to form an image on the camera's detector elements (typically CCD or CMOS). For example, one camera of the invention derives from a video-conferencing camera used in conjunction with Internet communication.
In one aspect, the invention provides systems and methods to control computer cursor position (or, for example, the scene view or game position as displayed on the computer display) by motion of the user at the computer. A camera rests on or near the computer, or is built into the computer, and connects therewith to collect "frames" of data corresponding to images of the user. These images provide information about user motion, over time. Software within the computer assesses these frames and algorithmically adjusts cursor motion (or scene view, or mouse button, or some other operation of the computer) based upon this motion. The motion may be imparted by up-down or left-right motion of the user's head, by the user's hands, or by other motions presented to the video camera (such as discussed herein). In one aspect, a close up view of the user's facial features is used to impart a translation in the cursor (or scene view) even though the features in fact rotate with the user's head. In yet another aspect, the rotation is used to generate a corresponding rotation in computer game scene imagery.
In one aspect, the invention also provides a human factors approach to cursor movement in which the user's rate of motion determines the relative motion of the cursor (or scene view). By way of example, the faster the user's head travels over a set distance, the further the corresponding cursor movement over the same time period.
In other aspects of the invention, the camera is either (a) a visible light camera utilizing ambient lighting conditions or (b) a camera sensitive in another band such as the near infrared ("IR"), the IR, or the ultraviolet ("UV") spectrum. In the latter case (b), the illumination preferably emanates from a source such as an IR lamp which is beyond human sensory perception. The sensor is typically mounted facing the user so as to capture a picture of the user's face in the associated electromagnetic spectrum. The lamp is typically integrated with the camera housing so as to facilitate production and ease of consumer set-up.
In one aspect, a system of the invention provides an IR camera (i.e., a camera which images infrared radiation) to image the user's face and to gauge the user's stress level associated with a game on the computer. As the user's intensity increases (such as in a fast moving computer game using a joystick or the methods discussed herein), the system detects increased heat intensity on the user's face, forehead or other body part by the imagery of the IR camera. This information is fed back into the game processor to provide further enhancement to the game. In this manner, the system gauges the user's reaction to the game and modifies game speed or operation in a meaningful way. For example, suppose such a system determined that a particular user was bored of the present game speed (a determination of boredom can be made by assessing low IR output over large portions of the user's face). The computer processor and game software can then cooperate to increase the gaming speed and thereby increase this particular user's stress. Games of the invention are thus made and sold to users with varying intelligence, age and/or computer familiarity; and yet the system always "pushes the envelope" for any given user so as to make the game as interesting as possible, automatically.
In accord with one aspect of the invention, images captured by the sensor are processed by a digital signal processor ("DSP") located either (a) in a PC card within the host computer or (b) in a housing integrated with the sensor. In case (a), sensor frames are sent to the PC card; and detected user motion (sometimes denoted herein as "difference information") is communicated to the user's operating system via a PCI (or USB or later standard) bus interface. These difference information commands are interpreted by a low overhead program resident at the user's main processor, which either updates the cursor position on the screen or provides motion information to the user's computer game (e.g., so as to change the scene view). In case (b), the DSP is contained within the camera housing; and frames are processed local to the camera to determine difference information. This information is then transmitted to the computer by a cable that connects to a bus port of the computer so that the host processor can make appropriate movements of the cursor or scene view. In another aspect, the DSP is mounted in the camera housing such that the camera/ signal processing subsystem produces signals which emulate the mouse via the mouse input connector.
In an alternative configuration, frames of image data are sent directly to the host computer through the computer bus; and that image data is manipulated by the computer processor directly. With increasing computer processing speed, it is expected that sensor data frames can be sent directly to the host processor for all processing needs, in which case the PC card and/ or separate DSP are not required. Although this is possible today, the update rates are likely too slow for practicality. Once GHz processors are on the market, a separate DSP may no longer be needed.
In one aspect of the invention, pixel format or pixel density of the camera drives the accuracy of the system. Higher pixel density in the image of the user's face, for example, increases the attainable resolution and cursor control (or the attainable control of scene view motion). Camera formats of 240 vertical by 320 horizontal generally provide satisfactory performance. The number of pixels that may be utilized is determined by system cost factors. Greater numbers of pixels require more powerful DSPs (and thus more costly DSPs) in order to process the image sequences in real time. Current technology limits the processing density to a 64x64 window for consumer electronics. As prices are reduced, and power increases, the densities can increase to 128x128, 256x256 and so on. While 64x64 density is satisfactory for general household users, a higher fidelity system using a greater number of pixels is possible, in accord with the invention, for higher end applications at a proportionally higher cost. Non-square pixel formats are also possible in accord with the invention, including a 64x128 detector array size. In one aspect, the data transfer rate from the camera is 30 frames/ second at 240x320 pixels per frame. Assuming eight bits per pixel, the digital data transfer rate is therefore 18.432 megabits/ second. This is a fairly high transfer rate for consumer products using current technology. While the data transfer can be either analog or digital, the preferred method of image data transfer for this aspect is via a standard RS170 analog video interface.
In accord with one aspect, a system of the invention defines two imaging zones (either within a single camera CCD or within multiple CCD cameras housed within a single housing). One imaging zone covers the user's head; and the other covers the user's eyes. This aspect includes processing means to process both zones whereby movement of the user's head provides one mechanism to control cursor movement (or scene view motion), and whereby the user's eyes provide another mechanism to control the movement. In essence, this aspect increases the degrees of freedom in the control decision making of the system. By way of example, a user might look left or right within a game without moving his head; but by assessing movement of the user's eyes (or the pupils of those eyes), the scene view can be made to rotate or translate in the manner desired by the user. Further, a user might move his head for other reasons, and yet not move her eyes from a generally forward looking position; and this aspect can assess both movements (head and eyes) to select the most appropriate movement of the cursor or scene view, if any.
In another aspect, a system of the invention utilizes a camera with zoom optics to define the user's pupil and to make cursor or scene views move according to the pupil. In another aspect, the system incorporates a neural net to "learn" about a user's eye movements so that more accurate movements are made, over time, in response to the user's eye movement.
In still another aspect, a neural net is used to learn about other movements of the user to better specify cursor or scene view movement over time. In yet another aspect of the invention, a system is provided with two CCD arrays (either within a single camera body or within two cameras). The arrays connect with the user's computer by the techniques discussed herein. One CCD array is used to image the user's head; and the other is used to image the user's body. Motion of the user is then evaluated for both head and body movement; and cursor or scene view movement is adjusted based upon both inputs.
In another aspect of the invention, a single CCD is used to image the user. However, alternate frames are zoomed, electronically, so that one frame views the user's head, and the next frame views the user's eyes. With the algorithm discussed herein, these separate frame sequences (one for the eyes, one for the head) are processed separately and evaluated, together, to make the most appropriate cursor or scene view movement. If for example, the system clocks at 30Hz, then one set of frame sequences operates at 15Hz, and the other at 15Hz. However, the advantage is that two movement information sets can be evaluated to invoke an appropriate movement in the cursor or scene view.
Those skilled in the art should appreciate that different frame rates can be used; and frame rates for either sequence (head or eyes) can occur at different rates too. Further, the separate frame sequences can utilize other body parts, e.g., the head and the hand, to have two movement evaluations. Alternatively, a separate camera (or CCD array) can be used to image other body parts, for example one camera for the head and one for the hand.
The invention also provides methods for shifting cursor or scene views in response to user movement. In one aspect, the scene view shifts left or right when the user shifts left and right. In another aspect, the scene view rotates when the user's head rotates. This last aspect can be modified so that such rotation occurs so long as the eyes do not also rotate (in this situation, the user's head rotates, indicating that she wishes the scene view to rotate; but the eyes do not, indicating that she still watches the game in play). In another aspect, the scene view rotates in response to the user's hand rotation (i.e., a camera or at least a CCD array of the system is arranged to view the player's hand).
In another aspect, the invention provides a multi-zone player gaming system whereby the user of a particular computer game can select which zone operates to move the cursor or the scene view. By way of example, the system can include one zone corresponding to a view of the user's head, where frames of data are captured by the system by a camera. Another zone optionally corresponds to the user's hand. Another zone optionally corresponds to the user's eyes. Each zone is covered by a camera, or by a CCD array coupled within the same housing, or by optical zoom zones within a single CCD, or by separate optical elements that image different portions of the CCD array. By way of example, two zones can be covered with a single CCD array (i.e., a camera) when the zones are the user's head and eyes. The camera images the head, for one zone, and images the eyes in another zone, since the zones are optically aligned (or nearly so). However, two cameras (or optionally two CCD arrays with separate optics) can view two zones such as the user's head and the user's hand. Combinations of zones is also possible and envisioned in accord with the invention.
Zones in a single camera can also be identified by the computer by prompting the user for motion from corresponding body parts. For instance, the computer identifies the head zone by prompting the user to move his head. Then the computer identifies the foot zone by having the user move his foot. Once the zones are identified, the motion of each of these individual zones is tracked by the computer, and the regions of interest in the camera image related to the zones are moved as the targets in the zones move with respect to the camera.
In one aspect, the invention provides a system, including a camera and edge detection processing subsystem, which isolates edges of the user's body, for example, the side of the head. These edges are used to move the cursor or scene view. For example, if the left edge of the head is imaged onto column X of one frame of the CCD within the camera, and yet the edge falls in column Y in the next frame, then a corresponding movement of the cursor or scene view is commanded by the system. For example, movement of the edge from one column to the next might correspond to ten screen pixels, or other magnification. In one aspect, this magnification is selected by the user. Up and down motion can also be detected by similar edge detection. For example, by imaging the user's chin, an edge movement in the up or down dimension is formed (e.g., if the bottom edge of the chin moves from one row to the next, in adjacent frames, then a corresponding movement of the cursor or scene view is made - magnification again preferably set manually with a default starting magnification). Other images can also serve to define edges. For example, in one aspect, a user's eyelash can be used to move the cursor (or scene view) up or downwards; though typically the eye blink is used to reset the cursor command cycle.
In one aspect, an optical matched filter is used to center image zones onto the appropriate images. For example, as discussed above, one aspect preferably utilizes 64x64 pixels as the image frame from which cursor motion is determined. Many cameras have, however, many more pixels. These 64x64 arrays are therefore preferably established through matched filtering. By way of example, an image of a standard pair of user's eyes is stored within memory (according to one aspect of the invention). This image field is cross-correlated with frames of data from the actual image from the camera to "center" the image at the desired point. With eyes, specifically, ideally the 64x64 sample array is centered so as to view both eyes within the 64x64 array. Similarly, to process sequences of head data, a standard head image is stored within memory, according to one aspect, and correlated with the actual image to center the head view.
Those skilled in the art should appreciate that an appropriate frame size can be established from an image having more or fewer pixels, by redundantly allocating data into adjacent pixels or by eliminating intermediate pixels, or similar technique. In another aspect, a camera is provided which optically "zooms" to provide optimal imaging for a desired image zone. By way of example, the invention of one aspect takes an image of the user's head, determines the location of the user's eyes (such as by matched filtering), and optically zooms the image through movement of optics to provide an image of the eyes in the desired processing size format.
Many aspects of the invention are preferably enhanced by autofocus.
Specifically, it is often desirable to have a crisp image of the user (or a part of the user, e.g., the user's eyes) in order to accurately process desired cursor or scene view movement. Thus, autofocus capability preferably operates in most of the aspects of the invention where imaging is a feature of the processing.
In one aspect, the camera utilizes a very small aperture which results in a very large depth of field. In such a situation, autofocus is not required or desired. The optical requirements for the lenses are also reduced.
The invention thus provides several advantages over the art. For example, game controllers can now include feedback corresponding to the user's actual movement. By way of another example, if the user moves left or right (or head or hand or eyes move left or right, depending on the image zone), then the cursor (or scene view) can also be set to move left or right. When the user twists her head, for example, the scene view can also be made to rotate, reflecting that movement.
Those skilled in the art should appreciate that the direction in which the scene moves, left or right, is a matter of design choice. That is, certain games might find it desirable to move the opposite direction from what the user moves, to add certain challenges to the game. Further, in other aspects, this direction can change during the game to further complicate game control.
In accord with one aspect of the invention, a processing subsystem (connected with the camera) is used to make cursor movement (or scene view movement) correspond to user's motion. This processing subsystem of another aspect further detects when the user twists his head, to add an additional dimension to the movement.
In one aspect, a system of the invention includes an IR detector which is used to determine when a person sweats or heats up (by imaging, for example, part of the user's head onto the IR detector); and then the system adjusts game speed in a way corresponding to this movement. Alternatively, a heartbeat sensor is tied to the person to sense increased excitement during a game and the system speeds or slows the game in a similar manner. Note that a heartbeat sensor can be constructed, in one aspect of the invention, by thermal imaging of the user's face, detecting blood flow oscillations indicative of heartbeat. In other aspects, the heartbeat sensor is physically tied to the user, such as within the computer mouse or joystick.
In one aspect, a computer of the invention adapts to user control as selected by a particular user. For example, in the case of a handicapped person, a particular user might select certain hand-movements, e.g., a single finger up, to move the cursor up; and another finger down to move the cursor left. An infinite combination of controls can be established; however this is one advantage of the invention in that users with many different disabilities can program cursor or scene view movement. In one aspect, a neural network is used to assist the processing system in establishing proper cursor movement. In another aspect, the computer for example learns to print something by movement of the user's finger (or other body part).
In one aspect, tipping of the user's head (or other body part, or object) is used to provide another degree of freedom in moving the cursor or adjusting the scene view. By way of example, a tilt of the head, as imaged by the camera, can be set to command a rotation of the scene view.
In still another aspect, a camera of the invention uses autozoom to move in and out of a given scene view. By way of example, the camera is first focussed on the user's face in one frame; but in a subsequent frame the camera must focus closer to compensate for the fact that the user moved closer to the camera (typically, the camera is on the monitor, so this also means that the user moved closer to the scene view). This autozoom is used, in one aspect, to make the scene view appear as if the user is "creeping" into the scene. By moving the scene in and out, the user will perceive that he is moving in or out of the scene view.
In another aspect, a camera images an object held by the user. Preferably, the object has a well-defined shape. The system images the object and determines difference information corresponding to movement of the object. By way of example, rotating the object upside down results in difference information that is upside down; and then the scene view inverts by operation of the system. In another example, twisting of the object rotates the scene view left or right, or rotates the scene in the direction of the twisting.
In another aspect, two cameras image the user: one camera pointed at the front of the user's face or hand and the other down at the top of the user's head or hand. The front facing camera is used to detect rotational and linear translation in up-down and left-right directions. The top viewing camera determines front-back, left-right translation. The front-back translation observed by the top camera is used to control forward and back motion in the user's 3-D view. The top sensed left-right translation controls the user's left-right slide or strafe. The top sensed left-right motion is removed from the front view left-right translation, with the remaining front view measure representative of left-right twist. All of the front view up-down translation can be interpreted as up-down twist.
Brief Description of the Drawings
Figure 1 illustrates one human computer interface system constructed according to the invention; Figure 1A shows an exemplary computer display illustrating cursor movement made through the system of Figure 1;
Figure 1B illustrates overlaid scene views, displayed in two moments of time on the display in Figure 1, of a shifting scene made in response to user movement captured by the camera of Figure 1;
Figure 1C shows an illustrative frame of data taken by the system of Figure 1;
Figure 2 illustrates selected functions for a printed circuit card used in the system of Figure 1;
Figure 3 illustrates an algorithm block diagram that preferably operates with the system of Figure 1;
Figure 4 illustrates one preferred algorithm process used in accord with the invention to determine and quantify body motion;
Figure 5 shows one process of the invention for communicating body motion data to a host processor, in accord with the invention, for augmented control of cursor position or scene view;
Figure 5A shows a representative frame of data of a user taken by a camera of the invention, and further illustrates adding symbols to key body parts to facilitate processing;
Figure 6 illustrates a two camera imaging system for implementing the teachings of the invention; Figure 7 illustrates two positions of a user as captured by a camera of the invention; and Figure 7A illustrates two positions of a scene view on a display as repositioned in response to movement of the user illustrated in Figure 7;
Figure 8 illustrates motion of a user - and specifically twisting of the user's head - as captured by a system of the invention; Figure 8A illustrates a first scene view corresponding to a representative computer display before the twisting; Figure 8B illustrates a second scene view corresponding to a rotation of the first scene view in response to the twisting by the user; Figure 8C shows processing features of the processing section of Figure 8; and Figure 8D illustrates multiple image frames stored in memory for matched filtering with raw images acquired by the system of Figure 8;
Figure 9 illustrates a two camera system of the invention for collecting N zones of user movement and for repositioning the cursor or scene view as a function of the N movements; Figure 9A illustrates a representative thermal image captured by the system of Figure 9; and Figure 9B illustrates process methodology for processing thermal images as a real time input to game processing speed, in accord with the invention;
Figure 10 illustrates another two camera system of the invention for targeting multiple image movement zones on a user, and further illustrating optional DSP processing at the camera section;
Figure 11 illustrates framing multiple movement zones with a single imaging array, in accord with the invention;
Figure 12 illustrates framing a user's eyes in accord with the invention; and Figure 12A shows a representative image frame of a user's eyes; Figure 13 illustrates one system of the invention, including zoom, neural nets, and autofocus to facilitate image capture;
Figures 14, 14A and 14B illustrate autofocus motion control in accord with the invention;
Figures 15 and 15A illustrate one other motion detect system algorithm utilizing edge detection, in accord with the invention;
Figure 16 illustrates one other motion detect system algorithm utilizing well-characterized object manipulations, in accord with the invention;
Figure 17 illustrates one other motion detect system algorithm utilizing varied body motions, in accord with the invention;
Figure 18 illustrates a two camera system of the invention with a camera observing the user's face while the other observes the top of the user's head;
Figure 19 shows a blink detect system of the invention; and
Figure 20 shows a re-calibration system constructed according to the invention.
Detailed Description of the Drawings
Figure 1 illustrates, in a top view, certain major components of a human computer interface system 10 of the invention. A user 12 of the system 10 sits facing a computer monitor 14 with display 14a. A camera 16 is mounted on the computer monitor 14 facing the user 12. In the illustrated embodiment, the camera 16 is mounted in such a way that the user's face 12a is imaged within the camera's field of view 16a. However, as discussed herein, the camera 16 can alternatively image other locations, such as the user's hand, eyes, or on other objects; so imaging of the user's face, in Figure 1, must be considered illustrative, rather than limiting. Further, the camera location can also reside at places other than on top of the monitor 14.
With further regard to Figure 1, the camera 16 interfaces with a printed circuit card 18 mounted within the user's computer chassis 20 (which connects with the monitor 14 by common cabling 20a). The camera 16 interfaces to the printed circuit card 18 via a camera interface cable 22. The circuit card 18 also has processing section 18a, such as a digital signal processing ("DSP") chip and software, to process images from the camera 16.
In operation, the camera 16 and card 18 capture frames of image data corresponding to user movement 25. The processing section 18a algorithmically processes the image data to quantify that movement 25; and then communicates this information to the host processor 30 within the computer 20. The host processor 30 then commands movement of the computer cursor in a corresponding movement 25a, Figure 1A (Figure 1A illustrates a representative front view of the display 14a, and also illustrates movement 25a of the cursor 26 moving within the display 14a in response to user movement 25).
Figure IB illustrates an alternative (or supplemental) process whereby the scene view shifts in response to user movement 25. Specifically, Figure IB illustrates a first scene view 35a, which generally corresponds to a forest prior to the user's movement 25; and an overlayed scene view 35b (shown in dotted line, for purposes of illustration) that is shifted by an amount 37 in response to the user's movement 25. The shift 37 in the scene view 35 is accomplished by combined operation and processing of the processing section 18a and host CPU 30.
Figure 1C shows a representative frame 41 of data 43 as taken by the camera 16. As illustrated, data 43 represents the user's face 12a taken at a given moment of time. Subsequent frames (not shown) are used to determine user motion 25 relative to the frame 41, as discussed herein. The frame 41 is made up of the plurality of pixel data 45, as known in the art.
Figure 2 illustrates certain functions processed within the printed circuit board 18 of Figure 1. A camera interface circuit 50 receives video data from the camera 16 through interface cable 22. This video data can be RS170 format or digital, for example. For analog RS170 format, circuit 50 decodes the analog video data to determine video timing signals embedded in the analog data. These timing signals control the analog-to-digital (A/D) converter included in circuit 50 that converts analog pixel data into digital images. In the preferred embodiment, the analog data is digitized into 6 bits, though a greater number of bits may be acceptable and/or required for features discussed herein. For digital data formats, camera interface 50 accepts the digital data without additional quantization, although interface 50 can digitally pre-process the digital images if desired to acquire desired image features.
The frame difference electronics 52 receives digital data from the camera interface circuit 50. The frame difference electronics 52 include a multiple frame memory, a subtraction circuit and a state machine controller/ memory addresser to control data flow. The frame memory holds previously digitized frame data. As each digitized pixel is received by the frame difference electronics 52, the corresponding pixel from a previous frame is read from the frame memory and subtracted from the current frame. The preferred implementation uses the frame just previous to the current frame, though an older frame which resides in the frame memory may be used. The resulting difference is output to an N-frame video memory 54. The new frame pixel data is then stored into the frame memory of the frame difference electronics 52.
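By way of illustration, and not limitation, the following C fragment sketches the per-pixel subtraction performed by the frame difference electronics 52; the 64x64 frame size and the names frameMemory and frame_difference are illustrative assumptions rather than part of the preferred hardware.

/* Minimal sketch of the frame differencing described above; the frame size
   and names are illustrative assumptions. */
#define ROWS 64
#define COLS 64

static int frameMemory[ROWS][COLS];   /* holds the previously digitized frame */

/* Subtracts the stored frame from the current frame pixel by pixel, writes
   the result to diff, then stores the current frame for the next call. */
void frame_difference(const int current[ROWS][COLS], int diff[ROWS][COLS])
{
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            diff[r][c] = current[r][c] - frameMemory[r][c];
            frameMemory[r][c] = current[r][c];
        }
    }
}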
The N frame video memory electronics 54 either receives differenced frames output by the frame difference electronics 52 (discussed above) or raw digitized frames from the camera interface 50. The choice of data source is made by software resident on the DSP 56. The frame video memory 54 is sized to hold more than one full frame of video and up to N frames. The number of frames N is driven by hardware and software design.
In the preferred embodiment, the DSP 56 implements an algorithm discussed below. This algorithm determines the rate of head motion of the user in two dimensions. The digital signal processor 56 also detects the eye blink of the user in order to emulate the click and double click action of a standard mouse button. In support of these functions, the DSP 56 commands the N frame video memory 54 to supply either the differenced frames or the raw digitized frames. The digital signal processor thus preferably utilizes a supporting program memory 58 made up of electrically reprogrammable memory (EPROM) and data memory 59 including standard volatile random access memory (RAM). The DSP 56 also interfaces to the PCI bus interface electronics 60 through which cursor and button emulation is passed to the user's main processor (e.g., the CPU 30, Figure 1). The PCI interface 60 also passes raw digitized video to the main processor as an optional feature. Interface 60 also permits reprogramming of program memory 58, to allow for future software upgrades permitting additional features and performance.
The PCI interface electronics 60 thus provides an industry standard bus interface supporting the aforementioned communication path between the printed circuit card 18 and the user's main processor 30.
With optional MPEG compression electronics 62, the printed circuit card 18 and camera 16 can provide compressed video to the user's main processor 30. This compressed video supports using the system 10 in teleconferencing applications, providing dual use as either human computer interface system 10 and/ or as a teleconferencing system in an economical solution to two distinct applications.
Figure 3 describes one preferred head motion block diagram algorithm 70 used in accord with the invention. Not all of the functions shown in Figure 3 are implemented in software in the DSP 56. In overview, the algorithm relies on the correlation of images from one frame to the next, and particularly on the use of frame-differenced images in the correlation process. The frame differencing operation removes parts of the camera images that are unchanged from the previous frame. For example, room background (such as object 13, Figure 1) behind the user 12 is removed from the image. This greatly simplifies detection of feature motion. Even the user's face consists of regions of uniform illumination such that, even with the user's facial motion, these uniform regions (i.e., cheeks, forehead, chin) may also be removed. The user's face 12a also contains typically dynamic features such as the nose, eyes, eyebrows and mouth, each of which typically has enough spatial detail to be evident in the differenced image. As the user moves his face with respect to room lighting, the shape and distribution of these features will change; but the frame rate of the camera 16 ensures that these features look similar from one frame to the next. The correlation process therefore operates to determine how these differenced features move from one frame to the next in order to determine user head motion 25.
The algorithm of block diagram 70, Figure 3, receives video images 72 of the user as imaged by camera 16 over time. Each received image is passed to both a frame memory 74 and a differencer 76. Though the preferred embodiment is to buffer a single frame in memory 74, the memory 74 may optionally store many frames, buffered such that the first frame input is the first frame output. The delayed frame is read from the frame memory 74 and subtracted from the current frame using the differencer 76. Frame output from the differencer 76 is provided to both a correlation process 78 and a difference frame memory 80.
Like the frame memory 74, the preferred embodiment of frame memory 80 utilizes a single difference frame; however the difference frame memory 80 can hold many difference frames in sequence for a finite time period. The delayed difference frame is read from the difference frame memory and provided to the correlation function 78. Difference frames are preferably selectable by system algorithms. The correlation process 78 determines the best combination of row and column shifts in order to minimize the difference between the current difference frame and the delayed difference frame. The number of rows and columns required to align these difference images provides information as to the user's motion.
The best-fit function algorithm 82 determines the row and column shift to provide optimal alignment. In the case of a classical correlation process, the best-fit function can consist of a peak detect algorithm. This algorithm may either be implemented in hardware or in software.
The best-fit function algorithm determines relative motion in rows and columns of the observed user's features. The cursor update compute function algorithm 84 translates this measured motion into the position change required of the cursor (e.g., the cursor 26, Figure 1A). Typically, this is a non-linear process in which greater head motion moves the cursor a disproportionately greater distance. For example, a 1-pixel user motion can cause the cursor to move one screen pixel while a 10-pixel user motion may cause a 100-pixel screen cursor motion. However, these magnifications can be adjusted for the desired result. This algorithm may be implemented either in hardware, such as through an ASIC or FPGA, or in software.
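By way of illustration, the following C fragment sketches one non-linear mapping consistent with the example above (a 1-pixel motion yielding a 1-pixel cursor move and a 10-pixel motion yielding a 100-pixel cursor move); the squared mapping and the gain parameter are assumptions, and other magnification curves may be substituted.

/* Sketch of one possible non-linear cursor gain; with gain = 1.0 a motion of
   1 pixel moves the cursor 1 pixel and a motion of 10 pixels moves it 100. */
long cursor_delta(long measuredMotion, double gain)
{
    double magnified = gain * (double)(measuredMotion * measuredMotion);
    return (measuredMotion < 0) ? -(long)magnified : (long)magnified;
}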
Video cursor control 86 provides a user interface to enable and disable the operation of cursor control described above. This control is implemented, for example, through a combination of keystrokes on the user's keyboard (for example as connected to the host computer 20, Figure 1). Alternatively, cursor control is activated or deactivated by sensing the eye-blink of the user (or some other predetermined movement). In this alternative embodiment, an output signal 85 from the correlation section 78 is sent to the video enable section 86; and the output signal 85 corresponds to blink data from the user's face 12a (Figure 1A). In another embodiment, the video cursor control section 86 activates or deactivates cursor control by recognizing voice commands. A microphone 87 detects the user's voice and a voice recognition section 89 converts the voice to certain activate or deactivate signals. For example, the section 89 can be set to respond to "activate" as a voice command that will enable cursor control; and "deactivate" as a command that disables cursor control.
The functionality of the video cursor control 86 provides the user with the equivalent of a mouse pick-up, put-down action. As the user moves the cursor from left to right across the screen, the user would de-activate motion-based cursor control in order to allow the user to move his head back to the left. Once the user has recentered his head, the user would once again activate the cursor control and continue to move the cursor about the screen. The activation/ deactivation of the mouse input is represented by the switch 90, such that the open position of the switch disables human motion control of the cursor and supplies a zero change input to the summation operation 92 in such conditions.
Those skilled in the art should appreciate that control of scene view may also be implemented by an algorithm such as shown in Figure 3. Specifically, a similar algorithm can provide movement of the current scene view, in accord with the invention.
With video cursor control enabled, the result of the cursor update compute function 84 is added to the known current cursor position at the summing operation 92. This summation has an x component and a y component. The result of the summation 92 is used to update the cursor position (or scene view) on the user's screen via the user's operating system. Cursor position may thus be controlled both by user motion and by the motion imparted by another input device such as a standard computer mouse.
Figure 4 provides a detailed description of the preferred implementation of the algorithm described in functions 74, 76, 78, 80 and 82 of Figure 3. Video data is received by the processing electronics in both a single frame memory 100 and a differencer 102. The output of the frame memory 100 is also provided to the differencer 102 such that the previous frame is subtracted from the current frame. This differenced frame is then processed by a two-dimensional FFT 104. The complex result of the FFT 104 is provided to a complex multiplier 106 and a complex memory 108. The complex memory 108 is the size of the processed image, each location containing both a real and an imaginary component of a complex number. With each new FFT operation 104, the previous FFT result, contained in the complex memory 108, is provided to the conjugate operation 110. The complex conjugate of each element is computed and provided to the complex multiplier 106. In this manner, the FFT of the previous frame difference is conjugated and multiplied against the FFT of the current difference image.
By way of comparison between Figures 3 and 4, item 76 has similar functionality to item 102; item 78 has similar functionality to items 104, 106, 108, 110, 112; item 80 has similar functionality to item 108; and item 82 has similar functionality to item 114.
The two dimensional array of complex products output by the complex multiplier 106 is provided to a two dimensional inverse FFT operation 112. This operation creates an image of the correlation between the latest pair of difference images. The correlation image is processed by the peak detection function 114 in order to determine the shift required to align the two difference images. The x-y magnitude of this shift is representative of the user's motion. This x-y magnitude is provided to the software used to update the cursor position as described in Figure 3.
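By way of illustration, the following C fragment sketches the peak detection step, assuming the correlation image has been circularly shifted so that zero motion corresponds to a peak at the image center (as with the fftshift operation in the Matlab listing below); the 64x64 size and the function name are illustrative assumptions.

/* Sketch of peak detection: find the largest value in the correlation image
   and report its offset from the image center as the x-y shift. */
#define N 64

void find_peak_shift(const float corr[N][N], int *shiftX, int *shiftY)
{
    int peakRow = 0, peakCol = 0;
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            if (corr[r][c] > corr[peakRow][peakCol]) { peakRow = r; peakCol = c; }

    /* A peak at the center (N/2, N/2) means no motion between difference frames. */
    *shiftY = peakRow - N / 2;
    *shiftX = peakCol - N / 2;
}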
Figure 5 shows an algorithm process 130 of the invention which applies motion correlation operations over sub-frames of the video image. This allows motions of various body parts to convey input with specialized meaning to applications operating on the host computer. In addition to head motion, motion of the hands, arms and legs provides greater degrees of freedom for the user to interact with the host application (e.g., a game). Commands of this type are useful in combative games where computer-animated opponents fight under control of the user. In that instance, the hand, arm and leg motions of the user become punch, chop and kick commands to the computer after process 130. This command mode can also be used in situations where the user does not have ready access to a keyboard, to augment cursor control of the previously described head position correlator.
Process 130 identifies the functions required to derive commands from general motions of the user's body. The scene analyzer function 132 receives digitized video frames from the camera (e.g., the camera 16 of Figure 1) and identifies sub-frames within the video for tracking various parts of the user's body. The frame difference function 134 and correlator function 136 provide functions similar to processes 74, 76 and 78 of Figure 3. The correlation analyzer 138 receives correlated difference frames from the correlator function 136 and sub-frame definitions from the scene analyzer 132. The correlation analyzer 138 applies a peak detection function to each sub-frame to identify the shift required to achieve best alignment of the two images. Correlation peaks occurring in the center of the sub-frame indicate no motion, while peaks occurring elsewhere indicate the direction and magnitude of the user's motion. The motion interpreter 140 receives motion vectors for each sub-frame from the correlation analyzer 138. The motion interpreter 140 links the motion vector from each sub-frame with a particular body segment and passes this information on to the host interface 142. The host interface 142 provides for communication with the host processor (e.g., CPU 30, Figure 1). It sends data packets to the host to identify detected body motions, their directions and their amplitudes. The host interface 142 also receives instruction from the host as to which body segments to track, which it then passes along to the motion interpreter 140 and the scene analyzer 132.
The scene analyzer 132 first identifies the location of the user's body in the image and locates the position of various parts of the user's body such as hands, forearms, head, and legs. The techniques and methods used to identify the user's body location and body part positions can be accomplished using techniques well known to those skilled in the art (by way of example, via matched filtering). Body identification can also be augmented by marking different locations on the user's body with unique visual symbols. Unique symbols are assigned to key body joints such as elbows, shoulders, hands, neck, knees, and waist and are mounted on the body. See for example Figure 5A.
Figure 5A illustrates one frame 149 of data of an image of the user 150 as taken by a camera of the invention. The image corresponds to a full body image of the user 150, including arms 151, legs 152, elbows 151a, hands 153, head 154, neck 155, ears 156, and forehead 157. These parts 151-157 are identified by processes of the invention (e.g., by spatial location in the image, by matched filtering or by another image recognition technique), and the image is preferably marked with unique symbols (e.g., "X" for center of the face, "Y" for center of the hand 153, "T" for center of the user's foot, "Z" for body center, and "F" for forehead 157).
With further reference to Figure 5, process 130 locates various body parts and preferably marks them with symbols to fill in connecting logic (e.g., the left wrist and left elbow symbols identify the location of the left forearm). Once the user's body parts are located, sub-frames surrounding each of the body segments identified by the host processor are generated. A sub-frame is a generally regularly shaped region within the image that surrounds a particular body part. The sub-frames are sized to center the subject body part in the sub-frame and to provide enough room around the body part to accommodate typical body motions. One sub-frame 160 is shown in frame 149, Figure 5A, surrounding the user's foot "T". The scene analyzer 132 will generally not operate on each frame of video since continuously changing the sub-frames adds unnecessary complication to the correlation analyzer 138. Instead, the scene analyzer 132 runs as a background process, updating the sub-frame locations periodically.
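By way of illustration, the following C fragment sketches one way a sub-frame might be sized around a located body part, centering the part and leaving margin for typical motion; the structure fields and the margin factor are assumptions.

/* Sketch of sub-frame sizing: center the located body part and leave room
   around it for typical motion. Names and the margin factor are assumed. */
typedef struct { int row, col, height, width; } SubFrame;

SubFrame make_subframe(int partRow, int partCol, int partHeight, int partWidth)
{
    SubFrame sf;
    int margin = 2;                        /* assumed room for typical motion   */
    sf.height = partHeight * margin;
    sf.width  = partWidth  * margin;
    sf.row    = partRow - sf.height / 2;   /* center the body part in the frame */
    sf.col    = partCol - sf.width  / 2;
    return sf;
}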
Figure 4 provides a detailed description of one algorithm which can be used to implement processes 134-138 of Figure 5. The invention of one embodiment can thus track the motion of the user's body using symbols attached to key joints. As an example, the position of the user's left lower arm can be determined by locating the unique symbol for the left hand "γi" and left elbow "•". Unique symbols thus allow the processor to rapidly locate
each portion of the user's body in a video frame. To determine the motion of a particular part of the user's body, the algorithm (e.g., Figure 4) compares the position of the relevant body parts in consecutive frames and determines how they moved (for example, using geometry). Once motion is determined, it is then passed to the host CPU where the motion is acted on as appropriate for the particular application.
Figure 6 illustrates a two camera system 200, constructed according to the invention. The cameras 202a, 202b are arranged to view separate parts of the user: camera 202a images the user's face 204; and camera 202b images the user's hand 206. The cameras 202 conveniently rest on top of the computer display 208 coupled to the host computer 210 by cabling 216. The cameras 202 couple to the signal processing card 212 residing within the computer 210 by cabling 213. As discussed herein, motion of the user's head 204 and/or hand 206 is detected by the signal processing card 212, and difference information is communicated to the computer's CPU 210a via the computer bus 214. This difference information corresponds to composite movement of the head 204 and hand 206; and is used by the CPU 210a to command movement of display items on the display 208 (for example, the display items can include the cursor or scene view as shown on the display 208 to the user). Information shown on the display 208 is communicated from the computer 210 to the display 208 along standard cable 216.
Figures 7 and 7A illustrate how motion of a user's head is, for example, translated to motion of the cursor and/or scene view, in accord with the invention. Figure 7 shows a representative image 220 of a user captured within a frame of data by a camera of the invention. Figure 7 also shows a representative image 222 (in dotted outline, for clarity of illustration) of the user in a subsequent frame of data, indicating that the user moved "M" inches. Figure 7A illustrates corresponding scene views on a computer display 224 that is coupled to processing algorithms of the invention (i.e., within a system that includes a camera that captures the images 220, 222 of Figure 7). The display 224 illustratively shows a scene view that includes a road 224a that extends off into the distance, and a house 224b adjacent to the road 224a. A computer cursor 224c is also illustratively shown on the display 224, as such a cursor is common even within computer games, providing a place for the user to select items (such as the road or house 224a, 224b) within the display 224. The display 224 also shows, with dotted outlines 226, the scene view of road and house which are shown on the display 224 after motion by the user from 220 to 222, Figure 7 (the cursor 224c is, for example, repositioned to position 224c'). The repositioning of the scene view from 224a, 224b to 226 occurs immediately (typically less than 1/30 second, depending upon the camera) after the movement of the user of Figure 7 from 220 to 222. The scene view is repositioned by x-pixels on the display 224, so that M/x corresponds to the magnification between user movement and scene view repositioning. This magnification can be set by parameters within the system; and can also be set by the user, if desired, at the computer keyboard. Furthermore, the scene view preferably moves through the distance of x-pixels at the same rate as the user travels along distance M. Alternatively, the magnification can be dependent on the rate of motion such that a larger displacement of x-pixels will occur for a given motion M if the rate of change of M is larger.
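By way of illustration, the following C fragment sketches the magnification relationship described above, including an optional rate-dependent term; the particular gain values are assumptions and would be set by system parameters or by the user.

/* Sketch of mapping user motion M (inches) to a scene shift in pixels, with
   an optional increase in gain when the motion is fast. Values are assumed. */
double scene_shift_pixels(double userMotionInches, double motionRateInchesPerSec)
{
    double baseGain  = 40.0;                                 /* assumed pixels per inch  */
    double rateBoost = 1.0 + 0.1 * motionRateInchesPerSec;   /* assumed rate dependence  */
    return userMotionInches * baseGain * rateBoost;
}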
Figure 8 illustrates a further motion that can be captured by a camera of the invention and processed to reposition a scene view, as shown in Figures 8A and 8B. More particularly, Figure 8 illustrates a camera 250 connected to a processing section 252 which converts user motion 254 to a corresponding repositioning of the computer scene view. As above, the user 256 is captured within the camera's field of view 258 and frames of data are captured by the processing section 252. In Figure 8, motion 254 corresponds to a twisting of the user's head 256; and processing section 252 detects this twisting and provides repositioning information to the host computer (not shown). Processing section 252 can also incorporate head-translation motion (e.g., illustrated in Figure 15A) into the scene view movement above; and can similarly reject translational movement, if desired, so that no scene motion is observed for translation of the user 256.
Figure 8A shows a representative scene view 260 on a display 262 coupled to the host computer. Figure 8B illustrates repositioning of the scene view 260' after the processing section 252 detects motion 254 and updates the host computer with difference information (e.g., that information which the host computer uses to rotate or translate the scene view).
Figure 8A also illustrates the intent of the rotating scene view feature. In Figure 8A, a person 260a is shown in the scene view 260, except that the person 260a is almost completely obscured by the edge 262a of the display 262. By twisting the head 256 in motion 254, the scene view 260 is rotated in the corresponding direction - as shown by scene view 260' in Figure 8B - so that the person 260a' is completely visible within the scene view 260'.
Figure 8C illustrates further detail of the processing section 252. Camera data such as frames of images of a user are input to the section 252 at data port 266. The data are conditioned in the image conditioning section 268 (for example, to reduce correlated noise or other image artifacts). Thereafter, the camera data is compared and correlated in the image correlation section 270, which compares the present frame image with a series of stored images from the image memory 272. In the preferred embodiment, the present data image frame 249 is cross-correlated with each of the images within the image memory 272 to find a match. These images correspond to a series of images of the user in known positions, as illustrated in Figure 8D. In Figure 8D, various images are stored representing various known positions of the relevant part, here the user's head 256. In the position of Figure 8, for example a straight-on face shot, the 0° stored memory image would provide the greatest cross-correlation value, indicating a matched image position. Accordingly, the scene view would adjust to a zero position. If, however, the image correlated to a -90° position, the scene would rotate to such a position. Other movements cause additional scene view motions, including tilt and tip of the head, as shown by the "0°, Down 45°" and "0°, Up 45°" images. These images cause the scene view to move upwards or to tilt up or down when the processing section 252 correlates the current frame to them. As indicated, these images have no left or right component, though other images (not shown) can certainly include left, right and tip motion simultaneously.
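By way of illustration, the following C fragment sketches selection of the stored pose image that best matches the current frame; the zero-offset correlation score shown is a simplified stand-in for the matched filtering described above, and the array sizes and names are assumptions.

/* Sketch of pose matching: score the current frame against each stored pose
   image and return the index of the best match (e.g., 0 deg, -90 deg, up 45 deg). */
#define NPIXELS (64 * 64)
#define NPOSES  8

int best_pose(const float current[NPIXELS], const float poses[NPOSES][NPIXELS])
{
    int best = 0;
    float bestScore = -1e30f;
    for (int p = 0; p < NPOSES; p++) {
        float score = 0.0f;
        for (int i = 0; i < NPIXELS; i++)
            score += current[i] * poses[p][i];   /* correlation at zero offset */
        if (score > bestScore) { bestScore = score; best = p; }
    }
    return best;
}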
Figure 9 shows a system 300 constructed according to the invention and including a camera section 302 including an IR imager 304 and a camera 306, both of which view and capture frames of data from a user 308. The IR imager 304 can include, for example, a microbolometer array (i.e., "uncooled" detectors known in the art) which produces a frame of data corresponding to the infrared energy emitted from the user, such as illustrated in Figure 9A. Figure 9A shows a representative frame of IR image data 310, with zones 312 of relatively hot image data emitting from regions of forehead, nose and mouth of the user 308.
The cameras 304, 306 send image data back to the signal processing section 314. Data from the camera 306 is processed, if desired, as above, to determine the difference information signal 322 used by a connected computer to reposition the cursor and/or scene view. Data from camera 304, on the other hand, is used to evaluate how large (or how hot) the zones 312 appear on the user during computer play. The signal processing section 314 assesses the zones 312 for temperature and/or size over the course of a computer game and generates a "game speed control" signal 320 which is communicated to the user's computer (i.e., that computer used in conjunction with the system 300 of Figure 9). The user's computer processes the signal 320 to increase or decrease the speed of a computer game in process on the computer.
Those skilled in the art should appreciate that the IR camera 304 can be used without the features of the invention which assess user movement. Rather, this aspect should be considered stand-alone, if desired, to provide active feedback into gaming speed based upon user temperature and/ or stress. Note that the camera 304 can also be used to detect heartbeat since the zones 312 generally pulse at the user's heartbeat, so that heartbeat rate can also be considered as a parameter used in the generation of the game speed control signal 320. Alternatively, a pulse rate can be determined by known pulse rate systems that are physically connected to the user 308.
An IR lamp 324 can be used in system 300 to illuminate the user 308, with IR radiation 324a, such that sufficient IR illumination reflects off of the user 308 whereby motion control of the cursor and/ or scene view can be made without the additional camera 306. The lamp 324 can be, and preferably is, made integrally with the section 302 to facilitate production packaging.
An IR lamp 324 operating in the near-IR can also be used with visible cameras of the invention which typically respond to near-IR wavelengths. By way of example, certain camera systems now available incorporate six IR emitters around the lens to illuminate the object without distraction to the user who cannot see the near-IR emitted light. Such a camera is suitable for use with the invention.
Figure 9B shows process methodology of the invention to process thermal user images in accord with the preferred embodiment of the invention. Specifically, a system such as system 300 first acquires a thermal image map in process block 326. This image is compared to a reference image ("REF") in process block 327. REF can be either a temperature of the user (i.e., the temperature of one hot spot of a non-stressed user, or the temperature of one hot spot of the user at an initial, pre-game condition) or an amount of the area 312, Figure 9A, of the user in a non-stressed condition or initial pre-game condition. By way of example, REF can be an image such as the frame 310 of Figure 9A. When the temperature or area of the region 312 increases, the system 300 detects this change and determines that the image map exceeded the REF condition, as illustrated in process block 328. Should the map exceed the REF condition, the system 300 communicates this to the host processor, which in turn adjusts the gaming speed as desired. If the map does not exceed the REF condition, then the next IR image frame is acquired at block 326.
System 300 and the process steps of Figure 9B are thus suitable to adjust gaming speed in real time, depending upon user stress level. In the preferred embodiment, the gaming speed is increased automatically such that the image map exceeds the REF signal for greater than about 50% of the time, so that all users, regardless of their ability, are pushed in the particular game.
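By way of illustration, the following C fragment sketches one feedback loop consistent with Figure 9B, nudging the game speed so that the user exceeds the REF condition roughly half the time; the running-average constant and the speed step factor are assumptions, not part of the preferred embodiment.

/* Sketch of the thermal feedback loop: compare the hottest region against
   REF and speed up the game until the user is over REF more than half the
   time. Smoothing and step constants are assumed. */
double update_game_speed(double hotZoneTemp, double refTemp,
                         double currentSpeed, double *exceedFraction)
{
    int exceeded = (hotZoneTemp > refTemp);

    /* running estimate of how often the map has exceeded REF */
    *exceedFraction = 0.95 * (*exceedFraction) + 0.05 * (exceeded ? 1.0 : 0.0);

    /* push the game faster until the REF condition is exceeded over 50% of the time */
    if (*exceedFraction < 0.5)
        currentSpeed *= 1.02;
    return currentSpeed;
}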
Those skilled in the art should appreciate that multi-camera embodiments of the invention can be, and preferably are, incorporated into a common housing 338, such as shown in Figure 10. Further, as illustrated in Figure 10, cameras can also be made from detector arrays 340, processing electronics 342, and optics 344. Each camera 340, 342, 344 is constructed to process the correct electromagnetic spectrum, e.g., IR (using, for example, germanium lenses 344 and microbolometer detectors 340). Each camera has its own field of view 350a, 350b and focal distance 352a, 352b to image at least a part of the user. These fields of view 350 can overlap, to view the same area such as the user's face, or they can view separate locations, such as the user's head and hand.
Cameras of the invention can also include a DSP section 356 such as described above to process user motion data. The DSP section 356 processes user motion data and sends difference information to the user's host computer. The host computer thereafter repositions the cursor and/ or scene view based upon the difference information so that the user observes corresponding motion on the computer display, as described above. Accordingly, the DSP section need not reside within the computer so long as difference information is isolated and communicated to the host computer CPU.
Those skilled in the art should appreciate that algorithms of the invention can also be processed directly by the computer's CPU, provided it has sufficient power and processing speed, to eliminate a separate DSP chip or section. DSP or equivalent processing capability can also be provided within the computer's chassis by way of a computer printed circuit card installed in the chassis and connected with the camera. The location and amount of processing power, therefore, should be considered a matter of design choice and current state of the art, each technique being within the scope of the invention.
Figure 11 illustrates frame capture by one camera of the invention to isolate zones of imaging according to expected motion patterns. In Figure 11, one frame 370 of data for example covers the user's eyes 371, corresponding to one image zone; and another frame 372 of data can cover the user's head 373, corresponding to another image zone. As mentioned previously, the frames 370, 372 are preferably 64x64 pixels each, or 256x256 (or higher powers of two), to provide FFT capability on the image within the frame. A single camera can however provide both frames 370 and 372, in accord with the invention. Specifically, a dense CCD detector array (e.g., 480x740 pixels, 1000x1000 pixels, or higher) is used within the camera such that the whole array captures an image frame 376 of data, at least covering the available image format of the computer display 378. A matched filtering (or other image-locating) process is applied to the frame 376 to locate the center 371a of the user's eyes (in the matched filtering process, an image data set of the user's eyes is stored in memory and correlated to the frame 376 such that a peak correlation is found at position 371a). Thereafter, a 64x64 array of data is centered about the eyes 371 to set the frame 370. To acquire the frame 372, every other pixel is discarded so that, again, a 64x64 array is set for the frame 372 (alternatively, each adjacent pair of pixels is added and averaged to provide a single number, again reducing the total number of pixels to 64x64). Note that this process is reasonable since the width of the eyes is at least 1/2 the width of the user's face. Nevertheless, further compression can be obtained by utilizing every third pixel (or averaging three adjacent pixels) to obtain a larger image area in the frame 372. Note that the compression in the width and length dimensions need not be the same.
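By way of illustration, the following C fragment sketches the 2:1 reduction described above, here using a 2x2 block average as a stand-in for the pairwise averaging; the array sizes are assumptions.

/* Sketch of reducing a 128x128 region to the 64x64 frame used for the FFT by
   averaging each 2x2 block of input pixels into one output pixel. */
void decimate_2to1(const int in[128][128], int out[64][64])
{
    for (int r = 0; r < 64; r++)
        for (int c = 0; c < 64; c++)
            out[r][c] = (in[2*r][2*c]     + in[2*r][2*c + 1] +
                         in[2*r + 1][2*c] + in[2*r + 1][2*c + 1]) / 4;
}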
Framing of the information in Figure 11 can occur in several ways. Most cameras image at 30Hz so that image motion is smooth to the human eye. In one embodiment, one frame 370 is taken in between each frame 372, to minimize data throughput and processing; and yet to maintain dual processing of the two zones imaged in Figure 11. Alternatively, both frames 370, 372 are processed concurrently since frame 376 is typically the 30Hz frame.
Figure 11 also illustrates how framing can occur around the user's eyes 371 to acquire "blink" information to reset cursor control. A blink detected by the user's eyes in frame 370 (or other frame) can be used to (a) disable or enable control of cursor or scene movement based upon user control, or (b) simulate pick-up and replacement of the computer mouse (i.e., reinitializing movement in a particular direction). For example, by detecting a blink of the eyes 371, a system of the invention can disable human motion following control such as described herein.
Another blink can be used to enable human motion following control. Blinking can also be used to continue motion in a particular direction. For example, movement of the cursor can be made to follow movement of the user's head, as described above.
However, after a while, the person has to move to an uncomfortable position to keep moving the cursor (or scene). A blink can thus also serve to let the user reposition his head back to a normal starting position so that further movement in the desired direction can be made.
Figure 12 illustrates a similar capture of a user's eyes 400, in accord with the invention. A frame 402 can thus be acquired by a camera of the invention. Figure
12A illustrates further detail of one representative frame 402, illustrating that the user's pupils 404 are also captured. Figures 3 and 4 describe certain algorithms of the invention that are also applicable to motion of the user's pupils 404, as illustrated by left and right motion 406 and up and down motion 408. Accordingly, by zooming in on the user's eyes, another movement zone is created that causes repositioning of the cursor or the scene view based upon the movements 406, 408, much like the head movement described and illustrated in Figures 1-4.
Note that the teachings of Figures 1-4 and 12-12A can be combined within a two zone movement system so that, for example, both head motion and pupil motion can be evaluated for image motion. The cursor and/ or scene view can be repositioned, therefore, based upon movements from both zones. By way of example, repositioning of items within the display (e.g., the cursor and/ or scene view) can be made when the head moves but not if the head and eyes move, which might indicate that the user is simply looking elsewhere in the room due to a distraction. However, if the user moves his head, but not his eyes, he is focussed on the game and intends rotation of the scene view, in another example. Other combinations are also possible.
Cameras of the invention can also include zoom optics which (a) reduce or enlarge the image frame captured by a particular camera, or which (b) provide autofocus capability. Figure 13 shows one system 430 constructed according to the invention. A camera 432 includes camera electronics 432a and a zoom attachment
432b. Data from the camera 432 is relayed to image and interpretation feedback electronics 434 for evaluation. For purposes of image magnification, the feedback electronics serve to evaluate a given image size relative to desired image goals. For example, to image the user's eyes with high fidelity might require high density of pixels at the user's eyes (e.g., at the zone 370, Figure 11). Accordingly, the system 430 can isolate the user's eyes, such as described herein, and command the camera
(through command lines 436) to increase or decrease magnification on the user's eyes so as to achieve desired resolution. The feedback electronics can also command motion of the camera to change its boresight alignment (i.e., to change where the camera image is centered) by commanding movement of the camera when resting on one or more linear drives 438, as known in the art.
Once aligned on the desired user location, e.g., on the eyes with desired accuracy, the system 430 continues processing data such as described herein to create human interface control of items displayed on the user's computer, e.g., cursor and/ or scene view. Accordingly, processing section 440 operates to detect user motion and to communicate difference information to the user's computer, as described above.
With regard to autofocus, the system 430 of Figure 13 can also be used to process user motion based upon motion towards and away from the camera. Figure 14 illustrates such a system, including a camera 450 with autofocus capability to find the best focus 452 relative to a user 454 within the field of view 456. For example, when the user 454 moves to position 460 (the user being shown in outline form 454a), the new best focus has changed to 452a. The camera 450 provides a signal 450a to the image interpretation and feedback electronics 434, Figure 13, which indicates where the user is along the "z" axis from the camera 450 to the user 454. This signal 450a is thus used much like the other motion signals described herein, to move the cursor and/ or scene view in response to such movements. Figure 14A illustrates a representative scene view 462 when, for example, the user is at best focus 452. The scene view 462 includes a house image 464 with a door 465. When the user moves to position 460, the house and door 464', 465' of the scene view 462' enlarge, since the user moved closer to the camera 450. Such a motion might reveal, for example, additional objects within the house, such as illustrated by object 466, Figure 14B. Accordingly, the autofocus feature of the invention provides yet another degree of freedom in motion control, in accord with the invention.
Image data, manipulation, and human interface control can be improved, over time, by using neural net algorithms. As shown in Figure 13, a neural net update section 435 can for example couple to the feedback electronics 434 so as to assimilate movement information and to improve the data transmitted to the host computer over time. The use of neural nets is known in the art.
Figure 15 illustrates a frame of data 490 used in accord with the invention to implement a simplified left, right, up, down movement algorithm to control cursor movement and/or scene view movement. Frame 490 is captured by a camera of the invention; and preferably the camera incorporates autofocus, as described above, to provide a crisp image of the user 492 regardless of her position within the camera's field of view. As shown, image frame 490 provides very sharp edges to the user's face, including a left edge 494a, right edge 494b, and chin 494c. These edges need only be approximately vertical or horizontal. Movement of the user results in movement of the edges 494, such as shown in Figure 15A. Figure 15A shows that once such edges are acquired, they conveniently permit subsequent movement analysis and control of scene view and/or cursor position. Specifically, Figure 15A shows movement of the user's "edges" from 494a-c to 494a-c', indicating that the user moved left (as viewed from the camera's position) and that her chin raised slightly, indicating an upward tilt of the head. This information is assessed by the process sections as discussed above and relayed to the host computer as difference information to augment or provide cursor and/or scene movement in response to the user's movement.
Note that such edge movements roughly correspond to movement along rows and columns of the detector array. From detected movement from one row to another (or one column to another), the actual motion of the user can readily be calculated using the user's best-focus position and the focal length of the camera's lens. This information may then be used to set the magnification of movement of items in the computer display (e.g., cursor and/or scene view).
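By way of illustration, the following C fragment sketches the conversion of an edge shift in pixels to physical user motion using the similar-triangle relation among the detector pixel pitch, the lens focal length, and the best-focus distance; the numeric values are assumptions.

/* Sketch of pixel-to-physical conversion: motion = shift * pitch * z / f,
   where z is the best-focus distance and f the lens focal length. */
double user_motion_mm(int pixelShift)
{
    double pixelPitchMm = 0.01;    /* assumed detector pixel pitch (mm)        */
    double focalMm      = 8.0;     /* assumed lens focal length (mm)           */
    double focusDistMm  = 600.0;   /* best-focus distance from autofocus (mm)  */
    return pixelShift * pixelPitchMm * focusDistMm / focalMm;
}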
Figure 16 illustrates an image of one object 500 used in accord with the invention to provide image manipulation in response to motion of the object. The object 500 is held by the user 501 to manipulate motion of his cursor 502 and/or scene view 504 on his computer display 506. The object 500 is used because it exhibits an optical shape that is easily recognized through image correlation (such as matched filtering). In accord with the invention, a camera 510 is used to image the object 500; and frames of data are sent to the frame processor 512. The processor 512 determines image position - relative to a starting position - and thereafter communicates difference information to the user's computer 505 along data line 514. The difference information is used by the computer's CPU and operating system to reposition items on the display 506 in response to motion of the object 500. Almost any motion, including rotation, tilting and translation, can be accomplished with the object 500 relative to a start position. This start position can be triggered by the user 501 at the start of a game by commanding that the camera 510 take a reference frame ("REF") that is stored in memory 513. The user 501 commands that REF imagery be taken and stored through the keyboard 505a, connected to the computer 505, which in turn commands the processor 512 and camera 510 to take the reference frame REF.
Motion of the object 500 is thus tracked with enhanced accuracy by comparing subsequent frames of the object 500 with REF. When rotation, tilt or translation is detected (for example, by using the techniques of Figures 2-4 and 8-8D), items (502, 504) on the display 506 are then repositioned to follow that movement.
The techniques of the invention permit control of the scene view and/or cursor on a computer screen by motion of one or more parts of the user's body. Accordingly, as shown in Figure 17, complete motion of the user 598 can be replicated, in the invention, by correlated motion of an action figure 599 within a game. In Figure 17, user 598 is imaged by a camera 602 of the invention; and frames from the camera 602 are processed by process section 604, such as described herein. The user 598 is captured and processed, in digital imagery, and annotated with appropriate user segments, e.g., segments 1-6 indicating the user's hands, feet, head and main body. Motion of the segments 1-6 is communicated to the host computer 606 from the process section 604. The computer's operating system then updates the associated display 608 so that the action figure 599 (corresponding to an action figure within a computer game) moves like user 598. Accordingly, the user 598 controls motion of the action figure 599 by performing stunts (e.g., striking and kicking) that he would like the action figure 599 to perform, such as to knock out an opponent within the display 608.
In an alternative embodiment, icons can be used to simplify image and motion recognition of user segments such as segments 1-6. For example, if user 598 is marked with a star-shaped object on her hand (e.g., segment 1), then that star symbol is more easily recognized by algorithms such as described herein to determine motion. By way of another example, the hand of user 598 can be covered with a glove that has a "+" symbol on the glove. That "+" symbol can be used to more easily interpret user motion as compared to, for example, actually interpreting motion of the user's hand, which is rounded with five fingers. In a third example, user 598 can wear an article of clothing such as shirt 598a with a "+" symbol 598b; and the invention can be used to track the icon "+" 598b with great precision since it is a relatively easy object to track as compared to actual body parts. It should be apparent to those in the art that icons such as symbol 598b can be painted or pasted onto the individual, too, to obtain similar results.
Figure 18 illustrates a two camera system 700 used to determine translation and rotation. The forward viewing camera 702 observes the user's face 703 and determines the right-left (Δx1) and up-down (Δy1) translation of the user's face 703. The top viewing camera 704 observes the top of the user's face or head 705 and determines the right-left (Δx2) and forward-backward (Δy2) motion of the user's face or head. The two cameras 702, 704 are each processed through motion sensing algorithms 706 using the teachings above, and results are shown on the computer display 710. For purposes of illustration, the display 710 shows an image of the user; while the image can be, for example, an action figure or other computer object (including the computer cursor), as desired, which follows tracking motions Δx1, Δy1, Δx2, Δy2. As indicated in Figure 18, for example, Δy2 can be directly applied to motion control of the user's forward and reverse motion (note, these motions are illustrated as within a computer display 710 as processed by algorithms 706). The top-sensed left-right translation (Δx2) can be directly applied to the user's left-right sideways or strafe motion; Δy1 can be directly applied to control the user's up-down viewpoint, each as illustrated on display 710a. The result of the difference between Δx2 and Δx1 can be applied to control the user's left-right turn or viewpoint.
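By way of illustration, the following C fragment sketches one way the two cameras' measurements may be combined, assuming the strafe command is taken from the top camera's left-right translation as described above; the structure and names are illustrative assumptions.

/* Sketch of combining the front camera (dx1, dy1) and top camera (dx2, dy2)
   translations into view commands. */
typedef struct {
    double forwardBack;    /* from the top camera's front-back translation (delta y2) */
    double strafe;         /* from the top camera's left-right translation (delta x2) */
    double lookUpDown;     /* from the front camera's up-down translation (delta y1)  */
    double turnLeftRight;  /* front left-right with the top contribution removed      */
} ViewCommand;

ViewCommand decompose_motion(double dx1, double dy1, double dx2, double dy2)
{
    ViewCommand cmd;
    cmd.forwardBack   = dy2;
    cmd.strafe        = dx2;
    cmd.lookUpDown    = dy1;
    cmd.turnLeftRight = dx1 - dx2;   /* remaining front-view left-right is twist */
    return cmd;
}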
The techniques of Figure 18 can be further extended to front, side and top view cameras for complete motion detection. The top camera determines the user's left-right, front-back motion while the front facing camera determines the user's rotational up-down, left-right motion.
Figure 19 describes an algorithm to detect a user eye blink. The video imagery is stored into a multiple frame buffer 800. The algorithm selects the current frame and a frame from the frame buffer and differences these frames using the adder 802. The difference frame consists of the pixel-by-pixel difference of the delayed frame and the current frame. The difference frame includes motion information used by the algorithms of the teachings above. It also contains information on the user's eye blink. The frames differenced by the adder 802 are separated temporally enough to ensure that one frame contains an image of the user's face with the eyes open, while the other image is of the user's face with the eyes closed. The difference image then contains two strong features, one for each eye. These features are spatially separated by the distance between the user's eyes. The blink detect function 808 inspects the image for this pair of strong features which are aligned horizontally and spaced within an expected distance, based on the variation from one human face to another and the variation in seating distance expected from user to user. The recognition of the blink features may be accomplished using a matched filter or by recognition of expected frequency peaks in the frequency domain at the expected spatial frequency for human eye separation. The blink detect function 808 identifies the occurrence of a blink to a controlling function to either disable the cursor motion or take some other action.

Figure 20 illustrates a sound re-calibration system 800 constructed according to the invention. As above, a camera 802 is arranged to view a user, a part of a user (e.g., a hand), or an object through the camera's field of view 804. A processing section 806 correlates the framing image data from camera 802 to induce movement of a scene view or cursor on the user's display 810. For purposes of illustration, the scene view or cursor is shown illustratively as a dot 808 on display 810; and movement 812 of the cursor 808 from position 808a to 808b represents a typical movement of the cursor or scene view 808 in response to movement within the field of view 804, as described above. A re-calibration section 816 is used to reset the cursor or scene view 808 back to an initial position 808a, if desired. Specifically, in one embodiment, section 816 is a microphone that responds to sound 818 generated from a sound event 818a, such as a snap of the user's fingers, or a particular word uttered by the user, to generate a signal for processing section 806 along signal line 816a; and section 806 processes the signal to move the cursor or scene view 808 back to the original position 808a. In another embodiment, re-calibration section 816 can also correspond to a processing section within the processing hardware/software of system 800 to, for example, respond to the blink of a user's eyes to cause movement of the cursor 808 back to position 808a.
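By way of illustration of the blink test described above for Figure 19, the following C fragment sketches the check for two strong difference features that are nearly level and separated by a plausible eye spacing; the pixel thresholds and the FeaturePeak type are assumptions.

/* Sketch of the blink test: two difference-image peaks count as a blink when
   they are nearly horizontal and spaced like a pair of eyes. Bounds assumed. */
#include <stdlib.h>

typedef struct { int row, col; } FeaturePeak;

int is_blink(FeaturePeak a, FeaturePeak b)
{
    int rowDiff = abs(a.row - b.row);
    int colDiff = abs(a.col - b.col);
    int minEyeSpacing = 10;   /* assumed bounds in pixels, depending on      */
    int maxEyeSpacing = 40;   /* camera distance and resolution              */
    int maxRowSkew    = 4;    /* the two features should be nearly level     */

    return (rowDiff <= maxRowSkew) &&
           (colDiff >= minEyeSpacing) && (colDiff <= maxEyeSpacing);
}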
The invention thus attains the objects set forth above, among those apparent from the preceding description. Since certain changes may be made in the above methods and systems without departing from the scope of the invention, it is intended that all matter contained in the above description or shown in the accompanying drawing be interpreted as illustrative and not in a limiting sense. By way of example, although FFT correlation is often discussed in the above description, it should be apparent to those skilled in the art that other correlation techniques can be used with the invention to achieve similar results without departing from the scope hereof.
The following Matlab source code provides non-limiting computer code suitable for use to control the cursor on a display such as described herein. The Matlab source code thus provides an operational demonstration of the concepts described and claimed herein. The Matlab source code is platform independent and needs only a sequence of input images. It includes a centroid operation on the correlation peak which is not included in the PC version (described below), providing a finer measurement of the motion in the image. More particularly, the centroid operation provides a refinement on locating the correlation peak. The PC code, discussed below, uses the pixel location nearest the correlation peak, while the centroiding operation improves the resolution of the peak location to levels below a pixel.
% Copyright (C) 1998
% Video Mouse Group Partnership
%
% This following script file reads in a sequence of images of a computer user's face.
% It then processes the image sequence using the methods of difference
% frame correlation processing used for a human-computer interface.
% This code includes a centroiding operation and demonstrates the
% difference frame correlation approach.
[filename,filepath]=uigetfile('d:\videomouse\*.mat');
files=dir([filepath '*.mat']);
previousFrame=zeros(64);
inputMatrix1=zeros(64);
inputMatrix2=zeros(64);
cursorPosX(1)=0;
cursorPosY(1)=0;
for i=1:size(files,1);
  load([filepath 'frame' num2str(i)]);
  camera=conv2(double(camera'),ones(2)/4,'same');
  camera=camera(1:2:480,1:2:640);
  camera=round(camera(120-32:120+31,160-32:160+31)/4);
  if mod(i,2)
    %Compute difference frame
    inputMatrix2=camera-previousFrame;
    %Save current difference frame for next iteration
    previousFrame=camera;
    %FFT difference frame
    inputMatrix2=fft2(inputMatrix2);
    %Perform difference frame correlation by multiplying difference frame by
    %complex conjugate of previous frame
    correlationMatrix=0.00001*conj(inputMatrix2.*conj(inputMatrix1));
  else
    %Compute difference frame
    inputMatrix1=camera-previousFrame;
    %Save current difference frame for next iteration
    previousFrame=camera;
    %FFT difference frame
    inputMatrix1=fft2(inputMatrix1);
    %Perform difference frame correlation by multiplying difference frame by
    %complex conjugate of previous frame
    correlationMatrix=0.00001*conj(inputMatrix1.*conj(inputMatrix2));
  end
  %Compute inverse FFT on correlation matrix
  correlationMatrix=real(fft2(correlationMatrix));
  %Shift correlation matrix to center correlation peak
  temp=fftshift(correlationMatrix);
  %Find maximum value of correlation matrix
  correlationPeak=max(correlationMatrix(:));
  %Perform centroiding on correlation peak in order to find peak location
  %The centroiding operation is not currently incorporated into the application version
  mask=temp>.50*correlationPeak(1);
  maskSize(i)=sum(mask(:));
  if maskSize(i)<100
    colSums=sum((temp-.50*correlationPeak(1)).*mask);
    xPos(i)=sum(colSums.*(-32:31))/sum(colSums);
    rowSums=sum((temp'-.50*correlationPeak(1)).*mask');
    yPos(i)=sum(rowSums.*(-32:31))/sum(rowSums);
  else
    xPos(i)=0;
    yPos(i)=0;
  end
  if i>1
    cursorPosX(i)=cursorPosX(i-1)+xPos(i);
    cursorPosY(i)=cursorPosY(i-1)+yPos(i);
  end
end
The following PC source code, labeled videoMouseDlg.doc and videoMouseDSP.doc, provides non-limiting and nearly operable DSP code for control of the cursor, as described herein. The code is not polished, and other files are required to compile this code into an executable, as will be apparent to those skilled in the art, including header files (*.h), resource files and compiler directives.
/*
 * Copyright (C) 1998
 * Video Mouse Group Partnership
 *
 * Module : dspcode.c
 *
 * MODIFICATION HISTORY
 */
/*========================== MODULE ENVIRONMENT ==========================*/
/*
 * Include files
 */
#include "dddefs.h" /* XPG definitions and prototypes. */
#include "ddptypes.h" /* XPG function prototypes. */
#include "dberrors.h" /* XPG error codes. */
#include "xpgreg.h" /* XPG register definitions */
#include "grabisr.h" /* grab isr include file */
#include "intpt40.h" /* Parallel Runtime Support Library intertuupts header */
#include "protocol.h" /* Protocol constants defined for this application */
/*=========================================================================*/
/*
 * Function prototypes.
 */
extern VOID DBU_ApplInterrupt (VOID);
extern VOID DDK_PKTDelay (ULONG);

/*
 * Global Variables
 */
volatile LONG G_lApplUserMaskIntCount = 0;

/*================================= CODE ==================================*/
/*=========================================================================
 * Name       : DBU_Correlation
 *
 * Parameters :
 *   lInputImage1   Image number of the 1st input image.
 *   lInputImage2   Image number of the 2nd input image.
 *
 * Returns    : Error status
 *   P_SUCCESS
 *   lMotionX
 *   lMotionY
 *   lFrameRate
 */

/* FFT STUFF FROM TI */
#define FFTSIZE    64          /* FFT size (n)       */
#define LOGFFTSIZE 6           /* log(FFT size)      */
#define FFTSIZEx2  128
#define BLOCK0     0x002ffB00  /* on-chip RAM buffer */

extern void cfftc();           /* C-callable complex FFT */

float complexMatrix1[FFTSIZE][FFTSIZEx2];    /* Input matrix */
float complexMatrix2[FFTSIZE][FFTSIZEx2];    /* Input matrix */
float correlationMatrix[FFTSIZE][FFTSIZEx2];
long previousFrame[FFTSIZE][FFTSIZE];
float *p_localRam, *fPtr1, *fPtr2;
float correlationPeak;
float *block0 = (float *)BLOCK0, *mm1[FFTSIZE], *mm2[FFTSIZE], *mm3[FFTSIZE];
/* End FFT Stuff */
LONG DBU_UserFunction (LONG lInputImage1, LONG lInputImage2)
{
PROCESSINGINFO ImageInfo1;
PROCESSINGINFO ImageInfo2;
LONG lErrorStatus;
LONG lFifoStatus;
LONG lOutputFifoStatus;
LONG lFrameCount = 0;
LONG lStatus;
register LONG *plAddress1;
register LONG *plAddress2;
register LONG *p_imageAddress;
long *lPtr;
float *currentDifference;
float *previousDifference;
ULONG ulPCData;
ULONG ulTemp0;
ULONG ulValue;
ULONG pulPacket[5];
long peakRow;
long peakCol;
long row;
long col;
long pixel;
long index1;
long index2;
long index3;
int targetCol = 16, targetRow = 16, targetDirection = 0;
/* */
/* Initialize some variables. */ /* */
lErrorStatus = P_SUCCESS; ulPCData = APPLICATION_RUNNING; lFifoStatus = P_EMPTY;
/* */ /* Install the ISR found in the file INTERRUP.ASM to interrupt */ /* resource IIOFO. Initialize the G_lApplUserMaskIntCount. */
/* *j
DDF_ISRSetIIOF0 (P_INTERRUPT_USER_MASK, (VOID *) DBU_ApplInterrupt);
G_lApplUserMaskIntCount = 0; /* */
/* Set the ID of the second image for double buffering. */
/* Perform a quick grab setup on the input image number. */
/* Call Quick Grab with no wait. This application runs the grab */ /* in continuous mode, the grab will not return until the DMA */
/* has started. */
/* */
if ((lStatus = DBF_SetSecondImageID (P_DEFAULT_QGS, lInputImage2)) != P_SUCCESS)
{
ulPCData = END_APPLICATION_REQUEST;
lErrorStatus = lStatus;
} /* End if. */

if ((lStatus = DBF_QuickGrabSetup (P_DEFAULT_QGS, lInputImage1)) != P_SUCCESS)
{
ulPCData = END_APPLICATION_REQUEST;
lErrorStatus = lStatus;
} /* End if. */

if ((lStatus = DBF_QuickGrab (P_DEFAULT_QGS, P_GRAB_INIT, P_GRAB_NO_WAIT)) != P_SUCCESS)
{
ulPCData = END_APPLICATION_REQUEST;
lErrorStatus = lStatus;
} /* End if. */
/* */
/* Enable hardware interrupts on IIOFO */
/* */
INT_ENABLE (); set_iif_flag (IIOF0_EIIOF);
/* */
/* Initialize pointers to the two image buffers */ /* */
if ((lStatus = DBK_MmtGetImageInfo (lInputImage1, &ImageInfo1)) != P_SUCCESS)
{
ulPCData = END_APPLICATION_REQUEST;
lErrorStatus = lStatus;
} /* End if. */

if ((lStatus = DBK_MmtGetImageInfo (lInputImage2, &ImageInfo2)) != P_SUCCESS)
{
ulPCData = END_APPLICATION_REQUEST;
lErrorStatus = lStatus;
} /* End if. */

plAddress1 = (LONG *) (ImageInfo1.PRO_MappedAddress);
plAddress2 = (LONG *) (ImageInfo2.PRO_MappedAddress);

/* FFT Initialization Stuff */
asm(" or 1800h,st"); /* cache enable */
/* End FFT Initialization */
/* */
/* While the input fifo from the PC does not contain */
/* any data, continue processing frames and returning */ /* the results to the PC. */
/* */
while (ulPCData != END_APPLICATION_REQUEST)
{
if (lFrameCount < (G_lApplUserMaskIntCount))
{
lFrameCount = G_lApplUserMaskIntCount;
p_imageAddress = (G_lApplUserMaskIntCount % 2) ? plAddress1 : plAddress2;
if (G_lApplUserMaskIntCount % 2)
{
/* p_imageAddress=plAddress2; */
currentDifference=&complexMatrix2[0][0];
previousDifference=&complexMatrix1[0][0];
}
else
{
/* p_imageAddress=plAddress1; */
currentDifference=&complexMatrix1[0][0];
previousDifference=&complexMatrix2[0][0];
}

/************* Compute FFT on Difference Frame Rows *************/
for (row=0;row<FFTSIZE;row++)
{
for (col=0;col<FFTSIZE;col++)
{
lPtr=p_imageAddress+2*col+256*row;
pixel=*lPtr+*(lPtr+1)+*(lPtr+128)+*(lPtr+129);
/* Compute Difference Frame */
block0[2*col]=(float) (pixel - previousFrame[row][col]);
block0[2*col+1]=0.0;
/* Save current frame for next iteration */
previousFrame[row][col]=pixel;
}
cfftc(block0,FFTSIZE,LOGFFTSIZE);
fPtr1=currentDifference+row*FFTSIZEx2;
for (index1=0;index1<FFTSIZEx2;index1++)
fPtr1[index1]=block0[index1];
}

/* Column FFTs and difference frame correlation */
for (col=0;col<FFTSIZE;col++)
{
index3=2*col;
for (index2=0;index2<FFTSIZEx2;index2=index2+2)
{
block0[index2]=currentDifference[index3];
block0[index2+1]=currentDifference[index3+1];
index3+=FFTSIZEx2;
}
/* Complete column FFT of difference frame */
cfftc(block0,FFTSIZE,LOGFFTSIZE);
index3=2*col;
for (index2=0;index2<FFTSIZEx2;index2=index2+2)
{
/* Save FFT of difference frame */
currentDifference[index3]=block0[index2];
currentDifference[index3+1]=block0[index2+1];
/* Difference frame correlation: complex-conjugate product of the two difference frames */
block0[index2]=currentDifference[index3]*previousDifference[index3]
+currentDifference[index3+1]*previousDifference[index3+1];
block0[index2+1]=currentDifference[index3]*previousDifference[index3+1]
-currentDifference[index3+1]*previousDifference[index3];
index3+=FFTSIZEx2;
}
/* Compute inverse FFT for correlation matrix column */
cfftc(block0,FFTSIZE,LOGFFTSIZE);
/* Save to correlation frame */
fPtr1=&correlationMatrix[0][0];
index3=2*col;
for (index2=0;index2<FFTSIZEx2;index2=index2+2)
{
fPtr1[index3]=block0[index2];
fPtr1[index3+1]=block0[index2+1];
index3+=FFTSIZEx2;
}
}

/* Inverse FFT on correlation matrix rows and peak search */
correlationPeak=-100000000;
for (row=0;row<FFTSIZE;row++)
{
fPtr1=&correlationMatrix[row][0];
for (index1=0;index1<FFTSIZEx2;index1++)
block0[index1]=fPtr1[index1];
/* Inverse FFT on correlation matrix row */
cfftc(block0,FFTSIZE,LOGFFTSIZE);
fPtr1=&correlationMatrix[row][0];
for (col=0;col<FFTSIZE;col++)
{
index1=2*col;
if (correlationPeak<block0[index1])
{
correlationPeak=block0[index1];
peakRow=row;
peakCol=index1;
}
fPtr1[index1]=block0[index1];
}
}
/* */
/* Send protocol, lAverage and lFrameCount to the PC. */ /* */
pulPacket[0] = APPLICATION_RUNNING;
pulPacket[1] = peakCol;
pulPacket[2] = peakRow;
pulPacket[3] = (long) (correlationPeak*.0001);

DDK_PKTSend (P_PACKET_USER_INTERFACE, pulPacket,
4 * sizeof (LONG), P_WAITFORCOMPLETE, P_PC_INTERRUPT);
} /* End if. */

DDK_PKTInterfaceStatus (P_PACKET_USER_INTERFACE, &lFifoStatus,
&lOutputFifoStatus);
if (lFifoStatus != P_EMPTY)
{
DDK_PKTRecv (P_PACKET_USER_INTERFACE, pulPacket,
4 * sizeof (LONG), P_WAITFORCOMPLETE, P_NO_PC_INTERRUPT);
ulPCData = pulPacket[0];
} /* End if. */ } /* End while. */
/* */
/* Disable the IIOF0 interrupt. */
/* */
reset_iif_flag (IIOF0_EIIOF);
DDF_ISRDisableIIOF0 ();

/* */
/* Abort the continuous grab. */
/* */
DBF_AbortGrab ();
DBF_QuickGrabStatus (P_GRAB_WAIT);
/* */
/* Send back a protocol word indicating the */ /* last packet of data. (Pad to correct size) */
/* */
ulValue = APPLICATION_TERMINATED;
DDK_PKTSend (P_PACKET_USER_INTERFACE, &ulValue, 1L, P_WAITFORCOMPLETE,
P_NO_PC_INTERRUPT);
DDK_PKTSend (P_PACKET_USER_INTERFACE, &lErrorStatus, 1L,
P_WAITFORCOMPLETE, P_NO_PC_INTERRUPT);
DDK_PKTSend (P_PACKET_USER_INTERFACE, &lErrorStatus, 1L, P_WAITFORCOMPLETE, P_PC_INTERRUPT);
/* */
/* Empty USER input FIFO and output FIFO. */
/* (Host won't get the END message until */ /* the output FIFO is empty !) */
/* */
DDK_PKTInterfaceStatus (P_PACKET_USER_INTERFACE, &lFifoStatus, &lOutputFifoStatus);
while ((lFifoStatus != P_EMPTY) || (lOutputFifoStatus != P_EMPTY))
{
if (lFifoStatus != P_EMPTY)
{
DDK_PKTRecv (P_PACKET_USER_INTERFACE, &ulPCData, 1L,
P_WAITFORCOMPLETE, P_NO_PC_INTERRUPT);
} /* End if. */
DDK_PKTInterfaceStatus (P_PACKET_USER_INTERFACE, &lFifoStatus, &lOutputFifoStatus);
} /* End while. */
return P_SUCCESS;
} /* End of the DBU_UserFunction function. */
/*
 * Copyright (C) 1998
 * Video Mouse Group Partnership
 */
/*=========================================================================*/
// videomouseDlg.cpp : implementation file
//
#include "stdafx.h" #include "videomouse.h" #include "videomouseDlg.h"
#include "dpdefs.h" /* XPG definitions and prototypes. */ #include "dpptypes.h" /* XPG function prototypes. */ #include "dberrors.h" /* XPG error codes. */ #include "protocol.h" /* Protocol constants define for this application */ #include <conio.h> /* getch */ #include <math.h>
static int s_runDSPLoopThreadProc;
#ifdef _DEBUG
#define new DEBUG_NEW
#undef THIS_FILE
static char THIS_FILE[] = __FILE__;
#endif

#define P_ID_USER_FUNCTION1    0L
#define P_PCOUNT_USER_FUNCTION 1
#define FRAMESIZE              64
UINT DSPLoopProc(LPVOID pclass)
{
CVideomouseDlg& pcdd = *(reinterpret_cast<CVideomouseDlg*>(pclass));
CString dataString;
int savedCommandMode = DPK_XCCSetCommandType (P_USER);
DPK_XCCSetWaitMode (P_NO_WAIT);
DPK_XCCPushOpcode (P_ID_USER_FUNCTION1, P_PCOUNT_USER_FUNCTION);
DPK_XCCPushLong ((unsigned long) pcdd.m_inputImageNumber2); /* 2 */
DPK_XCCPushLong ((unsigned long) pcdd.m_inputImageNumber1); /* 1 */
DPK_XCCSetCommandType (savedCommandMode);
long status = DPK_XCCCheckStatus (P_PCOUT, P_XCCFIX);
double framesPerSecond;
time_t start, finish;
time( &start );
int counter=0;
if (status==P_SUCCESS)
{
while (s_runDSPLoopThreadProc)
{
counter++;
status = DPK_PKTRecv (P_PACKET_USER_INTERFACE, (HVOID *) pcdd.m_DSPPacket,
4 * sizeof (LONG), P_WAIT_COMPLETE);
long protocol = pcdd.m_DSPPacket[0];
dataString.Format("%d",pcdd.m_DSPPacket[1]);
pcdd.m_average.SetWindowText(dataString);
dataString.Format("%d",pcdd.m_DSPPacket[2]);
pcdd.m_frameNumber.SetWindowText(dataString);
time( &finish );
double elapsedTime = difftime( finish, start );
framesPerSecond=(double) counter/ (double)elapsedTime;
long max = pcdd.m_DSPPacket[3];
if (max>0.0)
{
double detectx=(double) pcdd.m_DSPPacket[2];
double detecty=0.5*(double) pcdd.m_DSPPacket[1];
if (detectx > 31) detectx = detectx-FRAMESIZE;
if (detecty > 31) detecty = FRAMESIZE-detecty;
else detecty = -detecty;
// double multiplier;
// if (detectx<2) multiplier=1.0;
// else if (detectx<10) multiplier = exp((detectx-2.0)/2.5);
// else multiplier=0;
// detectx*=multiplier;
// if (detecty<2) multiplier=1.0;
// else if (detecty<10) multiplier = exp((detecty-2.0)/2.5);
// else multiplier=0;
// detecty*=multiplier;
/* Move the Windows cursor by the detected image motion */
static POINT ptCursor;
GetCursorPos(&ptCursor);
ptCursor.x-=(long) detectx;
ptCursor.y-=(long) detecty;
SetCursorPos(ptCursor.x,ptCursor.y);
}
}
savedCommandMode = DPK_XCCSetCommandType (savedCommandMode);
DPK_XCCSetWaitMode (P_WAIT_COMPLETE);
DPK_EndPCK ();
AfxMessageBox("Exited Thread");
}
else
{
s_runDSPLoopThreadProc=false;
savedCommandMode = DPK_XCCSetCommandType (savedCommandMode);
DPK_XCCSetWaitMode (P_WAIT_COMPLETE);
DPK_EndPCK ();
AfxMessageBox("Exited Thread");
}
time( &finish );
double elapsedTime = difftime( finish, start );
framesPerSecond=(double) counter/ (double)elapsedTime;
return 1;
}
// CAboutDlg dialog used for App About

class CAboutDlg : public CDialog
{
public:
CAboutDlg();

// Dialog Data
//{{AFX_DATA(CAboutDlg)
enum { IDD = IDD_ABOUTBOX };
//}}AFX_DATA

// ClassWizard generated virtual function overrides
//{{AFX_VIRTUAL(CAboutDlg)
protected:
virtual void DoDataExchange(CDataExchange* pDX); // DDX/DDV support
//}}AFX_VIRTUAL

// Implementation
protected:
//{{AFX_MSG(CAboutDlg)
//}}AFX_MSG
DECLARE_MESSAGE_MAP()
};

CAboutDlg::CAboutDlg() : CDialog(CAboutDlg::IDD)
{
//{{AFX_DATA_INIT(CAboutDlg)
//}}AFX_DATA_INIT
}

void CAboutDlg::DoDataExchange(CDataExchange* pDX)
{
CDialog::DoDataExchange(pDX);
//{{AFX_DATA_MAP(CAboutDlg)
//}}AFX_DATA_MAP
}

BEGIN_MESSAGE_MAP(CAboutDlg, CDialog)
//{{AFX_MSG_MAP(CAboutDlg)
// No message handlers
//}}AFX_MSG_MAP
END_MESSAGE_MAP()
// CVideomouseDlg dialog
CVideomouseDlg::CVideomouseDlg(CWnd* pParent /*=NULL*/)
: CDialog(CVideomouseDlg::IDD, pParent)
{
//{{AFX_DATA_INIT(CVideomouseDlg)
//}}AFX_DATA_INIT
// Note that LoadIcon does not require a subsequent DestroyIcon in Win32
m_hIcon = AfxGetApp()->LoadIcon(IDR_MAINFRAME);
}

void CVideomouseDlg::DoDataExchange(CDataExchange* pDX)
{
CDialog::DoDataExchange(pDX);
//{{AFX_DATA_MAP(CVideomouseDlg)
DDX_Control(pDX, IDC_FRAMENUMBER, m_frameNumber);
DDX_Control(pDX, IDC_AVERAGE, m_average);
//}}AFX_DATA_MAP
}

BEGIN_MESSAGE_MAP(CVideomouseDlg, CDialog)
//{{AFX_MSG_MAP(CVideomouseDlg)
ON_WM_SYSCOMMAND()
ON_WM_PAINT()
ON_WM_QUERYDRAGICON()
ON_BN_CLICKED(IDC_ENABLE, OnEnable)
ON_BN_CLICKED(IDC_STOP, OnStop)
//}}AFX_MSG_MAP
END_MESSAGE_MAP()
// CVideomouseDlg message handlers

BOOL CVideomouseDlg::OnInitDialog()
{
CDialog::OnInitDialog();

// Add "About..." menu item to system menu.

// IDM_ABOUTBOX must be in the system command range.
ASSERT((IDM_ABOUTBOX & 0xFFF0) == IDM_ABOUTBOX);
ASSERT(IDM_ABOUTBOX < 0xF000);

CMenu* pSysMenu = GetSystemMenu(FALSE);
if (pSysMenu != NULL)
{
CString strAboutMenu;
strAboutMenu.LoadString(IDS_ABOUTBOX);
if (!strAboutMenu.IsEmpty())
{
pSysMenu->AppendMenu(MF_SEPARATOR);
pSysMenu->AppendMenu(MF_STRING, IDM_ABOUTBOX, strAboutMenu);
}
}

// Set the icon for this dialog.  The framework does this automatically
// when the application's main window is not a dialog
SetIcon(m_hIcon, TRUE);  // Set big icon
SetIcon(m_hIcon, FALSE); // Set small icon

InitializeFrameGrabber();

return TRUE; // return TRUE unless you set the focus to a control
}
void CVideomouseDlg::OnSysCommand(UINT nID, LPARAM lParam)
{
if ((nID & 0xFFF0) == IDM_ABOUTBOX)
{
CAboutDlg dlgAbout;
dlgAbout.DoModal();
}
else
{
CDialog::OnSysCommand(nID, lParam);
}
}

// If you add a minimize button to your dialog, you will need the code below
// to draw the icon. For MFC applications using the document/view model,
// this is automatically done for you by the framework.
void CVideomouseDlg::OnPaint()
{
if (IsIconic())
{
CPaintDC dc(this); // device context for painting

SendMessage(WM_ICONERASEBKGND, (WPARAM) dc.GetSafeHdc(), 0);

// Center icon in client rectangle
int cxIcon = GetSystemMetrics(SM_CXICON);
int cyIcon = GetSystemMetrics(SM_CYICON);
CRect rect;
GetClientRect(&rect);
int x = (rect.Width() - cxIcon + 1) / 2;
int y = (rect.Height() - cyIcon + 1) / 2;

// Draw the icon
dc.DrawIcon(x, y, m_hIcon);
}
else
{
CDialog::OnPaint();
}
}
// The system calls this to obtain the cursor to display while the user drags
// the minimized window.
HCURSOR CVideomouseDlg::OnQueryDragIcon()
{
return (HCURSOR) m_hIcon;
}

void CVideomouseDlg::OnEnable()
{
s_runDSPLoopThreadProc = true;
m_pDSPLoopThread = AfxBeginThread (DSPLoopProc, this);
}

void CVideomouseDlg::OnStop()
{
if (s_runDSPLoopThreadProc)
{
m_DSPPacket[0] = END_APPLICATION_REQUEST;
DPK_PKTSend (P_PACKET_USER_INTERFACE, m_DSPPacket, 4 * sizeof (LONG),
P_WAIT_COMPLETE);
s_runDSPLoopThreadProc=false;
}
}
#define INIT_FAILURE 1

long CVideomouseDlg::InitializeFrameGrabber()
{
DPK_InitPCK(1);
if ((m_status=DPK_InitXPG (0, P_IFB_RELOAD_COFF_FILE |
P_IFB_CHECK_REVISION, "videoMouse.out")) != P_SUCCESS)
{
DPK_EndPCK ();
m_errorMessage.Format("Error initializing FPG, status = %ld.\n", m_status);
AfxMessageBox(m_errorMessage);
return INIT_FAILURE;
}
DPK_XCCSetWaitMode (P_WAIT_COMPLETE);
long cpsNumber = DPF_LoadCPF("vidmouse.cpf");
if (cpsNumber != P_SUCCESS)
{
m_errorMessage.Format("Error loading a CPF, status = %ld.\n", m_status);
AfxMessageBox(m_errorMessage);
DPK_EndPCK ();
return INIT_FAILURE;
}
if ((m_status = DBF_SelectCPS (cpsNumber)) < P_SUCCESS)
{
m_errorMessage.Format("Error selecting a CPS, status = %ld.\n", m_status);
AfxMessageBox(m_errorMessage);
DPK_EndPCK ();
return INIT_FAILURE;
}
m_status=DBF_SetGrabWindow(P_DEFAULT_QGS, 256, 128, 176, 128);
if ((m_status = DBF_GetGrabWindow (P_DEFAULT_QGS, &m_startCol, &m_numCols,
&m_startRow, &m_numRows)) != P_SUCCESS)
{
m_errorMessage.Format("Error DBF_GetGrabWindow: %ld.\n", m_status);
AfxMessageBox(m_errorMessage);
DPK_EndPCK ();
return INIT_FAILURE;
}
if ((m_status = DBK_MmtCreateImage (m_numCols, m_numRows, P_DATA_SIZE_BYTE,
P_DATA_TYPE_INTEGER, 2, &m_inputImageNumber1,
&m_numberImagesCreated)) != P_SUCCESS)
{
m_errorMessage.Format("Error creating the input image, status = %ld.\n", m_status);
AfxMessageBox(m_errorMessage);
DPK_EndPCK ();
return INIT_FAILURE;
}
m_inputImageNumber2 = m_inputImageNumber1 + 1;
return 0;
}
It should be understood that the following claims are to cover all generic and specific features of the invention described herein, and all statements of the scope of the invention which, as a matter of language, might be said to fall there between.
Having described the invention, what is claimed is:

Claims

1. A human motion following controller for augmenting motion of items shown on a computer display, the display being coupled to a computer of the type which controls positioning of the items through operating system controls, comprising:
a camera for capturing frames of data corresponding to a first image of at least part of a user at the computer display;
signal processing means coupled to the camera for (a) detecting differences between successive frames of data corresponding to motion of the first image, and (b) communicating differences information to the computer to reposition display of the items through operating system controls, the items being repositioned on the display by an amount corresponding to the motion of first image.
2. A controller of claim 1, wherein the items comprise a computer cursor.
3. A controller of claim 1, wherein the items comprise a scene view.
4. A controller of claim 1, further comprising a PC card for installation within the computer and for communication on a computer bus, the signal processing means being substantially resident with the PC card for communicating differences information to the bus.
5. A controller of claim 1, wherein the camera comprises means for capturing augmented frames of data corresponding to a second image of part of the user at the computer display, the signal processing means further comprising means for detecting differences between successive augmented frames of data corresponding to motion of the second image and for communicating augmented difference information to the computer to reposition display of the items through operating system controls, the items being repositioned on the display by an amount corresponding to motion of the first and second images.
6. A controller of claim 1, further comprising frame difference electronics for storing and subsequently subtracting pixel-by-pixel difference data.
7. A controller of claim 6, wherein the difference electronics comprise multiple frame memory, a subtraction circuit, and a state machine controller/ memory addresser to control data flow.
8. A controller of claim 1, further comprising N frame video memory for storing frames of image data.
9. A controller of claim 1, further comprising a DSP for implementing select algorithms on difference frames or raw frames of image data.
10. A controller of claim 9, further comprising memory selected from the group of EPROM and RAM.
11. A controller of claim 9, further comprising means for interfacing the DSP to a PCI bus in the computer.
12. A controller of claim 1, further comprising MPEG compression electronics for compressing video for the computer.
13. A controller of claim 1, wherein the signal processing means comprises frame differencing means for removing unchanged information from image frames.
14. A controller of claim 1, wherein the signal processing means comprises frame memory to buffer one or more image frames.
15. A controller of claim 14, wherein the signal processing means comprises a frame differencer for reading a delayed frame from the frame memory and for subtracting the delayed frame from a current image frame.
16. A controller of claim 1, wherein the signal processing means comprises correlation means for determining row and column shifts corresponding to differences between a current image frame and a delayed image frame.
17. A controller of claim 16, wherein the signal processing means comprises best fit algorithm means for minimizing the shifts to provide alignment.
18. A controller of claim 17, wherein the best fit algorithm means utilizes a peak detect algorithm.
19. A controller of claim 1, wherein the signal processing means comprises video cursor control for enabling and alternatively disabling cursor control.
20. A controller of claim 19, wherein the video cursor control comprises means responsive to keystrokes at the computer.
21. A controller of claim 19, wherein the video cursor control comprises means responsive to a blink of an eye of the user.
22. A controller of claim 19, wherein the video cursor control comprises means responsive to sound generated by the user.
23. A controller of claim 22, further comprising a microphone to detect the sound.
24. A controller of claim 1, wherein the signal processing means comprises a complex multiplier for providing a two dimensional inverse FFT operation.
25. A controller of claim 24, wherein the signal processing means comprises a peak detect for determining a shift associated with aligning difference images.
26. A controller of claim 1, wherein the signal processing means comprises FFT means for providing a two dimensional FFT of image data.
27. A controller of claim 1, wherein the signal processing means comprises means for identifying parts of the user, the parts being selected from the group of a hand, elbow, head, neck, ears, and forehead.
28. A controller of claim 1, wherein the signal processing means comprises means for detecting left and right movement of a head of the user and for shifting the items in response to the left and right movement.
29. A controller of claim 1, wherein the signal processing means comprises means for detecting rotational movement of a head of the user and for rotating the items in response to the left and right movement.
29. A controller of claim 1, wherein the signal processing means comprises means for repositioning the items, if appropriate, at approximately every l/30th of a second.
30. A controller of claim 1, wherein the signal processing means comprises means for repositioning the items at a selected magnification as compared to actual movement of the user.
31. A controller of claim 1, wherein the signal processing means comprises means for storing image data of a head of the user at various orientations relative to the camera and for correlating image data to the stored image data to define head orientation, the head orientation being used to reposition the items.
32. A controller of claim 1, wherein the signal processing means comprises IR means for detecting heat associated with the user and for repositioning the items at a rate correlated to the heat.
33. A controller of claim 32, wherein the items correspond to computer gaming display images.
34. A controller of claim 32, wherein the heat corresponds to user stress.
35. A controller of claim 1, further comprising at least one other camera arranged to take images of at least a second part of the user.
36. A controller of claim 35, wherein the one other camera takes image data in a second electromagnetic spectrum.
37. A controller of claim 1, wherein the camera comprises a DSP.
38. A controller of claim 37, wherein the DSP processes difference information for the computer.
39. A controller of claim 1, wherein the signal processing means comprises a CPU within the computer.
40. A controller of claim 1, wherein the signal processing means comprises means for processing multiple image zones in frames of image data and for repositioning the items according to characteristics between zones.
41. A controller of claim 40, wherein one zone comprises image data corresponding to at least one eye of the user.
42. A controller of claim 41, wherein the signal processing means comprises means for determining a blink of the eye.
43. A controller of claim 42, wherein the signal processing means comprises means for disabling and alternatively enabling cursor control based upon the blink.
44. A controller of claim 1, wherein the signal processing means comprises means for processing image data to determine motion of at least one eye of the user and for repositioning the items based upon the motion.
45. A controller of claim 1, wherein the signal processing means comprises means for processing image data to determine motion of a pupil of at least one eye of the user and for repositioning the items based upon the motion.
46. A controller of claim 1, wherein the camera comprises a zoom attachment for automatically zooming into a desired magnification of at least one eye of the user.
47. A controller of claim 1, wherein the camera comprises zoom means for automatically focusing on the user as the user moves in distance from the camera.
48. A controller of claim 47, wherein the signal processing means comprises means for enlarging or shrinking the items on the display in response to focusing by the zoom means.
49. A controller of claim 1, wherein the signal processing means comprises means for determining edges of a head of the user and for repositioning the items in response to movements of the edges.
50. A controller of claim 1, wherein the signal processing means comprises means for isolating one or more objects held by the user and for repositioning the items in response to movement of the objects.
51. A controller of claim 1, wherein the signal processing means comprises means for isolating one or more parts of the user and for repositioning the items in response to movement of the parts.
52. A controller of claim 1, wherein the parts comprise at least one of a hand, head, and a foot.
53. A controller of claim 1, wherein the signal processing means comprises means for isolating one or more symbols associated with the user and for repositioning the items in response to movement of the symbols.
54. A controller of claim 1, further comprising a second camera constructed and arranged for viewing the user from above, the signal processing means having means for repositioning the items in response to movement detected from images in the second camera.
55. A controller of claim 54, wherein signal processing means comprises means for repositioning the items in response to forward and backward movement of the user as detected by the second camera.
56. A controller of claim 1, further comprising re-calibration means connected with the signal processing means for repositioning the items to an original position in response to a re-calibration event.
57. A controller of claim 1, wherein the re-calibration means comprises a microphone and the event comprises a sound generated by the user.
58. A controller of claim 1, wherein the camera comprises the re-calibration means for detecting a blink of the user.
59. A system for controlling a computer, comprising:
a transducer for converting optical signals to electrical signals;
electronic means for converting electronic signals to digital data;
signal processor means for detecting motion in the digital data and providing a digital representation of said motion;
communication means for entering one or more of the electronic signals, digital data, and digital representation into the computer to manipulate a computer display in response to the motion.
60. A system of claim 59, wherein the computer comprises the signal processor means.
61. A system of claim 59, wherein said signal processor comprises a digital signal processor separate from a CPU within the computer.
62. A system of claim 59, wherein the transducer, electronic means and signal processor are constructed and arranged into a single device in communication with the computer.
63. A system of claim 59, wherein the communication means comprises one of RS170 video, a PCI bus interface, a digital computer interface, a serial computer interface.
64. A system of claim 59, further comprising means for repositioning a computer cursor in response to the motion.
65. A system of claim 59, wherein the transducer comprises one or more of a visible CCD camera and an IR camera.
66. A system of claim 59, wherein the transducer comprises a CCD camera having at least 2x2 imaging pixels.
67. A system of claim 66, wherein the camera comprises optics with various fields of view.
68. A system of claim 59, wherein the transducer comprises one of a CCD or a CMOS integrated circuit with digital outputs.
69. A system of claim 68, wherein the transducer generates RS170 output.
70. A system of claim 68, wherein the transducer generates RS170 digital output.
71. A system of claim 68, wherein the transducer generates digital resolutions of 4 bits or greater.
72. A system of claim 59, wherein the signal processor comprises a video frame memory.
73. A system of claim 59, wherein the signal processor comprises frame difference functionality.
74. A system of claim 59, wherein the signal processor comprises video frame difference memory.
75. A system of claim 59, wherein the signal processor comprises correlation functionality.
76. A system of claim 59, wherein the signal processor comprises means for determining best fit motion.
77. A system of claim 59, further comprising means for controlling cursor movement.
78. A system of claim 59, further comprising means for segmenting video images to provide multiple digital representations of the motion corresponding to different portions of the digital representation.
79. A system of claim 78, wherein the optical signals are generated through image acquisition of a portion of a human.
80. A system of claim 78, wherein the optical signals are generated by viewing multiple features of a human.
81. A system of claim 59, further comprising neural net means for learning user motion over time.
PCT/US1999/000086 1998-01-06 1999-01-04 Human motion following computer mouse and game controller WO1999035633A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU22117/99A AU2211799A (en) 1998-01-06 1999-01-04 Human motion following computer mouse and game controller

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US7051298P 1998-01-06 1998-01-06
US60/070,512 1998-01-06
US10004698P 1998-09-11 1998-09-11
US60/100,046 1998-09-11

Publications (2)

Publication Number Publication Date
WO1999035633A2 true WO1999035633A2 (en) 1999-07-15
WO1999035633A3 WO1999035633A3 (en) 1999-09-23

Family

ID=26751208

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/000086 WO1999035633A2 (en) 1998-01-06 1999-01-04 Human motion following computer mouse and game controller

Country Status (2)

Country Link
AU (1) AU2211799A (en)
WO (1) WO1999035633A2 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9393487B2 (en) 2002-07-27 2016-07-19 Sony Interactive Entertainment Inc. Method for mapping movements of a hand-held controller to game commands
US8313380B2 (en) 2002-07-27 2012-11-20 Sony Computer Entertainment America Llc Scheme for translating movements of a hand-held controller into inputs for a system
US7760248B2 (en) 2002-07-27 2010-07-20 Sony Computer Entertainment Inc. Selective sound source listening in conjunction with computer interactive processing
US9573056B2 (en) 2005-10-26 2017-02-21 Sony Interactive Entertainment Inc. Expandable control device via hardware attachment
CN102016877B (en) 2008-02-27 2014-12-10 索尼计算机娱乐美国有限责任公司 Methods for capturing depth data of a scene and applying computer actions
US9495013B2 (en) 2008-04-24 2016-11-15 Oblong Industries, Inc. Multi-modal gestural interface
US8961313B2 (en) 2009-05-29 2015-02-24 Sony Computer Entertainment America Llc Multi-positional three-dimensional controller
US9317128B2 (en) 2009-04-02 2016-04-19 Oblong Industries, Inc. Remote devices used in a markerless installation of a spatial operating environment incorporating gestural control

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4950069A (en) * 1988-11-04 1990-08-21 University Of Virginia Eye movement detector with improved calibration and speed
US5367315A (en) * 1990-11-15 1994-11-22 Eyetech Corporation Method and apparatus for controlling cursor movement
US5287473A (en) * 1990-12-14 1994-02-15 International Business Machines Corporation Non-blocking serialization for removing data from a shared cache
US5168531A (en) * 1991-06-27 1992-12-01 Digital Equipment Corporation Real-time recognition of pointing information from video
US5252950A (en) * 1991-12-20 1993-10-12 Apple Computer, Inc. Display with rangefinder
US5581276A (en) * 1992-09-08 1996-12-03 Kabushiki Kaisha Toshiba 3D human interface apparatus using motion recognition based on dynamic image processing
US5617312A (en) * 1993-11-19 1997-04-01 Hitachi, Ltd. Computer system that enters control information by means of video camera

Cited By (171)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001088679A2 (en) * 2000-05-13 2001-11-22 Mathengine Plc Browser system and method of using it
WO2001088679A3 (en) * 2000-05-13 2003-10-09 Mathengine Plc Browser system and method of using it
WO2001095290A2 (en) * 2000-06-08 2001-12-13 Joachim Sauter Visualisation device and method
WO2001095290A3 (en) * 2000-06-08 2002-06-27 Joachim Sauter Visualisation device and method
US8274535B2 (en) 2000-07-24 2012-09-25 Qualcomm Incorporated Video-based image control system
US8624932B2 (en) 2000-07-24 2014-01-07 Qualcomm Incorporated Video-based image control system
US8963963B2 (en) 2000-07-24 2015-02-24 Qualcomm Incorporated Video-based image control system
US7227526B2 (en) 2000-07-24 2007-06-05 Gesturetek, Inc. Video-based image control system
US7898522B2 (en) 2000-07-24 2011-03-01 Gesturetek, Inc. Video-based image control system
WO2002016875A1 (en) * 2000-08-24 2002-02-28 Siemens Aktiengesellschaft Method for querying target information and navigating within a card view, computer program product and navigation device
US7126579B2 (en) 2000-08-24 2006-10-24 Siemens Aktiengesellschaft Method for requesting destination information and for navigating in a map view, computer program product and navigation unit
US7058204B2 (en) 2000-10-03 2006-06-06 Gesturetek, Inc. Multiple camera control system
US8131015B2 (en) 2000-10-03 2012-03-06 Qualcomm Incorporated Multiple camera control system
US8625849B2 (en) 2000-10-03 2014-01-07 Qualcomm Incorporated Multiple camera control system
US7555142B2 (en) 2000-10-03 2009-06-30 Gesturetek, Inc. Multiple camera control system
US7421093B2 (en) 2000-10-03 2008-09-02 Gesturetek, Inc. Multiple camera control system
EP1220143A2 (en) * 2000-12-25 2002-07-03 Hitachi, Ltd. Electronics device applying an image sensor
EP1220143A3 (en) * 2000-12-25 2006-06-07 Hitachi, Ltd. Electronics device applying an image sensor
WO2002075515A1 (en) 2001-03-15 2002-09-26 Ulf Parke An apparatus and method for controlling a cursor on a viewing screen
EP1270050A3 (en) * 2001-06-29 2005-06-29 Konami Corporation Game device, game controlling method and program
EP1270050A2 (en) * 2001-06-29 2003-01-02 Konami Corporation Game device, game controlling method and program
US7452275B2 (en) 2001-06-29 2008-11-18 Konami Digital Entertainment Co., Ltd. Game device, game controlling method and program
EP1279425A2 (en) * 2001-07-19 2003-01-29 Konami Corporation Video game apparatus, method and recording medium storing program for controlling movement of simulated camera in video game
US6890262B2 (en) * 2001-07-19 2005-05-10 Konami Corporation Video game apparatus, method and recording medium storing program for controlling viewpoint movement of simulated camera in video game
EP1279425A3 (en) * 2001-07-19 2003-03-26 Konami Corporation Video game apparatus, method and recording medium storing program for controlling movement of simulated camera in video game
US9682320B2 (en) 2002-07-22 2017-06-20 Sony Interactive Entertainment Inc. Inertially trackable hand-held controller
US10406433B2 (en) 2002-07-27 2019-09-10 Sony Interactive Entertainment America Llc Method and system for applying gearing effects to visual tracking
US10220302B2 (en) 2002-07-27 2019-03-05 Sony Interactive Entertainment Inc. Method and apparatus for tracking three-dimensional movements of an object using a depth sensing camera
US10099130B2 (en) 2002-07-27 2018-10-16 Sony Interactive Entertainment America Llc Method and system for applying gearing effects to visual tracking
US9682319B2 (en) 2002-07-31 2017-06-20 Sony Interactive Entertainment Inc. Combiner method for altering game gearing
WO2004034241A2 (en) * 2002-10-09 2004-04-22 Raphael Bachmann Rapid input device
WO2004034241A3 (en) * 2002-10-09 2005-07-28 Raphael Bachmann Rapid input device
US8537231B2 (en) 2002-11-20 2013-09-17 Koninklijke Philips N.V. User interface system based on pointing device
US8971629B2 (en) 2002-11-20 2015-03-03 Koninklijke Philips N.V. User interface system based on pointing device
US8970725B2 (en) 2002-11-20 2015-03-03 Koninklijke Philips N.V. User interface system based on pointing device
US11010971B2 (en) 2003-05-29 2021-05-18 Sony Interactive Entertainment Inc. User-driven three-dimensional interactive gaming environment
WO2005010739A1 (en) * 2003-07-29 2005-02-03 Philips Intellectual Property & Standards Gmbh System and method for controlling the display of an image
US7883415B2 (en) * 2003-09-15 2011-02-08 Sony Computer Entertainment Inc. Method and apparatus for adjusting a view of a scene being displayed according to tracked head motion
WO2005078558A1 (en) * 2004-02-16 2005-08-25 Simone Soria A process for generating command signals, in particular for disabled users
EP1618930A4 (en) * 2004-02-18 2007-01-10 Sony Computer Entertainment Inc Image display system, image processing system, and a video game system
US7690975B2 (en) 2004-02-18 2010-04-06 Sony Computer Entertainment Inc. Image display system, image processing system, and video game system
EP1618930A1 (en) * 2004-02-18 2006-01-25 Sony Computer Entertainment Inc. Image display system, image processing system, and a video game system
WO2005094958A1 (en) * 2004-03-23 2005-10-13 Harmonix Music Systems, Inc. Method and apparatus for controlling a three-dimensional character in a three-dimensional gaming environment
US10099147B2 (en) 2004-08-19 2018-10-16 Sony Interactive Entertainment Inc. Using a portable device to interface with a video game rendered on a main display
EP1655659A3 (en) * 2004-11-08 2007-10-31 Samsung Electronics Co., Ltd. Portable terminal and data input method therefor
US8311370B2 (en) 2004-11-08 2012-11-13 Samsung Electronics Co., Ltd Portable terminal and data input method therefor
US9606630B2 (en) 2005-02-08 2017-03-28 Oblong Industries, Inc. System and method for gesture based control system
US7598942B2 (en) 2005-02-08 2009-10-06 Oblong Industries, Inc. System and method for gesture based control system
WO2006097722A3 (en) * 2005-03-15 2007-01-11 Intelligent Earth Ltd Interface control
WO2006097722A2 (en) * 2005-03-15 2006-09-21 Intelligent Earth Limited Interface control
US10279254B2 (en) 2005-10-26 2019-05-07 Sony Interactive Entertainment Inc. Controller having visually trackable object for interfacing with a gaming system
US10061392B2 (en) 2006-02-08 2018-08-28 Oblong Industries, Inc. Control system for navigating a principal dimension of a data space
US9910497B2 (en) 2006-02-08 2018-03-06 Oblong Industries, Inc. Gestural control of autonomous and semi-autonomous systems
US9823747B2 (en) 2006-02-08 2017-11-21 Oblong Industries, Inc. Spatial, multi-modal control device for use with spatial operating system
US9075441B2 (en) 2006-02-08 2015-07-07 Oblong Industries, Inc. Gesture based control using three-dimensional information extracted over an extended depth of field
US8537112B2 (en) 2006-02-08 2013-09-17 Oblong Industries, Inc. Control system for navigating a principal dimension of a data space
US10565030B2 (en) 2006-02-08 2020-02-18 Oblong Industries, Inc. Multi-process interactive systems and methods
US8531396B2 (en) 2006-02-08 2013-09-10 Oblong Industries, Inc. Control system for navigating a principal dimension of a data space
US8537111B2 (en) 2006-02-08 2013-09-17 Oblong Industries, Inc. Control system for navigating a principal dimension of a data space
US8249334B2 (en) 2006-05-11 2012-08-21 Primesense Ltd. Modeling of humanoid forms from depth maps
USRE48417E1 (en) 2006-09-28 2021-02-02 Sony Interactive Entertainment Inc. Object direction using video input combined with tilt angle information
US9804902B2 (en) 2007-04-24 2017-10-31 Oblong Industries, Inc. Proteins, pools, and slawx in processing environments
US8407725B2 (en) 2007-04-24 2013-03-26 Oblong Industries, Inc. Proteins, pools, and slawx in processing environments
US10664327B2 (en) 2007-04-24 2020-05-26 Oblong Industries, Inc. Proteins, pools, and slawx in processing environments
US11269467B2 (en) 2007-10-04 2022-03-08 Apple Inc. Single-layer touch-sensitive display
EP2065795A1 (en) * 2007-11-30 2009-06-03 Koninklijke KPN N.V. Auto zoom display system and method
US8542907B2 (en) 2007-12-17 2013-09-24 Sony Computer Entertainment America Llc Dynamic three-dimensional object mapping for user-defined control device
US11294503B2 (en) 2008-01-04 2022-04-05 Apple Inc. Sensor baseline offset adjustment for a subset of sensor output values
US9372576B2 (en) 2008-01-04 2016-06-21 Apple Inc. Image jaggedness filter for determining whether to perform baseline calculations
US9035876B2 (en) 2008-01-14 2015-05-19 Apple Inc. Three-dimensional user interface session control
US8166421B2 (en) 2008-01-14 2012-04-24 Primesense Ltd. Three-dimensional user interface
US10739865B2 (en) 2008-04-24 2020-08-11 Oblong Industries, Inc. Operating environment with gestural control and multiple client devices, displays, and users
US9779131B2 (en) 2008-04-24 2017-10-03 Oblong Industries, Inc. Detecting, representing, and interpreting three-space input: gestural continuum subsuming freespace, proximal, and surface-contact modes
US9740922B2 (en) 2008-04-24 2017-08-22 Oblong Industries, Inc. Adaptive tracking system for spatial input devices
US10067571B2 (en) 2008-04-24 2018-09-04 Oblong Industries, Inc. Operating environment with gestural control and multiple client devices, displays, and users
US10353483B2 (en) 2008-04-24 2019-07-16 Oblong Industries, Inc. Operating environment with gestural control and multiple client devices, displays, and users
US10255489B2 (en) 2008-04-24 2019-04-09 Oblong Industries, Inc. Adaptive tracking system for spatial input devices
US10235412B2 (en) 2008-04-24 2019-03-19 Oblong Industries, Inc. Detecting, representing, and interpreting three-space input: gestural continuum subsuming freespace, proximal, and surface-contact modes
EP2305358A1 (en) * 2008-06-30 2011-04-06 Sony Computer Entertainment Inc. Portable type game device and method for controlling portable type game device
EP2305358A4 (en) * 2008-06-30 2011-08-03 Sony Computer Entertainment Inc Portable type game device and method for controlling portable type game device
US9662583B2 (en) 2008-06-30 2017-05-30 Sony Corporation Portable type game device and method for controlling portable type game device
CN102112174A (en) * 2008-08-08 2011-06-29 皇家飞利浦电子股份有限公司 Calming device
CN102112174B (en) * 2008-08-08 2015-01-28 皇家飞利浦电子股份有限公司 Calming device
JP2011530319A (en) * 2008-08-08 2011-12-22 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Device for calming a subject
EP2151260A1 (en) * 2008-08-08 2010-02-10 Koninklijke Philips Electronics N.V. Calming device
US8979731B2 (en) 2008-08-08 2015-03-17 Koninklijke Philips N.V. Calming device
WO2010015998A1 (en) 2008-08-08 2010-02-11 Koninklijke Philips Electronics N. V. Calming device
US9996175B2 (en) 2009-02-02 2018-06-12 Apple Inc. Switching circuitry for touch sensitive display
WO2010086842A1 (en) 2009-02-02 2010-08-05 Laurent Nanot Mobile ergonomic movement controller
US9740293B2 (en) 2009-04-02 2017-08-22 Oblong Industries, Inc. Operating environment with gestural control and multiple client devices, displays, and users
US10656724B2 (en) 2009-04-02 2020-05-19 Oblong Industries, Inc. Operating environment comprising multiple client devices, multiple displays, multiple users, and gestural control
US10642364B2 (en) 2009-04-02 2020-05-05 Oblong Industries, Inc. Processing tracking and recognition data in gestural recognition systems
US10296099B2 (en) 2009-04-02 2019-05-21 Oblong Industries, Inc. Operating environment with gestural control and multiple client devices, displays, and users
US9684380B2 (en) 2009-04-02 2017-06-20 Oblong Industries, Inc. Operating environment with gestural control and multiple client devices, displays, and users
US9952673B2 (en) 2009-04-02 2018-04-24 Oblong Industries, Inc. Operating environment comprising multiple client devices, multiple displays, multiple users, and gestural control
US10824238B2 (en) 2009-04-02 2020-11-03 Oblong Industries, Inc. Operating environment with gestural control and multiple client devices, displays, and users
US10001888B2 (en) 2009-04-10 2018-06-19 Apple Inc. Touch sensor panel design
EP2249230A1 (en) 2009-05-04 2010-11-10 Topseed Technology Corp. Non-contact touchpad apparatus and method for operating the same
EP2249229A1 (en) 2009-05-04 2010-11-10 Topseed Technology Corp. Non-contact mouse apparatus and method for operating the same
US9128526B2 (en) * 2009-06-22 2015-09-08 Sony Corporation Operation control device, operation control method, and computer-readable recording medium for distinguishing an intended motion for gesture control
US20100325590A1 (en) * 2009-06-22 2010-12-23 Fuminori Homma Operation control device, operation control method, and computer-readable recording medium
EP2273345A3 (en) * 2009-06-22 2014-11-26 Sony Corporation Movement controlled computer
US9582131B2 (en) 2009-06-29 2017-02-28 Apple Inc. Touch sensor panel design
US8565479B2 (en) 2009-08-13 2013-10-22 Primesense Ltd. Extraction of skeletons from 3D maps
EP2485118A4 (en) * 2009-09-29 2014-05-14 Alcatel Lucent Method for viewing points detecting and apparatus thereof
EP2485118A1 (en) * 2009-09-29 2012-08-08 Alcatel Lucent Method for viewing points detecting and apparatus thereof
US9933852B2 (en) 2009-10-14 2018-04-03 Oblong Industries, Inc. Multi-process interactive systems and methods
US10990454B2 (en) 2009-10-14 2021-04-27 Oblong Industries, Inc. Multi-process interactive systems and methods
EP2359916A1 (en) * 2010-01-27 2011-08-24 NAMCO BANDAI Games Inc. Display image generation device and display image generation method
US8787663B2 (en) 2010-03-01 2014-07-22 Primesense Ltd. Tracking body parts by combined color image and depth processing
EP2374514A3 (en) * 2010-03-31 2013-10-09 NAMCO BANDAI Games Inc. Image generation system, image generation method, and information storage medium
US8556716B2 (en) 2010-03-31 2013-10-15 Namco Bandai Games Inc. Image generation system, image generation method, and information storage medium
FR2960315A1 (en) * 2010-05-20 2011-11-25 Opynov Method for capturing movement of the head, hand and/or finger of a person interacting or communicating with e.g. a mobile terminal screen, by estimating human movement from changes in an identified thermal radiation area
US8781217B2 (en) 2010-05-31 2014-07-15 Primesense Ltd. Analysis of three-dimensional scenes with a surface model
US8594425B2 (en) 2010-05-31 2013-11-26 Primesense Ltd. Analysis of three-dimensional scenes
US8824737B2 (en) 2010-05-31 2014-09-02 Primesense Ltd. Identifying components of a humanoid form in three-dimensional scenes
US9201501B2 (en) 2010-07-20 2015-12-01 Apple Inc. Adaptive projector
US9158375B2 (en) 2010-07-20 2015-10-13 Apple Inc. Interactive reality augmentation for natural interaction
US8582867B2 (en) 2010-09-16 2013-11-12 Primesense Ltd Learning-based pose estimation from depth maps
US8959013B2 (en) 2010-09-27 2015-02-17 Apple Inc. Virtual keyboard for a non-tactile three dimensional user interface
US8414393B2 (en) 2010-10-28 2013-04-09 Konami Digital Entertainment Co., Ltd. Game device, control method for a game device, and a non-transitory information storage medium
EP2450087A1 (en) * 2010-10-28 2012-05-09 Konami Digital Entertainment Co., Ltd. Game device, control method for a game device, a program, and an information storage medium
US8740704B2 (en) 2010-10-28 2014-06-03 Konami Digital Entertainment Co., Ltd. Game device, control method for a game device, and a non-transitory information storage medium
US8872762B2 (en) 2010-12-08 2014-10-28 Primesense Ltd. Three dimensional user interface cursor control
US8933876B2 (en) 2010-12-13 2015-01-13 Apple Inc. Three dimensional user interface session control
US9454225B2 (en) 2011-02-09 2016-09-27 Apple Inc. Gaze-based display control
US9342146B2 (en) 2011-02-09 2016-05-17 Apple Inc. Pointing-based display interaction
US9285874B2 (en) 2011-02-09 2016-03-15 Apple Inc. Gaze detection in a 3D mapping environment
US9285883B2 (en) 2011-03-01 2016-03-15 Qualcomm Incorporated System and method to display content based on viewing orientation
WO2013003414A3 (en) * 2011-06-28 2013-02-28 Google Inc. Methods and systems for correlating head movement with items displayed on a user interface
US8881051B2 (en) 2011-07-05 2014-11-04 Primesense Ltd Zoom-based gesture user interface
US9459758B2 (en) 2011-07-05 2016-10-04 Apple Inc. Gesture-based interface with enhanced features
US9377865B2 (en) 2011-07-05 2016-06-28 Apple Inc. Zoom-based gesture user interface
US9030498B2 (en) 2011-08-15 2015-05-12 Apple Inc. Combining explicit select gestures and timeclick in a non-tactile three dimensional user interface
US9122311B2 (en) 2011-08-24 2015-09-01 Apple Inc. Visual feedback for tactile and non-tactile user interfaces
US9218063B2 (en) 2011-08-24 2015-12-22 Apple Inc. Sessionless pointing user interface
US9002099B2 (en) 2011-09-11 2015-04-07 Apple Inc. Learning-based estimation of hand and finger pose
US9218056B2 (en) 2012-02-15 2015-12-22 Samsung Electronics Co., Ltd. Eye tracking method and display apparatus using the same
EP2629179A1 (en) * 2012-02-15 2013-08-21 Samsung Electronics Co., Ltd. Eye tracking method and display apparatus using the same
KR101922589B1 (en) * 2012-02-15 2018-11-27 삼성전자주식회사 Display apparatus and eye tracking method thereof
US9229534B2 (en) 2012-02-28 2016-01-05 Apple Inc. Asymmetric mapping for tactile and non-tactile user interfaces
US9377863B2 (en) 2012-03-26 2016-06-28 Apple Inc. Gaze-enhanced virtual touchscreen
US11169611B2 (en) 2012-03-26 2021-11-09 Apple Inc. Enhanced virtual touchpad
US9874975B2 (en) 2012-04-16 2018-01-23 Apple Inc. Reconstruction of original touch image from differential touch image
US9329723B2 (en) 2012-04-16 2016-05-03 Apple Inc. Reconstruction of original touch image from differential touch image
US9047507B2 (en) 2012-05-02 2015-06-02 Apple Inc. Upper-body skeleton extraction from depth maps
US9019267B2 (en) 2012-10-30 2015-04-28 Apple Inc. Depth mapping with enhanced resolution
US9886141B2 (en) 2013-08-16 2018-02-06 Apple Inc. Mutual and self capacitance touch measurements in touch panel
US10338693B2 (en) 2014-03-17 2019-07-02 Oblong Industries, Inc. Visual collaboration interface
US10627915B2 (en) 2014-03-17 2020-04-21 Oblong Industries, Inc. Visual collaboration interface
US9990046B2 (en) 2014-03-17 2018-06-05 Oblong Industries, Inc. Visual collaboration interface
US10936120B2 (en) 2014-05-22 2021-03-02 Apple Inc. Panel bootstraping architectures for in-cell self-capacitance
US10289251B2 (en) 2014-06-27 2019-05-14 Apple Inc. Reducing floating ground effects in pixelated self-capacitance touch screens
US9880655B2 (en) 2014-09-02 2018-01-30 Apple Inc. Method of disambiguating water from a finger touch on a touch sensor panel
US11625124B2 (en) 2014-09-22 2023-04-11 Apple Inc. Ungrounded user signal compensation for pixelated self-capacitance touch sensor panel
US10705658B2 (en) 2014-09-22 2020-07-07 Apple Inc. Ungrounded user signal compensation for pixelated self-capacitance touch sensor panel
US10712867B2 (en) 2014-10-27 2020-07-14 Apple Inc. Pixelated self-capacitance water rejection
US11561647B2 (en) 2014-10-27 2023-01-24 Apple Inc. Pixelated self-capacitance water rejection
US10795488B2 (en) 2015-02-02 2020-10-06 Apple Inc. Flexible self-capacitance and mutual capacitance touch sensing system architecture
US11353985B2 (en) 2015-02-02 2022-06-07 Apple Inc. Flexible self-capacitance and mutual capacitance touch sensing system architecture
US10488992B2 (en) 2015-03-10 2019-11-26 Apple Inc. Multi-chip touch architecture for scalability
US10365773B2 (en) 2015-09-30 2019-07-30 Apple Inc. Flexible scan plan using coarse mutual capacitance and fully-guarded measurements
US10043279B1 (en) 2015-12-07 2018-08-07 Apple Inc. Robust detection and classification of body parts in a depth map
US10529302B2 (en) 2016-07-07 2020-01-07 Oblong Industries, Inc. Spatially mediated augmentations of and interactions among distinct devices and applications via extended pixel manifold
US10444918B2 (en) 2016-09-06 2019-10-15 Apple Inc. Back of cover touch sensors
US10366278B2 (en) 2016-09-20 2019-07-30 Apple Inc. Curvature-based face detector
CN106791317A (en) * 2016-12-30 2017-05-31 天津航正科技有限公司 A motion-map retrieval device for human motion
US10386965B2 (en) 2017-04-20 2019-08-20 Apple Inc. Finger tracking in wet environment
US10642418B2 (en) 2017-04-20 2020-05-05 Apple Inc. Finger tracking in wet environment
US11662867B1 (en) 2020-05-30 2023-05-30 Apple Inc. Hover detection on a touch sensor panel
RU2750593C1 (en) * 2020-11-10 2021-06-29 Михаил Юрьевич Шагиев Method for emulating presses of directional arrow keys on a keyboard or joystick, or movement of a computer mouse connected to a computing device, depending on the user's position in space

Also Published As

Publication number Publication date
AU2211799A (en) 1999-07-26
WO1999035633A3 (en) 1999-09-23

Similar Documents

Publication Publication Date Title
WO1999035633A2 (en) Human motion following computer mouse and game controller
US10635895B2 (en) Gesture-based casting and manipulation of virtual content in artificial-reality environments
JP6845982B2 (en) Facial expression recognition system, facial expression recognition method and facial expression recognition program
US9411417B2 (en) Eye gaze tracking system and method
Zhu et al. Novel eye gaze tracking techniques under natural head movement
KR102065687B1 (en) Wireless wrist computing and control device and method for 3d imaging, mapping, networking and interfacing
Morimoto et al. Keeping an eye for HCI
US9436277B2 (en) System and method for producing computer control signals from breath attributes
US7872635B2 (en) Foveated display eye-tracking system and method
Chan et al. Cyclops: Wearable and single-piece full-body gesture input devices
AU2018277842A1 (en) Eye tracking calibration techniques
CN112926423B (en) Pinch gesture detection and recognition method, device and system
US20140232749A1 (en) Vision-based augmented reality system using invisible marker
KR101892735B1 (en) Apparatus and Method for Intuitive Interaction
US20110199302A1 (en) Capturing screen objects using a collision volume
US20140139429A1 (en) System and method for computer vision based hand gesture identification
WO2002001336A2 (en) Automated visual tracking for computer access
Yeo et al. Opisthenar: Hand poses and finger tapping recognition by observing back of hand using embedded wrist camera
bin Mohd Sidik et al. A study on natural interaction for human body motion using depth image data
Lemley et al. Eye tracking in augmented spaces: A deep learning approach
US20180081430A1 (en) Hybrid computer interface system
Chung et al. Postrack: A low cost real-time motion tracking system for VR application
Borsato et al. A fast and accurate eye tracker using stroboscopic differential lighting
Mihara et al. A real‐time vision‐based interface using motion processor and applications to robotics
Park et al. Implementation of an eye gaze tracking system for the disabled people

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
AK Designated states

Kind code of ref document: A3

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

NENP Non-entry into the national phase in:

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWE Wipo information: entry into national phase

Ref document number: 09582806

Country of ref document: US

122 Ep: PCT application non-entry in European phase