In virtual reality (VR) and augmented reality (AR), a pose tracking system detects the precise pose of head-mounted displays, controllers, other objects or body parts within Euclidean space. Pose tracking is often referred to as 6DOF tracking, for the six degrees of freedom in which the pose is often tracked. [1]
Pose tracking is sometimes referred to as positional tracking, but the two are distinct: pose tracking includes orientation, whereas positional tracking does not. In some consumer GPS systems, orientation data is added using magnetometers, which give partial orientation information but not the full orientation that pose tracking provides.
In VR, it is paramount that pose tracking is both accurate and precise so as not to break the illusion of being in a virtual world. Several methods of tracking the position and orientation (pitch, yaw and roll) of the display and any associated objects or devices have been developed to achieve this. Many methods use sensors that repeatedly record signals from transmitters on or near the tracked object(s) and then send that data to the computer in order to maintain an approximation of their physical locations. A popular tracking method is Lighthouse tracking. By and large, these physical locations are identified and defined using one or more of three coordinate systems: the Cartesian rectilinear system, the spherical polar system, and the cylindrical system. Many interfaces have also been designed to monitor and control one's movement within and interaction with the virtual 3D space; such interfaces must work closely with positional tracking systems to provide a seamless user experience. [2]
Another type of pose tracking, used more often in newer systems, is referred to as inside-out tracking. It includes simultaneous localization and mapping (SLAM) and visual-inertial odometry (VIO). One example of a device that uses inside-out pose tracking is the Oculus Quest 2.
Wireless tracking uses a set of anchors placed around the perimeter of the tracking space and one or more tags that are tracked. The system is similar in concept to GPS but works both indoors and outdoors, and it is sometimes referred to as indoor GPS. The tags triangulate their 3D position using the anchors placed around the perimeter. A wireless technology called Ultra Wideband has enabled position tracking to reach a precision of under 100 mm. By using sensor fusion and high-speed algorithms, the tracking precision can reach the 5 mm level with update rates of 200 Hz (5 ms between updates).
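The position calculation from anchor distances can be sketched as follows. This is a minimal illustration and not the algorithm of any particular Ultra Wideband product: it assumes four anchors at known, non-coplanar positions and noise-free range measurements, and the names `det3` and `locate_tag` are hypothetical. Subtracting the first range equation from the other three removes the quadratic term and leaves a 3x3 linear system, solved here with Cramer's rule.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def locate_tag(anchors, ranges):
    """Estimate a tag position from four anchor positions and four ranges.

    Each range satisfies |p - a_i|^2 = r_i^2. Subtracting the equation for
    anchor 0 gives 2 (a_i - a_0) . p = |a_i|^2 - |a_0|^2 - r_i^2 + r_0^2.
    """
    a0, r0 = anchors[0], ranges[0]
    rows, rhs = [], []
    for a, r in zip(anchors[1:4], ranges[1:4]):
        rows.append([2.0 * (a[k] - a0[k]) for k in range(3)])
        rhs.append(sum(c * c for c in a) - sum(c * c for c in a0)
                   - r * r + r0 * r0)
    d = det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("anchors are coplanar; position is not unique")
    position = []
    for col in range(3):
        m = [row[:] for row in rows]      # copy, then swap in the right-hand side
        for i in range(3):
            m[i][col] = rhs[i]
        position.append(det3(m) / d)
    return tuple(position)

# Hypothetical room layout (metres) and a tag whose ranges we simulate.
anchors = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, 0.0, 3.0)]
true_tag = (1.0, 2.0, 1.5)
ranges = [math.dist(true_tag, a) for a in anchors]
estimate = locate_tag(anchors, ranges)
print(estimate)
```

With noisy ranges or more than four anchors, a least-squares solve would replace the exact 3x3 solution used here.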
Pros:
Cons:
Optical tracking uses cameras placed on or around the headset to determine position and orientation based on computer vision algorithms. This method is based on the same principle as stereoscopic human vision. When a person looks at an object using binocular vision, they can estimate its distance from the difference in perspective between the two eyes. In optical tracking, cameras are calibrated to determine the distance to the object and its position in space. Optical systems are reliable and relatively inexpensive, but they can be difficult to calibrate. Furthermore, the system requires a direct line of sight without occlusions; otherwise it will receive wrong data.
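The stereoscopic principle can be illustrated with the standard pinhole relation for a rectified camera pair, in which depth equals focal length times baseline divided by disparity. The sketch below assumes an idealized, already calibrated pair; the function name and the numeric values are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth (metres) of a point seen by two parallel, rectified cameras.

    focal_px     -- focal length expressed in pixels
    baseline_m   -- distance between the two camera centres, in metres
    disparity_px -- horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero means infinitely far")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 6 cm baseline.
near = depth_from_disparity(700.0, 0.06, 42.0)   # large shift, close object
far = depth_from_disparity(700.0, 0.06, 10.5)    # small shift, distant object
print(near, far)
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant objects than for near ones, which is one reason calibration quality is so important.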
Optical tracking can be done either with or without markers. Tracking with markers uses targets with known patterns as reference points; cameras constantly seek these markers and then use various algorithms (for example, the POSIT algorithm) to extract the position of the object. Markers can be visible, such as printed QR codes, but many use infrared (IR) light that can only be picked up by cameras. Active implementations feature markers with built-in IR LEDs that can turn on and off in sync with the camera, making it easier to block out other IR lights in the tracking area. [5] Passive implementations use retroreflectors, which reflect the IR light back towards the source with little scattering. Markerless tracking does not require any pre-placed targets, instead using the natural features of the surrounding environment to determine position and orientation. [6]
In this method, cameras are placed in stationary locations in the environment to track the position of markers on the tracked device, such as a head mounted display or controllers. Having multiple cameras allows for different views of the same markers, and this overlap allows for accurate readings of the device position. [5] The original Oculus Rift utilizes this technique, placing a constellation of IR LEDs on its headset and controllers to allow external cameras in the environment to read their positions. [7] This method is the most mature, having applications not only in VR but also in motion capture technology for film. [8] However, this solution is space-limited, needing external sensors in constant view of the device.
Pros:
Cons:
In this method, the camera is placed on the tracked device and looks outward to determine its location in the environment. Headsets that use this technology have multiple cameras facing different directions to get views of their entire surroundings. This method can work with or without markers. The Lighthouse system used by the HTC Vive is an example of active markers. Each external Lighthouse module contains IR LEDs as well as a laser array that sweeps in horizontal and vertical directions, and sensors on the headset and controllers can detect these sweeps and use the timings to determine position. [10] [11] Markerless tracking, such as on the Oculus Quest, does not require anything mounted in the outside environment. It uses cameras on the headset for a process called SLAM, or simultaneous localization and mapping, where a 3D map of the environment is generated in real time. [6] Machine learning algorithms then determine where the headset is positioned within that 3D map, using feature detection to reconstruct and analyze its surroundings. [12] [13] This technology allows high-end headsets like the Microsoft HoloLens to be self-contained, but it also opens the door for cheaper mobile headsets that do not need to be tethered to external computers or sensors. [14]
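The way sweep timings become geometry can be sketched as follows, assuming a rotor that turns at a constant rate and a sync flash emitted at a known rotor angle. The 60 Hz rotation rate, the function names, the `forward_offset` calibration term, and the tangent-based ray model are illustrative assumptions, not details taken from a Lighthouse specification.

```python
import math

ROTATION_PERIOD_S = 1 / 60   # assumed rotor period, for illustration only

def sweep_angle(t_sync_s, t_hit_s, period_s=ROTATION_PERIOD_S, forward_offset=0.0):
    """Angle (radians) of the laser plane when it crossed the sensor.

    The rotor is assumed to turn at a constant rate, so elapsed time since
    the sync flash maps linearly to angle. `forward_offset` is a hypothetical
    calibration value that makes the angle relative to the station's
    forward axis.
    """
    return 2 * math.pi * (t_hit_s - t_sync_s) / period_s - forward_offset

def ray_from_sweeps(h_angle, v_angle):
    """Unit direction from the base station toward the sensor.

    The horizontal sweep constrains x = z * tan(h_angle) and the vertical
    sweep constrains y = z * tan(v_angle); the sensor lies on the line where
    the two planes intersect.
    """
    x, y, z = math.tan(h_angle), math.tan(v_angle), 1.0
    n = math.sqrt(x * x + y * y + z * z)
    return (x / n, y / n, z / n)

# A hit a quarter of a rotation after the sync flash is a quarter turn.
quarter_turn = sweep_angle(0.0, ROTATION_PERIOD_S / 4)
straight_ahead = ray_from_sweeps(0.0, 0.0)
print(quarter_turn, straight_ahead)
```

One ray per sensor only fixes a direction. A full pose estimate would combine rays to several sensors whose layout on the device is known, or rays from more than one base station.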
Pros:
Cons:
Inertial tracking uses data from accelerometers and gyroscopes, and sometimes magnetometers. Accelerometers measure linear acceleration. Since the derivative of position with respect to time is velocity and the derivative of velocity is acceleration, the output of the accelerometer can be integrated to find the velocity and then integrated again to find the position relative to some initial point. Gyroscopes measure angular velocity, which can likewise be integrated to determine angular position relative to the initial orientation. Magnetometers measure magnetic fields and magnetic dipole moments. The direction of Earth's magnetic field can be incorporated to provide an absolute orientation reference and to compensate for gyroscopic drift. [15] Modern inertial measurement units (IMUs) are based on MEMS technology, which allows orientation (roll, pitch, yaw) to be tracked with high update rates and minimal latency. Gyroscopes are always used for rotational tracking, but different techniques are used for positional tracking based on factors like cost, ease of setup, and tracking volume. [16]
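The two integrations can be sketched in one dimension as follows. This is a simplified illustration using fixed-step Euler integration on ideal, noise-free samples; the function names and sample values are hypothetical, and a real IMU pipeline would also remove gravity and work with 3D rotations.

```python
def integrate_acceleration(samples, dt, v0=0.0, p0=0.0):
    """Integrate 1-D acceleration samples (m/s^2) twice with Euler steps.

    Returns (velocity, position) relative to the initial state.
    """
    velocity, position = v0, p0
    for a in samples:
        velocity += a * dt          # first integration: acceleration -> velocity
        position += velocity * dt   # second integration: velocity -> position
    return velocity, position

def integrate_gyro(rates, dt, angle0=0.0):
    """Integrate 1-D angular velocity samples (deg/s) into an angle (deg)."""
    angle = angle0
    for w in rates:
        angle += w * dt
    return angle

dt = 0.001                                   # 1000 Hz sample rate (assumed)
v, p = integrate_acceleration([1.0] * 1000, dt)   # 1 m/s^2 held for 1 s
yaw = integrate_gyro([90.0] * 1000, dt)           # 90 deg/s held for 1 s
print(v, p, yaw)
```

For constant acceleration the exact answer is a velocity of 1 m/s and a displacement of 0.5 m; the numerical result approaches those values as the step size shrinks.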
Dead reckoning is used to track positional data, which alters the virtual environment by updating motion changes of the user. [17] The dead reckoning update rate and prediction algorithm used in a virtual reality system affect the user experience, but there is no consensus on best practices, as many different techniques have been used. [17] It is hard to rely only on inertial tracking to determine precise position because dead reckoning leads to drift, so this type of tracking is not used in isolation in virtual reality. [18] A lag of more than 100 ms between the user's movement and the virtual reality display has been found to cause nausea. [19]
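Why drift rules out inertial tracking on its own can be seen from a small calculation. If the accelerometer reports a small constant bias b while the device is actually still, integrating twice yields a position error of b·t²/2, which grows quadratically with time. The bias value below is a hypothetical figure chosen only for illustration.

```python
def position_error_from_bias(bias_mps2, seconds):
    """Position error (m) after double-integrating a constant acceleration bias."""
    return 0.5 * bias_mps2 * seconds ** 2

BIAS = 0.01  # m/s^2, hypothetical uncorrected accelerometer bias

for t in (1, 10, 60):
    print(f"after {t:>2} s: {position_error_from_bias(BIAS, t):.3f} m of drift")
```

Under this assumption the error is a few millimetres after one second but on the order of metres after a minute, which is why an absolute reference is needed to correct the estimate.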
Inertial sensors can track not only rotational movement (roll, pitch, yaw) but also translational movement. Together, these two types of movement are known as the six degrees of freedom. Many applications of virtual reality need to track not only the users' head rotations but also how their bodies move with them (left/right, back/forth, up/down). [20] Six-degrees-of-freedom capability is not necessary for all virtual reality experiences, but it is useful when the user needs to move things other than their head.
Pros:
Cons:
Sensor fusion combines data from several tracking algorithms and can yield better outputs than any single technology. One variant of sensor fusion merges inertial and optical tracking. These two techniques are often used together because, while inertial sensors are optimal for tracking fast movements, they also accumulate errors quickly, and optical sensors offer absolute references to compensate for inertial weaknesses. [16] Further, inertial tracking can offset some shortfalls of optical tracking. For example, optical tracking can be the main tracking method, but when an occlusion occurs, inertial tracking estimates the position until the objects are visible to the optical camera again. Inertial tracking can also generate position data in between optical tracking samples because it has a higher update rate. Optical tracking, in turn, helps to correct the drift of inertial tracking. Combining optical and inertial tracking has been shown to reduce misalignment errors that commonly occur when a user moves their head too fast. [21] Microelectrical magnetic systems advancements have made magnetic/electric tracking more common due to their small size and low cost. [22]
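The interplay between the inertial and optical sources can be sketched in one dimension as follows. This is a deliberately simple blend, not the method of any cited system: it assumes a stationary object, a constant accelerometer bias, an optical fix every tenth inertial sample, and a fixed correction gain, all of which are hypothetical choices. Production systems more commonly use a Kalman-style filter that also corrects velocity.

```python
def fuse_step(position, velocity, accel, dt, optical_fix=None, gain=0.2):
    """Advance the estimate by one inertial sample, then correct if possible.

    The inertial prediction runs on every step (high rate). When an optical
    measurement is available, the position is pulled a fraction `gain` of the
    way toward it; during occlusion (optical_fix is None) the inertial
    prediction is used alone.
    """
    velocity += accel * dt
    position += velocity * dt
    if optical_fix is not None:
        position += gain * (optical_fix - position)
    return position, velocity

def simulate(seconds=5.0, dt=0.001, bias=0.05, optical_every=10, use_optical=True):
    """Final position estimate for an object that never moves from 0.0 m."""
    position = velocity = 0.0
    steps = int(round(seconds / dt))
    for i in range(1, steps + 1):
        has_fix = use_optical and i % optical_every == 0
        position, velocity = fuse_step(position, velocity, bias, dt,
                                       0.0 if has_fix else None)
    return position

inertial_only = simulate(use_optical=False)
fused = simulate(use_optical=True)
print(f"inertial only: {inertial_only:.3f} m, fused: {fused:.3f} m")
```

In this sketch the fused error stays far smaller than the inertial-only error, although it is not zero because the velocity estimate is never corrected.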
Acoustic tracking systems identify the position of an object or device using techniques similar to those found naturally in animals that use echolocation. Analogous to bats locating objects using differences in soundwave return times to their two ears, acoustic tracking systems in VR may use sets of at least three ultrasonic sensors and at least three ultrasonic transmitters on devices in order to calculate the position and orientation of an object (e.g. a handheld controller). [23] There are two ways to determine the position of the object: measuring the time of flight (TOF) of the sound wave from the transmitter to the receivers, or measuring the phase coherence of the sinusoidal sound wave at the receivers.
Given a set of three noncollinear sensors (or receivers) with distances between them d1 and d2, as well as the travel times of an ultrasonic soundwave (a wave with frequency greater than 20 kHz) from a transmitter to those three receivers, the relative Cartesian position of the transmitter can be calculated as follows:
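One layout in which the result takes a compact closed form, assumed here for illustration, places receiver 1 at the origin, receiver 2 at distance d1 along the x-axis, and receiver 3 at distance d2 along the y-axis, with the transmitter on the positive-z side of the receiver plane. Under that assumption the transmitter position (x0, y0, z0) is:

```latex
x_0 = \frac{l_1^2 + d_1^2 - l_2^2}{2 d_1}, \qquad
y_0 = \frac{l_1^2 + d_2^2 - l_3^2}{2 d_2}, \qquad
z_0 = \sqrt{l_1^2 - x_0^2 - y_0^2}
```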
Here, each l_i represents the distance from the transmitter to one of the three receivers, calculated from the travel time of the ultrasonic wave using the equation l = c·t_us, where t_us is that travel time. The constant c denotes the speed of sound, which is equal to 343.2 m/s in dry air at a temperature of 20 °C. Because at least three receivers are required, these calculations are commonly known as triangulation (strictly speaking, a position computed from distances rather than angles is a trilateration).
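A time-of-flight position calculation can be sketched as follows. The receiver layout is an assumption made for illustration: receiver 1 at the origin, receiver 2 at distance d1 along the x-axis, receiver 3 at distance d2 along the y-axis, and the transmitter on the positive-z side of the receiver plane. The function name and sample values are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.2   # m/s, dry air at 20 °C

def transmitter_position(travel_times, d1, d2, c=SPEED_OF_SOUND):
    """Locate a transmitter from three ultrasonic travel times (seconds).

    Assumed layout: receiver 1 at (0, 0, 0), receiver 2 at (d1, 0, 0),
    receiver 3 at (0, d2, 0). Each distance is l_i = c * t_i.
    """
    l1, l2, l3 = (c * t for t in travel_times)
    # Differences of the squared-distance equations are linear in x and y.
    x = (l1 ** 2 + d1 ** 2 - l2 ** 2) / (2 * d1)
    y = (l1 ** 2 + d2 ** 2 - l3 ** 2) / (2 * d2)
    z_squared = l1 ** 2 - x ** 2 - y ** 2
    if z_squared < -1e-9:
        raise ValueError("travel times are inconsistent with the receiver layout")
    return x, y, math.sqrt(max(z_squared, 0.0))

# Simulate travel times for a known transmitter position, then recover it.
receivers = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0)]
true_position = (0.3, 0.4, 1.2)
times = [math.dist(true_position, r) / SPEED_OF_SOUND for r in receivers]
recovered = transmitter_position(times, d1=0.5, d2=0.5)
print(recovered)
```

Because the speed of sound varies with temperature and humidity, a fixed value of c is itself a source of error outside controlled conditions.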
Beyond its position, determining a device's orientation (i.e. its degree of rotation in all directions) requires at least three noncollinear points on the tracked object to be known, mandating the number of ultrasonic transmitters to be at least three per device tracked in addition to the three aforementioned receivers. The transmitters emit ultrasonic waves in sequence toward the three receivers, which can then be used to derive spatial data on the three transmitters using the methods described above. The device's orientation can then be derived based on the known positioning of the transmitters upon the device and their spatial locations relative to one another. [24]
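Deriving orientation from three located transmitters can be sketched by building an orthonormal frame from the three points. The convention used here (first axis along transmitter 1 to transmitter 2, third axis normal to the plane of the three transmitters) is an assumption for illustration, as are the function names; a real device would use its own known transmitter layout.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    if length < 1e-12:
        raise ValueError("points are coincident or collinear")
    return tuple(c / length for c in v)

def frame_from_points(p1, p2, p3):
    """Return the device's x, y, z axes as unit vectors in world coordinates."""
    x_axis = normalize(sub(p2, p1))
    z_axis = normalize(cross(x_axis, sub(p3, p1)))   # normal to the marker plane
    y_axis = cross(z_axis, x_axis)                   # completes a right-handed frame
    return x_axis, y_axis, z_axis

# Transmitter positions of a device that has been turned 90 degrees about
# the vertical (z) axis relative to its reference orientation.
x_axis, y_axis, z_axis = frame_from_points((0, 0, 0), (0, 1, 0), (-1, 0, 0))
yaw_degrees = math.degrees(math.atan2(x_axis[1], x_axis[0]))
print(x_axis, y_axis, z_axis, yaw_degrees)
```

The collinearity check reflects the requirement stated above: if the three points lie on one line, rotation about that line cannot be observed.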
As an alternative to TOF methods, phase-coherent (PC) tracking methods have also been used to locate objects acoustically. PC tracking compares the phase of the soundwave currently received by the sensors to that of a prior reference signal, so that the relative change in position of the transmitters since the last measurement can be determined. Because this method operates only on observed changes in position, and not on absolute measurements, any errors in measurement tend to compound over successive observations. Consequently, this method has lost popularity with developers over time.
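The phase-to-distance relationship and the way errors compound can be sketched as follows. A phase change of one full cycle corresponds to one wavelength of travel, so the displacement is the wavelength times the phase change divided by 2π. The 40 kHz carrier frequency, the per-measurement phase error, and the function names are hypothetical values chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.2   # m/s, dry air at 20 °C

def displacement_from_phase(delta_phase_rad, frequency_hz, c=SPEED_OF_SOUND):
    """Change in transmitter-receiver distance implied by a phase change.

    Only valid while the movement between measurements is less than one
    wavelength; larger jumps are ambiguous by whole cycles.
    """
    wavelength = c / frequency_hz
    return wavelength * delta_phase_rad / (2 * math.pi)

def track(phase_changes, frequency_hz, start=0.0):
    """Accumulate successive phase-derived displacements into a position."""
    position = start
    for delta_phase in phase_changes:
        position += displacement_from_phase(delta_phase, frequency_hz)
    return position

FREQ = 40_000.0          # Hz, assumed ultrasonic carrier
PHASE_ERROR = 0.01       # rad, assumed systematic error per measurement

one_step = displacement_from_phase(PHASE_ERROR, FREQ)
after_1000 = track([PHASE_ERROR] * 1000, FREQ)   # object is actually still
print(f"error per step: {one_step * 1e6:.1f} um, after 1000 steps: {after_1000 * 1e3:.2f} mm")
```

Because each estimate builds on the previous one, a small systematic error per measurement grows in proportion to the number of measurements taken, which is the compounding described above.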
Pros:
Cons:
In summary, implementation of acoustic tracking is optimal in cases where one has total control over the ambient environment that the VR or AR system resides in, such as a flight simulator. [2] [25] [26]
Magnetic tracking relies on measuring the intensity of inhomogeneous magnetic fields with electromagnetic sensors. A base station, often referred to as the system's transmitter or field generator, generates an alternating or a static electromagnetic field, depending on the system's architecture.
To cover all directions in three-dimensional space, three magnetic fields are generated sequentially. The magnetic fields are generated by three electromagnetic coils that are perpendicular to each other. These coils are placed in a small housing mounted on the moving target whose position must be tracked. Current passing sequentially through the coils turns them into electromagnets, which allows their position and orientation in space to be determined.
Because magnetic tracking does not require a head-mounted display, a device frequently used in virtual reality, it is often the tracking system used in fully immersive virtual reality displays. [21] Conventional equipment like head-mounted displays is obtrusive to the user in fully enclosed virtual reality experiences, so alternative equipment such as that used in magnetic tracking is favored. Magnetic tracking has been implemented by Polhemus and in the Razer Hydra by Sixense. The system works poorly near any electrically conductive material, such as metal objects and devices, that can affect an electromagnetic field. Magnetic tracking accuracy worsens as the user moves away from the base emitter, [21] and the usable tracking area is limited: it cannot be scaled beyond about 5 meters.
Pros:
Cons: