Zhiming Hu is an incoming tenure-track Assistant Professor at the Hong Kong University of Science and Technology (Guangzhou), where he will lead the Human-centred Artificial Intelligence (HuAI) Group from August 2025. Since August 2022 he has been a post-doctoral researcher at the University of Stuttgart, Germany, in the Perceptual User Interfaces Group led by Prof. Andreas Bulling and the Computational Biophysics and Biorobotics Group led by Prof. Syn Schmitt. He obtained his Ph.D. in Computer Software and Theory from Peking University, China, in 2022, supervised by Prof. Guoping Wang, and received his Bachelor's degree in Optical Engineering from Beijing Institute of Technology, China, in 2017. He has published more than 10 papers as first or corresponding author at top venues in VR/AR and HCI, including IEEE VR, ISMAR, TVCG, CHI, UIST, and IROS. His work was nominated for the TVCG Best Journal Award at IEEE VR 2021 and for the Best Doctoral Student Paper Award at INTERACT 2023. He has served as a reviewer for many top venues, including SIGGRAPH, CVPR, ICCV, ECCV, CHI, UIST, IEEE VR, ISMAR, IMWUT, TMM, TVCG, and IJHCI.
Research Interests
My research interests include virtual reality, human-computer interaction, eye tracking, and human-centred artificial intelligence.
My long-term goal is to develop human-centred intelligent interactive systems that can accurately model human behaviours, e.g. eye and body movements, in activities of daily living.
Recruiting
I am looking for outstanding Ph.D./Master/Intern students and excellent research assistants who are interested in our research.
A strong interest in developing human-centred computational methods is required. Excellent programming skills are expected; previous experience with Python, PyTorch, or CUDA is an advantage. Strong teamwork and critical-thinking skills, an aptitude for independent and creative work, and fluent written and spoken English are essential.
If you are highly motivated, capable of tackling and solving scientifically challenging problems, and interested in doing research in an internationally oriented and highly energetic team, please send your application to zhiming.hu (at) simtech.uni-stuttgart.de.
Please include the following information in your application (preferably in a single pdf document):
Curriculum Vitae
Cover letter (stating why you are interested and why you should be chosen)
PhD/Master/Intern applicants: transcripts of your Master's/Bachelor's programme
Latest News
[10.2024] Our Pose2Gaze paper has been invited for presentation at ISMAR 2024.
[09.2024] I will join the Hong Kong University of Science and Technology (Guangzhou) as a tenure-track Assistant Professor!
[08.2024] One paper is accepted at PG 2024.
[08.2024] I will serve as a Program Committee member for AAAI 2025.
[08.2024] One paper is accepted at UIST 2024.
[07.2024] One paper is accepted at ISMAR 2024 TVCG-track.
[06.2024] One paper is accepted at IROS 2024 as oral presentation.
[05.2024] I will serve as an Associate Chair for MuC 2024.
[05.2024] One paper is accepted at TVCG 2024.
[03.2024] Two papers are accepted at ETRA 2024.
[01.2024] Two papers are accepted at CHI 2024.
[12.2023] I was invited to attend the 11th Chengyao Youth Forum organised by Nanjing University.
[12.2023] I was invited to join the international program committee of PETMEI 2024.
[12.2023] I was invited to attend the Fifth Youth Forum on the Next Generation Computer Sciences organised by Peking University.
[11.2023] I was invited to attend the 10th Teli Forum organised by Beijing Institute of Technology.
[09.2023] Our paper was nominated for the IFIP TC13 Pioneers’ Award for Best Doctoral Student Paper at INTERACT 2023.
[08.2023] I will serve as the Virtualization Chair for ETRA 2024.
[08.2023] One paper is accepted at UIST 2023.
[06.2023] I will serve as an Associate Chair for MuC 2023.
[06.2023] One paper is accepted at INTERACT 2023.
[05.2023] I will serve as a Technical Program Committee member for iWOAR 2023.
[10.2022] One paper is accepted at NeurIPS 2022 Workshop Gaze Meets ML.
[08.2022] I joined the University of Stuttgart as a post-doctoral researcher.
[07.2022] I successfully defended my PhD!
Awards & Honours
Best Doctoral Student Paper Award Nominee at INTERACT 2023
SimTech Postdoctoral Fellowship, 2022
National Scholarship (top 2%), 2021
TVCG Best Journal Award Nominee at IEEE VR 2021 (top 2%; the first time for Chinese researchers)
CSC (China Scholarship Council) Scholarship, 2020
Chancellor's Scholarship (top 2%), 2020
Leo KoGuan Scholarship (top 5%), 2019
Leader Scholarship (top 0.2%, 7 out of over 3800 students), 2017
Publications
GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction
Human motion prediction is important for many virtual and augmented reality (VR/AR) applications such as collision avoidance and realistic avatar generation. Existing methods have synthesised body motion only from observed past motion, despite the fact that human eye gaze is known to correlate strongly with body movements and is readily available in recent VR/AR headsets. We present GazeMoDiff – a novel gaze-guided denoising diffusion model to generate stochastic human motions. Our method first uses a gaze encoder and a motion encoder to extract the gaze and motion features respectively, then employs a graph attention network to fuse these features, and finally injects the gaze-motion features into a noise prediction network via a cross-attention mechanism to progressively generate multiple reasonable human motions in the future. Extensive experiments on the MoGaze and GIMO datasets demonstrate that our method outperforms the state-of-the-art methods by a large margin in terms of multi-modal final displacement error (17.3% on MoGaze and 13.3% on GIMO). We further conducted a human study (N=21) and validated that the motions generated by our method were perceived as both more precise and more realistic than those of prior methods. Taken together, these results reveal the significant information content available in eye gaze for stochastic human motion prediction as well as the effectiveness of our method in exploiting this information.
@inproceedings{yan24_gazemodiff,
title={GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction},
author={Yan, Haodong and Hu, Zhiming and Schmitt, Syn and Bulling, Andreas},
booktitle={Proceedings of the 2024 Pacific Conference on Computer Graphics and Applications},
year={2024}}
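To make the fusion step above more concrete, here is a minimal PyTorch sketch of cross-attention between gaze and motion features. It illustrates the general idea only and is not the authors' released implementation; the encoder layers, the 22-joint pose layout, and all dimensions are assumptions.

import torch
import torch.nn as nn

class GazeMotionFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.gaze_enc = nn.Linear(3, dim)    # per-frame 3D gaze direction (assumed input format)
        self.pose_enc = nn.Linear(66, dim)   # per-frame flattened pose, assuming 22 joints x 3D
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, gaze, pose):
        # gaze: (B, T, 3), pose: (B, T, 66)
        g = self.gaze_enc(gaze)
        p = self.pose_enc(pose)
        # Pose features attend to gaze features (queries = pose, keys/values = gaze).
        fused, _ = self.cross_attn(query=p, key=g, value=g)
        return self.norm(p + fused)          # residual connection

# The fused features would then condition a noise-prediction network inside a
# DDPM-style sampling loop to generate multiple plausible future motions.
fused = GazeMotionFusion()(torch.randn(2, 10, 3), torch.randn(2, 10, 66))
print(fused.shape)  # torch.Size([2, 10, 128])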
DisMouse: Disentangling Information from Mouse Movement Data
Mouse movement data contain rich information about users, performed tasks, and user interfaces, but separating the respective components remains challenging and unexplored. As a first step to address this challenge, we propose DisMouse – the first method to disentangle user-specific and user-independent information and stochastic variations from mouse movement data. At the core of our method is an autoencoder trained in a semi-supervised fashion, consisting of a self-supervised denoising diffusion process and a supervised contrastive user identification module. Through evaluations on three datasets, we show that DisMouse 1) captures complementary information of mouse input, hence providing an interpretable framework for modelling mouse movements, 2) can be used to produce refined features, thus enabling various applications such as personalised and variable mouse data generation, and 3) generalises across different datasets. Taken together, our results underline the significant potential of disentangled representation learning for explainable, controllable, and generalised mouse behaviour modelling.
@inproceedings{zhang24_dismouse,
title = {DisMouse: Disentangling Information from Mouse Movement Data},
author = {Zhang, Guanhua and Hu, Zhiming and Bulling, Andreas},
year = {2024},
pages = {1--13},
booktitle = {Proc. ACM Symposium on User Interface Software and Technology (UIST)}}
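As a rough illustration of the semi-supervised objective described above, the sketch below combines a denoising term with a supervised contrastive user-identification term. The InfoNCE-style formulation, loss weight, and temperature are my assumptions, not the exact losses used in DisMouse.

import torch
import torch.nn.functional as F

def denoising_loss(pred_noise, true_noise):
    # Standard diffusion objective: predict the Gaussian noise injected into the input.
    return F.mse_loss(pred_noise, true_noise)

def supervised_contrastive_loss(embeddings, user_ids, temperature=0.1):
    # Pull embeddings of the same user together, push different users apart.
    z = F.normalize(embeddings, dim=1)                       # (B, D)
    sim = z @ z.t() / temperature                            # (B, B)
    mask = user_ids.unsqueeze(0) == user_ids.unsqueeze(1)    # positives share a user id
    mask.fill_diagonal_(False)
    logits = sim - torch.eye(len(z), device=z.device) * 1e9  # exclude self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_log_prob = (log_prob * mask).sum(1) / mask.sum(1).clamp(min=1)
    return -pos_log_prob.mean()

def total_loss(pred_noise, true_noise, embeddings, user_ids, alpha=1.0):
    # alpha balances the two terms; its value here is purely illustrative.
    return denoising_loss(pred_noise, true_noise) + alpha * supervised_contrastive_loss(embeddings, user_ids)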
GazeMotion: Gaze-guided Human Motion Forecasting
We present GazeMotion – a novel method for human motion forecasting that combines information on past human poses with human eye gaze. Inspired by evidence from behavioural sciences showing that human eye and body movements are closely coordinated, GazeMotion first predicts future eye gaze from past gaze, then fuses predicted future gaze and past poses into a gaze-pose graph, and finally uses a residual graph convolutional network to forecast body motion. We extensively evaluate our method on the MoGaze, ADT, and GIMO benchmark datasets and show that it outperforms state-of-the-art methods by up to 7.4% in mean per joint position error. Using head direction as a proxy for gaze, our method still achieves an average improvement of 5.5%. We finally report an online user study showing that our method also outperforms prior methods in terms of perceived realism. These results show the significant information content available in eye gaze for human motion forecasting as well as the effectiveness of our method in exploiting this information.
@inproceedings{hu24_gazemotion,
title={GazeMotion: Gaze-guided Human Motion Forecasting},
author={Hu, Zhiming and Schmitt, Syn and Haeufle, Daniel and Bulling, Andreas},
booktitle={Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems},
year={2024}}
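The residual graph convolution mentioned above can be illustrated with a small PyTorch block operating on a gaze-pose graph. This is a conceptual sketch only: the learnable adjacency, node count (22 joints plus one gaze node), and feature size are assumptions.

import torch
import torch.nn as nn

class ResidualGraphConv(nn.Module):
    def __init__(self, num_nodes=23, feat_dim=64):
        super().__init__()
        # Learnable adjacency initialised close to the identity.
        self.adj = nn.Parameter(torch.eye(num_nodes) + 0.01 * torch.randn(num_nodes, num_nodes))
        self.weight = nn.Linear(feat_dim, feat_dim)
        self.act = nn.Tanh()

    def forward(self, x):
        # x: (B, num_nodes, feat_dim); graph convolution A @ X @ W with a skip connection.
        return x + self.act(self.weight(torch.einsum('nm,bmf->bnf', self.adj, x)))

# Several such blocks would be stacked to map past gaze-pose features to future poses.
out = ResidualGraphConv()(torch.randn(4, 23, 64))
print(out.shape)  # torch.Size([4, 23, 64])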
HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes
We present HOIMotion – a novel approach for human motion forecasting during human-object interactions that integrates information about past body poses and egocentric 3D object bounding boxes. Human motion forecasting is important in many augmented reality applications but most existing methods have only used past body poses to predict future motion. HOIMotion first uses an encoder-residual graph convolutional network (GCN) and multi-layer perceptrons to extract features from body poses and egocentric 3D object bounding boxes, respectively. Our method then fuses pose and object features into a novel pose-object graph and uses a residual-decoder GCN to forecast future body motion. We extensively evaluate our method on the Aria digital twin (ADT) and MoGaze datasets and show that HOIMotion consistently outperforms state-of-the-art methods by a large margin of up to 8.7% on ADT and 7.2% on MoGaze in terms of mean per joint position error. Complementing these evaluations, we report a human study (N=20) that shows that the improvements achieved by our method result in forecasted poses being perceived as both more precise and more realistic than those of existing methods. Taken together, these results reveal the significant information content available in egocentric 3D object bounding boxes for human motion forecasting and the effectiveness of our method in exploiting this information.
@article{hu24_hoimotion,
author={Hu, Zhiming and Yin, Zheming and Haeufle, Daniel and Schmitt, Syn and Bulling, Andreas},
journal={IEEE Transactions on Visualization and Computer Graphics},
title={HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes},
year={2024}}
Pose2Gaze: Eye-body Coordination during Daily Activities for Gaze Prediction from Full-body Poses
Human eye gaze plays a significant role in many virtual and augmented reality (VR/AR) applications, such as gaze-contingent rendering, gaze-based interaction, or eye-based activity recognition. However, prior works on gaze analysis and prediction have only explored eye-head coordination and were limited to human-object interactions. We first report a comprehensive analysis of eye-body coordination in various human-object and human-human interaction activities based on four public datasets collected in real-world (MoGaze), VR (ADT), as well as AR (GIMO and EgoBody) environments. We show that in human-object interactions, e.g. pick and place, eye gaze exhibits strong correlations with full-body motion while in human-human interactions, e.g. chat and teach, a person’s gaze direction is correlated with the body orientation towards the interaction partner. Informed by these analyses we then present Pose2Gaze – a novel eye-body coordination model that uses a convolutional neural network and a spatio-temporal graph convolutional neural network to extract features from head direction and full-body poses, respectively, and then uses a convolutional neural network to predict eye gaze. We compare our method with state-of-the-art methods that predict eye gaze only from head movements and show that Pose2Gaze outperforms these baselines with an average improvement of 24.0% on MoGaze, 10.1% on ADT, 21.3% on GIMO, and 28.6% on EgoBody in mean angular error, respectively. We also show that our method significantly outperforms prior methods in the sample downstream task of eye-based activity recognition. These results underline the significant information content available in eye-body coordination during daily activities and open up a new direction for gaze prediction.
@article{hu24_pose2gaze,
author={Hu, Zhiming and Xu, Jiahui and Schmitt, Syn and Bulling, Andreas},
journal={IEEE Transactions on Visualization and Computer Graphics},
title={Pose2Gaze: Eye-body Coordination during Daily Activities for Gaze Prediction from Full-body Poses},
year={2024}}
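For reference, the mean angular error used to compare predicted and ground-truth gaze directions can be computed as below. This is my own minimal implementation for illustration, not the authors' evaluation code.

import torch

def mean_angular_error(pred, target, eps=1e-7):
    # pred, target: (N, 3) gaze direction vectors (not necessarily unit length).
    pred = pred / pred.norm(dim=1, keepdim=True).clamp(min=eps)
    target = target / target.norm(dim=1, keepdim=True).clamp(min=eps)
    cos = (pred * target).sum(dim=1).clamp(-1 + eps, 1 - eps)
    return torch.rad2deg(torch.acos(cos)).mean()

print(mean_angular_error(torch.tensor([[0., 0., 1.]]), torch.tensor([[0., 1., 1.]])))  # ~45 degrees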
VisRecall++: Analysing and Predicting Visualisation Recallability from Gaze Behaviour
Question answering has recently been proposed as a promising means to assess the recallability of information visualisations. However, prior works are yet to study the link between visually encoding a visualisation in memory and recall performance. To fill this gap, we propose VisRecall++ – a novel 40-participant recallability dataset that contains gaze data on 200 visualisations and five question types, such as identifying the title and finding extreme values. We measured recallability by asking participants questions after they observed the visualisation for 10 seconds. Our analyses reveal several insights, for example that saccade amplitude, number of fixations, and fixation duration significantly differ between high and low recallability groups. Finally, we propose GazeRecallNet – a novel computational method to predict recallability from gaze behaviour that outperforms several baselines on this task. Taken together, our results shed light on assessing recallability from gaze behaviour and inform future work on recallability-based visualisation optimisation.
@article{wang24_visrecall,
title = {VisRecall++: Analysing and Predicting Visualisation Recallability from Gaze Behaviour},
author = {Wang, Yao and Jiang, Yue and Hu, Zhiming and Ruhdorfer, Constantin and Bâce, Mihai and Bulling, Andreas},
year = {2024},
journal = {Proc. ACM on Human-Computer Interaction (PACM HCI)},
pages = {1--18},
volume = {8},
number = {ETRA}}
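The gaze features analysed above (number of fixations, fixation duration, saccade amplitude) can be extracted from raw gaze samples with a simple velocity-threshold (I-VT) scheme, as in the sketch below. The sampling rate and velocity threshold are illustrative assumptions, not the values used in the paper.

import numpy as np

def gaze_features(gaze_deg, fs=60.0, vel_thresh=30.0):
    # gaze_deg: (T, 2) gaze positions in degrees of visual angle, sampled at fs Hz.
    vel = np.linalg.norm(np.diff(gaze_deg, axis=0), axis=1) * fs   # angular velocity (deg/s)
    is_fix = vel < vel_thresh
    # Split the sample indices into runs of consecutive fixation/saccade samples.
    changes = np.flatnonzero(np.diff(is_fix.astype(int))) + 1
    segments = np.split(np.arange(len(is_fix)), changes)
    fixations = [s for s in segments if is_fix[s[0]]]
    durations = [len(s) / fs for s in fixations]
    centroids = [gaze_deg[s].mean(axis=0) for s in fixations]
    amplitudes = [np.linalg.norm(b - a) for a, b in zip(centroids, centroids[1:])]
    return {
        "num_fixations": len(fixations),
        "mean_fixation_duration_s": float(np.mean(durations)) if durations else 0.0,
        "mean_saccade_amplitude_deg": float(np.mean(amplitudes)) if amplitudes else 0.0,
    }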
PrivatEyes: Appearance-based Gaze Estimation Using Federated Secure Multi-Party Computation
The latest gaze estimation methods require large-scale training data, but their collection and exchange pose significant privacy risks. We propose PrivatEyes – the first privacy-enhancing training approach for appearance-based gaze estimation based on federated learning (FL) and secure multi-party computation (MPC). PrivatEyes enables training gaze estimators on multiple local datasets across different users and server-based secure aggregation of the individual estimators’ updates. PrivatEyes guarantees that individual gaze data remain private even if a majority of the aggregating servers is malicious. We also introduce a new data leakage attack, DualView, which shows that PrivatEyes limits the leakage of private training data more effectively than previous approaches. Evaluations on the MPIIGaze, MPIIFaceGaze, GazeCapture, and NVGaze datasets further show that the improved privacy does not lead to lower gaze estimation accuracy or substantially higher computational costs – both are on par with non-secure counterparts.
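A toy example of additive secret sharing, the basic building block behind MPC-style secure aggregation, is given below: each client splits its model update into random shares so that no single server sees the raw update, yet the shares sum to the true aggregate. This is a didactic simplification only and not the protocol analysed in PrivatEyes.

import numpy as np

rng = np.random.default_rng(0)

def share(update, num_servers=3):
    # Split one client's update vector into additive shares that sum to the update.
    shares = [rng.normal(size=update.shape) for _ in range(num_servers - 1)]
    shares.append(update - sum(shares))
    return shares

clients = [rng.normal(size=4) for _ in range(5)]        # 5 clients, 4-dimensional updates
per_server = list(zip(*[share(u) for u in clients]))    # shares grouped by receiving server
server_sums = [sum(s) for s in per_server]              # each server only sums its own shares
aggregate = sum(server_sums)                            # equals the sum of the raw updates
assert np.allclose(aggregate, sum(clients))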
Mouse2Vec: Learning Reusable Semantic Representations of Mouse Behaviour
The mouse is a pervasive input device used for a wide range of interactive applications. However, computational modelling of mouse behaviour typically requires time-consuming design and extraction of handcrafted features, or approaches that are application-specific. We instead propose Mouse2Vec – a novel self-supervised method designed to learn semantic representations of mouse behaviour that are reusable across users and applications. Mouse2Vec uses a Transformer-based encoder-decoder architecture, which is specifically geared for mouse data: During pretraining, the encoder learns an embedding of input mouse trajectories while the decoder reconstructs the input and simultaneously detects mouse click events. We show that the representations learned by our method can identify interpretable mouse behaviour clusters and retrieve similar mouse trajectories. We also demonstrate on three sample downstream tasks that the representations can be practically used to augment mouse data for training supervised methods and serve as an effective feature extractor.
@inproceedings{zhang24_mouse2vec,
title = {Mouse2Vec: Learning Reusable Semantic Representations of Mouse Behaviour},
author = {Zhang, Guanhua and Hu, Zhiming and B{\^a}ce, Mihai and Bulling, Andreas},
year = {2024},
pages = {1--17},
booktitle = {Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)},
doi = {10.1145/3613904.3642141}}
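The encoder-decoder with reconstruction and click-detection heads described above could look roughly like the following PyTorch sketch. The use of the stock nn.Transformer, the input format, and all dimensions are my simplifications rather than the published architecture.

import torch
import torch.nn as nn

class MouseAutoencoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Linear(2, dim)                     # (x, y) coordinates per time step
        self.transformer = nn.Transformer(d_model=dim, nhead=4,
                                          num_encoder_layers=2, num_decoder_layers=2,
                                          batch_first=True)
        self.recon_head = nn.Linear(dim, 2)                # reconstruct mouse coordinates
        self.click_head = nn.Linear(dim, 1)                # per-step click logit

    def forward(self, traj):
        # traj: (B, T, 2) mouse trajectory
        z = self.embed(traj)
        h = self.transformer(src=z, tgt=z)
        return self.recon_head(h), self.click_head(h).squeeze(-1)

recon, click_logits = MouseAutoencoder()(torch.randn(2, 50, 2))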
SalChartQA: Question-driven Saliency on Information Visualisations
Understanding the link between visual attention and users’ needs when visually exploring information visualisations is under-explored due to a lack of large and diverse datasets to facilitate these analyses. To fill this gap, we introduce SalChartQA – a novel crowd-sourced dataset that uses the BubbleView interface as a proxy for human gaze and a question-answering (QA) paradigm to induce different information needs in users. SalChartQA contains 74,340 answers to 6,000 questions on 3,000 visualisations. Informed by our analyses demonstrating the tight correlation between the question and visual saliency, we propose the first computational method to predict question-driven saliency on information visualisations. Our method outperforms state-of-the-art saliency models, improving several metrics, such as the correlation coefficient and the Kullback-Leibler divergence. These results show the importance of information needs for shaping attention behaviour and pave the way for new applications, such as task-driven optimisation of visualisations or explainable AI in chart question-answering.
@inproceedings{wang24_salchartqa,
title = {SalChartQA: Question-driven Saliency on Information Visualisations},
author = {Wang, Yao and Wang, Weitian and Abdelhafez, Abdullah and Elfares, Mayar and Hu, Zhiming and B{\^a}ce, Mihai and Bulling, Andreas},
year = {2024},
pages = {1--14},
booktitle = {Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)},
doi = {10.1145/3613904.3642942}}
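For completeness, minimal NumPy versions of two of the saliency metrics mentioned above, the linear correlation coefficient (CC) and the Kullback-Leibler divergence (KLD) between a predicted and a ground-truth saliency map, are shown below; this is my own illustration, not the official evaluation code.

import numpy as np

def cc(pred, gt):
    # Pearson correlation between two saliency maps of the same shape.
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def kld(pred, gt, eps=1e-8):
    # KL divergence of the ground-truth distribution from the prediction.
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float((g * np.log(g / (p + eps) + eps)).sum())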
SUPREYES: SUPer Resolution for EYES Using Implicit Neural Representation Learning
We introduce SUPREYES – a novel self-supervised method to increase the spatio-temporal resolution of gaze data recorded using low(er)-resolution eye trackers. Despite continuing advances in eye tracking technology, the vast majority of current eye trackers – particularly mobile ones and those integrated into mobile devices – suffer from low-resolution gaze data, thus fundamentally limiting their practical usefulness. SUPREYES learns a continuous implicit neural representation from low-resolution gaze data to up-sample the gaze data to arbitrary resolutions. We compare our method with commonly used interpolation methods on arbitrary-scale super-resolution and demonstrate that SUPREYES outperforms these baselines by a significant margin. We also evaluate our method on the sample downstream task of gaze-based user identification and show that it improves performance over the original low-resolution gaze data and outperforms other baselines. These results are promising as they open up a new direction for increasing eye tracking fidelity as well as enabling new gaze-based applications without the need for new eye tracking equipment.
@inproceedings{jiao23_supreyes,
author = {Jiao, Chuhan and Hu, Zhiming and B{\^a}ce, Mihai and Bulling, Andreas},
title = {SUPREYES: SUPer Resolution for EYES Using Implicit Neural Representation Learning},
booktitle = {Proc. ACM Symposium on User Interface Software and Technology (UIST)},
year = {2023},
pages = {1--13},
doi = {10.1145/3586183.3606780}}
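The core idea of fitting a continuous implicit neural representation to low-resolution gaze data and then querying it at arbitrary resolutions can be illustrated with a tiny coordinate MLP, as below. The architecture, training loop, and dummy signal are assumptions for illustration and are much simpler than SUPREYES itself.

import torch
import torch.nn as nn

class GazeINR(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),            # 2D gaze position
        )

    def forward(self, t):                    # t: (N, 1) continuous timestamps in [0, 1]
        return self.net(t)

model = GazeINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
t_lo = torch.linspace(0, 1, 30).unsqueeze(1)          # low-resolution timestamps
gaze_lo = torch.sin(6.28 * t_lo).repeat(1, 2)         # dummy low-resolution gaze signal
for _ in range(200):                                  # fit the representation
    opt.zero_grad()
    loss = ((model(t_lo) - gaze_lo) ** 2).mean()
    loss.backward()
    opt.step()
gaze_hi = model(torch.linspace(0, 1, 120).unsqueeze(1))  # query at 4x the temporal resolution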
Exploring Natural Language Processing Methods for Interactive Behaviour Modelling
Analysing and modelling interactive behaviour is an important topic in human-computer interaction (HCI) and a key requirement for the development of intelligent interactive systems. Interactive behaviour has a sequential (actions happen one after another) and hierarchical (a sequence of actions forms an activity driven by interaction goals) structure, which may be similar to the structure of natural language. Designed around such a structure, natural language processing (NLP) methods have achieved groundbreaking success in various downstream tasks. However, few works have linked interactive behaviour with natural language. In this paper, we explore the similarity between interactive behaviour and natural language by applying an NLP method, byte pair encoding (BPE), to encode mouse and keyboard behaviour. We then analyse the vocabulary, i.e. the set of action sequences, learnt by BPE, and use the vocabulary to encode the input behaviour for interactive task recognition. An existing dataset collected in constrained lab settings and our novel out-of-the-lab dataset were used for evaluation. Results show that this natural language-inspired approach not only learns action sequences that reflect specific interaction goals, but also achieves higher F1 scores on task recognition than other methods. Our work reveals the similarity between interactive behaviour and natural language and highlights the potential of applying NLP-inspired methods to model interactive behaviour in HCI.
@inproceedings{zhang23_exploring,
title = {Exploring Natural Language Processing Methods for Interactive Behaviour Modelling},
author = {Zhang, Guanhua and Bortoletto, Matteo and Hu, Zhiming and Shi, Lei and B{\^a}ce, Mihai and Bulling, Andreas},
booktitle = {Proc. IFIP TC13 Conference on Human-Computer Interaction (INTERACT)},
pages = {1--22},
year = {2023},
publisher = {Springer}}
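A toy version of byte pair encoding applied to a sequence of interaction actions is shown below: the most frequent adjacent pair of actions is repeatedly merged into a new vocabulary item. The action names are made up for illustration.

from collections import Counter

def bpe_merges(sequence, num_merges=3):
    seq, vocab = list(sequence), []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]        # most frequent adjacent pair
        merged, new_seq, i = a + "+" + b, [], 0
        while i < len(seq):                        # replace every occurrence of the pair
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                new_seq.append(merged)
                i += 2
            else:
                new_seq.append(seq[i])
                i += 1
        seq, vocab = new_seq, vocab + [merged]
    return seq, vocab

actions = ["move", "click", "move", "click", "type", "move", "click"]
print(bpe_merges(actions))   # frequent action pairs become single vocabulary items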
EHTask: Recognizing User Tasks from Eye and Head Movements in Immersive Virtual Reality
Understanding human visual attention in immersive virtual reality (VR) is crucial for many important applications, including gaze prediction, gaze guidance, and gaze-contingent rendering.
However, previous works on visual attention analysis typically only explored one specific VR task and paid less attention to the differences between different tasks.
Moreover, existing task recognition methods typically focused on 2D viewing conditions and only explored the effectiveness of human eye movements.
We first collect eye and head movements of 30 participants performing four tasks, i.e. Free viewing, Visual search, Saliency, and Track, in 15 360-degree VR videos.
Using this dataset, we analyze the patterns of human eye and head movements and reveal significant differences across different tasks in terms of fixation duration, saccade amplitude, head rotation velocity, and eye-head coordination.
We then propose EHTask -- a novel learning-based method that employs eye and head movements to recognize user tasks in VR.
We show that our method significantly outperforms the state-of-the-art methods derived from 2D viewing conditions both on our dataset (accuracy of 84.4% vs. 62.8%) and on a real-world dataset (61.9% vs. 44.1%).
As such, our work provides meaningful insights into human visual attention under different VR tasks and guides future work on recognizing user tasks in VR.
@article{hu22_ehtask,
author={Hu, Zhiming and Bulling, Andreas and Li, Sheng and Wang, Guoping},
journal={IEEE Transactions on Visualization and Computer Graphics},
title={EHTask: Recognizing User Tasks From Eye and Head Movements in Immersive Virtual Reality},
year={2023},
volume={29},
number={4},
pages={1992-2004},
doi={10.1109/TVCG.2021.3138902}}
Intentional Head-Motion Assisted Locomotion for Reducing Cybersickness
Zehui Lin, Xiang Gu, Sheng Li, Zhiming Hu, Guoping Wang
IEEE Transactions on Visualization and Computer Graphics (TVCG, oral presentation at IEEE VR 2022), 2023, 29(8): 3458-3471.
We present an efficient locomotion technique that can reduce cybersickness by aligning the visually and vestibularly induced self-motion illusions. Our locomotion technique stimulates proprioception consistent with the visual sense through intentional head motion, which includes both the head’s translational movement and yaw rotation. A locomotion event is triggered by the hand-held controller together with an intended physical head motion. Based on our method, we further explore the connection between the level of cybersickness and the velocity of self-motion through a series of experiments. We first conduct Experiment 1 to investigate the cybersickness induced by different translation velocities using our method and then conduct Experiment 2 to investigate the cybersickness induced by different angular velocities. Our user studies from these two experiments reveal a new finding on the correlation between translation/angular velocities and the level of cybersickness: cybersickness is greatest at the lowest velocity using our method, and the statistical analysis also indicates a possible U-shaped relation between the translation/angular velocity and the degree of cybersickness. Finally, we conduct Experiment 3 to evaluate the performance of our method and other commonly used locomotion approaches, i.e. joystick-based steering and teleportation. The results show that our method can significantly reduce cybersickness compared with joystick-based steering and achieves higher presence compared with teleportation. These advantages demonstrate that our method can be a practical locomotion solution for immersive VR applications using only commercially available HMD suites.
@article{lin22_intentional,
author={Lin, Zehui and Gu, Xiang and Li, Sheng and Hu, Zhiming and Wang, Guoping},
journal={IEEE Transactions on Visualization and Computer Graphics},
title={Intentional Head-Motion Assisted Locomotion for Reducing Cybersickness},
year={2023},
volume={29},
number={8},
pages={3458-3471},
doi={10.1109/TVCG.2022.3160232}}
Federated Learning for Appearance-based Gaze Estimation in the Wild
Gaze estimation methods have significantly matured in recent years, but the large number of eye images required to train deep learning models poses significant privacy risks. In addition, the heterogeneous data distribution across different users can significantly hinder the training process. In this work, we propose the first federated learning approach for gaze estimation to preserve the privacy of gaze data. We further employ pseudo-gradient optimisation to adapt our federated learning approach to the divergent model updates and address the heterogeneous nature of in-the-wild gaze data in collaborative setups. We evaluate our approach on a real-world dataset (MPIIGaze) and show that our work enhances the privacy guarantees of conventional appearance-based gaze estimation methods, handles the convergence issues of gaze estimators, and significantly outperforms vanilla federated learning by 15.8% (from a mean error of 10.63 degrees to 8.95 degrees). As such, our work paves the way to develop privacy-aware collaborative learning setups for gaze estimation while maintaining the model’s performance.
@inproceedings{elfares22_federated,
title = {Federated Learning for Appearance-based Gaze Estimation in the Wild},
author = {Elfares, Mayar and Hu, Zhiming and Reisert, Pascal and Bulling, Andreas and Küsters, Ralf},
year = {2022},
booktitle = {Proceedings of the NeurIPS Workshop Gaze Meets ML (GMML)},
doi = {10.48550/arXiv.2211.07330},
pages = {1--17}}
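A highly simplified sketch of server-side aggregation with pseudo-gradients is given below: the server treats the averaged client update as a gradient and takes an optimiser step on the global weights. Plain averaging, the server learning rate, and the vanilla gradient step are illustrative assumptions; the paper's setup may differ.

import numpy as np

def server_round(global_w, client_ws, server_lr=1.0):
    # Pseudo-gradient = average of (global - client) weight differences.
    pseudo_grad = np.mean([global_w - w for w in client_ws], axis=0)
    # A momentum or adaptive optimiser could be applied to pseudo_grad instead.
    return global_w - server_lr * pseudo_grad

w = np.zeros(3)
clients = [np.array([0.2, 0.1, 0.0]), np.array([0.4, 0.0, 0.1])]
print(server_round(w, clients))   # [0.3  0.05 0.05]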
FixationNet: Forecasting Eye Fixations in Task-Oriented Virtual Environments
Human visual attention in immersive virtual reality (VR) is key for many important applications, such as content design, gaze-contingent rendering, or gaze-based interaction.
However, prior works typically focused on free-viewing conditions that have limited relevance for practical applications.
We first collect eye tracking data of 27 participants performing a visual search task in four immersive VR environments.
Based on this dataset, we provide a comprehensive analysis of the collected data and reveal correlations between users' eye fixations and other factors, i.e. users' historical gaze positions, task-related objects, saliency information of the VR content, and users' head rotation velocities.
Based on this analysis, we propose FixationNet -- a novel learning-based model to forecast users' eye fixations in the near future in VR.
We evaluate the performance of our model for free-viewing and task-oriented settings and show that it outperforms the state of the art by a large margin of 19.8% (from a mean error of 2.93° to 2.35°) in free-viewing and of 15.1% (from 2.05° to 1.74°) in task-oriented situations.
As such, our work provides new insights into task-oriented attention in virtual environments and guides future work on this important topic in VR research.
@article{hu21_fixationnet,
title={FixationNet: Forecasting eye fixations in task-oriented virtual environments},
author={Hu, Zhiming and Bulling, Andreas and Li, Sheng and Wang, Guoping},
journal={IEEE Transactions on Visualization and Computer Graphics},
volume={27},
number={5},
pages={2681--2690},
year={2021},
publisher={IEEE}}
DGaze: CNN-Based Gaze Prediction in Dynamic Scenes
We conduct novel analyses of users' gaze behaviors in dynamic virtual scenes and, based on our analyses, we present a novel CNN-based model called DGaze for gaze prediction in HMD-based applications.
We first collect 43 users' eye tracking data in 5 dynamic scenes under free-viewing conditions.
Next, we perform statistical analysis of our data and observe that dynamic object positions, head rotation velocities, and salient regions are correlated with users' gaze positions.
Based on our analysis, we present a CNN-based model (DGaze) that combines object position sequence, head velocity sequence, and saliency features to predict users' gaze positions.
Our model can be applied to predict not only realtime gaze positions but also gaze positions in the near future, and it achieves better performance than the prior method.
In terms of realtime prediction, DGaze achieves a 22.0% improvement over the prior method in dynamic scenes and an improvement of 9.5% in static scenes, using the angular distance as the evaluation metric.
We also propose a variant of our model called DGaze_ET that can be used to predict future gaze positions with higher precision by combining accurate past gaze data gathered using an eye tracker.
We further analyze our CNN architecture and verify the effectiveness of each component in our model.
We apply DGaze to gaze-contingent rendering and a game, and also present the evaluation results from a user study.
@article{hu20_dgaze,
title={DGaze: CNN-Based Gaze Prediction in Dynamic Scenes},
author={Hu, Zhiming and Li, Sheng and Zhang, Congyi and Yi, Kangrui and Wang, Guoping and Manocha, Dinesh},
journal={IEEE Transactions on Visualization and Computer Graphics},
volume={26},
number={5},
pages={1902--1911},
year={2020},
publisher={IEEE}}
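As a rough illustration of a CNN that maps concatenated feature sequences (object positions, head velocities, saliency features) to a gaze position, consider the 1D-CNN sketch below. Feature sizes, sequence length, and layer widths are assumptions and not the published DGaze architecture.

import torch
import torch.nn as nn

class SequenceGazePredictor(nn.Module):
    def __init__(self, feat_dim=16, seq_len=25):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(feat_dim, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32 * seq_len, 2)   # output: 2D gaze position

    def forward(self, feats):
        # feats: (B, feat_dim, seq_len) concatenated per-frame features
        h = self.conv(feats)
        return self.head(h.flatten(1))

gaze = SequenceGazePredictor()(torch.randn(8, 16, 25))
print(gaze.shape)  # torch.Size([8, 2])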
Temporal continuity of visual attention for future gaze prediction in immersive virtual reality
Background: Eye tracking technology is receiving increasing attention in the field of virtual reality. In particular, future gaze prediction is crucial for pre-computation in many applications such as gaze-contingent rendering, advertisement placement, and content-based design. To explore future gaze prediction, it is necessary to analyse the temporal continuity of visual attention in immersive virtual reality.
Methods: In this paper, the concept of temporal continuity of visual attention is presented, and an autocorrelation function method is proposed to evaluate it. The temporal continuity is then analysed in both free-viewing and task-oriented conditions.
Results: In free-viewing conditions, the analysis of a free-viewing gaze dataset indicates that temporal continuity is good only within a short time interval. A task-oriented game scene was further created to collect users' gaze data, and analysis of these data shows that temporal continuity performs similarly to the free-viewing conditions. Temporal continuity can be applied to future gaze prediction: if it is good, users' current gaze positions can be directly used to predict their gaze positions in the near future.
Conclusions: The prediction performance of using the current gaze is further evaluated in both free-viewing and task-oriented conditions, and the results show that the current gaze can be efficiently applied to short-term future gaze prediction. Long-term gaze prediction remains to be explored.
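A minimal example of evaluating temporal continuity with an autocorrelation function is shown below: the gaze signal is correlated with time-shifted copies of itself for increasing lags. The dummy signal, sampling rate, and lags are assumptions for illustration.

import numpy as np

def gaze_autocorrelation(gaze, max_lag):
    # gaze: (T,) one gaze coordinate over time; returns the correlation at each lag.
    x = gaze - gaze.mean()
    return [float(np.corrcoef(x[:-lag], x[lag:])[0, 1]) for lag in range(1, max_lag + 1)]

t = np.linspace(0, 10, 600)                        # e.g. 60 Hz for 10 seconds
gaze_x = np.sin(t) + 0.1 * np.random.randn(len(t))
print(gaze_autocorrelation(gaze_x, max_lag=5))     # high correlation at short lags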
SGaze: A Data-Driven Eye-Head Coordination Model for Realtime Gaze Prediction
We present a novel, data-driven eye-head coordination model that can be used for realtime gaze prediction for immersive HMD-based applications without any external hardware or eye tracker.
Our model (SGaze) is built on a large dataset collected from different users navigating virtual worlds under different lighting conditions.
We perform statistical analysis on the recorded data and observe a linear correlation between gaze positions and head rotation angular velocities.
We also find that there exists a latency between eye movements and head movements.
We formulate a time-related function between head movement and eye movement and use it for realtime gaze position prediction, which allows SGaze to work as a software-based realtime gaze predictor.
We demonstrate the benefits of SGaze for gaze-contingent rendering and evaluate the results with a user study.
@article{hu19_sgaze,
title={SGaze: A Data-Driven Eye-Head Coordination Model for Realtime Gaze Prediction},
author={Hu, Zhiming and Zhang, Congyi and Li, Sheng and Wang, Guoping and Manocha, Dinesh},
journal={IEEE Transactions on Visualization and Computer Graphics},
volume={25},
number={5},
pages={2002--2010},
year={2019},
publisher={IEEE}}
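A toy version of a linear eye-head coordination model with a latency term is sketched below: horizontal gaze is predicted from head yaw angular velocity shifted by a fixed latency. The coefficients, latency, shift direction, and single-axis simplification are illustrative assumptions, not the fitted SGaze model.

import numpy as np

def predict_gaze_x(head_yaw_vel, a=0.12, b=0.0, latency_samples=6):
    # Shift the head velocity signal by the eye-head latency, then apply a linear mapping.
    shifted = np.concatenate([np.zeros(latency_samples), head_yaw_vel[:-latency_samples]])
    return a * shifted + b

head_vel = np.random.randn(300)       # e.g. 300 samples of head yaw velocity (deg/s)
gaze_x = predict_gaze_x(head_vel)     # predicted horizontal gaze offset (deg)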