[12.2023] I was invited to attend the Fifth Youth Forum on the Next Generation Computer Sciences organised by Peking University.
[11.2023] I was invited to attend the 10th Teli Forum organised by Beijing Institute of Technology.
[09.2023] Our paper was nominated for the IFIP TC13 Pioneers’ Award for Best Doctoral Student Paper at INTERACT 2023.
[08.2023] I will serve as the Virtualization Chair for ETRA 2024.
[08.2023] One paper was accepted at UIST 2023.
[06.2023] I will serve as an Associate Chair for MuC 2023.
[06.2023] One paper was accepted at INTERACT 2023.
[05.2023] I will serve as a Technical Program Committee member for iWOAR 2023.
[10.2022] One paper was accepted at the NeurIPS 2022 Workshop Gaze Meets ML.
[08.2022] I joined the University of Stuttgart as a post-doctoral researcher.
[07.2022] I successfully defended my Ph.D.!
Research Interests
My research interests include virtual reality, human-computer interaction, eye tracking, and human-centred artificial intelligence.
My long-term research goal is to develop human-centred intelligent interactive systems that can accurately model human behaviours, e.g. eye movements and body movements, in activities of daily living.
Awards & Honors
IFIP TC13 Pioneers’ Award for Best Doctoral Student Paper Nominee at INTERACT 2023, 2023
SimTech Postdoctoral Fellowship, 2022
National Scholarship (top 2%), 2021
TVCG Best Journal Award Nominee at IEEE VR 2021 (top 2%), 2021
CSC (China Scholarship Council) Scholarship, 2020
Chancellor's Scholarship (top 2%), 2020
Leo KoGuan Scholarship (top 5%), 2019
Leader Scholarship (top 0.2%, 7 out of over 3800 students), 2017
Invited Talks
Towards Human-aware Intelligent User Interfaces. Peking University Fifth Youth Forum on the Next Generation Computer Sciences, China, December, 2023.
Towards the Coordination of Eye, Body and Context in Daily Activities. Beijing Institute of Technology 10th Teli Forum, China, Hosted by Prof. Guoren Wang, November, 2023.
Analysis and Prediction of Human Visual Attention in Virtual Reality. Southeast University, China, Hosted by Prof. Ding Ding, June, 2022.
Recognizing User Tasks from Eye and Head Movements in Immersive Virtual Reality. IEEE VR 2022, Hosted by Prof. Kiyoshi Kiyokawa, March, 2022.
Forecasting Eye Fixations in Task-Oriented Virtual Environments. GAMES Webinar 2021, Hosted by Prof. Xubo Yang, September, 2021.
Eye-Head Coordination Model for Real-time Gaze Prediction. 2019 International Conference on VR/AR and 3D Display, Hosted by Prof. Feng Xu, June, 2019.
Teaching
Machine Perception and Learning, University of Stuttgart, 2022, Lecturer
Publications
SUPREYES: SUPer Resolution for EYES Using Implicit Neural Representation Learning
We introduce SUPREYES – a novel self-supervised method to increase the spatio-temporal resolution of gaze data recorded using low(er)-resolution eye trackers. Despite continuing advances in eye tracking technology, the vast majority of current eye trackers – particularly mobile ones and those integrated into mobile devices – suffer from low-resolution gaze data, thus fundamentally limiting their practical usefulness. SUPREYES learns a continuous implicit neural representation from low-resolution gaze data to up-sample the gaze data to arbitrary resolutions. We compare our method with commonly used interpolation methods on arbitrary scale super-resolution and demonstrate that SUPREYES outperforms these baselines by a significant margin. We also test our method on the sample downstream task of gaze-based user identification and show that our method improves performance over the original low-resolution gaze data and outperforms other baselines. These results are promising as they open up a new direction for increasing eye tracking fidelity as well as enabling new gaze-based applications without the need for new eye tracking equipment.
@inproceedings{jiao23_supreyes,
author = {Jiao, Chuhan and Hu, Zhiming and B{\^a}ce, Mihai and Bulling, Andreas},
title = {SUPREYES: SUPer Resolution for EYES Using Implicit Neural Representation Learning},
booktitle = {Proc. ACM Symposium on User Interface Software and Technology (UIST)},
year = {2023},
pages = {1--13},
doi = {10.1145/3586183.3606780}
}
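To make the idea above concrete, here is a minimal sketch of fitting a continuous implicit neural representation to a low-resolution gaze trace and then querying it at a higher sampling rate. The network layout, training loop, and toy signal are illustrative assumptions, not the SUPREYES implementation.

# Minimal sketch (not the authors' code): fit a continuous implicit neural
# representation t -> (gaze_x, gaze_y) to low-resolution gaze samples, then
# query it at a higher sampling rate. Architecture and training details are
# illustrative assumptions.
import torch
import torch.nn as nn

class GazeINR(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),            # (x, y) gaze position
        )

    def forward(self, t):                    # t: (N, 1) timestamps in [0, 1]
        return self.net(t)

# Toy low-resolution recording: 30 Hz over one second (assumed data).
t_low = torch.linspace(0, 1, 30).unsqueeze(1)
gaze_low = torch.stack([torch.sin(6.28 * t_low[:, 0]),
                        torch.cos(6.28 * t_low[:, 0])], dim=1)

model = GazeINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):                        # fit the INR to the low-res samples
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(t_low), gaze_low)
    loss.backward()
    opt.step()

# Query the fitted representation at 120 Hz, i.e. 4x temporal up-sampling.
t_high = torch.linspace(0, 1, 120).unsqueeze(1)
gaze_high = model(t_high).detach()
print(gaze_high.shape)                       # torch.Size([120, 2])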
Exploring Natural Language Processing Methods for Interactive Behaviour Modelling
Analysing and modelling interactive behaviour is an important topic in human-computer interaction (HCI) and a key requirement for the development of intelligent interactive systems. Interactive behaviour has a sequential (actions happen one after another) and hierarchical (a sequence of actions forms an activity driven by interaction goals) structure, which may be similar to the structure of natural language. Designed based on such a structure, natural language processing (NLP) methods have achieved groundbreaking success in various downstream tasks. However, few works have linked interactive behaviour with natural language. In this paper, we explore the similarity between interactive behaviour and natural language by applying an NLP method, byte pair encoding (BPE), to encode mouse and keyboard behaviour. We then analyse the vocabulary, i.e., the set of action sequences, learnt by BPE, as well as use the vocabulary to encode the input behaviour for interactive task recognition. An existing dataset collected in constrained lab settings and our novel out-of-the-lab dataset were used for evaluation. Results show that this natural language-inspired approach not only learns action sequences that reflect specific interaction goals, but also achieves higher F1 scores on task recognition than other methods. Our work reveals the similarity between interactive behaviour and natural language, and demonstrates the potential of applying methods that leverage insights from NLP to model interactive behaviour in HCI.
@inproceedings{zhang23_exploring,
title = {Exploring Natural Language Processing Methods for Interactive Behaviour Modelling},
author = {Zhang, Guanhua and Bortoletto, Matteo and Hu, Zhiming and Shi, Lei and B{\^a}ce, Mihai and Bulling, Andreas},
booktitle = {Proc. IFIP TC13 Conference on Human-Computer Interaction (INTERACT)},
pages = {1--22},
year = {2023},
publisher = {Springer}
}
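As a toy illustration of the encoding step described above, the sketch below applies byte pair encoding to a stream of discretised interaction actions; the action alphabet and merge criterion are assumptions chosen for readability, not the paper's pipeline.

# Toy sketch (assumptions, not the paper's pipeline): byte pair encoding over a
# stream of discretised interaction actions. Frequent adjacent action pairs are
# merged into new vocabulary entries, so recurring action sequences become
# single tokens.
from collections import Counter

def bpe_learn(sequence, num_merges):
    """Learn BPE merges over a list of action symbols."""
    seq = list(sequence)
    vocab = set(seq)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break
        merged = a + "+" + b
        merges.append((a, b))
        vocab.add(merged)
        # Replace every occurrence of the pair with the merged token.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(merged)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return vocab, merges, seq

# Hypothetical discretised mouse/keyboard actions.
actions = ["move", "move", "click", "move", "move", "click",
           "key", "key", "move", "move", "click"]
vocab, merges, encoded = bpe_learn(actions, num_merges=3)
print(merges)    # learnt merges, e.g. ('move', 'move') first
print(encoded)   # action stream re-encoded with the learnt vocabulary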
EHTask: Recognizing User Tasks from Eye and Head Movements in Immersive Virtual Reality
Understanding human visual attention in immersive virtual reality (VR) is crucial for many important applications, including gaze prediction, gaze guidance, and gaze-contingent rendering.
However, previous works on visual attention analysis typically only explored one specific VR task and paid less attention to the differences between different tasks.
Moreover, existing task recognition methods typically focused on 2D viewing conditions and only explored the effectiveness of human eye movements.
We first collect eye and head movements of 30 participants performing four tasks, i.e. Free viewing, Visual search, Saliency, and Track, in 15 360-degree VR videos.
Using this dataset, we analyze the patterns of human eye and head movements and reveal significant differences across different tasks in terms of fixation duration, saccade amplitude, head rotation velocity, and eye-head coordination.
We then propose EHTask -- a novel learning-based method that employs eye and head movements to recognize user tasks in VR.
We show that our method significantly outperforms the state-of-the-art methods derived from 2D viewing conditions both on our dataset (accuracy of 84.4% vs. 62.8%) and on a real-world dataset (61.9% vs. 44.1%).
As such, our work provides meaningful insights into human visual attention under different VR tasks and guides future work on recognizing user tasks in VR.
@article{hu22_ehtask,
author = {Hu, Zhiming and Bulling, Andreas and Li, Sheng and Wang, Guoping},
journal = {IEEE Transactions on Visualization and Computer Graphics},
title = {EHTask: Recognizing User Tasks From Eye and Head Movements in Immersive Virtual Reality},
year = {2023},
volume = {29},
number = {4},
pages = {1992--2004},
doi = {10.1109/TVCG.2021.3138902}
}
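A minimal sketch of the general recipe, classifying the ongoing task from a window of eye and head movement signals, is shown below; the layer sizes, input dimensions, and 1D-CNN plus GRU layout are assumptions, not the published EHTask architecture.

# Minimal sketch (layer sizes and layout are assumptions, not the published
# EHTask architecture): classify the ongoing VR task from a window of eye and
# head movement signals.
import torch
import torch.nn as nn

class EyeHeadTaskClassifier(nn.Module):
    def __init__(self, eye_dim=2, head_dim=2, num_tasks=4):
        super().__init__()
        # Separate 1D convolutions over the eye and head time series.
        self.eye_conv = nn.Sequential(nn.Conv1d(eye_dim, 32, 5, padding=2), nn.ReLU())
        self.head_conv = nn.Sequential(nn.Conv1d(head_dim, 32, 5, padding=2), nn.ReLU())
        self.gru = nn.GRU(64, 64, batch_first=True)
        self.classifier = nn.Linear(64, num_tasks)  # Free viewing / Search / Saliency / Track

    def forward(self, eye, head):                   # (batch, time, channels)
        e = self.eye_conv(eye.transpose(1, 2))      # (batch, 32, time)
        h = self.head_conv(head.transpose(1, 2))    # (batch, 32, time)
        x = torch.cat([e, h], dim=1).transpose(1, 2)
        _, last = self.gru(x)
        return self.classifier(last[-1])            # task logits

# One-second window at an assumed 100 Hz of gaze angles and head velocities.
eye = torch.randn(8, 100, 2)
head = torch.randn(8, 100, 2)
logits = EyeHeadTaskClassifier()(eye, head)
print(logits.shape)                                 # torch.Size([8, 4])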
Intentional Head-Motion Assisted Locomotion for Reducing Cybersickness
Zehui Lin, Xiang Gu, Sheng Li, Zhiming Hu, Guoping Wang
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023, 29(8): 3458-3471.
We present an efficient locomotion technique that can reduce cybersickness through aligning the visual and vestibular induced self-motion illusion. Our locomotion technique stimulates proprioception consistent with the visual sense by intentional head motion, which includes both the head's translational movement and yaw rotation. A locomotion event is triggered by the hand-held controller together with an intended physical head motion simultaneously. Based on our method, we further explore the connections between the level of cybersickness and the velocity of self-motion through a series of experiments. We first conduct Experiment 1 to investigate the cybersickness induced by different translation velocities using our method and then conduct Experiment 2 to investigate the cybersickness induced by different angular velocities. Our user studies from these two experiments reveal a new finding on the correlation between translation/angular velocities and the level of cybersickness. The cybersickness is greatest at the lowest velocity using our method, and the statistical analysis also indicates a possible U-shaped relation between the translation/angular velocity and cybersickness degree. Finally, we conduct Experiment 3 to evaluate the performances of our method and other commonly used locomotion approaches, i.e., joystick-based steering and teleportation. The results show that our method can significantly reduce cybersickness compared with joystick-based steering and obtain a higher presence compared with teleportation. These advantages demonstrate that our method can be an optional locomotion solution for immersive VR applications using commercially available HMD suites only.
@article{lin22_intentional,
author = {Lin, Zehui and Gu, Xiang and Li, Sheng and Hu, Zhiming and Wang, Guoping},
journal = {IEEE Transactions on Visualization and Computer Graphics},
title = {Intentional Head-Motion Assisted Locomotion for Reducing Cybersickness},
year = {2023},
volume = {29},
number = {8},
pages = {3458--3471},
doi = {10.1109/TVCG.2022.3160232}
}
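The core gating idea, locomotion only fires when the controller trigger and an intentional head motion coincide, can be sketched as follows; the thresholds, gain, and frame-update interface are hypothetical, not the paper's implementation.

# Simplified sketch (thresholds, data class and gain are hypothetical, not the
# paper's implementation): locomotion is applied only when the controller
# trigger is held AND an intentional head motion is detected in the same frame.
from dataclasses import dataclass

@dataclass
class HeadMotion:
    forward_speed: float   # head translation speed along the view direction (m/s)
    yaw_speed: float       # head yaw rotation speed (deg/s)

TRANSLATION_THRESHOLD = 0.05   # assumed minimum speed counted as intentional (m/s)
YAW_THRESHOLD = 5.0            # assumed minimum yaw speed counted as intentional (deg/s)

def locomotion_step(trigger_held: bool, head: HeadMotion,
                    gain: float = 4.0, dt: float = 1 / 90) -> float:
    """Virtual forward displacement (metres) for one frame; values are hypothetical."""
    if not trigger_held:
        return 0.0             # locomotion requires the controller trigger
    intentional = (abs(head.forward_speed) > TRANSLATION_THRESHOLD
                   or abs(head.yaw_speed) > YAW_THRESHOLD)
    if not intentional:
        return 0.0             # no intentional head motion -> no locomotion event
    # Scale virtual translation with the physical head translation so visual and
    # vestibular self-motion cues stay roughly aligned.
    return gain * head.forward_speed * dt

print(locomotion_step(True, HeadMotion(forward_speed=0.2, yaw_speed=0.0)))   # moves
print(locomotion_step(True, HeadMotion(forward_speed=0.0, yaw_speed=0.0)))   # 0.0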
Federated Learning for Appearance-based Gaze Estimation in the Wild
Gaze estimation methods have significantly matured in recent years but the large number of eye images required to train deep learning models poses significant privacy risks. In addition, the heterogeneous data distribution across different users can significantly hinder the training process. In this work, we propose the first federated learning approach for gaze estimation to preserve the privacy of gaze data. We further employ pseudo-gradient optimisation to adapt our federated learning approach to the divergent model updates to address the heterogeneous nature of in-the-wild gaze data in collaborative setups. We evaluate our approach on a real-world dataset (MPIIGaze dataset) and show that our work enhances the privacy guarantees of conventional appearance-based gaze estimation methods, handles the convergence issues of gaze estimators, and significantly outperforms vanilla federated learning by 15.8% (from a mean error of 10.63 degrees to 8.95 degrees). As such, our work paves the way to develop privacy-aware collaborative learning setups for gaze estimation while maintaining the model’s performance.
@inproceedings{elfares22_federated,
title = {Federated Learning for Appearance-based Gaze Estimation in the Wild},
author = {Elfares, Mayar and Hu, Zhiming and Reisert, Pascal and Bulling, Andreas and Küsters, Ralf},
year = {2022},
booktitle = {Proceedings of the NeurIPS Workshop Gaze Meets ML (GMML)},
doi = {10.48550/arXiv.2211.07330},
pages = {1--17}
}
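The sketch below illustrates the overall scheme, federated averaging where the mean client update is treated as a pseudo-gradient for a server-side optimisation step; the toy model, client data, and learning rates are assumptions, not the paper's setup.

# Compact sketch (model, data and optimiser choice are illustrative assumptions,
# not the paper's setup): federated learning where the averaged client update is
# treated as a pseudo-gradient applied by a server-side step.
import copy
import torch
import torch.nn as nn

def client_update(global_model, data, targets, epochs=1, lr=0.01):
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(data), targets).backward()
        opt.step()
    # Return the client's update (delta) relative to the global weights.
    return {k: model.state_dict()[k] - global_model.state_dict()[k]
            for k in global_model.state_dict()}

def server_round(global_model, client_batches, server_lr=1.0):
    deltas = [client_update(global_model, x, y) for x, y in client_batches]
    state = global_model.state_dict()
    for k in state:
        # Average client deltas and treat the negated average as a
        # pseudo-gradient used in a simple server-side SGD step.
        pseudo_grad = -torch.stack([d[k] for d in deltas]).mean(dim=0)
        state[k] = state[k] - server_lr * pseudo_grad
    global_model.load_state_dict(state)

# Toy "gaze estimator": image features -> 2D gaze angles, with three clients.
model = nn.Linear(16, 2)
clients = [(torch.randn(32, 16), torch.randn(32, 2)) for _ in range(3)]
for _ in range(5):
    server_round(model, clients)
print(model.weight.shape)   # torch.Size([2, 16])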
Research progress of user task prediction and algorithm analysis (in Chinese)
Users’ cognitive behaviors are dramatically influenced by the specific tasks assigned to them.
Information on users’ tasks can be applied to many areas, such as human behavior analysis and intelligent human-computer interfaces.
It can be used as the input of intelligent systems and enable the systems to automatically adjust their functions according to different tasks.
User task prediction refers to the prediction of users' tasks at hand based on the characteristics of their eye movements, the characteristics of the scene content, and other related information.
User task prediction is a popular research topic in vision research, and researchers have proposed many successful task prediction algorithms.
However, the algorithms proposed in prior works mainly focus on a particular type of scene, and a systematic comparison and analysis of these algorithms is lacking.
This paper presents a review of prior works on task prediction in image, video, and real-world scenes, and details existing task prediction algorithms.
Based on a real-world task dataset, this paper evaluates the performance of existing algorithms and provides the corresponding analysis and discussion.
As such, this work can provide meaningful insights for future works on this important topic.
@article{hu21_user,
title = {Research progress of user task prediction and algorithm analysis (in Chinese)},
author = {Hu, Zhiming and Li, Sheng and Gai, Meng},
year = {2021},
journal = {Journal of Graphics},
doi = {10.11996/JG.j.2095-302X.2021030367},
volume = {42},
number = {3},
pages = {367--375}
}
Eye Fixation Forecasting in Task-Oriented Virtual Reality
In immersive virtual reality (VR), users' visual attention is crucial for many important applications, including VR content design, gaze-based interaction, and gaze-contingent rendering.
In particular, information on users' future eye fixations is key for intelligent user interfaces and has significant relevance for many areas, such as visual attention enhancement, dynamic event triggering, and human-computer interaction.
However, previous works typically focused on free-viewing conditions and paid less attention to task-oriented attention.
This paper aims at forecasting users' eye fixations in task-oriented virtual reality.
To this end, a VR eye tracking dataset that corresponds to different users performing a visual search task in immersive virtual environments is built.
A comprehensive analysis of users' eye fixations is performed based on the collected data.
The analysis reveals that eye fixations are correlated with users' historical gaze positions, task-related objects, saliency information of the VR content, and head rotation velocities.
Based on this analysis, a novel learning-based model is proposed to forecast users' eye fixations in the near future in immersive virtual environments.
@inproceedings{hu21_eye,
author={Hu, Zhiming},
title = {Eye Fixation Forecasting in Task-Oriented Virtual Reality},
booktitle={Proceedings of the 2021 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops},
year = {2021},
pages={707--708},
organization={IEEE}
}
FixationNet: Forecasting Eye Fixations in Task-Oriented Virtual Environments
Human visual attention in immersive virtual reality (VR) is key for many important applications, such as content design, gaze-contingent rendering, or gaze-based interaction.
However, prior works typically focused on free-viewing conditions that have limited relevance for practical applications.
We first collect eye tracking data of 27 participants performing a visual search task in four immersive VR environments.
Based on this dataset, we provide a comprehensive analysis of the collected data and reveal correlations between users' eye fixations and other factors, i.e. users' historical gaze positions, task-related objects, saliency information of the VR content, and users' head rotation velocities.
Based on this analysis, we propose FixationNet -- a novel learning-based model to forecast users' eye fixations in the near future in VR.
We evaluate the performance of our model for free-viewing and task-oriented settings and show that it outperforms the state of the art by a large margin of 19.8% (from a mean error of 2.93° to 2.35°) in free-viewing and of 15.1% (from 2.05° to 1.74°) in task-oriented situations.
As such, our work provides new insights into task-oriented attention in virtual environments and guides future work on this important topic in VR research.
@article{hu21_fixationnet,
title={FixationNet: Forecasting eye fixations in task-oriented virtual environments},
author={Hu, Zhiming and Bulling, Andreas and Li, Sheng and Wang, Guoping},
journal={IEEE Transactions on Visualization and Computer Graphics},
volume={27},
number={5},
pages={2681--2690},
year={2021},
publisher={IEEE}
}
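A minimal sketch of the underlying idea, fusing gaze history, task-related object positions, saliency features, and head velocities to regress a near-future fixation, is given below; all feature dimensions and layers are assumptions rather than the published FixationNet architecture.

# Minimal sketch (feature dimensions and layer sizes are assumptions, not the
# published FixationNet architecture): fuse gaze history, task-related object
# positions, saliency features and head velocities to regress the gaze position
# a short time ahead.
import torch
import torch.nn as nn

class FixationForecaster(nn.Module):
    def __init__(self, gaze_dim=2, obj_dim=6, sal_dim=16, head_dim=2):
        super().__init__()
        per_step = gaze_dim + obj_dim + head_dim       # time-varying inputs
        self.temporal = nn.Sequential(                 # 1D CNN over the history window
            nn.Conv1d(per_step, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fuse = nn.Sequential(
            nn.Linear(64 + sal_dim, 64), nn.ReLU(),
            nn.Linear(64, 2),                          # future fixation (x, y)
        )

    def forward(self, gaze, objects, head, saliency):
        x = torch.cat([gaze, objects, head], dim=2)    # (batch, steps, per_step)
        t = self.temporal(x.transpose(1, 2)).squeeze(2)
        return self.fuse(torch.cat([t, saliency], dim=1))

batch, steps = 4, 25
pred = FixationForecaster()(torch.randn(batch, steps, 2),   # gaze history
                            torch.randn(batch, steps, 6),   # task-related object positions
                            torch.randn(batch, steps, 2),   # head rotation velocities
                            torch.randn(batch, 16))         # saliency features
print(pred.shape)                                           # torch.Size([4, 2])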
Gaze Analysis and Prediction in Virtual Reality
In virtual reality (VR) systems, users’ gaze information has gained importance in recent years.
It can be applied to many aspects, including VR content design, eye-movement-based interaction, and gaze-contingent rendering.
In this context, it becomes increasingly important to understand users’ gaze behaviors in virtual reality and to predict users’ gaze positions.
This paper presents research in gaze behavior analysis and gaze position prediction in virtual reality.
Specifically, this paper focuses on static virtual scenes and dynamic virtual scenes under free-viewing conditions.
Users’ gaze data in virtual scenes are collected and statistical analysis is performed on the recorded data.
The analysis reveals that users’ gaze positions are correlated with their head rotation velocities and the salient regions of the content.
In dynamic scenes, users’ gaze positions also have strong correlations with the positions of dynamic objects.
A data-driven eye-head coordination model is proposed for realtime gaze prediction in static scenes and a CNN-based model is derived for predicting gaze positions in dynamic scenes.
@inproceedings{hu20_gaze,
author={Hu, Zhiming},
title = {Gaze Analysis and Prediction in Virtual Reality},
booktitle={Proceedings of the 2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops},
year = {2020},
pages={543--544},
organization={IEEE}
}
DGaze: CNN-Based Gaze Prediction in Dynamic Scenes
We conduct novel analyses of users' gaze behaviors in dynamic virtual scenes and, based on our analyses, we present a novel CNN-based model called DGaze for gaze prediction in HMD-based applications.
We first collect 43 users' eye tracking data in 5 dynamic scenes under free-viewing conditions.
Next, we perform statistical analysis of our data and observe that dynamic object positions, head rotation velocities, and salient regions are correlated with users' gaze positions.
Based on our analysis, we present a CNN-based model (DGaze) that combines object position sequence, head velocity sequence, and saliency features to predict users' gaze positions.
Our model can be applied to predict not only realtime gaze positions but also gaze positions in the near future and can achieve better performance than the prior method.
In terms of realtime prediction, DGaze achieves a 22.0% improvement over the prior method in dynamic scenes and obtains an improvement of 9.5% in static scenes, using the angular distance as the evaluation metric.
We also propose a variant of our model called DGaze_ET that can be used to predict future gaze positions with higher precision by combining accurate past gaze data gathered using an eye tracker.
We further analyze our CNN architecture and verify the effectiveness of each component in our model.
We apply DGaze to gaze-contingent rendering and a game, and also present the evaluation results from a user study.
@article{hu20_dgaze,
title={DGaze: CNN-Based Gaze Prediction in Dynamic Scenes},
author={Hu, Zhiming and Li, Sheng and Zhang, Congyi and Yi, Kangrui and Wang, Guoping and Manocha, Dinesh},
journal={IEEE Transactions on Visualization and Computer Graphics},
volume={26},
number={5},
pages={1902--1911},
year={2020},
publisher={IEEE}
}
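The rough structure, separate branches for the dynamic object position sequence, the head velocity sequence, and a saliency map, feeding a shared gaze regressor, can be sketched as follows; branch sizes and the saliency encoder are assumptions, not the published DGaze network.

# Rough sketch (branch sizes and the saliency encoder are assumptions, not the
# published DGaze network): 1D convolutions over the dynamic-object position
# sequence and the head velocity sequence, a small 2D CNN over a saliency map,
# and a shared regressor for the gaze position.
import torch
import torch.nn as nn

class DGazeSketch(nn.Module):
    def __init__(self, obj_dim=3, head_dim=2):
        super().__init__()
        self.obj_branch = nn.Sequential(nn.Conv1d(obj_dim, 32, 3, padding=1),
                                        nn.ReLU(), nn.AdaptiveAvgPool1d(1))
        self.head_branch = nn.Sequential(nn.Conv1d(head_dim, 32, 3, padding=1),
                                         nn.ReLU(), nn.AdaptiveAvgPool1d(1))
        self.sal_branch = nn.Sequential(nn.Conv2d(1, 8, 5, stride=4), nn.ReLU(),
                                        nn.AdaptiveAvgPool2d(1))
        self.regressor = nn.Sequential(nn.Linear(32 + 32 + 8, 64), nn.ReLU(),
                                       nn.Linear(64, 2))       # gaze (x, y)

    def forward(self, obj_seq, head_seq, saliency):
        o = self.obj_branch(obj_seq.transpose(1, 2)).squeeze(2)
        h = self.head_branch(head_seq.transpose(1, 2)).squeeze(2)
        s = self.sal_branch(saliency).flatten(1)
        return self.regressor(torch.cat([o, h, s], dim=1))

pred = DGazeSketch()(torch.randn(4, 50, 3),        # dynamic object positions over time
                     torch.randn(4, 50, 2),        # head rotation velocities over time
                     torch.randn(4, 1, 64, 64))    # saliency map of the current frame
print(pred.shape)                                  # torch.Size([4, 2])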
Temporal continuity of visual attention for future gaze prediction in immersive virtual reality
Background: Eye tracking technology is receiving increased attention in the field of virtual reality.
Specifically, future gaze prediction is crucial in pre-computation for many applications such as gaze-contingent rendering, advertisement placement, and content-based design.
To explore future gaze prediction, it is necessary to analyze the temporal continuity of visual attention in immersive virtual reality.
Methods: In this paper, the concept of temporal continuity of visual attention is presented.
Subsequently, an autocorrelation function method is proposed to evaluate the temporal continuity.
Thereafter, the temporal continuity is analyzed in both free-viewing and task-oriented conditions.
Results: In free-viewing conditions, the analysis of a free-viewing gaze dataset indicates that temporal continuity holds only within a short time interval.
A task-oriented game scene condition was then created and an experiment was conducted to collect users' gaze data.
An analysis of the collected gaze data shows that temporal continuity behaves similarly to that in the free-viewing conditions.
Temporal continuity can be applied to future gaze prediction: when it is strong, users' current gaze positions can be directly used to predict their gaze positions in the near future.
Conclusions: The prediction performance of using the current gaze is further evaluated in both free-viewing and task-oriented conditions, and the results show that the current gaze can be efficiently applied to short-term future gaze prediction.
The task of long-term gaze prediction still remains to be explored.
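The kind of analysis described above can be sketched as follows: an autocorrelation of a gaze coordinate trace as a proxy for temporal continuity, together with the baseline that reuses the current gaze position as the prediction for a future time step. The signal and sampling rate are synthetic assumptions.

# Small sketch (synthetic signal, assumed sampling rate): autocorrelation of a
# gaze coordinate trace as a proxy for temporal continuity, plus the baseline
# that uses the current gaze position as the prediction for a future time step.
import numpy as np

def autocorrelation(x, max_lag):
    """Normalised autocorrelation of a 1D signal for lags 0..max_lag."""
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[:len(x) - k] * x[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
fs = 100                                         # assumed 100 Hz gaze samples
t = np.arange(0, 30, 1 / fs)
gaze_x = np.cumsum(rng.normal(0, 0.2, t.size))   # synthetic drifting gaze trace

acf = autocorrelation(gaze_x, max_lag=fs)        # lags up to 1 second
print(acf[[0, 10, 50, 100]])                     # continuity decays with lag

# Baseline: predict gaze at t + dt by the current gaze; error grows with dt.
for lag in (10, 50, 100):                        # 0.1 s, 0.5 s, 1.0 s ahead
    err = np.abs(gaze_x[lag:] - gaze_x[:-lag]).mean()
    print(f"lag {lag / fs:.1f}s  mean abs error {err:.2f}")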
SGaze: A Data-Driven Eye-Head Coordination Model for Realtime Gaze Prediction
We present a novel, data-driven eye-head coordination model that can be used for realtime gaze prediction for immersive HMD-based applications without any external hardware or eye tracker.
Our model (SGaze) is computed by generating a large dataset that corresponds to different users navigating in virtual worlds with different lighting conditions.
We perform statistical analysis on the recorded data and observe a linear correlation between gaze positions and head rotation angular velocities.
We also find that there exists a latency between eye movements and head movements.
SGaze can work as a software-based realtime gaze predictor: we formulate a time-related function between head movement and eye movement and use it for realtime gaze position prediction.
We demonstrate the benefits of SGaze for gaze-contingent rendering and evaluate the results with a user study.
@article{hu19_sgaze,
title={SGaze: A Data-Driven Eye-Head Coordination Model for Realtime Gaze Prediction},
author={Hu, Zhiming and Zhang, Congyi and Li, Sheng and Wang, Guoping and Manocha, Dinesh},
journal={IEEE Transactions on Visualization and Computer Graphics},
volume={25},
number={5},
pages={2002--2010},
year={2019},
publisher={IEEE}
}
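A toy sketch of the core relation, a linear fit between gaze position and head angular velocity shifted by a fixed eye-head latency, is given below; the data, latency value, and coefficients are synthetic assumptions, not the published SGaze parameters.

# Toy sketch (synthetic data; latency and coefficients are assumptions, not the
# published SGaze parameters): fit a linear relation between horizontal gaze
# position and head yaw velocity shifted by a fixed eye-head latency.
import numpy as np

rng = np.random.default_rng(1)
fs = 100                                      # assumed sampling rate (Hz)
latency = 0.1                                 # assumed eye-head latency (s)
shift = int(latency * fs)

t = np.arange(0, 60, 1 / fs)
head_yaw_vel = np.sin(0.5 * t) * 30           # synthetic head yaw velocity (deg/s)
# Synthetic "ground truth": gaze follows head velocity with a delay plus noise.
gaze_x = 0.4 * np.roll(head_yaw_vel, shift) + rng.normal(0, 1.0, t.size)

# Least-squares fit of gaze_x(t) ~ a * head_yaw_vel(t - latency) + b.
x = np.roll(head_yaw_vel, shift)[shift:]      # delayed head velocity, wrap-around dropped
y = gaze_x[shift:]
a, b = np.polyfit(x, y, 1)
pred = a * x + b
print(f"a={a:.2f}  b={b:.2f}  mean abs error={np.abs(pred - y).mean():.2f} deg")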