DGaze: CNN-Based Gaze Prediction in Dynamic Scenes
Zhiming Hu, Sheng Li, Congyi Zhang, Kangrui Yi, Guoping Wang, Dinesh Manocha
IEEE Transactions on Visualization and Computer Graphics, 2020, 26(5): 1902-1911.
Abstract
We conduct novel analyses of users' gaze behaviors in dynamic virtual scenes and, based on these analyses, present a CNN-based model called DGaze for gaze prediction in HMD-based applications.
We first collect 43 users' eye tracking data in 5 dynamic scenes under free-viewing conditions.
Next, we perform statistical analysis of our data and observe that dynamic object positions, head rotation velocities, and salient regions are correlated with users' gaze positions.
Based on our analysis, we present a CNN-based model (DGaze) that combines object position sequence, head velocity sequence, and saliency features to predict users' gaze positions.
Our model can be applied to predict not only realtime gaze positions but also gaze positions in the near future, and it achieves better performance than the prior method.
In terms of realtime prediction, DGaze achieves a 22.0% improvement over the prior method in dynamic scenes and a 9.5% improvement in static scenes, using the angular distance as the evaluation metric.
We also propose a variant of our model, DGaze_ET, that predicts future gaze positions with higher precision by incorporating accurate past gaze data gathered with an eye tracker.
We further analyze our CNN architecture and verify the effectiveness of each component in our model.
We apply DGaze to gaze-contingent rendering and a game, and also present the evaluation results from a user study.
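The abstract above describes DGaze as a CNN that combines an object position sequence, a head velocity sequence, and saliency features to predict gaze positions. The PyTorch sketch below shows one way such a sequence-fusion network could be wired; the layer widths, sequence length, feature dimensions, and class name are illustrative assumptions, not the published DGaze architecture.

import torch
import torch.nn as nn

class GazeFusionSketch(nn.Module):
    # Illustrative fusion of object-position, head-velocity, and saliency features.
    # All dimensions below are assumptions, not the published DGaze design.
    def __init__(self, seq_len=50, obj_dim=2, head_dim=3, sal_dim=64):
        super().__init__()
        # 1D convolutions over the temporal sequences
        self.obj_conv = nn.Sequential(
            nn.Conv1d(obj_dim, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU())
        self.head_conv = nn.Sequential(
            nn.Conv1d(head_dim, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU())
        # Fully connected layers that fuse the sequence features with a saliency descriptor
        self.fc = nn.Sequential(
            nn.Linear(2 * 32 * seq_len + sal_dim, 128), nn.ReLU(),
            nn.Linear(128, 2))  # output: predicted 2D gaze position

    def forward(self, obj_seq, head_seq, saliency):
        # obj_seq: (B, obj_dim, T), head_seq: (B, head_dim, T), saliency: (B, sal_dim)
        f_obj = self.obj_conv(obj_seq).flatten(1)
        f_head = self.head_conv(head_seq).flatten(1)
        return self.fc(torch.cat([f_obj, f_head, saliency], dim=1))

A realtime predictor of this kind would feed the most recent T samples of object positions and head velocities, plus the current saliency features, into the network at each frame.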
BibTeX
@article{hu20_DGaze,
title={DGaze: CNN-Based Gaze Prediction in Dynamic Scenes},
author={Hu, Zhiming and Li, Sheng and Zhang, Congyi and Yi, Kangrui and Wang, Guoping and Manocha, Dinesh},
journal={IEEE Transactions on Visualization and Computer Graphics},
volume={26},
number={5},
pages={1902--1911},
year={2020},
publisher={IEEE}
}
-
Temporal continuity of visual attention for future gaze prediction in immersive virtual reality
Zhiming Hu, Sheng Li, Meng Gai
Virtual Reality & Intelligent Hardware, 2020, 2(2): 142-152.
Abstract
Background Eye tracking technology is receiving increased attention in the field of virtual reality.
Specifically, future gaze prediction is crucial in pre-computation for many applications such as gaze-contingent rendering, advertisement placement, and content-based design.
To explore future gaze prediction, it is necessary to analyze the temporal continuity of visual attention in immersive virtual reality.
Methods In this paper, the concept of temporal continuity of visual attention is presented.
Subsequently, an autocorrelation function method is proposed to evaluate the temporal continuity.
Thereafter, the temporal continuity is analyzed in both free-viewing and task-oriented conditions.
Results In free-viewing conditions, the analysis of a free-viewing gaze dataset indicates that temporal continuity holds well only within a short time interval.
A task-oriented condition was then created in a game scene, and an experiment was conducted to collect users' gaze data.
Analysis of the collected gaze data shows that temporal continuity in the task-oriented condition performs similarly to that in the free-viewing conditions.
Temporal continuity can be applied to future gaze prediction: when it is strong, users' current gaze positions can be used directly to predict their gaze positions in the near future.
Conclusions The prediction performance of the current gaze position is further evaluated in both free-viewing and task-oriented conditions, and the results show that the current gaze position can be efficiently applied to short-term future gaze prediction.
The task of long-term gaze prediction still remains to be explored.
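As a rough illustration of the autocorrelation-function method described above, the NumPy sketch below computes the normalized autocorrelation of a recorded gaze signal over increasing time lags; the 100 Hz sampling rate, the file name, and the use of a single angular gaze coordinate are assumptions for illustration, not the paper's exact procedure.

import numpy as np

def gaze_autocorrelation(gaze, max_lag):
    # Normalized autocorrelation of a 1D gaze signal for lags 0..max_lag.
    g = gaze - gaze.mean()
    var = np.dot(g, g)
    return np.array([np.dot(g[:len(g) - k], g[k:]) / var for k in range(max_lag + 1)])

# Hypothetical usage: gaze_x.npy holds the horizontal gaze angle sampled at 100 Hz (assumed).
# A high value of acf[k] for small k means the current gaze position remains a good
# predictor k/100 seconds into the future, i.e., strong short-term temporal continuity.
# gaze_x = np.load("gaze_x.npy")
# acf = gaze_autocorrelation(gaze_x, max_lag=50)   # lags up to 0.5 s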
BibTeX
@article{hu20_Temporal,
title={Temporal continuity of visual attention for future gaze prediction in immersive virtual reality},
author={Hu, Zhiming and Li, Sheng and Gai, Meng},
journal={Virtual Reality and Intelligent Hardware},
volume={2},
number={2},
pages={142--152},
year={2020},
publisher={Elsevier}
}
-
SGaze: A Data-Driven Eye-Head Coordination Model for Realtime Gaze Prediction
Zhiming Hu, Congyi Zhang, Sheng Li, Guoping Wang, Dinesh Manocha
IEEE Transactions on Visualization and Computer Graphics, 2019, 25(5): 2002-2010.
Abstract
We present a novel, data-driven eye-head coordination model that can be used for realtime gaze prediction for immersive HMD-based applications without any external hardware or eye tracker.
Our model (SGaze) is computed by generating a large dataset that corresponds to different users navigating in virtual worlds with different lighting conditions.
We perform statistical analysis on the recorded data and observe a linear correlation between gaze positions and head rotation angular velocities.
We also find that there exists a latency between eye movements and head movements.
SGaze works as a software-based realtime gaze predictor: we formulate a time-related function between head movement and eye movement and use it for realtime gaze position prediction.
We demonstrate the benefits of SGaze for gaze-contingent rendering and evaluate the results with a user study.
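The linear correlation and eye-head latency noted above suggest a simple data-driven predictor: fit a linear function from time-shifted head angular velocity to gaze position. The NumPy sketch below illustrates that fitting step on synthetic data; the latency value, sampling rate, coefficients, and synthetic signals are all assumptions for illustration, not the published SGaze formulation.

import numpy as np

def fit_eye_head_model(head_vel, gaze, lag_samples):
    # Fit a linear relation gaze(t) ~ a * head_vel(t - lag) + b by least squares.
    # lag_samples is an assumed, fixed eye-head latency expressed in samples.
    x = head_vel[:len(head_vel) - lag_samples]   # head angular velocity, shifted by the latency
    y = gaze[lag_samples:]                       # gaze positions that follow lag_samples later
    a, b = np.polyfit(x, y, deg=1)               # slope and intercept
    return a, b

# Purely illustrative synthetic data at an assumed 100 Hz sampling rate.
t = np.arange(0.0, 10.0, 0.01)
head_vel = 30.0 * np.sin(2.0 * np.pi * 0.5 * t)                        # deg/s
gaze = 0.2 * np.roll(head_vel, 10) + np.random.normal(0, 0.5, t.size)  # toy gaze signal
print(fit_eye_head_model(head_vel, gaze, lag_samples=10))              # recovers roughly (0.2, 0.0)

At run time, the fitted coefficients could be applied to the live head velocity stream to estimate gaze, which matches the software-only, eye-tracker-free setting the abstract describes.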
BibTeX
@article{hu19_SGaze,
title={SGaze: A Data-Driven Eye-Head Coordination Model for Realtime Gaze Prediction},
author={Hu, Zhiming and Zhang, Congyi and Li, Sheng and Wang, Guoping and Manocha, Dinesh},
journal={IEEE Transactions on Visualization and Computer Graphics},
volume={25},
number={5},
pages={2002--2010},
year={2019},
publisher={IEEE}
}