G-VOILA

"Can AI understand us better if eye data is shared?"

Overview


The information conveyed by eye movements is remarkably rich. Within the movements and patterns of gaze lies information about preferences, intentions, and implicit context.

As developers of general artificial intelligence increasingly shift their focus towards incorporating more modalities of information such as images and sounds, we can’t help but wonder: If we provide such rich eye movement information to artificial intelligence, could it enable them to better understand us?

Role

2nd author (of HCI research paper)

Duration

June – Sept. 2023 (4 months)

Director

Yuntao Wang, PI of the Tsinghua HCI Lab

Category

Smart glasses, Large Language Models, Gaze Tracking

Current Status

Submitted, IMWUT ’24

Abstract

The boundaries of information access have been radically redefined in today’s ubiquitous computing era. No longer confined to traditional mediums or specific locations, information retrieval (IR) has transcended physical limitations, allowing individuals to tap into vast information anywhere and anytime. In this paper, we present G-VOILA, a technique that integrates gaze and voice inputs to streamline information querying in everyday contexts on smart glasses. We conducted a user study to delve into the inherent ways users formulate queries with G-VOILA. By harnessing advanced computer vision techniques and large language models (LLMs) to decode intricate gaze and voice patterns, G-VOILA reveals the user’s unspoken interests, providing a semantic understanding of their query intent to deliver apt responses. In a follow-up study involving 16 participants across two real-life scenarios, our method showcased an 89% query intent recall. Additionally, users indicated strong satisfaction with the responses, a high matching score for query intent, and a pronounced preference for usage. These results validate the efficacy of G-VOILA for natural information retrieval in daily scenarios.

Research Gap

As the cyber and physical spaces quickly merge, people exhibit a significant demand for information retrieval (IR) anywhere at any time in their daily lives.

Traditional text-based querying methods, such as search engines or chatbots, are often restricted to specific devices and rigid input modalities, which can make queries hard to express and interrupt the user’s current workflow.

Behavioral Investigation

To gain insights into users’ natural expression patterns and potential engagement with G-VOILA, we conducted a formative study in three daily scenarios.

Our findings indicate that users tend to omit specific context-related details in their expressions, assuming that G-VOILA (the AI assistant) can inherently comprehend such information through its sensing capabilities. We categorized users’ queries according to different levels of ambiguity; for further details, please refer to our paper.

Methods

Building on the quantitative and qualitative findings of the formative study, and leveraging cutting-edge multi-modal models and large language models (LLMs), we proposed and implemented a six-stage pipeline for understanding query intent and answering questions effectively.
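To make the idea concrete, here is a minimal, hypothetical sketch of how gaze context can disambiguate a spoken query before it reaches an LLM. All stage names, data structures, and the pronoun-resolution heuristic are illustrative assumptions, not the actual G-VOILA pipeline.

```python
# Hypothetical sketch: gaze-conditioned query disambiguation.
# QueryContext, resolve_ambiguity, and build_prompt are invented
# names for illustration; they are not from the G-VOILA codebase.
from dataclasses import dataclass


@dataclass
class QueryContext:
    voice_transcript: str      # what the user said
    gazed_objects: list[str]   # object labels near the user's gaze point


def resolve_ambiguity(ctx: QueryContext) -> str:
    """Rewrite a vague query by filling in gaze-derived context."""
    query = ctx.voice_transcript
    for pronoun in ("this", "that", "it"):
        if pronoun in query.split() and ctx.gazed_objects:
            # Substitute the first gazed object for the deictic pronoun.
            query = query.replace(pronoun, ctx.gazed_objects[0], 1)
            break
    return query


def build_prompt(resolved_query: str, ctx: QueryContext) -> str:
    """Assemble an LLM prompt that carries the visual context."""
    scene = ", ".join(ctx.gazed_objects) or "unknown scene"
    return (f"Scene objects: {scene}\n"
            f"User question: {resolved_query}\n"
            f"Answer concisely.")


ctx = QueryContext("how many calories are in this",
                   ["croissant", "coffee cup"])
print(build_prompt(resolve_ambiguity(ctx), ctx))
```

The point of the sketch is the division of labor: gaze resolves *what* the deictic words refer to, while the LLM handles the question itself, so the user never has to spell out context the glasses can already see.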

User Evaluations

To evaluate G-VOILA, we conducted a controlled user study in two daily-life settings, obtaining an objective intent recall of 0.89 and a precision of 0.84. Additionally, G-VOILA showed a significantly higher subjective preference compared to our no-gaze baseline.
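For readers unfamiliar with the metrics, recall and precision over recovered query intents are computed as below; the counts are made up to reproduce the reported scores, not the study’s raw data.

```python
# Worked example of the recall/precision formulas; the counts
# (89/11, 84/16) are illustrative, chosen to match the reported scores.
def recall(tp: int, fn: int) -> float:
    """Share of ground-truth intents the system recovered."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of recovered intents that were actually correct."""
    return tp / (tp + fp)


# e.g., 89 correctly recovered intents out of 100 ground-truth intents
print(round(recall(89, 11), 2))     # 0.89
print(round(precision(84, 16), 2))  # 0.84
```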

Nolibox (confidential)

#Algorithm #AI Design #Knowledge Map

NeRF Virtual Avatar Reconstruction

Utilizing NeRF technology, I reconstructed 3D virtual avatars from 2D photos; each model was rebuilt from 24 high-definition photographs.

→ Learn more about this program

SpaceFlow

An astronaut-specific tea kettle that leverages the zero gravity of space. It’s engineered to avoid spills and enables convenient swapping of tea bags.

Tools: Rhino, Realflow

Floral Echos

An artistic phone sound amplifier with a special chamber design for audio enhancement, doubling as a vase for fresh flowers, offering both visual and auditory delight.

Tools: Rhino, Keyshot

Bugoides

A biomimetic animal based on a four-bar linkage, replicating an insect’s gait, composed entirely of one type of part.

Tools: Solidworks, Mechanical Manufacturing

→ Learn more about this program

Kinetic Art

A dynamic art installation, emulating Alexander Calder’s kinetic style, depicting a Chinese verse:

“You stand on the bridge, beholding the view / While from the tower, the viewers behold you / The bright moon adorns your window’s frame / And in others’ dreams, you do the same.”

i: Companion

A personal companion app that establishes a unique personality, founded on the ‘I as my own partner’ principle, by constructing a virtual self from the user’s linguistic inputs.

Tool: Swift

Moonster Music

‘Moonster’ identifies users’ tendencies to either venture beyond or stay within their comfort zones in music.

This platform allows you to explore what songs others are enjoying and adapt your monster persona to mingle in various music-listening groups.

Tool: Figma

→ Watch my video to learn more

Space Article Builder

An exploration into a new reading medium. I’ve envisioned a virtual space where users can create virtual buildings, embedding articles, videos, and other information within them, offering a non-linear reading experience.

Tool: Minecraft

→ Watch my video to learn more

New Wave

‘New Wave’ is a memorable art performance event. It took five months of preparation and was presented to over 1,000 spectators. Yuchen served as the stage director, and the event featured a ‘vintage radio’ artistic concept, linking all performances in a time-traveling radio story.

ASL Sign Search (not released)

#UX #CV #Assistive Technology

Connect with me

yuchenyao_thu@163.com

→ LinkedIn

→ Instagram

TabcT

TabcT (Thermal Activated Bacteria Cancer Therapy) is a temperature-controlled bacterial therapy targeting breast cancer, developed at Tsinghua in 2023. It offers five ‘C’ advantages: cheap, convenient, controlled, continuous, and comprehensive. This was a collaborative project; Yuchen took part as a design advisor and software developer.

→ Visit our wiki to see more

Fufu Companion

“Record and soothe depression with a soft robot”

Fufu’s Diary is a program aimed at helping patients with depression, especially as the COVID-19 pandemic separated people physically and made offline treatment harder to access.

Fufu spans both Online And Offline (OAO) products, extending to an integrated service system. Here we present Fufu’s Diary in a storytelling way.

→ See more details

MuSee

“Reimagine the unheard miracles in an alternate expression”

Imagine a world where no mellifluous birdsong, soft-spoken words, or majestic melodies exist. How poignant to miss these sounds! Yet, 448 million globally endure this quietude due to auditory impairment.

In 2021, a group of Tsinghua University students aspired to bridge this void. They sought to ‘play’ a symphony in lights, enabling those with hearing loss to partake in the symphony’s wonder, in a one-to-one sensory dance.

→ See more details

Unlock Fear

Current Status: IMWUT ’24, under review

“Unlock the secret of fear”

Acrophobia, the fear of heights, significantly affects quality of life. People who suffer from acrophobia know the fear is excessive and unreasonable, yet 6.4% of the total population suffers from it.

We investigated the feasibility of passive sensing data for fear measurement and discerned the most influential physiological and behavioral indicators.

→ See more details

Resonant Echoes​

“Bridging Deaf Children to Quality Education Recovery”

Resonant Echoes is an innovative augmented reality interface that helps teachers understand the intentions of language-impaired kids, facilitating better rehabilitation and social engagement for deaf students.

Deaf children often lack sufficient language skills during rehabilitation, which hinders their socialization due to difficulties in communication. ‘Resonant Echoes’ helps therapists overcome these difficulties with an XR display and embedded AI, focusing on a role-play game scenario.

→ See more details

G-Voila

Current Status: IMWUT ’24, under review

“Can AI understand us better if eye data is shared?”

As developers of general artificial intelligence increasingly shift their focus towards incorporating more modalities of information such as images and sounds, we can’t help but wonder: If we provide such rich eye movement information to artificial intelligence, could it enable them to better understand us?

→ See more details

Kosmos-2 Interface (not released)

#UX #LLM #Interface

Time of City

An artistic installation that embodies a variety of light and shadow patterns within its Sisyphean cycle.

Tools: Raspberry Pi, Smart Car

→ Watch my video to learn more