I led a generative quantitative research study using eye-tracking technology to determine optimal UI placement. My findings were presented at the World Passenger Festival in Vienna and at APTA in Orlando, impacting hundreds of transit systems worldwide.
My roles
Lead Researcher
Data Analyst
Findings Advocate

My tools
Eye tracking (ET), VR, mixed-methods UXR
Python, multivariate statistics
Figma, Miro, PowerPoint/Slides
My impact

I led end-to-end research execution and UX strategy, as well as defining project scope and goals. I presented synthesized research findings to leadership & design teams, which led to a company-wide UI overhaul.
A laptop screen showing the biometrics analytics software used in the data synthesis process
A first-person POV stream from our eye-tracking glasses with a fixation and saccade data overlay during user testing

Probing

How do our users actually perceive and interface with our information architecture?

I joined the UX team a month after it was established, at a time when the company’s understanding of user behavior was tenuous at best; roadblocks, desires, ethnography: everything was completely unknown. Given how new the team was, I was given full autonomy to proceed with user experience research at will. To lay a solid foundation, I started with an exploratory pain point study.

Readability metrics drove the rewording of open-ended questions, allowing someone with English proficiency as low as a third-grade reading level to engage with the survey. These tolerances held when the survey was translated into Seattle’s seven most spoken languages, giving us broad language coverage. Of course, great care was taken to optimize ease of consumption, avoid leading respondents, and limit order-effect bias.
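A minimal sketch of the kind of readability screen involved, assuming the textstat package and an illustrative grade threshold (not the exact tooling or cutoff used):

```python
# Sketch: screen open-ended survey questions for reading level.
# Assumes the `textstat` package; the grade-3 threshold is illustrative.
import textstat

questions = [
    "What makes it hard to find your way out of the station?",
    "Tell us about a time you felt lost while riding transit.",
]

MAX_GRADE = 3.0  # target: readable at roughly a third-grade level

for q in questions:
    grade = textstat.flesch_kincaid_grade(q)
    flag = "OK" if grade <= MAX_GRADE else "REWORD"
    print(f"[{flag}] grade {grade:4.1f}  {q}")
```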

An intentionally blurred photo of an affinity map
An intentionally blurred photo of conversations among my team about preliminary research synthesis

Intentionally blurred data synthesis conversations for respondent/team confidentiality

Synthesizing the survey data gave us broad thematic strokes on user struggles, with numerous sub-trends pointing toward the larger notion of “Wayfinding Struggles”. Respondents repeatedly mentioned exit signage and navigation boards as difficult to locate.

Problem: How can we optimize UI and IA to streamline user flow?

My solution: Use eye-tracking research methodology to uncover ocular hotspots.

Initial Thinking

The signage existed; it was just hard to find. But for signage to be effective, it needs to be seen.

I thought the best way forward was an eye-tracking research study. This approach would be cheaper and faster than A/B testing signage configurations, would uncover hidden user behavior, and would help us understand how testers perceive the existing UI. The resulting metrics could also show where signage should move to be in the best position for consumption.

My hypothesis on optimal signage configuration, which eye-tracking hardware would help validate, was a first in the world of transit station design: instead of placing signage in areas with the highest passenger traffic, which on paper should return the highest levels of interaction, signage should be placed in “ocular magnets”, the places users naturally gaze as they navigate a station. These hotspots could be uncovered with eye-tracking glasses, possibly eliminating the task of information searching altogether.

A photo describing the eye-tracking research lifecycle: a participant exploring their environment, a participant capturing data, and a data analytics team synthesizing behavioral data.

Experimental Design

I decided to run the alpha test trials as a single-blind study. I was concerned that testers would inadvertently interact with signage in an unnatural manner if they understood the research context and outcomes, so testers were simply instructed to navigate to "Exit A". Testers were sourced under a number of demographic stipulations, such as having no prior exposure to the test station, further strengthening behavioral validity.

In addition to uncovering ocular hotspots, an objective of the study was to determine the specific areas of the station that testers had difficulty negotiating. Part of the reason existing signage was poorly placed was an imperfect understanding of user journey decision points, based on decade-old research with little scientific grounding. If we could identify these specific pain points, the findings could inform future design projects and streamline signage retrofits in other stations. To solve this, I turned to Python.

A graph showing the blink rate spikes exhibited by a tester as they completed various stages of the test

Beta tester blink behavior

Eliminating user report bias

The problem with concurrent think-aloud methodology is the inherent delay between hitting a roadblock and verbalizing it; the time it takes a participant to explain their thought process and where they're getting stuck is longer than the process itself.

This lag in self-reported roadblocks may seem negligible, but given the test environment and the proximity of stimuli, this form of data collection would have placed pain points in the wrong locations.

My solution was to locate pain points using blink rate. Rapid blinking is commonly associated with increased cognitive load and confusion, so it became the flagship metric for identifying pain points along a user's journey.

Conveniently, our eye-tracking hardware includes sensors that measure how frequently the wearer blinks, so a quick round of beta testing and Python scripting identified these cognitively difficult sections. Isolating the problematic parts of the station also sped up the environment point-cloud scan process, which testers need in order to generate adequate motion parallax for ocular biometrics collection. While running unit tests on my Python script, I was also coordinating alpha tester sourcing and scheduling.
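A minimal sketch of how such a blink-rate pass could work in pandas; the CSV layout, column names, and spike threshold are assumptions for illustration, not the production script:

```python
# Sketch: flag "cognitively difficult" stretches of a trial from blink rate.
# Column names and the spike threshold are illustrative assumptions.
import pandas as pd

# Each row: a one-second bin with the tester's blink count and a station zone tag.
df = pd.read_csv("blink_log.csv")  # assumed columns: t_sec, blinks, zone

# Smooth per-second blink counts into a rolling blinks-per-minute rate.
df["blink_rate"] = df["blinks"].rolling(window=10, min_periods=1).mean() * 60

# Mark bins where the rate jumps well above the tester's own baseline.
baseline = df["blink_rate"].median()
df["spike"] = df["blink_rate"] > 1.5 * baseline

# Summarize which station zones concentrate the spikes.
pain_points = (
    df[df["spike"]]
    .groupby("zone")
    .size()
    .sort_values(ascending=False)
)
print(pain_points)
```

Rolling the flags up by zone is just one choice; the same pass could bucket spikes by task stage or timestamp range instead.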

Task Performance Visualization (Activity Heatmapping)

A 3x2 grid of heatmap overlays generated by our analytics software. Each image shows the ocular activity heatmap generated by testers at various stages of the test.

Visual activity heatmaps showing alpha tester ocular magnets

Visual activity heatmaps gave us detailed raw interaction intensities, but they weren't heavily used in the task flow analysis. Many of the heatmaps showed intense activity around signage, which at surface level looks like a good thing: we want testers looking at the signage, and the heatmaps show testers looking at signage, so what's the big deal? Heatmaps are inherently misleading because they show what features testers looked at, but not how they looked at them. Additional metrics were needed.

Python data visualizations showing individual tester visual fixations through a test trial

Tester task performance metrics generated in Matplotlib

Pandas and NumPy helped me uncover three additional interaction metrics from the raw biometric data. "Feature TTFC" (time-to-first-contact) quantifies how long it takes a tester to make visual contact with a feature after it enters their line of sight, "Dwell Time" measures how long testers spend looking at a feature, and "Hit Rate" measures how many testers made visual contact with a feature at all.

The combination of these three metrics allowed us to gain objective insight into granular feature performance, which gave us a better idea of how testers truly perceived and interacted with the signage.
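To make these definitions concrete, here is a minimal pandas sketch of how the three metrics could be derived from per-tester fixation logs; the schema and column names are assumptions, not the actual pipeline:

```python
# Sketch: derive TTFC, dwell time, and hit rate from fixation logs.
# The column layout below is an assumption for illustration.
import pandas as pd

# Each row: one fixation, with the tester, the feature (signage) it landed on,
# when the feature first entered line-of-sight, when the fixation started,
# and how long it lasted (seconds).
fix = pd.read_csv("fixations.csv")
# assumed columns: tester_id, feature_id, los_onset_s, fixation_start_s, duration_s

# Time-to-first-contact: first fixation on a feature minus line-of-sight onset.
first_contact = (
    fix.groupby(["tester_id", "feature_id"])
    .agg(first_fix=("fixation_start_s", "min"), los=("los_onset_s", "first"))
)
first_contact["ttfc_s"] = first_contact["first_fix"] - first_contact["los"]

# Dwell time: total fixation duration a tester spends on a feature.
dwell = fix.groupby(["tester_id", "feature_id"])["duration_s"].sum()

# Hit rate: share of testers who fixated a feature at least once.
n_testers = fix["tester_id"].nunique()
hit_rate = fix.groupby("feature_id")["tester_id"].nunique() / n_testers

# Per-feature summaries across testers.
print(first_contact["ttfc_s"].groupby(level="feature_id").mean())
print(dwell.groupby(level="feature_id").mean())
print(hit_rate)
```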

The compiled interaction metrics blew us away. Nearly 40% of task-relevant signage was never seen by testers, despite being actively searched for. And when testers did make visual contact with signage, they couldn't consume the posted information quickly, spending upwards of eight seconds processing it. Signage should be readable at a glance, much like a street sign; imagine spending eight seconds trying to figure out what street you're on while going 30 mph.

Blurred tester biometrics performance visualizations

Intentionally blurred zone data for tester biometrics privacy

Eight additional zones of the station were tested, uncovering dozens of usability flaws. I presented high-level findings, along with remediation design recommendations, to leadership & design teams. Our deliverables ultimately led to an overhaul of information architecture design processes agency-wide: stations, buses, the light rail fleet, everything.

Blurred signage remediation designs, informed through research above and presented to leadership

Intentionally blurred signage design remediations - pending full media/agency release, stay tuned.

AR Integration

As successful as the eye-tracking study was, it had a few flaws, chief among them that our research was entirely retroactive. We were evaluating a product that had already been pushed to "market", leaving us playing design catch-up a couple of years too late. What if we could incorporate this method into the station design process? Instead of serving as a reactionary tool, what if eye-tracking were built into the design of a station, or an entire user trip, to proactively make stations easier to navigate?

To accomplish this, I plotted grid-logged retinal fixation data over a Unity (VR) POV of a BIM 360 CAD file, tracking interaction metrics against file feature IDs instead of a point-cloud scan. Not only was the time-intensive preparatory environment scan eliminated entirely, but testing trials took minutes instead of hours. I also integrated an RTA (retrospective think-aloud) into this stage of the research process, further solidifying the quantitative findings.
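As a rough illustration of the idea, once each fixation is logged against the BIM element it hit, the per-feature metrics reduce to a simple aggregation; the log format and field names below are assumptions:

```python
# Sketch: tally fixation metrics per BIM element ID from a VR session log.
# The JSON-lines log format and field names are illustrative assumptions.
import json
from collections import defaultdict

dwell_by_element = defaultdict(float)
hits_by_element = defaultdict(int)

# Each assumed log line: {"element_id": "...", "duration_s": 0.34}
with open("vr_fixations.jsonl") as f:
    for line in f:
        event = json.loads(line)
        dwell_by_element[event["element_id"]] += event["duration_s"]
        hits_by_element[event["element_id"]] += 1

# Rank elements by total dwell to see which design features draw the eye.
for element_id, dwell in sorted(dwell_by_element.items(),
                                key=lambda kv: kv[1], reverse=True):
    print(f"{element_id}: {dwell:5.1f}s total dwell, "
          f"{hits_by_element[element_id]} fixations")
```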

I'll do my best to keep this page updated to reflect progress with VR testing and system integration. It seems to be going well, considering the architecture & design teams are crazy about it, but I've been asked to keep it under wraps until my work is presented at the World Passenger Festival.

You can watch the presentation virtually, or in person in Vienna, October 4-5, 2023.