Harold Rodriguez - Research Home


 

What is Verbal Tracking and Visualization?

Verbal Tracking and Visualization (VTV) is a project designed to extract and visualize information in speech using Digital Signal Processing (DSP) and graphics hardware.

Description:

This project explores audio visualization and analysis using the FMOD sound library and OpenGL. Of particular interest is visualizing how a user's spoken dialogue differs between one scenario and another.

For example, does a user speak to one type of person differently than to another? Using a Digital Signal Processing (DSP) approach, a variety of established vocal metrics were calculated and visualized.
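
As a rough illustration of the kind of metric involved, the sketch below computes the average power of a signal per frame and the dominant frequency of one frame. It is not the project's FMOD/OpenGL code, and this page does not list which metrics VTV actually calculates; the function names, the frame size, and the synthetic test tone are all assumptions made for the example.

```python
import math

def frame_average_power(samples, frame_size):
    """Mean squared amplitude for each full frame of samples (hypothetical helper)."""
    powers = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        powers.append(sum(x * x for x in frame) / frame_size)
    return powers

def dominant_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest bin in a naive DFT of one frame."""
    n = len(frame)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin, stop below Nyquist
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * sample_rate / n

# Assumed input: a synthetic 200 Hz tone standing in for recorded speech.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(800)]

powers = frame_average_power(tone, 400)          # one value per 50 ms frame
peak = dominant_frequency(tone[:400], sample_rate)
```

Comparing such per-frame values between two recordings of the same speaker is one simple way to look for differences between scenarios. A real-time tool would use an FFT rather than the naive DFT shown here.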

A user study was conducted with the software. A human and a virtual human (VH) were used to explore the impact of human-virtual human (H-VH) interaction. The goal was to determine whether conversing with a VH can elicit detectable and systematic vocal changes. To study this, we examined the H-VH scenario of pharmacy students speaking with immersive VHs playing the role of patients. The audio analysis focused on the students' reactions to scripted empathetic challenges, which are VH-initiated dialogue designed to generate an unrehearsed affective response from the human. The analysis showed that, although some of the vocal changes are undetectable by the human ear:

Changes in vocal tone occur when speaking with a VH.

The changes were detectable by DSP.

The changes were consistent across participant groups.

Further, these changes correlate with known human-human (H-H) conversation patterns.

 

Virtual Experiences Research Group Mission Statement:

The mission of the Virtual Experiences Research Group (VERG) at the University of Florida is to develop highly immersive virtual human interactions.

Team Members:

Benjamin Lok, University of Florida, College of Information Sciences and Engineering

Harold Rodriguez, Graduate Researcher


Image Captions
(left, from top to bottom):

Image 1: Two signals (green, blue) are used to produce a third (envelope: red).

Image 2: Spectrum analysis performed on a speech signal over time.

Image 3: A plot of the average power of a speech signal at a certain time.

Image 4: Features like time-scaling and signal processing filters are applied in real-time.

 
