
(Ongoing) Speech-to-Text Interface for Theater



Theater Speech Recognition


Project Goal

To design and develop a speech-recognition application that lets theater performances engage digital media more naturally.

My Role

Research, design, and development of the software interface


Tools

JavaScript, HTML, CSS, Google Speech API, Sketch, Illustrator, MadMapper


Team

Myself (thesis project at NYU ITP)


Speakspeare is a software interface that allows stage projection mapping to change according to the actors' speech, on stage and in real time.

It integrates stage design with live speech, and by doing so gives projection designers new creative options and empowers performers to act more freely and naturally on stage.


The Problem:

In theater productions, the stage is often configured by designers early on, before the performers arrive to rehearse.

This can cause friction during rehearsals, as performers have to tailor their acting to match the timing of the scenic transitions.


The Solution:

An interface that uses a speech-to-text API to track and interpret performers' live speech.

Specific parts of the play script can be linked to particular visual effects or scene transitions, and these changes happen only at the moment the performers speak that particular sentence or word.
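The core of this idea can be sketched as a small cue-matching step: normalize the live transcript and check it against trigger phrases from the script. This is a minimal illustration, not the actual Speakspeare code; the cue names, phrases, and the `triggerEffect` handoff are hypothetical, and the real project uses the Google Speech API rather than the browser recognizer shown in the comments.

```javascript
// Each cue links a line from the script to a visual effect or scene change.
// (Example cues only; not from the actual production.)
const cues = [
  { trigger: "to be or not to be", effect: "fade-to-ghost-light" },
  { trigger: "exit pursued by a bear", effect: "forest-projection" },
];

// Normalize recognized speech so casing and punctuation don't block a match.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Return the first cue whose trigger phrase appears in the live transcript.
function matchCue(transcript) {
  const spoken = normalize(transcript);
  return cues.find((cue) => spoken.includes(normalize(cue.trigger))) || null;
}

// In a browser, a speech-recognition result would feed matchCue, e.g.:
// recognition.onresult = (e) => {
//   const cue = matchCue(e.results[e.results.length - 1][0].transcript);
//   if (cue) triggerEffect(cue.effect); // hand off to the projection layer
// };

console.log(matchCue("To be, or not to be: that is the question"));
// → { trigger: "to be or not to be", effect: "fade-to-ghost-light" }
```

Matching on a normalized substring keeps the trigger robust to the small transcription variations a live recognizer produces; a production system might swap in fuzzy matching for more tolerance.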



While there are many creative software applications for audio and visual art, relatively little has been done for theater. Indeed, theaters risk losing their audience to other multimedia entertainment if they don't effectively embrace and leverage the power of digital technology.

Video has been used in theater performances since the early 1990s, but not very often. Through my research and interviews with theater designers, I found two main difficulties in adding video projection to theater design: the high financial cost of the equipment and technical support needed to implement projection professionally, and the difficulty of timing the video content so that it is injected at the "right moment" (often by human operators), in sync with the plot and the actors' expressions.

By building on top of natural language processing technology, Speakspeare aims to solve the second problem by synchronizing live speech with digital media output, and, by automating scenic control and reducing human labor, hopefully to lower the cost of projection design as well.


*More details coming soon