The live video feed is pulled from the device camera(s) using the JavaScript getUserMedia() API.
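A minimal sketch of this capture step is shown below. The element id `preview` and the `facingMode` constraint are assumptions for illustration; the call itself only works in a browser context.

```javascript
// Constraints passed to getUserMedia: front-facing camera plus microphone.
// The facingMode value is an assumption; 'environment' selects a rear camera.
const constraints = { video: { facingMode: 'user' }, audio: true };

async function startCamera() {
  // Browser-only: prompts the user for permission, then returns a MediaStream.
  const stream = await navigator.mediaDevices.getUserMedia(constraints);
  // Attach the stream to a <video id="preview" autoplay> element (assumed to exist).
  document.getElementById('preview').srcObject = stream;
}
```

Because `getUserMedia()` returns a Promise, the page can start rendering before the user grants camera access.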
This displays a live representation of the amplitude of the incoming audio from the device microphone. When the amplitude of the ambient noise exceeds the pre-set threshold, our algorithm switches from voice-to-text recognition to lip reading.
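The switching logic can be sketched as a pure function over one buffer of audio samples (as produced, for example, by a Web Audio API `AnalyserNode`). The RMS measure and the `THRESHOLD` value here are illustrative assumptions, not the project's actual parameters.

```javascript
// Assumed amplitude threshold on a 0..1 scale; the real value would be tuned.
const THRESHOLD = 0.2;

// Root-mean-square amplitude of one buffer of audio samples in [-1, 1].
function rmsAmplitude(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Pick the recognition mode for this buffer: noisy audio triggers lip reading.
function chooseMode(samples) {
  return rmsAmplitude(samples) > THRESHOLD ? 'lip-reading' : 'voice-to-text';
}
```

In practice the buffer would be refreshed on each animation frame, so the mode decision tracks the live amplitude display.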
An individual frame is extracted from the webcam feed twenty-four times a second. Each frame is then compressed, encoded, and sent over a WebSocket connection for processing on the server.
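One way to implement this loop is to draw the video element onto an off-screen canvas at 24 fps and send each frame as a compressed JPEG blob. The endpoint URL and JPEG quality below are assumptions; only the 24 fps rate comes from the text.

```javascript
const FPS = 24;                        // frames extracted per second (from the text)
const FRAME_INTERVAL_MS = 1000 / FPS;  // ~41.7 ms between captures

function startFrameUpload(videoEl) {
  // Assumed server endpoint; the real URL would differ.
  const socket = new WebSocket('wss://example.com/frames');
  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');

  setInterval(() => {
    // Copy the current video frame onto the canvas at native resolution.
    canvas.width = videoEl.videoWidth;
    canvas.height = videoEl.videoHeight;
    ctx.drawImage(videoEl, 0, 0);
    // toBlob handles both compression and encoding (JPEG at assumed quality 0.7).
    canvas.toBlob((blob) => {
      if (socket.readyState === WebSocket.OPEN) socket.send(blob);
    }, 'image/jpeg', 0.7);
  }, FRAME_INTERVAL_MS);
}
```

Sending binary blobs rather than base64 strings keeps each message roughly a third smaller on the wire.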