What is Remote Source Redundancy (RSR)?
Any transcriptionist who has worked from a digital recording will tell you that crosstalk can make it difficult, if not impossible, to discern what is being said. What exactly is a crosstalk scenario in the context of a digital recording? The most common instance is two or more people speaking at the same time. However, it can also include someone coughing at the exact moment a word is spoken, construction noise from next door, a phone alert, even a squeaky chair.
These extraneous noises that occur at the wrong moments, often during important testimony, represent one of the fundamental barriers to adoption of digital reporting and recording. The principles of Auditory Scene Analysis explain why: when two competing speech sound waves overlap, the result becomes incomprehensible to the human ear.
This barrier is especially salient for the adoption of remote depositions. When recording a deposition with everyone in the same room, standard practice is to set up dedicated microphones connected to a multi-channel mixer. Remote deposition configurations have had no comparable way to isolate channels. That is, until today.
Remote source redundancy (RSR) is a tool that isolates the audio of remote participants, enabling the production of a higher quality transcript. How does it work?
- The remote streams are first combined into a composite video, with all feeds in one video file.
- While the composite video is being generated, isolated audio clips are captured and recorded separately from the remote feeds, creating a redundant recording for remote streams.
- Isolated audio clips are transcribed using speech-to-text.
- Text lines and isolated audio clips are synced to the composite video recording.
- Transcriptionists can quickly search and play isolated audio clips during crosstalk scenarios.
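The steps above can be sketched in code. This is a minimal illustration, not RSR's actual implementation: the names `IsolatedClip`, `transcribe`, and `build_synced_index` are hypothetical, and the speech-to-text call is a stub standing in for a real ASR engine.

```python
from dataclasses import dataclass

@dataclass
class IsolatedClip:
    """One redundant audio clip captured from a single remote feed."""
    speaker: str
    start: float   # seconds on the composite-video timeline
    end: float
    text: str = "" # filled in by speech-to-text

def transcribe(clip: IsolatedClip) -> str:
    # Stub: a production system would send the clip's audio to a
    # speech-to-text engine here and return the recognized text.
    return f"[{clip.speaker} {clip.start:.1f}-{clip.end:.1f}s]"

def build_synced_index(clips):
    """Transcribe each isolated clip, then order the lines by their
    composite-video timestamps so text and audio stay in sync."""
    for clip in clips:
        clip.text = transcribe(clip)
    return sorted(clips, key=lambda c: c.start)
```

Because every line carries its own timestamp and source feed, the transcript text, the isolated clips, and the composite video all share one timeline.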
Here’s a clip of it in action:
How do transcriptionists use remote source redundancy (RSR)?
Many court transcriptionists who work with digital playback software use synced notes generated by a digital reporter to find the specific point in the video where transcription should begin. A problem after taking a break is that, unless the exact second was written down, finding the desired point in the video can be time-consuming. With RSR, a speech-to-text rough draft is generated that orients transcriptionists to the point in the proceeding, making it easy to pick up where the work was left off.
The ability to “search the video” for an exact point in a long recording saves time getting started on the day’s project. For example, say the previous day’s work ended during the introduction of Exhibit 6. Simply type “Exhibit 6” and playback will begin at that point in the recording.
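In essence, the search is a lookup over timestamped transcript lines. A minimal sketch, with an invented `seek_to_phrase` helper (not RSR's actual search API) and made-up sample lines:

```python
def seek_to_phrase(synced_lines, phrase):
    """Return the composite-video timestamp (in seconds) of the first
    synced line containing the phrase, or None if it never appears."""
    phrase = phrase.lower()
    for start, text in synced_lines:
        if phrase in text.lower():
            return start
    return None

# Hypothetical synced lines: (timestamp_seconds, transcript_text)
lines = [
    (0.0, "Good morning, counsel."),
    (5400.5, "I'm marking this as Exhibit 6."),
    (5430.0, "Please describe the document."),
]

seek_to_phrase(lines, "Exhibit 6")  # -> 5400.5
```

Playback software would then seek the composite video to the returned timestamp.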
Now that the transcriptionist is oriented in the video, the new day’s work can begin. For most of the testimony, the composite video playback is sufficient to complete the transcript. Then a scenario occurs that throws a monkey wrench into the expected quality standards: the deposing attorney asks a question, and the witness’s response comes at the exact moment opposing counsel objects. Not to worry. The transcriptionist need only find the ScriptSync lines using the search function, then play back the isolated audio clips to discern what each person said. The witness said, “I don’t know,” and their counsel interjected with “objection to form.” The transcript reads as clean verbatim, without an inaudible.
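Because each remote feed carries its own timestamps, crosstalk moments can even be flagged automatically by looking for clips from different speakers whose time ranges overlap. A sketch under assumed data shapes (the `overlapping_clips` helper is illustrative, not part of RSR):

```python
def overlapping_clips(clips):
    """Find pairs of isolated clips from different speakers whose time
    ranges overlap on the composite timeline -- crosstalk candidates
    the transcriptionist should review via isolated playback.
    Each clip is a (speaker, start_seconds, end_seconds) tuple."""
    pairs = []
    ordered = sorted(clips, key=lambda c: c[1])  # sort by start time
    for i, (sa, a0, a1) in enumerate(ordered):
        for sb, b0, b1 in ordered[i + 1:]:
            if b0 >= a1:
                break  # all later clips start after this one ends
            if sb != sa:
                pairs.append(((sa, a0, a1), (sb, b0, b1)))
    return pairs

# The scenario above: the witness answers while counsel objects.
clips = [
    ("Witness", 30.0, 31.5),  # "I don't know"
    ("Counsel", 30.2, 31.0),  # "objection to form"
    ("Witness", 40.0, 42.0),  # a later, non-overlapping answer
]
overlapping_clips(clips)  # -> one Witness/Counsel pair at ~30s
```

Each flagged pair points the transcriptionist straight to the isolated clips that resolve the overlap.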
Currently, this technology is optimized for fully remote depositions conducted with headphones, because computers engage echo cancellation when two people speak at the same time for too long. For short crosstalk interactions, however, RSR works without headphones. In 95% of real-world scenarios, two people are not actively trying to speak over each other, unless the proceedings get particularly heated.
In a few years’ time, this technology may even be possible for same-room depositions. Advanced array microphones, such as those in the Amazon Echo, can already “focus” on active speakers in a room. It may be possible to train such microphones to produce multichannel isolation through a mechanism similar to RSR for remote participants. The giant tech companies are investing heavily in a problem called “speaker diarization”: identifying different speakers in the same room by the characteristics of their voices. Advancements like these will change the game for digital recording solutions and simplify same-room capture configurations.
With remote source redundancy, the sophistication long delivered by multi-channel sound mixers can now be applied to remote depositions. With crosstalk scenarios no longer a concern for legal transcriptionists, the highest quality of verbatim record can be achieved for remotely conducted depositions. Tools like RSR demonstrate how capture software built with transcriptionists in mind overcomes barriers and moves the industry forward with practical solutions.