StenoKnight CART Services: Realtime Captioning for the Deaf and Hard of Hearing
This article was originally written for The ATHEN ListServ in February, 2009, in response to an Assistive Technology director who was considering a voice-recognition captioning service for his Deaf/HoH students. It has been revised and expanded for this page.

Voice Captioning Versus CART

Full disclosure: I'm a stenographic CART provider, so I'm in direct competition with this sort of service. What it sells is a "voice captioning system" consisting of voice recognition software, a computer, and a voicemask microphone, along with the promise that it will take only a few hours of training for an employee of the college to achieve realtime speed and accuracy.

That said, I really recommend you get a live demonstration of this technology before you buy it. I work for a university whose disability accommodations department bought a system very like this one; I don't know whether it was exactly the same. I do know that it was a voice captioning system, and that the disabilities coordinator hired ASL interpreters to train with it and use it. (The student used hearing aids and her first language was English, but she understood ASL, and the school had been offering her interpreters because it had been unable to find a CART provider. She had gone through undergraduate school using remote stenographic CART, but told me that she preferred onsite CART to both remote CART and onsite ASL interpretation.) It was a disaster for the student. The transcripts were apparently unreadable, and the captioning system (which, I got the impression, nearly broke the disability accommodations department's budget for the semester) is currently moldering in a cupboard. I began providing CART there immediately afterwards, and this fall I'll be starting my fourth year with them.

UPDATE, JUNE 2010: I recently asked the school's disability services director her specific reasons for switching from the voice captioning system to CART, and she emailed me the following:

"1) The Captioning training software was difficult to access and use, and the training itself proved difficult and laborious.

2) A prospective captioner would have to put in literally hundreds of hours of training just for the system to accurately recognize their voice.

3) The process proved to be disruptive to other students in the classroom.

4) One captioner felt that the microphone setup was claustrophobic.

5) I was having sign language interpreters trained on the captioning system. At first the interpreters were eager to learn a new technology, but as the training droned on and on, and when they used it in the classroom and it was glaringly apparent that students didn't derive the same level of cogent, expressive interpreting they'd had with interpreters, the interpreters felt they were doing the students a grave injustice. They all hated it and felt either they should be left to do real sign language interpreting or that CART would be a much better alternative.

6) The students felt the resulting transcript contained too many errors and the lag time was too long.

7) The end result is that I have a $5000 Captioning System just sitting in my storage room collecting dust."


-- Mai McDonald, Pratt Institute Disability Services. Posted with permission.

Computerized transcription is a fantastic technology, and it's particularly useful for those who can speak easily but find typing difficult. When someone is composing off the top of their head, they generally tend to speak at a slower rate than their ordinary conversational speed, and if they see that the program has made a mistake, they can easily stop and correct it. It's a very different situation when it comes to realtime transcription of someone else's speech, particularly in a university environment. It is possible to produce an accurate transcript using only a microphone and a computer with sufficiently advanced software and processing power, but it is by no means a simple matter.

A distinction should be made here between speech recognition and voice recognition. Speech recognition is completely automated. A computer picks up natural live speech from any speaker through a microphone and instantly translates it to text without any human intermediary. In practice, speech recognition is unsuited for the task of live transcription. It can be effective in distinguishing among a small number of options or commands (such as the automated voice menus that have replaced "press one for yes or two for no" in some commercial telephone systems), but it is unable to distinguish between speakers or insert punctuation, and its accuracy is wildly variable, ranging from not very good to abysmal.

Voice writing, on the other hand, employs voice recognition, requiring a period of training in which the computer learns the individual patterns of a speaker's voice, while the speaker learns how to make their speech consistent enough to be reliably transcribed by the computer. There are several excellent voicewriters working today who are able to provide verbatim transcription, but they've put in thousands of hours training their voice, their software, and their transcription theory (coming up with different ways to pronounce homophones, for instance, since current technology is very far from solving the "their/they're/there" problem). It's probably more difficult to find a truly verbatim voicewriter than it is to find a truly verbatim stenographic CART provider. They're quite rare, and they charge prices equivalent to those of stenographic CART providers. Unfortunately, a majority of voicewriting services currently serving universities position themselves as economical alternatives to CART, and they generally save that money by hiring voicewriters with insufficient training to deliver an accurate verbatim transcript.

This can seem counterintuitive. Most people, after all, find speaking easier than typing, and so they assume that voice writing should naturally be easier than realtime stenography. The trouble is that, in ordinary conversation, the translation engine is the human brain, which is vastly better than a computer at compensating for inconsistent pronunciation, muffled speech, accents and dialects, homophones, rate variation, and unfamiliar vocabulary. By drawing on context, outside knowledge, speech-reading, and memories of previous conversations, humans are able to compensate for an extraordinary number of gaps and errors which computers simply cannot parse. In order to transcribe speech accurately, a computer needs input that is even, regular, perfectly articulated, cleanly delineated, and without potential duplicates or ambiguities in pronunciation.

I sometimes like to say that voicewriters are to stenotypists as beatboxers are to drummers. Without training, it's easier to say "paradiddle" once than it is to beat one on a drum set, but try saying "paradiddle pataflafla flam dragadiddle ratamacue" ten times fast without tripping over your tongue. It usually takes a drummer only a few years of solid practice to play at a competent level. Good beatboxers, on the other hand, have to be fiendishly talented and practice fiendishly hard, because the human voice is not easily able to imitate the complex percussive rhythms which come naturally to drummers. The human hand is generally a more accurate instrument than the human voice for swift, repetitive motions, which are precisely the sorts of signals a computer needs to receive in order to deliver consistent output. This is why, despite the wide availability of voice recognition software and the relative scarcity and cost of stenographic technology, the majority of realtime court reporters and CART providers working today are stenographers. Realtime verbatim voice writing is tremendously difficult to achieve; consequently, it's rare to find a voice writer capable of providing the level of service that qualified CART providers take for granted.

Putting all that aside, even assuming the voice captioning service provides highly trained, highly accurate voice writers, another big disadvantage of a voicewriter versus a CART provider is that sound inevitably bleeds through the mask they use to cover their computer's microphone. Though the mask muffles the sound to some degree, it can still be distracting in an academic environment. Some services propose to solve this by offering remote captioning. Remote captioning, whether provided by stenographers or voicewriters, is a good solution when no one can be found to provide onsite services, but it has several drawbacks.

For one, remote captioners can't read terminology written on PowerPoint slides or speak up to clarify unclear phrases at the student's request, and they can't voice for students who don't use speech but who want to contribute to class discussions. I'm currently doing some work for a university that hired a remote stenographic CART provider to caption some highly technical classes for a pharmacy student. One of the professors had a thick accent, spoke very quietly, used complex biochemical terminology, and taught in a classroom with thick concrete walls. I read some of the transcripts, and the captioner was clearly quite skillful, but every third word was (inaudible); the Skype reception kept cutting out, the professor's accent made his speech very hard to catch without watching his mouth, and none of the extensive information displayed on the screen or in paper handouts was available for the remote captioner to use. Three weeks into the semester, I was called in to provide CART onsite, and the following semester the student requested me for all of his lecture classes.

Another disadvantage of remote captioning, whether stenographic or voice-based, is that only the person wearing the microphone will be transcribed. This means that questions from students, unless the professor repeats them before answering, will be marked as inaudible, and the student might feel left out of the discussion. I'm certainly not opposed to working remotely on principle. I own remote captioning software, and I'm more than willing to caption remotely in cases where an onsite CART provider is not available; but, by and large, onsite CART is more likely to ensure that realtime output is accurate and complete. Sometimes services claim that remote captioning is preferable for students who might not like their classmates to know that they're receiving realtime captioning, and who might feel awkward with a captioner sitting next to them. In that case, a good solution is to find a captioner or CART provider with two computers that transmit text wirelessly back and forth. In situations where my clients have told me that they'd prefer to sit on their own rather than reading from my laptop, I set up my equipment in the back of the room, give them a computer connected via Bluetooth, and let them sit wherever they like.

I know CART providers are hard to find and sometimes can seem prohibitively expensive, but very often speech-to-text companies claim a lot more than they're able to deliver, and it can severely affect the quality of a student's access. It's worth being a little cautious before laying out a lot of cash on something that might not be as good as it sounds, which is why I advise prospective clients to ask for a demonstration of various technologies before settling on a provider. I'm always happy to provide demonstrations of CART on request, free of charge.