What Is Steno Good For? Part Six: CART, Court, and Captioning

StenoKnight CART Services: Realtime Captioning for the Deaf and Hard of Hearing

This article was originally written for The Plover Blog.

What Is Steno Good For?

Part One: How to Speak With Your Fingers
Part Two: Writing and Coding
Part Three: The Ergonomic Argument
Part Four: Mobile and Wearable Computing
Part Five: Raw Speed

Part Six: CART, Court, and Captioning

Finally, the sixth and last installment of my What Is Steno Good For? series. The first five sections dealt with using steno in daily life, for conversation, prose composition and coding, injury prevention, typing while walking, and inputting text as efficiently as possible. Plover is being developed primarily with those five spheres in mind.

This section is different. It focuses on people who actually want to make a living as court reporters, CART providers, or captioners. It's also the category that the majority of the Plover Project's current testers, readers, and commenters belong to. In order for Plover to succeed, that proportion needs to change.

Steno as a career is skyrocketing. Official reporters (the ones who work in actual courtrooms) are facing layoffs, but in every other field -- deposition work, captioning, and CART -- there's far more demand than supply. Rates are relatively high (though down considerably from their peak in the '90s, and gradually continuing to decline) and work is plentiful. Certified realtime stenographers can make six figures a year, while setting their own schedules and maintaining autonomy as independent contractors. It's pretty much a dream job.

Steno as an academic-vocational discipline is dying. Steno schools continue to shut down across the country. The national dropout rate is 85%. Student machines cost over $1,000, and DRM-riddled student software runs about $500, so without even considering tuition, students are forced to pay a largely non-refundable $1,500 right out of the gate. Considering the 15% graduation rate and the variable length of study (which ranges from 1 to 6 years, but averages around 4 years of intensive daily practice to reach graduation speeds of 225 WPM), steno school is a fool's gamble for the vast majority of new students. Most schools are for-profit, so it's in their interest to accept large numbers of theory students, selling them their steno machines when the semester starts and buying them back at a steep markdown from the dropouts, who tend to leave around 120 WPM, just in time for the next crop of theory students to arrive. There's no incentive for schools to screen for English aptitude, physical dexterity, or self-discipline, because the students that are all but doomed to fail are potentially even more lucrative than the successful ones, due to the revolving steno machine sale-and-buyback scheme. This means plenty of profit in the short term, but in the long term it spells the death not only of these short-sighted schools, but of the steno professions themselves.

A market in which demand exceeds supply will hold out only so long. Eventually the vacuum caused by the shortage of stenographers will collapse, and inferior but readily available substitutes such as electronic recording, undertrained voice writers, and non-verbatim notetaking systems will move in to claim the territory. Compounding the problem is that many people think that the career is less than a decade away from obsolescence; 30 years of Star Trek has put the idea into their heads that artificial intelligence is a nut we're close to cracking, and that a computer that can understand and transcribe everything we say to it is just around the corner. I've got lots and lots to say on this one, but let me just lay out the short and sweet version, and you can either take my word for it now or wait for the long argument to come later. (You might also want to read this article for some of the technical details.)

Without true artificial intelligence, there is no reliable speech recognition. Current speech recognition software works relatively well with good audio, clear speakers, and a somewhat restricted vocabulary. Dictation at 160 WPM or less can give good results, especially if the speaker puts in the effort to train themselves and their software, and provided that they have the luxury of stopping the dictation to correct any errors made by the software before continuing on. In real-life situations, where the speaker being transcribed can't be induced to slow down, correct errors, or enunciate perfectly in American-accented English -- even with an intermediary "respeaker" repeating the dictation directly into a microphone, inserting punctuation, and correcting errors on the fly -- the software's verbatim realtime accuracy is significantly below that of a trained stenographer. The only respeakers who even approach the accuracy of realtime steno are true voice writers, who spend thousands of hours training their voices, figuring out ways to differentiate the pronunciation of homophones, and creating macros to resolve mistranscriptions. It is not easy to do. I compare true voice writing to beatboxing and steno to playing a drum set in my article Voice Captioning Versus CART. You can read it if you're interested in that sort of thing.

The trouble is that everyone keeps saying "Voice recognition software is constantly improving. It gets better with every new release. Soon it'll be perfect." The first two statements are correct. The third is a fallacy. The software is improving, but asymptotically. Its theoretical ceiling of improvement is far below what's required for consistent, reliable transcription. Speech recognition software doesn't parse language the way humans do. It has no ability to use context or meaning to change sounds into words. It records audio waveforms, breaks them up into little bits, and compares them to a database of other audio waveforms. It never finds a perfect match, because no two humans say the same word in exactly the same way each time. Instead, it tries to choose the closest match in its database of thousands of other tiny fragments of audio. All speech recognition software relies on probability-based algorithms to guess at what's being said. This means that the more common the phrase, the more variants of it will be found in the database, and the more likely it will be to be correctly transcribed.

But the converse is also true. In the architecture class I provide CART for, the phrase "sum of the forces" comes up several dozen times a week. But because the phrase "some of the" is so much more common in normal speech than "sum of the", the VR software would mistranscribe it unless the voice writer figured out a way to say "sum" that sounded completely different from the word "some" and defined it as a custom waveform. There are scads of these soundalike words and phrases in the English language, and the voice writer is at a disadvantage when trying to distinguish them. The steno writer has a number of options to resolve homophone conflicts or to compress a wordy phrase into a single stroke. They can add the asterisk, they can alter the vowels, or they can take a cue from the way the word is spelled. It's much harder for a voice writer to find an alternative way to pronounce a word or syllable, because not only must they pronounce it consistently so that the computer can recognize it each time, but it also can't sound like any other words or syllables that they might be called upon to speak. It's much easier to write a memorable nonsense syllable on the steno keyboard than it ever would be to speak it.
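Here's a toy sketch of that frequency bias, in Python. It is not how any particular speech recognition product works: the scores and frequencies are invented for illustration, and `best_guess` is a made-up name. All it shows is that when two candidates sound identical, a guesser that weighs how common each phrase is will pick the everyday one.

```python
# Toy model only: real recognizers are far more elaborate, and every number
# below is invented for illustration.

# Hypothetical acoustic scores: how well each candidate matches the audio.
# "sum" and "some" sound the same, so they score identically.
ACOUSTIC_SCORE = {
    "sum of the": 0.50,
    "some of the": 0.50,
}

# Hypothetical prior: how often each phrase turns up in everyday speech.
PHRASE_FREQUENCY = {
    "sum of the": 0.001,
    "some of the": 0.050,
}


def best_guess(candidates):
    """Return the candidate with the highest acoustic score times frequency."""
    return max(candidates, key=lambda p: ACOUSTIC_SCORE[p] * PHRASE_FREQUENCY[p])


guess = best_guess(["sum of the", "some of the"])
print(guess)  # some of the
```

With the audio evidence tied, the frequency term decides everything, which is why the voice writer has to change the way the word sounds to get a different result.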

There's also the inherent uncertainty involved in decoding analog speech with a digital algorithm. Even with good amplification, the signal is always lossy to some extent, and the speech processing algorithms are essentially a black box that weighs relative probabilities and then just spits out the most likely one, without being able to incorporate any semantic or contextual calculations. The voice writer is never quite sure what the machine is going to make out of what they said, and no matter how cleanly they speak, they're forced to build a lot more error correction time into their transcription process. Steno writers can write a word in half a second that took the speaker three seconds to say, and they know with certainty what will come up on the screen when they hit a particular chord. That's an advantage a voice writer will never have. Add in that a voice writer has to speak at the same time that they're trying to listen, and you see some of the difficulties they labor under.
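To make the contrast concrete, here's a minimal sketch of the steno side, again in Python. It assumes a steno dictionary can be modeled as a plain lookup table from chords to text. The stroke strings and the `translate` helper are placeholders I've made up for illustration, not entries from any real dictionary.

```python
# Illustrative only: these stroke strings are placeholders, not entries
# copied from any real steno dictionary.
STENO_DICTIONARY = {
    "SOPL": "some",
    "SUPL": "sum",  # the vowel is altered to separate the homophones
    "SUFPS": "sum of the forces",  # hypothetical one-stroke brief
}


def translate(strokes):
    """Look up each stroke; the same stroke always yields the same text."""
    return " ".join(STENO_DICTIONARY[s] for s in strokes)


print(translate(["SUFPS"]))         # sum of the forces
print(translate(["SOPL", "SUPL"]))  # some sum
```

There's no probability in the lookup at all. A given chord maps to exactly one piece of text, so the writer knows the output before it appears.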

There are some excellent voice writers out there, and I don't want to devalue their talent or the enormous amount of training that goes into the process of achieving accurate verbatim realtime using VR software. On the contrary; I think if people realized how much work it takes to do the job properly with the voice, they might balk a lot less at the idea of learning to do it with their fingers. Unfortunately, the shortage of CART providers, captioners, and court reporters has led to a widespread practice of companies hiring untrained voice writers, deciding that their output is good enough, and dropping both standards and wages accordingly. It's a sad situation.

Because voice recognition is perceived to be so much easier than it really is, and because learning it only requires about $200, a microphone, and a computer, it's much easier to find people willing to give it a chance. After all, if it doesn't live up to their expectations, they're only out $200, rather than the $1,500 albatross steno school dropouts find themselves trying to unload. Imagine if computer programming required a special computer that couldn't connect to the internet or run games or do anything else except write computer software, and that it sold for $1,500. What do you think the state of software development would look like? Maybe some rich kids' parents would buy them the machine, but they'd probably prefer that their kids become doctors or lawyers rather than programmers, since programming is a lot of work for not much prestige. Poor kids would be completely out of luck. Middle class kids might think that programming sounded fun, but they'd probably decide it wasn't worth the restrictive entry cost. Some few people might decide that programming was their best shot at making a good living, so they'd scrimp and save and take out loans to buy the special programming computer plus the lessons to go with it. And after all that, what if they didn't like programming? What if they didn't have an aptitude for it? They'd be out $1,500 and a lot of wasted effort. What kind of smart, inquisitive, curious kid would make that kind of gamble? What would the field of computer programming look like if this were the only way to write software?

It's the state of steno today, and I'm worried that if it goes on for much longer, the discipline will die out altogether. The only way we can build the next generation of realtime reporters, captioners, and CART providers is if we get people using steno for all sorts of purposes -- not just the ones that will make them an immediate profit. Once there's a pool of amateurs and enthusiasts all using steno in their daily lives, it will be evident how useful it can be and how outdated the qwerty interface has become. Kids will start learning it in their typing classes. Companies will start selling steno machines (hopefully ultra-portable ones!) at consumer prices. People who would feel awkward talking to themselves in public via VR software will embrace steno as the most efficient way to put their thoughts into words.

All of this holds true even if they're only writing at 120 words per minute. It took me a year and a half to graduate from steno school. In that time, I noticed that most of my fellow students dropped out when they were writing between 120 and 225 words per minute. Relatively few of them dropped out before their third semester. They would make fairly steady progress through theory and up to 120 WPM, then plateau. It seems that nearly anyone can get up to 100 WPM or so in less than six months, but closing the gap between 100 and 200 takes much more work. You don't need to write at 225 WPM to reap the advantages of steno. Even 120 WPM is double the average qwerty typing speed, and steno has significant ergonomic benefits as well. Users can overtake their qwerty speed within the first few months of use, then gradually work their way up to higher speeds while using steno to perform their daily tasks, rather than spending 10 hours a week in grueling, boring dictation classes.

Inevitably, some of these people will find they have both a passion and a talent for steno. They'll push themselves to go faster and faster, and eventually they'll arrive at court/CART/captioning speeds. Much like programmers do today, they'll start out tinkering around with the free software, discover a passion and an aptitude for the system, possibly spend some time in a formal program polishing their technique, and discover one day that they're skilled enough to take paying work. These people are the future of our profession, and right now they hardly know it exists. The only way people will bother to learn steno is if the software is free, the steno machine costs less than $100, and the lessons are available online. The Plover Project is an attempt to meet those goals, and to secure the future of the work that I love.
