What Is Steno Good For?
Part One: How to Speak With Your Fingers
Part Two: Writing and Coding
Part Three: The Ergonomic Argument
Part Four: Mobile and Wearable Computing
Part Five: Raw Speed
Part Six: CART, Court, and Captioning
Finally, the sixth and last installment of my What Is Steno Good For?
series. The first five sections dealt with using steno in daily life,
for conversation, prose composition and coding, injury prevention,
typing while walking, and inputting text as efficiently as possible.
Plover is being developed primarily with those five spheres in mind.
This section is different. It focuses on people who actually want to
make a living as court reporters, CART providers, or captioners.
It's also the category that the majority of the Plover Project's
current testers, readers, and commenters belong to. In order for
Plover to succeed, that proportion needs to change.
Steno as a career is skyrocketing. Official reporters (the ones who
work in actual courtrooms) are facing layoffs, but in every other
field -- deposition work, captioning, and CART -- there's far more
demand than supply. Rates are relatively high (though down
considerably from their peak in the '90s, and gradually continuing to
decline) and work is plentiful. Certified realtime stenographers can
make six figures a year, while setting their own schedules and
maintaining autonomy as independent contractors. It's pretty much a
dream job.
Steno as an academic-vocational discipline is dying. Steno schools
continue to shut down across the country. The national dropout rate is
85%. Student machines cost over $1,000, and DRM-riddled student
software runs about $500, so without even considering tuition,
students are forced to pay a largely non-refundable $1,500 right out
of the gate. Considering the 15% graduation rate and the variable
length of study (which ranges from 1 to 6 years, but averages around 4
years of intensive daily practice to reach graduation speeds of 225
WPM), steno school is a fool's gamble for the vast majority of new
students. Most schools are for-profit, so it's in their interest to
accept large numbers of theory students, selling them their steno
machines when the semester starts and buying them back at a steep
markdown from the dropouts, who tend to leave around 120 WPM, just in
time for the next crop of theory students to arrive. There's no
incentive for schools to screen for English aptitude, physical
dexterity, or self-discipline, because the students that are all but
doomed to fail are potentially even more lucrative than the successful
ones, due to the revolving steno machine sale-and-buyback scheme. This
means plenty of profit in the short term, but in the long term it
spells the death not only of these short-sighted schools, but of the
steno professions themselves.
A market in which demand exceeds supply will hold out only so long.
Eventually the vacuum caused by the shortage of stenographers will
collapse, and inferior but readily available substitutes such as
electronic recording, undertrained voice writers, and non-verbatim
notetaking systems will move in to claim the territory. Compounding
the problem is that many people think that the career is less than a
decade away from obsolescence; 30 years of Star Trek has put the idea
into their heads that artificial intelligence is a nut we're close to
cracking, and that a computer that can understand and transcribe
everything we say to it is just around the corner. I've got lots and
lots to say on this one, but let me just lay out the short and sweet
version, and you can either take my word for it now or wait for the
long argument to come later. (You might also want to read this
article for some of the technical details.)
Without true artificial intelligence, there is no reliable speech
recognition. Current speech recognition software works relatively well
with good audio, clear speakers, and a somewhat restricted
vocabulary. Dictation at 160 WPM or less can give good results,
especially if the speaker puts in the effort to train themselves and
their software, and providing that they have the luxury to stop the
dictation and correct any errors made by the software before
continuing on. In real-life situations, where the speaker being
transcribed can't be induced to slow down, correct errors, or
enunciate perfectly in American-accented English -- even with an
intermediary "respeaker" repeating the dictation directly into a
microphone, inserting punctuation, and correcting errors on the fly --
the software's verbatim realtime accuracy is significantly below that
of a trained stenographer. The only respeakers that even approach the
accuracy of realtime steno are true voice writers, who spend thousands
of hours training their voices, figuring out ways to differentiate the
pronunciation of homophones, and creating macros to resolve
mistranscriptions. It is not easy to do. I compare true voice writing
to beatboxing and steno to playing a drumset in my article Voice
Captioning Versus CART. You can read it if you're interested in that
sort of thing.
The trouble is that everyone keeps saying "Voice recognition software
is constantly improving. It gets better with every new release. Soon
it'll be perfect." The first two statements are correct. The third is
a fallacy. The software is improving, but asymptotically. Its
theoretical ceiling of improvement is far below what's required for
consistent, reliable transcription. Speech recognition software
doesn't parse language the way humans do. It has no ability to use
context or meaning to change sounds into words. It records audio
waveforms, breaks them up into little bits, and compares them to a
database of
other audio waveforms. It never finds a perfect match, because no two
humans say the same word in exactly the same way each time. Instead,
it tries to choose the closest match in its database of thousands of
other tiny fragments of audio. All speech recognition software relies
on probability-based algorithms to guess at what's being said. This
means that the more common the phrase, the more variants of it will be
found in the database, and the more likely it will be to be correctly
transcribed.
But the converse is also true. In the architecture class I provide
CART for, the phrase "sum of the forces" comes up several dozen times
a week. But because the phrase "some of the" is so much more common in
normal speech than "sum of the", the VR software would mistranscribe
it unless the voice writer figured out a way to say "sum" that sounded
completely different from the word "some" and defined it as a custom
waveform. There are scads of these soundalike words and phrases in the
English language, and the voice writer is at a disadvantage when
trying to distinguish them. The steno writer has a number of options
to resolve homophone conflicts or to compress a wordy phrase into a
single stroke. They can add the asterisk, they can alter the vowels,
or they can take a cue from the way the word is spelled. It's much
harder for a voice writer to find an alternative way to pronounce a
word or syllable, because not only must they pronounce it consistently
so that the computer can recognize it each time, but it also can't
sound like any other words or syllables that they might be called upon
to speak. It's much easier to write a memorable nonsense syllable on
the steno keyboard than it ever would be to speak it.
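To make the contrast concrete, here's a toy sketch in Python. It is not how any real speech engine or steno program is written. The phrase counts are invented, the function names are hypothetical, and the outlines SOPL and SUPL are only there to illustrate "altering the vowels"; they may not match any particular steno theory or dictionary.

```python
# Toy illustration only: every number and outline here is an assumption.

# Invented phrase counts standing in for a recognizer's usage statistics.
PHRASE_COUNTS = {
    "some of the": 9500,
    "sum of the": 40,
}


def guess_by_frequency(candidates):
    """Pick the soundalike candidate with the highest count, ignoring context."""
    return max(candidates, key=lambda phrase: PHRASE_COUNTS.get(phrase, 0))


# Hypothetical steno dictionary: each outline maps to exactly one word.
STENO_DICT = {
    "SOPL": "some",
    "SUPL": "sum",
}


def translate_stroke(outline):
    """Deterministic lookup: the same chord always yields the same word."""
    return STENO_DICT[outline]


if __name__ == "__main__":
    # Both candidates sound identical, so the more common phrase wins.
    print(guess_by_frequency(["sum of the", "some of the"]))
    # The steno writer chooses the outline, so there is nothing to guess.
    print(translate_stroke("SUPL"))
```

The frequency-based guess lands on "some of the" no matter what the lecture is about, while the steno lookup returns whatever the writer's chord is defined to mean.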
There's also the inherent uncertainty involved in decoding analog
speech with a digital algorithm. Even with good amplification, the
signal is always lossy to some extent, and the speech processing
algorithms are essentially a black box that weighs relative
probabilities and then just spits out the most likely one, without
being able to incorporate any semantic or contextual calculations. The
voice writer is never quite sure what the machine is going to make out
of what they said, and no matter how cleanly they speak, they're
forced to build a lot more error correction time into their
transcription process. Steno writers can write a word in half a second
that took the speaker three seconds to say, and they know with
certainty what will come up on the screen when they hit a particular
chord. That's an advantage a voice writer will never have. Add in that
a voice writer has to speak at the same time that they're trying to
listen, and you see some of the difficulties they labor under.
There are some excellent voice writers out there, and I don't want to
devalue their talent or the enormous amount of training that goes into
the process of achieving accurate verbatim realtime using VR software.
On the contrary; I think if people realized how much work it takes to
do the job properly with the voice, they might balk a lot less at the
idea of learning to do it with their fingers. Unfortunately, the
shortage of CART providers, captioners, and court reporters has led
to a widespread practice of companies hiring untrained voice writers,
deciding that their output is good enough, and dropping both standards
and wages accordingly. It's a sad situation.
Because voice recognition is perceived to be so much easier than it
really is, and because learning it only requires about $200, a
microphone, and a computer, it's much easier to find people willing to
give it a chance. After all, if it doesn't live up to their
expectations, they're only out $200, rather than the $1,500 albatross
steno school dropouts find themselves trying to unload. Imagine if
computer programming required a special computer that couldn't connect
to the internet or run games or do anything else except write computer
software, and that it sold for $1,500. What do you think the state of
software development would look like? Maybe some rich kids' parents
would buy them the machine, but they'd probably prefer that their kids
become doctors or lawyers rather than programmers, since programming
is a lot of work for not much prestige. Poor kids would be completely
out of luck. Middle
class kids might think that programming sounded fun, but they'd
probably decide it wasn't worth the restrictive entry cost. Some few
people might decide that programming was their best shot at making a
good living, so they'd scrimp and save and take out loans to buy the
special programming computer plus the lessons to go with it. And after
all that, what if they didn't like programming? What if they didn't
have an aptitude for it? They'd be out $1,500 and a lot of wasted
effort. What kind of smart, inquisitive, curious kid would make that
kind of gamble? What would the field of computer programming look like
if this were the only way to write software?
It's the state of steno today, and I'm worried that if it goes on for
much longer, the discipline will die out altogether. The only way we
can build the next generation of realtime reporters, captioners, and
CART providers is if we get people using steno for all sorts of
purposes -- not just the ones that will make them an immediate profit.
Once there's a pool of amateurs and enthusiasts all using steno in
their daily lives, it will be evident how useful it can be and how
outdated the qwerty interface has become. Kids will start learning it
in their typing classes. Companies will start selling steno machines
(hopefully ultra-portable ones!) at consumer prices. People who would
feel awkward
talking to themselves in public via VR software will embrace steno as
the most efficient way to put their thoughts into words.
All of this holds true even if they're only writing at 120 words per
minute. It took me a year and a half to graduate from steno school. In
that time, I noticed that most of my fellow students dropped out when
they were writing between 120 and 225 words per minute. Relatively few
of them dropped out before their third semester. They would make
fairly steady progress through theory and up to 120 WPM, then plateau.
It seems that nearly anyone can get up to 100 WPM or so in less than
six months, but that closing the gap between 100 and 200 takes
much more work. You don't need to write at 225 WPM to reap the
advantages of steno. Even 120 WPM is double the average qwerty typing
speed, and steno has significant ergonomic benefits as well. Users can
overtake their qwerty speed within the first few months of use, then
gradually work their way up to higher speeds while using steno to
perform their daily tasks, rather than spending 10 hours a week in
grueling, boring dictation classes.
Inevitably, some of these people will find they have both a passion
and a talent for steno. They'll push themselves to go faster and
faster, and eventually they'll arrive at court/CART/captioning speeds.
Much like programmers do today, they'll start out tinkering around
with the free software, discover a passion and an aptitude for the
system, possibly spend some time in a formal program polishing their
technique, and discover one day that they're skilled enough to take
paying work. These people are the future of our profession, and right
now they hardly know it exists. The only way people will bother to
learn steno is if the software is free, the steno machine costs less
than $100, and the lessons are available online. The Plover Project is
an attempt to meet those goals, and to secure the future of the work
that I love.