Speak To Me, Janet
In an era when responsiveness to market needs defines one’s competitiveness, what’s a handy tool for slashing turnaround time? How about computers that respond to voice command?
November 1, 1989 by Chief Executive
Like the now-ubiquitous facsimile machine, voice recognition is a technology that’s been long and slow in coming. At the peak of her career as a diva, Beverly Sills used to say she spent 20 years preparing for her “overnight success.” During the long gestation period, people tend to dismiss early devices as interesting proof-of-concept machines, but little more. Only after tinkering with them do applications drive demand for their use. At the vanguard of speech recognition (SR) are Janet and Jim Baker, president and chairman, respectively, of Dragon Systems, a small
The husband and wife team are not the typical “garage start-up” entrepreneurs. (“It’s too cold to start anything in a garage in
The Bakers pursued their ideas at IBM for five-and-a-half years and worked with some of its top people, like Cornell’s Fred Jelinek. But disagreements arose. The Bakers were operationally oriented and wanted to get a workable machine into the hands of users even if it wasn’t absolutely perfect. In the late 1970s, they left to join Exxon Enterprises’ Verbex division, the oil giant’s ill-fated bid to jump into office automation. Some machines were produced but at unaffordable price tags of $30,000 to $100,000. In 1982, Verbex effectively killed long-term projects such as those to which Janet and Jim were committed. They quit.
One year later, with no job, no marketing plans and no venture capital behind them, they went out on their own. With little more than a handful of Apple II PCs, they produced a 30,000-word dictation system that recognized 15 to 60 words per minute. They called their operation Dragon Systems owing to their mutual fondness for Oriental art and because, in the East, the dragon is a symbol of life, vitality and auspicious beginnings. (The mythical beast depicted in their corporate logo is a five-clawed imperial dragon.)
Both IBM and AT&T have had working prototypes of computers that respond instantly to voice command. The Voice Works Speech Processor devised by Kurzweil Applied Intelligence has been $15 million and almost 15 years in the making. Each can do impressive things, but most depend on massive processing power and are often speaker-dependent. Dragon Systems, on the other hand, avoids the traditional artificial intelligence (AI) techniques in favor of rudimentary statistics to determine the probability of word sequences given the context and acoustic inputs. What’s more, it learns as you use it. “It’s easy to use and understand,” says technology writer Mickey Williamson of one of the early Dragon Dictate units. “When I programmed a board with a male speaker it was almost speaker independent then.”
With no sales staff, the Bakers initially licensed the technology to original equipment manufacturers (OEMs) such as Apricot and IBM. Later the company produced its own units, selling in small volume to distributors and resellers. All growth has been self-funded. Janet Baker doesn’t think voice recognition will necessarily replace keyboard input, but for tasks that require someone’s hands and eyes to be doing other things, such as inspection or materials handling, a computer that can accurately document and type what one says is an ideal time saver. (Even CEOs who may not like fussing with keyboards may find it to their advantage.)
Beyond inspection and inventory control, some of the far-reaching applications will be tailored to customer service. Thomas J. Martin, vice president of Arthur D. Little’s AI applications center, in
The technology isn’t quite there yet. Dragon’s Voicescribe and Dictate depend on discretely spoken speech. The goal of processing continuous speech, long held to be the Holy Grail of AI, is at least three years away, thinks Baker. Others say it’s more like a decade away. Rensselaer Polytechnic Institute president and former GE R&D chief Roland Schmitt thinks SR technology has reached its first major hurdle and may ultimately become the facsimile of the 1990s once the details are worked out. “Over the next 10 years, speech recognition will dominate keyboard and mouse input,” says Arthur Andersen & Co. partner Joe Carter, who heads its Center for Applied AI.
Editor J.P. Donlon recently spoke to Janet Baker at the firm’s
What are some of the current applications of speech recognition in business that you see as the bellwether for the future?
The applications fall into three primary areas. The first is what I call command control; that is, you can activate your telephone, you can turn things on or off. The second is data entry and retrieval, which is probably the single largest area of voice recognition being used in the business community. It even includes accessing Lotus
The third important area, one where we think most of the action is likely to be down the road, is in report generation and document creation.
We see more reporting happening now than ever before. We also have a situation where there are growing labor shortages, especially among the secretarial and clerical levels. Whether people like it or not, they have to become more self-sufficient.
Do you envision CEOs ever using this technology in a significant way?
Sure. The needs are greatest, though, among middle management. It is the middle-management ranks who are sharing one secretary among 10 or 15 people. It is the CEO, however, who makes the decision about how the company can make the best use of the available resources. One of the principal advantages of speech recognition is that it helps reduce turnaround time.
What about tasks? Are we at the point where we could do what the computer in the film 2001 does?
The computer in 2001 also had a personality. That would not only incorporate speech recognition but also what is called natural language understanding. We have to go beyond that, to having these personality or behavioral characteristics built into the program.
Who is using this technology?
Another example is at companies where a person is physically manipulating an object and needs to enter data at the same time. We have found this situation in installations where there are toxic materials being handled, and where the person’s hands may not be free for typing or recording information.
Can you cite the experience of a company that gained in terms of productivity or in reduced costs from the use of speech recognition?
Certainly. Xerox Corporation. In a nationwide inventory program using our equipment, they had their people counting inventory by voice. They had a tremendous return on savings. They went from being able to do 10 to 15 percent of a sample audit in six months, to being able to do a 100 percent inventory audit (2.2 million parts in up to 15,000 locations) in two months. They did almost all of it by voice.
How long would it have taken if they had done it the old way?
They couldn’t have done the job the old way.
Have users, yours or anyone else’s, come up with applications or benefits that you had not thought of?
Yes. We have seen some spectacular applications come about simply because our customers had a new tool placed in their hands that they then started to use in a unique fashion. For example, McCormack & Dodge, who supply financial software that runs primarily on mainframes, had speech recognition installed in their demonstration sites throughout the world. It allows their salespeople to demonstrate their software packages much more effectively.
Being able to control both the packages and the data entry itself by voice allows salespeople to concentrate on the job they do best, which is sales. And it makes for a very good presentation to the customer.
What does it cost to get a typical customer started?
That depends on what he wants to do. You can get somebody started for about $1,500 for an add-on to an IBM PC.
One of our customers, Articulate Systems, is selling systems for the Macintosh: a 200-word system and a 1,000-word system.
If you want to gain access to the new voice typewriter we call Dragon Dictate (which will accept 30,000 words of vocabulary, and has a version of the Random House dictionary on line, all completely accessible by voice), it is a $9,000 add-on to a 386-based MS-DOS PC. The total price, including a PC and the Dragon Dictate package, is $15,000 to $18,000 per unit. You can store different people’s patterns on a hard disk so that it can be a shared facility.
Do you license to IBM and DEC?
We have licensed to IBM for their small vocabulary speech recognition products. We also have big licensees, such as Apricot Computers, who are not well known in the
Why is the issue of licensing so sensitive?
Some of our customers are in product development for several years and they don’t want their competition to find out about it until the product itself is announced. We have a number of large corporations as customers that see it as a disadvantage if their competition finds out how well they are doing by using speech, and they don’t want to accelerate that any sooner than necessary.
Not every technology finds a market. For example, Chrysler came out with a talking dashboard for the automobile. It was a big dud. No one wanted a voice talking to him in his car.
There are many people in the speech community, including us, who predicted that that would fail.
That is not the only experiment of that kind. For example, talking cash registers were tried out in supermarkets. There is nothing that will drive people crazier faster than having a machine read to them every single grocery item that passes in front of it. The point is that the technology has to serve a purpose. It has to meet demanding criteria in order to be perceived as having value; that value can’t be just glamour.
About 90 percent of all business communications are handled verbally. Virtually every market survey conducted in recent years has pointed to a multibillion-dollar market within a decade for speech recognition by itself (not including the voice response or voice message or other technologies).
Doesn’t the response time depend a great deal upon the individual teaching the machine? What about regional speech?
Dragon Systems technology, as evidenced in our Dragon Dictate program, is unique because it adapts to the user as the user starts speaking to the system. We do have smaller systems where one can provide samples by speaking all the words in the vocabulary in any language. For the 30,000-word recognizer, though, you don’t have to do any training.
The system has built-in standard pronunciations, if you will, for all the words in its vocabulary. You may say those words very differently, so the first time you say that word the system may or may not recognize it. But whether it does or not, there’s a very easy correction mode. So the system in some regard has a learning capability.
Have you been told you are missing opportunities?
We get at least one call a day from somebody who is interested in investing in the company. I think some people have misunderstood our reasons for not allowing outside investment in Dragon. They think, “Oh these people, they don’t want anybody to have a piece of their pie.” That is not true. An investment in our company, we feel, has to make sense. The investor must be concerned with addressing the marketplace today, and advancing the technology.
We think much of the investment today in R&D in the
IBM is the only computer company in this country that has made significant investments in speech technology.
Have they made any offers or expressed interest in being a shareholder?
Unfortunately I can’t disclose any business relationships that aren’t public.
You’re small now. What happens when the market gets a little bigger? You’ll have to do more with more players in the game.
We are going to grow. We will insulate the technology area from the other areas. By licensing our technology we have a much smaller proportion of the products that are sold. On the other hand, we haven’t had to assume the marketing and distribution costs, which are very high.
Over the last five years, we have done internal business plans and looked at a variety of different scenarios. Had we decided to go out and do that kind of marketing, we would have had to raise several million dollars, minimum, immediately. We strongly considered doing that, but felt it was not the best choice at the time.
Over the course of time, the market will change and we will be quite flexible. One of our principles that we use in our research environment is that if somebody can show us a better way of doing something, we will do it.
What percent of your business comes from licensing versus your direct selling of your products?
It’s about 50/50. About 30 percent of our licensees are foreign, accounting for about one quarter of our sales. We have a good international representation.
Internally, licensing is done at the managerial level. Other than that, we have three people who are directly involved in the sales and support levels. Most of our sales are done through our licensees and our value-added resellers and distributors. We’re not trying to go directly to the end users.
Won’t that have to change at some point?
Yes. We intend to put the systems into place and support the policies that are going to allow us to do that with as little disruption as possible. There are businesses that experience a tremendous boom growth and just as rapid bust. We would be willing to forgo some of the boom to avoid the bust.
Don’t you worry that someone will come along and pinch your technology?
In a high-technology area all you ever have is lead time. Anybody who ever thinks of anything else is simply wrong. They are fooling themselves; the turnaround time is the only advantage. We have a great deal of momentum and a long string of patents.
I’m not sure. We have significant patent protection. The only other
There have got to be plenty of market plays for many players, and we will be a significant player as long as we manage our business. Even if IBM comes in and takes 30 to 70 percent of the marketplace, there will still be the other 30 to 70 percent left and we expect to be a significant player. Right now we have the only commercial voice typewriter capability in the world. It has a number of unique features and characteristics, such as the automatic learning capability.
IBM doesn’t have it?
Nobody else has ever done this.
Isn’t having IBM as a competitor unnerving?
Certainly IBM is a company that we respect most highly. We are not concerned about any of their commercial systems that are in the market at present. Customers who do any comparison at all usually buy our system.
How much lead time do you have at present?
At least five to seven years, and I am being conservative. Speech recognition is not one of those situations where you dump a lot of money into it and you get a result. You have to have a lot of operational experience and that doesn’t come overnight. We introduce one or two products a year, significant products.
Will you be able to continue to do that?
Yes. It all depends on what the magnitude is. We spent five years putting out the Dragon Dictate product. We have a number of capabilities in the pipeline and we are expanding those. We plow most of our revenues into advancing our technology and we think that is the right thing to do.
If you could get one idea or notion about this technology or your company across to CEOs, what would it be?
I would encourage those individuals to start thinking about how they can use speech effectively in better pursuing their goals in the marketplace.
There are some who are already very seriously working with speech. And there are a lot who aren’t. They are waiting for their competitors to do it. They are going to lose that two- to three-year lead time.
People often think about new technology in a very restrictive sense. One of the primary reasons that xerography had such a hard time getting off the ground was that people were comparing it to carbon paper.
Is SR, in effect, a time-based competitive tool?
Yes. It allows you to save time and be less dependent on other people.
Today people spend their time worrying about having someone else prepare or misprepare their documents. Speech is a mechanism for getting text and data entered very rapidly under your own control as accurately as possible.