Speech recognition programs
As information systems spread and their use deepens, users need maximum convenience when working with a computer in dialogue mode. The drive to improve the communication interface simplifies the dialogue between the user and the computer. In recent years, leading software manufacturers have paid close attention to developing user-friendly interfaces. Multi-window systems with visual controls built on GUI (Graphical User Interface) principles have become the standard. Managing an information system no longer requires hunting for the right key on the keyboard. Everything is done visually: users see the results of their actions on the monitor and can turn at any moment to the help system, which has become an integral component of any information system.
An overview of speech recognition programs:
Four leading products were tested – Dragon NaturallySpeaking Preferred, IBM ViaVoice 98 Executive Edition, Lernout & Hauspie Voice Xpress Professional and Philips FreeSpeech 98.
All packages were installed on a Pentium II-266 computer with 128 MB of RAM and a Sound Blaster sound card. With each package, text was dictated in the main window (the usual version of Windows WordPad), as well as in a word processor, a spreadsheet and a mail program. The test documents included a business letter with many bold and centered paragraphs, unusual words, URLs and tables, along with simple emails and poems.
In general, Dragon NaturallySpeaking Preferred was the least error-prone and the easiest to use, although working in spreadsheets and controlling on-screen navigation with it was quite inconvenient. However, each of the packages considered has both strengths and weaknesses. Here is how they compare on the basic parameters.
Initial training. The documentation for every package states that it is enough to install the program and read a few sentences aloud, after which you can start dictating in a clear voice. That may be so, but the results are much better if you spend 10 to 50 minutes teaching the program the features of your voice. The procedure consists of reading a series of test passages; it is tedious, but it is performed only once. Overall, Dragon's training turned out to be the least burdensome.
Reading the training passages comes first. Next, you need to teach the recognition systems the words that are missing from their built-in dictionaries (which range from 30 to 64 thousand words). An additional dictionary has to be created in all of the packages considered, but Dragon NaturallySpeaking was better at working out how to spell a new or unusual word. In general, the more you train a program and work with it, the more accurately it understands you.
Dictation. All four packages are designed primarily for entering an unformatted stream of text into a document. Dragon copes best with this task. It was the only one of the four to achieve the claimed recognition accuracy of 95%. But 95% accuracy still means that one word in 20 is misrecognized, and correction takes time. In IBM ViaVoice 98, Philips FreeSpeech 98 and L&H Voice Xpress Pro, the recognition accuracy was around 90%.
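To put these percentages in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names and the 500-word document length are assumptions made for the example, not figures from the tests.

```python
def words_per_error(accuracy_percent: int) -> float:
    """Average number of dictated words per misrecognized word."""
    if not 0 <= accuracy_percent < 100:
        raise ValueError("accuracy_percent must be in the range 0..99")
    return 100 / (100 - accuracy_percent)


def expected_errors(word_count: int, accuracy_percent: int) -> float:
    """Expected number of misrecognized words in a document."""
    if not 0 <= accuracy_percent <= 100:
        raise ValueError("accuracy_percent must be in the range 0..100")
    return word_count * (100 - accuracy_percent) / 100


if __name__ == "__main__":
    # 500 words is a hypothetical document length chosen for illustration.
    for pct in (95, 90):
        print(
            f"{pct}% accuracy: one error every {words_per_error(pct):.0f} words, "
            f"about {expected_errors(500, pct):.0f} corrections per 500 words"
        )
```

On these assumptions, the five-point gap between 95% and 90% doubles the number of corrections the user has to make.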
Support for other programs. All four packages provide dictation directly in the window of any program that works with text, including Microsoft Word, Excel and such popular mail clients as Outlook Express and Netscape Messenger. On relatively old computers, however, spoken words may be processed with a delay.
Command and control. You can not only dictate to programs but also give them instructions (open this or that file, print this or that page) and, in addition, control navigation on the Desktop. These functions worked in all four programs, but not always. Some commands, such as “click File” or “click Save” in Word, have to be repeated several times before the computer executes them, especially when working with Philips FreeSpeech 98. IBM ViaVoice 98 performed best.
Sound equipment. All programs assume the use of certain sound cards, which do include such popular brands as Sound Blaster. Three of the four packages (the exception being Philips) come with a standard headset microphone. For better results, though, it makes sense to buy a higher-quality noise-canceling microphone.
Dragon NaturallySpeaking Preferred.
Advantages: the highest accuracy of recognition, ease of use.
Disadvantages: inconvenient input of numbers, mediocre control of the screen.
The only program that comes close to living up to its advertising promises, the $160 Dragon NaturallySpeaking Preferred handles text input well and lets you switch easily between dictation, correction and formatting.
Dragon's package far surpassed the others in recognizing the text of the business letter, recording such difficult proper names as O’Keeffe, Bernardo and Peterborough with surprising accuracy. Overall, it came very close to achieving the declared recognition accuracy of 95%.
When Dragon does make a mistake, you can enter correction mode by simply saying “delete that” or “scratch that” and then repeating the phrase correctly. Formatting text is also very easy: you select the text and say something like “set font Arial 24”, “center that” or “bold that”. But the set of commands for navigation and correction in Dragon, as in the other three packages, is complex. Dictated text proved much faster and easier to correct with a mouse and keyboard.
You can work with Dragon in two ways. First, the package lets you dictate in its own text window (the resulting document is then inserted into the desired program). Second, it contains built-in utilities that provide input directly into the window of a word processor, mail program or spreadsheet. Which is easier depends on your personal preference and the program you use. Transferring dictated text to a mail program window is not at all difficult, while numbers are more convenient to dictate directly into spreadsheet cells.
The NaturalWord module for dictation in the Microsoft Word 97 window (it does not work with earlier versions) is very similar to the main Dragon speech input screen. It also provides access to Word's menu commands, but several attempts are needed before Word executes a command, and even then the keyboard and mouse were faster. The company warns that on computers with a relatively slow processor recognition may occur with pauses, but on the Pentium II-266 this never happened.
The NaturalText utility provides dictation in the environment of almost any program for Windows 95/98. After its installation, text and microphone icons appear in the system tray on the Taskbar. To start working with NaturalText, just click on the microphone icon – and you can speak.
IBM ViaVoice 98 Executive Edition.
Advantages: good recognition of simple words, improved on-screen navigation and design.
Disadvantages: poor recognition of proper names and abbreviations, slow operation in some programs.
The package does well on the plain text sections of the test letter, but stumbles on some proper names and abbreviations. For example, it wrote the surname Bernardo as “Bernad O”, the town name Westwood as “West would it”, and Peterborough as “Peter burrow”. This significantly lowered the final recognition accuracy. In my experience, the more context was provided for a word, the more likely ViaVoice was to recognize it correctly.
Like Dragon, IBM provides easy switching between dictation, correction and command entry modes. Just say what you are about to do and the package will usually understand you correctly. Occasionally during testing we encountered minor problems when trying to get ViaVoice 98 to accept a navigation command such as “move up four lines”. This works best if you give commands in a short, monotonous manner; at times there is an unpleasant feeling that it is not you training the program but the program training you.
ViaVoice can be used directly within applications such as Word, Excel and Internet Explorer Mail. Dictation in Word takes place almost without delay, but in other programs you have to wait a little while the dictated text is processed. Working with numbers, however, is more natural in ViaVoice 98 than in Dragon: to write the amount $23,432 in an Excel cell, you say “twenty-three thousand four hundred thirty-two dollars”.
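The review does not describe how any of the packages turns a spoken amount into digits. Purely as an illustration, one simple way to map English number words to an integer can be sketched in Python; every name below (`spoken_to_int`, `spoken_dollars_to_text`, the word tables) is hypothetical and is not taken from ViaVoice or any other product.

```python
# Hypothetical word tables for this sketch; they cover values below one billion.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def spoken_to_int(phrase: str) -> int:
    """Convert English number words to an integer."""
    total = 0    # sum of completed groups (thousands, millions)
    current = 0  # value of the group currently being built
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue  # "four hundred and thirty-two"
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognized number word: {word!r}")
    return total + current


def spoken_dollars_to_text(phrase: str) -> str:
    """Turn a phrase ending in 'dollars' into text such as '$23,432'."""
    words = phrase.split()
    if words and words[-1].lower() == "dollars":
        words.pop()
    return f"${spoken_to_int(' '.join(words)):,}"


if __name__ == "__main__":
    print(spoken_dollars_to_text(
        "twenty-three thousand four hundred thirty-two dollars"))
```

A real recognizer would also have to cope with homophones and misheard words, which this sketch ignores by rejecting anything outside its word tables.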
Another advantage of ViaVoice 98 is its excellently organized control of the Windows Desktop. To start Excel, it is enough to say “open Excel”; to expand a menu item, it is enough to name it. You can select buttons by saying the words written on them (such as OK or Cancel). If the program fails to recognize commands, it offers training on them, although this is rarely needed.
L&H Voice Xpress Professional.
Advantages: simple and quick correction of incorrectly perceived words, excellent recognition of numbers.
Disadvantages: fussiness about how commands are pronounced, uneven recognition quality.
At first glance, Lernout & Hauspie’s Voice Xpress Professional looks very similar to NaturallySpeaking Preferred. But, although this $150 package has certain advantages – good number recognition, close integration with Office 97 components – it is inferior to the Dragon program when it comes to recognizing words and commands.
The training procedure in Voice Xpress Pro is the longest of all. It takes more than 50 minutes, during which you have to read 230 screens of text: lists of commands, exercises in dictation by letter, and excerpts from a book about Antarctica with such hard-to-pronounce passages as “vulpine Russian glaciologist”.
The program mostly coped with the business letter, but from time to time it slipped up. “Westwood Park” turned into “west with a park”, “June twenty-second” into “June twenty seconds”, and “quarter” into “water”. Articles and short function words (such as a, the, that) were also a real problem. The recognition defects may be related to the small size of the main dictionary, 30,000 words, which is about half as many as in the other packages.
Voice Xpress Pro also had difficulty recognizing some navigation and control commands, such as “go to the end of the document”. The number of errors can be reduced by speaking slowly and unnaturally clearly. Running the words together gave an even better result: you had to say something like “downtwoparagraphs” (down two paragraphs), pausing before and after the command.
As for the positive aspects of Voice Xpress Pro, the package has the best error correction system of all those considered. You say “correct that” and a list of options appears on the screen. If a suitable option is in the list, you say “take” (accept) and the number of the correct word, after which it is inserted into the document. Another advantage of the package is good integration with Microsoft Office 97 components: work with them takes place without any delays. In addition, Voice Xpress Pro is brilliant at entering numbers: dictating them in Excel feels quite natural. Still, it is significantly inferior to Dragon in recognition quality and to IBM ViaVoice in ease of control.
Philips FreeSpeech 98.
Advantages: availability of a free trial version, low price.
Disadvantages: mediocre recognition quality during dictation, some commands are not recognized, lack of a microphone.
Of the four programs we reviewed, Philips FreeSpeech 98 has the most accurate name: it can be tried completely free of charge (one of the meanings of the English word “free” is “at no cost”) by downloading a 30 MB setup file from the …com site. Use of the package after the seven-day trial period costs $39.
Despite the very low price, the FreeSpeech 98 package is functionally complete. It supports a standard WordPad-type dictation window, dictation in any Windows program that accepts text input, and control of navigation through menus and windows on the Desktop. Unlike the other packages, FreeSpeech requires manual switching between dictation, control, dictation by letter, and sleep mode. In theory the corresponding commands can be given by voice, but they worked so rarely that it is better to use the mouse.
The initial recognition quality of FreeSpeech 98 did not make a very favorable impression. In the test business letter, it recognized basic vocabulary perfectly, but any unusual word threw it off course. The surname O’Keeffe was written as “both keys”, and Bernardo as “burn our goal”. Numbers, as in Dragon NaturallySpeaking, are recognized only if you dictate them one digit at a time.
The navigation and formatting commands in FreeSpeech 98 are very similar to those available in the other packages (in fact, some of the commands are identical). But FreeSpeech often ignored my commands to select text or move the cursor with surprising stubbornness.