Softnet Systems, Inc. Speech Recognition Specialists


Converting Large Numbers of Documents for Vocabulary Builder

The NaturallySpeaking Vocabulary Builder accepts only a few word processing file formats. Many people have older documents in other formats that would be valuable input to the Vocabulary Builder. If you have only a few such documents, most word processors can convert them to MS-DOS text format.

It is often useful to combine many small documents into a single document for building a vocabulary. One way to do this is at an MS-DOS prompt, using a command such as:

copy *.txt largedoc.tex

This copies all of your ".txt" files at once into the single file largedoc.tex. Giving the output a different extension keeps it from being matched by *.txt itself. Then rename largedoc.tex to largedoc.txt so Vocabulary Builder can process it. If you have a very large amount of text, break it into smaller chunks by combining the files in groups: "copy a*.txt alarge.tex", then "copy b*.txt blarge.tex", and so on.
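For those who prefer a script to the MS-DOS prompt, the same combining step can be sketched in Python. This is only a minimal illustration, not a tool we supply: the function name combine_text_files and the folder and file names in the usage comment are hypothetical, and the sketch assumes the files are UTF-8 (or plain ASCII) text, substituting a placeholder for any bytes it cannot decode.

```python
from pathlib import Path


def combine_text_files(source_dir, output_path, pattern="*.txt"):
    """Concatenate every file matching `pattern` in `source_dir` into `output_path`.

    Returns the number of files that were combined.
    """
    source_dir = Path(source_dir)
    output_path = Path(output_path)
    count = 0
    with output_path.open("w", encoding="utf-8") as out:
        # Sort so the files are combined in a predictable (alphabetical) order.
        for path in sorted(source_dir.glob(pattern)):
            # Skip the output file in case it lives in the same folder
            # and also matches the pattern.
            if path.resolve() == output_path.resolve():
                continue
            # Assumption: input is UTF-8/ASCII; undecodable bytes are replaced.
            text = path.read_text(encoding="utf-8", errors="replace")
            out.write(text)
            # Make sure one document does not run into the next.
            if not text.endswith("\n"):
                out.write("\n")
            count += 1
    return count


# Hypothetical usage:
# combine_text_files("C:/olddocs", "C:/olddocs/largedoc.txt")
```

Because the script skips its own output file, it can write largedoc.txt directly, without the intermediate ".tex" name and the rename. To break a large collection into chunks, pass a narrower pattern such as "a*.txt".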

If you wish to build a custom vocabulary but are not comfortable converting a large set of files, you can e-mail us the files and we will return the documents converted to text for approximately $1/MB (less for higher volumes). E-mail us for details. Alternatively, we may be able to do this remotely, without ever moving the documents from your systems.

Many document sets have standard headers and footers. We have programs available that will strip this information (e.g. delete the first 8 lines from each file) so that it is not analyzed by the Vocabulary Builder. This is particularly appropriate when the documents are relatively short and the header/footer information makes up more than 5% of the text. The same techniques can help delete patient/client names from these text files.
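The idea of stripping a fixed number of header and footer lines can be illustrated with a short Python sketch. This is not the program we use, only an example under simple assumptions: the header and footer occupy the same number of lines in every file, and the function name strip_header_footer and its parameters are hypothetical.

```python
def strip_header_footer(text, header_lines=8, footer_lines=0):
    """Return `text` without its first `header_lines` and last `footer_lines` lines.

    If the text is shorter than the header and footer combined,
    an empty string is returned.
    """
    # keepends=True preserves each line's original line ending.
    lines = text.splitlines(keepends=True)
    end = len(lines) - footer_lines
    # max() guards against a negative or too-small slice end on short files.
    return "".join(lines[header_lines:max(end, header_lines)])
```

Applied to each file before the files are combined, this keeps the repeated header/footer text out of the material the Vocabulary Builder analyzes. Removing names that appear in the body of a document would need a different approach, such as searching for known names.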

By creating larger files containing extensive samples of your vocabulary, you can make edits to better reflect your own vocabulary needs. For instance, “2011” may be underrepresented in your past vocabulary. So we might replace all instances of “2007” with “2011” to better reflect your current vocabulary. We'd probably also replace most instances of “2008” with “2012” or something similar so your vocabulary would be good for next year too.
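A year substitution of this kind might be sketched in Python as follows. The function name remap_years is hypothetical, and unlike the "most instances" approach described above, this simple version replaces every whole-number match. All replacements are made in a single pass, so a year that has already been replaced is never replaced a second time.

```python
import re


def remap_years(text, mapping):
    """Replace each whole-token key of `mapping` found in `text` with its value."""
    if not mapping:
        return text
    # \b word boundaries ensure "2007" is not matched inside "20071".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(old) for old in mapping) + r")\b"
    )
    # Look up the replacement for whichever key matched.
    return pattern.sub(lambda match: mapping[match.group(1)], text)


# Example: update older years to more current ones.
updated = remap_years(
    "Seen in 2007 and again in 2008.",
    {"2007": "2011", "2008": "2012"},
)
```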

If these are medical documents, we would first execute a Business Associate Agreement to meet confidentiality requirements.

