Transcribing Interviews or Lectures

The following mainly covers technical issues. The license is vague about using the product for interviews; one reading of the language suggests that each individual in an interview would need a license. The bottom line is that the software is NOT designed for transcribing interviews.

Multiple Subjects

A common request is to transcribe interviews involving several people. Unfortunately, in 95% of these cases NaturallySpeaking, or any other commercially available software, will fail to provide usable results, for reasons such as these:

  • The people are generally talking, not dictating
  • No punctuation is spoken to help guide NaturallySpeaking
  • None of the speakers has gone through enrollment
  • Each person has a different vocabulary, so the vocabulary-building tools are of little help
  • Often these are recorded with inexpensive recorders (or expensive recorders coupled with poor microphones) so that the audio quality is poor
  • Most often the people involved are not handling the microphone(s) consistently so that the audio volume changes throughout the interview
  • External noise is often an issue in these recordings
  • Humans can sometimes follow several people speaking at once, but as of 2008 this is beyond the state of the art for computers
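
Uneven volume is one of the few problems in the list above that can be checked before spending time on a recording. The following is a minimal sketch, not part of NaturallySpeaking, that measures how much the level varies across a 16-bit PCM WAV file. The function names, the one-second window, and the use of a decibel spread are all assumptions made for illustration; no threshold is implied for what counts as "consistent enough".

```python
import math
import struct
import wave


def window_rms_levels(path, window_seconds=1.0):
    """Return the RMS level of each window in a 16-bit PCM WAV file.

    Hypothetical helper: samples from all channels are pooled together.
    """
    levels = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        frames_per_window = max(1, int(wav.getframerate() * window_seconds))
        while True:
            raw = wav.readframes(frames_per_window)
            count = len(raw) // 2  # two bytes per 16-bit sample
            if count == 0:
                break
            samples = struct.unpack("<%dh" % count, raw[: count * 2])
            mean_square = sum(s * s for s in samples) / count
            levels.append(math.sqrt(mean_square))
    return levels


def level_spread_db(levels, floor=1.0):
    """Return the loudest-to-quietest window ratio in decibels.

    Windows quieter than `floor` are clamped to it so that digital
    silence does not produce a division by zero.
    """
    if not levels:
        raise ValueError("no audio windows to compare")
    loudest = max(max(levels), floor)
    quietest = max(min(levels), floor)
    return 20.0 * math.log10(loudest / quietest)
```

A large spread suggests that the speakers moved relative to the microphone or that the recorder's gain changed during the session. Pauses between sentences will also register as quiet windows, so the figure is only a rough indicator.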


Single Subject

A less common case is having hours of interviews or recordings of the same person. Here there is a slight chance of good results.

Factors for Success

We know little about these successes because they don't occur often, but the following factors appear to matter:

  • Understanding audio systems and getting excellent recordings
  • Understanding Dragon and building a suitable vocabulary
  • Excellent-quality raw audio
    • In one case, studio-quality recordings on tape worked OK
    • In another case, digital recordings made on an upper-end ($400+) digital recorder worked fine
    • A stationary speaker is likely to have more success than one who roams the floor
  • An excellent speaker: one success story was with a preacher, another with a professional speaker
  • Almost all of the audio is a single voice. There may be a second voice, but it accounts for a very small amount of the audio.


Interview transcription can normally be handled better by having someone listen to the recording and redictate it. Some people can listen and talk at the same time; others find this difficult. Some learn to do it fast enough that they can redictate a one-hour interview to Dragon in approximately an hour. Others find they have to alternate (listen, speak, listen, speak), so that transcribing a one-hour interview takes at least two hours and typically three. This can be done with any version of Dragon NaturallySpeaking or Nuance Dragon Medical Practice Edition.
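
For planning purposes, the redictation times above can be turned into a rough estimate. This is a minimal sketch that simply scales the figures quoted in the paragraph (about 1x for simultaneous listening and speaking, 2x to 3x for alternating) and assumes they scale linearly with recording length. The function name and the style labels are hypothetical.

```python
def redictation_hours(interview_hours, style="simultaneous"):
    """Estimate (minimum, typical) hours needed to redictate a recording.

    The multipliers come straight from the figures quoted above; treating
    them as linear in the recording length is an assumption.
    """
    multipliers = {
        "simultaneous": (1.0, 1.0),  # listens and speaks at the same time
        "alternating": (2.0, 3.0),   # listen, speak, listen, speak
    }
    if style not in multipliers:
        raise ValueError("style must be 'simultaneous' or 'alternating'")
    minimum, typical = multipliers[style]
    return interview_hours * minimum, interview_hours * typical


# Example: a four-hour set of interviews, redictated by alternating.
minimum, typical = redictation_hours(4, "alternating")
print("Allow between %.0f and %.0f hours" % (minimum, typical))
```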


