Transcribing Audio to Text
Introduction
Good transcription requires knowing which non-speech audio information to include. It’s more art than science: it’s not always clear which sounds to include or how to communicate them in text.
If you have the resources to hire professionals to do your transcribing, that is best. If you don’t, please don’t be deterred from providing transcripts or captions. This page helps you do it yourself (DIY).
In some cases, text is already available in a script. You’ll probably need to make some minor edits so it matches the final audio content.
How to Transcribe
You can just listen to the audio and type it up. That’s usually pretty tedious because you have to stop and restart the audio a lot. There is software that can help by slowing down the audio and providing easy pause buttons.
You can start with an automatically generated text file. Many software tools and services provide speech-to-text, with varying levels of accuracy. Often the text does not match the spoken audio, sometimes in ways that change the meaning (or are embarrassing). For example, missing just one word such as “not” can make the captions contradict the actual audio content.
Plan to spend time correcting automatically generated transcription.
More details on options and tools for transcribing are in: Transcripts on the Web, How to get or make transcripts.
A little about captioning tools is in the Captions/Subtitles page of this resource: Captioning Tools.
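One way to review corrections to automatically generated text is to compare the automatic version with your corrected version word by word. The following is a minimal sketch using Python's standard `difflib` module; the function name `changed_words` and the sample sentences are made up for illustration and are not part of any captioning tool.

```python
import difflib


def changed_words(auto_text: str, corrected_text: str) -> list[tuple[str, str, str]]:
    """Return (operation, automatic words, corrected words) for each word-level difference."""
    auto_words = auto_text.split()
    corrected_words = corrected_text.split()
    matcher = difflib.SequenceMatcher(a=auto_words, b=corrected_words, autojunk=False)
    changes = []
    for op, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if op != "equal":  # keep only insertions, deletions, and replacements
            changes.append(
                (
                    op,
                    " ".join(auto_words[a_start:a_end]),
                    " ".join(corrected_words[b_start:b_end]),
                )
            )
    return changes


# Hypothetical example: the automatic text dropped the word "not".
auto = "This feature is supported in older browsers"
corrected = "This feature is not supported in older browsers"
for change in changed_words(auto, corrected):
    print(change)
```

A list like this only shows where two versions differ. You still need to listen to the audio to decide which version is right.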
What to Transcribe
Generally, you transcribe all speech and relevant non-speech sound (such as: baby cries, fireworks going off, horse hoofs approaching). Keep in mind that the main purpose is to provide the information that you hear to people who cannot hear the audio. That will help you know which sounds to transcribe, and which are not needed. The following are common practices, not requirements.
Basics
- Identify the speakers as relevant. Often it is best to use the full name the first time and a single name after that (either first/given or last/family, depending on the formality).
- You can include relevant information about the speech. For example:
( between gritted teeth ):
I hate this computer!
- Put non-speech sounds in parentheses, lowercase, italics, with a space before and after. For example:
( computer crashing into bits and parts sliding across the floor )
- When a speaker is off-screen, you can put their speech in italics. For example:
Jose: What was that awful noise?
Zoe: You don’t want to know.
Jose: Well, I’m coming to find out.
- Only include background music if it’s important to understand the content of the video. Use objective descriptions that indicate the mood; avoid subjective words, such as “beautiful.” If the words in the music are important, add a musical note to the beginning and end of each caption. Put music information in italics. For example:
♪ scary music, JAWS theme ♪
- Do not emphasize a word using all capital letters, except to indicate yelling. For example:
Jose: YOU KILLED MY NEW LAPTOP!
Transcribe Accurately and Honestly
- Do not change or adapt or add to the text. Transcribe what is said accurately.
- For example, it is usually not appropriate to correct grammar or other mistakes.
- Do not censor. For example, if objectionable words are said, include those in the captions. If the audio is edited to obscure a phrase (e.g., “bleeped” audio), reflect that in the captions, e.g., –bleep–
- Do not provide additional clarifying information in the captions. (You can provide some in the transcript as appropriate.)
- Include the appropriate level of detail:
- For some content, such as legal depositions, transcribe everything verbatim, including things like “um”, “ah”, and repeated phrases.
- For most web content, it is acceptable to leave out non-substantive text to make the captions easier to process — while adhering to the tips above. For example, if the speaker says:
I just got so frustrated (cough, cough) sorry – uhhh what was I saying?…, oh yea - I got so frustrated with my computer.
You can caption:
I just got so frustrated with my computer.
- If there is speech that is not at all relevant, indicate that it has been excluded from the captions. For example:
[participants discuss the weather while the presenter reboots his computer]
- If you cannot understand what is said, transcribe:
[unintelligible]
More on Captions
For captions:
- Captions are one or two lines. Generally it is best to keep them under 32 characters per line.
- Put a new sentence on a new line.
- If you need to break a sentence into multiple segments, break it at a logical phrase.
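The line-length and line-count guidance above can be roughed out in code. This is a minimal sketch using Python's standard `textwrap` module; the function name `split_into_captions` and the two constants are assumptions made for illustration. It treats the limit as at most 32 characters per line and breaks only at spaces, so it does not know where the logical phrases are.

```python
import textwrap

MAX_CHARS_PER_LINE = 32    # guideline from the list above, treated here as an upper bound
MAX_LINES_PER_CAPTION = 2  # captions are one or two lines


def split_into_captions(sentence: str) -> list[list[str]]:
    """Wrap one sentence into lines, then group the lines into captions of up to two lines."""
    lines = textwrap.wrap(
        sentence,
        width=MAX_CHARS_PER_LINE,
        break_long_words=False,  # never split a word; a very long word may exceed the limit
        break_on_hyphens=False,  # break only at whitespace
    )
    return [
        lines[i:i + MAX_LINES_PER_CAPTION]
        for i in range(0, len(lines), MAX_LINES_PER_CAPTION)
    ]


for caption in split_into_captions("I just got so frustrated with my computer."):
    print("\n".join(caption))
    print()
```

For this sentence the automatic break falls between “my” and “computer.”, which is not a logical phrase boundary. Treat the output as a first draft and move the breaks by hand.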
Captions also include the time that each phrase will be displayed. Most people use tools to develop and refine captions.
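To illustrate how display times are attached to caption text, here is a minimal sketch that writes cues in WebVTT, one common caption file format. Choosing WebVTT is an assumption for this example, since the text above does not name a format. The function names and the timings are made up; real timings come from the audio.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, ms = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_cue(start: float, end: float, lines: list[str]) -> str:
    """Build one cue: a timing line followed by the caption lines."""
    timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return timing + "\n" + "\n".join(lines)


def build_webvtt(cues: list[tuple[float, float, list[str]]]) -> str:
    """Join cues into a WebVTT file body; cues are separated by blank lines."""
    return "WEBVTT\n\n" + "\n\n".join(format_cue(*cue) for cue in cues) + "\n"


# Hypothetical timings for the example dialogue used earlier on this page.
cues = [
    (0.0, 2.0, ["Jose: What was that awful noise?"]),
    (2.0, 4.5, ["Zoe: You don’t want to know."]),
]
print(build_webvtt(cues))
```

In practice a captioning tool usually handles this step, but seeing the file format can make it easier to check or hand-edit the tool's output.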
Learn more about captions in another page of this resource: Captions/Subtitles.
More on Transcripts
Learn more about transcripts in another page of this resource: Transcripts.