Thursday, May 10, 2018

Mail bag: On transcribing storytelling

I'm deep in some intense (and fascinating) reading as I work on the last essays in the "Store Bought Stories" book. But the blog is hungry, so I went looking in my mail bag again. Here's an anonymized (names and details changed) excerpt from some emails I sent recently to somebody who was working on transcribing storytelling in some interviews. (I wrote about the topic of transcription in WWS as well, on page 183.) Thank you, correspondent, for asking such useful questions!

Sentences in speech and print

When you are listening to a person talk, you can hear their sentences in the tones and pauses they use as they speak. However, in a transcript, the same string of words that makes sense with the audio information in it can look like nonsense without it.

The truth is, most people do talk in complete sentences -- except for reframings, where they stop and start again with a new phrasing -- but you need to listen to how each person communicates the pauses (commas) and endings (periods) of their thoughts. For example, I could hear the endings in Melanie's sentences, but only after listening for a while to her style of talking. Her pauses and sentence endings are brief, but they are there, and you can help people "hear" them by rendering them in the transcribed text. It may seem like you are "improving" Melanie's words when you add capitalization and punctuation to them, but you want the people who read her stories to be able to make sense of them, and for that they need complete sentences.

Reframing

When people change direction in the middle of a sentence -- as they often do -- it's best to show this in your transcript with dashes, because that's how reframings are expressed in print (as I just did). If the transcript shows a break and restart every time a person reframes a thought, readers will understand what the speaker meant. Without such markings, readers tend to spend a lot of time puzzling out what the speaker meant by strings of words that don't make sense together.

Also, when a person leaves out a word that is obvious when you hear the audio but can't be guessed at without it, it's fine to add that word in [square brackets] as a clarifying comment. For example, in the first sentence of your transcript, you and I both know that Melanie is referring to a question she was asked before the recording started. But the person reading the story won't understand that. So it's okay to add in that context to help people understand what she meant.

I try to leave it in when people reframe the way they are telling a story, because it says something about the story. However, if somebody repeats a word just because they said the wrong word and then corrected themselves, you don't need to keep that in.

For example, at one point Melanie said,
"we can’t offer a whole lot of support for one side of the purchaser of the purchase in advance"
As text, that makes no sense at all. But listening to the audio, I can hear this:
"We can't offer a whole lot of support for one side of the purchaser -- [backtracking, correction] of the purchase -- in advance"
Though in that case I would just remove the backtracking part (because she didn't mean to say "purchaser") and make it:
"We can't offer a whole lot of support for one side of the purchase in advance"
My rule is, if repetitions and reframings add to the understanding of the story, keep them in. If they don't add anything, leave them out. For example, when Melanie said "even the short term income, outcome was not his favorite" -- you could just render this as "even the short term outcome was not his favorite," because her saying "income" was just a slip of the tongue.

Filler words

With "ums" and "ahs" ... when there are few of them I keep them all, but when there are a lot of them I tend to remove them all. Some people say only a few, but some people say so many of them that the transcript gets hard to read. In this case, if you transcribed each um or ah Melanie said during those few minutes, you would have dozens of them. I'd rather remove them all than make arbitrary decisions about which ones to keep. The same thing goes for when people simply say the same word or phrase over again, like when she said "of the, of the" (meaning nothing but "I haven't yet thought of what to say next").

Sometimes I will take out a repetitive thing somebody said, then read the story and see if it's better (easier to understand, more authentic, more personable) with or without it. There's a balance: if you remove all of the hesitations and repetitions it doesn't sound like a person talking, but if you leave in too many it's hard to find the story in the mess. The goal of transcribing stories is not to create a perfect record (as in a court case) but to get across what the storyteller meant by the story -- and what the story meant to them.

Listen until it sounds the same

What I usually do when transcribing stories is, I listen to one sentence at a time, then I pause the audio and type that sentence. Then, when I get to the point where it sounds like a paragraph break should go (because that thought has been completed), I run back the recording and listen to the audio again while reading what I've typed. Wherever what I've written doesn't capture what they are saying, I fix it. I do this until I'm sure that what they said -- and what they seemed to mean by what they said -- is captured enough that I can take the audio away and it still "sounds" the same. Sometimes I will listen to the whole story again after I've finished transcribing, because I understand things about the story once I've heard the whole thing that I didn't understand after the first few sentences.

That sounds like a lot of hassle, but a little attention to this up front makes a big difference in getting stories that people can read, understand, and use later on.

Transcription and social signals

When I make a transcription of a person telling a story, I have three goals in mind. My test is that any random person reading the transcription should be able to tell me:
  1. what happened in the story
  2. how the storyteller felt about what happened in the story
  3. how they themselves feel about 1 and 2
Getting down every word the storyteller said can reduce the transcription's ability to meet the first goal. Without auditory information about tone and pitch and volume, people have to do a lot of work to make sense of text that doesn't hold together into complete sentences with coherent thoughts. Dashes to indicate quick turns in the meaning of the text can help, but sometimes verbatim text is such a word salad that you just have to remove some words.

On the other hand, getting down every word the storyteller said improves the transcript's ability to support goals 2 and 3. That's because in a spoken conversation, hesitations, filler words (like "like" and "you know"), and repetitions serve a purpose. When people feel uncertain about something they're saying, they hesitate and add filler words and repeat themselves. When they feel confident about something, or want to indicate that something is important to them, they speak more coherently and completely. People know this and pick up on these differences and use them to figure out people's motivations. That's why it's important to include some of the hesitations and repetitions we hear when we transcribe stories, because we're trying to preserve the social signals in the way the story was told.

So it's a balance. I base my decisions on what to keep in and leave out on what I can hear of intent in the speaker's voice. Sometimes you can tell that somebody is repeating themselves just because their thoughts are colliding with each other. Like that time when Melanie accidentally said "pace" instead of "place," and she corrected herself. That sort of editing-while-talking I tend to leave out, because it's not socially meaningful.

But when repetitions and hesitations seem to have social meaning, I leave that stuff in. An example is when Melanie said,
"So he understood, even though the short term outcome was not his favorite, he really felt -- and I was surprised and really happy with that -- really felt that the experience had been well worthwhile."
In that case she put a sentence right into the middle of another sentence. The meaning of her interjection was clear. It was: "This is a thing I need to tell you, and it's so important that I will interject it into the middle of another sentence to make sure you know it." That's the kind of social signal I try to leave in.

And of course there are borderline cases. When Melanie said,
"the folks at the -- what’s the purchasing part? I forgot the -- [Interviewer: Procurement Department] -- Procurement Department, yeah."
I was thinking of taking that reframing out, because it was just Melanie forgetting a term. But then I was like, well, maybe her forgetting that term would be important socially to the person reading the transcription. So I left it in.

I don't think any two people could possibly write the same transcription from the same speech recording, but as long as they are both keeping the goals of the transcription in mind, both versions should be useful. As long as people can make sense of what they read and make some pretty good guesses as to what the storyteller meant and how they themselves feel, it's a good transcription.