NarraCat: Tools for Narrative Catalysis

NarraCat is a set of Python scripts I developed for in-house use, to do narrative catalysis for clients. I use it to look for (and at) patterns in collected stories and in answers to questions about them. It could be used for any sort of mixed-methods research, really, but I use it mainly for narrative work.

I have released NarraCat under the Affero General Public License.

{For those who want to get right to it: Google Code repository here.}

What NarraCat is for

NarraCat helps you look at stories and answers to questions about them using quantitative and qualitative methods. It's a power tool for power producers of narrative catalysis.
That sounds exciting, doesn't it? But beware, this cat scratches. NarraCat is really just a heap of Python scripts that run on top of some excellent open-source libraries for data visualization and statistical analysis, namely SciPy, matplotlib and Graphviz. So it is not a professional, polished software package. It's a tool I built myself to use myself. You can use it too, but only if you treat it with patience and care. It's a bit of a scaredy-cat.

These are some images generated by NarraCat.

Here is a bar graph of answer counts. Perhaps this many people said their story was about dogs, and this many said it was about pizza. (Click on any of these images to see them larger. Note that I have removed all the labels in them to ensure client confidentiality. Normally you will see lots of labels on graphs!)
Here is a summary of answer contingency counts. Perhaps this many people said their story was about dogs and pizza, or about dogs but not pizza, or pizza but not dogs, and so on.
Here is a histogram of scalar values. Perhaps people rated their story's memorability thus. Below the graph are several descriptive statistics. I like to juxtapose scales (rate this from 1 to 100) and choices (happy, sad, angry?) mainly because some questions work better as scales and some work better as choices. And it makes a more diverse range of pretty pictures to spark discussion.
Here is a summary of many t-tests between means of scalar distributions. Labels on this diagram would be answers to choice questions (like dog versus cat stories), and the sizes of the dots represent the degree of difference between means of some scalar question (like memorability).
Here is a correlation matrix showing which scales are correlated with others. Row and column labels would be scalar questions. I usually generate one "grand" matrix showing all the overall trends, and then a lot of subset matrices by which I can compare, say, what portraits of correlation arise when you look only at dog owners or cat owners.
Here is a scatter graph of two scalar distributions. Perhaps these might be ratings of memorability versus ratings of positive outcomes. The giant number is the correlation value, and you will see later on why it is so huge (but it can be removed when desired).
Here is a 3D landscape showing two scalar distributions against a vertical "perceived stability" question. Perhaps in the blue corner where stories were rated highly positive and highly memorable, they were also considered to describe stable events. (By the way, these graphs are very similar to what you can see in the matplotlib gallery, and in fact every time I look there it inspires me to try another even cooler visualization.)
Here is one of several "slice" views of trends across important categories. The columns represent slices of the data. Perhaps it is how dog and cat owners differed in the patterns they showed across the questions. I find that often people have one over-arching comparison of perspectives and experiences they want to make - say it's managers versus employees, staff versus customers, older versus younger, and so on. The ability to "slice and dice" the data, sometimes in more than one way at once, is quite useful.
Here is a network diagram showing relationships between things. Perhaps it is how dog stories were linked to cat stories through shared descriptors. I used Graphviz to create it. (I could not remove the labels for confidentiality, so it's tiny instead.)
Here is one of the results of a cluster analysis, showing three groups of stories that formed based on patterns across several scalar questions. Perhaps it shows over-arching "dog person" versus "cat person" perspectives as evidenced in the answers. I don't believe cluster analysis on narrative interpretations provides definitive classifications of stories or people; but it can provide useful general insights.
Here are some ternary data (three relative scales) plotted against one more scale (as circle size). Perhaps people rated their story for its relative connection to dogs, cats and pizza, and also described how much they liked it.
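
To make the first two pictures a little more concrete, here is a minimal sketch of how answer counts and contingency counts could be tallied in plain Python. It is not taken from NarraCat's source; the `stories` data and the `contingency` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical data: each story carries the set of topics its teller chose.
stories = [
    {"dogs", "pizza"},
    {"dogs"},
    {"pizza"},
    {"dogs", "pizza"},
    {"cats"},
]

# Answer counts: how many stories were tagged with each answer.
answer_counts = Counter(topic for story in stories for topic in story)


def contingency(a, b):
    """Return (both, a only, b only, neither) counts for answers a and b."""
    both = sum(1 for s in stories if a in s and b in s)
    a_only = sum(1 for s in stories if a in s and b not in s)
    b_only = sum(1 for s in stories if b in s and a not in s)
    neither = len(stories) - both - a_only - b_only
    return both, a_only, b_only, neither


print(answer_counts.most_common())
print(contingency("dogs", "pizza"))
```

The four numbers returned by `contingency` are the cells of a 2x2 table: dogs and pizza, dogs but not pizza, pizza but not dogs, and neither.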

A conspiracy of results

So in a typical project I use NarraCat to generate something like several thousand of these images. Some hundred of them usually get included in a catalyzing report of some sort. If things go as they should, some dozen of them rush around in the dreams and discussions of somebody somewhere for months, upsetting assumptions and generating ideas. That's the point of the whole thing, to catalyze thought, imagination and discussion.

You are asking: Where do you view all of these thousands of images? NarraCat simply dumps them into a vast conspiracy of folders it creates under your "output" folder. You then use whatever file browser you prefer to look at them. This picture is just some files in the Mac Finder. (I don't usually use a black background on it; I just did that so you can't read the file names.) Now you know why the correlation number on that scatter graph was huge: it is so the number can "jump out" when you look at lots of such graphs at a small scale.

Now, I have been told that I have a gift for looking at thousands of tiny images and having those that are different in useful ways "jump out" at me. I don't believe I am unique in that. I think it's a matter of practice. But even so, I have built into NarraCat a variety of ways to reduce the number of images I am required to process, just to save time for more valuable uses. I generate sorted lists of the strongest correlations, and I set thresholds below which trends are tucked away, to be explored only on jaunts through the data chasing an enticing trend.

And I try to pack lots of information into few images. For example, the correlation matrix you saw above summarizes about a hundred detailed scatterplots, which NarraCat also generates but which I consider only when particular details merit consideration.
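
The idea of a sorted list of the strongest correlations, with a threshold below which weaker trends are tucked away, might be sketched as follows. This is an illustration only, not NarraCat's code: the function names, the example scales and the 0.5 threshold are all assumptions, and the Pearson correlation is written out by hand so the sketch needs nothing beyond the standard library.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_correlations(scales, threshold=0.5):
    """Return (name_a, name_b, r) for each pair with |r| >= threshold, strongest first."""
    results = []
    for (name_a, a), (name_b, b) in combinations(scales.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            results.append((name_a, name_b, r))
    return sorted(results, key=lambda item: abs(item[2]), reverse=True)


# Hypothetical answers to three scalar questions, one value per story.
scales = {
    "memorability": [10, 40, 60, 80, 95],
    "positive outcome": [20, 35, 70, 75, 90],
    "stability": [50, 10, 90, 30, 60],
}

for name_a, name_b, r in strongest_correlations(scales):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

Sorting by the absolute value of r keeps strong negative correlations near the top of the list alongside strong positive ones.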

Where are the stories?

Are you asking where the qualitative part is? I was coming to that. NarraCat has a minimal "browser" interface in which you can look at selected lists of stories based on answers to questions about them, like all the stories told by dog owners that featured excellent cats.

On the left of the browser is a list of things NarraCat can compute, related to folders it can populate with lots of image files. You can use this "command center" to make it spit out images. And then you can get a cup of coffee, or several, while NarraCat does your bidding. Note that this can take hours if your data set is complex and you ask NarraCat to do a lot of your bidding.

In the middle section of the NarraCat browser is a story, optionally with some answers to questions below it. Or it might be several stories one after another that you want to read together. How do you make story selections? With that ugly little box under the list of stories. Yep, it's a nasty command line interface. (Or a beautiful command line interface, depending on whether you remember liking these. I liked them. They were magical. You spoke into the darkness and your bidding was done.) At least NarraCat remembers the last several selections you made so you can recall them and see them again.

What can you select?
  • choices (say, stories told by dog owners)
  • choices with choices (say, stories told by dog owners who "rarely" eat pizza)
  • parts of scales (say, stories in which cats were rated as influential 50-100 out of 100)
  • parts of two scales (say, stories in which cats were highly influential but dogs were not)
  • scales and choices (say, stories with influential cats told by dog owners)
  • stories with particular texts in them (standard search)
  • a story with a particular name or sequential number
  • a random story (warning: time thief)
Finally, on the right of the NarraCat browser is a list of all the questions you have in your data set, and some how-to help, all of which is meant to help you remember which magic words to type into the stony-faced command line.
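
For readers who think in code, the kinds of selection listed above could be sketched like this. It is not NarraCat's command syntax or its internal data format; the story records, field names and the `select` function are all hypothetical.

```python
import random

# Hypothetical story records; the field names are invented for illustration.
stories = [
    {"name": "The barking night",
     "text": "The dog barked until the pizza came.",
     "choices": {"pet": "dog", "pizza": "rarely"},
     "scales": {"cat influence": 10, "dog influence": 90}},
    {"name": "Cat on the roof",
     "text": "A cat rescued the picnic.",
     "choices": {"pet": "dog", "pizza": "often"},
     "scales": {"cat influence": 85, "dog influence": 20}},
    {"name": "Quiet Tuesday",
     "text": "Nothing happened, and the cat slept.",
     "choices": {"pet": "cat", "pizza": "rarely"},
     "scales": {"cat influence": 60, "dog influence": 70}},
]


def select(stories, choices=None, scales=None, text=None):
    """Return the stories that match every criterion given.

    choices: {question: answer}
    scales:  {question: (low, high)}, with both ends inclusive
    text:    case-insensitive substring to look for in the story text
    """
    selected = []
    for story in stories:
        if choices and any(story["choices"].get(q) != a
                           for q, a in choices.items()):
            continue
        if scales and any(q not in story["scales"]
                          or not low <= story["scales"][q] <= high
                          for q, (low, high) in scales.items()):
            continue
        if text and text.lower() not in story["text"].lower():
            continue
        selected.append(story)
    return selected


# Stories with influential cats told by dog owners.
picked = select(stories, choices={"pet": "dog"},
                scales={"cat influence": (50, 100)})
print([story["name"] for story in picked])

# A random story (warning: time thief).
print(random.choice(stories)["name"])
```

Combining criteria in one call covers "choices with choices", "parts of two scales" and "scales and choices": every criterion has to hold for a story to be selected.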

Not very pretty, is it?

If you are saying under your breath "this is not a user-friendly program" -- go ahead and say it loud, because I knew it already. You won't offend me. I like to say NarraCat has a face only a programmer could love.

You can take this handy test to determine if NarraCat is for you.
  1. Can you write a computer program that reads data and calls upon libraries to generate graphs and statistics? Then you could write NarraCat yourself, but here it is so you don't have to. (Maybe you can make it better. If you want to contribute to it, let me know.)
  2. Can you create a computer program that prints out "Hello" twenty times but only if it's not Friday? And on Fridays it asks you what to print and how many times? If you can do that, you can probably struggle through figuring out how to use NarraCat in a week or two. And then you will know how to use it ... unless you stop using it, whereupon you will forget everything and have to figure it out again later. But it's free :)
  3. Can you program at all? No? Well, then you will probably not like using NarraCat yourself. First of all, you have to install several libraries on which it depends. And second, the way you make it work for your needs is ... to edit source code files. If you can't imagine doing that, or don't even know what that means, don't use NarraCat yourself, unless you want to take the first step on a fascinating journey through the world of programming - maybe you do, maybe you don't. The good news is that lots and lots of people do know how to do that (see #1 and #2 above). Some of them are probably your friends and colleagues. Find one of them and get them to use NarraCat for you or help you learn how to use it.
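
If you are wondering what the program in question 2 might look like, here is one possible sketch in Python. The function name `hello_lines` and its `ask` parameter are invented here; there are many other ways to write it.

```python
import datetime


def hello_lines(today, ask=input):
    """Return the lines to print for the given date.

    On Fridays, ask what to print and how many times;
    on any other day, return "Hello" twenty times.
    """
    if today.weekday() == 4:  # Monday is 0, so Friday is 4
        text = ask("What should I print? ")
        count = int(ask("How many times? "))
        return [text] * count
    return ["Hello"] * 20


# To run it for real:
#     for line in hello_lines(datetime.date.today()):
#         print(line)
```

If you can read that and see how you would change it, you are probably somewhere around level 2.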
Questions? Suggestions? Ideas? Send me a note at cfkurtz at cfkurtz dot com.

More options: Take a look at Creative Bloq's article on "The 33 best tools for data visualization" to find other things that are somewhat like NarraCat and may work even better for you!