“The work better be right:” The data kingdom of Jeremy Singer-Vine

Welcome to Free Radicals — a collaboration between Everything and Sherrell Dorsey, founder of The Plug. Each edition features a conversation between Sherrell and someone who embodies leadership, focusing on equity, advancement, expansive thinking, and progress. This is the fourth edition, featuring Jeremy Singer-Vine.

To read previous editions, and sign up for future updates, click here.

Enjoy.

Jeremy Singer-Vine is the data editor for the investigations team at BuzzFeed News. Their recent FinCEN Files investigation, done in collaboration with the International Consortium of Investigative Journalists, illuminated the movement of more than $2 trillion in suspicious financial transactions around the globe, and resulted in the most riveting podcast about paperwork to date.

Singer-Vine’s approach to finding and sharing information exemplifies intellectual generosity, and his commitment to sharing data and making his work process transparent pushes the boundaries of what journalism can do. He’s quick to identify ways in which data folks and reporters can collaborate, not only to add depth to reporting, but to broaden the scope of which stories can be told at all.

Although he started out as a photographer, and then worked as a general assignment reporter in several newsrooms, Singer-Vine discovered data journalism and data visualization during an internship at Slate in 2009. He started learning programming languages in his spare time so that he could do more with the information he was grappling with and find better ways to present it to readers. In 2014, Singer-Vine and reporter John R. Emshwiller were named Pulitzer Prize finalists for a Wall Street Journal series investigating largely overlooked factories and labs that once helped to develop and produce nuclear weapons in the US, many of which had not been properly cleaned up.

Each edition of his weekly newsletter, Data is Plural, delivers a collection of data sets organized around five different topics. Some are connected to the news cycle, like data about global coups or student loan debt, and others, like the surprisingly large number of databases dealing with different aspects of Bob Ross paintings, are just delightful.

Sherrell Dorsey spoke with Singer-Vine about his approach to journalism and leadership, how he finds interesting data sets, and what he sees as the surprising success of Data is Plural.

Can you give us an example of how the work you do, and your approach to it, is changing data journalism?

One thing we've been very focused on at BuzzFeed News since the formation of this data team is showing as much of our work as possible. That often means publishing GitHub repositories that include all of our analytic code, plus whatever data we're able to release, plus methodologies that explain our approach.

This is something we do for a few reasons. The most important, I think, is transparency. We want readers to be able to see what we really did and trust that a new data finding is backed up by something real. Another aspect is reusability and the transmission of this information to other journalists. If someone wants to do analysis like ours, or use data that we used or collected, they now have that ability.

The third is self-accountability, and it goes hand-in-hand with the goal of transparency. We know that if we're going to publish, the work better be right. It forces us to be especially careful and rigorous about the data analysis we do, because we know we're holding ourselves to that standard.

How do you think about leveraging data for important conversations around power as part of the media’s role as watchdog? Does better data equal a better world?

My professional work is predicated on the idea that having data—clean, high-quality data—is useful and meaningful, and helps hold powerful people in organizations accountable. At the same time, I don't think that data is a silver bullet that will solve all our problems.

Data can play a unique role in journalism, yet I don't think in our journalism the data stands alone. It's always buttressed by more reporting and context, and other types of information-gathering and synthesis. But I do hope that these data sets can provide a boost to projects and add to investigative journalism. They add something to our work, even if they aren't the totality of the work itself.

Tell me about your Wall Street Journal project that was a Pulitzer Prize finalist.

That was a collaboration with John Emshwiller. He did the bulk of the work on that series, but the data was a significant part of it. He had come across a fascinating and underreported aspect of the Cold War buildup in nuclear processing and storage capabilities in the US. We did a series of stories, a component of which was a searchable database of all of the information we could find about hundreds of sites that the government had at one point indicated had processed or stored nuclear material, many of which had not been properly cleaned up or appropriately documented.

Much of the most significant investigative work is collaborative, across different journalistic skill sets. How do you think about the role of data versus reporting?

There is a tension between the richness of the world and the way that data inherently flattens it. Data can make legible information about a huge number of things, all at once, at our fingertips.

And yet, I think there is a risk of losing some of the nuance. That is, to me, what I like about data journalism—marrying the richness of real people, of the stories behind the data, the stories that the data exemplify, with the numbers. It's the interplay between the ability for data to reach across a huge variety of people and cultures and countries and jurisdictions, married with the ability of in-depth reporting to flesh out what those data points really mean, where they come into contact with day-to-day life, that continues to captivate me.

How did you end up as a data journalist?

I didn’t have a computer science background—I studied liberal arts in college. I maybe played around with HTML and knew a line or two of JavaScript, but I would say I was underexposed to computer programming. But after college I had an internship at Slate Magazine where I was seeing all of the amazing things that my supervisor, Chris Wilson, could do with just a little bit of programming. And that was exciting to me.

He helped me take the first steps, understanding how to set up my computer, suggesting some high bang-for-your-buck projects I could undertake before I had fully mastered a computer programming language. I started doing that at home in the evenings and on the weekends, until I felt like I had built up enough knowledge to start incorporating it into my work. Once I reached that critical threshold, everything accelerated. Once I could make computer programming and data analysis and data visualization part of my day job, I learned faster, I got more projects under my belt, I expanded my general understanding of the field, and that snowballed. Now, a decade later, it feels like second nature, but at first it wasn't at all.

It sounds like the relationships, including collaboration with other journalists and mentorship situations, have been very formative for you. How do you think about leadership in a newsroom, and as a part of career development?

I sincerely wish I had deeper and more radical thoughts on leadership. It's probably something I don't think about enough. I've experienced and been part of leadership in a very personal, one-to-one way.

I have, over the years, through panels at conferences, and other more public work, tried to champion some aspects of data journalism that I think are important, like reproducibility. But I am most comfortable doing work that I think exemplifies what data journalism could be, rather than telling people what it should be.

Earlier you mentioned reusability and the idea of transmission of data to other journalists. That’s at the core of your personal newsletter, Data is Plural, which is a weekly collection of data sets. Tell me about how you created your newsletter, how long you've been running it and why.

It's now a little more than five years that I've been publishing Data is Plural. One of the reasons I launched the newsletter is that I wanted it to be a conversation. I knew from my own work that there were all sorts of data sets out there that I didn't know about. I was hoping—and this has been borne out and has been one of the more gratifying parts of publishing the newsletter—that there would be people out there who would respond with data sets that they'd seen themselves. And so the newsletter as a format has been great; it feels a little bit like a conversation or a water cooler.

It started out just scratching an itch. I was thinking, it would be great if there were some way to learn about new data sets. And I couldn't find anything like that — or at least what I had in mind. The format I started with is pretty much the one I've stuck with, five data sets or topics per edition, trying to keep the summaries fairly short, but dense enough to contain useful information. A mix of things that seem broadly useful to people from all sorts of backgrounds and interests, combined with things that are a little more offbeat.

Often I'll include a data set just because the fact that someone collected it is interesting, even if it's probably not a data set that's going to be the most useful. The fact that it exists, in my mind, says something interesting about the world.

I appreciate that sense of Data is Plural being a water cooler for data nerds. What kind of a resource are you ultimately building here? What is the impact of the newsletter?

I think it has impact in two ways. The more immediate way is learning about a data set that is directly relevant to the work you're doing, or spurs an idea. The other, hopefully, is building up a mental Rolodex of the data that exists in the world. And the data that doesn't exist. Obviously, Data is Plural will never be a catalogue of every data set that does exist, but I hope it gives readers a sense of the sorts of things that are out there and helps them develop a mental model of what sorts of organizations are collecting data, what sorts of government agencies are publishing data, and what sorts of projects academics are undertaking, for instance.

How did you find your audience? And how many subscribers do you have today?

Initially, I sent the newsletter to a few friends and colleagues. After I got some good feedback, I posted it to the National Institute for Computer-Assisted Reporting (NICAR) mailing list and to Hacker News, and picked up a good number of new subscribers from that. Since then, with the exception of a few news articles and blog posts that brought in a rush of new subscribers, it's basically been word of mouth: people hearing about it from their colleagues, or their classmates, or their friends.

When I first launched, I thought it would be nice to get 100 subscribers. It's so hard to accumulate any large audience for anything. I might have underestimated people’s interest in learning about new data sets; I thought it was maybe more of a niche thing. It turns out to be something that, to my happy surprise, people from all fields (students and artists and academics in the humanities and sciences) find exciting, even if not every data set in the newsletter is relevant to them. And now it's about 23,000 subscribers.

How do you source your data sets?

It's a mix of Google Scholar alerts on keywords about data sets, subscribing to other data-related newsletters, subscribing to scientific journals that focus on data, like Scientific Data, a bunch of blogs and RSS feeds, and a few carefully tailored Twitter searches. Then a lot of people, thankfully, reach out to me directly. That's always exciting: to hear about something that I would have had virtually no way of discovering if it were not for a reader bringing it to my attention.

There's also a more amorphous way of sourcing. As I'm reading the news, I might wonder to myself, what is the data set behind that? If I see an article that talks about the number of Christmas trees being harvested in each state, I try to figure out the source. Sometimes a news article will say what the sources are, other times it might take some creative Googling, and trying to imagine where the data lives.

Looking ahead, what are you anticipating or planning for 2021?

That's a great and very daunting question. I've never been a great long-term planner. A lot of the work I do is just trying to understand right now.

I never had a plan for Data is Plural to last a certain amount of time, but every week that I come back to it still feels like a valuable service and a valuable thing to be doing.

I think we've all reached for some semblance of normalcy this year, and it has been nice to have something to return to every week that feels productive, and helpful at a time when it feels like lots of things are bewildering and uncertain. What that means for next year, I don't really know.

There's a new administration coming in, and a whole new set of topics to report on. That's always an exciting time for journalists, no matter the incoming administration—a whole new set of people in power, a whole new set of priorities that are worth scrutinizing, and digging into, and holding accountable.

Data will always be a component of the investigative work that I do, and that my colleagues do. But it's really hard to predict what data will be useful. The changing of an administration often means that agencies change their priorities in the data they're collecting and publishing. I would be interested if there are new data sets, or data sets newly available, that enable a sort of reporting that we couldn't do before.

This was a conversation between Sherrell Dorsey and Jeremy Singer-Vine, edited for length and clarity by Annaliese Griffin and Rachel Jepsen.