Report on Big Data and Digital Scholarship

Capitalizing on Big Data: Toward a Policy Framework for Advancing Digital Scholarship in Canada

I spent today in an Ottawa conference room talking about data management plans for Canada’s digital scholars. The meeting was hosted by the main federal granting agencies (SSHRC, NSERC, CIHR, and the CFI), collectively known as the TC3+. The TC3+ recently reported on the future of research data, including data stewardship and funding guidelines. Today’s conversation was based on that report, and on the 58 responses it received from universities, organizations, and individual researchers.

I’m a humanist, so my impressions of the main concerns are informed by my own expertise and datasets. There was a lot of interest in the digital humanities today — but most of the expertise was in the sciences and the libraries, fields where data has been a going concern for years, and where big data is now a big issue.

Humanists can contribute a lot to these conversations; for instance, our data is often constrained by copyright. But we also have much to learn from our cross-campus colleagues: how can we manage our data efficiently and responsively? What does a data management plan for humanities research even look like?

The first place to go with project-level questions like this is DevDH.org, which has a whole section on data management plans. In their seminar at the DHSI last summer, I learned these guidelines:

  • use open formats (e.g. XML)
  • use common repositories (e.g. GitHub)
  • document your protocols
  • back up, verify, and back up again
  • use stable data identifiers
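The back-up-and-verify step in the list above can be made concrete. Here is a minimal Python sketch (the directory layout and function names are my own, hypothetical choices, not anything prescribed by DevDH.org) that checks a backup copy against the original using SHA-256 checksums:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(original_dir: Path, backup_dir: Path) -> list[str]:
    """Return relative paths whose backup copy is missing or differs
    from the original. An empty list means the backup verified."""
    mismatches = []
    for original in original_dir.rglob("*"):
        if original.is_file():
            rel = original.relative_to(original_dir)
            copy = backup_dir / rel
            if not copy.is_file() or sha256_of(original) != sha256_of(copy):
                mismatches.append(str(rel))
    return mismatches
```

Checksums catch the failure mode a simple copy does not: a backup that silently truncated or corrupted a file still "exists," but it will not verify.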

Common sense, right? Actually, you’d be surprised how few people back up their data regularly. (Confession: I’m writing this on a new laptop I haven’t backed up in weeks.)

The principle is that your data should outlast your project, so others can fork, adapt, and repurpose it. That means no opaque protocols, broken links, or proprietary formats like Word or QuickTime, so that those who come after you can smoothly extract your data rather than pry it out of various systems. That’s a mantra of the data liberation movement, and I learned today that it doesn’t end with well-known models like Google Takeout; our own Statistics Canada is a leader here.

Why worry about the afterlives of our research data? Because like archivists, researchers need to stay agnostic about how data will be used in the future. If you foreclose future uses of your data, you’re like an archivist shredding records. Yesterday’s data is today’s information is tomorrow’s knowledge.

The last item in that list — use stable data identifiers — is a bit obscure. It means maintaining pointers to stable data, so that it stays interoperable. It means linked open data: using persistent URLs (PURLs) and stable identifiers such as DOIs. To throw another acronym at you, it means doing for all data what the TEI does for text.
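To illustrate what makes an identifier like a DOI stable: you never link to the publisher’s current hosting location, only to the resolver, which redirects wherever the object lives now. A small Python sketch (the DOI below uses the reserved test prefix 10.5072 and is hypothetical) that normalizes the various forms a DOI arrives in:

```python
def doi_url(raw: str) -> str:
    """Normalize a DOI (bare, 'doi:'-prefixed, or already a URL)
    into its canonical https://doi.org/ resolver form. The resolver
    redirects to the current hosting location, so this link stays
    valid even if the data moves."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi
```

Citing the normalized form (`doi_url("doi:10.5072/example.dataset")`) rather than a raw publisher URL is what keeps tomorrow’s links from becoming today’s 404s.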

So is a data management plan (DMP) always necessary? Maybe not. But a growing number of digital projects will need to develop one, or justify its absence. The weight DMPs carry in grant adjudications, and even in merit and tenure decisions, will only increase.

Finally, one refrain of today’s meeting was that Canada’s researchers will need to think more about project management. As I’ve written elsewhere, learning project management is like finding a flashlight in a dark room: suddenly you can see the barriers in front of you, instead of stumbling over them. Some of this expertise will sit in support roles (HR, libraries, IT), but researchers themselves will need to understand and document their data protocols, for instance.

What comes next? A MOOC in data management? More like a movement, to borrow a phrase from Cathy Davidson. Will Canada move to a model of open-access publishing for publicly-funded research, like the Open Science initiative of the White House’s OSTP? The future is open, so to speak.
