Data management is exciting!

Trust me.

No, this is a reflection of the level of enthusiasm we were asked to have as part of our data management subject at Uni this semester. Our first assignment was to write a press release explaining research data management to the general public in a way that wouldn’t send them straight to sleep. I chose to take a narrative approach and promised that if I got a good mark, I’d reproduce it here. If you work in data management, skip the next bit – but if you’re not in an academic or research library and you’re curious about what we are all talking about with data, you might like this.

Sydney is playing host this weekend to social science researchers from around the world as the inaugural Social Science Research Futures conference gets underway.

“Managing research data output will be a focus of the papers presented”, conference organiser Clare McKenzie said today. “Imagine the impact on your life if you lost your laptop with all the contacts, photos and other personal information in it. Now imagine you are a researcher on a project that has interviewed 500 homeless people about their situation, and that the laptop was storing all the responses to the questions.”

While such loss of data can be catastrophic to a project, managing research data is not only about avoiding disaster. Because many research projects are funded with public money, there has been a push in recent years to make the results of that research publicly available at the end of the project.

What exactly are research data? Broadly, they are the factual information collected and recorded during a research project in order to answer the original research question (Carlson 2011). The Australian public’s responses to the Australian Bureau of Statistics (ABS) census are data, as are the daily air temperature recordings a high school science student collects as part of a school project. The data are rarely meaningful without analysis, so the ABS combines the data in different ways to look for trends, and the high school student may graph the daily temperatures to compare them against the average for the time of year and draw conclusions. All of this is research data.

Making arrangements for backup and proper storage of research data is just one aspect of data management, and is part of what’s known as data management planning. Jane Smith, a senior social sciences researcher at City University, developed a data management plan at the beginning of each of her last two research projects and likens it to business planning. “You don’t normally plan for your business to fail, but you can fail to plan for your business,” she says. “Research projects are the same. If you don’t plan for the fact that someone may wish to access your data in twenty years, when the technology is different and the original research team is long dispersed, then all your hard work during the project can’t be shared or expanded.”

Researchers need to plan for storage, for rights of use by others, for naming the data in a way that lets others find it, and for putting details of the data in a repository where it can be discovered, as well as for the possibility that files created today may end up in an obsolete format (ICPSR 2012). The descriptive details that make data findable and usable are known as metadata (literally “data about data”), a way of attaching useful information to an object such as a dataset.

When it comes to data management planning, it doesn’t matter whether the research is in the social sciences or the ‘hard sciences’. Both McKenzie and Smith advise that time spent creating a data management plan (DMP) at the start of a research project can save a lot of time further down the track, particularly if the project is large and collaborative, with many individual researchers. Establishing file formats and file naming conventions, such as the complex file naming system the ABS uses (Australian Bureau of Statistics 2009), helps ensure consistent and accurate records no matter who is working on the project at the time. Smaller projects need not go to this level of complexity, but writing it all down in a DMP can help ensure these details are not forgotten or lost. In fact, some research funding bodies have made preparing and submitting a DMP a condition of applying for a grant (Van den Eynden et al 2011).

Sharing and re-using data becomes easier if the data have been managed properly. Making data accessible to others, or allowing them to be re-used and re-purposed later for another project, makes research more collaborative and reduces the chance that money will be wasted on ‘re-inventing the wheel’ (Van den Eynden et al 2011). It may also help establish trends, for example by comparing the interviews with homeless people (from the lost laptop scenario above) with information collected again in five years’ time.

Smith says that for one of her recent projects she was able to search Research Data Australia (RDA), an online catalogue of research datasets, to find details of a project from several years earlier whose data were relevant to her own. Through the contact details in the RDA listing, Smith, in her words, “got access to the most wonderful population data from five years ago that I was able to re-use in the context of my current research project”.

As with preparing a DMP, research funding bodies in Australia and overseas are beginning to make continuing access to research data a condition of funding.

The future of publicly funded research in Australia is going to depend on good planning.

I enjoyed the subject. It coincided serendipitously with my secondment to Library Repository Services, and, like all my uni subjects, I’m now glad it’s over.
