The added value and simplicity of data management: the how, the why and what sound data management can bring to your lab
Transparent, reproducible, re-usable: the benefits of good data management are manifold. In Monique den Boer’s lab at the Princess Máxima Center for pediatric oncology, structured data management practices are now part of the lab’s culture. This has brought not only a wealth of data that everyone can use, but also various international collaborations and publications. We spoke to Judith Boer, long-time data steward within the lab, about the simple but efficient structure she devised and implemented, and the challenges and long-term benefits of storing data well.
The change started more than five years ago, when the lab went looking for a structured way to store the ever-growing amount of data it was producing. “We wanted to make sure that when people left the lab, others could still find and understand their data. In the beginning we only asked them to make folders for published papers, so that others could see how they went from raw data to processed data and the actual figures or tables in the paper” says Judith Boer.
But that was just the beginning. “We further figured that we have many data collections that are continuously being used, and a lot of primary patient materials. We had a collection of over 1,000 patients with different kinds of molecular data: microarray, sequencing, proteome data. And we understood that, for example, a PhD student generally contributes by running a few hundred samples, so we always make sure we can combine them in one cohort by using the same methods and reference samples. When new people come to our lab they can use data that has been collected. At the same time, when they leave, their own data is also collected” she explains.
It all works like a snowball, and for this it is important to have consistent data storage practices in place, so that everyone knows where to find the data, what version it is and how it was pre-processed. “This is when we started with the idea to link the experiment you write in your lab notebook, or more recently type in your e-journal, with the generated data stored on a network drive. The notebook experiment name corresponds with the name of the data folder, and in addition, a ‘data storage box’ in the experiment has a link to the data folder. It is very simple, no technology, no science, but it works. Now everyone does it. You can just point to the folder location and everyone reading the experiment can go back to the data, so there is a link between how the data was made and the data itself. This is the most important thing. Everyone can go look at other people’s folders and find and copy the data and use it for themselves” explains Boer.
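The naming convention Boer describes needs no tooling, but a lab could check it automatically. The sketch below is only an illustration and not the lab's own method. It assumes each experiment has one data folder, named exactly after the experiment, directly under a single root directory. The function name `check_links` and the experiment names in the usage note are hypothetical.

```python
from pathlib import Path


def check_links(experiment_names, data_root):
    """Compare notebook experiment names with data folder names.

    Assumption (not from the article): every experiment has exactly one
    folder, named after the experiment, directly under data_root.

    Returns a tuple (missing_folders, orphan_folders):
    - missing_folders: experiments that have no data folder
    - orphan_folders: data folders that match no notebook experiment
    """
    experiments = set(experiment_names)
    # Only directories count as data folders; loose files are ignored.
    folders = {p.name for p in Path(data_root).iterdir() if p.is_dir()}
    missing_folders = sorted(experiments - folders)
    orphan_folders = sorted(folders - experiments)
    return missing_folders, orphan_folders
```

Calling `check_links(["EXP001_microarray", "EXP002_sequencing"], "/path/to/share")` would return the experiments still lacking a folder and the folders that no notebook entry points to. A data steward could run such a check during the kind of periodic monitoring the article mentions.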
According to her, achieving structure requires that the Principal Investigator sees the value of it. “We have this in Monique, our group leader” she says. “She really stands for high quality data that you know you can trust. And the only way that she can guarantee that you can trust the data is by implementing such a system. If the PI doesn’t want to put time in it, no one in the lab will be motivated to do it. In our group doing this is already part of the culture. And it takes a couple of years to come to this” she adds.
The system she devised and implemented within their lab is simple and efficient, but she would rather not take credit for it. “We had a technician who had worked with a similar system in another lab. It’s the whole idea of linking your data to your experiment. Otherwise, people call their projects names that are hard to guess, they have places on their network folders that are difficult to find and so on. Ours is just a structured way to store data and it makes life easier. Also archiving when one leaves is much easier. I developed it within our lab, but I cannot say we totally developed it ourselves. We mainly saw the value of it and implemented it” she says. “What is really important after implementation is teaching new group members and monitoring to keep up the good practice” she adds.
During Oncode’s Annual Meeting in June, Boer presented this approach to data management by drawing a parallel with the happiness hypothesis defined by Sonja Lyubomirsky, a professor of positive psychology at the University of California, Riverside. “I found it interesting that my presentation then appeared on LinkedIn with the idea that research data management makes you happy. Well, that is not exactly what I said” she laughs. “But perhaps it can be so”.
It all started with a pep talk for the team in the lab, but what came out is a full analogy with our ability to influence happiness, or in this case good data management within a lab. The happiness hypothesis states that 50% of the variability in happiness between people is genetic, 10% is due to circumstances and 40% depends on what we think and what we do. This 40% is something we can influence. When it comes to archiving, “probably 50% of your tendency/ability to archive is genetic” says Boer half-jokingly. “10% is circumstances (e-lab notebook, IT) and 40% is what we think and what we do, therefore it is something we can change. Guidelines, monitoring, and understanding the benefits of such practices make the lab culture. And the culture of our lab is to try to come to good archiving” she adds.
At the Princess Máxima Center, some of the practices have become centralized in recent years. “Now we have a good data management structure at the Máxima, and that really helps. A lot of things I used to tell people in their introduction are now explained centrally. But the specifics are still up to each lab. What is centrally explained are the types of network shares, who has access, how to set those up. Where to put your raw data and shared data. Where you put data to keep private. No patient identifiers, so the data integrity part”.
But the rest depends on each lab. “Building a culture of good data management starts with understanding that you have people with different levels of being organized, and you need guidelines to help bring all of them to an agreed level of archiving, while also monitoring them over time to see if they keep using those. And the benefits are manifold: transparency, reproducibility, re-usability, collaboration. The result is a large data collection that can be used either in the country or internationally” she explains.
Some researchers’ reluctance to implement structured data management stems mostly from fear of the time it will take, but the benefits clearly outweigh the effort. “If you ask people in our lab what they think of this, well, it is a little bit more work, but it is part of the job”.
By now, everyone in the group can instruct a new lab member on how the lab works and stores its data. They also hold monthly lab discussions directly from the lab notebook. “You don’t do a presentation; you just open an experiment. And then people switch and look at someone else’s experiment and see if they can follow it or have any tips to improve it. This keeps the system alive” says Boer. They also practice what they call an internal audit: asking a group member to reproduce a figure from a paper about to be published.
In a nutshell, the secret of good data management in a lab is simple and easily achievable, and the benefits are clear. The PI sets the tone, creating a group culture. A bottom-up practical approach means group effort, and so everyone is involved. The data steward ensures the structure. “It is our experience that in practice this works and helps. Rare disease in an international setting is our type of work. For example, childhood ALL (acute lymphoblastic leukaemia) is quite a rare diagnosis and there are all sorts of subtypes that make an even smaller group. It is very helpful to share our data with different countries to be able to say something about a subtype of the disease that is present in only 1% of the cases. It has brought us many international collaborations and papers by contributing our sound data. And when this practice becomes culture, the data get wings” adds Boer.
If you want to know more about data management or need advice getting started, you can contact Inga Tharun, programme manager Open Science and FAIR data.