Video: findable data

By Julien Colomb | September 18, 2018

As part of the eeFDM project, we are producing videos to promote RDM and open data among researchers. Here is our first output; we are hoping for your feedback before producing the final soundtrack. (So far, I have been doing the voiceover myself and we have not yet put any work into finding good music.)

The video is available on Figshare (https://doi.org/10.6084/m9.figshare.7163396); the soundtrack is not the definitive one.

Give feedback: For RDM enthusiasts: would you use it on your webpage and in your talks? If not, why not?

Please comment below, via GitHub issues, or by email.

Text

Generic:

We did - waste time during data analysis

We did - lose data

We had - a paper rejected because we could not locate the raw data

We did - throw data away because we could not make sense of it

And We failed - to recognize open data as the new standard

But never again!

We want better research in less time, and this while producing open, FAIR data and fostering collaboration

That is the promise of… Data management

Episode 1 (corrected for gender problems)

(dancing popeye) The researchers do love their data, and if the data is lost, their dance will stop.

While researchers can avoid the crash of complete data loss by making backups, proper data management goes beyond just preventing a crash.

(order-olive)

Data accumulates fast. When packing their data for future analysis, reuse, and sharing, researchers often lack a system. This may work in the short term, but as the amount and complexity of the datasets increase, researchers put themselves in danger!

(popeye order made)

Because you do not want to spend days sorting out your data, you should master data safety, organisation, naming, and computer readability. Then the process will run smoothly and efficiently.

Solutions

To make your data findable, you first need a backup system: 3 copies, on 2 different storage media, with 1 copy stored offsite, is a good start.
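As a concrete illustration, here is a minimal Python sketch of the first two copies of that rule, assuming a local data folder and an external drive mounted at /mnt/backup (both paths are placeholders); the offsite copy, for instance on an institutional server, still has to be added on top.

```python
# Minimal backup sketch: mirror the working data folder (copy 1) to an external
# drive (copy 2). Paths are placeholders; a third, offsite copy is still needed
# to complete the 3-2-1 rule.
import shutil
from datetime import date
from pathlib import Path

source = Path("data")                         # working copy
backup_root = Path("/mnt/backup")             # external drive

destination = backup_root / f"data_{date.today():%Y%m%d}"
shutil.copytree(source, destination)          # fails if the destination already exists
print(f"Backed up {source} to {destination}")
```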

You also need a folder organisation, and you should explain this organisation in a readme file. Name your files consistently: adopt conventions, avoid special characters like spaces and points, and do not be afraid of long names.
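To make this concrete, here is a small sketch that creates such a skeleton, using the per-project layout with subfolders for raw data, secondary data, analyses and reports described further down in this post; the project name is made up.

```python
# Sketch: one folder per project, with subfolders for raw data, secondary data,
# analyses and reports, plus a README stub documenting the organisation.
from pathlib import Path

def create_project_skeleton(root: Path, project: str) -> Path:
    project_dir = root / project
    for sub in ("raw_data", "secondary_data", "analyses", "reports"):
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    readme = project_dir / "README.md"
    if not readme.exists():
        readme.write_text(f"# {project}\n\nExplain here how the data is organised.\n")
    return project_dir

create_project_skeleton(Path("."), "2018_fly_behaviour")  # project name is illustrative
```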

Use a master metadata spreadsheet indicating the path to all other important files.
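As a minimal sketch of how such a spreadsheet can be used, assume it is exported as a CSV file named data_index.csv with dataset_id and path columns (the file name and columns are invented for the example):

```python
# Sketch: look files up through a master index exported as CSV.
# The file name data_index.csv and its columns are assumptions, not a fixed format.
import csv

def find_path(index_file: str, dataset_id: str) -> str:
    with open(index_file, newline="") as handle:
        for row in csv.DictReader(handle):
            if row["dataset_id"] == dataset_id:
                return row["path"]
    raise KeyError(f"{dataset_id} not found in {index_file}")

print(find_path("data_index.csv", "exp042"))   # prints the stored path of dataset exp042
```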

This went too fast? Then get training and advice online or at your institution’s Research data management helpdesk.

In our next episode, we will see how research data management can fuel your data analysis. So stay tuned!

Paper bin (ideas and text not used)

1. Data folder organisation

In three years, you will probably need access to the data you are collecting now, and by then you will have produced a lot more data: it might be difficult to find your files again. Diverse solutions need to be implemented to facilitate the task. Using a hierarchy of folders to organise your files is one of them. For instance, you can use one folder per project, as well as subfolders for raw data, secondary data, analyses and reports. In any case, writing a readme file explaining how the data is organised makes it easier and faster to understand where each version of the data and analysis is.
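For illustration, the readme could be as simple as the one written by the sketch below; the folder names and the naming pattern are examples, not a standard.

```python
# Sketch: write a short README explaining how the project data is organised.
# Folder names, the naming pattern and the project name are examples only.
from pathlib import Path

readme_text = """# Data organisation

- raw_data/        raw data, never modified after acquisition
- secondary_data/  cleaned and derived datasets
- analyses/        analysis scripts and their outputs
- reports/         figures, manuscripts, presentations

File naming: yyyymmdd_experiment_condition_replicate.ext
"""

project = Path("2018_fly_behaviour")   # illustrative project folder
project.mkdir(exist_ok=True)
(project / "README.md").write_text(readme_text)
```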

2. Data file naming

File names should be as short as possible and as long as necessary, i.e. they should be self-explanatory. Never use special characters like spaces or points. Use conventions for naming your files, and make sure you and your co-workers are using the same ones. Pro tip: if you want to use dates, use the standard yyyymmdd format, so that sorting files by name also sorts them chronologically.
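The pro tip can be checked in a couple of lines; the file names below are invented for the example.

```python
# yyyymmdd dates make lexicographic sorting match chronological order
# (file names are invented for the example).
files = [
    "20180305_larva_tracking_run2.csv",
    "20171120_larva_tracking_run1.csv",
    "20180914_larva_tracking_run3.csv",
]
for name in sorted(files):   # oldest file comes first, no date parsing needed
    print(name)
```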

3. Data index and metadata

Conventions are good, but they nearly always break down. A practical way around the problem is to create an index of files, to which metadata can be added and which can be read by a computer. Small programs can then reach the data simply by reading the index, and operations like concatenating tabular data, joining tables, analysing data and producing figures (for manuscripts, presentations and posters) can be automated. You can also make an index of projects that links to these data indexes, so that one single file contains links to all your outputs.
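As a rough sketch of that kind of automation, assuming the same hypothetical data_index.csv as above and that the indexed files are CSV tables, one could concatenate them with pandas (an assumption; any tabular library would do):

```python
# Sketch: concatenate every tabular file listed in the machine-readable index.
# The index format and column names are assumptions for this example.
import csv
import pandas as pd

with open("data_index.csv", newline="") as handle:
    rows = list(csv.DictReader(handle))

tables = []
for row in rows:
    table = pd.read_csv(row["path"])
    table["dataset_id"] = row["dataset_id"]   # record where each row comes from
    tables.append(table)

combined = pd.concat(tables, ignore_index=True)
combined.to_csv("combined_data.csv", index=False)
```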

4. Backup plans

Of course, all these efforts will be in vain if the data is lost completely: computers are not to be trusted, they can break, be stolen, burn or drown. You should therefore make sure that if anything happens, your data will still be safe. The basic rule of 3 copies in at least 2 different locations can save a PhD.
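For illustration, a backup can also be verified, not just made: the sketch below compares checksums of the working copy and of the copy on a second medium (both paths are placeholders).

```python
# Sketch: check that a backup copy matches the working copy, file by file,
# by comparing SHA-256 checksums (both paths are placeholders).
import hashlib
from pathlib import Path

def checksums(root: Path) -> dict:
    return {
        path.relative_to(root): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }

if checksums(Path("data")) == checksums(Path("/mnt/backup/data_copy")):
    print("Backup matches the working copy")
else:
    print("Backup differs from the working copy!")
```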
