Conscorcio Madrono


Introduction to Research Data Management
Sarah Jones, Digital Curation Centre, Glasgow
Twitter: @sjDCC
Carpentry workshop - Open Research Data, 5 December 2016, Oslo

What will we cover?
1. What is research data management?
2. Considerations and pointers when:
   - creating data
   - managing data
   - sharing data
3. Useful tools and resources

What is RDM?
(Image CC-BY-SA by Janneke Staaks, www.flickr.com/photos/jannekestaaks/14411397343)

What is Research Data Management?
Research data management is the active management and appraisal of data over the lifecycle of scholarly and scientific interest.
[Figure: research data lifecycle with the stages Create, Document, Use, Store, Share and Preserve]
Data management is part of good research practice.

What is involved in RDM?
[Figure: the research data lifecycle annotated with the following activities]
- Data Management Planning
- Data creation
- Annotating / documenting data
- Analysis, use, versioning
- Storage and backup
- Publishing papers and data
- Preparing for deposit
- Archiving and sharing
- Licensing
- Citing

Why manage and share data?
Three motivations: direct benefits for you, research integrity, and the potential to share data.
- To make your research easier!
- To avoid accusations of fraud or bad science
- So others can reuse and build on your data
- Stop yourself drowning in irrelevant stuff
- Evidence findings and enable validation of research methods
- To gain credit: several studies have shown higher citation rates when data are shared
- Make sure you can understand and reuse your data again later
- Advance your career: data is growing in significance
- Meet codes of practice on research conduct
- Many research funders worldwide now require Data Management and Sharing Plans
- For greater visibility, impact and new research collaborations
- Promote innovation and allow research in your field to advance faster

What if this was your laptop?
Why YOU need a Data Management Plan: http://blogs.ch.cam.ac.uk/pmr/2011/08/01/why-you-need-a-data-management-plan

Good data management is about making informed decisions: http://xkcd.com/949

Creating data
(Image CC-SA-ND by Bill Dickinson, www.flickr.com/photos/skynoir/8270436894)

Data creation tips

- Choose appropriate formats
- Adopt a file naming convention
- Create metadata and documentation as you go
- Ensure consent forms, licences and agreements don't restrict opportunities to share data

Choose appropriate file formats
Different formats are good for different things:
- open, lossless formats are more sustainable, e.g. rtf, xml, tif, wav
- proprietary and/or compressed formats are less preservable but are often in widespread use, e.g. doc, jpg, mp3
Use one format for analysis, then convert to a standard format. For example, BioformatsConverter batch-converts a variety of proprietary microscopy image formats to the Open Microscopy Environment format, OME-TIFF.
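As a rough illustration of the format advice above, the sketch below sorts files by extension into the open/lossless and proprietary/compressed groups, using only the example extensions from the slide. The function names and the three-way "open" / "review" / "unknown" result are assumptions made for this sketch. An extension says nothing certain about how a file is actually encoded, and a data centre's own preferred-formats table should take precedence.

```python
from pathlib import Path

# Example extensions taken from the slide only; this is an illustrative
# assumption, so extend both sets to match your discipline or your
# target data centre's preferred-formats list.
OPEN_LOSSLESS = {".rtf", ".xml", ".tif", ".wav"}
PROPRIETARY_OR_COMPRESSED = {".doc", ".jpg", ".mp3"}


def classify_format(filename: str) -> str:
    """Return 'open', 'review' or 'unknown' for a file name, judged by extension only."""
    ext = Path(filename).suffix.lower()
    if ext in OPEN_LOSSLESS:
        return "open"
    if ext in PROPRIETARY_OR_COMPRESSED:
        return "review"  # consider converting to an open format before deposit
    return "unknown"     # not covered by the example lists


def files_to_review(filenames):
    """List the files whose formats fall in the less preservable group."""
    return [name for name in filenames if classify_format(name) == "review"]


if __name__ == "__main__":
    project = ["interview-01.wav", "notes.doc", "slide.JPG", "table.csv"]
    for name in project:
        print(name, classify_format(name))
```

Returning "unknown" rather than guessing keeps the check honest: anything outside the listed examples needs a human decision.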

Data centres may suggest preferred formats for deposit: www.data-archive.ac.uk/create-manage/format/formats-table

How will you name your files?
- Keep file and folder names short, but meaningful
- Agree a method for versioning
- Include dates in a set format, e.g. YYYYMMDD
- Avoid using non-alphanumeric characters in file names
- Use hyphens or underscores, not spaces, e.g. day-sheet, day_sheet
- Order the elements in the most appropriate way to retrieve the record
(Example from ARM Climate Research Facility: www.arm.gov/data/docs/plan)
www.jiscdigitalmedia.ac.uk/guide/choosing-a-file-name
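A minimal sketch of one possible naming convention that follows these rules, assuming hypothetical elements (project, description, date, version) joined by underscores, with hyphens used inside each element. The helper names `make_filename` and `is_safe_filename` are illustrative and not part of any tool mentioned here.

```python
import re
from datetime import date

# Letters, digits, hyphens and underscores, plus one dot before an extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?$")


def make_filename(project: str, description: str, day: date,
                  version: int, extension: str) -> str:
    """Build a name like 'soil-survey_site-a_20161205_v02.csv'.

    The element order (project, description, date, version) is only an
    example; choose the order that best suits how you retrieve records.
    """
    def clean(part: str) -> str:
        # Lower-case, turn spaces into hyphens, and drop every other
        # non-alphanumeric character (underscores are reserved as separators).
        part = part.strip().lower().replace(" ", "-")
        return re.sub(r"[^a-z0-9-]", "", part)

    stem = "_".join([
        clean(project),
        clean(description),
        day.strftime("%Y%m%d"),  # dates in a set format: YYYYMMDD
        f"v{version:02d}",       # an agreed versioning scheme
    ])
    return f"{stem}.{extension.lstrip('.').lower()}"


def is_safe_filename(name: str) -> bool:
    """Check that a name contains no spaces or other unsafe characters."""
    return bool(SAFE_NAME.match(name))


if __name__ == "__main__":
    name = make_filename("Soil Survey", "Site A", date(2016, 12, 5), 2, "CSV")
    print(name)                                # soil-survey_site-a_20161205_v02.csv
    print(is_safe_filename("day sheet.xlsx"))  # False
```

Fixing the element order in one function means every file is named the same way, and putting the date in YYYYMMDD form makes names sort chronologically within a project.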

What is metadata?
Data about data.

Documentation and metadata
- Metadata: standardised, structured, machine and human readable; helps to cite and disambiguate data
- Documentation: aids reuse

Metadata standards
These can be general, such as Dublin Core, or discipline specific:
- Data Documentation Initiative (DDI): social science
- Ecological Metadata Language (EML): ecology
- Flexible Image Transport System (FITS): astronomy
Metadata is:
- provided in catalogues to aid discoverability
- structured so search engines can uncover it
- exposed in machine-readable form, e.g. XML

Dublin Core metadata example

Creator: Donald Cooper, Role=Photographer
Subject: Shakespeare, William, 1564-1616, Antony and Cleopatra [LC]
Description: Vanessa Redgrave as Cleopatra
Date: 1973-08-09
Type: Image
Format: JPEG
Identifier: 4150 [catalogue no]
Source: negative no 235
Relation: Antony and Cleopatra: Thompson/738, IsPartOf
Coverage: Bankside Globe, Role=Spatial
Rights: Donald Cooper
www.ahds.ac.uk/performingarts

Use a metadata standard
- Metadata Standards Directory: a broad, disciplinary listing of standards and tools, maintained by an RDA group
- Biosharing: a portal of data standards, databases, and policies, focused on the life, environmental and biomedical sciences
http://rd-alliance.github.io/metadata-directory

https://biosharing.org

Documentation
Can others understand the data?
- Think about what is needed in order to find, evaluate, understand, and reuse the data.
- Have you documented what you did and how?
- Did you develop code to run analyses? If so, this should be kept and shared too.
- Is it clear what each bit of your dataset means? Make sure the columns/rows are labelled, variable ranges defined, and abbreviations explained in data dictionaries.

ReadMe files
We recommend that a ReadMe be a plain text file containing the following:
- for each filename, a short description of what data it includes, optionally describing the relationship to the tables, figures, or sections within the accompanying publication
- for tabular data: definitions of column headings and row labels; data codes (including missing data); and measurement units
- any data processing steps, especially if not described in the publication, that may affect interpretation of results
- a description of what associated datasets are stored elsewhere, if applicable
- whom to contact with questions

Managing data
(Image tools CC-BY by zzpza, www.flickr.com/photos/zzpza/3269784239)

Legal and ethical issues
Be aware of legislation that applies to you:
- Offentlighetsloven (FoI)
- EIR (Environmental Information Regulations)
- Data Protection
- Health Research Act
Understand what this means in terms of how data are stored, transferred and shared.

Use appropriate services
TSD provides a platform for researchers working at UiO and in other public research institutions to collect, store and analyze sensitive research data. TSD complies with the directive on privacy and electronic communication in Norway.
www.uio.no/english/services/it/research/storage/sensitive-data/index.html

Ask for consent for data sharing
If you do not, data centres won't be able to accept the data, regardless of any conditions on the original grant.
www.data-archive.ac.uk/create-manage/consent-ethics/consent?index=3

Where will you store the data?
- Your own device (laptop, flash drive, server etc.): and if you lose it? Or it breaks?
- Departmental drives or university servers
- Cloud storage: do they care as much about your data as you do?
The decision will be based on how sensitive your data are, how robust you need the storage to be, and who needs access to the data and when.

CC image by Sharyn Morrow on Flickr
CC image by momboleum on Flickr
One copy = risk of data loss

Who will do the backup?
- Use managed services where possible (e.g. university filestores rather than local or external hard drives), so backup is done automatically.
- 3-2-1 backup! At least 3 copies of a file, on at least 2 different media, with at least 1 offsite.
- Ask your central IT team for advice.
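The 3-2-1 rule is simple enough to check mechanically. The sketch below assumes a hypothetical `Copy` record for every copy of a file (including the working copy) and treats two copies as being on different media only if their `medium` labels differ; it reports which part of the rule a backup plan breaks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a file. The field names are illustrative assumptions."""
    location: str
    medium: str    # e.g. "laptop SSD", "external HDD", "university filestore"
    offsite: bool  # kept away from the site holding the other copies?


def check_3_2_1(copies: list[Copy]) -> list[str]:
    """Return the parts of the 3-2-1 rule a plan breaks (empty list = compliant).

    `copies` should list every copy, including the one you work on.
    """
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; need at least 3")
    media = {c.medium for c in copies}  # distinct media, compared by label
    if len(media) < 2:
        problems.append(f"copies on only {len(media)} type(s) of media; need at least 2")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy; need at least 1")
    return problems


if __name__ == "__main__":
    plan = [
        Copy("office laptop", "laptop SSD", offsite=False),
        Copy("desk drawer", "external HDD", offsite=False),
    ]
    for problem in check_3_2_1(plan):
        print(problem)
```

Returning a list of problems rather than a single yes/no makes it clear what to fix; here the plan needs a third copy, and one of the copies must be offsite.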

Backup and preservation: not the same thing!
Backups
- Used to take periodic snapshots of data in case the current version is destroyed or lost
- Backups are copies of files stored for the short or near-long term
- Often performed on a somewhat frequent schedule
Archiving
- Used to preserve data for historical reference or potentially during disasters
- Archives are usually the final version, stored for the long term, and generally not copied over
- Often performed at the end of a project or at major milestones

How to keep your data secure?
Develop a practical solution that fits your circumstances:
- Store your data on managed servers
- Restrict access
- Keep anti-virus software up to date
- Encrypt mobile devices carrying sensitive information
www.wsj.com/articles/SB10001424052748703843804575534122591921594

Data sharing
(Image CC-BY-NC-ND by talkingplant, www.flickr.com/photos/talkingplant/2256485110)

The data deluge is upon us
Sensors' ability to produce data outstrips IT's ability to process it.
Globally, data volumes are doubling every two years (John Gantz and David Reinsel, 2011, Extracting Value from Chaos, www.emc.com/digital_universe)

Why not keep it all?
- Storage management costs rise long-term: hardware costs decline, but power and staff costs keep rising (David Rosenthal, blog.dshr.org/2012/05/lets-just-keep-everything-forever-in.html)
- The "storage is cheap" fallacy: decreasing hardware costs are offset by exponential growth in data volume
- Backup and mirroring multiply the cost of preserved data
- Discovery becomes harder as the chaff outweighs the wheat
- Curation of unused data is a waste of resources

Select what to keep (and share)
1. What must be kept to manage compliance risk?
2. What data could be re-used?
3. What data has value and should be kept?
4. Given costs, what will or won't be kept?
5. How will it be kept and shared, and on what terms?
www.dcc.ac.uk/resources/how-guides/appraise-select-data

Should all data be open?
NO. There are many reasons, most to do with human subjects. But the existence of data should always be open:
- it allows discovery and negotiation on use
- it avoids pointless replication

How to make data open? (https://okfn.org)
1. Choose your dataset(s). What can you make open? You may need to revisit this step if you encounter problems later.
2. Apply an open licence. Determine what IP exists and apply a suitable licence, e.g. CC-BY.
3. Make the data available. Provide the data in a suitable format and use repositories.

4. Make it discoverable. Post on the web and register in catalogues.

License research data openly
This DCC guide outlines the pros and cons of each approach and gives practical advice on how to implement your licence.
Creative Commons limitations
Horizon 2020 Open Access guidelines point to: [image]
- NC (Non-Commercial): what counts as commercial?
- ND (No Derivatives): severely restricts use
These clauses are not open licences.
www.dcc.ac.uk/resources/how-guides/license-research-data

EUDAT licensing tool
Answer questions to determine which licence(s) are appropriate to use.
http://ufal.github.io/lindat-license-selector

Deposit in a data repository
The EC guidelines point to Re3data as one of the registries that can be searched to find a home for data.
http://databib.org
http://service.re3data.org/search

www.fosteropenscience.eu/content/re3data-demo

How to select a repository?
- Look for provision from your community, university, publisher, funder etc.
- Check they match your particular data needs: e.g. formats accepted; mixture of Open and Restricted Access.
- See if they provide guidance on how to cite the deposited data.
- Do they assign a persistent and globally unique identifier, for sustainable citations and to link back to particular researchers and grants?
- Look for certification as a Trustworthy Digital Repository, with an explicit ambition to keep the data available in the long term.
www.openaire.eu/opendatapilot-repository

Norwegian repository landscape
http://www.nsd.uib.no
https://archive.norstore.no

Zenodo

Zenodo is a multidisciplinary repository that can be used for the long tail of research data.
- An OpenAIRE-CERN joint effort
- Multidisciplinary repository accepting multiple data types, publications and software
- Assigns a Digital Object Identifier (DOI)
- Links funding, publications, data and software
www.zenodo.org

What are persistent identifiers?
They are alphanumeric codes identifying a resource, organisation or individual. They must be:
- unique
- persistent
Ideally they should be actionable too.

How do persistent identifiers work?

Citing research data: why?
http://ands.org.au/cite-data
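Because persistent identifiers should ideally be actionable, a DOI (such as those Zenodo assigns) is usually cited as a link through the doi.org resolver. This sketch accepts a DOI with a `doi:` prefix or as a resolver URL and builds that link. The syntax check is a simplified pattern assumed for illustration (a `10.` prefix, a registrant code of four or more digits, a slash and a suffix); it will not validate every DOI in the wild, and the DOI in the usage line is hypothetical.

```python
import re
from urllib.parse import quote

# Simplified DOI shape (an assumption): "10.", a numeric registrant code,
# a slash, then a suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

RESOLVER_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(text: str) -> str:
    """Strip a 'doi:' prefix or a resolver URL, leaving the bare DOI."""
    text = text.strip()
    for prefix in RESOLVER_PREFIXES:
        if text.lower().startswith(prefix):
            return text[len(prefix):]
    return text


def doi_to_url(text: str) -> str:
    """Turn a DOI into an actionable https://doi.org/ link."""
    doi = normalise_doi(text)
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {text!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping '/'.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1234/example.5678"))  # hypothetical DOI
```

Storing the bare DOI and generating the link when needed keeps the identifier itself persistent even if the preferred resolver address changes.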

How to cite data
Key citation elements:
- Author
- Publication date
- Title
- Location (= identifier)
- Funder (if applicable)
www.dcc.ac.uk/resources/briefing-papers/introductioncuration/data-citation-and-linking

Resources
(Image "Energy Resources | Energie Quelle" CC-BY-NC by K. H. Reichert, www.flickr.com/photos/reupa/19502634575)

Managing and sharing data: a best practice guide
http://data-archive.ac.uk/media/2894/managingsharing.pdf

Guidance and training resources
- ESIP Data Management Training clearing house for environmental sciences: http://dmtclearinghouse.esipfed.org
- DataONE best practices: www.dataone.org/best-practices
- DCC resources: http://www.dcc.ac.uk/resources
- FOSTER open science portal: https://www.fosteropenscience.eu
- Acquire research data skills: http://datalib.edina.ac.uk/mantra

Tools for managing data: www.dcc.ac.uk/resources/external/tools-services/managing-active-research-data

Also look for national and local support!
www.uio.no/english/foremployees/support/research/research-data
www.openaire.eu/contact-noads

Finally
- Well-managed data makes your research easier, now and in the future
- Well-managed data is easier to share and more likely to be re-used
- Sharing data is good for you, and it's good for all of us
- It isn't as hard as you think: we're here to show you how!

How do you share data effectively?
- Use appropriate repositories; this catalogue is a good place to start: http://www.re3data.org
- Document and describe it enough for others to understand, use and cite: http://www.dcc.ac.uk/resources/how-guides/cite-datasets
- Licence it so others can reuse it
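The key citation elements listed under "How to cite data" (author, publication date, title, location/identifier and, where applicable, funder) can be kept together in a small record and rendered as a citation string. The layout below is one possible style assumed for illustration, not a DCC or repository standard; use whatever citation format your repository or publisher recommends. The class name, field names, and every value in the usage example (authors, title, funder, DOI) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Key citation elements for a dataset; field names are illustrative."""
    authors: list[str]
    year: int                    # publication date (year only, for brevity)
    title: str
    identifier: str              # the 'location', ideally a persistent identifier
    funder: Optional[str] = None  # include only if applicable

    def render(self) -> str:
        """Render one possible citation layout (an assumption, not a standard)."""
        parts = [f"{'; '.join(self.authors)} ({self.year}). {self.title}."]
        if self.funder:
            parts.append(f"Funded by {self.funder}.")
        parts.append(self.identifier)
        return " ".join(parts)


if __name__ == "__main__":
    example = DatasetCitation(
        authors=["Doe, J.", "Roe, R."],                      # hypothetical
        year=2016,
        title="Example survey dataset",                      # hypothetical
        identifier="https://doi.org/10.1234/example.5678",   # hypothetical DOI
        funder="Example Research Council",                   # hypothetical
    )
    print(example.render())
```

Putting the identifier last, as a full resolver link, keeps the citation actionable when it is pasted into a reference list.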

Thanks for listening
For DCC resources see: www.dcc.ac.uk/resources
Follow us on Twitter: @digitalcuration and #ukdcc
