Preservation and Curation of ETD Research Data and Complex Digital Objects
DATA ORGANIZATION

Welcome and Workshop Background
Instructors
- Gabrielle V. Michalek, Director of Connected Scholarship
Purpose
- Provide you with resources and tools to help you address the challenges and opportunities data organization methods pose and provide for you as a researcher, particularly regarding your research outputs.

Learning Objectives: Students
- Understand options for data management and data organization.
- Gain exposure to techniques and resources you may use to ensure your data will be readable and understandable in the future.
- Understand where to look for field-specific analysis methods, services, tools, and repositories.

Workshop and Guidance Briefs - Topics
- Copyright
- Data Organization
- File Formats
- Metadata
- Storage
- Version Control
https://educopia.org/deliverables/etdplus-guidance-briefs

Key Takeaway
The decisions you make about how you organize and structure your data today will have implications for how you and others can access and make use (or sense!) of that data in the future.

Why is data hard to deal with?
- Data without data documentation (e.g., a data dictionary) is often impossible to understand.
- Without access to specific (often expensive) software, a data file may be impossible to view or use.
- IRB and funder requirements may impact the way you need to structure your data.
- As data usage increases, data often needs to be interoperable in order to enable sharing and reuse.

Questions to ask repeatedly!
- What are the data organization standards for your field?
- What are the data export options in the software you are using?
- What forms of the data will be needed for future access?

Structuring your data well enables you to:
- Reproduce results
- Reuse it in the future
- Share it with others
- Gain and retain credibility
- Comply with IRB/funder requirements

Providing Context for Your Data
Document:

- The data's purpose
- A list of the files in your data package
- A data dictionary listing and describing all variables

Data Organization Principles
- Use one variable per column.
- Make one observation per row.
- Use human-readable column names.
- Include one table per tab.
- Indicate relationships between tables using a key.

Movie Title | Director                        | Distributor         | Running Time | Budget  | Released
------------+---------------------------------+---------------------+--------------+---------+------------
Peter Pan   | Herbert Brenon                  | Paramount Pictures  | 105 minutes  | 40,030  | Dec 29 1924
Girl Shy    | Fred C. Newmeyer and Sam Taylor | Pathe Exchange      | 82 minutes   | 400,000 | Apr 20 1924
Greed       | Eric Von Stroheim               | Metro-Goldwyn-Mayer | 140 minutes  | 665,603 | Dec 4 1924

Additional Principles
Do:
- Consider what your NULL values are and how they are represented
- Consider what contextual documentation is required
- Use standard data representations (e.g., YYYYMMDD for dates)
Do Not:
- Use formatting to convey information
- Place comments in cells
- Use special characters in field names
- Use blank spaces or symbols in column names
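One way these principles might be applied to the film table is sketched below in Python. It is an illustration only, not part of the workshop materials: the column names, the `movie_id` key, and the choice to split directors into a second table are all assumptions made for the example. The slide gives no currency for the budget figures, so none is assumed.

```python
import csv
import io
from datetime import datetime

# Rows copied from the example table on the slide.
RAW_ROWS = [
    ("Peter Pan", "Herbert Brenon", "Paramount Pictures", "105 minutes", "40,030", "Dec 29 1924"),
    ("Girl Shy", "Fred C. Newmeyer and Sam Taylor", "Pathe Exchange", "82 minutes", "400,000", "Apr 20 1924"),
    ("Greed", "Eric Von Stroheim", "Metro-Goldwyn-Mayer", "140 minutes", "665,603", "Dec 4 1924"),
]


def normalize(rows):
    """Return (movies, directors): two tables linked by a hypothetical movie_id key."""
    movies, directors = [], []
    for movie_id, row in enumerate(rows, start=1):
        title, director, distributor, runtime, budget, released = row
        movies.append({
            "movie_id": movie_id,
            "title": title,
            "distributor": distributor,
            # The unit lives in the column name, so the cell holds a plain number.
            "running_time_min": int(runtime.split()[0]),
            # Thousands separators are formatting, not data.
            "budget": int(budget.replace(",", "")),
            # YYYYMMDD, the date representation recommended above.
            "release_date": datetime.strptime(released, "%b %d %Y").strftime("%Y%m%d"),
        })
        # One director per row; movie_id ties each director back to a film.
        for name in director.split(" and "):
            directors.append({"movie_id": movie_id, "director": name})
    return movies, directors


def to_csv(records):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


movies, directors = normalize(RAW_ROWS)
print(to_csv(movies))
print(to_csv(directors))
```

Splitting "Fred C. Newmeyer and Sam Taylor" into two rows keeps one value per cell, and the shared key is what lets the two tables be joined again later.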

Discipline-based data repositories:
- Social Sciences: ICPSR http://www.icpsr.umich.edu/icpsrweb/deposit/index.jsp
- Genomics: GenBank https://www.ncbi.nlm.nih.gov/genbank/
- Earth Sciences: NASA's Earthdata https://earthdata.nasa.gov/
- Archaeology: tDAR http://www.tdar.org/
- Oceanography: NODC http://www.nodc.noaa.gov/
- BioSciences: Dryad https://datadryad.org/

Carnegie Mellon data repository and tools:
- Kilthub University Institutional Repository https://kilthub.figshare.com/
- Open Science Framework - free and open source tool that can be used for managing projects and collaborations in any discipline https://www.library.cmu.edu/about/publications/news/open-science-framework

Version control is all about PROCESS.

Version Control

OK:
image1_v1.jpg
image1_v2.jpg
image2_v1.jpg
image2_v2.jpg
...

OOPS:
image1_v1.jpg
image1_v10.jpg
image1_v2.jpg
...

Better:
image1_20151021
image1_20151214
image1_20160123
...
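The "OOPS" ordering comes from the way file browsers and most tools sort names character by character, so `v10` lands before `v2`. The short Python sketch below reproduces that behavior and shows two ways around it: zero-padded version numbers and YYYYMMDD date stamps. The helper `zero_pad_version` is a hypothetical name introduced for this illustration, and it assumes versions are written as a trailing `_v<number>` before the extension.

```python
import re


def zero_pad_version(filename, width=2):
    """Rewrite a trailing _v<N> marker so the number is zero-padded to `width` digits."""
    return re.sub(
        r"_v(\d+)(?=\.[^.]+$)",
        lambda m: f"_v{int(m.group(1)):0{width}d}",
        filename,
    )


names = ["image1_v1.jpg", "image1_v2.jpg", "image1_v10.jpg"]

# Character-by-character sorting puts v10 between v1 and v2.
print(sorted(names))

# With padding, alphabetical order and version order agree.
padded = sorted(zero_pad_version(name) for name in names)
print(padded)

# YYYYMMDD stamps sort chronologically without any extra work.
dated = ["image1_20160123", "image1_20151021", "image1_20151214"]
print(sorted(dated))
```

Padding only helps up to the chosen width (two digits covers 99 versions), which is one reason date stamps are often the more robust choice.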

Version Control
Collaborative Documents
dataset1_20160402_KES
dataset1_20160301_WTC
dataset1_20160814_GSC

Resources
- MATRIX at Michigan State University gives file naming advice: http://ohda.matrix.msu.edu/2012/08/file-naming-in-the-digital-age
- Udacity offers a free online course on using Git and GitHub: https://www.udacity.com/course/how-to-use-git-and-github--ud775
- Hello World offers another helpful GitHub guide: https://guides.github.com/activities/hello-world/
- Version Control with Subversion is a free book authored by Subversion software developers: http://svnbook.red-bean.com/

Data Organization

Structuring your data well enables you to:
- Reproduce results
- Reuse it in the future
- Share it with others
- Gain and retain credibility
- Comply with IRB/funder requirements

The decisions you make about how you organize and structure your data today will have implications for how you and others can access and make use (or sense!) of that data in the future.

Whether your data is organized in lists, arrays, hash sets, dictionaries, queues, trees, heaps, or relational databases, it is important to be aware of disciplinary norms, as well as both institutional and funder requirements, that will make its deposit, storage, and long-term support more likely. Increasingly, the path for long-term support involves taking steps to make sure your data is deposited alongside data collected by others in your field or discipline.

Context and Data Documentation:
Include the following in a readme text file:
1. The data's purpose
2. A list of the files in your data package
3. Data dictionary listing and describing all variables

Data Organization Principles:
4. Use one variable per column
5. Make one observation per row
6. Use human-readable column names
7. Include one table per tab
8. Include an ID or key to indicate any relationship between tables

Do:
- Consider what your NULL values are and how they are represented
- Consider what data documentation is required
- Use standard data representations (e.g., YYYYMMDD for dates)
Do Not:
- Use formatting to convey information
- Place comments in cells
- Use special characters in field names
- Use blank spaces or symbols in column names

Questions to consider for any data project:
1. What are the data organization standards for your field?
2. What are the data export options for your software?
3. What forms of the data will be needed for future access?

Discipline-based data repository examples:
- Social Sciences: ICPSR
- Genomics: GenBank
- Earth Sciences: NASA's Earthdata
- Archaeology: tDAR
- Oceanography: NODC
- BioSciences: Dryad

The DataONE Best Practices database provides individuals with recommendations on how to effectively work with their data through all stages of the data lifecycle. https://www.dataone.org/best-practices

Source - Guidance Briefs: Managing Your ETD Research Files

Version Control

Version Control: The process of managing changes to your files over time (also known as revision control or source control).

Manual Version Control
A simple method is to store the current revision number at the end of the file name. This way, files can be grouped by their names and sorted by version number:
filename-v01.jpg
filename-v02.jpg

You can also use dates to designate version numbers, using year-month-day (20150930) to help your computer sort versions in chronological order:
filename-20160402.jpg
filename-20160407.jpg

If the files you are using are created or edited collaboratively, incorporate names or initials so you know who updated which version:
filename-20160402-KES.jpg
filename-20160407-WTC.jpg

Software-Assisted Version Control
There are also software tools that can help you version your content. These tools store your content in such a way that they can remember its state from revision to revision. Usually, they also allow you to check in and check out your content, ensuring that revisions never happen simultaneously in two different locations (e.g., if collaborating researchers both attempt to revise the same file at the same time, or a researcher unwittingly tries to revise the same file on two different machines).

Key differences between these software-assisted methods and the manual methods include:
1. You can only view and edit the working version of a file.
2. When you change a file, you can save a revision and attach a short summary of your changes.

Research is active and iterative. You will edit and re-edit your research materials many times before finishing your thesis or dissertation. How will you know that you are working with the most current revision of your materials?

Resources (For more information)
- The digital humanities center MATRIX (Michigan State University) provides broadly applicable advice, based on oral history projects, on how to structure file names: http://ohda.matrix.msu.edu/2012/08/file-naming-in-the-digital-age
- Udacity offers a free online course on how to use Git and GitHub, with interactive exercises to familiarize you with the tools: https://www.udacity.com/course/how-to-use-git-and-github--ud775
- Another helpful GitHub guide is available from Hello World: https://guides.github.com/activities/hello-world/
- The Subversion community provides free access to the book Version Control with Subversion: http://svnbook.red-bean.com/

Source - Guidance Briefs: Managing Your ETD Research Files

Activity
Choose one spreadsheet you are using for a current data-gathering project.
- Use the Data Organization Principles and check to see if your file meets those requirements.
- Create a data dictionary for the spreadsheet that describes the meaning of each column header.
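As a possible starting point for the activity, the Python sketch below reads the header row of a spreadsheet exported as CSV and produces an empty data dictionary with one entry per column. Everything here is an assumption made for illustration: the function name `data_dictionary_skeleton`, the fields in each entry, and the sample data are hypothetical, and the descriptions still have to be written by hand.

```python
import csv
import io


def data_dictionary_skeleton(csv_text):
    """Return one entry per column header, with blanks left for the researcher to fill in."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    entries = []
    for index, column in enumerate(header):
        values = [row[index] for row in rows if index < len(row)]
        entries.append({
            "column": column,
            "description": "",            # what the variable means
            "units": "",                  # e.g. minutes
            "null_representation": "",    # how missing values are written
            # First non-empty value, as a reminder of what the column holds.
            "example_value": next((v for v in values if v != ""), ""),
            # True when the name contains anything other than letters, digits, or underscores.
            "needs_renaming": not column.replace("_", "").isalnum(),
        })
    return entries


# Hypothetical sample: the third header breaks the "no blank spaces or symbols" rule.
sample = "movie_id,title,running time (min)\n1,Peter Pan,105\n2,Girl Shy,82\n"
for entry in data_dictionary_skeleton(sample):
    print(entry)
```

The `needs_renaming` flag covers only the column-name rule; the other principles, such as one observation per row, still need a manual check.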
