Preservation and Curation of ETD Research Data and Complex ...

Preservation and Curation of ETD Research Data and Complex Digital Objects

DATA ORGANIZATION

Welcome and Workshop Background

Instructors
- Gabrielle V. Michalek, Director of Connected Scholarship

Purpose
- Provide you with resources and tools to help you address the challenges and opportunities that data organization methods present to you as a researcher, particularly regarding your research outputs.

Learning Objectives: Students
- Understand options for data management and data organization.
- Gain exposure to techniques and resources you may use to ensure your data will be readable and understandable in the future.
- Understand where to look for field-specific analysis methods, services, tools, and repositories.

Workshop and Guidance Briefs - Topics
- Copyright
- Data Organization
- File Formats
- Metadata
- Storage
- Version Control
https://educopia.org/deliverables/etdplus-guidance-briefs

Key Takeaway
The decisions you make about how you organize and structure your data today will have implications for how you and others can access and make use (or sense!) of that data in the future.

Why is data hard to deal with?
- Data without data documentation (e.g., a data dictionary) is often impossible to understand.
- Without access to specific (often expensive) software, it may be impossible to view or use a data file.
- IRB and funder requirements may impact the way you need to structure your data.
- As data usage increases, data often needs to be interoperable in order to enable sharing and reuse.

Questions to ask repeatedly!
- What are the data organization standards for your field?
- What are the data export options in the software you are using?
- What forms of the data will be needed for future access?

Structuring your data well enables you to:
- Reproduce results
- Reuse it in the future
- Share it with others
- Gain and retain credibility
- Comply with IRB/funder requirements

Providing Context for Your Data
Document:

- The data's purpose
- A list of the files in your data package
- A data dictionary listing and describing all variables

Data Organization Principles
- Use one variable per column.
- Make one observation per row.
- Use human-readable column names.
- Include one table per tab.
- Indicate relationships between tables using a key.

Movie Title | Director                        | Distributor         | Running Time | Budget  | Released
Peter Pan   | Herbert Brenon                  | Paramount Pictures  | 105 minutes  | 40,030  | Dec 29 1924
Girl Shy    | Fred C. Newmeyer and Sam Taylor | Pathe Exchange      | 82 minutes   | 400,000 | Apr 20 1924
Greed       | Eric Von Stroheim               | Metro-Goldwyn-Mayer | 140 minutes  | 665,603 | Dec 4 1924

Additional Principles
Do:

- Consider what your NULL values are and how they are represented
- Consider what contextual documentation is required
- Use standard data representations (e.g., YYYYMMDD for dates)

Do Not:
- Use formatting to convey information
- Place comments in cells
- Use special characters in field names
- Use blank spaces or symbols in column names
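As an illustration of these principles, here is a minimal Python sketch that restructures the film table from the earlier slide. It is one possible approach, not a prescribed method. The column names (`movie_id`, `running_time_min`, `released`), the separate `directors` table, and the helper functions are assumptions made for this example and do not come from the workshop material.

```python
import re
from datetime import datetime

# Rows as they appear in the example table (all values copied as text).
raw_rows = [
    ("Peter Pan", "Herbert Brenon", "Paramount Pictures", "105 minutes", "40,030", "Dec 29 1924"),
    ("Girl Shy", "Fred C. Newmeyer and Sam Taylor", "Pathe Exchange", "82 minutes", "400,000", "Apr 20 1924"),
    ("Greed", "Eric Von Stroheim", "Metro-Goldwyn-Mayer", "140 minutes", "665,603", "Dec 4 1924"),
]


def clean_column_name(name):
    """Lower-case a header and replace blank spaces or symbols with underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def to_yyyymmdd(text):
    """Convert a date such as 'Dec 29 1924' to the standard form '19241229'."""
    return datetime.strptime(text, "%b %d %Y").strftime("%Y%m%d")


movies = []     # one observation (film) per row
directors = []  # one row per (film, director) pair, linked to movies by movie_id

for movie_id, row in enumerate(raw_rows, start=1):
    title, director, distributor, runtime, budget, released = row
    movies.append({
        "movie_id": movie_id,                          # key shared by both tables
        "title": title,
        "distributor": distributor,
        "running_time_min": int(runtime.split()[0]),   # unit moved into the column name
        "budget": int(budget.replace(",", "")),        # digits only, no formatting
        "released": to_yyyymmdd(released),
    })
    # A cell holding two directors becomes two rows in a separate table.
    for name in director.split(" and "):
        directors.append({"movie_id": movie_id, "director": name})

print(clean_column_name("Running Time"))
print(movies[1])
print(directors)
```

Splitting the directors into their own table keeps one value per cell, and the shared `movie_id` key records which director belongs to which film.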

Discipline-based data repositories:
- Social Sciences: ICPSR http://www.icpsr.umich.edu/icpsrweb/deposit/index.jsp
- Genomics: GenBank https://www.ncbi.nlm.nih.gov/genbank/
- Earth Sciences: NASA's Earthdata https://earthdata.nasa.gov/
- Archaeology: tDAR http://www.tdar.org/
- Oceanography: NODC http://www.nodc.noaa.gov/
- BioSciences: Dryad https://datadryad.org/

Carnegie Mellon data repository and tools:
- Kilthub University Institutional Repository https://kilthub.figshare.com/
- Open Science Framework: a free and open source tool that can be used for managing projects and collaborations in any discipline https://www.library.cmu.edu/about/publications/news/open-science-framework

Version control is all about PROCESS.

Version Control

OK
image1_v1.jpg
image1_v2.jpg
image2_v1.jpg
image2_v2.jpg
...

OOPS
image1_v1.jpg
image1_v10.jpg
image1_v2.jpg
...

Better
image1_20151021
image1_20151214
image1_20160123
...
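The "OOPS" ordering happens because file names are usually sorted as text, character by character, so `v10` lands before `v2`. The short Python sketch below reproduces the problem and shows two possible remedies: zero-padding the version number, or using dates in YYYYMMDD form. The `pad_version` helper and its default width of two digits are assumptions made for illustration.

```python
import re

names = ["image1_v1.jpg", "image1_v2.jpg", "image1_v10.jpg"]

# Plain text sorting compares character by character, so "v10" sorts before "v2".
text_order = sorted(names)
print(text_order)


def pad_version(filename, width=2):
    """Zero-pad the number after '_v' so that text order matches version order."""
    return re.sub(
        r"_v(\d+)",
        lambda m: f"_v{int(m.group(1)):0{width}d}",
        filename,
    )


padded_order = sorted(pad_version(n) for n in names)
print(padded_order)

# Dates written as YYYYMMDD already sort chronologically as plain text.
dated = ["image1_20160123", "image1_20151021", "image1_20151214"]
dated_order = sorted(dated)
print(dated_order)
```

Either remedy works with any tool that sorts names alphabetically, which is why the date-based names appear under "Better".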

Version Control

Collaborative Documents
dataset1_20160402_KES
dataset1_20160301_WTC
dataset1_20160814_GSC

Resources
- MATRIX at Michigan State University gives file naming advice: http://ohda.matrix.msu.edu/2012/08/file-naming-in-the-digital-age
- Udacity offers a free online course on using Git and GitHub: https://www.udacity.com/course/how-to-use-git-and-github-ud775
- GitHub's Hello World guide offers another helpful introduction: https://guides.github.com/activities/hello-world/
- Version Control with Subversion is a free book authored by Subversion software developers: http://svnbook.red-bean.com/

Data Organization

The decisions you make about how you organize and structure your data today will have implications for how you and others can access and make use (or sense!) of that data in the future.

Structuring your data well enables you to:
- Reproduce results
- Reuse it in the future
- Share it with others
- Gain and retain credibility
- Comply with IRB/funder requirements

Whether your data is organized in lists, arrays, hash sets, dictionaries, queues, trees, heaps, or relational databases, it is important to be aware of disciplinary norms, as well as both institutional and funder requirements, that will make its deposit, storage, and long-term support more likely. Increasingly, the path for long-term support involves taking steps to make sure your data is deposited alongside data collected by others in your field or discipline.

Context and Data Documentation:
Include the following in a readme text file:
1. The data's purpose
2. A list of the files in your data package
3. Data dictionary listing and describing all variables

Data Organization Principles:
4. Use one variable per column
5. Make one observation per row
6. Use human-readable column names
7. Include one table per tab
8. Include an ID or key to indicate any relationship between tables

Do:
- Consider what your NULL values are and how they are represented
- Consider what data documentation is required
- Use standard data representations (e.g., YYYYMMDD for dates)

Do Not:
- Use formatting to convey information
- Place comments in cells
- Use special characters in field names
- Use blank spaces or symbols in column names

Questions to consider for any data project:
1. What are the data organization standards for your field?
2. What are the data export options for your software?
3. What forms of the data will be needed for future access?

Discipline-based data repository examples:
- Social Sciences: ICPSR
- Genomics: GenBank
- Earth Sciences: NASA's Earthdata
- Archaeology: tDAR
- Oceanography: NODC
- BioSciences: Dryad

The DataONE Best Practices database provides individuals with recommendations on how to effectively work with their data through all stages of the data lifecycle. https://www.dataone.org/best-practices

Source - Guidance Briefs: Managing Your ETD Research Files

Version Control

Version Control: The process of managing changes to your files over time (also known as revision control or source control).

Manual Version Control
A simple method is to record the current revision number at the end of the file name. This way, files can be grouped by their names and sorted by version number:
filename-v01.jpg
filename-v02.jpg

You can also use dates to designate version numbers, using year-month-day (20150930) to help your computer sort versions in chronological order:
filename-20160402.jpg
filename-20160407.jpg

If the files you are using are created or edited collaboratively, incorporate names or initials so you know who updated which version:
filename-20160402-KES.jpg
filename-20160407-WTC.jpg

Software-Assisted Version Control
There are also software tools that can help you version your content. These tools store your content in such a way that they can remember its state from revision to revision. Usually, they also allow you to check in and check out your content, ensuring that revisions never happen simultaneously in two different locations (e.g., if collaborating researchers both attempt to revise the same file at the same time, or a researcher unwittingly tries to revise the same file on two different machines).

Key differences between these software-assisted methods and the manual methods include:
1. You can only view and edit the working version of a file.
2. When you change a file, you can save a revision and attach a short summary of your changes.

Research is active and iterative. You will edit and re-edit your research materials many times before finishing your thesis or dissertation. How will you know that you are working with the most current revision of your materials?

Resources (For more information)
- The digital humanities center MATRIX (Michigan State University) provides broadly applicable advice, based on oral history projects, on how to structure file names: http://ohda.matrix.msu.edu/2012/08/file-naming-in-the-digital-age
- Udacity offers a free online course on how to use Git and GitHub, with interactive exercises to familiarize you with the tools: https://www.udacity.com/course/how-to-use-git-and-github--ud775
- GitHub's Hello World guide offers another helpful introduction: https://guides.github.com/activities/hello-world/
- The Subversion community provides free access to the book Version Control with Subversion: http://svnbook.red-bean.com/

Activity
Choose one spreadsheet you are using for a current data-gathering project.
- Use the Data Organization Principles and check to see if your file meets those requirements.
- Create a data dictionary for the spreadsheet that describes the meaning of each column header.
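One possible way to start the data dictionary is to generate a blank template from the spreadsheet's column headers and then fill in the descriptions by hand. The Python sketch below assumes the spreadsheet has been exported as CSV. The sample data, the field names (`description`, `units_or_format`, `null_value`), and both helper functions are hypothetical choices for this illustration.

```python
import csv
import io

# Fields of the data dictionary template (an assumed, adjustable layout).
FIELDS = ["column", "description", "units_or_format", "null_value"]


def data_dictionary_skeleton(csv_text):
    """Return one entry per column header, with blank fields to fill in by hand."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [
        {"column": name, "description": "", "units_or_format": "", "null_value": ""}
        for name in header
    ]


def dictionary_as_csv(entries):
    """Serialize the data dictionary entries as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return out.getvalue()


# Hypothetical CSV export of a spreadsheet; replace with your own file contents.
sample = "movie_id,title,running_time_min,released\n1,Peter Pan,105,19241229\n"

entries = data_dictionary_skeleton(sample)
# Descriptions are written by the researcher, for example:
entries[3]["description"] = "Release date of the film"
entries[3]["units_or_format"] = "YYYYMMDD"

print(dictionary_as_csv(entries))
```

Saving the resulting text alongside the data, for example in the readme file, keeps the meaning of each column header with the spreadsheet it describes.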
