WAS to Archive-It Metadata Migration - Wiki@UCSF

WAS to Archive-It Metadata Migration
March 11, 2015

WAS -> Archive-It

WAS: 3 levels of hierarchy
  Project/Archive
    Site (can contain 1 or more Seed URLs)
      Seed URL

Archive-It: 2 levels of hierarchy
  Collection
    Seed URL

[Diagram: five example sites, with 2, 1, 1, 1, and 2 Seed URLs per Site]
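As a rough illustration of how the three WAS levels collapse into Archive-It's two, the sketch below turns a project's sites into a flat list of seeds, with every seed receiving a copy of its site's metadata. The `Site` class, the `flatten_project` function, and the example URLs and titles are all hypothetical names made up for this sketch; this is not how either system implements the move.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A WAS site (hypothetical model): shared metadata plus one or more seed URLs."""
    metadata: dict
    seed_urls: list


def flatten_project(sites):
    """Return one record per seed URL; each seed gets its own copy of the site metadata."""
    seeds = []
    for site in sites:
        for url in site.seed_urls:
            seeds.append({"url": url, "metadata": dict(site.metadata)})
    return seeds


# Made-up example project: one site with two seeds, one site with one seed.
project = [
    Site({"title": "Site A"}, ["http://a.example/", "http://a.example/blog/"]),
    Site({"title": "Site B"}, ["http://b.example/"]),
]

collection = flatten_project(project)
for seed in collection:
    print(seed["url"], seed["metadata"])
```

Two sites with three seed URLs in total become three seeds in the collection, and both seeds of "Site A" carry the same metadata values.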

Multiple seeds flatten out: each Seed URL gets all of the Site metadata.

BEFORE starting
Delete sites (seeds) that you have never captured, or that you captured but whose captures you have all deleted. These are probably listed under never-captured or inactive sites.

How to move

Move project (collection) by project (collection). When you sit down, start and finish the move of one project. You don't have to do all projects/collections in one day.

Run two reports (Administration > Project Admin)
1. Click Archive-It Seed Export > Export Seeds
2. Click Archive-It Seed Metadata Export > Export Metadata
Coming soon in your accounts.

Export Seeds

Seeds export from WAS
The export is in .txt format; open it with Notepad.
Your seeds will be segmented by crawl frequency, e.g., seeds with a custom schedule of 1x per year.
You will copy and paste URLs from the .txt document and upload them in chunks by frequency.
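If you would rather not pick the chunks out by hand, the sketch below groups the URLs in a seeds export by the heading they appear under. It rests on an assumption about the file layout that the slides do not confirm: each frequency group starts with a heading line that is not a URL, followed by one seed URL per line beginning with http:// or https://. The function name, the sample headings, and the sample URLs are hypothetical; check your own export before relying on this.

```python
def split_by_frequency(text):
    """Group seed URLs under the most recent non-URL heading line (assumed layout)."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        if line.lower().startswith(("http://", "https://")):
            if current is None:
                current = "(no heading)"  # URLs that appear before any heading
            groups.setdefault(current, []).append(line)
        else:
            current = line  # treat any other line as a new group heading
            groups.setdefault(current, [])
    return groups


# Made-up sample in the assumed layout; real headings may be worded differently.
sample = """Historical seeds
http://old.example.org/

Seeds with custom schedule of 1x per year
http://example.org/
http://example.org/reports/
"""

chunks = split_by_frequency(sample)
for heading, urls in chunks.items():
    print(heading)
    print("\n".join(urls))
    print()
```

Each printed group can then be pasted into Archive-It as one upload with the equivalent frequency.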

[Figure: example text file]
Consult the WAS to Archive-It mapping document to decide on the equivalent frequency:
https://wiki.library.ucsf.edu/download/attachments/351243364/MappingofWAStoArchive-It.pdf?version=1&modificationDate=1422304077000&api=v2

Create a Collection (project) in Archive-It
Create Collection (aka Project)

Select frequency: for now leave it at OneTime, then click Next.
Enter Collection-level metadata. This metadata displays on the public site. You can go back and fully enter it later.
Topics will appear on the public site (along with any Subjects you have).
[Figure: example display]

In order to create a collection, you must upload seeds.

If you have Historical seeds, upload those FIRST(!)
Historical sites/seeds are seeds where the seed URL has changed over the life of the captures. They will be at the top of your seeds .txt document.
Do these first because it is easiest to do a bulk edit and select Deactivate.
[Figure: example seeds list with Historical Seeds]

Copy and paste seeds from the .txt file into the box. Leave Default selected > Next.

VERY important:
1. Ignore this error for ALL your seed uploads.
2. "URL is correct; use as is" MUST be checked regardless of the error you see. If it is not selected for any seeds, go through now and change it for all instances.
[Figure: another example; click "URL is correct; use as is" for all]

Collection created

Bulk Edit Historical Seeds (where applicable)
Under Seed Management, click All.
Click the top box to select all. Note: this selects only what is displayed; if there are more than 400 items, the rest are on another page and you will have to repeat these steps there.
Click Bulk Edit.
Choose Deactivate.
Go back to Bulk Edit > Add Metadata.
Suggestion: add a Notes field, if you don't already have one, where you note that these are historical seeds. You will most likely never want to crawl these again, so you may want to keep track of them.
Add a custom field.

Go back to Collection Management and repeat for the next frequency in your seed list.

Back to Seeds .txt file
Leave as one-time; they will not crawl until you say Crawl Now.
Copy and paste seeds into the box. Leave Default selected > Next.
For this case, choose Quarterly.

Import metadata
Click ALL seeds > Import Metadata.
Upload the metadata file > Upload File (leave the default setting).
You could stop here and do the cleanup on a later day.

Metadata cleanup

If there is a WAS field that is not in Archive-It, Archive-It creates a custom field on import. All fields will display in the public interface by default.
The following fields may be in your upload, but they should ALL be made private: Note, Scope, Robots honored, Max crawl seconds, Capture frequency, Seed type, Site ID

How to make fields private in Archive-It:
1. Go to Admin (link in the upper right corner)
2. Account Settings
3. In the text box toward the bottom of the page called 'Private Metadata Fields', enter all these fields: Note, Scope, Robots honored, Max crawl seconds, Capture frequency, Seed type, Site ID
4. NB: Enter each field name on a separate line, in all lower case letters.

Scope > Seed Type

What about Directory only? What about Page only?
NB: Archive-It offers a lot of additional scoping options for crawls. View: Help Documentation (linked at the top right of the collection page).

Directory is not a separate scoping option in Archive-It (it is handled through the trailing slash, /). NO action is needed by you, except to QA.

WAS Directory crawls
Rosalie.com/presentations        We will add the ending slash for you if you didn't
Rosalie.com/presentations/       It moves over as is
Rosalie.com/presentations.html   It will crawl as host

What about page only crawls?

For Page only, you will have to manually go back and change the crawl scope (seed type).
You can find these by opening the metadata export. It is in .ods format, which you can open in Google Docs, in most versions of Excel, or in OpenOffice, which you can download.
Do NOT edit the .ods file before doing the metadata upload; make a copy. Then sort the scope column to find the relevant URLs.
How to change it: Page: click on Settings > Crawl one page only (can also be bulk edited).
Change Frequency under Settings > Seed Type
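The scope checks described above can be sketched in a few lines, as a QA aid rather than a replacement for reviewing the seeds in Archive-It. The sketch assumes you saved a copy of the .ods metadata export as CSV; the column names `url` and `scope`, the scope values "Directory only" and "Page only", and the `Rosalie.com/about.html` row are hypothetical and should be adjusted to whatever your export actually contains. The file-extension test in `directory_outcome` is a simple heuristic invented for this sketch to mirror the three Rosalie.com examples; it is not Archive-It's actual rule.

```python
import csv
import io


def directory_outcome(url):
    """Guess what happens to a WAS directory seed, following the three slide examples."""
    path = url.split("://", 1)[-1]  # drop the scheme if one is present
    if path.endswith("/"):
        return "moves over as is"
    last_segment = path.rsplit("/", 1)[-1]
    if "/" in path and "." in last_segment:
        # Heuristic: a last segment like presentations.html looks like a file.
        return "crawls as host"
    return "trailing slash added"


# Made-up sample standing in for a CSV copy of the metadata export.
sample_csv = """url,scope
Rosalie.com/presentations,Directory only
Rosalie.com/presentations/,Directory only
Rosalie.com/presentations.html,Directory only
Rosalie.com/about.html,Page only
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Page-only seeds need a manual change to "Crawl one page only".
page_only = [r["url"] for r in rows if r["scope"].strip().lower() == "page only"]

# Directory seeds need no action, but are worth a QA look.
directory_report = {
    r["url"]: directory_outcome(r["url"])
    for r in rows
    if r["scope"].strip().lower() == "directory only"
}

print("Change to one-page crawl:", page_only)
for url, outcome in directory_report.items():
    print(url, "->", outcome)
```

Working from a CSV copy also keeps the original .ods file untouched for the metadata upload.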

When will my crawls start?
When you start them.

When do I shut off WAS crawls?
FIRST set up your crawls in Archive-It.
Make sure daily crawls are running.
Then you can stop your WAS crawls.
VERY important: Do NOT make any edits to WAS data, crawls, ANYTHING once you have moved a project to Archive-It!

Batch shut off crawling in WAS
Sites > Manage Sites > all > select all > Reschedule Selected
Select Off and click Reschedule.

Send CDL your info

After you have created all your collections:
1. Send Rosalie this info for each collection:
   a) Collectionid
   b) Accountid
AND
2. Add Rosalie as a user to your account (for now)

CollectionId and AccountId in URL

Where's my data?
Archive-It will work with CDL staff to move over your data.
Timeline: May/June 2015

Resources
WAS Archive-It Migration wiki: https://wiki.library.ucsf.edu/display/UCLCKG/WAS+-%3E+Archive-it+Migration
Mapping of terms and metadata, WAS - Archive-It: https://wiki.library.ucsf.edu/download/attachments/351243364/MappingofWAStoArchive-It.pdf?version=1&modificationDate=1422304077000&api=v2
