Interlinking and annotating (parts of) images

Heidelberg Research Architecture: Status and perspectives
  Workshop: LitLink: A Cue Card System in a Research Environment of Collaborative Work, Online Publishing and GIS, Heidelberg, February 25, 2010
  Peter Gietz [email protected]

Agenda
  Introduce the Cluster
  Challenges for the HRA
  HRA Databases and projects
  How we integrate different databases and services
  Some remarks on possible LitLink integration

The Cluster of Excellence "Asia and Europe in a Global Context"
  Part of the German Federal Excellence Initiative "to establish internationally visible, competitive research and training facilities"
  A cluster of over 60 interdisciplinary projects
  Analysing cultural interactions between Asia and Europe
  Includes a virtual research infrastructure called Heidelberg Research Architecture (HRA)

The Cluster of Excellence
  Has 4 Research Areas:
    Research Area A: Governance and Administration
    Research Area B: Public Spheres
    Research Area C: Health and Environment
    Research Area D: Historicities and Heritage
  And as a 5th area, the HRA
  Scholars from different fields are involved: Sinology, Indology, History of Arts, East Asian Arts, Science of Religions, Archaeology, History, Assyriology, Medicine, Computational Linguistics, etc.

The Cluster of Excellence
  Has a dedicated agenda:
    Finding and analysing flows of concepts
    That happen within shifting asymmetries
  Globalisation is not a new phenomenon, but has happened since the beginning of mankind
  Everything is part of this global process
  People are a medium of such flows
  More information available at:

What are the technical requirements?
  A large number of different databases already exist:
    Language resources
    Image resources
    Bibliographical resources
    Music, films, etc.
  Most Cluster projects will need such databases or new ones

  How to store concepts
  How to get all these data into one system?
  How can one-dimensional metadata be enhanced?
  How can flows be described?
  How can phenomena be linked with each other?
  How can the system answer intelligent questions?

First answers
  Different databases can be integrated in a loosely coupled system
    In a Service Oriented Architecture
  One central metadata base can be a central retrieval point
  One big full-text index on Cluster resources can be helpful
  New Semantic Web technologies might be even better for storing and finding conceptual flows

Aims of HRA
  Sustainable competitive advantages can only be achieved by efficient utilisation of IT
  A common platform called Heidelberg Research Architecture (HRA) will be set up that is accessible to all participants and partner organisations
  For maximum efficiency and minimal expenditure, the Cluster will, wherever possible, work with systems already available at the University

HRA
  HRA consists of two main sections:
    a database architecture that can be utilised for research projects such as the Translingual Concepts Database (TCD) and the Transcultural Images Database (TID);
    the IT infrastructure required for a modern work environment, providing tools for close interaction within the Cluster's Research Areas and other scholarly publics.

Existing Infrastructure
  Relevant infrastructure already available at
    the institutions participating in the Cluster
    the University Computing Centre
    the University Library
    the University's Interdisciplinary Centre for Scientific Computing

Existing Infrastructure
  Relevant infrastructure already available:
    file and directory services
    storage and backup provisions
    eLearning platform
    a content management system
    image database
    electronic publishing facilities
    a digitisation centre

Database Infrastructure
  Both the analytical work on the databases and their use by external scholars require advanced search and information retrieval facilities which are capable of processing metadata tags
  To enable cooperative work of scholars in Asia and the West, both input and retrieval of database content must be possible via web interfaces.

Physical location of the HRA Databases
  The databases are hosted on servers of the Cluster located at the University Computing Centre, with backup arrangements installed to prevent data loss.

Databases acquired for HRA
  ProQuest Dissertation and Theses
    The Cluster has acquired access to ProQuest Dissertation and Theses.
    With more than 2.4 million entries, the most comprehensive collection of academic dissertations and theses in the world
    Restricted access (members of the University of Heidelberg)

Databases acquired for HRA
  ARTstor
    a digital library
    areas of art, architecture, and archaeology of Europe, Asia, and America
    nearly one million images
    Restricted access (members of Heidelberg University)

Databases acquired for HRA
  Index of Christian Art (Princeton Art Index)
    bibliographic references to more than 20,000 works of art with over 60,000 digital images
    especially on medieval art
    emphasis on European art
    Restricted access (members of the University of Heidelberg)

Databases acquired for HRA
  Yomiuri Shinbun
    Meiji and Taishō eras
    articles of the daily newspaper in original layout, from 1874 to 1926
    Center for East Asian Studies
    Access requires password
  Kokka
    monthly periodical on art and architecture
    Since 1889

HRA Databases
  Within HRA a number of new databases have been set up
  They work independently but are loosely coupled
    Via several services
      That make use of the data
      That provide a single retrieval point for all the data

Transcultural Image Database
  The TID makes use of the "Heidelberg Image Database" (HeidICON), hosted by the University Library.
  Currently 15 Cluster projects are using the database to store more than 45,000 images and their metadata.
  More on this in a later presentation

Translingual Concepts Database
  There are several strategies for storing concepts
  In a first approach we developed a statements database, a resource where Cluster researchers can make statements about information objects (images, texts, bibliographic references, etc.) or about other statements.
  There have been experiments with ontology technologies (RDF and Topic Maps)

Ontology
  A multi-dimensional system for relating information objects
  A classification system (like the Dewey Decimal System) can be called a single-dimensional metadata system
    there is only one relation type: is subclass of
  A thesaurus (like Roget's Thesaurus) uses more dimensions: is part of semantic field, is connected with
  Other thesauri have even more, like synonyms, antonyms, subclass, etc.

Ontology
  An ontology is all the above and much more
  We have a hierarchical class model (like in DDC)
  We have an unrestricted number of relation types (not only the few of a thesaurus)
  So we can store classifications and thesauri in ontology stores
  The best way to formalize an ontology entry is: subject, predicate, object
    where subject and object are classes or class instances
    and predicate is a relation type
    RDF triples

What do we want to do with ontologies
  Find integrated knowledge
  Produce new knowledge
  Provide evidence for new hypotheses
  Verify or challenge old hypotheses
  Topic Maps are being evaluated to model the Cluster, its projects, people and research topics

Cluster Bibliographic Database
  Based on the open source software refbase
  Besides bibliographic references, includes the actual texts as PDF files (visible only after login)
  Makes it possible to share bibliographical entries and the associated texts
  Has the usual features (citation and data export)
  Some fields are private to the individual user, so you can e.g. distinguish your own keywords from keywords for all
  The Cluster is creating its own version of refbase with additional features

Refbase Cluster bibliography
  New features added by the HRA
    Authentication integrated into the Active Directory based central user management (unified login strategy)
    Authorization based on Active Directory group memberships
      Without login only the bibliographical references are visible
      After login PDFs are accessible and input or import of new data is possible

New features added by the HRA
  New input masks (quick input and extended mode)
  New database fields, e.g. for inputting original script titles
  Fields for Cluster publication management and integration into the website
  Improved EndNote import
  Etc.

More features on the agenda
  More detailed access control
    User can define who
      may read the bibliographic entry
      may read the PDF
    Based on group memberships
  Automated import of whole EndNote bibliographies with the attached PDF files
  Additional import formats:
    Citavi
    LitLink (see below)

HRA - Document Indexing Service
  HRA provides a customized document search engine with full-text search
  A full-text search engine which uses both the metadata and the content for its indexing service
  The current implementation supports PDF files, but on request it can be customized to index Microsoft Office formats, MP3, plain text, etc.
  Search interface for the refbase PDF store
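
The idea of indexing both the metadata and the content of a document can be illustrated with a toy inverted index. This is only a minimal sketch in Python, not the HRA implementation, which the slides describe as Lucene-based. The class name TinyIndex, the field name "content" and the sample records are assumptions made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case a string and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Toy inverted index over metadata fields and full text (illustrative only)."""

    def __init__(self):
        # Maps (field name, token) to the set of document ids containing it.
        self.postings = defaultdict(set)

    def add(self, doc_id, metadata, content):
        # Index every metadata field under its own field name ...
        for field, value in metadata.items():
            for token in tokenize(value):
                self.postings[(field, token)].add(doc_id)
        # ... and the extracted full text under the assumed field "content".
        for token in tokenize(content):
            self.postings[("content", token)].add(doc_id)

    def search(self, query, fields=None):
        """Return ids of documents that contain every query token
        in at least one of the given fields (all fields if None)."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = None
        for token in tokens:
            hits = set()
            for (field, indexed_token), ids in self.postings.items():
                if indexed_token == token and (fields is None or field in fields):
                    hits |= ids
            result = hits if result is None else result & hits
        return result


if __name__ == "__main__":
    index = TinyIndex()
    # Invented sample records; real input would come from refbase and its PDFs.
    index.add("ref-1", {"title": "Flows of concepts"}, "A study of Shanghai newspapers")
    index.add("ref-2", {"title": "Shanghai images"}, "Photographs and their metadata")
    print(sorted(index.search("shanghai")))
    print(sorted(index.search("shanghai", fields={"title"})))
```

Keeping the field name in the index key is what lets one query run either across everything or against the metadata alone.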

Integrated HRA Projects
  Thesaurus Linguae Sericae
    an historical and comparative encyclopaedia of Chinese conceptual schemes
    an international collaborative project aimed to explore the conceptual schemes of the Chinese language
    a major expansion through the addition of the databases Wissenschaftssprache Chinesisch (WSC), or "Studies in the Formation of Modern Chinese Terminologies"

Integrated HRA Projects
  GeoTWAIN (GeoTool Without An Important Name)
    See later presentation
  Quotation Finder
    See later presentation
  HyperEvaluation
    See later presentation
  Turkology Annual Online
    aimed at digitizing the 26-volume journal and republishing the entries in an online database with new and efficient search options.

HRA: the overall architecture, first shot
  [Diagram. Labels: Graphical User Interface (GUI), Lucene, Refbase, PDF, MySQL, HRA Integration, SQL Database, SOAP, Ontology, Translingual Concepts Database (TCD)]

HRA follows the service oriented approach
  [Diagram. Labels: GeoTWAIN, HSE, Image Search, Ontology, WS-Client, WS-Client, WS-server, Dictionary, Location normalizer, WS-server, WS-server, DBs, Lucene Indices, Web resources]

What is a Service
  Nowadays every IT resource can be implemented as a service
    Grid Computing, Clouds
  Service Oriented Architecture is the new paradigm
  There are standards that make services talk to each other
    General standards like SOAP/WSDL or REST
    Application domain specific standards specify the XML data sent via such an infrastructure

HRA Web Services Architecture
  HRA provides a set of web services to be used by service consumers in the Universität Heidelberg
  One of the main services is the search service in the Thesaurus of Geographic Names (TGN)
    example search UI
  The web services are based on REST and can easily be extended to meet consumer needs.

More services
  A more complex service will take a whole text as input and return a list of all locations occurring in the text
  An application can then use the visualisation engine to display a map of all these places
  Services technology is also used to synchronize data
  Some of the services are based on the eXist XML database

HRA Web Services Architecture
  [Diagram. Labels: Service Consumers, Exist Services Provider, Client Applications, Exist DB, REST Web Services, Web Browsers, Libraries, Libraries, Exist itself as client]

What about LitLink
  There is an interest in integrating the work of Cluster-related activities such as SFB Ritual Dynamics and the Transcultural Studies Project
  If LitLink is used to collect data relevant to the Cluster's agenda, we should think of integrating it
  What could a bridge between LitLink (FileMaker) and HRA look like?
  [Diagram. Labels: Graphical User Interface (GUI), Lucene, Refbase, PDF, MySQL, HRA Integration, SQL Database, SOAP, WSDL, LitLink Server (FileMaker), LitLink FileMaker Client, LitLink FileMaker Client, Ontology, Translingual Concepts Database (TCD)]

Our Experiments with FileMaker
  It is easy to input data but not so easy to export them
  But there are methods:
    ODBC/JDBC, only with FileMaker Server Advanced
    HTTP/XML export
  We used the Python library pyFileMaker for accessing the latter interface
  It actually works
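
To give an impression of what comes back over the HTTP/XML route, here is a minimal sketch that turns an XML export into Python dictionaries using only the standard library. It is not the pyFileMaker code used in the experiments. It assumes the export follows FileMaker's FMPXMLRESULT grammar; the sample document is simplified and its field names and values are invented.

```python
import xml.etree.ElementTree as ET

# Namespace of the FMPXMLRESULT grammar (assumed to be the export format).
NS = {"fmp": "http://www.filemaker.com/fmpxmlresult"}

# Simplified, invented sample; a real export carries further elements.
SAMPLE = """<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
  <ERRORCODE>0</ERRORCODE>
  <METADATA>
    <FIELD NAME="Author" TYPE="TEXT"/>
    <FIELD NAME="Title" TYPE="TEXT"/>
  </METADATA>
  <RESULTSET FOUND="2">
    <ROW RECORDID="1">
      <COL><DATA>Example Author</DATA></COL>
      <COL><DATA>Example Title</DATA></COL>
    </ROW>
    <ROW RECORDID="2">
      <COL><DATA>Another Author</DATA></COL>
      <COL><DATA/></COL>
    </ROW>
  </RESULTSET>
</FMPXMLRESULT>"""


def parse_fmpxmlresult(xml_text):
    """Return the rows of an FMPXMLRESULT document as a list of dicts."""
    root = ET.fromstring(xml_text)

    # A non-zero error code means the server could not answer the request.
    error = root.findtext("fmp:ERRORCODE", default="0", namespaces=NS)
    if error != "0":
        raise RuntimeError(f"FileMaker reported error code {error}")

    # Field names come from METADATA, in the same order as the COL elements.
    names = [f.get("NAME") for f in root.findall("fmp:METADATA/fmp:FIELD", NS)]

    records = []
    for row in root.findall("fmp:RESULTSET/fmp:ROW", NS):
        values = [
            col.findtext("fmp:DATA", default="", namespaces=NS) or ""
            for col in row.findall("fmp:COL", NS)
        ]
        records.append(dict(zip(names, values)))
    return records


if __name__ == "__main__":
    for record in parse_fmpxmlresult(SAMPLE):
        print(record)
```

Once the rows are plain dictionaries, they can be handed to whatever import path the HRA side offers, for example the bibliographic database.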

Summary
  HRA as it is now
    Repository of information objects of very different kinds, distributed over several databases
    Integration only on the level of metadata
      Including a full-text index
    Data flow partly via web services
    Separation between front end and back end
    Semantics-aware tools on top
  New applications are needed (e.g. for geo-referenced data)

Is there a good platform to migrate to?
  While looking around for platforms, we found Fedora Commons
  In Germany eSciDoc is then a very good choice:
    Enhanced infrastructure features, in terms of AAI, search, statistics, etc.
    All three existing solutions could be interesting within the Cluster (PubMan, VIRR, Faces)
    A good platform for new solutions relevant to the Cluster (e.g. for geo-referenced data or for collaboration tools)

Why not cooperate?
  Got into contact with FIZ and MPDL for possible cooperation
  Also contacted Hochschule Bonn-Rhein-Sieg, where WikiDora (JSPWiki + Fedora) had been developed
  Wrote a proposal within the 2nd DFG Call on Virtual Research Environments (2-year project)
  Bringing together Cluster and Transcultural Studies researchers and eSciDoc developers

VFTS
  Virtuelle Forschungsumgebung für Transkulturelle Studien (Virtual Research Environment for Transcultural Studies)
  Integration of parts of HRA into the eSciDoc framework
  Integration of WikiDora into eSciDoc
  Development of eSciDoc infrastructure services for geo-referenced data
  Development of an eSciDoc solution for historical geo-referenced data
  Development of a Cluster project specific web service for analyzing geo-referenced data
  Demonstrators in 5 research scenarios

VFTS

Thank you!
  Questions?
  More info at: heidelberg-research-architecture [email protected]
