A roadmap for MT: four keys to handle more languages, for all kinds of tasks, while making it possible to improve quality (on demand)

International Conference on Universal Knowledge and Language (ICUKL2002), Goa, 25-29 November 2002

Christian Boitet
GETA, CLIPS, IMAG, 385 av. de la bibliothèque, BP 53, F-38041 Grenoble cedex 9, France
[email protected], http://clips.imag.fr/geta

Outline
  Basic concepts
    What is MT?
    Goals: Quality / User
    Architectures: Vauquois' triangle
  State of the art
    MT of texts: examples, problems
    MT of spoken dialogs
  The future of MT
    Goals
    4 keys

Ch. Boitet, ICUKL2002, Goa, 25-29/11/2002

What is M(a)T?
  At least 3 types of automation
    MT = Machine Translation
    MAT = Machine Assisted Translation
    MAHT = Machine Aided Human Translation
  A scientific technology
    Informatics (computer science)
    Linguistics
    Mathematics

Goals: Quality / User
  User: linguistically naive or linguistically specialized
  Quality: rough and quick, or from raw to very good
  MT for access
    special fields: atom, chemistry
    general information
  MT for translators
    helps: lexicons, proposals from a translation memory
  MT for individual authors
    with interactive disambiguation
  MT for revisors (posteditors)
    raw MT, polishable

Architectures: Vauquois' triangle
  [Figure: Vauquois' triangle, shown on two slides (the second one enlarged).
   Levels, from the base to the apex: graphemic level (text); morpho-syntactic level (tagged text); syntagmatic level (C-structure); syntactico-functional level (F-structure); logico-semantic level (SPA-structure: semantic and predicate-argument); interlingual level (semantico-linguistic interlingua); deep understanding level (ontological interlingua).
   Paths: direct translation; semi-direct translation; syntactic transfer (surface); syntactic transfer (deep); semantic transfer; conceptual transfer; ascending and descending transfers; multilevel transfer (mixing levels).]
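To make the geometry of the triangle concrete, here is a minimal Python sketch. It assumes a simplified reading in which a system analyses the source up to one chosen level, transfers there, and generates back down through the same levels. The names Level and translation_path are illustrative and do not come from the talk, and multilevel or ascending/descending transfers are not modelled.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of the triangle, from the base (text) to the apex."""
    GRAPHEMIC = 0              # text
    MORPHO_SYNTACTIC = 1       # tagged text
    SYNTAGMATIC = 2            # C-structure
    SYNTACTICO_FUNCTIONAL = 3  # F-structure
    LOGICO_SEMANTIC = 4        # SPA-structure
    INTERLINGUAL = 5           # semantico-linguistic interlingua
    DEEP_UNDERSTANDING = 6     # ontological interlingua


def translation_path(transfer_level: Level) -> list:
    """List the steps of a system whose transfer happens at one level.

    Assumption: the path is symmetrical (as many analysis steps up as
    generation steps down).
    """
    up = [f"analyse source to {Level(i).name}"
          for i in range(1, transfer_level + 1)]
    across = [f"transfer at {transfer_level.name}"]
    down = [f"generate target {Level(i).name}"
            for i in range(transfer_level - 1, -1, -1)]
    return up + across + down


if __name__ == "__main__":
    # Direct translation: a single step at the base of the triangle.
    print(translation_path(Level.GRAPHEMIC))
    # Surface syntactic transfer: two steps up, one across, two down.
    for step in translation_path(Level.SYNTAGMATIC):
        print(step)
```

In this reading, the higher the transfer level, the longer the analysis and generation sides and the shorter the transfer itself; at the interlingual levels the transfer step would reduce to the identity.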

Formal intermediate structures
  Linguistic level(s)
    Surface
    Deep
    1-level
    n-level
  Linguistic main organization
    String
    Syntagms (constituents)
    Dependencies
    Logical and semantic relations
  Geometrical structure
    Struct. string
    Chain graph (chart)
    Tree structure
    Graph / Network
    Hypergraph
  Algebraic structure
    Labels
    Boolean features
    Structured attributes
    Feature structures
  Correspondence Structure-Text
    concrete (text readable from structure) (almost all)
    abstract (Ariane-G5, Sygmart) (e.g. UNL)
  Scope
    Sentence
    Paragraph
    Page
    Document

How to produce an MT system
  Choose an architecture
  Program the "tools"
    Specialized languages for linguistic programming (SLLP)
    Development environment (MT shell)
  Build the "lingware"
    Lexical data / rules / weights
    Grammatical data / rules / weights
    Possible specialization to a typology ("sublanguage")
  How?
    Human work
    computer help / support
    Automatic learning (weights, likeliness)

State of affairs
  only a small number of language pairs is covered by MT systems designed for information access

    Systran EC (2000): 19/110 language pairs, 8 OK for intended use
    See also examples by Ronaldo Martins
  even fewer are capable of quality translation or speech translation
  Now a few examples

Examples: MT for access, Web (1)

ENGLISH (human version)
The European-Heritage.net thesaurus covers the fields of archaeology and architecture as defined in the Council of Europe conventions signed in Granada (1985) and Malta (1992). It encompasses information ranging from the partners involved, categories of cultural assets and legislation, to activities, skills and funding. It is supplemented by a number of specific thesauruses compiled by each member state on a particular topic, such as the thesaurus on Andalusian heritage or the architectural thesaurus from the Mérimée database in France. This new, open-ended search tool will come on line shortly, together with a management and administration system shared among the various contributors.

FRENCH (human version)
Le thesaurus European-Heritage.net couvre les champs de l'archéologie et de l'architecture au sens des conventions du Conseil de l'Europe de Grenade (1985) et de Malte (1992). Il prend en compte des aspects aussi variés que les acteurs, les catégories de biens culturels, la législation ou encore les interventions, les métiers et les financements. Il est complété et prolongé par des thesaurus spécifiques développés par chaque Etat membre sur tel ou tel sujet spécifique, comme le thesaurus du patrimoine historique andalou ou le thesaurus d'architecture de la base de données documentaire Mérimée en France. Cet instrument de recherche, forcément évolutif, sera mis prochainement en ligne accompagné d'un dispositif de gestion et d'administration réparti entre les différents contributeurs.

ENGLISH (Systran FRE-ENG version)
The European-Heritage.net thesaurus covers the fields of archaeology and architecture within the meaning of conventions of the Council of Europe of Grenade (1985) and Malta (1992). It takes into account aspects as varied as the actors, the categories of cultural goods, the legislation or the interventions, the trades and the financings. It is supplemented and prolonged by thesaurus specific developed by each Member State on such or such specific subject, like the thesaurus of the Andalusian historical inheritance or the thesaurus of architecture of the documentation data base Mérimée in France. This instrument of search, inevitably evolutionary, will be put soon on line

accompanied by a device of management and administration distributed between the various contributors.

Examples: MT for access, Web (2)
  French-English (FE) is quite "easy", compared with English-German (EG) and mainly French-German (FG)

GERMAN (Systran ENG-GER version)
Der European-Heritage.netthesaurus umfaßt die Felder von archaeology und von Architektur, wie in den Europaratvereinbarungen definiert, die in Granada (1985) unterzeichnet werden und in Malta (1992). Er gibt die Informationen um, die von den betroffenen Partnern, von den Kategorien der kulturellen Werte und der Gesetzgebung, bis zu Aktivitäten, von den Fähigkeiten und von der Finanzierung reichen. Er wird durch eine Anzahl von den spezifischen Thesauren ergänzt, die durch jeden Mitgliedsstaat auf einem bestimmten Thema, wie dem Thesaurus auf Andalusian Erbe oder dem architektonischen Thesaurus von der Datenbank Mérimée in Frankreich kompiliert werden. Dieses neue, offene Suchhilfsmittel kommt auf Zeile kurz, zusammen mit einem Management- und Leitungssystem, das unter den verschiedenen Mitwirkenden geteilt wird.

GERMAN (Systran FRE-GER version)
Der European-Heritage.net-Thesaurus bedeckt die Felder der Archäologie und der Architektur im Sinne der Übereinkommen des Europarats von Granada (1985) und von Malta (1992). Er berücksichtigt Aspekte dermaßen variierte, daß die Beteiligten, die Kategorien kultureller Güter, die Gesetzgebung oder noch die Interventionen, die Berufe und die Finanzierungen. Er wird vervollständigt und wird durch ein spezifische Thesaurus entwickelt durch jeder Mitgliedstaat über das eines oder andere spezifische Thema verlängert, als der Thesaurus des andalusischen historischen Kulturgutes oder der Thesaurus der Architektur der urkundlichen Datenbank Mérimée in Frankreich. Dieses notgedrungen entwicklungsfähige Forschungsinstrument wird gestellt demnächst online begleitet von einer Verwaltungs- und Verwaltungsvorrichtung, die aufgeteilt unter den verschiedenen Beitragenden.

Comparison: raw vs rough MT (SpanAm and Reverso raw Spanish-English outputs)

SpanAm raw Spanish-English output
Message of the Director-General of the World Health Organization
From its discovery, antibiotics have completely transformed the perspective of humankind with respect to infectious diseases. Today the use of antibiotics, combined with improvements in sanitation, housing, and nutrition, together with the advent of the vaccination programs generalized, have caused a notable reduction of infectious diseases that previously were common and annihilated entire populations. Scourges that terrified millions of people, as plague, whooping cough, poliomyelitis, and the scarlatina, have been controlled or are on the verge of being controlled. Now, in the dawn of a new millennium, humankind faces another crisis. Previously curable diseases as the gonorrhea and typhoid fever are becoming rapidly difficult to treat, while old assassins as tuberculosis and malaria now are armed

of the increasingly impenetrable resistance to the antimicrobial drugs. This phenomenon is potentially contenible. The problem is increasingly profound and complex, accelerated by the abuse of antibiotics in the developed countries and the paradoxical underutilization of the quality antimicrobial drugs in the developing countries due to the poverty and to the scarcity resulting from an effective health care.

Reverso raw Spanish-English output
Message of the Chief operating officer of the World Organization of the Health
From his{*its*} discovery, the antibiotics have transformed completely the perspective of the humanity with regard to the infectious diseases. Today the use of the antibiotics, cocktail with improvements in the reparation, the housing and the nutrition, together with the advent of the programs of widespread vaccination, they have given place to a notable decrease of infectious diseases that before were common and were annihilating entire populations. Scourges that terrified million persons, as the pest, the savage cough, the poliomyelitis and the scarlatina, they have been controlled or are on the verge of be controlling. Now, in the dawn of a new millenium, the humanity faces with another crisis. Diseases before curable as the gonorrhea and the fever tifoidea they are becoming rapidly difficult to treat, whereas killer old men as the tuberculosis and the malaria are armed{*assembled*} now with the increasing impenetrable resistance the antimicrobial ones. This phenomenon is potentially contenible. The problem is increasingly deep and complex, accelerated by the abuse of the antibiotics in the developed countries and the paradoxical subutilization of the antimicrobial ones of quality in the countries in development due to the poverty and the resultant shortage of an attention of effective health.

Examples: MT for revisors

with BV-aero/FE (2)

MT of spoken dialogs

  Specialized systems are already usable
    e.g. ATR/Matsushita, IBM, CSTAR/Nespole!
    Much "noise" and "ungrammaticalities"
    But specializing is very helpful!
  General systems are also possible
    e.g. NEC/Xroad, Linguatec/Talk&Translate
    Speech recognition is already good enough
    Rough may be good enough (e.g. for chatting)
  Interpretation is different from translation
    and participants are intelligent!
    Similarity with access-oriented MT

French-Korean through IF (1)

French-Korean through IF (2)

French-Korean through IF (3)

A road map to which goals?
  MT of adequate quality
  Not only for access
  For all languages

Four keys
  2 on the technical side
    Compromise: a far wider coverage, a somewhat smaller asymptotic quality
    Automatic learning techniques
    Using non-textual pivots (intermediate formal descriptors)
  2 on the organizational side
    Democratization, cooperation
    Cooperative development of open source linguistic resources on the Web
    Towards systems where quality can be improved "on demand" by users

Learning techniques
  Extend the use of hybrid techniques
    symbolic, numerical, or mixed
    ==> they have demonstrated their potential at the research level
    stochastic grammars
    weighted (or "neural") dictionaries
  or build new tools, intrinsically numerical
    inspiration from voice recognition
  2 examples
    learning analyzers: text > semantic tree (IBM)
    learning implicit very detailed DG from tree bank (NAIST)
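As a rough illustration of what a "weighted dictionary" could look like, the Python sketch below estimates weights by simple relative-frequency counting over aligned word pairs. This is only one elementary possibility, not necessarily the technique the talk has in mind. The function names and the toy alignment data are invented for the example.

```python
from collections import Counter, defaultdict


def learn_weights(aligned_pairs):
    """Estimate a weight for each (source, target) pair by relative frequency."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    weights = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        weights[source] = {t: n / total for t, n in targets.items()}
    return weights


def best_translation(weights, source):
    """Pick the highest-weighted target for a source word, or None if unseen."""
    candidates = weights.get(source)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy, invented alignment data (not from the talk).
pairs = [("carte", "card"), ("carte", "map"), ("carte", "map"), ("jouer", "play")]
weights = learn_weights(pairs)

if __name__ == "__main__":
    print(weights["carte"])
    print(best_translation(weights, "carte"))
```

A real system would of course condition the choice on context rather than pick one global best target; the point here is only that the lexical data stay symbolic while the weights are learned.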

Using non-textual pivots
  Semantico-pragmatic (ontological) pivots
    task & domain oriented ==> limited applicability
  Abstract linguistic descriptors
    the most precise, but often too sophisticated
    depend on each language
  Anglo-semantic pivot: UNL
    "the HTML of linguistic content"
    in UNL, a hypergraph represents the abstract structure of (supposedly) equivalent English utterance
    less precise but "robust"
    symbols constructed from English ==> usable by all developers

A simple UNL graph
  [Figure: UNL graph. Nodes: score(icl>event,agt>human,fld>sport) [email protected]@[email protected]; Ronaldo(icl>proper noun); head(pof>body)[email protected]; corner(icl>thing)[email protected]; goal(icl>abstract thing); goal(icl>concrete thing); left(aoj
   Arc labels: agt, obj, ins, plt, pos, pos, mod.]
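The following Python sketch shows one way such a graph could be held in memory: nodes are universal words (UWs) and arcs carry relation labels. It is a plain labelled graph, not a full hypergraph, and the class name UNLGraph is illustrative. The node and relation labels are taken from the slide, but the figure is flattened, so the arc endpoints below are an assumed reading (only "agt Ronaldo" is adjacent in the source text).

```python
from dataclasses import dataclass, field


@dataclass
class UNLGraph:
    """A labelled directed graph: nodes are universal words, arcs are relations."""
    arcs: list = field(default_factory=list)  # (relation, source UW, target UW)

    def add(self, relation, source, target):
        self.arcs.append((relation, source, target))

    def nodes(self):
        """All universal words mentioned by at least one arc."""
        found = set()
        for _, source, target in self.arcs:
            found.update((source, target))
        return found

    def outgoing(self, source):
        """Map relation label -> target for the arcs leaving one node."""
        return {rel: tgt for rel, src, tgt in self.arcs if src == source}


SCORE = "score(icl>event,agt>human,fld>sport)"

g = UNLGraph()
# Assumption: all four arcs start from the "score" node.
g.add("agt", SCORE, "Ronaldo(icl>proper noun)")
g.add("obj", SCORE, "goal(icl>abstract thing)")
g.add("ins", SCORE, "head(pof>body)")
g.add("plt", SCORE, "corner(icl>thing)")

if __name__ == "__main__":
    for relation, target in g.outgoing(SCORE).items():
        print(relation, "->", target)
```

Because the symbols are built from English words plus restrictions, a developer for any language can read the graph without knowing the source language of the utterance.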

of open source linguistic resources on the Web
  Mutualization is necessary at least for lexical knowledge
    too costly even for the leaders
    size (#entries) has to augment for each language (300K, 3M?)
    #languages has to increase dramatically (11 > 20 > 180?)
  Integration of human- and machine-oriented knowledge is useful
    e.g. to produce mixed MT/MAHT systems

A contribution: the Papillon project
  Goal: produce many open source dictionaries from a central lexical data base
  Means:
    build rich (DiCo) monolingual dictionaries of lexies (senses)
    interlink lexies by interlingual links (axies)
    use XML & associated tools as basis to generate many formats for humans and for machines
    start from (free) digital resources
    induce "consumers" to become "producers" (contributors)
  Quality control:
    private accounts
    central validating/integrating group

Papillon database macrostructure
  [Figure: Papillon database macrostructure. Labels: User (x3); Interaction with the dictionaries; Dictionary (x2); Extraction of dictionaries; Lexical database; Human contributors; Integration of existing resources; Resource (x3).]

PAPILLON diagram
  French DiCo
    Vocable carte n.f.
      Lexie carte.1: carte à jouer
      Lexie carte.2: carte géographique
  Thai DiCo
  Japan. DiCo
  Engl. DiCo
    Vocable card N
      Lexie card.1: playing card
      Lexie card.2: money card
  Interlingual links
    Acception 343, UNL: card(icl>play), card(icl>thing)
    Acception 345, UNL: map(fld>geography)
    Acception 1002, UNL: card(fld>money)
  Interlingual links based on translations = "AXIEs"
  Possibility to link 1 lexie with >1 acceptions
  References to other semantic systems: AXIE1n>UW
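The lexie/axie organisation of the diagram can be sketched in Python as follows. Each axie (interlingual acception) lists the lexies it links in each language, and a translation is found by going lexie -> axie -> lexie. The acception numbers and lexie names come from the diagram, but which lexie hangs on which acception is inferred from the glosses, and the language codes "fra" and "eng" and the function name are illustrative choices.

```python
# Each axie (interlingual acception) lists the lexies it links, per language.
# Assumption: the lexie-to-axie links below are inferred from the glosses.
AXIES = {
    343: {"fra": ["carte.1"], "eng": ["card.1"]},  # playing card
    345: {"fra": ["carte.2"], "eng": ["map"]},     # geographical map
    1002: {"eng": ["card.2"]},                     # money card
}


def translations(lexie, source_lang, target_lang, axies=AXIES):
    """Follow lexie -> axie -> lexie links to find target-language lexies."""
    found = []
    for links in axies.values():
        if lexie in links.get(source_lang, []):
            found.extend(links.get(target_lang, []))
    return found


if __name__ == "__main__":
    print(translations("carte.2", "fra", "eng"))
    print(translations("card.1", "eng", "fra"))
    # No French lexie is linked to acception 1002 in this toy data.
    print(translations("card.2", "eng", "fra"))
```

With this layout, adding a language only requires linking its lexies to the existing axies; no bilingual dictionary has to be written for each language pair, and bilingual views can be generated on request.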

  Engl. DiCo: Vocable=lexie map

Construct systems where quality can be improved "on demand" by users
  a priori
    through interactive disambiguation in the source language
  or a posteriori
    by correcting the pivot representation (UNL or other) through any language (as in MultiMeteo)
  ==> In the 2 cases, all versions (in all languages) are improved
  possibility to merge
    MT
    multilingual generation
    computer-aided authoring

Conclusion
  4 keys to open the door to MT of adequate quality to all languages
  On the technical side,
    dramatically increase the use of learning techniques
    use pivot architectures, the most universally usable pivot being UNL
  On the organizational side,
    cooperatively develop open source linguistic resources on the web
    construct systems where quality can be improved "on demand" by users
  On the practical side,
    seek keys to unlock private investment, public funding, voluntary cooperation
    could this conference become a decisive turning point?
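A short worked computation may help show why pivot architectures matter for coverage. The Python sketch below assumes that a transfer architecture needs one system per directed language pair, while a pivot architecture needs one analyser and one generator per language. The function names are illustrative, and the language counts are the ones quoted in the talk (11 > 20 > 180?).

```python
def transfer_modules(n_languages: int) -> int:
    """Directed language pairs: one transfer system per ordered pair."""
    return n_languages * (n_languages - 1)


def pivot_modules(n_languages: int) -> int:
    """One analyser into the pivot and one generator out of it per language."""
    return 2 * n_languages


if __name__ == "__main__":
    for n in (11, 20, 180):
        print(n, "languages:", transfer_modules(n), "pairs,",
              pivot_modules(n), "pivot modules")
```

For 11 languages this gives the 110 directed pairs mentioned for Systran EC, against 22 modules with a pivot; the gap grows quadratically as languages are added.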
