Patrol Team Language Identification System for DARPA RATS P1 Evaluation

Pavel Matejka (1), Oldrich Plchot (1), Mehdi Soufifar (1), Ondrej Glembek (1), Luis Fernando D'Haro (1), Karel Vesely (1), Frantisek Grezl (1), Jeff Ma (2), Spyros Matsoukas (2), and Najim Dehak (3)

(1) Brno University of Technology, [email protected] and IT4I Center of Excellence, Czech Republic
(2) Raytheon BBN Technologies, Cambridge, MA, USA
(3) MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA

Patrol LID System for DARPA RATS P1 Evaluation, Pavel Matejka

Outline

About DARPA RATS program
Datasets and task description
Subsystems with analysis
Fusion and Results
Conclusion

DARPA RATS Program

RATS = Robust Automatic Transcription of Speech
Goal: create algorithms and software for performing the following tasks on speech-containing signals received over communication channels that are extremely noisy and/or highly distorted.
Tasks:
  Speech Activity Detection
  Keyword Spotting
  Language Identification
  Speaker Identification
Data collector: LDC
Evaluation by SAIC
Performer: PATROL Team led by BBN

Data Specification

Languages: Dari, Levantine Arabic, Urdu, Pashtu, Farsi, plus more than 10 out-of-set languages
Durations: 120s, 30s, 10s, 3s
Telephone conversations retransmitted over 8 noisy radio communication channels (marked A-H)
Available: collections of 2-min audio samples
  LDC2011E95, split into train and dev by SAIC
  LDC2011E111, split into train and dev by the Patrol team
  LDC2012E03, supplemental training data for non-target languages
The amount of audio data is heavily unbalanced across languages
Added shorter-duration samples
  Derived from the 2-min samples, based on our SAD output

Datasets

Train
  Main
    Only files where VAD detects >60s of speech
    30774 files in total
    Unbalanced: 668 files for Dari, 12778 for Levantine Arabic
  Balanced
    Balanced over files for each language and channel
    7150 files for each duration
    673 files for Dari, otherwise ~1300
  Extended
    Main + all 30s cuts from the Main set + the entire LDC2012E03 (non-target languages only)
    ~170k segments
Development
  The size of the set was driven by Dari: only 679 source files, other languages limited to 1000 files, 2432 files for non-target languages
  ~7120 files for each duration
Evaluation
  2527 files for each duration

LID Patrol System Architecture

[Block diagram] Audio is segmented by SAD (BUT SAD, BBN SAD) and scored by four subsystems: JFA LID, iVector LID (BUT), CZ Phoneme Recognition followed by Phonotactic iVector LID, and iVector LID (BBN). Calibration & Fusion merges their outputs into the Combined Score.

Speech Activity Detection

One of the most important blocks, since the data is really difficult
See the separate paper about SAD development on Wednesday 16:00 in Pavilion West
Used both GMM-based (BBN) and MLP-based (BUT) detectors.
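The Balanced training set described under Datasets is capped per language and spread over channels. The slides do not say how the files were drawn, so the following is only a minimal sketch under assumptions: the function name `balanced_subset`, the `(path, language, channel)` tuple layout and the round-robin draw over channels are all hypothetical.

```python
import random
from collections import defaultdict


def balanced_subset(files, per_language, seed=0):
    """Pick at most `per_language` files per language, spread over channels.

    files: iterable of (path, language, channel) tuples (assumed layout).
    A language with fewer files than the cap (Dari in the slides) simply
    contributes everything it has.
    """
    rng = random.Random(seed)
    by_lang = defaultdict(lambda: defaultdict(list))
    for path, lang, chan in files:
        by_lang[lang][chan].append(path)

    selected = []
    for lang in sorted(by_lang):
        # One shuffled pool of files per channel.
        pools = []
        for chan in sorted(by_lang[lang]):
            pool = list(by_lang[lang][chan])
            rng.shuffle(pool)
            pools.append((chan, pool))
        # Round-robin over channels until the cap is hit or pools run dry.
        taken = 0
        while taken < per_language and any(pool for _, pool in pools):
            for chan, pool in pools:
                if pool and taken < per_language:
                    selected.append((pool.pop(), lang, chan))
                    taken += 1
    return selected
```

With a cap of about 1300 files, a scheme like this would keep all of a small language and thin out the large ones, which is the pattern the file counts above suggest.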

Speech Activity Detection: Comparison of different SAD systems

Robust SAD tuned for noisy telephone speech
Robust SAD tuned for RATS
Results (Cavg) are on the DEV set (but scored with SRC channel)
iVector system (600dim) used for this experiment

SAD type     Cavg [%], 120s
Telephone    2.2
RATS         1.6

25% relative gain

iVector LID System (BUT)

Acoustic features
  Dithering, bandwidth 300-3200Hz for 25 Mel-filters, 6 MFCC + C0
  CMN/CVN (based on SAD), RASTA normalization
  Shifted Delta Cepstra (SDC) 7-1-3-1
UBM
  Language independent, diagonal-covariance, 2048 Gaussians
  Trained on balanced train set
iVector
  600 dimensions
  Trained on main set
Neural network classifier
  iVector input, 6 outputs (1 non-target + 5 target languages)
  Hidden layer with 200 nodes
  Stochastic Gradient Descent training with L2 regularization
  Trained on extended set (all data + all 30s splits)

Comparison of Logistic Regression and Neural Network as final classifier

BUT iVector system (600dim)
Results on Development set
Logistic Regression
  trained by Trust Region Conjugate-GD
Neural Net: one hidden layer with 200 nodes
  trained by Stochastic-GD with L2 regularization
  also experimented with Conjugate-GD, but no improvement

JFA LID System (BUT)

Acoustic features
  Same as for the iVector system + Wiener filtering
Universal Background Model (UBM)
  Language independent, diagonal-covariance, 2048 Gaussians
  Trained on balanced train set
JFA
  Trained on main train set
  M = m + Dz + Ux
  Models of languages (the Dz term) are MAP adapted from the UBM with tau = 10
  Channel matrix U with 200 dimensions
  Linear scoring
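The neural network classifier listed for the BUT iVector system (600-dimensional input, one hidden layer with 200 nodes, 6 outputs, SGD with L2 regularization) can be sketched in NumPy as follows. This is a minimal illustration, not the team's implementation: the tanh hidden units, softmax output, initialization, learning rate, batch size and L2 weight are assumptions that the slides do not specify, and all function names are hypothetical.

```python
import numpy as np


def init_params(n_in=600, n_hidden=200, n_out=6, seed=0):
    """Small random weights; layer sizes follow the slides."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, x):
    """Return hidden activations and class posteriors (assumed tanh + softmax)."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    logits = h @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return h, p / p.sum(axis=1, keepdims=True)


def loss(params, x, y, l2=1e-4):
    """Cross-entropy plus L2 penalty on the weight matrices."""
    _, p = forward(params, x)
    nll = -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))
    reg = 0.5 * l2 * (np.sum(params["W1"] ** 2) + np.sum(params["W2"] ** 2))
    return nll + reg


def sgd_step(params, x, y, lr=0.05, l2=1e-4):
    """One mini-batch gradient step (backpropagation written out by hand)."""
    h, p = forward(params, x)
    d_logits = p.copy()
    d_logits[np.arange(len(y)), y] -= 1.0
    d_logits /= len(y)
    d_h = (d_logits @ params["W2"].T) * (1.0 - h ** 2)
    params["W2"] -= lr * (h.T @ d_logits + l2 * params["W2"])
    params["b2"] -= lr * d_logits.sum(axis=0)
    params["W1"] -= lr * (x.T @ d_h + l2 * params["W1"])
    params["b1"] -= lr * d_h.sum(axis=0)


def train(params, x, y, epochs=5, batch=32, lr=0.05, l2=1e-4, seed=0):
    """Plain SGD over shuffled mini-batches."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            idx = order[start:start + batch]
            sgd_step(params, x[idx], y[idx], lr, l2)
    return params
```

Here `x` would hold one iVector per row and `y` the class index (for example 0-4 for the five target languages and 5 for non-target); the second value returned by `forward` is the 6-dimensional posterior vector that a later calibration stage could consume.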

Importance of Wiener Filter

400dim i-vector + logistic regression experimental system
Results on Development set

iVector LID System (BBN)

Acoustic features
  RASTA-PLP
  Block of 11-frame PLPs, projected to 60 dimensions via HLDA
UBM
  Language dependent (5 target, 1 non-target), 1024 Gaussians
iVector
  400 dimensions
  Group adjacent speech segments into 20s chunks, estimate one iVector per chunk
    improves performance on short-duration conditions by 28%
  Estimate 6 iVectors (one per UBM)
  Apply a neural network (NN) to each iVector: 6 outputs (1 non-target + 5 target languages)
  Combine NN outputs to form a 6-dimensional score vector
  26% relative improvement compared to using language-independent iVectors

Analysis of iVector LID System (BBN)

Analysis of the BBN iVector extractor training and UBM:
1. Whole audio segments, single UBM
2. Audio split into 20s segments, single UBM
3. Audio split into 20s segments, language-dependent background models (LDBM)
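The 20s chunking used by the BBN iVector system can be illustrated with a short sketch. It assumes that SAD output is a time-ordered list of `(start, end)` speech segments in seconds, that "20s" refers to accumulated speech rather than wall-clock time, and that a segment may be split when it crosses a chunk boundary; none of these details is stated in the slides, and `group_into_chunks` is a hypothetical name.

```python
def group_into_chunks(segments, chunk_speech=20.0):
    """Group adjacent speech segments into chunks of `chunk_speech` seconds.

    segments: list of (start, end) tuples in seconds, sorted by start time.
    Returns a list of chunks; each chunk is a list of (start, end) pieces.
    The final chunk may hold less speech than `chunk_speech`.
    """
    chunks, current, acc = [], [], 0.0
    for start, end in segments:
        pos = start
        while pos < end:
            # Take as much of this segment as the current chunk can still hold.
            stop = min(end, pos + (chunk_speech - acc))
            current.append((pos, stop))
            acc += stop - pos
            pos = stop
            if acc >= chunk_speech - 1e-9:
                chunks.append(current)
                current, acc = [], 0.0
    if current:
        chunks.append(current)
    return chunks
```

One iVector would then be estimated from the frames of each chunk, so a 2-min recording yields several training examples whose length is closer to the short test conditions.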

Phonotactic iVector LID System (BUT)

Phoneme recognizer
  Czech CTS recognizer trained on artificially noised data
  Added noise with varying SNR (lowest 10dB) to 30% of the corpus
  38 phonemes
3-gram counts: sums of posterior probabilities of 3-grams from phone lattices
iVector
  Multinomial subspace modeling
  600 dimensions, trained on main train set
  Trains a low-dimensional subspace in the total variability framework using a multinomial distribution
  Uses the point estimate of the model's latent variable for each utterance as the new features
Logistic regression as final classifier
  Trained on main train set

Fusion and Calibration

Regularized logistic regression
  Objective: minimize cross-entropy on the development set
  Duration-independent: trained on files from the 10s, 30s, and 120s conditions
Procedure
  Calibrate (tune) each system individually
  Combine calibrated system outputs into a single output vector
  Fusion parameters estimated on the same development set
Performance evaluation
  Primarily Cavg score
  Also computed PMISS and PFA at Phase 1 target operating points

Overall Results

Cavg [%] by system and duration, on the DEV and EVL sets

DEV set, Cavg [%]
Duration   JFA     iVector NN BUT   iVector NN BBN   CZ3-iVec600   Fusion
120s        1.61    1.60             2.58             2.60          0.83
30s         6.14    4.94             5.92             8.95          2.92
10s        12.52   10.36            12.06            16.84          6.85
3s         23.53   21.73            28.21            30.53         18.08

EVL set, Cavg [%]
Duration   JFA     iVector NN BUT   iVector NN BBN   CZ3-iVec600   Fusion
120s        7.05    7.83             9.03             9.06          6.56
30s        12.92    9.97            11.96            15.56          8.33
10s        17.36   14.52            17.83            21.90         11.40
3s         22.68   21.46            27.10            29.12         17.45

Robustness

Channel B is completely removed from the training of the contrastive system (noB), so channel B is an unseen channel for that system
Results on Development set with BUT iVector system (600dim)

Overall results, Cavg [%]
System           120s   30s    10s    3s
iVector NN        1.9    6.6   13.0   23.6
iVector NN noB    2.5    7.2   13.4   23.8

Results only for channel B, Cavg [%]
System           120s   30s    10s    3s
iVector NN        3.4    9.7   16.8   25.8
iVector NN noB    8.1   14.7   19.8   29.0

Conclusion

SAD is crucial
De-noising helps
Benefit from using a language-dependent UBM
Benefit from using NN as final classifier for LID
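All results above are reported as Cavg, but the slides do not spell out how it is scored. The sketch below follows the general shape of an average detection cost as used in NIST-style language recognition evaluations, under loudly stated assumptions: unit miss and false-alarm costs, a target prior of 0.5, a fixed decision threshold, and false alarms averaged uniformly over the competing languages present in the trial list. The exact RATS P1 scoring rules may differ, and the function name `cavg` is hypothetical.

```python
import numpy as np


def cavg(scores, labels, n_targets, threshold=0.0, p_target=0.5):
    """Average detection cost over target languages (assumed definition).

    scores: (n_trials, n_targets) detection scores, one column per target.
    labels: true language index per trial; indices >= n_targets stand for
            non-target languages. Every target needs at least one trial.
    Returns (cavg, p_miss, p_fa), with per-target miss and false-alarm rates.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    decisions = scores >= threshold  # accept / reject per trial and target

    p_miss = np.zeros(n_targets)
    p_fa = np.zeros(n_targets)
    for t in range(n_targets):
        # Miss rate: trials of language t rejected by detector t.
        p_miss[t] = np.mean(~decisions[labels == t, t])
        # False-alarm rate: mean over competing languages of the fraction
        # of their trials that detector t wrongly accepts.
        others = [l for l in np.unique(labels) if l != t]
        p_fa[t] = np.mean([np.mean(decisions[labels == l, t]) for l in others])

    cost = p_target * p_miss + (1.0 - p_target) * p_fa
    return float(cost.mean()), p_miss, p_fa
```

Multiplying the returned cost by 100 gives a percentage comparable in form to the Cavg [%] columns; the per-target `p_miss` and `p_fa` arrays correspond to the kind of PMISS and PFA figures mentioned under performance evaluation.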
