International Workshop on Spoken Language Translation (IWSLT 2006) - CFP
2006-06-23 14:57:41
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

    International Workshop on Spoken Language Translation (IWSLT 2006)
         -- Evaluation Campaign on Spoken Language Translation --

                 Second Call for Participants / Papers

                         November 27-28, 2006
                             Kyoto, Japan

                    http://www.slc.atr.jp/IWSLT2006

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Spoken language translation technologies attempt to cross the language
barriers between people who have different native languages and who each
want to engage in conversation using their mother tongue. Spoken language
translation has to deal with the problems of both automatic speech
recognition (ASR) and machine translation (MT).

One of the prominent research activities in spoken language translation is
the work being conducted by the Consortium for Speech Translation Advanced
Research (C-STAR III), an international partnership of research
laboratories engaged in automatic translation of spoken language. Current
members include ATR (Japan), CAS (China), CLIPS (France), CMU (USA), ETRI
(Korea), ITC-irst (Italy), and UKA (Germany).
A multilingual speech corpus of tourism-related sentences (BTEC*) has been
created by the C-STAR members, and parts of this corpus were already used
in previous IWSLT workshops, which focused on the evaluation of MT results
based on text input (http://www.slc.atr.jp/IWSLT2004) and on the
translation of ASR output (word lattices, N-best lists) using read speech
as input (http://penance.is.cs.cmu.edu/iwslt2005). The full BTEC* corpus
consists of 160K of sentence-aligned text data, and parts of the corpus
will be provided to all evaluation campaign participants for training
purposes.

In this workshop, we focus on the translation of spontaneous speech, which
includes ill-formed utterances due to grammatical incorrectness, incomplete
sentences, and redundant expressions. The impact of spontaneity on the
performance of ASR and MT systems, as well as the robustness of
state-of-the-art MT engines to speech recognition errors, will be
investigated in detail.

Two types of submissions are invited:
 1) participants in the evaluation campaign of spoken language translation
    technologies. Each participant in the evaluation campaign is requested
    to submit a paper describing the utilized ASR and MT systems and
    to report results using the provided test data.
 2) technical papers on related issues.

An overview of the evaluation campaign is as follows:

=== Evaluation Campaign

Theme:

    * Spontaneous speech translation

Translation Directions:

    * Arabic/Chinese/Italian/Japanese into English (AE, CE, IE, JE)

Input Conditions:

    * Speech (audio)
    * ASR Output (word lattice or N-best list)
    * Cleaned Transcripts (text)

Supplied Resources:

    * training corpus:
          o AE, IE:
                + 20,000 sentence pairs of BTEC*
                + three develop sets (3x500 sentence pairs,
                  16 multiple references)
          o CE, JE:
                + 40,000 sentence pairs of BTEC*
                + three develop sets (3x500 sentence pairs,
                  16 multiple references)

    * develop corpus:
          o speech data, word lattices, N-best lists of 500 input
            sentences with 7 reference translations for each
            translation direction and input condition

    * test corpus:
          o speech data, word lattices, N-best lists of 500 input
            sentences for each translation direction and input condition

  => word segmentations will be provided according to the output
     of the provided ASR engines

Data Tracks:

    The results of past IWSLT workshops showed that the number of BTEC*
    sentence pairs used for training strongly affects the performance of
    the MT systems on the given task. However, only C-STAR partners have
    access to the full BTEC* corpus. In order to allow a fair comparison
    between the systems, we decided to distinguish the following two data
    tracks:

    * Open Data Track ("open" for everyone :->)
          o no restrictions on training data of ASR engines
          o any resources, besides the full BTEC* corpus and proprietary
            data, can be used as the training data of MT engines.
            Concerning the BTEC* corpus and proprietary data, only the
            Supplied Resources (see above) are allowed to be used for
            training purposes.

    * C-STAR Data Track
          o no restrictions on training data of ASR engines
          o any resources (including the full BTEC* corpus and
            proprietary data) can be used as the training data of MT
            engines.

Evaluation Specification:

    * ASR output
          o (automatic) WER

    * MT output
          o (automatic) BLEU(*), NIST, METEOR
          o (subjective) fluency(*), adequacy(*)

     -> systems will be ranked according to the metrics marked '(*)'
     -> human assessment will be carried out for the top-10 systems
        (according to the BLEU metric) of the Chinese-to-English
        Open Data Track (ASR Output condition).
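
For readers new to the ASR metric listed above: word error rate (WER) is
commonly computed as the word-level edit distance between a reference
transcript and the recognizer output, divided by the number of reference
words. The following is a minimal sketch under that assumption. It is not
the campaign's official scoring tool, and the function names and the
whitespace tokenization are illustrative choices only (for Chinese or
Japanese, the word segmentation would have to be supplied separately).

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences.

    Counts the minimum number of substitutions, deletions and
    insertions needed to turn `ref` into `hyp`.
    """
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = edit distance / number of reference words.

    Assumes both strings are already tokenized with single spaces
    (an illustrative simplification).
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # One word of a four-word reference is dropped by the recognizer.
    print(word_error_rate("where is the station", "where is station"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions,
so it is an error rate rather than a percentage of correct words.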

=== Technical Paper:

The workshop also invites technical papers related to spoken language
translation.
Possible topics include, but are not limited to:

    * Spontaneous speech translation
    * Domain and language portability
    * MT using comparable and non-parallel corpora
    * Phrase alignment algorithms
    * MT decoding algorithms
    * MT evaluation measures

=== Important Dates

  + Evaluation Campaign

        April  7, 2006 -- System Registration Open
          May 12, 2006 -- Training Corpus Release
         June 30, 2006 -- Develop Corpus Release
       August  7, 2006 -- Test Corpus Release [00:01 JST]
       August  9, 2006 -- Result Submission Due [23:59 JST]
    September 15, 2006 -- Result Feedback to Participants
    September 29, 2006 -- Paper Submission Due
      October 14, 2006 -- Notification of Acceptance
      October 27, 2006 -- Camera-ready Submission Due

     - system registrations will be accepted until release of
       test corpus
     - late result submissions will be treated as unofficial
       result submissions

  + Technical Papers

    September 15, 2006 -- Paper Submission Due [23:59 JST]
      October 17, 2006 -- Notification of Acceptance
      October 27, 2006 -- Camera-ready Submission Due

=== Application / Submission Guidelines / Updated Information

  + available at http://www.slc.atr.jp/IWSLT2006

=== Organizers

  + Satoshi Nakamura (ATR, Japan; Chair)
  + Herve Blanchon (CLIPS, France)
  + Gianni Lazzari (ITC-irst, Italy)
  + Youngjik Lee (ETRI, Korea)
  + Alex Waibel (CMU, USA / UKA, Germany)
  + Bo Xu (CAS, China)

=== Program Committee

  + Michael Paul (ATR, Japan; Evaluation Campaign Chair)
  + Marcello Federico (ITC-irst, Italy; Technical Paper Chair)
  + Nicola Bertoldi (ITC-irst, Italy)
  + Christian Boitet (CLIPS, France)
  + Genichiro Kikui (NTT, Japan)
  + Kevin Knight (ISI, USA)
  + Philipp Koehn (Univ. of Edinburgh, UK)
  + Sadao Kurohashi (Univ. of Tokyo, Japan)
  + Young-Suk Lee (IBM, USA)
  + Jose B. Marino (UPC, Spain)
  + Arul Menezes (Microsoft, USA)
  + Masaaki Nagata (NTT, Japan)
  + Hermann Ney (RWTH, Germany)
  + Seung-Shin Oh (ETRI, Korea)
  + Wade Shen (MIT, USA)
  + Stephan Vogel (CMU, USA)
  + Andy Way (Dublin City University, Ireland)
  + Chengqing Zong (CAS, China)

=== Local Arrangements

  + Genichiro Kikui (NTT, Japan)

=== Conference Venue

  + Paruru Plaza Kyoto (right in front of Kyoto Station)

=== Supporting Organizations

  + Advanced Telecommunication Research Institute International (ATR)
  + Association for Computational Linguistics (ACL)
  + Center for the Evaluation of Language and Communication
    Technologies (Celct)
  + European Language Resources Association (ELRA)
  + International Speech Communication Association (ISCA)

=== Contact

  Michael Paul    
  e-mail: michael.paul@atr.jp
  ATR Spoken Language Communication Research Laboratories
  2-2-2 Hikaridai, Keihanna Science City, Kyoto 619-0288, Japan

=== References

  + IWSLT 2005 (http://penance.is.cs.cmu.edu/iwslt2005)
  + IWSLT 2004 (http://www.slc.atr.jp/IWSLT2004)
  + C-STAR (http://www.c-star.org/)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-=-=-=-=-=-=-