
Review window functionality

Closed: Verbeek, J.M. (Janneke) requested to merge review-window-functionality into master
This document serves as an overview of all the mistake types that Umbra can
detect and assign in shadowing tasks. The goal is to inform the user of the
mistake types by providing an intuition, a formal definition and Dutch
examples of every possible mistake type. The mistakes are sorted into two
categories:
1. Source word mistakes: mistakes that are attributed to the words in the
source file of a shadowing task.
2. Shadow word mistakes: mistakes that are attributed to the words in the
shadow file of a shadowing task.
********** "MISTAKES" FOR SOURCE WORDS: **********
FORM MISTAKE
Intuition:
If a source verb is shadowed in another tense or another plurality (i.e.
singular instead of plural or vice versa), or if a source noun is shadowed
in another plurality, it is assigned this mistake type.
Formal definition:
A source word is matched with a shadow word and assigned this mistake type if
- it starts with a prefix from prefixes and is identical to the shadow word
if the prefix is replaced by another prefix from prefixes.
OR
- it ends with an affix from affixes and is identical to the shadow word if
the affix is replaced by another affix from affixes
where:
prefixes = ['', 'ge', 'be', 'ver', 'on', 'ont']
affixes = ['', 'en', 't', 'te', 'ten', 'de', 'den', 's', "'s"]
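The prefix and affix rule above can be sketched as a string check. This is a minimal illustration, assuming case-insensitive comparison; the function names are hypothetical and not taken from Umbra's code. Only the two lists come from the definition.

```python
# Lists copied from the definition above ("affixes" are word endings).
PREFIXES = ['', 'ge', 'be', 'ver', 'on', 'ont']
AFFIXES = ['', 'en', 't', 'te', 'ten', 'de', 'den', 's', "'s"]


def prefix_swap_match(source: str, shadow: str) -> bool:
    """True if swapping the prefix of `source` for another prefix yields `shadow`."""
    for old in PREFIXES:
        if not source.startswith(old):
            continue
        stem = source[len(old):]
        if any(new != old and new + stem == shadow for new in PREFIXES):
            return True
    return False


def affix_swap_match(source: str, shadow: str) -> bool:
    """True if swapping the ending of `source` for another affix yields `shadow`."""
    for old in AFFIXES:
        if not source.endswith(old):
            continue
        # Slice explicitly so that the empty affix keeps the whole word.
        stem = source[:len(source) - len(old)]
        if any(new != old and stem + new == shadow for new in AFFIXES):
            return True
    return False


def is_affix_form_mistake(source: str, shadow: str) -> bool:
    """Hypothetical helper: the prefix/affix part of the form mistake rule."""
    s, t = source.lower(), shadow.lower()
    return s != t and (prefix_swap_match(s, t) or affix_swap_match(s, t))
```

In this sketch, "auto's" against "auto" matches by dropping the affix "'s", and "werkt" against "gewerkt" matches by adding the prefix "ge". "gaan" against "gingen" does not match, which is why the irregular verb rule is needed.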
A source word is also matched with a shadow word and assigned this mistake
type if
- it is an irregular verb, and the source and shadow word belong to the same
imperative and are both listed in umbra\resources\irregular_verbs.csv
OR
- it is a regular verb, and identical to the shadow word if put in another
tense.
OR
- it is a regular verb, and a conjugation of the shadow word.
OR
- a combination of the above two.
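The irregular verb lookup can be sketched as grouping every listed form under one verb key. The column layout of umbra\resources\irregular_verbs.csv is not documented here, so this sketch assumes a hypothetical layout: one verb per row, with the first column (taken here as the imperative) used as the key. The sample rows are illustrative only.

```python
import csv
import io

# Hypothetical stand-in for umbra\resources\irregular_verbs.csv; the real
# layout is an assumption: one verb per row, first column used as the key.
SAMPLE_CSV = """ga,gaan,ging,gingen,gegaan
wil,willen,wilde,wilden,gewild
zie,zien,zag,zagen,gezien
"""


def load_verb_groups(csv_text: str) -> dict[str, str]:
    """Map every listed form to the first entry of its row (the verb key).

    A form listed in several rows ends up under the last such row.
    """
    groups = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        key = row[0].strip()
        for form in row:
            groups[form.strip()] = key
    return groups


def same_irregular_verb(source: str, shadow: str, groups: dict[str, str]) -> bool:
    """True if both words are listed and belong to the same verb row."""
    s, t = source.lower(), shadow.lower()
    return s != t and s in groups and t in groups and groups[s] == groups[t]
```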
Examples:
In the following example, the word 'gaan' will be assigned the form mistake
type:
Source: "Wij gaan naar huis toe."
Shadow: "Wij gingen naar huis toe."
In the following example, the word 'wil' will be assigned the form mistake
type:
Source: "Ik wil naar huis toe."
Shadow: "Ik wilde naar huis toe."
In the following example, the word "auto's" will be assigned the form mistake
type:
Source: "De auto's worden verkocht."
Shadow: "De auto worden verkocht."
SEMANTIC MISTAKE
Intuition:
If a source word is semantically related to a shadow word, it is assigned
this mistake type.
Formal definition:
A source word is matched with a shadow word and assigned this mistake type
if both the source and the shadow word are in Open Dutch Wordnet, and:
- they are synonyms according to Wordnet
OR
- one is a hypernym of the other, according to Wordnet.
Examples:
In the following example, the word 'lopen' will be assigned the semantic
mistake:
Source: "De mensen lopen naar voren."
Shadow: "De mensen wandelen naar voren."
In the following example, the word 'bestek' will be assigned the semantic
mistake:
Source: "Leg het bestek op tafel."
Shadow: "Leg het mes op tafel."
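The semantic rule can be sketched with a small in-memory stand-in for Open Dutch Wordnet. The synset ids, entries and hypernym links below are hypothetical and only cover the two examples above. The sketch checks direct hypernym links only; following the hypernym chain transitively would be a natural extension.

```python
# Toy stand-in for Open Dutch Wordnet (illustrative entries only):
# synset id -> member words, plus direct hypernym links between synsets.
SYNSETS = {
    "walk.v": {"lopen", "wandelen"},
    "cutlery.n": {"bestek"},
    "knife.n": {"mes"},
}
HYPERNYM_OF = {"knife.n": "cutlery.n"}  # child synset -> parent synset


def synsets_of(word: str) -> set[str]:
    return {sid for sid, words in SYNSETS.items() if word in words}


def is_semantic_mistake(source: str, shadow: str) -> bool:
    s, t = source.lower(), shadow.lower()
    src, shd = synsets_of(s), synsets_of(t)
    if s == t or not src or not shd:  # identical, or not both in the wordnet
        return False
    if src & shd:  # synonyms: the words share a synset
        return True
    # one is a (direct) hypernym of the other, in either direction
    return any(HYPERNYM_OF.get(a) == b or HYPERNYM_OF.get(b) == a
               for a in src for b in shd)
```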
PHONETIC MISTAKE
- 1 letter difference with a source word. Example: 'die' and 'de' in file 7.
- Combination of this word and the next word looks like a source word. Example:
the combination of 'fout' and 'zus' looks like 'fotootjes' in file 7.
- Any combination of shadow words looks like any combination of source words.
Example: "nextdoor is een" and "stoornissen" in file 9.
- Probably: more than 50% overlap between words.
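The first cue above, a one-letter difference, corresponds to a Levenshtein distance of exactly 1. This minimal sketch checks that in a single pass without building the full distance table. The function name is hypothetical, and the "more than 50% overlap" cue is not covered here.

```python
def one_letter_difference(a: str, b: str) -> bool:
    """True if a and b differ by exactly one substituted, inserted or deleted letter."""
    a, b = a.lower(), b.lower()
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equally long) word
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        edits += 1
        if edits > 1:
            return False
        if len(a) == len(b):
            i += 1  # substitution: skip the differing letter in both words
        j += 1      # insertion: skip the extra letter in the longer word
    edits += len(b) - j  # a trailing extra letter in `b` is one more edit
    return edits == 1
```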
Intuition:
If a source word is phonetically very similar to a shadow word, it is assigned
this mistake type.
Formal definition:
A source word is matched with a shadow word and assigned this mistake type
if:
- at least one possible phonetic representation of the source word is also a
possible phonetic representation of the shadow word.
Examples:
In the following example, the word 'bureau' will be assigned the phonetic
mistake:
Source: "De pen ligt op het bureau."
Shadow: "De pen ligt op het buro."
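The formal definition can be sketched as follows: generate a set of candidate pronunciations per word and test whether the two sets overlap. This minimal sketch assumes a hypothetical handful of Dutch spelling rewrites in place of a real grapheme-to-phoneme converter. The rule table is illustrative and is not Umbra's.

```python
from itertools import product

# Hypothetical spelling rewrites: spelling -> spellings it may sound like.
RULES = {
    "eau": ["o"],
    "ij": ["ei"],
    "ou": ["au"],
    "dt": ["t"],
    "c": ["k", "s"],
}


def phonetic_forms(word: str) -> set[str]:
    """All spellings reachable by optionally rewriting each matched segment."""
    keys = sorted(RULES, key=len, reverse=True)  # prefer the longest match
    segments = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                segments.append([key] + RULES[key])
                i += len(key)
                break
        else:  # no rule applies: keep the single character
            segments.append([word[i]])
            i += 1
    return {"".join(choice) for choice in product(*segments)}


def is_phonetic_mistake(source: str, shadow: str) -> bool:
    s, t = source.lower(), shadow.lower()
    return s != t and bool(phonetic_forms(s) & phonetic_forms(t))
```

With these rules, "bureau" yields the candidates "bureau" and "buro", so it overlaps with the shadow word "buro" from the example.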
SKIPPED WORD
Intuition:
If a word in the source file is totally skipped by the participant during
the shadowing task, the word is assigned this mistake type.
Formal definition:
Source words that are not correctly shadowed, and to which no other mistake
type can be assigned, will be assigned the skipped word mistake.
Example:
In the following example, the word 'een' will be assigned the skipped word
mistake:
Source: "Dit is een voorbeeld."
Shadow: "Dit is voorbeeld."
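Because the skipped word mistake is a fallback, it can be sketched as the last branch of an ordered classifier. The document does not say in which order Umbra tries the other mistake types or which shadow words count as candidates. The check list and the `candidates` parameter below are therefore assumptions, and the helper name is hypothetical.

```python
from typing import Callable, Optional

# Hypothetical check signature: check(source_word, shadow_word) -> bool.
Check = Callable[[str, str], bool]


def classify_source_word(source: str, candidates: list[str],
                         checks: list[tuple[str, Check]]) -> Optional[str]:
    """Label one source word against the shadow words it could be matched with.

    Returns None for a correct shadow, the label of the first check that
    matches any candidate, or 'skipped word' when nothing matches.
    """
    if source in candidates:
        return None  # correctly shadowed: no mistake
    for label, check in checks:  # the order of checks is an assumption
        if any(check(source, shadow) for shadow in candidates):
            return label
    return "skipped word"  # fallback: no other mistake type applies
```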
********** "MISTAKES" FOR SHADOW WORDS: **********
REPETITION
Intuition:
If the same word occurs twice in a row in the shadow file, but only once in
the source file, the second occurrence is labeled as a repetition.
Formal definition:
A shadow word is assigned this mistake type if:
- it is a repetition of the previous shadow word, and only one occurrence is
in the source file.
OR
- it is the beginning of the next shadow word, and only the next shadow word
is in the source file.
OR
- it is the end of the previous shadow word, and only the previous shadow
word is in the source file.
OR
- one of the above, but then for a group of more than one shadow word.
Examples:
In the following example, the word 'een' will be assigned the repetition
mistake:
Source: "Dit is een voorbeeld."
Shadow: "Dit is een een voorbeeld."
In the following example, the word 'toe' will be assigned the repetition
mistake:
Source: "Je moet haar dat toesturen."
Shadow: "Je moet haar dat toe toesturen."
In the following example, the word 'staat' will be assigned the repetition
mistake:
Source: "Zoiets bestaat toch niet?"
Shadow: "Zoiets bestaat staat toch niet?"
In the following example, the last occurrences of 'je' and 'dat' will be
assigned the repetition mistake:
Source: "Moet je je dat eens voorstellen."
Shadow: "Moet je je dat je dat eens voorstellen."
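The four repetition rules can be sketched over tokenised sentences. This is a minimal sketch under stated assumptions. "Only one occurrence in the source" is read as "the repeated word or word group occurs exactly once in the source". Groups are limited to `max_n` words, and tokenisation is lowercase with punctuation stripped. The function names are hypothetical.

```python
import re


def tokens(text: str) -> list[str]:
    return re.findall(r"[\w']+", text.lower())


def ngram_count(words: list[str], gram: list[str]) -> int:
    n = len(gram)
    return sum(words[k:k + n] == gram for k in range(len(words) - n + 1))


def find_repetitions(source: str, shadow: str, max_n: int = 3) -> set[int]:
    """0-based indices of shadow words labelled as repetitions."""
    src, shd = tokens(source), tokens(shadow)
    in_source = set(src)
    marked = set()
    for i, word in enumerate(shd):
        # Beginning of the next shadow word (e.g. 'toe' before 'toesturen').
        if (i + 1 < len(shd) and word not in in_source and shd[i + 1] in in_source
                and shd[i + 1] != word and shd[i + 1].startswith(word)):
            marked.add(i)
        # End of the previous shadow word (e.g. 'staat' after 'bestaat').
        if (i > 0 and word not in in_source and shd[i - 1] in in_source
                and shd[i - 1] != word and shd[i - 1].endswith(word)):
            marked.add(i)
    # A word (n = 1) or group of n words repeated immediately, once in source.
    for n in range(1, max_n + 1):
        for i in range(n, len(shd) - n + 1):
            block = shd[i:i + n]
            if block == shd[i - n:i] and ngram_count(src, block) == 1:
                marked.update(range(i, i + n))
    return marked
```

On the four examples above, this returns {3}, {4}, {2} and {4, 5}, which are the positions of 'een', 'toe', 'staat' and the last 'je dat'.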
N.B.: If a combination of two words in the shadow is exactly equal to a source
word, or vice versa, then it is not a phonetic mistake; mark it as 'correct'
instead, because the difference is an inconsistency introduced by the
speech-to-text software.
N.B. 2: 'zij' and 'ze' are the same, as are 'jij' and 'je', and 'wij' and
'we'. An exception could be made for these words.
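Both notes can be sketched as one equality check. Word groups that concatenate to the same string count as correct, and the listed pronoun pairs are treated as equal. The sketch maps the pronouns everywhere, which is an assumption: 'je' can also mean 'your', so a real exception might need context. The helper names are hypothetical.

```python
# Reduced -> full pronoun forms treated as equal (per N.B. 2).
PRONOUN_EQUIVALENTS = {"ze": "zij", "je": "jij", "we": "wij"}


def normalise(word: str) -> str:
    return PRONOUN_EQUIVALENTS.get(word, word)


def counts_as_correct(source_words: list[str], shadow_words: list[str]) -> bool:
    """True if the word groups are equal after pronoun normalisation, or if
    they concatenate to the same string (a speech-to-text split or merge)."""
    src = [w.lower() for w in source_words]
    shd = [w.lower() for w in shadow_words]
    if "".join(src) == "".join(shd):
        return True
    return [normalise(w) for w in src] == [normalise(w) for w in shd]
```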
RANDOM WORD
Intuition:
If a word in the shadow file is totally unrelated to any source word, the word
is assigned this mistake type.
Formal definition:
Shadow words that are not a correct shadow and to which no other mistake type
can be assigned, will be assigned the random word mistake.
Example:
In the following example, the word 'zeer' will be assigned the random word
mistake:
Source: "Dit is een belangrijk voorbeeld."
Shadow: "Dit is een zeer belangrijk voorbeeld."
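Like the skipped word mistake, the random word mistake is what remains after the other labels are assigned. This minimal sketch assumes that the indices of correct shadows and of other shadow-word mistakes (such as repetitions) have already been computed. The function name is hypothetical.

```python
def random_words(shadow_words: list[str], correct: set[int],
                 other_mistakes: set[int]) -> list[str]:
    """Return shadow words that are neither correct shadows nor assigned
    another mistake type (such as repetition): these are random words."""
    return [word for i, word in enumerate(shadow_words)
            if i not in correct and i not in other_mistakes]
```

For the example above, every shadow word except 'zeer' is a correct shadow, so only 'zeer' is returned.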