msdt / team1920-SpeechComparison / Merge requests / !136
Review window functionality
Closed
Verbeek, J.M. (Janneke) requested to merge review-window-functionality into master 5 years ago
Should be fully functional.
Versions: latest version 963d3bbd (27 commits, 5 years ago); version 1 9d35270f (26 commits, 5 years ago)
19 files, +765 −274
docs/MistakeDefinitions.txt (+189 −39)
********** "MISTAKES" FOR SHADOW WORDS: **********
This document serves as an overview of all the mistake types that Umbra can
detect and assign in shadowing tasks. The goal is to inform the user of the
mistake types by providing an intuition, a formal definition and Dutch
examples of every possible mistake type. The mistakes are sorted in two
categories.
1. Source word mistakes: mistakes that are attributed to the words in the
source file of a shadowing task.
2. Shadow word mistakes: mistakes that are attributed to the words in the
shadow file of a shadowing task.
********** "MISTAKES" FOR SOURCE WORDS: **********
FORM MISTAKE
Intuition:
If a source verb is shadowed in another tense or another plurality (i.e.
singular instead of plural or vice versa), or if a source noun is shadowed
in another plurality, it is assigned this mistake type.
Formal definition:
A source word is matched with a shadow word and assigned this mistake type if
- it starts with a prefix from prefixes and is identical to the shadow word
if the prefix is replaced by another prefix from prefixes.
OR
- it ends with an affix from affixes and is identical to the shadow word if
the affix is replaced by another affix from affixes
where:
prefixes = ['', 'ge', 'be', 'ver', 'on', 'ont']
affixes = ['', 'en', 't', 'te', 'ten', 'de', 'den', 's', "'s"]
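The prefix and affix rule above can be sketched as follows. This is a minimal
illustration of the definition as written, not Umbra's actual implementation;
the function name is_form_mistake is hypothetical, and words are assumed to be
lowercased tokens without punctuation.

```python
PREFIXES = ['', 'ge', 'be', 'ver', 'on', 'ont']
AFFIXES = ['', 'en', 't', 'te', 'ten', 'de', 'den', 's', "'s"]


def is_form_mistake(source: str, shadow: str) -> bool:
    """Return True if swapping one prefix or one affix turns source into shadow."""
    if source == shadow:
        return False  # identical words are a correct shadow, not a mistake

    # Replace a prefix of the source word by another prefix from the list.
    for old in PREFIXES:
        if source.startswith(old):
            stem = source[len(old):]
            if any(new + stem == shadow for new in PREFIXES if new != old):
                return True

    # Replace an affix (ending) of the source word by another affix.
    for old in AFFIXES:
        if source.endswith(old):
            stem = source[:len(source) - len(old)]
            if any(stem + new == shadow for new in AFFIXES if new != old):
                return True

    return False
```

With this sketch, the pairs "wil"/"wilde" and "auto's"/"auto" match, while
"gaan"/"gingen" does not: an irregular verb like that is only covered by the
irregular verb rule.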
A source word is also matched with a shadow word and assigned this mistake
type if
- it is an irregular verb, and source and shadow word belong to the same
imperative and are in umbra\resources\irregular_verbs.csv
OR
- it is a regular verb, and identical to the shadow word if put in another
tense.
OR
- it is a regular verb, and a conjugation of the shadow word.
OR
- a combination of the above two.
Examples:
In the following example, the word 'gingen' will be assigned the form mistake
type:
Source: "Wij gaan naar huis toe."
Shadow: "Wij gingen naar huis toe."
In the following example, the word 'wil' will be assigned the form mistake
type:
Source: "Ik wil naar huis toe."
Shadow: "Ik wilde naar huis toe."
In the following example, the word 'auto' will be assigned the form mistake
type:
Source: "De auto's worden verkocht."
Shadow: "De auto worden verkocht."
REPETITION: (IMPLEMENTED for anchor)
- 2 or more times the same word in a row in the shadow, while it only appears
once in the source. Example: 'en' and 'en' in file 7 (2nd 'en' is repetition)
- incorrectly shadowed word before a correctly shadowed word is the beginning
of the correctly shadowed word. Example: 'toe' and 'toestuurt' in file 7
('toe' is repetition)
- incorrectly shadowed word after a correctly shadowed word is the end of the
correctly shadowed word. Example: 'bestaat' and 'staat' in file 9 ('staat' is
repetition)
- Same as two above, but with more than 1 word. Example 'je', 'dat', 'je' and
'dat' in file 17 (latter 'je' and 'dat' are repetitions)
FORM:
- Shadow word that has a pre- or postfix that the respective source word does
not have.
- Prefixes: ge-, be-, ver-, on-, ont-
- Postfixes: -en, -t, -te, -ten, -de, -den
SEMANTIC MISTAKE
Intuition:
If a source word is semantically related to a shadow word, it is assigned
this mistake type.
Formal definition:
A source word is matched with a shadow word and assigned this mistake type
if:
- the source and the shadow word are in Open Dutch Wordnet and
- they are synonyms according to Wordnet
OR
- one is the hypernym of the other, according to Wordnet
Examples:
In the following example, the word 'lopen' will be assigned the semantic
mistake:
Source: "De mensen lopen naar voren."
Shadow: "De mensen wandelen naar voren."
In the following example, the word 'bestek' will be assigned the semantic
mistake:
Source: "Leg het bestek op tafel."
Shadow: "Leg het mes op tafel."
SEMANTIC:
- Shadow word that is semantically similar to the source word that it belongs
to. Said out of confusion. Semantic relatedness is determined by using
OpenDutchWordnet.
PHONETIC:
- 1 letter difference with a source word. Example: 'die' and 'de' in file 7
- Combination of this word and the next word looks like a source word. Example:
the combination 'fout' with 'zus', and 'fotootjes' in file 7.
- Any combination of shadow words looks like any combination of source words.
Example: "nextdoor is een" and "stoornissen" in file 9.
- Probably: more than 50% overlap between words.
Intuition:
If a source word is phonetically very similar to a shadow word, it is assigned
this mistake type.
Formal definition:
A source word is matched with a shadow word and assigned this mistake type
if:
- any possible phonetic representation of the source word is a phonetic
representation of the shadow word.
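The definition above amounts to checking whether the two sets of phonetic
representations overlap. A minimal sketch, assuming a lookup table from words
to sets of representations; the table PRONUNCIATIONS, its placeholder
transcriptions and the function name are hypothetical and only serve as
illustration.

```python
# Hypothetical lookup table: word -> set of possible phonetic representations.
# The transcriptions are illustrative placeholders, not real phonetic data.
PRONUNCIATIONS = {
    "bureau": {"byro"},
    "buro": {"byro"},
    "pen": {"pen"},
    "ben": {"ben"},
}


def is_phonetic_mistake(source: str, shadow: str, table=PRONUNCIATIONS) -> bool:
    """Return True if source and shadow share at least one representation."""
    if source == shadow:
        return False  # identical words are a correct shadow
    source_reps = table.get(source, set())
    shadow_reps = table.get(shadow, set())
    # A non-empty intersection means some representation of the source word
    # is also a representation of the shadow word.
    return bool(source_reps & shadow_reps)
```

Words that are missing from the table simply never match in this sketch.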
Examples:
In the following example, the word 'bureau' will be assigned the phonetic
mistake:
Source: "De pen ligt op het bureau."
Shadow: "De pen ligt op het buro."
SKIPPED WORD
Intuition:
If a word in the source file is totally skipped by the participant during
the shadowing task, the word is assigned this mistake type.
Formal definition:
Source words that are not correctly shadowed, and to which no other mistake
type can be assigned, will be assigned the skipped word mistake.
Example:
In the following example, the word 'een' will be assigned the skipped word
mistake:
Source: "Dit is een voorbeeld."
Shadow: "Dit is voorbeeld."
********** "MISTAKES" FOR SHADOW WORDS: **********
REPETITION
Intuition:
If the same word occurs twice in a row in the shadow file, but only once in
the source file, the second occurrence is labeled as a repetition.
Formal definition:
A shadow word is assigned this mistake type if:
- it is a repetition of the previous shadow word, and only one occurrence is
in the source file.
OR
- it is the beginning of the next shadow word, and only the next shadow word
is in the source file.
OR
- it is the end of the previous shadow word, and only the previous shadow
word is in the source file.
OR
- one of the above, but then for more than one shadow word.
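The three single-word rules above can be sketched as follows. This is a
simplified illustration and not Umbra's implementation: the function name
repetition_indices is hypothetical, the multi-word case is left out, and both
inputs are assumed to be lists of lowercased tokens without punctuation.

```python
from collections import Counter


def repetition_indices(source, shadow):
    """Return the indices of shadow words that match a single-word rule."""
    in_source = Counter(source)  # missing words count as 0
    flagged = []
    for i, word in enumerate(shadow):
        prev_word = shadow[i - 1] if i > 0 else None
        next_word = shadow[i + 1] if i + 1 < len(shadow) else None

        # Rule 1: same as the previous shadow word, one occurrence in source.
        repeated = prev_word == word and in_source[word] == 1

        # Rule 2: beginning of the next shadow word, and only the next
        # shadow word is in the source.
        is_start = (
            next_word is not None
            and next_word != word
            and next_word.startswith(word)
            and in_source[next_word] > 0
            and in_source[word] == 0
        )

        # Rule 3: end of the previous shadow word, and only the previous
        # shadow word is in the source.
        is_end = (
            prev_word is not None
            and prev_word != word
            and prev_word.endswith(word)
            and in_source[prev_word] > 0
            and in_source[word] == 0
        )

        if repeated or is_start or is_end:
            flagged.append(i)
    return flagged
```

For example, repetition_indices("dit is een voorbeeld".split(),
"dit is een een voorbeeld".split()) flags only the second 'een'.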
Examples:
In the following example, the word 'een' will be assigned the repetition
mistake:
Source: "Dit is een voorbeeld."
Shadow: "Dit is een een voorbeeld."
In the following example, the word 'toe' will be assigned the repetition
mistake:
Source: "Je moet haar dat toesturen."
Shadow: "Je moet haar dat toe toesturen."
In the following example, the word 'staat' will be assigned the repetition
mistake:
Source: "Zoiets bestaat toch niet?"
Shadow: "Zoiets bestaat staat toch niet?"
In the following example, the word 'je' and the word 'dat' will be assigned
the repetition mistake:
Source: "Moet je je dat eens voorstellen."
Shadow: "Moet je je dat je dat eens voorstellen."
N.B.: If a combination of two words in the shadow is exactly equal to a source
word, or vice versa, then it is not a phonetic mistake. Instead, mark it as
'correct', because this would be an inconsistency caused by the speech-to-text
software.
N.B. 2: 'zij' and 'ze' are the same, just like 'jij' and 'je', and 'wij' and
'we'. An exception could be made for these words.
RANDOM:
- Shadow words that do not fall into the above mistake categories, and are just
said randomly.
Intuition:
If a word in the shadow file is totally unrelated to any source word, the word
is assigned this mistake type.
Formal definition:
Shadow words that are not a correct shadow and to which no other mistake type
can be assigned, will be assigned the random word mistake.
********** "MISTAKES" FOR SOURCE WORDS: **********
Example:
In the following example, the word 'zeer' will be assigned the random word
mistake:
SKIPPED:
- Not really a mistake: the source word is skipped entirely. It is not tied to
any of the shadow words, so it is neither correctly nor wrongly shadowed.
\ No newline at end of file
Source: "Dit is een belangrijk voorbeeld."
Shadow: "Dit is een zeer belangrijk voorbeeld."