Commit 0425a0cd authored by Luttikholt, T.J. (Thijs)
I finalized the readme file by changing the images. Adding additional text for clarity. And including new sections regarding the newly added review window
parent 5c339762
1 merge request: !145 I finalized the readme file by changing the images. Adding additional text for...
# Readme UMBRA
------
Umbra is an open source shadowing task analyzer. Its goal is to find and report participant performance during shadowing tasks provided by the user, in csv format.
### Installation
------
To install Umbra:
1. Download the zip file umbra.zip
2. Unzip the umbra.zip file.
3. Navigate to the umbra folder.
4. Double-click on the executable file. Note that the program may take a few seconds to start up; this is due to underlying structures.
### Usage main screen
------
##### Starting Umbra
Use the executable to launch Umbra. The white Umbra logo should appear. Wait for the Umbra application to load.
![weegschaal](docs/splash.png)
##### Algorithm selection
Umbra can work with two different implementations. The default is the Anchor algorithm, which is a task-specific implementation. The second implementation is the Needleman-Wunsch algorithm, which is slower than the Anchor algorithm. An advantage of the Needleman-Wunsch algorithm is that it is a well-known algorithm.
![algorithm selection](docs/umbra_algorithm_selection.png)
##### File selection
Umbra can read in single or multiple source and shadow files, using the 'add source file' and 'add shadow file' buttons. It is also possible to read in an entire folder, using the 'add source folder' and 'add shadow folder' buttons. Selected files are visible in the display. At the moment, Umbra only supports CSV files as input (.csv). See the 'important: file structure' section for more details regarding structure and naming of the files.
![adding files](docs/umbra_adding_files.png)
The selected files are visible in the selected files dropdown.
![currently selected](docs/umbra_selected_files.png)
##### Deselecting files
In case any incorrect files were selected, remove the selected shadow or source file using the 'remove' button. All selected files of a given type can be removed using either the 'clear source files' button or the 'clear shadow files' button.
![deselecting files](docs/umbra_deleting_files.png)
##### Running analysis
After selecting the source and shadow files for the shadowing task(s) you wish to compare, you can run the analysis using the 'Compare' button. The analysis is complete when the message 'Comparison completed!' appears in the messages box.
![running comparison](docs/umbra_comparing_files.png)
##### Saving results
After completing the analysis, the results can be saved to a .csv file at any location on your device, using the 'Save result' button.
![saving results](docs/umbra_save_results.png)
##### Messages
All messages are shown in the messages box.
![messages](docs/umbra_message_bar.png)
##### Reviewing results
After completing the analysis, the 'review results' button can be used to open the review window in which the user can manually change the mistake types.
![review results](docs/umbra_review_results.png)
### Usage review window
-----
##### Review window information
The review window shows each combination of source and shadow word that was marked as a mistake. For each mistake, it shows: the participant number and condition, the source word and its onset, the shadow word and its onset, the original type of mistake, and the reviewed type of mistake.
##### Review window changing mistake type
In order to change the mistake type of a certain source-shadow combination, the dropdown menu in the upper left can be used. This menu initially shows the original marked mistake, and contains all possible mistake types.
![change mistake type](docs/umbra_change_mistake.png)
##### Review window saving
In order to save the changes that have been made, the 'save' button can be used. This button saves the changes in the internal data structures.
![review saving](docs/umbra_save_changes.png)
##### Review window quitting
In order to close the review window, the 'quit' button can be used. Note that pressing this button does not save any changes that have been made.
![review quitting](docs/umbra_quit_window.png)
##### Review window messages box
Messages regarding whether saving was successful are shown in the messages box of the review window.
![review messages](docs/umbra_review_messages.png)
### Important: file structure
-----
##### File naming
The program follows strict naming conventions:
* Source files: 'trial name'.TextGrid.csv, where 'trial name' can be any given name.
* Shadow files: 'trial name'\_'condition''participant name'.TextGrid.csv, where 'trial name' can be any given name, 'condition' is one of 'AB', 'AO' or 'AV', and 'participant name' can be any participant name.

Note that the capital letters are important. Not using capital T and G in TextGrid renders the file unusable.
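
The naming rules can be checked before loading a file. The sketch below is only an illustration of those rules: the regular expressions and the function name `parse_file_name` are hypothetical and are not taken from Umbra's code. It assumes the trial name itself may contain any characters.

```python
import re

# Hypothetical patterns derived from the naming rules in this readme.
SOURCE_RE = re.compile(r"^(?P<trial>.+)\.TextGrid\.csv$")
SHADOW_RE = re.compile(
    r"^(?P<trial>.+)_(?P<condition>AB|AO|AV)(?P<participant>.+)\.TextGrid\.csv$"
)


def parse_file_name(name, kind):
    """Return the parts of a file name as a dict, or None if it breaks the rules.

    kind is "source" or "shadow". Matching is case-sensitive, so
    'textgrid' instead of 'TextGrid' is rejected.
    """
    pattern = SHADOW_RE if kind == "shadow" else SOURCE_RE
    match = pattern.match(name)
    return match.groupdict() if match else None
```

For example, `parse_file_name("trial1_AB12.TextGrid.csv", "shadow")` splits the name into trial `trial1`, condition `AB` and participant `12`.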
##### File structure
The structure of the file is shown in the table below. Columns marked 'X' may contain anything, but they must still be present. The table shows only the approximate layout; the file itself must be in .csv format.
| Column number | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|-----------------|---|---|---|---|-------------------|--------------------|-------------|
| Needed contents | X | X | X | X | Word onset (time) | Word offset (time) | Word (text) |
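
As an illustration of this layout, the sketch below reads columns 5, 6 and 7 from such a file. It is not Umbra's own reader. It assumes a comma-separated file and times that parse as decimal numbers, and it skips any row that is too short or has non-numeric times (for example a header row). The function name `read_words` is hypothetical.

```python
import csv
import io


def read_words(csv_file):
    """Yield (onset, offset, word) taken from columns 5, 6 and 7 of each row."""
    for row in csv.reader(csv_file):
        if len(row) < 7:
            continue  # assumption: rows with fewer than 7 columns are ignored
        try:
            onset, offset = float(row[4]), float(row[5])
        except ValueError:
            continue  # assumption: rows without numeric times are ignored
        yield onset, offset, row[6]


# Usage with an in-memory file; columns 1 to 4 hold arbitrary contents.
sample = io.StringIO("x,x,x,x,0.5,0.9,dit\nx,x,x,x,1.0,1.4,is\n")
words = list(read_words(sample))
```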
### Main algorithms of the program
-----
##### Anchor algorithm
Anchor algorithm (anchor_algorithm.py):
This algorithm aligns the shadow with its source by marking words that are very likely to be correct shadows as anchors. Next, it aligns the words between the anchors. Finally, it classifies the mistakes. In more detail:
1. Words that occur once in the source and once in the shadow, and are thus unique in both files, are marked as possible anchors.
2. The possible anchor pairs are marked as anchors if one of the following conditions holds:
* The length of the word in the pair is 6 or more characters, and the onset of the shadow happens within 3 seconds after the onset of the source.
* The length of the word in the pair is 3, 4 or 5 characters, and the onset of the shadow happens within 1.5 seconds after the onset of the source.
All pairs that do not satisfy one of the above conditions are discarded.
3. The remaining non-anchor source words are aligned with the shadow words that occur on the same interval between two anchor pairs.
* The source word following the first anchor pair is picked. It is compared with the first shadow word that occurs after the shadow of the anchor pair.
* If the two words are equal and the onset of the shadow happens between 0.05 and 3 seconds after the onset of the source, then they are an aligned pair.
* If the source and shadow words do not satisfy the previous condition, then the same source word is compared with the next shadow word.
* This is repeated until a shadow word satisfies the above condition, or until the next shadow anchor is reached. In either case, the next source word will be picked, and the cycle starts again (so it will be compared to all shadow words between the two anchor pairs).
* If two or more source words are equal to one another, but there are fewer of them in the shadow, then the shadow and source are aligned in such a way that the onset differences are smallest.
4. All the aligned words (anchors and between-anchors) are labeled as correct.
5. Mistake types are determined for the non-correct (= false, not aligned) words (mistake definitions can be found separately in this readme):
* Loop over the shadow words. If a shadow word is not flagged as correct, then start assessing its mistake type.
* If it is a repetition mistake, mark this shadow word and the other shadow words that make the repetition as REPETITION. Otherwise, check for form mistake.
* If it is a form mistake, then mark this shadow word and its corresponding source word as FORM. Otherwise, check for semantic mistake.
* If it is a semantic mistake, then mark this shadow word and its corresponding source word as SEMANTIC. Otherwise, check for phonetic mistake.
* If it is a phonetic mistake, then mark this shadow word and its corresponding source word as PHONETIC.
* If the current shadow word is none of the above mistakes, mark it as RANDOM.
* Loop over the source words. If it is not flagged as either correct or as a certain mistake, flag it as SKIPPED.
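
Steps 1 and 2 above (anchor selection) can be sketched as follows. This is not the code in anchor_algorithm.py. The function name `find_anchors` and the `(word, onset)` tuple layout are hypothetical, and treating "within N seconds after" as an inclusive range starting at 0 is an assumption.

```python
from collections import Counter


def find_anchors(source, shadow):
    """Return (source_index, shadow_index) pairs that qualify as anchors.

    source and shadow are lists of (word, onset_in_seconds) tuples.
    """
    source_counts = Counter(word for word, _ in source)
    shadow_counts = Counter(word for word, _ in shadow)
    shadow_index = {word: i for i, (word, _) in enumerate(shadow)}

    anchors = []
    for i, (word, source_onset) in enumerate(source):
        # Step 1: the word must occur exactly once in both files.
        if source_counts[word] != 1 or shadow_counts[word] != 1:
            continue
        # Step 2: the allowed delay depends on the word length.
        if len(word) >= 6:
            limit = 3.0
        elif 3 <= len(word) <= 5:
            limit = 1.5
        else:
            continue  # words shorter than 3 characters never become anchors
        j = shadow_index[word]
        delay = shadow[j][1] - source_onset
        if 0 <= delay <= limit:  # assumption: both bounds are inclusive
            anchors.append((i, j))
    return anchors
```

Steps 3 to 5 (aligning the words between anchors and classifying mistakes) are not covered by this sketch.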
......@@ -84,7 +131,7 @@ This algorithm aligns the shadow with its source by following the Needleman-Wuns
2. The algorithm updates the matrix's values and pointers going from left-to-right and top-to-bottom. A specific point's value is determined according to the equation: M(i,j) = maximum[M(i-1,j-1)+S,M(i-1,j)+W,M(i,j-1)+W].
Here, M(i,j) is the value of the point at column i and row j. S is the alignment score of the square to be updated, and W is the penalty for inserting a gap.
* Note that in the implementation used by this program, the value of S is dependent on whether the word is correctly shadowed. If the word is not correctly shadowed, the value of S also depends on which type of mistake has been made (form, semantic, phonetic or repetition). If the words are misaligned, the value of S is zero. The user can change the values of S for the different mistake types, but this needs to be done in the program itself.
3. Based on these scores, the program traces back through the matrix according to the NW algorithm. In tracing back, if two words are aligned based on one of the 4 mistake types mentioned in the previous point, it is checked which of the 4 mistakes was actually used. Based on this check, the type of mistake is bound to the 2 words and they are coupled with each other.
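
The update rule in step 2 can be illustrated with a minimal matrix fill. This is a sketch only. The scoring function `simple_score` (1 for identical words, 0 otherwise), the gap penalty of -1 and the initialisation of the first row and column with multiples of the gap penalty are assumptions and do not reflect Umbra's actual values. Pointers and the traceback of step 3 are left out.

```python
def fill_matrix(source, shadow, score, gap=-1.0):
    """Fill a Needleman-Wunsch score matrix m, where m[i][j] is the best
    score for aligning the first i source words with the first j shadow words."""
    m = [[0.0] * (len(shadow) + 1) for _ in range(len(source) + 1)]
    # Assumed initialisation: aligning against nothing costs one gap per word.
    for i in range(1, len(source) + 1):
        m[i][0] = i * gap
    for j in range(1, len(shadow) + 1):
        m[0][j] = j * gap
    for i in range(1, len(source) + 1):
        for j in range(1, len(shadow) + 1):
            m[i][j] = max(
                m[i - 1][j - 1] + score(source[i - 1], shadow[j - 1]),  # + S
                m[i - 1][j] + gap,  # + W
                m[i][j - 1] + gap,  # + W
            )
    return m


def simple_score(source_word, shadow_word):
    """Placeholder for S: 1 for identical words, 0 otherwise."""
    return 1.0 if source_word == shadow_word else 0.0
```

In Umbra, S additionally depends on the mistake type (form, semantic, phonetic or repetition); a scoring function with those values could be passed in place of `simple_score`.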
......@@ -111,19 +158,19 @@ in another plurality, it is assigned this mistake type.
Formal definition:
A source word is matched with a shadow word and assigned this mistake type if
* it starts with a prefix from *prefixes* and is identical to the shadow word
if the prefix is replaced by another prefix from *prefixes*.
OR
* it ends with an affix from *affixes* and is identical to the shadow word if
the affix is replaced by another affix from *affixes*.
where:
*prefixes* = ['', 'ge', 'be', 'ver', 'on', 'ont']
*affixes* = ['', 'en', 't', 'te', 'ten', 'de', 'den', 's', "'s"]
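
The prefix and affix rules above can be sketched as follows. The function name `is_affix_form_mistake` is hypothetical, and this is not Umbra's implementation. It covers only the prefix and affix replacement rules, not the verb rules, and it assumes that identical words are never a form mistake.

```python
PREFIXES = ['', 'ge', 'be', 'ver', 'on', 'ont']
AFFIXES = ['', 'en', 't', 'te', 'ten', 'de', 'den', 's', "'s"]


def is_affix_form_mistake(source, shadow):
    """Return True if swapping one prefix or one affix of the source word
    for another from the same list yields the shadow word."""
    if source == shadow:
        return False  # assumption: identical words are correct, not mistakes
    for old in PREFIXES:
        if source.startswith(old):
            stem = source[len(old):]
            if any(new != old and new + stem == shadow for new in PREFIXES):
                return True
    for old in AFFIXES:
        if source.endswith(old):
            stem = source[:len(source) - len(old)]
            if any(new != old and stem + new == shadow for new in AFFIXES):
                return True
    return False
```

Because both lists contain the empty string, adding or dropping a prefix or affix also counts, for example 'maakt' versus 'gemaakt' or 'boek' versus 'boeken'.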
A source word is also matched with a shadow word and assigned this mistake
type if
* it is an irregular verb, and source and shadow word belong to the same
imperative and are in umbra\resources\irregular_verbs.csv
OR
* it is a regular verb, and identical to the shadow word if put in another
......@@ -192,7 +239,7 @@ if:
representation of the shadow word.
Examples:
In the following example, the word 'bureau' will be assigned the phonetic
mistake:
Source: "De pen ligt op het bureau."
......@@ -243,20 +290,20 @@ mistake:
Source: "Dit is een voorbeeld."
Shadow: "Dit is een een voorbeeld."
In the following example, the word 'toe' will be assigned the repetition
mistake:
Source: "Je moet haar dat toesturen"
Shadow: "Je moet haar dat toe toesturen."
In the following example, the word 'staat' will be assigned the repetition
mistake:
Source: "Zoiets bestaat toch niet?"
Shadow: "Zoiets bestaat staat toch niet?"
In the following example, the word 'je' and the word 'dat' will be assigned
the repetition mistake:
Source: "Moet je je dat eens voorstellen."
Shadow: "Moet je je dat je dat eens voorstellen."
......@@ -272,7 +319,7 @@ Shadow words that are not a correct shadow and to which no other mistake type
can be assigned, will be assigned the random word mistake.
Example:
In the following example, the word 'zeer' will be assigned the random
mistake:
Source: "Dit is een belangrijk voorbeeld."
......@@ -290,12 +337,10 @@ Developer: T. Nijsen
Programmer: J. Verbeek
Copyright 2019
Umbra was developed as part of the course 'Modern Software Development Techniques' at the Radboud University Nijmegen, The Netherlands. This course was provided by Dr. F. Grootjen. Umbra was created as an open source application; however, it was created in collaboration with a client from industry.
### License
-----
Umbra is an open source shadowing task analyzer. Its goal is to find and report participant performance during shadowing tasks provided by the user, in csv format.
Copyright (C) 2020 E. Vriezen, T. van Alfen, T. Luttikholt,
R. Haak, L. Hees, T. de Valk, T. Nijsen, J. Verbeek