
Sc 75 #bulk load

Merged Nijsen, T.J.P. (Thomas) requested to merge SC-75-#BulkLoad into master
""""This file contains my second idea for a solution concept to our problem.
It is based on kernel methods as opposed to my last idea."""
import numpy as np
import re
# Input sequences
S1 = ["This", "is", "an", "example", "sequence", "This", "is", "the", "original"]
onsets_1 = [0.0, 10.0, 15.0, 20.0, 35.0, 50.0, 60.0, 65.0, 70.0, 80.0]
data_1 = [S1, onsets_1]
S2 = ["This", "is", "another", "example", "sequence", "This", "is", "not", "the", "original"]
onsets_2 = [5.0, 16.0, 20.0, 26.0, 39.0, 46.0, 52.0, 55.0, 60.0, 65.0, 72.0, 76.0, 83.0]
data_2 = [S2, onsets_2]
# Algorithm constants
avr_latency = 15
# Algorithm functions
def in_time(w1_onset, w2_onset, threshold):
    t1, t2 = w1_onset, w2_onset  # Get onsets for both words
    delta = np.abs(t1 - t2)  # Absolute difference, since a similar word can occur before or after
    penalty = np.sqrt(delta) / avr_latency  # Square root keeps the penalty from growing too large; normalize by the average latency of the experiment
    print("W1 onset: ", w1_onset, " W2 onset: ", w2_onset, " Penalty: ", penalty)
    return penalty < threshold  # Return the decision of whether word 2 was in time of word 1
def phi(w1, w2):  # String phi function for kernel methods: count how often w1 appears as a substring of w2
    return len([i.start() for i in re.finditer(w1, w2)])
def similarity(w1, w2, seq1, threshold):
    k = 0  # Kernel function answer
    ws = 1  # Hold all weights constant for now
    for s in seq1:  # Calculate kernel function answer using feature mapping phi
        k += ws * phi(s, w1) * phi(s, w2)
    print("W1: ", w1, " W2: ", w2, " k: ", k)
    return k >= threshold  # Return decision
def alg(data1, data2, threshold1=0.5, threshold2=0.5):
    words1, words2 = data1[0], data2[0]
    onsets1, onsets2 = data1[1], data2[1]
    shadowed = np.full(len(words1), False)
    for idx1, w in enumerate(words1):
        for idx2, v in enumerate(words2):
            if not shadowed[idx1] and in_time(onsets1[idx1], onsets2[idx2], threshold1):
                shadowed[idx1] = similarity(w, v, words1, threshold2)
        print("Word: ", w, " was shadowed: ", shadowed[idx1])
# Test 1
alg(data_1, data_2, threshold2=1)
""""In general the results seem to be promising. The in_time function does act like a sieve for words.
But probably results are very muddled due to making up the data by myself, so testing on some real
data would provide some good additional insight. Overall the calculation is fast and seems to only
require tuning.
The kernel function as well seems promising, I think it should be reworked somewhat to truly make it
good (I copied an internet example code to save some time), but the idea seems to work. But again some
tuning seems to be required. But this reduces the problem purely to finding a good kernel function and
making that work. """