Data Collection

Nov 22, 2019

The experimental data corpus comprises datasets with vocabularies of varying sizes, collected in two main phases.

First, the researchers invited three participants (two male, one female; average age 29.33 years) to join a pilot study to investigate the feasibility of signal detection and to determine electrode positioning. The preliminary dataset recorded with these participants was binary, with the word labels "yes" and "no". The vocabulary size was then gradually increased with more words. In total, about 5 hours of internally vocalized text were collected during this study.

In the second phase, a data corpus was created with the same three participants to train a classifier. The corpus contains about 31 hours of internally vocalized text, recorded over different sessions so that the recognition model could be regularized for session independence. The corpus comprises multiple datasets. In one of them, the word labels are the numerical digits (0-9) along with the fundamental mathematical operations (times, divide, add, subtract, and percent), to facilitate externalizing arithmetic computations through the interface. An external trigger signal was used to slice the recordings into word instances. In each recording session, signals were recorded for randomly chosen words from a specific vocabulary set. This data was used to train the recognition model for various applications, such as World Clock, Calendar, and Chess Game.
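The post does not describe how the trigger-based slicing works in detail, so here is a minimal sketch under stated assumptions: the trigger is a separate channel whose rising edge marks the start of each prompted word, each word instance is a fixed-length window after that edge, and the prompt for every trial is drawn at random from the session's vocabulary. The function names, the threshold, the window length, and the `ARITHMETIC_VOCAB` list are all hypothetical illustrations, not the researchers' actual pipeline.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical vocabulary: the digits 0-9 plus the five operations named in the post.
ARITHMETIC_VOCAB = [str(d) for d in range(10)] + ["times", "divide", "add", "subtract", "percent"]


def make_prompts(vocab: Sequence[str], n_trials: int, seed: int) -> List[str]:
    """Randomly choose the word the participant is asked to say on each trial."""
    rng = random.Random(seed)  # seeded so a session's prompt order can be reproduced
    return [rng.choice(vocab) for _ in range(n_trials)]


def trigger_onsets(trigger: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return sample indices where the trigger channel crosses the threshold upward.

    Assumption: one rising edge per prompted word; a trigger already high at
    sample 0 is not counted as an onset.
    """
    return [
        i for i in range(1, len(trigger))
        if trigger[i - 1] < threshold <= trigger[i]
    ]


def slice_words(
    signal: Sequence[Sequence[float]],  # signal[t] holds one sample across all electrodes
    trigger: Sequence[float],
    prompts: Sequence[str],
    window: int,
) -> List[Tuple[str, List[Sequence[float]]]]:
    """Cut a fixed-length window after each trigger onset and label it with its prompt."""
    onsets = trigger_onsets(trigger)
    if len(onsets) != len(prompts):
        raise ValueError(f"{len(onsets)} trigger onsets but {len(prompts)} prompts")
    instances = []
    for onset, word in zip(onsets, prompts):
        segment = list(signal[onset:onset + window])
        if len(segment) == window:  # skip a window truncated by the end of the recording
            instances.append((word, segment))
    return instances


# Usage with synthetic data: 13 samples, two fake electrode channels, three prompts.
prompts = make_prompts(ARITHMETIC_VOCAB, n_trials=3, seed=0)
trigger = [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
signal = [[float(t), -float(t)] for t in range(len(trigger))]
instances = slice_words(signal, trigger, prompts, window=3)
for word, segment in instances:
    print(word, segment[0])
```

Pairing each onset with its prompt in order is what turns the continuous recording into labelled word instances; checking that the onset and prompt counts match catches a missed or spurious trigger before it silently shifts every label after it.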

lv.theresa@gmail.com