## Abstract

One of the expectations for the next generation of industrial robots is to work collaboratively with humans as robotic co-workers. Robotic co-workers must be able to communicate with human collaborators intelligently and seamlessly. However, prevailing industrial robots are not good at understanding human intentions and decisions. We demonstrate a steady-state visual evoked potential (SSVEP)-based brain-computer interface (BCI) which can directly deliver human cognition to robots through a headset. The BCI is applied to a part-picking robot and sends decisions to the robot while operators visually inspect the quality of parts. The BCI is verified through a human subject study. In the study, a camera by the side of the conveyor takes a photo of each industrial part and presents it to the operator automatically. When the operator looks at the photo, electroencephalography (EEG) is collected through the BCI. The inspection decision is extracted from SSVEPs in the EEG. When the operator identifies a defective part, the decision is communicated to the robot, which locates the defective part with a second camera and removes it from the conveyor. The robot can grasp various parts with our random grasp planning algorithm (2FRG). We have developed a CNN-CCA model for SSVEP extraction. The model is trained on a dataset collected in our offline experiment, and our approach outperforms the existing CCA, CCA-SVM, and PSD-SVM models. The CNN-CCA model is further validated in an online experiment, achieving 93% accuracy in identifying and removing defective parts.

## 1 Introduction

Robots assist humans in various fields, including manufacturing industry, surgery, and activities of daily living. As they become more and more intelligent, robots are engaging in many new tasks in collaboration with humans. Human-robot collaboration refers to a collaborative process in which humans and robots work together to achieve a common goal. By collaborating with humans, robots can handle the automation problems that require a large amount of human knowledge, complex robot motion plans, or reprogramming for various objects. However, in the current human-robot collaborative interactions, human operators have to operate robots while processing the human part of the task. Ideally, we hope robots can maintain a tacit understanding with operators as “robotic co-workers.” When operators make decisions, the robots should directly understand the decisions and take appropriate actions. In this way, robots will function seamlessly with operators without extra manual operations.

A direct way to obtain human decisions is to read information from the brain. Human brain activities are accompanied by bioelectrical signals. These signals can be measured from the scalp and are known as electroencephalography (EEG). EEG measures voltage fluctuations resulting from ionic currents within the neurons of the brain [1]. By analyzing and processing EEG, humans can control external devices with brain activities. Since 1988, brain-computer interfaces (BCIs) have been built for controlling robots [2]. When performing tasks, human decisions can be sent directly to robots through BCIs. Therefore, by combining EEG-based BCIs with robots, robotic co-workers can obtain the decisions of operators through EEG while the operators concentrate on the task, without the operators making any physical operation on the robot. While working with EEG-based robotic co-workers on industrial tasks, EEG can reveal the intentions, decisions, and mental status of workers, thereby reducing the labor and knowledge required to operate the robots. Meanwhile, EEG-based robotic co-workers can provide work opportunities for people with disabilities who want to engage in industrial jobs and realize their social value.

In this study, we demonstrate an EEG-based BCI for a part-picking robotic co-worker. The robot picks up defective parts based on decisions directly collected from the brain activities of the operator while the operator inspects the quality of the parts. The development details of object extraction, robot motion planning, EEG acquisition, and EEG processing are described in the technical approach section. We propose a CNN-CCA method to improve the SSVEP classification accuracy. The model is validated and compared with existing methods in an offline experiment, and the BCI is further tested in an online experiment.

## 2 Background

The challenges for the development of robotic co-workers come from two aspects: physical human-robot interaction and cognitive human-robot interaction. The most critical problem to be solved by physical human-robot interaction is how to ensure human safety while collaborating with robots. Some of the work in hardware design (e.g., lightweight robots) [3] and safety actuation [4,5] greatly reduced the possible damage caused by human-robot collisions. In terms of software, robots are programmed to avoid collisions or actively react in collisions [6,7]. Other contributions include safety and production optimization [8,9], human safety quantification [10–12], etc. Regarding cognitive human-robot interaction, research mainly focuses on new human-robot interfaces and human intention prediction. Gemignani et al. [13] developed a robot-operator dialogue interface that allows non-expert operators to interact with the robot through voice without knowing any internal representation of the robot. Sheikholeslami et al. [14] and Gleeson et al. [15] studied the intuition of human observers on gestures to explore human-robot interaction gestures. However, with the use of voice and gesture interfaces, operators still need to operate the robot in collaboration. To eliminate human operation, Beetz et al. [16] applied artificial intelligence to analyze the operator’s intentions by predicting the operator’s motion. Oshin et al. [17] predicted a set of future actions of humans through a Kinect video stream using a convolutional neural network (CNN) model.

Studies show that humans can achieve multi-dimensional robot control through BCIs with many different strategies and input modalities. Invasive BCIs can be used to implement accurate and complicated robot controls. Vogel et al. [18] demonstrated that human subjects could continuously control a robot arm to retrieve a drink container through an invasive BCI named BrainGate. However, invasive BCIs require surgery to place a chip on the brain. EEG-based non-invasive BCIs are capable of controlling various devices with a wearable headset. Edlinger et al. [19] built a virtual smart home where devices like a TV, an MP3 player, and a phone can be controlled through a BCI using P300 and steady-state visually evoked potentials (SSVEPs). Riaz et al. [20] and Yin et al. [21] developed BCI language communication tools using a P300 speller and speech imagery. In robot controls, Ying et al. [22] built an online robot grasp planning framework using a BCI to select the grasping target and grasping pose. Hortal et al. [23] trained a robot to touch one of four target areas by detecting four different mental tasks. Gandhi et al. [24] and LaFleur et al. [25] proposed a mobile robot and a quadcopter control interface for 2D and 3D navigation through motor imagery.

In our preliminary study [26], we demonstrated an EEG-based BCI for a robotic co-worker to pick up defective parts from a conveyor while the operator checked the quality of the parts. In this paper, we summarize the previous work and extend it by improving robot grasp planning, EEG classification, and robotic control.

## 3 Technical Approach

### 3.1 Overview.

The part-picking robotic co-worker integrates a manipulator (a 5-DOF KUKA youBot arm with a two-finger gripper), two cameras (two Logitech HD cameras), a DC motor conveyor, and an EEG-driven BCI, as shown in Fig. 1(a). The BCI includes an EEG headset (B-Alert X24, 20 EEG channels) and an LED monitor for stimuli generation. The camera beside the front end of the conveyor (referred to as the front camera in the following paragraphs) is used to detect the moment when a new industrial part is loaded onto the conveyor and takes a photo of it at that moment. The monitor displays the photo to the operator as the visual stimulus. The operator inspects the quality of the part through the photo. The inspection result collected and analyzed from the EEG is sent to the robot as a removal decision. If notified that the part is defective, the robot will pick the part off the conveyor. The camera beside the rear end of the conveyor (called the rear camera) helps the robot find and pick up the defective part.

Fig. 1

The detailed workflow is presented in Fig. 1(b). Once the front camera detects a new part, the part is registered in the Log thread as the ith part. The front camera takes a photo of it and stores it as P(i). The monitor is programmed to have four blocks arranged as a 2 × 2 matrix to display photos. P(i) is displayed in one of the unoccupied blocks. If all blocks are occupied, the monitor clears a block to make it available. The photo is displayed for 10 s and then automatically cleared from the block. While the photo is displayed, the operator can inspect the quality of the part through the monitor. Meanwhile, the operator’s neural signal, i.e., the EEG, is collected by the EEG headset. The EEG is interpreted as a binary decision d(i) and stored in the Log thread as well. Here, d(i) = 0 represents that the part is qualified and d(i) = 1 that the part is defective. When the part moves to the rear end of the conveyor, and if d(i) = 1, the rear camera extracts its position μ(i) = (x, y) in real time to provide closed-loop feedback for the youBot to pick it up. The robot picks the part following a grasping plan G(i), which is generated when the part passes the front camera.

The operator sits in front of the monitor and inspects the qualities of parts through photos. Once he/she identifies a defective part, the operator should stare at the photo until it is marked with a green square (detection succeeded) or until the photo vanishes (detection failed). If the part is qualified, the operator should avoid staring at the photo for more than 2 s. Otherwise, it may lead to a false positive for defective part identification.

### 3.2 Part-Detection.

We use a threshold method to extract parts from the background in the photos taken by the front and rear cameras. In pre-processing, a morphological opening with a disk-shaped structuring element (10-pixel radius) is applied to eliminate lighting effects. Pixels with intensity lower than 0.2 are clustered as the background. Then, we remove connected regions with an area of less than 20 pixels to reduce noise. The remaining regions are considered the extracted parts.
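The steps above can be sketched as follows; this is a minimal illustration using SciPy, assuming intensities are normalized to [0, 1] (the function and parameter names are ours, not the authors').

```python
import numpy as np
from scipy import ndimage

def extract_parts(img, open_radius=10, bg_thresh=0.2, min_area=20):
    """Threshold-based part extraction: opening, thresholding, noise removal."""
    # Disk structuring element for the morphological opening
    yy, xx = np.mgrid[-open_radius:open_radius + 1, -open_radius:open_radius + 1]
    disk = (xx ** 2 + yy ** 2) <= open_radius ** 2
    # Grayscale opening suppresses lighting artifacts smaller than the disk
    opened = ndimage.grey_opening(img, footprint=disk)
    # Pixels with intensity below the threshold are background
    fg = opened >= bg_thresh
    # Drop connected regions smaller than min_area pixels
    labels, n = ndimage.label(fg)
    areas = ndimage.sum(fg, labels, index=np.arange(1, n + 1))
    keep = np.isin(labels, 1 + np.flatnonzero(areas >= min_area))
    return keep
```

The returned boolean mask corresponds to the extracted parts described above.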

We use a binary signal to perceive when a part is loaded onto the conveyor and when a part passes the end of the conveyor. The binary signal is defined as the projection of the extracted regions along the conveyor’s moving direction. In the front camera, when a new step-down occurs, we record that a new part has been loaded onto the conveyor. Similarly, in the rear camera, if there is a new step-down in the binary signal, we record that a part has passed the end of the conveyor. Additionally, to split multiple parts in the same photo, we cut the photo into small segments at the midpoint between each step-down and the next step-up, so that the extracted regions are segmented into several sub-regions, each containing only one part. We denote the sub-region that contains only the ith part as S(i). This is a simple way to track the number of parts on the conveyor, but it requires that the parts be placed with some space between each other along the conveyor’s moving direction.
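The midpoint-cutting step can be sketched as a simplified 1-D routine (names and the example signal are ours): project the mask into a binary occupancy signal and cut inside each gap.

```python
import numpy as np

def split_segments(proj):
    """Cut a 1-D binary occupancy signal into per-part segments.

    Cuts are placed at the midpoint between each step-down (a part ends)
    and the next step-up (the next part begins), as described above.
    """
    d = np.diff(proj.astype(int))
    downs = np.flatnonzero(d == -1) + 1  # first index after a part ends
    ups = np.flatnonzero(d == 1) + 1     # first index of the next part
    cuts = [0]
    for dn in downs:
        nxt = ups[ups > dn]
        if nxt.size:                     # gap followed by another part
            cuts.append((dn + nxt[0]) // 2)
    cuts.append(len(proj))
    return list(zip(cuts[:-1], cuts[1:]))

# Example: two parts separated by a gap along the moving direction
proj = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 1])
segments = split_segments(proj)  # one segment per part, split inside the gap
```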

### 3.3 Grasping Planning.

The geometric center μ(i) of each part can be easily tracked by averaging all the pixels in S(i). To obtain a generalized grasping plan for various industrial parts, we developed our own grasping algorithm called two-finger gripper random grasp (2FRG). It can generate robust 3-DOF picking gestures for the two-finger gripper to pick various objects. It randomly samples multiple lines as possible finger-moving directions. Along each line, all the possible finger grasping positions are searched. We define a grasping position as a position where both fingers touch the object and there is enough space to insert the fingers. Among the grasping positions, only those that can construct firm grasps are kept. A firm grasp requires each finger to have at least two points on both edges touching the object or at least one point in the middle touching the object. We use the negative distance between the midpoint of the gripper fingers and μ(i) as the score to evaluate a grasp. The algorithm 2FRG returns the firm grasp with the highest score. A demonstration of a firm grasp is shown in Fig. 2.

Fig. 2

In Fig. 3, grasping plans are calculated for five different parts with 200 grasping samples for each part. The optimal firm grasp for each part is shown in Fig. 3(c). The grasping positions are close to the geometric centers, and the gripper grasps the objects in comfortable directions. The pseudocode for 2FRG is presented in Algorithm 1. The grasping plan G(i) = (φ, δ) is constructed from the grasping direction φ and the offset δ from the grasping position to μ(i). Here, φ is the angle of the vector g1g2 and δ = g1/2 + g2/2 − μ, where g1 and g2 are the midpoints of the inner sides of the fingers.

Fig. 3

#### Algorithm 1 Two-finger Gripper Random Grasp (2FRG)

1. Inputs:
2.   $S(i)$: extracted region of one object stored in the binary image $I$
3.   $N$: maximum number of sampling iterations
4.   $L$: geometric constraints of the gripper
5. Output:
6.   $(g_1, g_2)$: positions of the gripper fingers
7. Do for $N$ times {
8.   Uniformly sample two points $q_1$ and $q_2$ in $I$
9.   Create a line $l \in I$ passing through $q_1$ and $q_2$, and let $g_1, g_2 \in l$
10.  $\{l_f\} \leftarrow$ the collection of all line segments of $l$ such that $L$ is satisfied if a finger $g_1$ or $g_2 \in l_f$
11.  If $\{l_f\}$ is not empty, then {
12.    $G_{gp} \leftarrow$ the collection of $(g_1, g_2)$ with $g_1 \in l_{f1}$, $g_2 \in l_{f2}$, and $l_{f1}, l_{f2} \in \{l_f\}$ such that $(g_1, g_2)$ is a grasping position
13.    $G_{fg} \leftarrow$ the collection of $(g_1, g_2) \in G_{gp}$ such that $(g_1, g_2)$ is a firm grasp
14.  } }
15. Return the $(g_1, g_2) \in G_{fg}$ with the highest score, where $\mathrm{score} = -\|g_1/2 + g_2/2 - \mathrm{mean}(S(i))\|_2$
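A much-simplified sketch of the sampling idea follows (not the authors' implementation): it checks only a maximum-opening constraint as a stand-in for the gripper geometry $L$, and scores candidate finger pairs by distance to the centroid, omitting the firm-grasp and finger-insertion tests.

```python
import numpy as np

def two_finger_random_grasp(mask, n_samples=200, max_opening=40, rng=None):
    """Simplified 2FRG-style sampler over a binary object mask."""
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    mu = np.array([ys.mean(), xs.mean()])          # geometric center mu(i)
    best, best_score = None, -np.inf
    for _ in range(n_samples):
        # Sample a random line through two points q1, q2 in the image
        q1 = rng.uniform([0.0, 0.0], [h - 1.0, w - 1.0])
        q2 = rng.uniform([0.0, 0.0], [h - 1.0, w - 1.0])
        if np.linalg.norm(q2 - q1) < 1e-6:
            continue
        ts = np.linspace(0.0, 1.0, 400)
        pts = q1 + ts[:, None] * (q2 - q1)
        vals = mask[pts[:, 0].astype(int), pts[:, 1].astype(int)]
        occ = np.flatnonzero(vals)
        if occ.size == 0:
            continue                               # line misses the object
        g1, g2 = pts[occ[0]], pts[occ[-1]]         # fingers touch opposite sides
        if np.linalg.norm(g2 - g1) > max_opening:
            continue                               # violates opening constraint
        score = -np.linalg.norm((g1 + g2) / 2 - mu)
        if score > best_score:
            best, best_score = (g1, g2), score
    return best
```

As in the algorithm, the returned pair is the kept candidate whose finger midpoint lies closest to the object's geometric center.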

### 3.4 Robot Motion Planning and Control.

If the ith part is defective and it enters the view of the rear camera, the robot will pick it out based on the grasping plan G(i). G(i) is calculated from the photo taken by the front camera and μ(i) is calculated from the photo obtained by the rear camera in real time. The robot will move to a preset height above the conveyor with its last joint perpendicular to the conveyor surface in advance. Then, it tracks the part and moves downward to the conveyor surface. The robot velocity in the horizontal directions vxy, vertical direction vz, and angular velocity of the last joint α are controlled by
$v_{xy} = K_1(\mu(i) + \delta - p_{xy}) + K_2\,\dot{\mu}(i) + K_3(\ddot{\mu}(i) - \dot{v}_{xy})$
(1)
$v_z = k_4(z_i - p_z) - k_5\,\dot{v}_z$
(2)
$\alpha = k_6(\varphi - \theta) - k_7\,\dot{\alpha}$
(3)
Here, pxy and pz are the positions of the robot end-effector along the horizontal axes and the vertical axis, θ is the angle of the robot's last joint, zi is the height of the conveyor surface, and K1, K2, K3, k4, k5, k6, k7 are control gains, where K1, K2, and K3 are 2 × 2 diagonal matrices.
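Equations (1)-(3) translate directly into code; the gains below are illustrative placeholders, not the authors' tuned values.

```python
import numpy as np

# Illustrative gains (placeholders): K1, K2, K3 are 2x2 diagonal matrices
# and k4..k7 are scalars, as in Eqs. (1)-(3).
K1, K2, K3 = np.diag([1.5, 1.5]), np.diag([0.8, 0.8]), np.diag([0.1, 0.1])
k4, k5, k6, k7 = 1.2, 0.05, 2.0, 0.1

def track_part(mu, mu_d, mu_dd, delta, p_xy, v_xy_d,
               z_i, p_z, v_z_d, phi, theta, alpha_d):
    """One evaluation of the velocity controller in Eqs. (1)-(3)."""
    v_xy = K1 @ (mu + delta - p_xy) + K2 @ mu_d + K3 @ (mu_dd - v_xy_d)  # Eq. (1)
    v_z = k4 * (z_i - p_z) - k5 * v_z_d                                  # Eq. (2)
    alpha = k6 * (phi - theta) - k7 * alpha_d                            # Eq. (3)
    return v_xy, v_z, alpha
```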

### 3.5 Electroencephalography Acquisition.

Our EEG-based BCI is a non-invasive implementation. Non-invasive BCIs yield lower performance than invasive BCIs, but they are easy to wear and require no surgery. In our system, EEGs of operators are collected through a B-Alert X24 headset (Advanced Brain Monitoring, Carlsbad, CA), which has 20 electrodes positioned following the 10–20 system and a pair of reference channels. The sampling frequency is 256 Hz. The device is minimalistic and can be comfortably worn for an hour at a time without rewetting or reseating of the electrodes.

The decisions of the operator are identified through SSVEPs. SSVEPs are natural EEG responses to visual stimulation at specific frequencies. The signal can be triggered by looking at a flicker that flashes at a constant frequency; the EEG response will then have a signal component at the same frequency as the flicker. Conversely, by monitoring the frequency components of the operator’s EEG, the system can recognize the flashing frequency of the flicker. Based on this property of SSVEPs, we flash the photos of parts at different frequencies on the monitor. From the EEG, we can recognize which photo the operator is staring at. When the operator does not see any defective part on the monitor, he/she should avoid staring at a single photo so that there is no significant frequency component corresponding to any of the displayed photos. We call this situation the idle state. On the other hand, when a defective part is detected, the operator should stare at its photo until the photo is marked. In this case, a significant frequency component will be found in the EEG at the same frequency as the flashing frequency of the photo.

The monitor is programmed to display the visual stimuli for generating SSVEPs. It displays photos in four square blocks. As shown in Fig. 4, the size of the blocks is 300 pixels × 300 pixels. The four blocks flash at 6 Hz, 6.67 Hz, 7.5 Hz, and 8.57 Hz, respectively. Once a new part is observed by the front camera, its photo is displayed in one of the blocks and flashes at the frequency corresponding to that block. The photo of the next observed part is displayed in the next available block. Each photo is presented on the monitor for 10 s, and the monitor can display up to four photos at the same time. When a new part is observed but all the blocks are currently occupied, the first displayed photo is cleared to make a block available for the new photo. The flashing is programmed with Windows DirectX.
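The four frequencies are consistent with integer frame periods on a 60 Hz display (an assumption on our part: 60/10, 60/9, 60/8, and 60/7 Hz), so a frame-based flicker can be scheduled as:

```python
# Assumed mapping from flashing frequency to frame period on a 60 Hz display:
# 6 Hz -> 10 frames, 6.67 Hz -> 9, 7.5 Hz -> 8, 8.57 Hz -> 7.
PERIODS = {6.0: 10, 6.67: 9, 7.5: 8, 8.57: 7}

def photo_visible(frame_index, period):
    """Square-wave flicker: show the photo during the first half of each period."""
    return (frame_index % period) < period / 2
```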

Fig. 4

### 3.6 Electroencephalography Processing.

In our previous study, we demonstrated five SSVEP classification methods: canonical correlation analysis (CCA) [27], individual template-based CCA (IT-CCA) [28], the support vector machine method (SVM), the power spectral density-based SVM method (PSD-SVM) [29], and the CCA-based SVM method (CCA-SVM) [26]. The validation dataset was collected in our offline experiment. The study illustrated that our proposed CCA-SVM method significantly outperformed the other comparison methods. However, the accuracy of the CCA-SVM method was still not satisfactory, especially for industrial applications. To further improve performance, we consider neural network models. In a previous study, Nik et al. [30] showed that convolutional neural networks (CNNs) and linear models perform significantly better than long short-term memory (LSTM) for SSVEP classification. Because EEG has a low signal-to-noise ratio and few training samples, pure LSTM and CNN models are easily overfitted. In our previous study, our model-driven CNN model Conv-CA [31], which combines the CNN structure and CCA, achieved the best performance on a 40-target SSVEP benchmark dataset [32]. However, the offline dataset we collected for this application does not have the consistent phase required by the Conv-CA model. Thus, we introduce a new SSVEP classification method, CNN-CCA, as a derivative of the Conv-CA model to address the phase issue in the dataset. CNN-CCA provides a great performance boost over the previously tested methods. The performance of our newly proposed CNN-CCA method is verified by comparing it with the CCA, PSD-SVM, and CCA-SVM methods.

#### 3.6.1 Canonical Correlation Analysis Method.

CCA is the most widely used classification method in current SSVEP-based BCI applications. The method seeks a spatial filter, which combines the EEG collected from multiple channels, to maximize the correlation between the combined EEG and a group of artificial sine and cosine signals. Assume $X \in \mathbb{R}^{N_s \times N_c}$ is a piece of EEG data with $N_s$ sampling points collected from $N_c$ channels, and $Y_n \in \mathbb{R}^{N_s \times 2N_h}\,(n = 1, 2, \ldots, N)$ is a group of artificial reference signals corresponding to the nth stimulus frequency $f_n$:
$Y_n = \begin{bmatrix} \cos(2\pi f_n t) \\ \sin(2\pi f_n t) \\ \vdots \\ \cos(2\pi N_h f_n t) \\ \sin(2\pi N_h f_n t) \end{bmatrix}^T,\quad t = \begin{bmatrix} \frac{1}{f_s} & \frac{2}{f_s} & \cdots & \frac{N_s}{f_s} \end{bmatrix}$
(4)
where Nh is the number of harmonics and fs is the sampling frequency. The CCA finds weights wx and wy maximizing the canonical correlation
$\rho = \mathrm{CCA}(X, Y) = \max_{w_x, w_y} \dfrac{E[w_x^T X^T Y w_y]}{\sqrt{E[w_x^T X^T X w_x]\,E[w_y^T Y^T Y w_y]}}$
(5)
To classify the frequency of the input EEG signal, CCA calculates the canonical correlations of the input with the different artificial reference signals as ρn = CCA(X, Yn). The base frequency of the reference signal that achieves the maximal canonical correlation is the classified stimulus frequency, i.e.,
$f_{n^*} = \arg\max_{f_n} \rho_n,\quad n = 1, 2, \ldots, N$
(6)
In our application, we use four SSVEP frequencies, i.e., 6 Hz, 6.67 Hz, 7.5 Hz, and 8.57 Hz, corresponding to the four flashing blocks on the monitor. There is one more class we need to identify: the idle state. Therefore, the EEGs need to be classified into one of five classes, i.e., SSVEP at 0 Hz (idle state), 6 Hz, 6.67 Hz, 7.5 Hz, and 8.57 Hz. However, the standard CCA method can only find non-zero frequencies. To classify the idle state, we extend the standard CCA by thresholding the maximal canonical correlation. Thus, Eq. (6) becomes
$f_{n^*} = \begin{cases} \arg\max_{f_n} \rho_n, & \text{if } \max_n \rho_n > \delta_n \\ 0, & \text{otherwise} \end{cases}$
(7)
The threshold δn is searched on the training dataset such that it maximizes the classification accuracy on the training dataset.

#### 3.6.2 SVM-Based Methods.

The PSD-SVM method uses the power spectral densities of EEGs as features for a standard SVM model. The SVM model classifies EEGs into five classes. Similarly, the CCA-SVM method uses the canonical correlations of CCA as features. The features are $\Phi = \{\rho_n^h, \rho_\alpha\}$, $n = 1, 2, \ldots, N$, $h = 1, 2, \ldots, N_h$, $\alpha = 1, 2, \ldots, N_\alpha$, where
$\rho_n^h = \mathrm{CCA}\big(X, [\cos(2\pi h f_n t)\;\; \sin(2\pi h f_n t)]\big),\quad \rho_\alpha = \mathrm{CCA}\big(X, [\cos(2\pi f_\alpha t)\;\; \sin(2\pi f_\alpha t)]\big)$
(8)
Two frequencies, 8 Hz and 10 Hz, are chosen for $f_\alpha$. The features $\rho_\alpha$ measure brain activities in the alpha band, which reflect the idle activities of the brain. The previous study [26] showed that adding $\rho_\alpha$ can enhance the idle-state classification accuracy.

#### 3.6.3 Convolutional Neural Network Canonical Correlation Analysis Method (CNN-CCA).

The CNN-CCA method combines a convolutional neural network (CNN) and CCA. The CNN structure convolutes multi-channel EEGs over short sampling intervals to construct a single-channel signal. The CCA layer at the end of the CNN framework eliminates the noise in the signal and extracts its frequency features. The CNN-CCA model takes the EEG data $X \in \mathbb{R}^{N_s \times N_c \times 1}$ as input. It applies a three-layer CNN convoluting X to $\bar{x} = f(X), \bar{x} \in \mathbb{R}^{N_s \times 1}$, where f(.) is the three-layer CNN. Then, a CCA layer is added behind the CNN layers. The CCA layer applies standard CCA to $\bar{x}$ as $\rho_n = \mathrm{CCA}(\bar{x}, Y_n)$, where $Y_n$ is the reference signal corresponding to the nth stimulus, constructed as in Eq. (4) with $N_h = 8$, $N = 5$, $f_1 = 0$ (idle state), $f_2 = 6$, $f_3 = 6.67$, $f_4 = 7.5$, and $f_5 = 8.57$. Note that, by the Cauchy–Schwarz inequality, the maximal correlation in the CCA layer can be calculated as
$\rho_n = \mathrm{CCA}(\bar{x}, Y_n) = \sqrt{\dfrac{\bar{x}^T Y_n Y_n^T \bar{x}}{(\bar{x}^T \bar{x})(Y_n^T Y_n)}},\quad n \in [1, N]$
(9)
Then, the output of the CCA layer is
$z = g(\bar{x}, Y) = [\mathrm{CCA}(\bar{x}, Y_1), \mathrm{CCA}(\bar{x}, Y_2), \ldots, \mathrm{CCA}(\bar{x}, Y_N)] = [\rho_1, \rho_2, \ldots, \rho_N] \in \mathbb{R}^N$
(10)
Here, g(.) is the CCA layer. We use a dense layer with N units and a softmax activation function as the final layer before classification.
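Because $\bar{x}$ is single-channel, the CCA in this layer reduces to the multiple correlation of $\bar{x}$ on the columns of each $Y_n$; a NumPy sketch of the layer's forward pass (our formulation via least-squares projection, with mean-centering as shown):

```python
import numpy as np

def cca_layer(x_bar, refs):
    """Compute z = [rho_1, ..., rho_N] for a single-channel signal x_bar.

    For one-channel x_bar, the maximal canonical correlation with a
    reference Y_n equals the multiple correlation of x_bar on Y_n's
    columns, obtained here by projecting x_bar onto span(Y_n).
    """
    x = x_bar - x_bar.mean()
    rhos = []
    for Y in refs:
        Yc = Y - Y.mean(axis=0)
        coef, *_ = np.linalg.lstsq(Yc, x, rcond=None)
        proj = Yc @ coef                              # projection onto span(Y_n)
        rhos.append(np.sqrt(max(x @ proj, 0.0) / (x @ x)))
    return np.array(rhos)
```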

Because the CCA layer provides non-linear operations, the CNN layers use linear activation functions. The first layer of the CNN has 16 filters with 16 × 4 kernels. It convolutes EEGs over all input channels (Nc = 4) in a short local time period (16 sampling points, or 23.4 ms). The second layer combines the 16 filters of the first layer; it uses 1 × 4 kernels to weight EEGs from different channels. The third layer applies a 1 × 4 kernel with no padding (the first and second layers use zero padding to keep outputs the same size as the inputs) to transform the data $X \in \mathbb{R}^{N_s \times N_c \times 1}$ into a one-dimensional signal $\bar{x} \in \mathbb{R}^{N_s \times 1}$. At the end of the CNN layers, we apply dropout with a rate of 5% to $\bar{x}$ for regularization. The detailed structure is shown in Fig. 5.

Fig. 5

The CNN-CCA is implemented in Python Keras with a TensorFlow backend. We use categorical cross-entropy as the loss function. The optimization is solved with the Adam algorithm (learning rate 1e-4, beta1 0.9, beta2 0.999, gradient clipping 5) with a batch size of 32.
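A Keras sketch of the CNN front-end described above (our reading of the text; the filter count in the second layer and the exact input sizes are assumptions, and the CCA layer and softmax head are omitted):

```python
import tensorflow as tf
from tensorflow.keras import layers

Ns, Nc = 512, 4  # e.g., 2 s of 256 Hz EEG from 4 channels (illustrative sizes)

cnn = tf.keras.Sequential([
    layers.Input(shape=(Ns, Nc, 1)),
    # 16 filters over a short local window (16 samples x 4 channels), zero padding
    layers.Conv2D(16, (16, 4), padding="same", activation="linear"),
    # combine the 16 filters; 1 x 4 kernels weight EEGs across channels
    layers.Conv2D(16, (1, 4), padding="same", activation="linear"),
    # 1 x Nc kernel, no padding: collapse the channels into a single signal
    layers.Conv2D(1, (1, Nc), padding="valid", activation="linear"),
    layers.Dropout(0.05),
    layers.Reshape((Ns, 1)),  # x_bar, the input to the CCA layer
])
```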

## 4 Experimental Setup

We established an offline experiment to test the performance of the above SSVEP classification methods. In each trial, the experiment required subjects to stare at a flashing photo for 15 s. The photo flashed at one of the frequencies 0 Hz, 6 Hz, 6.67 Hz, 7.5 Hz, and 8.57 Hz. The experiment took five runs with five trials in each run. During the experiment, subjects wearing the EEG-based BCI headset were required to stay still and blink as little as possible. Five subjects (age 25–35 years, four males, one female) attended the experiment.

After the offline experiment, subjects 1, 2, and 3 participated in the online experiment. The online experiment required subjects to select 2 defective industrial parts from 10 parts, which were manually placed on the conveyor in random order by another operator. The online experiment took three runs. All subjects successfully accomplished the task. Figure 6 shows the user interface and hardware during the online experiment.

Fig. 6

### 4.1 Results.

Classification accuracies at four different data lengths (also called time windows), i.e., 0.5 s, 1.0 s, 1.5 s, and 2.0 s, were used to evaluate the performances of all the methods. The data were extracted with a step of 0.15 × the time window length. We compared our proposed CNN-CCA method with CCA, PSD-SVM, and CCA-SVM using leave-one-out cross-validation. Specifically, one of the five trials of the EEG data was used as test data and the other four trials were used as the training dataset. We repeated the process five times so that every trial was tested.

As shown in Fig. 7, the performance of the CNN-CCA was found to be superior to the other three comparison methods across all five subjects in all tested time window lengths. Table 1 lists the average classification accuracies across all subjects. Compared to the most commonly used CCA method, the CNN-CCA improved the average classification accuracies in the time windows of 0.5 s, 1.0 s, 1.5 s, and 2.0 s by 31.43%, 23.00%, 16.25%, and 12.92%, respectively. Notably, for the 0.5 s time window of subject 5, the classification accuracy increased from 45.86% to 96.00%. Compared to the CCA-SVM method, which was the best method in our previous study, the classification accuracies improved by 16.26%, 9.84%, 6.96%, and 6.83% in the 0.5 s, 1.0 s, 1.5 s, and 2.0 s time windows, respectively. For subject 4, whose SSVEP had the lowest classification accuracy among the five subjects, the accuracy in the 2.0 s time window increased from 72.27% to 88.37%.

Fig. 7
Table 1

Averaged classification accuracies of CCA-SVM, PSD-SVM, CCA, and CNN-CCA methods at 0.5 s, 1.0 s, 1.5 s, and 2.0 s time window lengths

| Accuracy (%) | 0.5 s | 1.0 s | 1.5 s | 2.0 s |
| --- | --- | --- | --- | --- |
| CCA-SVM | 63.98 | 80.14 | 86.67 | 89.73 |
| PSD-SVM | 74.22 | 81.71 | 82.98 | 82.07 |
| CCA | 48.81 | 66.98 | 77.38 | 83.64 |
| CNN-CCA | 80.24 | 89.98 | 93.63 | 96.56 |

In our implementation, the most frequent class was the idle state (i.e., n = 1), since in most cases the industrial parts on the conveyor were good parts. Thus, the classification accuracy of the idle state was more important than that of the other classes. To check the performance of idle-state classification, we calculated the confusion matrices and marked the accuracies of the idle-state classifications in Fig. 8. When the time window was 0.5 s or 1.0 s, the PSD-SVM method had the best idle-state classification among the four methods. However, its classification accuracy over all five classes was only 81.71% in the 1.0 s time window. As the length of the time window increased, the CNN-CCA method became the best idle-state classification method. In the 2.0 s time window, its idle-state classification accuracy was 95% and its average classification accuracy over all five classes was 96.56%.

Fig. 8

Figure 9 shows the CNN-CCA applied to the recorded data of our previous online experiment. In the online experiment, the SSVEPs were classified in 2 s time windows. Subjects 1 and 3 completed the experiment with all parts detected correctly. Subject 2 had one false-positive case in the first and second runs. Compared to the offline experiment, subject 2 had higher classification accuracy than subject 3; however, subject 2 had worse idle-state classification, which caused more false-positive cases.

Fig. 9

### 4.2 Conclusions.

We developed an EEG-based BCI for a part-picking robotic co-worker, where an operator is able to collaborate with the robot and communicate defective parts without manually operating the robot. The robot removes defective parts from the conveyor based on mental commands from the operator. The decisions were extracted through SSVEPs and sent to the robot. We proposed a new CNN-CCA method to classify SSVEPs. Its performance was verified on our offline experiment data and compared with the existing CCA, CCA-SVM, and PSD-SVM methods on 0.5 s, 1.0 s, 1.5 s, and 2.0 s window lengths of EEG data. Our CNN-CCA outperformed all other tested methods for every time window length. The average classification accuracies across all five subjects were 80.24%, 89.98%, 93.63%, and 96.56% for the 0.5 s, 1.0 s, 1.5 s, and 2.0 s time window lengths, respectively. We then established an online experiment with a 2.0 s time window length. The average defective part inspection success rate was 93.33% using the CNN-CCA method. BCI-based systems have the potential to become a new communication pathway between humans and robots in many future manufacturing applications.

## Acknowledgment

This work was supported by the National Science Foundation Award Number: 1464737.

## Data Availability Statement

The authors attest that all data for this study are included in the paper.

## References

1. Henry, J. C., 2006, "Electroencephalography: Basic Principles, Clinical Applications, and Related Fields," Neurology, 67(11), pp. 2092–2092.
2. Bozinovski, S., Sestakov, M., and Bozinovska, L., 1988, "Using EEG Alpha Rhythm to Control a Mobile Robot," Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Pts 1–4, Seattle, WA, Nov. 9–12, IEEE, pp. 1515–1516.
3. Hirzinger, G., Sporer, N., Albu-Schaffer, A., Hahnle, M., Krenn, R., Pascucci, A., and Schedl, M., 2002, "DLR's Torque-Controlled Light Weight Robot III - Are We Reaching the Technological Limits Now?," Proceedings 2002 IEEE International Conference on Robotics and Automation, Vol. 2, IEEE, pp. 1710–1716.
4. Zinn, M., Roth, B., Khatib, O., and Salisbury, J. K., 2004, "A New Actuation Approach for Human Friendly Robot Design," Int. J. Rob. Res., 23(4–5), pp. 379–398.
5. Shin, D., Tanaka, A., Kim, N., and Khatib, O., 2016, "A Centrifugal Force-Based Configuration-Independent High-Torque-Density Passive Brake for Human-Friendly Robots," IEEE/ASME Trans. Mechatron., 21(6), pp. 2827–2835.
6. Haddadin, S., Albu-Schaffer, A., De Luca, A., and Hirzinger, G., 2008, "Collision Detection and Reaction: A Contribution to Safe Physical Human-Robot Interaction," 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, Nice, France, Sept. 22–26, IEEE, pp. 3356–3363.
7. Geravand, M., Flacco, F., and De Luca, A., 2013, "Human-Robot Physical Interaction and Collaboration Using an Industrial Robot With a Closed Control Architecture," 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, May 6–10, IEEE, pp. 4000–4007.
8. Zanchettin, A. M., Ceriani, N. M., Rocco, P., Ding, H., and Matthias, B., 2015, "Safety in Human-Robot Collaborative Manufacturing Environments: Metrics and Control," IEEE Trans. Autom. Sci. Eng., 13(2), pp. 882–893.
9. Wilcox, R., Nikolaidis, S., and Shah, J., 2013, "Optimization of Temporal Dynamics for Adaptive Human-Robot Interaction in Assembly Manufacturing," Robotics, 8, p. 441.
10. Haddadin, S., Albu-Schäffer, A., and Hirzinger, G., 2009, "Requirements for Safe Robots: Measurements, Analysis and New Insights," Int. J. Rob. Res., 28(11–12), pp. 1507–1527.
11. Cordero, C. A., Carbone, G., Ceccarelli, M., Echávarri, J., and Muñoz, J. L., 2014, "Experimental Tests in Human-Robot Collision Evaluation and Characterization of a New Safety Index for Robot Operation," Mech. Mach. Theory, 80, pp. 184–199.
12. Haddadin, S., Albu-Schaffer, A., and Hirzinger, G., 2008, "The Role of the Robot Mass and Velocity in Physical Human-Robot Interaction - Part I: Non-Constrained Blunt Impacts," 2008 IEEE International Conference on Robotics and Automation, IEEE, pp. 1331–1338.
13. Gemignani, G., Veloso, M., and Nardi, D., "Language-Based Sensing Descriptors for Robot Object Grounding," 19th Annual RoboCup International Symposium, Hefei, China, July 23, pp. 3–15.
14. Sheikholeslami, S., Moon, A., and Croft, E. A., 2015, "Exploring the Effect of Robot Hand Configurations in Directional Gestures for Human-Robot Interaction," 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, Sept. 28–Oct. 2
, IEEE, pp.
3594
3599
.
15.
Gleeson
,
B.
,
MacLean
,
K.
,
,
A.
,
Croft
,
E.
, and
Alcazar
,
J.
,
2013
, “
Gestures for Industry Intuitive Human-Robot Communication From Human Observation
,”
2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI)
,
Tokyo, Japan
,
Mar. 3–6
, IEEE, pp.
349
356
.
16.
Beetz
,
M.
,
Bartels
,
G.
,
Albu-Schäffer
,
A.
,
Bálint-Benczédi
,
F.
,
Belder
,
R.
,
Beßler
,
D.
,
,
S.
,
,
A.
,
Mansfeld
,
N.
,
Wiedemeyer
,
T.
, and
Weitschat
,
R.
,
2015
, “
Robotic Agents Capable of Natural and Safe Physical Interaction with Human Co-Workers
,”
2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
,
Hamburg, Germany
,
Sept. 28–Oct. 2
, IEEE, pp.
6528
6535
.
17.
Oshin
,
O.
,
Bernal
,
E. A.
,
Nair
,
B. M.
,
Ding
,
J.
,
Varma
,
R.
,
Osborne
,
R. W.
,
Tunstel
,
E.
, and
Stramandinoli
,
F.
,
2019
, “
Coupling Deep Discriminative and Generative Models for Reactive Robot Planning in Human-Robot Collaboration
,”
2019 IEEE International Conference on Systems, Man and Cybernetics (SMC)
,
Bari, Italy
,
Oct. 6–9
, IEEE, pp.
1869
1874
.
18.
Vogel
,
J.
,
,
S.
,
Simeral
,
J. D.
,
Stavisky
,
S. D.
,
Bacher
,
D.
,
Hochberg
,
L. R.
,
Donoghue
,
J. P.
, and
Van Der Smagt
,
P.
,
2014
, “Continuous Control of the DLR Light-Weight Robot III by a Human with Tetraplegia Using the Braingate2 Neural Interface System,”
Experimental Robotics
,
Springer
,
Berlin, Heidelberg
, pp.
125
136
.
19.
Edlinger
,
G.
,
Holzner
,
C.
, and
Guger
,
C.
,
2011
, “
A Hybrid Brain-Computer Interface for Smart Home Control
,”
International Conference on Ergonomics and Health Aspects of Work with Computers (EHAWC)/14th International Conference on Human-Computer Interaction (HCI)
,
Orlando, FL
,
July 9–14
, Springer, pp.
417
426
.
20.
Riaz
,
A.
,
Akhtar
,
S.
,
Iftikhar
,
S.
,
Khan
,
A. A.
, and
Salman
,
A.
,
2014
, “
Inter Comparison of Classification Techniques for Vowel Speech Imagery Using Eeg Sensors
,”
The 2014 2nd International Conference on Systems and Informatics (ICSAI 2014)
,
Shanghai, China
,
Nov. 15–17
, IEEE, pp.
712
717
.
21.
Yin
,
E.
,
Zhou
,
Z.
,
Jiang
,
J.
,
Chen
,
F.
,
Liu
,
Y.
, and
Hu
,
D.
,
2013
, “
A Novel Hybrid Bci Speller Based on the Incorporation of Ssvep Into the P300 Paradigm
,”
J. Neural. Eng.
,
10
(
2
), p.
026012
.
22.
Ying
,
R.
,
Weisz
,
J.
, and
Allen
,
P. K.
,
2018
, “
Grasping with Your Brain: A Brain-Computer Interface for Fast Grasp Selection
,”
12th International Symposium on Robotics Research (ISRR)
,
Sestri Levante, Italy
,
Sept. 12–15, 2015
, Springer, pp.
325
340
.
23.
Hortal
,
E.
,
Planelles
,
D.
,
Costa
,
A.
,
Iánez
,
E.
,
Úbeda
,
A.
,
Azorín
,
J. M.
, and
Fernández
,
E.
,
2015
, “
SVM-based Brain–machine Interface for Controlling a Robot Arm Through Four Mental Tasks
,”
Neurocomputing
,
151
, pp.
116
121
.
24.
Gandhi
,
V.
,
,
G.
,
Coyle
,
D.
,
Behera
,
L.
, and
McGinnity
,
T. M.
,
2014
, “
Eeg-based Mobile Robot Control Through An Adaptive Brain–robot Interface
,”
IEEE. Trans. Syst. Man. Cybernet.: Syst.
,
44
(
9
), pp.
1278
1285
.
25.
LaFleur
,
K.
,
,
K.
,
Doud
,
A.
,
,
K.
,
Rogin
,
E.
, and
He
,
B.
,
2013
, “
Quadcopter Control in Three-dimensional Space Using a Noninvasive Motor Imagery-based Brain–computer Interface
,”
J. Neural. Eng.
,
10
(
4
), p.
046003
.
26.
Li
,
Y.
, and
,
T.
,
2018
, “
Brain Computer Interface Robotic Co-Workers: Defective Part Picking System
,”
In ASME 2018 13th International Manufacturing Science and Engineering Conference, American Society of Mechanical Engineers Digital Collection
,
College Station, TX
,
June 18–22
.
27.
Zhang
,
Y.
,
Zhou
,
G.
,
Jin
,
J.
,
Wang
,
X.
, and
Cichocki
,
A.
,
2014
, “
Frequency Recognition in SSVEP-based BCI Using Multiset Canonical Correlation Analysis
,”
Int. J. Neural Syst.
,
24
(
04
), p.
1450013
.
28.
Lin
,
Z.
,
Zhang
,
C.
,
Wu
,
W.
, and
Gao
,
X.
,
2006
, “
Frequency Recognition Based on Canonical Correlation Analysis for SSVEP-based BCIS
,”
IEEE Trans. Biomed. Eng.
,
53
(
12
), pp.
2610
2614
.
29.
Resalat
,
S. N.
, and
Setarehdan
,
S. K.
,
2013
, “
An Improved Ssvep Based BCI System Using Frequency Domain Feature Classification
,”
Am. J. Biomed. Eng.
,
3
(
1
), pp.
1
8
.
30.
Aznan
,
N. K. N.
,
Bonner
,
S.
,
Connolly
,
J.
,
Al Moubayed
,
N.
, and
Breckon
,
T.
,
2018
, “
On the Classification of SSVEP-based Dry-Eeg Signals Via Convolutional Neural Networks
,”
2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
,
Miyazaki, Japan
,
Oct. 7–10
, IEEE, pp.
3726
3731
.
31.
Li
,
Y.
,
Xiang
,
J.
, and
,
T.
,
2020
, “
Convolutional Correlation Analysis for Enhancing the Performance of Ssvep-based Brain-computer Interface
,”
IEEE. Trans. Neural. Syst. Rehabil. Eng.
,
28
(
12
), pp.
2681
2690
.
32.
Wang
,
Y.
,
Chen
,
X.
,
Gao
,
X.
, and
Gao
,
S.
,
2016
, “
A Benchmark Dataset for Ssvep-based Brain–computer Interfaces
,”
IEEE. Trans. Neural. Syst. Rehabil. Eng.
,
25
(
10
), pp.
1746
1752
.