Effects of chest movements while sitting on Navon task performance and stress levels

Background This study explored physical activity during remote work, most of which takes place while sitting in front of a computer. The purpose of Experiment 1 was to develop a classification for body motion by creating a neural net that can distinguish among several kinds of chest movement. Experiment 2 examined the effects of chest movements on stress and performance on the Navon test to validate the model developed in Experiment 1. Method and results The procedures for this study were as follows. Experiment 1: Creation of the body movement classification model and preliminary experiment for Experiment 2. Data from five participants were used to construct a machine-learning categorization model. The other three participants participated in a pilot study for Experiment 2. Experiment 2: Model validation and confirmation of stress measurement validity. We recruited 34 new participants to test the validity of the model developed in Experiment 1. We asked 10 of the 34 participants to retake the stress measurement since the results of the stress assessment were unreliable. Using LSTM models, we classified six categories of chest movement in Experiment 1: walking, standing up and sitting down, sitting still, rotating, swinging, and rocking. The LSTM models yielded an accuracy rate of 83.8%. Experiment 2 tested the LSTM model and found that Navon task performance correlated with swinging chest movement. Due to the limited reliability of the stress measurement results, we were unable to draw a conclusion regarding the effects of body movements on stress. In terms of cognitive performance, swinging of the chest reduced RT and increased accuracy on the Navon task (β = .015 [-.003,.054], R2 = .31). Conclusions LSTM classification successfully distinguished subtle movements of the chest; however, only swinging was related to cognitive performance. Chest movements reduced the reaction time, improving cognitive performance. However, the stress measurements were not stable; thus, we were unable to draw a clear conclusion about the relationship between body movement and stress. The results indicated that swinging of the chest improved reaction times in the Navon task, while sitting still was not related to cognitive performance or stress. The present article discusses how to collect sensor data and analyze it using machine-learning methods as well as the future applicability of measuring physical activity during remote work.


Background
In the spring of 2020, when the COVID-19 pandemic began, Japanese universities shifted to online education according to governmental policies. However, adverse outcomes, such as an increased dropout rate, were found; thus, the Japanese government is currently instructing universities to reduce remote education. In the corporate sector, the adoption of remote work increased during the pandemic. However, some companies have begun scaling back from remote work and requiring employees to return to their offices. Thus, even though Japan ranked first in the world in the number of infected people under the seventh wave of the pandemic, in the winter of 2022, the rate of partially or fully remote work remained at 28.5% [1]. It is therefore important to determine healthy and efficient practice for remote work. For example, what activities promote or reduce performance in remote work? We aimed to create a self-monitoring tool instead of having management evaluate employees' performance. For this reason, we sought to categorize body movements with machine-learning methods that can be used with any smartphone. Such tools can be used by remote employees to obtain feedback regarding potential decreases in performance because these employees might not be aware of their stress levels.

Impacts of sedentary behavior
The lockdown policies implemented to stop the spread of the COVID-19 pandemic caused health and psychological issues, including increased loneliness [2] [3] and increased anxiety and depression [4]. The lockdowninduced change in sedentary time (specifically, sitting) is the change that poses the most significant health risk.
The pandemic increased sedentary time (e.g., time lounging on the couch or bed) in daily life. According to [5] sedentary behavior increases all-cause mortality. Furthermore, there is a vicious cycle in which sedentary behavior induces a depressed mood, and the depressed mood further reduces physical activity [6]. According to a survey by KK Japan Innovation [7], 63.37% of university students showed symptoms of depression when Japanese universities conducted fully remote education. [8] found that greater physical activity and lower sedentary time were associated with reduced perceived stress after adjusting for sex, BMI, income, fruit and vegetable intake, alcohol consumption, and sleep quality.
To reduce sedentary behavior, various attempts have been made to prevent prolonged sitting. [9] compared groups of college students using sitting or standing desks and reported that the standing group had a more active lifestyle. One way to improve the mental health of individuals engaging in remote work and online learning is to use wearable sensors to measure physical activity. It is possible to identify signs of depression or activities that reduce depression, such as running, using data from wearable sensors. Data from such mobile sensors can be classified using deep learning for signal processing.

Machine learning
In recent years, there have been large advances in the field of machine learning, and deep learning has become one of the most popular techniques for data analysis and prediction. Convolutional neural networks (CNNs; [10]) and long short-term memory (LSTM; [11]) models are representative deep neural networks. CNNs perform well for visual object classification, while recurrent neural networks (RNNs) are better suited for body motion recognition due to their ability to handle continuous data. LSTM was proposed as a method to solve the gradient loss problem of RNNs. LSTM models perform well with time-series data, such as text analysis. Deep learning algorithms are particularly useful for image classification and other computer vision tasks, but they can also be applied to signal data.
There are some differences between applying deep learning to image data and applying it to signal data that must be taken into account. When analyzing signal data with deep learning methods, preprocessing and feature extraction become more important than they would be with image data. Raw image data can often be fed directly into a deep learning network, but raw signal data are often noisy and variable, making it necessary to perform preprocessing before the data are ready for input into the network.
In this study, long short-term memory (LSTM) was selected for the classification of mobile sensor data. LSTM models are a type of recurrent neural network (RNN) that are particularly well suited for applications that require the analysis of sequences of data over time. This is because LSTM models are capable of learning long-term temporal dependencies in a sequence.

Cognitive performance
While the negative effects of sitting on health are apparent, their effects on cognitive performance have been inconsistent [12]. examined the association between sedentary behavior and cognitive flexibility in adolescents and found that the association differs according to the type of sedentary behavior. Recreational screen-based sedentary behavior was found to have a negative association with cognitive flexibility, while educational (learning) sedentary behavior was found to have a positive association with executive control.
In the present study, we used the Navon (global-local) task to assess cognitive function [13]. [14] found that a positive mood improves (i.e., reduces) reaction times on global trials compared to local trials. The global-local processing mode is thought to be associated with metacontrol functions that control processing tradeoffs, such as the one between speed and accuracy [15]. In the present study, we used shift trials, which involve switching between global and local trials, as an index of cognitive performance. Under more stressful conditions, the reaction time of shift trials in Navon tasks decreases because they require more attention than nonshift trials [16].

Research question
This study aimed to identify indoor physical activities that promote or reduce cognitive function and stress. In particular, we compared the effects of sitting still with those of movements while seated during computer work. In particular, the study aimed to determine whether sitting still reduced cognitive performance compared to other sitting activities.
The purpose of Experiment 1 was to determine whether such subtle body movements could be classified. After creating a machine learning model to classify physical activity while seated, Experiment 2 examined which movements affected stress and cognitive performance during remote work. The goal of Experiment 2 was to use the model constructed in Experiment 1 to identify physical activities that negatively impact cognitive performance and increase stress levels while seated. Experiment 1 also included validation of the stress and cognitive performance assessments by conducting a preliminary experiment to gather a small amount of data before conducting Experiment 2.

Experiment 1: Constructing the model Study setting
Most of the wearable sensors used in studies mentioned above were designed for specific purposes, such as cardor bangle-type sensors, which are connected via Wi-Fi. Considering technological trends and future applicability, we developed a machine learning model based on data from inertial sensors in smartphones. Smartphone sensors can collect a variety of data. We collected data from accelerometers and gyroscopes, which are the most commonly used sensors for analyzing body motion (e.g., [17,18]).
The sensors were attached to the chest of participants to classify movements, assuming these movements were performed indoors during remote work or study. Chest movements are expected to be small compared to the arm and head movements involved in the operation of computers; thus, they reflect individual differences in movements while sitting.
For this purpose, the classification scheme of the Human Activity Recognition (HAR) study was modified to fit the remote-work situation; the HAR study equipped participants' chests with wearable sensors to record physical activity during short events and basic activity architecture. Studies using support vector machines [19] have shown 96% recognition accuracy. Furthermore, [20] classified physical activities in daily life into six categories. (walking, climbing stairs, descending stairs, sitting, standing, and lying down). However, they reported significant misclassification of sitting and standing.
Reyes-Ortiz et al. [20] found that the time span of daily activities is approximately 2 to 5 s; machine learning should be able to distinguish movements within this time span. They also stated that sensors for obtaining 3-dimensional body movement data should have a sampling rate between 0 and 15 Hz. Because movements while sitting are typically subtle but continue over a lengthy period, a sampling rate that is too high will collect excessive data, reducing the feasibility of the study. Therefore, I used a technique that accurately captures minor motions while correctly differentiating among body movements at a sampling rate that collected a reasonable amount of data.

Participants
Eight participants (aged 19-62 years, six males and two females) were enrolled; five participated in the training session, and three participated in the preliminary experiment for Experiment 2. The participants signed an informed consent form. Before the experiments were conducted, the participants were informed about the APA's ethical standards for psychological research.

Procedure
Physical activity measurements The participants wore their smartphones horizontally at the center of the clavicle using a smartphone holder. Figure 1 (left) shows the smartphone holder used in the training session. For the preliminary experiment for Experiment 2, the participants used a different type of smartphone holder, as shown in Fig. 1 (right), for ease of use.
Training session: Supervised machine learning Physical activity (10-Hz or 20-Hz data from the triaxial accelerometer and gyroscope) was recorded with the MATLAB mobile application. To collect labeled data for supervised machine learning, we recorded six movements by five participants (Participants A, B, C, D, and E). While the instructor counted 10 s for each activity, each participant was instructed to move as follows: We constructed LSTM models from the data obtained in the training session and cross-validated their accuracy rates.
The core components of an LSTM network are a sequence-input layer and an LSTM layer. The sequenceinput layer is responsible for incorporating the timeseries data into the network, while the LSTM layer is responsible for learning the long-term dependencies between the time steps in the sequence data over time. In this study, the author defined an LSTM network with 20 + 30 hidden units and a dropout layer with a dropout probability of 0.1. Dropout is a technique that helps prevent overfitting of models by randomly skipping some data during training. This ensures that the network does not simply memorize the training set and instead learns to recognize more general patterns in the data. The number of hidden units in the LSTM layer was relatively small; this number represents a trade-off between having enough memory to learn and avoiding overfitting. In this study, we aimed to develop a model yielding a sufficient accuracy rate while requiring less memory load.
Preliminary data for Experiment 2: Measuring body motion using the neural-net function in the training session To validate the model developed in the training session, unlabeled sensor data from triaxial accelerometers were collected for 45 min from Participants F, G, and H, who did not participate in Experiment 1. Participant F stood and chatted with other students, Participant G sat and used a laptop computer at his desk, and Participant H sat for approximately half the time and stood for the other half of the time. After 45 min, the participants performed the Navon task and stress test as the preliminary experiment for Experiment 2. The procedure of the Navon task and stress test will be described in the sections on Experiment 2 below.

Experiment 2: Testing the model
In Experiment 2, we used the model developed in Experiment 1 to measure subtle chest movements during seated desk work and examined their effects on cognitive performance and stress. As a measure of cognitive performance, we translated the Navon task for children [21][22][23], which is a script of the Inquisit test library made by Millisecond Software. This task involves a test of perceptual processing of global and local characteristics of stimuli using a circular or square graphic. As most global and local assessments employ English letters, we chose to use a test for children because our participants were Japanese.
Salivary amylase activity (SAA), which is employed as a measure of sympathetic activity to assess stress, has a well-established correlation with plasma norepinephrine levels. Thus, SAA can be used as an index of the norepinephrine concentration, which indicates sympathetic nervous system activation. In this study, we used the Nipro Cocoro METER, which measures SAA through a test chip inserted under the tongue of participants for 1 min [24]. Based on the results of Experiment 1, we measured SAA twice and used the larger value of two.

Participants
Thirty-four students from the psychology experiment class (21 males and 13 females, mean age: 20.46 years) were recruited from Kyoto, Japan. The participants were divided into two groups. One week before the experiment, the participants were familiarized with the On the day of the experiment, the participants were provided with a written explanation of informed consent, explaining that their participation was voluntary and that they could withdraw their consent at any time. Their rights to their data were explained as follows: "If you want to take this test but do not want your data to be used for analysis, please leave the parentheses in the ID number blank or use the number 99 when submitting your data. If you later decide that you do not want your data to be used for research, please contact the experimenter at that time." After the informed consent information was distributed, the participants were directed to sign an informed consent form after the experiment was completed. The experimental participants were randomly given a card with an ID number, which they were asked to fill in when they submitted their data.

Procedure
Body movement measurements The participants were seated approximately 1.5 m apart. The experimenter instructed the participants to download a file for their assignment to their laptops and prepare to start it. The assignment was about statistical analysis, but it was a relatively easy task for the participants to perform following the instructions. Smartphone holders were distributed, and each participant placed his or her smartphone in the holder as shown in Fig. 1 (right). After confirming that participants were ready, the experimenter instructed the participants to start the chest-movement measurements (on their smartphone) and to start the assignment (working on their laptop). After 30 min, the experimenter instructed the participants to stop the measurements and assignment. After stopping, the participants were asked to submit the triaxial acceleration data stored on their smartphones at their discretion.
Performance measurement After 30 min of data collection with smartphone sensors, the experimenter asked the participants to complete the Navon task. The Navon task was administered with the Inquisit Web program launched on the internet browser of the participant PC. In the Navon task, participants observed the overall shape (circle versus square) and components composing the shape (small circles vs. small squares) of Navon figures. This task has three phases: (a) focus on the overall (global) shape; (b) focus on the detail (local) shape, and (c) mixed (randomized trials). The sequence of the trials is as follows: global (ten practice trials; 20 trials), local (10 practice trials; 20 trials), and mixed phases (12 practice trials of randomly selected global or local trials; 40 trials). In these tests, participants were instructed to use their mouse to click on one of the two options (circle or square) displayed beneath the target stimulus.
In this study, shift trials involved switching between global and local trials in the last 40 mixed trials (i.e., global trial → local trial). Nonshift trials were defined as trials in the same 40 trials that did not switch (i.e., global trial → global trial). The mixed test phases involved half shift trials and half nonshift trials, randomly distributed. Performance on the Navon task was assessed with two indices: the accuracy rate on the shift trials and the reaction time on the accurate shift trials.
Stress measurement After the cognitive test was administered, SAA was determined with test kits. To prevent infection with SARS-CoV-2 (the virus that causes COVID-19), participants were instructed to perform the test themselves, fill in the two test results on the form, place the test paper in a plastic bag and return it to a designated place. The higher of the two test results was used as the stress index. The test results were submitted by participants on a voluntary basis.
At the end of the entire study, the experimenter explained the theoretical background of the experiment. Feedback was also provided to the participants by explaining the results of the experiment two weeks after it was conducted. Informed consent forms were submitted within these two weeks; 34 participants submitted consent documents. However, some of these participants did not submit smartphone data, collect sufficient data due to Wi-Fi malfunction, or had different sensor sampling rates. Data from these six participants were excluded as missing values, and data from 28 participants were included in the analysis. Two individuals did not submit SAA data because they failed to measure it, but their available data were included in the analysis.
Regarding additional data, we examined the validity of the SAA test for those willing to undergo a retest three weeks later. Ten participants performed SAA measurements and cognitive tests after climbing up and down stairs for 30 min.

Experiment 1: Constructing the model Training session
We applied CNN and LSTM networks to the data and used time-series partitions without shuffling the data. For cross-validation, data from four of the five participants (80%) were used as training data, and the remaining data were used as test data. Since all participants provided data for the test, network training was performed five times in the same manner, and the model was evaluated by averaging correct responses for the sets of data.
Only accelerometer data were used because the addition of gyroscope data reduced the accuracy of body movement identification in all datasets. The temporal signal was sampled from 10 to 128 Hz with a sliding window of 10 Hz (2 s) and a fixed width of 20 that allowed accurate identification of chest movements. Slice data with 50% overlap were used as training data, and each slice of data was subtracted from the mean for gravity correction and Fourier transformed. Data from each of the five participants was designated as the test data, and the remaining four datasets from participants were used as training data.
The LSTM model had accuracy rates of 81.8% for Participant A, 78.8% for Participant B, 87.0% for Participant C, 83.3% for Participant D, and 88.1% for Participant E. The average accuracy rate was 83.8%. We further examined which categories had the largest errors. Figure 2 shows the confusion matrix of the lowest accuracy rate (78.8% for Participant B) among the five.
The subsequent preliminary experiment for Experiment 2 used the LSTM model trained with data from all five participants. The parameters of this model were as follows: sequence-input layer = 3 (3-axis accelerometer), 1 st layer = 20, 3 rd layer = 30, output layer = 6 (6 categories), mini batch size = 30, and maximum epochs = 300. The solver was a stochastic gradient descent with momentum from the MATLAB program. The output metrics were as follows: iteration = 4500, last epoch precision = 100%, mini batch loss = 0.015, and learning rate = 0.001. Figure 3 shows the learning process of the LSTM model.    [25]. The reaction times on the Navon task were 1468.05 ms for Participant F, 1448.4 ms for Participant G, and 1157 ms for Participant H. Generally, these results suggest that our LSTM model, cognitive task, and SAA data were appropriate for our research purposes. Figure 5 shows the average reaction times for the four phases of the Navon task. A repeated-measures analysis of variance was used to examine whether there was a difference in these averages. Since the assumption of sphericity did not hold, Greenhouse-Geisser correction was performed, and a significant effect was found Pairwise comparisons of the means revealed no difference between the nonshift and shift trials (Table 1). Although the Inquisit program by Sjöwall et al. [23] calculates the shit cost (shift cost = shift trial RT-nonshift trial RT), we used the mean of the reaction times on shift trials as a performance index because there were no significant differences between the shift trial RTs and the nonshift trial RTs. Table 2 shows the mean values for each measure obtained in Experiment 2. Sitting was the largest because the subjects were sitting in front of their laptops in the classroom during the 30-min measurement. Rotating was the next-largest category. SAA data for two participants was   unavailable. The average accuracy rate for shift trials was high enough to obtain a reliable average reaction time.

Descriptive statistics
Regarding correlations among indices (Table 3), walking was strongly correlated with standing up and sitting down as well as rotating, rotating was correlated with rocking, and rotating appeared in sitting situations, as shown in Experiment 1. We discarded correlated indicators and therefore used rotating, rocking, and sitting as body movement indicators in the regression analysis. Concerning the mediator and outcome variables, a negative correlation was found between swinging and reaction time, and a positive correlation was found between the accuracy rate and SAA values. These variables were used in the mediation analysis.

Mediation analysis
Due to the small sample size, the bootstrap method (2000 resamples) was applied to perform a mediation analysis. Physical movements during the 30 min (swinging, rotating, and sitting) were categorized as causes, while reaction time during shift trials was the mediating factor. The outcomes were stress and cognitive performance (accuracy on shift trials). Table 4 shows the direct effects of the independent variables (chest movements) on the dependent variables (stress and cognitive performance). Table 5 shows the total effects. All paths in the mediation analysis, including indirect paths, are shown in Fig. 6.  The total R 2 values for accuracy, stress and reaction time were 0.37, 0.15, and 0.36, respectively. Overall, swinging movements of the chest affected cognitive performance through RT. Swinging reduced the RT and increased accuracy; however, as shown in Fig. 5, the β value was small (standardized β = 0.015 [-0.003,0.054], R 2 = 0.31).

Additional analysis
To examine the validity of the SAA test used in this study, we retested the SAA and cognitive performance of ten participants after climbing up and down stairs for 30 min.
A comparison of the corresponding means revealed that the SAA level was 18.36 (SD = 11.55) after Experiment 2 (desk work), whereas the SAA level was 22.45  . 6 The numbers indicate the parameter's standardized estimates (SD = 16.95) after stair climbing. Although there was a trend toward higher SAA levels after stair climbing, the difference was not significant (t(10) = -0.95)). For the cognitive test, RT was lower after stair climbing, 1156.11 (SD = 248.51, t(9) = 2.66, p = 0.026), than during the main experiment (desk work), 1351.13 (SD = 243.7). The SAA levels, which are predicted to be elevated by physical stress, were not significantly different after stair climbing than after desk work. Regarding the cognitive test, the effect of physical movement on reaction time was also confirmed in this additional analysis. From these results, the delay in the reaction time on shift trials in Experiment 2 was inferred to be due to cognitive fatigue rather than physical fatigue.

General discussion
Experiment 1 showed that the 10-Hz accelerometer training data from four participants, measuring each movement for only 10 s, correctly classified 83.8% of the test data.
Seated status was not identified by lack of motion but by a combination of rotational chest motion and lack of other motion. Machine learning algorithms were able to identify movement categories from this study's small, short-sampled slice data. The LSTM model achieved good performance on accuracy and was able to handle sequential data. A sampling frequency of 10 Hz and a sampling slice of 1 s were sufficient to identify small body movements. Overall, Experiment 1 demonstrated that the participants' cell phones could be used as sensors during remote work. Experiment 2 showed that swinging of the chest improved RTs in shift trials in the Navon task, while sitting still was not related to cognitive performance or stress. Since a correlation was found between RTs and an output category of the machine learning model, we believe that the machine learning model constructed and trained in this study successfully measured subtle chest movements. SAA values correlated with the accuracy rate, suggesting that stress decreased accuracy on shift trials, which require attention. However, we did not find a direct or indirect path between body movements and stress.
Additional analyses were performed to examine the validity of the SAA values. However, no significant differences were found even after stair climbing, which should have increased the physical load. Kreher et al. [26] found that SAA levels can prime the perception of unpleasant stimuli only when cortisol levels are high, suggesting that salivary amylase itself is an indicator related to cortisol levels rather than unpleasantness during task performance. To measure stress, a single indicator is not sufficient. A test battery is needed.

Limitations of this study
Identification rates can be influenced by the environment, device used, and innate and acquired factors. In particular, the position of the wearable device may be important. In this study, the poor performance of the gyroscope could be attributed to the wearing position (on the chest). For example, the participant with the lowest identification rate, had standing up and sitting down events misidentified as rocking. It is natural for the upper body to move back and forth when standing or sitting.
It is possible that swinging of the chest was related to the participants' arm movements. However, Mitra et al. [27] used VR and also found an association between lateral head sway and cognitive performance. For flanker tasks requiring similar attentional control, stress has also been reported to increase concentration in the short term [28]. For global-local tasks, [16] suggested that accessible information enhances the accessible reaction; thus, positive mood, not stress, may correlate with chest swinging.
Regarding the LSTM model, several limitations should be considered. For example, the performance of LSTM models can be sensitive to the choice of hyperparameters, such as the number of hidden units and the learning rate. If not properly supervised, LSTM models can overfit the training data and perform poorly on new, unseen data. Moreover, LSTM models can be difficult to understand and interpret, making it challenging to know how decisions are made and to diagnose problems. It is important to consider these limitations and carefully evaluate the performance of an LSTM model before deploying it in a real-world application.

Conclusion
According to Experiment 2, sitting still did not impact stress levels or cognitive performance. On the other hand, subtle chest movements improved cognitive performance. Thus, sitting still does not negatively impact cognitive performance, but cognitive performance is compromised when small movements are decreased. However, since this study was conducted over a 30-min period, longer observation is necessary.
It was demonstrated that the model developed in Experiment 1 is applicable to such observations. In conclusion, LSTM models are a powerful tool for the classification of mobile sensor data. Using a Fourier transform to preprocess the data yielded highly accurate results.
This study imposed only a low burden of data processing from the user's cell phone since it did not require a high sampling rate to record subtle body movements.