NewbornTime - improved Newborn Care based on Video and Artificial Intelligence - Study protocol

Background: Approximately 3-8% of all newborns do not breathe spontaneously at birth and require time-critical resuscitation. Resuscitation guidelines are mostly based on best practice, and more research on newborn resuscitation is highly sought after. Methods: The NewbornTime project will develop artificial intelligence (AI) based solutions for activity recognition during newborn resuscitations based on both visible light spectrum videos and infrared spectrum (thermal) videos. In addition, time-of-birth detection will be developed using thermal videos from the delivery rooms. Deep Neural Network models will be developed, focusing on methods for limited supervision and solutions adapting to on-site environments. A timeline description of the video analysis output enables objective analysis of resuscitation events. The project further aims to use machine learning to find patterns in large amounts of such timeline data to better understand how newborn resuscitation treatment is given and how it can be improved. The automatic video analysis and timeline generation will be developed for on-site usage, allowing for data-driven simulation and clinical debrief for health-care providers, and paving the way for automated real-time feedback. This brings added value to the medical staff, mothers and newborns, and society at large. Discussion: The project is an interdisciplinary collaboration, combining AI, image processing, blockchain and cloud technology with medical expertise, which will lead to increased competences and capacities in these various fields.


Background
Approximately 3-8% of all newborns do not breathe spontaneously at birth and require resuscitation with positive pressure ventilation (PPV) [1][2][3]. Birth asphyxia is one of the leading causes of death of children in the neonatal period [4]. It is also one of the main causes of cerebral palsy, learning disabilities and other developmental disorders in children [5]. Newborn resuscitation is time critical, and immediate resuscitation of the newborn can reduce the risk of death and long-term damage related to birth asphyxia [6].
The main therapeutic resuscitation activities in newborns not breathing at birth involve stimulation, PPV, suction, and keeping the newborn warm, illustrated with activity timelines in Figure 1. Guidelines on newborn resuscitation exist; however, the effect of the different newborn resuscitation activities and the consequences of delays in treatment initiation are far from fully explored. Also, there is variation in both guidelines and practices in different parts of the world. Resuscitation guidelines are mostly based on consensus of best practice, and more research on newborn resuscitation is highly sought after. Manley et al. conclude that clinicians now demand high-quality evidence to guide neonatal practice [7].
A thorough analysis of objective data collected from newborn resuscitation episodes may allow us to: (1) discover the optimal treatment for different newborn resuscitation situations, and (2) evaluate if there is a gap in compliance with guidelines. To be able to accurately study the effect of guidelines, and when and for how long the different resuscitation activities should be performed, a large number of documented newborn resuscitation episodes with accurate timeline information, including start and stop times, is required. Such timeline information can be produced by manual observations in real time or from visible light spectrum (VL) videos, but this is inefficient and will not produce large enough data sets. Instead, activity recognition and timeline generation should be fully automated. Automated generation of resuscitation activity timelines will also provide a tool for clinical debriefing as well as a potential for real-time decision support. The resuscitation activity timeline needs to be relative to an accurate Time of Birth (ToB), i.e. when the newborn is clear of the mother's perineum. ToB should be recorded with second precision, since the treatment during the first minute(s) after birth is crucial for any asphyxiated newborn. Time from birth to onset of ventilation is correlated with survival and long-term damage [6], and a study from Tanzania found that every 30-second delay in initiating ventilation increases the risk of death or prolonged admission by 16%. In that study, 8% of newborns received ventilation and the mortality rate among these was 10% [8]. These numbers indicate that in similar settings approximately one newborn per 1000 births could be saved if the time to first ventilation were reduced by 30 seconds. In a study from Tanzania, the median time from birth to first ventilation was found to be 108 seconds (Q1 = 76, Q3 = 158 seconds), based on manual annotation of videos and manually recorded ToB [9].
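The estimate of roughly one newborn per 1000 births can be reproduced with a back-of-the-envelope calculation. The sketch below is a simplification under stated assumptions: it treats the 16% relative increase per 30-second delay [8] as applying to death alone (the study reports it for death or prolonged admission combined), and it assumes that shortening the delay by 30 seconds divides the risk by 1.16. Variable names are illustrative.

```python
# Rough check of the "one newborn per 1000 births" estimate.
# Assumption: the 16% relative risk increase per 30 s delay applies to death alone.
births = 1000
ventilated_fraction = 0.08      # 8% of newborns received ventilation [8]
mortality_if_ventilated = 0.10  # 10% mortality among ventilated newborns [8]
risk_ratio_per_30s = 1.16       # 16% higher risk per 30 s delay [8]

deaths_now = births * ventilated_fraction * mortality_if_ventilated
deaths_if_30s_faster = deaths_now / risk_ratio_per_30s
lives_saved = deaths_now - deaths_if_30s_faster

print(f"Deaths per {births} births now: {deaths_now:.1f}")
print(f"Estimated lives saved per {births} births: {lives_saved:.1f}")
```

Under these assumptions the estimate is slightly above one life per 1000 births, which is consistent with the figure quoted above.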
We will refer to visible light spectrum video as VL video in this manuscript (elsewhere most often referred to as "video" or "ordinary video") to distinguish it from infrared spectrum (IR) video, also called thermal video.

State of the art
During routine care, the time of birth is currently recorded manually with minute precision, which is prone to error and imprecision. An automated timeline needs accurate and automated ToB detection to be reliable, since the first minutes after birth are so time critical for an asphyxiated, non-breathing newborn. Currently, no automated solution for ToB detection exists. Infrared thermal imaging of the skin has been applied to monitor human skin temperature for decades [10][11][12]. In the context of neonatal monitoring it has not been much explored, but some studies exist, and overviews are given in [11,13]. Examples are hypoxia recognition immediately after birth [14], respiratory rate measurements of newborns in incubators [15], and thermography for assessing health status in neonates [16,17]. To the authors' knowledge, thermal videos have not previously been used for ToB detection.
The current state of the art in activity recognition systems from VL video is based on supervised learning approaches with manually labeled training data. Some publicly available datasets for activity recognition of a predefined set of human activities exist [18,19], but no publicly available datasets for recognition of resuscitation activities exist. Non-sensitive data can often easily be generated in large quantities, and the truth labeling of the data can be carried out by almost anyone, as with, for example, the ImageNet dataset [20]. Large datasets with truth labels allow the use of supervised learning approaches with deep neural networks (DNNs) and yield state-of-the-art models with accurate predictions. However, not all data can be generated in large quantities, and in the case of sensitive data, there are strict regulations on who can access the data.
For medical applications, labeling, i.e. manual annotation, is preferably done by health care providers (HCPs), who usually have limited availability for such time-consuming tasks. Thus, there is a need for semi-supervised and unsupervised learning approaches that can benefit from unlabeled or weakly labeled data during training of the DNNs.
Although semi- and unsupervised approaches have been taken into use for 2D data, e.g. images [21], they are not straightforward to adapt for 3D data like videos. Newborn resuscitation is often performed by several HCPs, and multiple activities are ongoing simultaneously. Detection of such activities corresponds to activity detection in untrimmed videos, i.e. localizing, detecting and classifying multiple activities in the same video clip, possibly overlapping in time and space, and is a very challenging task gaining recent research interest [22][23][24]. In order to recognize time-overlapping activities, most current state-of-the-art solutions analyze multiple spatiotemporal regions, or chunks, and solve an activity classification problem for each chunk, where chunk aggregation at the output can increase performance [25]. There is a need to further investigate ways of capturing the temporal features in video sequences and performing activity detection in untrimmed videos.
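To make the chunk-based approach concrete, the following minimal sketch splits a video into overlapping temporal chunks, scores each chunk, and aggregates the chunk scores back to per-frame scores by averaging. The chunk length, the stride, and the placeholder scorer are hypothetical choices for illustration; in practice the scorer would be a trained network, and this is not the method of any cited work.

```python
from typing import Callable, List, Sequence


def chunk_indices(n_frames: int, chunk_len: int, stride: int) -> List[range]:
    """Return overlapping frame ranges (chunks) covering the video."""
    if n_frames <= chunk_len:
        return [range(0, n_frames)]
    starts = list(range(0, n_frames - chunk_len + 1, stride))
    if starts[-1] + chunk_len < n_frames:  # make sure the tail is covered
        starts.append(n_frames - chunk_len)
    return [range(s, s + chunk_len) for s in starts]


def aggregate_chunk_scores(
    frames: Sequence[float],
    score_chunk: Callable[[Sequence[float]], float],
    chunk_len: int = 4,
    stride: int = 2,
) -> List[float]:
    """Score each chunk, then average the scores of all chunks covering a frame."""
    if not frames:
        return []
    sums = [0.0] * len(frames)
    counts = [0] * len(frames)
    for idx in chunk_indices(len(frames), chunk_len, stride):
        score = score_chunk([frames[i] for i in idx])
        for i in idx:
            sums[i] += score
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


# Toy "video": one number per frame stands in for image content.
toy_frames = [0, 0, 0, 1, 1, 1, 1, 0]
# Placeholder scorer: fraction of frames in the chunk showing the activity.
per_frame = aggregate_chunk_scores(toy_frames, lambda c: sum(c) / len(c))
print([round(p, 2) for p in per_frame])
```

Frames covered by several chunks receive the mean of those chunk scores, which smooths the transition at activity boundaries.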
Another challenging task currently attracting research interest is adaptive DNNs, where the challenges are to develop deep incremental models and to update a model while preserving previous knowledge [26]. There is normally a trade-off between being able to adapt to new data and being able to generalize well, and the trade-off might depend on how large the expected variations in the data are. For example, in the case of newborn resuscitation, the practice and the hospital environment in a high-resource setting are most likely more predictable than in a low-resource setting, where there may be more variation in hospital facilities and in how trained and equipped the HCPs are.
However, high-resource settings use more equipment and have more staff working in parallel; hence detection of activities becomes more challenging.
There is a need for further research on intelligent solutions for adaptive DNNs for activity recognition from video adapted to a user environment.

Previous work from the consortium
Most of the partners of the NewbornTime project have been active partners in the Safer Births project [27] conducted in Tanzania. As a sub-project in Safer Births, the first steps towards automated activity timeline generation were taken using signals from the NeoBeat [28] heart rate sensor in combination with ventilation signals [29] and using VL video signals from cameras overlooking the resuscitation tables [30]. The results of these studies were encouraging and provide this research group with unique insight into and experience with the task of systematically and automatically documenting what is going on during resuscitation of newborns. The datasets were, however, limited in size, recorded in a single hospital, and included manual observations of ToB. Our research group has started exploring the possibilities and limitations of thermal cameras for the purpose of detecting births [31].

Methods
All methods were performed in accordance with the relevant guidelines and regulations or the Declaration of Helsinki.

Aim and objectives
The NewbornTime project will develop a completely automated AI based system, NewbornTimeline, generating a timeline including ToB and resuscitation activities like PPV, stimulation, suction, as well as the number of HCPs involved. The system input will be based on thermal video from the delivery room and the resuscitation room and VL video from the resuscitation table. Figure 1 shows an overview of the main objective. The outcome of NewbornTime includes the unique NewbornTimeline system, providing a tool not existing today for understanding newborn resuscitation events. The timelines can be used in research, for data-driven clinical debrief after every resuscitation event, and for targeted data-driven/guided training. In addition, the solutions for video-based activity recognition using semi-supervised and unsupervised deep learning approaches adaptive to new on-site environments are transferable to other applications and domains.

Project main objective and aim
The NewbornTime project aims to utilize video recordings automatically collected from births and newborn resuscitations to develop an AI-based system, NewbornTimeline, for automatic timeline generation of birth and resuscitation activities.

Secondary objectives
1 Technical: Develop a system to automatically detect ToB based on thermal video.
SUS is a tertiary level hospital with approximately 4400 births each year in 14 delivery rooms and 2 operating theatres for caesarean sections. The number of newborns that require resuscitation with PPV is currently 3.6%, and continuous positive airway pressure (CPAP) without PPV is provided to an additional 2.6% [3].

Study population
The study subjects are mothers giving birth and their newborns for the ToB detection, and newborns receiving resuscitation for recognition of resuscitative interventions.
The regional ethical committee did not regard the HCPs as study subjects, but as registered persons according to the GDPR. Data minimization has been prioritized to minimize the privacy disadvantage for the HCPs, leading to soundless VL videos showing only the newborn on the resuscitation table and the hands of HCPs.
HCPs are also captured by the thermal cameras in the delivery rooms and resuscitation rooms. Thermal cameras, however, make it hard or impossible to recognize individuals. The videos are also pseudonymized in terms of time and date.
As a service for HCPs who wish to refrain from study participation, we offer the option of requesting deletion of particular events through a user-friendly and anonymized web-based interface.

Inclusion and exclusion criteria
Inclusion Criteria: All women giving birth and newborns requiring resuscitation are eligible for inclusion. Exclusion Criteria: i) Non-consent from mother, ii) HCPs refraining from participation within 48 hours of the event, iii) birth in labour room without cameras installed and no newborn resuscitation performed.
The expected sample size for newborns receiving resuscitation is approximately 500.The expected sample size of women giving birth is > 500.

Consent management
The ethical committee decided that informed consent is only required from the mother. Information on the project will be provided and consent to participation will be obtained at the antenatal visits in approximately weeks 12 and 20 of the pregnancy, or on admission to the labour department if the mother is physically fit to give consent at this time.
Currently, many medical research trials collect consent forms manually with handwritten signatures and enter consent in the (digital) medical records. This method is inefficient and prone to human error; thus there is a need for an effective and reliable digital consent handling system.
We will develop a system for decentralized digital data consent management for efficient and safe consent collection.A data governance methodology will be developed to ensure that the consent data are secure and accountable.Decentralized blockchain based solutions will be established to have collective guarantees on adherence to the data methodology and consent policies.
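As an illustration of the tamper-evidence idea behind a blockchain-based consent ledger, the sketch below chains consent records with SHA-256 hashes so that any later modification of an earlier record is detectable. This is a single-node toy and not the project's design: the record fields, the `ConsentLedger` class, the pseudonymous participant identifier and the timestamps are all hypothetical, and a real decentralized system would add distributed consensus, access control and key management.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


def _digest(payload: dict, prev_hash: str) -> str:
    """Hash a record together with the hash of the previous block."""
    blob = json.dumps(payload, sort_keys=True) + prev_hash
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


@dataclass
class ConsentLedger:
    """Append-only, hash-chained list of consent events (hypothetical design)."""
    blocks: List[dict] = field(default_factory=list)

    def append(self, participant_id: str, consent_given: bool, timestamp: str) -> None:
        payload = {"participant": participant_id, "consent": consent_given, "time": timestamp}
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        self.blocks.append({"payload": payload, "prev": prev_hash,
                            "hash": _digest(payload, prev_hash)})

    def is_valid(self) -> bool:
        """Recompute every hash; any edited or reordered block breaks the chain."""
        prev_hash = "0" * 64
        for block in self.blocks:
            if block["prev"] != prev_hash or block["hash"] != _digest(block["payload"], prev_hash):
                return False
            prev_hash = block["hash"]
        return True

    def current_consent(self, participant_id: str) -> bool:
        """Latest event wins; no record means no consent."""
        for block in reversed(self.blocks):
            if block["payload"]["participant"] == participant_id:
                return block["payload"]["consent"]
        return False


ledger = ConsentLedger()
ledger.append("P-001", True, "2022-01-10T09:00:00Z")   # made-up example data
ledger.append("P-001", False, "2022-03-02T14:30:00Z")  # consent withdrawn
print(ledger.is_valid(), ledger.current_consent("P-001"))
```

Withdrawal is recorded as a new event rather than by editing the original record, which keeps the history auditable.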

Data collection
Video recordings
Real birth data will be collected at SUS by two passive thermal cameras (Mx-O-SMA-TPR079 thermal sensor connected to Mx-S16B camera, Mobotix) per room mounted in four delivery rooms and one operating theatre.The sensors are mounted to the ceiling in the four selected delivery rooms.One sensor is mounted centered above and behind the head of the mother, while the other sensor is mounted on the side of the bed, as illustrated in Figure 2. At the operating theatre the thermal sensors are mounted to the ceiling at two locations minimizing blocking of the view from the operating light and equipment in the theatre.
Real resuscitation data will be recorded at two different locations: resuscitation room 1 (RR1) close to the delivery rooms and resuscitation room 2 (RR2) next to the operating theatre. There is one thermal camera mounted to the ceiling in both RR1 and RR2 for capturing overview videos, which can be useful for estimating how many HCPs are treating the newborn. In RR1 there are two resuscitation stations and in RR2 there is one, and each of these stations is equipped with one visible light camera mounted at the top, filming straight down at the table. RR1 is illustrated in Figure 3.
Additional data to be used for training of AI models will be generated by simulation of resuscitation events using newborn-manikins and a resuscitation station.
The recorded videos will be encrypted and kept in Azure cloud storage with strict access control.

Clinical variables:
Relevant information on the deliveries will be collected from the medical journals, including: (1) risk factors for complications at birth, (2) mode of delivery, (3) gestational age, (4) birth weight, (5) gender, (6) umbilical cord blood gases, (7) resuscitative interventions in the delivery room, (8) admission to Neonatal Intensive Care Unit, (9) neonatal management, including therapeutic hypothermia treatment, and (10) neonatal outcome, including hypoxic ischemic encephalopathy and death before discharge.

Other data included in the study
The data collection also includes the preparation of an existing VL video dataset of newborn resuscitation from Haydom Lutheran hospital in Tanzania (481 VL videos) [30] collected during the Safer Births study [27], and from SUS (237 VL videos) collected as part of the NeoBeat study [32].The Safer Births and NeoBeat studies are ongoing and new data will also be included.
The ongoing "Making Every Newborn Resuscitation a Learning Event (LEARN)" study (P.I. Jackie Patterson, funding: Laerdal Foundation Program Award) is collecting data in the Democratic Republic of Congo (DRC) and will contribute VL video data from resuscitation. We also plan to include VL video from an upcoming study in DRC from the same research group [33]. In these datasets, the ToB is manually recorded on paper or by tablet. No previous thermal videos exist. The datasets from Tanzania and DRC give examples of VL video data from settings in low- and medium-income countries.

Artificial intelligence based generation of timelines
To create complete newborn resuscitation timelines, i.e. recognize all activities and events of interest from the thermal and VL video recordings, we will develop DNN models and utilize conventional image processing methods in combination with the DNN models.

AI based Time of Birth detection
To detect the ToB we will utilize thermal videos, because the temperature of the newborn immediately after birth is higher than that of its surroundings and of other people in the room. This will also allow us to respect privacy and the GDPR principle of data minimization, since the HCPs and the mothers will be very difficult to identify in the thermal videos and still images. As a preliminary test of the idea, a real birth was filmed with a thermal camera, demonstrating that the newborn is detectable. However, the newborn's skin may be exposed for only a few seconds before it is covered with blankets for drying and stimulation, the HCP can block the view of the thermal camera, and the birth position can vary. To mitigate some of the potential problems we propose to use two cameras from different angles. Figure 4 shows an example from the head and side cameras at the exact ToB.
The ToB will be detected using image pre-processing techniques, adaptive thresholding, and object detection CNNs to reveal the presence and the position of the newborn.
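A minimal sketch of the adaptive-thresholding step, assuming a thermal frame is available as a 2D grid of temperatures in degrees Celsius. The threshold adapts to each frame as the frame mean plus a multiple of its standard deviation. The factor `k`, the minimum region size and the toy temperatures are illustrative assumptions, and the object detection CNN stage is not shown.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple

Frame = List[List[float]]  # temperatures in degrees Celsius, row-major


def hot_region(frame: Frame, k: float = 2.0, min_pixels: int = 3
               ) -> Optional[Tuple[int, int, int, int]]:
    """Return the bounding box (top, left, bottom, right) of unusually warm pixels.

    The threshold is frame-adaptive: mean + k * standard deviation.
    Returns None if fewer than `min_pixels` pixels exceed it.
    """
    values = [t for row in frame for t in row]
    threshold = mean(values) + k * pstdev(values)
    hot = [(r, c) for r, row in enumerate(frame)
           for c, t in enumerate(row) if t > threshold]
    if len(hot) < min_pixels:
        return None
    rows = [r for r, _ in hot]
    cols = [c for _, c in hot]
    return min(rows), min(cols), max(rows), max(cols)


# Toy 6x8 frame: 22 C background, a 31 C adult-skin patch, a small 37 C patch.
frame = [[22.0] * 8 for _ in range(6)]
for r in range(0, 3):
    for c in range(0, 3):
        frame[r][c] = 31.0
for r, c in [(4, 5), (4, 6), (5, 5), (5, 6)]:
    frame[r][c] = 37.0

print(hot_region(frame))
```

In this toy frame only the warmest patch exceeds the adaptive threshold, so the returned box isolates it from the cooler adult-skin region. A frame without any warm outlier yields `None`.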

AI based Activity recognition from video
Detecting the time periods with different resuscitation activities from the VL video corresponds to what is referred to as activity detection from video in the image processing and AI literature. We will develop DNN models in the form of 3D CNNs and vision transformers (ViT) for detecting the activities, as this corresponds to the state of the art in related topics. A still image example from a VL video is shown in Figure 5, here with ongoing ventilation. Newborn resuscitation activities, such as ventilation, drying, stimulation and suction, will primarily be detected using deep learning activity recognition networks, e.g. 3D CNNs and ViT, possibly including motion analysis, such as methods for optical flow estimation. We will develop DNNs capable of utilizing both labeled and unlabeled data in training. This can be achieved by adapting 2D convolutional neural networks suited for semi-supervised and unsupervised learning to handle 3D data. We will pretrain models on unlabeled data and fine-tune them with labeled data. Furthermore, the system will be able to detect multiple time-overlapping activities using the concept of one-shot detectors from object detection, where the network input is analyzed only once and the output is typically multi-dimensional, including both class labels and spatial coordinates.
We will also develop solutions capable of adapting to on-site environments using recorded unlabeled resuscitation videos. Here, we plan to first make predictions on the videos and then post-process the predictions to perform weak labelling of the data. This can be carried out by utilizing the temporal and spatial information of detected objects and activities to fill in missing detections in neighboring frames. The developed solution will also be able to avoid overfitting to data and preserve generalization capabilities by weighting the loss functions during training and by utilizing dropout at both layer and neuron level. The developed system blocks will be combined into the NewbornTimeline system, and a small manually annotated subset of the birth thermal videos and the resuscitation VL videos will be used for system validation.
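The gap-filling idea for weak labelling can be sketched as follows, assuming per-frame binary detections for a single activity. Short runs of missing detections that are bounded on both sides by positive detections are filled in. The `max_gap` parameter is a hypothetical tuning choice, and spatial information is not used in this simplified version.

```python
from typing import List, Optional


def fill_short_gaps(detections: List[bool], max_gap: int = 2) -> List[bool]:
    """Fill runs of False no longer than `max_gap` that lie between two True frames."""
    filled = list(detections)
    last_true: Optional[int] = None  # index of the most recent positive frame
    for i, detected in enumerate(detections):
        if not detected:
            continue
        if last_true is not None and 0 < i - last_true - 1 <= max_gap:
            for j in range(last_true + 1, i):
                filled[j] = True
        last_true = i
    return filled


raw = [False, True, True, False, True, False, False, False, True, False]
weak_labels = fill_short_gaps(raw, max_gap=2)
print(weak_labels)
```

Here the single missing frame at index 3 is filled, while the three-frame gap at indices 5 to 7 is longer than `max_gap` and is left as a true pause in the activity.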

Medical Analysis
From the timelines, the timing and duration of all relevant resuscitation activities will be collected. The relevant activities include ventilations, drying and stimulation, assessment of heart rate, suctioning and intubation attempts, chest compressions, time from birth to initiation of PPV in non-breathing newborns, and timing of first cry/spontaneous breathing or death.
The timelines will be analyzed for compliance with current international resuscitation guidelines, most importantly the time from birth to initiation of PPV in non-breathing newborns, which is critical for survival and morbidity. The timelines will also be analyzed with pattern recognition methods to look for clusters and patterns in the data that can provide new insights, especially when coupled to the clinical variables.
The timelines will also be analyzed and processed as a step in making systems for clinical debriefing and for targeted data-driven simulation training.
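A minimal sketch of how timing and duration could be extracted, assuming per-frame activity flags are already available from the video models. It converts frame-level flags into start/stop intervals relative to ToB and derives the time from birth to the first PPV and the total PPV duration. The frame rate, the activity names, the toy data and the use of a single frame index for ToB are illustrative assumptions, not project parameters.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # (start, stop) in seconds after ToB


def to_intervals(active: List[bool], fps: float, tob_frame: int) -> List[Interval]:
    """Convert per-frame activity flags to (start, stop) intervals relative to ToB."""
    intervals: List[Interval] = []
    start: Optional[int] = None
    for i, flag in enumerate(active + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            intervals.append(((start - tob_frame) / fps, (i - tob_frame) / fps))
            start = None
    return intervals


def time_to_first(timeline: Dict[str, List[Interval]], activity: str) -> Optional[float]:
    """Seconds from ToB to the first occurrence of `activity`, or None if absent."""
    events = timeline.get(activity, [])
    return min(start for start, _ in events) if events else None


fps = 1.0      # toy frame rate: one label per second
tob_frame = 0  # simplification: ToB coincides with the first frame
ppv = [False] * 70 + [True] * 30 + [False] * 10 + [True] * 20
stimulation = [False] * 10 + [True] * 40 + [False] * 80

timeline = {
    "PPV": to_intervals(ppv, fps, tob_frame),
    "stimulation": to_intervals(stimulation, fps, tob_frame),
}
delay = time_to_first(timeline, "PPV")
total_ppv = sum(stop - start for start, stop in timeline["PPV"])
print(timeline["PPV"], delay, total_ppv)
```

The resulting delay can then be compared against whichever guideline limit is being audited, and the same interval representation can feed the pattern recognition step.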

Novelty and ambition
The NewbornTime project has the ambition of implementing a fully automated AI based system, NewbornTimeline, general enough to be useful in different hospitals, but also adaptive so that it can be further trained to adapt to different local environments. The novelty and ambition can be broken down into the following:
1) NewbornTimeline provides a system for the quantitative study of many, i.e., hundreds or even thousands of, resuscitation episodes, which does not exist today. This can be used in data-driven improvement processes both at micro level, for example through clinical debriefing, and at macro level, through challenging current guidelines.
2) A fully automatic ToB detector will be developed using skin temperature information from thermal cameras as well as spatio-temporal information from thermal video. Development of the ToB detector requires both data collection and pushing the research front of DNN models for activity recognition on thermal video.
3) Getting access to enough labeled data to do supervised learning is a problem in many real-world applications of modern machine learning. We will push the research front by focusing on development of DNN models for VL video activity recognition able to utilize unlabeled data in training by self-supervision. Such a network will be beneficial to all 3D data deep learning problems, especially in cases where the amount of labeled data is limited and/or the data is sensitive.
4) Moving a DNN model from the test bench to a real-world facility will typically be followed by a drop in performance. We have an ambition to develop DNN models and learning solutions for VL video activity recognition that can adapt to new and previously unseen data and environments.
5) The research on DNN models of both thermal and visible light cameras for event and activity recognition in a hospital environment will provide a deeper understanding of temporal feature learning using DNNs, and of decision making in video activity recognition DNNs as well as semi-supervised and adaptive learning, all of which can be exploited in similar applications.

Potential impact of the proposed research
In Bettinger et al. [34] it is concluded that "Strategies that make every birth a learning event have the potential to close the performance gaps in newborn resuscitation that remain after training and frequent simulation practice, and they should be prioritized for further development and evaluation". The NewbornTime project is an example of such strategies. The potential short- and long-term project impacts are many, and they are sorted below into scientific, industrial and public sector, and medical/societal perspectives.

Impact from the scientific perspective:
1. There will be a strong focus on developing AI methods and neural network architectures for activity recognition in videos. Many applications have related problems, for example surveillance, autonomous navigation or social interaction in robotics, and content-based retrieval of videos, and the methods resulting from this study will have impact far beyond newborn resuscitation.
2. A large obstacle to taking AI into use in many medical applications is the lack of ground truth data for learning good prediction models. Throughout the NewbornTime project, there will be a strong focus on semi-supervised, self-supervised and unsupervised learning techniques for video, pushing the research front in these areas. These will be important contributions to facilitating the use of DNNs in medical applications, as well as contributions to the AI community in general.
3. Using thermal cameras with AI based models for event detection and/or activity recognition is a new research and development area. If proven successful in the context of ToB detection, it opens up many other possibilities for utilizing thermal cameras.
4. The use of video cameras as sensors is becoming increasingly popular throughout industry and the public sector, in a large number of applications ranging from self-driving cars, production lines and surveillance in the public space to operating theatres and emergency setups. The technical solutions for GDPR-compliant handling of sensitive video data, as well as the solutions and methods exploiting the temporal information to recognize activities, will all constitute contributions to the community at large.

Impact from the industrial and public sector perspective:
5. The collaboration with industry will ensure that the generalized automatic NewbornTimeline will be implemented in a product for further use. Variations of end products include i) research tool for data collection, ii) tool for training and clinical debriefing, iii) quality improvement tool for use at hospitals, iv) decision support tool used in real time if models can be made computationally tractable.
6. A digital solution for the handling of consent forms will be developed with input from user groups including both medical staff and pregnant women recruited for the data collection. This solution can be reused in many situations requiring consent.
7. NewbornTimeline can potentially lead to improved treatment and thereby reduce mortality and the number of asphyxiated newborns with long-term damage. This is undoubtedly extremely important for the parents and the newborns, but it can also reduce the cost for the healthcare system. Today the Norwegian health care system compensates families with approx. 80 million NOK each year for death or sequelae of newborns.

Impact from the medical and societal perspective - long term:
8. NewbornTimeline will facilitate processing and collection of large numbers of newborn resuscitation timelines, providing the data foundation for a deeper understanding of the effectiveness of the different resuscitation activities and the consequences of delayed interventions. The impact of this is twofold: First, the data collected in the NewbornTime study will provide insight into current practice at the test hospital. Second, NewbornTimeline can be used in many hospitals after the project ends. By analyzing large numbers of NewbornTimeline episodes from different hospitals and different countries, the current knowledge gaps on best practice can be filled.
9. The NewbornTimeline system can be embedded in data-driven simulation training and debriefing, providing HCPs with an objective report on how much time they used on the different treatment activities and whether they were acting according to current guidelines. Quality improvement based on data from NewbornTimeline can help overcome the gap between "work as done" and "work as imagined".
10. The NewbornTimeline system will pave the way for a real-time decision support tool, where detected activities in combination with physiological measurements, like newborn heart rate and ventilation data, are compared to guidelines.
11. The NewbornTime project may lead to increased patient safety through increased treatment quality. If the time to first ventilation were reduced by 30 seconds, then 1 life for each 1000 births could be saved in settings of low- and medium-income countries [8], and the number of newborns with long-term damage and the degree of damage caused by birth asphyxia could be reduced in all settings.

Figures

Figure 1 Overview of the main objective of NewbornTime. By automatic analysis of thermal video from birth and visible light spectrum video from potential resuscitation, an automated timeline is produced. The timeline shows the start and duration of different events relative to the time of birth.

Figure 2 Illustration of camera set-up in delivery rooms. To the left we see one thermal sensor mounted to the ceiling behind the head of the mother, and the other at the side of the bed, marked with green circles. To the right there is an illustration of thermal data from such a scene, where humans are easily detected due to body temperature. The newborn (here depicted in red) is slightly warmer than normal skin temperature at time of birth.

Figure 3 Illustration of camera set-up in resuscitation room RR1. To the left, the thermal camera can be seen in the upper corner, marked with a green circle. Visible light cameras are mounted at the top of the resuscitation stations, marked with red circles, filming straight down, capturing the table and the HCPs' hands. To the right there is an illustration of thermal data from such a scene, where humans are easily detected due to body temperature.

Figure 4 Example of still images from thermal video at the exact time of a real birth. The newborn is warmer than the skin of the mother and HCP, as seen by the color.

Figure 5 To the left, a resuscitation station with the visible light camera mounted on the top. To the right, a still image example from a VL video recorded at a resuscitation station. The field of view covers the table with the newborn, as well as a small area outside, allowing movements and activities at the periphery to be captured. As shown, mostly only the hands and arms of the health care providers are visible.