Prediction of outpatient waiting time: using machine learning in a tertiary children’s hospital
Highlight box
Key findings
• Artificial intelligence models were developed that were capable of predicting outpatient waiting time.
What is known and what is new?
• Unpredictable waiting times pose challenges for medical staff, wasting patients’ time and potentially leading to missed appointments.
• Four machine learning algorithms were used to build prediction models, with the best-performing model identified for each category.
What is the implication, and what should change now?
• Through the use of prediction models, patients can be informed of likely wait times, allowing them to effectively arrange their schedules and make appropriate plans.
Introduction
Waiting time in the hospital is linked to outpatient satisfaction and affects the quality of medical care provided (1). In China, many tertiary hospitals are overwhelmed, and patients have become accustomed to waiting in lines because of the scarcity and uneven distribution of medical resources, especially in children’s hospitals. In European and American countries, patients must book an appointment in advance unless they require emergency care; they must be strictly on time for their appointments, and any cancellations or schedule changes require advance notification. Therefore, in most European and American countries, waiting time is usually expressed in days (2). Chinese hospitals do not require an appointment, and patients can register and visit the hospital on the same day. The unpredictable nature of patient visits poses considerable challenges to the medical staff in China (3) and may waste patients’ valuable time. Because the waiting period is unclear, a patient who must briefly leave the waiting room for an emergency may miss their turn and lose their place in line. Consequently, some patients are fearful of missing their turn and do not leave the waiting area. A congested waiting area is detrimental to hospital infection prevention and control, especially during the coronavirus disease 2019 (COVID-19) pandemic period (3).
Analyzing the factors that determine waiting time and proposing an effective forecasting approach is critical from a practical standpoint. Various studies have used machine learning models to predict waiting times in order to improve patient experience and care efficiency (4,5). In this way, patients may be able to plan ahead and arrive at the hospital at the scheduled time, thus reducing their time in the hospital. However, many factors affect waiting time, and projected times have generally relied on rolling-average or median estimators, which may limit accuracy (6). In other countries, because of differences in medical and health systems, waiting time is measured in queuing days, which is not applicable to the situation in China (7).
The rapid development and implementation of artificial intelligence (AI) in the medical field has opened new possibilities for enhancing hospital management. Many machine learning algorithms (8), including deep learning (9) and random forest (RF) (10), have demonstrated excellent performance in time prediction. Studies have used AI to predict the onset time of illnesses and the time spent in the emergency department and operating room (11,12). However, compared with studies conducted elsewhere, those concerning outpatient care in China entail greater difficulty: online registration, self-service machine registration, and window registration coexist, and the large patient flow and diversity of illnesses both complicate the implementation of AI technology in hospitals. Establishing an AI-based model to predict outpatient waiting time in pediatric hospitals may be a novel solution for better meeting the objectives of hospital development.
China currently has few models for predicting waiting times and even fewer that are based on AI. Although two prediction studies, one on potentially inappropriate medications for older adults (11) and one on chronic respiratory diseases (12), have been conducted in China, there is no specific research on waiting time prediction models for outpatients in Chinese children’s hospitals. To better respond to hospital development and patient demands, this study developed a set of AI algorithm models capable of accurately predicting patient wait times using data from the hospital information system (HIS) of Shanghai Children’s Medical Center (SCMC) and the characteristics of each department. In the future, we hope to send real-time predictions to mobile devices in order to provide patients with reasonable and accurate time scheduling. This has considerable practical and social significance for improving patient satisfaction and reducing the burden on hospital management. We present this article in accordance with the TRIPOD reporting checklist (available at https://tp.amegroups.com/article/view/10.21037/tp-23-58/rc).
Methods
This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). This study was approved by the institutional review board of SCMC (No. SCMCIRB-K2019020-2). Informed consent was waived for this retrospective study because obtaining it was not practicable.
Research flow
Figure 1 illustrates the modeling procedure. We planned to collect the relevant information in each department from the HIS of SCMC over the past 5 years. Prior to modeling, relevant data were mined and examined, and several hospital departments were classified into a single category based on medical knowledge. Data within the same category were consistently modeled. To facilitate convergence of the model algorithms, preprocessing of the data was conducted to eliminate the effect of extreme values and null values. Subsequently, models for waiting time prediction were constructed using four algorithms: linear regression (LR), K-nearest neighbor (KNN), RF, and gradient boosting decision tree (GBDT). The test data set was used to evaluate the model’s predictive ability, and the R2 and mean absolute error (MAE) were used as indices for model evaluation.
Data collection and department classification
From 2015 to 2021, we gathered retrospective outpatient data at SCMC. However, the COVID-19 pandemic in China altered our hospital outpatient service, with certain emergency patients merged into the outpatient clinic. Additionally, outpatient visits had declined significantly since January 2020. Patient waiting time had also decreased significantly since the pandemic (Figure 2). As a result, this research selected data only from September 2020 to April 2021. SCMC had both specialty and general outpatient departments. Specialist departments had fewer patients than did the general departments. Furthermore, patients in specialty sections were treated separately and were not part of the general outpatient queue. The focus of this research was thus on the waiting time in the general outpatient departments.
SCMC had 24 outpatient departments; 17 of them had lower patient volumes (<9,000 visits) and short queues. The remaining seven departments had substantial patient flows and severe queuing, and their waiting times were the focus of this study (13). These departments were internal or surgical: internal departments included general internal medicine, endocrinology, and pneumology, while surgical departments included general surgery, orthopedics, otolaryngology, and cardiothoracic surgery, as seen in Table 1. The waiting time varied by department, as seen in Figure 2A, and each department’s hours of operation varied as well. We therefore classified these seven departments into four categories: Internal Medicine Departments I and II and Surgery Departments I and II. The general internal medicine department was open 24 hours a day, 7 days a week, including on holidays. The endocrinology, pneumology, otolaryngology, and cardiothoracic surgery departments were open from 7 am to 5 pm, while the orthopedics and general surgery departments were open from 7 am to midnight. Thus, we classified general internal medicine as Internal Medicine Department I; endocrinology and pneumology as Internal Medicine Department II; orthopedics and general surgery as Surgery Department I; and otolaryngology and cardiothoracic surgery as Surgery Department II. Each category had its own model.
Table 1 Department categories, outpatient volumes, and opening hours

| Category | Outpatient department | Number of patients | Open time | Number of patients after preprocessing |
|---|---|---|---|---|
| Internal Medicine Department I | General internal medicine | 97,908 | 00:00–23:59 | 97,908 |
| | Total | 97,908 | | 97,908 |
| Internal Medicine Department II | Endocrinology | 14,724 | 07:00–16:59 | 14,644 |
| | Pneumology | 10,289 | | 10,065 |
| | Total | 25,013 | | 24,709 |
| Surgery Department I | Orthopedics | 33,520 | 07:00–23:59 | 33,272 |
| | General surgery | 9,460 | | 9,383 |
| | Total | 42,980 | | 42,655 |
| Surgery Department II | Otolaryngology | 18,548 | 07:00–16:59 | 18,497 |
| | Cardiothoracic surgery | 10,184 | | 9,751 |
| | Total | 28,732 | | 28,248 |
Data preprocessing
Exploratory analysis showed that critical data (such as check-in time and consultation start time) were missing for some patients, so we removed records containing null values and erroneous data. For instance, the general surgery department was open from 7:00 am to 11:00 pm, so a record of a patient checking in at 6:00 am was not possible. After data cleaning and deletion of outliers and missing data, 80% of the data were randomly chosen to compose the training set, while the remaining 20% composed the test set.
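As a rough sketch (not the authors’ code), this cleaning and splitting step could be implemented in Python with pandas and scikit-learn; the column names `check_in_time` and `consult_start_time` are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def clean_visits(visits: pd.DataFrame, open_hour: int = 7) -> pd.DataFrame:
    """Drop records with missing critical fields or impossible check-in times."""
    # Remove rows where critical timestamps are null.
    visits = visits.dropna(subset=["check_in_time", "consult_start_time"])
    # Remove erroneous records, e.g., a check-in before the department opens.
    return visits[visits["check_in_time"].dt.hour >= open_hour]

# 80/20 random split into training and test sets (random_state is illustrative).
# train_df, test_df = train_test_split(clean_visits(raw_df), test_size=0.2, random_state=42)
```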
Feature engineering and value range
First, the guardians of outpatients were required to register and then wait for their turn with a doctor. We therefore computed the waiting time as the interval between the time of registration and the beginning of the consultation (11,12). The dependent variable was the waiting time.
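A minimal sketch of this target computation, assuming pandas datetime columns named `registration_time` and `consult_start_time` (hypothetical names):

```python
import pandas as pd

# Toy example: waiting time (min) = consultation start - registration time.
visits = pd.DataFrame({
    "registration_time": pd.to_datetime(["2020-09-01 08:05"]),
    "consult_start_time": pd.to_datetime(["2020-09-01 08:47"]),
})
visits["waiting_time_min"] = (
    visits["consult_start_time"] - visits["registration_time"]
).dt.total_seconds() / 60.0
print(visits["waiting_time_min"].iloc[0])  # 42.0
```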
Independent variables were constructed according to medical knowledge and the experience of physicians. One-hot encoding was performed for the categorical variables of gender, type of payment, method of registration, patient punctuality, and department (a toy encoding sketch follows Table 2). After a literature review and data interpretation, we constructed the models using the features listed in Table 2. The first feature was the day of the week on which guardians registered, since queueing took longer on Mondays and the number of patients was comparatively lower on weekends. The second feature was the specific date of the registration: different days might have had different weather and temperature conditions, which might have affected patient flow. The third major feature was the specific hour of registration, an integer ranging from 0 to 23. The peak visiting times in our hospital were around 8 am to 9 am and from 2 pm to 3 pm, so registering during these periods caused longer waiting times. Another feature was the number of patients waiting ahead of a given patient at the time of registration, which was the most direct influence on the time spent waiting. We also took gender, method of payment, appointment status, and department into account.
Table 2 Features and value ranges

| Feature | Value range |
|---|---|
| Registration week | Monday to Sunday |
| Registration day | 1st to 31st |
| Registration time | 0:00 to 23:00 |
| Number of patients in line ahead | Number of patients who had signed in but had not yet seen the doctor when this patient registered |
| Patient gender | Girl/boy |
| Type of payment | Medical insurance/self-pay |
| Way of visit | Intraday/appointment |
| Turn missed | No/yes |
| Department | Internal Medicine Department I (general internal medicine); Internal Medicine Department II (endocrinology, pneumology); Surgery Department I (orthopedics, general surgery); Surgery Department II (otolaryngology, cardiothoracic surgery) |
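As referenced above, a toy sketch of the one-hot encoding; the column names and value labels are assumptions, not taken from the HIS:

```python
import pandas as pd

# One-hot encode the categorical features from Table 2.
features = pd.DataFrame({
    "gender": ["girl", "boy"],
    "payment": ["medical_insurance", "self_pay"],
    "visit_type": ["intraday", "appointment"],
    "turn_missed": ["no", "yes"],
})
encoded = pd.get_dummies(features)  # one binary column per category level
print(encoded.columns.tolist())
```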
Statistical analysis
Variance inflation factors and variable correlations were used to examine multicollinearity between the variables. The variance inflation factors were all close to 1, and the correlation coefficients between independent variables were approximately 0, so no multicollinearity was found among the independent variables. Following this, a significance test was completed for the variables in each category.
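A hedged sketch of the variance inflation factor check using statsmodels; the simulated predictors stand in for the real features, so VIFs near 1 are expected here by construction:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Simulated, independent predictors as placeholders for the study features.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)),
                 columns=["registration_hour", "n_ahead", "turn_missed"])
Xc = add_constant(X)  # add an intercept column before computing VIFs
vifs = {col: variance_inflation_factor(Xc.values, i)
        for i, col in enumerate(Xc.columns) if col != "const"}
print(vifs)  # values near 1 indicate no multicollinearity
```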
Model construction
We first attempted to establish the model across all the different outpatient departments; if the results were poor, dimension-reduction techniques would be used. LR, RF, GBDT, and KNN were used to construct models for the four department categories in order to make the models more explanatory and diverse (14). Grid search was used for hyperparameter tuning: every combination of candidate hyperparameter values was evaluated in turn, and the combination with the best performance was selected as the final result (15). Five-fold cross-validation was used to evaluate the models on the training set (16): the training set was divided into five equal subsets, and each subset in turn served as the validation set while the other four served as training sets. Training and validation were repeated five times, and the average of the five validation results was taken as the training-set result. In this way, overfitting could be reduced to some extent, and as much useful information as possible could be extracted from the limited data.
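A minimal sketch of grid search with 5-fold cross-validation in scikit-learn, using GBDT as the example; the candidate values are illustrative, not the grids used in the study:

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Every combination in param_grid is evaluated with 5-fold cross-validation.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
    "max_depth": [3, 5],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=5,                               # 5-fold cross-validation
    scoring="neg_mean_absolute_error",  # select by validation MAE
)
# search.fit(X_train, y_train); the tuned model is search.best_estimator_
```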
LR
An LR model was created as a reference for the other algorithms, with waiting time as the dependent variable.
RF
RF is a bagging algorithm that combines multiple weak decision trees. The hyperparameters tuned by grid search included the number of subtrees, the maximum number of features, and the minimum number of samples required to split a node.
GBDT
GBDT (17,18) is a boosting algorithm that incorporates a number of weak decision trees. The learning rate, number of boosting iterations, maximum depth of each tree, maximum number of features, and minimum number of samples to split a node were tuned by grid search.
KNN
In KNN (19), each sample is predicted from its K nearest neighbors. The hyperparameters tuned were the number of neighbors and the type of weights.
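The four algorithms and the hyperparameters named above map onto scikit-learn roughly as follows; the search spaces are assumptions for illustration, since the tuned values are not reported:

```python
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor

models = {
    "LR": (LinearRegression(), {}),  # untuned reference model
    "RF": (RandomForestRegressor(random_state=0), {
        "n_estimators": [100, 300, 500],     # number of subtrees
        "max_features": ["sqrt", 0.5, 1.0],  # max features per split
        "min_samples_split": [2, 10, 50],    # min samples to split a node
    }),
    "GBDT": (GradientBoostingRegressor(random_state=0), {
        "learning_rate": [0.05, 0.1],
        "n_estimators": [100, 300],          # boosting iterations
        "max_depth": [3, 5],                 # max depth of each tree
        "max_features": ["sqrt", 1.0],
        "min_samples_split": [2, 10, 50],
    }),
    "KNN": (KNeighborsRegressor(), {
        "n_neighbors": [5, 10, 20],
        "weights": ["uniform", "distance"],  # type of weights
    }),
}
```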
Model evaluation
R2 and the MAE were used to compare model performance. R2 measures the proportion of variation in the dependent variable that can be explained by the independent variables; the nearer R2 is to 1, the better the model performance. MAE is the average of the absolute differences between the actual and predicted waiting times across patients. In practice, patients may encounter difficulties if the predicted waiting time is too long or too short; therefore, a lower MAE indicates that the predicted waiting time is closer to the actual time, which benefits patients. Additionally, predicted waiting times were compared against actual waiting times to demonstrate the disparities across models and departments. Throughout the study, data processing and analysis were carried out using Python (version 3.9.0, Python Software Foundation).
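A toy illustration of the two metrics with scikit-learn (the values are placeholders, not study data):

```python
from sklearn.metrics import mean_absolute_error, r2_score

y_test = [30.0, 45.0, 12.0, 60.0]  # actual waiting times (min)
y_pred = [28.0, 50.0, 15.0, 55.0]  # predicted waiting times (min)
print("R2 :", round(r2_score(y_test, y_pred), 3))
print("MAE:", mean_absolute_error(y_test, y_pred), "min")
```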
Results
Basic characteristics of data
Between September 1, 2020, and April 30, 2021, a total of 248,345 observations were gathered. After removal of null values and outliers, 193,520 visits remained, comprising 97,908 visits to Internal Medicine Department I, 24,709 to Internal Medicine Department II, 42,655 to Surgery Department I, and 28,248 to Surgery Department II. As shown in Table 3, patients’ guardians visited intraday or by appointment and paid either through medical insurance or out of pocket. The distributions of gender, visit type, method of payment, and turn missing were essentially the same between the training and testing sets in each category. Internal Medicine Department I had 78,326 visits in the training set and 19,582 in the testing set; Internal Medicine Department II had 19,767 and 4,942; Surgery Department I had 34,124 and 8,531; and Surgery Department II had 22,598 and 5,650, respectively. Data from May 2021 were collected as an external validation set, with 13,413 visits in Internal Medicine Department I, 3,717 in Internal Medicine Department II, 6,284 in Surgery Department I, and 4,245 in Surgery Department II.
Table 3 Basic characteristics of the training and testing sets

| Characteristics | Internal Medicine I, training | Internal Medicine I, testing | Internal Medicine II, training | Internal Medicine II, testing | Surgery I, training | Surgery I, testing | Surgery II, training | Surgery II, testing |
|---|---|---|---|---|---|---|---|---|
| Sample size | 78,326 | 19,582 | 19,767 | 4,942 | 34,124 | 8,531 | 22,598 | 5,650 |
| Gender: girl | 35,286 (45.05) | 8,753 (44.70) | 10,453 (52.88) | 2,643 (53.48) | 13,856 (40.60) | 3,569 (41.84) | 9,705 (42.95) | 2,487 (44.02) |
| Gender: boy | 43,040 (54.95) | 10,829 (55.30) | 9,314 (47.12) | 2,299 (46.52) | 20,268 (59.40) | 4,962 (58.16) | 12,893 (57.05) | 3,163 (55.98) |
| Visit type: intraday | 68,888 (87.95) | 17,244 (88.06) | 4,556 (23.05) | 1,175 (23.78) | 23,039 (67.52) | 5,729 (67.16) | 11,352 (50.23) | 2,835 (50.18) |
| Visit type: appointment | 9,438 (12.05) | 2,338 (11.94) | 15,211 (76.95) | 3,767 (76.22) | 11,085 (32.48) | 2,802 (32.84) | 11,246 (49.77) | 2,815 (49.82) |
| Payment: medical insurance | 50,755 (64.80) | 12,797 (65.35) | 14,586 (73.79) | 3,642 (73.69) | 20,943 (61.37) | 5,111 (59.91) | 11,993 (53.07) | 2,943 (52.09) |
| Payment: self-pay | 27,571 (35.20) | 6,785 (34.65) | 5,181 (26.21) | 1,300 (26.31) | 13,181 (38.63) | 3,420 (40.09) | 10,605 (46.93) | 2,707 (47.91) |
| Turn missed: no | 61,831 (78.94) | 15,395 (78.62) | 18,073 (91.43) | 4,511 (91.28) | 30,408 (89.11) | 7,615 (89.26) | 19,297 (85.39) | 4,874 (86.27) |
| Turn missed: yes | 16,495 (21.06) | 4,187 (21.38) | 1,694 (8.57) | 431 (8.72) | 3,716 (10.89) | 916 (10.74) | 3,301 (14.61) | 776 (13.73) |
Data are presented as n (%). Internal Medicine Department I included the general internal department. Internal Medicine Department II included the endocrinology department and pneumology department. Surgery Department I included the orthopedics department and general surgery department. Surgery Department II included the otolaryngology department and cardiothoracic surgery department.
Performance of models
Models for Internal Medicine Department I, Internal Medicine Department II, Surgery Department I, and Surgery Department II were constructed using the four machine learning algorithms. Table 4 summarizes the prediction performance on the training and test sets. For Internal Medicine Department I, the R2 of GBDT and RF on both the training and test sets was 0.97, higher than that of LR (training set: R2=0.91; test set: R2=0.91) and KNN (training set: R2=0.94; test set: R2=0.95). The R2 of GBDT was the largest on the test sets for Internal Medicine Department II, Surgery Department I, and Surgery Department II, with values of 0.82, 0.89, and 0.85, respectively, while the corresponding MAE values were 14.62, 8.73, and 14.11 minutes.
Table 4 Prediction performance of the four algorithms on the training and test sets

| Department category | Model | Training set R2 | Training set MAE (min) | Test set R2 | Test set MAE (min) |
|---|---|---|---|---|---|
| Internal Medicine Department I | LR | 0.91 | 9.59 | 0.91 | 9.60 |
| | KNN | 0.94 | 6.49 | 0.95 | 6.47 |
| | GBDT | 0.97 | 5.27 | 0.97 | 5.28 |
| | RF | 0.97 | 5.06 | 0.97 | 5.03 |
| Internal Medicine Department II | LR | 0.68 | 19.80 | 0.67 | 20.38 |
| | KNN | 0.74 | 16.39 | 0.74 | 16.74 |
| | GBDT | 0.83 | 14.15 | 0.82 | 14.62 |
| | RF | 0.79 | 15.07 | 0.74 | 15.43 |
| Surgery Department I | LR | 0.70 | 13.55 | 0.72 | 13.61 |
| | KNN | 0.78 | 10.82 | 0.79 | 10.86 |
| | GBDT | 0.88 | 8.76 | 0.89 | 8.73 |
| | RF | 0.87 | 8.95 | 0.86 | 8.85 |
| Surgery Department II | LR | 0.70 | 20.65 | 0.71 | 21.09 |
| | KNN | 0.77 | 16.67 | 0.77 | 17.12 |
| | GBDT | 0.85 | 13.62 | 0.85 | 14.11 |
| | RF | 0.84 | 13.89 | 0.81 | 14.07 |
Internal Medicine Department I included the general internal department. Internal Medicine Department II included the endocrinology department and pneumology department. Surgery Department I included the orthopedics department and general surgery department. Surgery Department II included the otolaryngology department and cardiothoracic surgery department. MAE, mean absolute error; LR, linear regression; KNN, K-nearest neighbor; GBDT, gradient boosting decision tree; RF, random forest.
With the R2 and MAE of the LR algorithm as a baseline, the other algorithms were compared (Table 5). The MAE of RF on the test set for Internal Medicine Department I was 5.03 minutes, accounting for just 13.80% of the overall average wait time. Compared with LR, RF increased R2 by 6.59% and decreased MAE by 47.60%; accordingly, the RF model was the most effective for Internal Medicine Department I. The best predictive performance for Internal Medicine Department II, Surgery Department I, and Surgery Department II was achieved with the GBDT algorithm: compared with the LR model, R2 increased by 22.39%, 23.61%, and 19.72%, respectively, while MAE decreased by 28.26%, 35.86%, and 33.10%, respectively.
Table 5 Relative change in R2 and MAE compared with the LR baseline

| Department category | Model | Training set R2 change (%) | Training set MAE change (%) | Test set R2 change (%) | Test set MAE change (%) |
|---|---|---|---|---|---|
| Internal Medicine Department I | KNN | 3.30 | −32.33 | 4.40 | −32.60 |
| | GBDT | 6.59 | −45.05 | 6.59 | −45.00 |
| | RF | 6.59 | −47.24 | 6.59 | −47.60 |
| Internal Medicine Department II | KNN | 8.82 | −17.22 | 10.45 | −17.86 |
| | GBDT | 22.06 | −28.54 | 22.39 | −28.26 |
| | RF | 16.18 | −23.89 | 10.45 | −24.29 |
| Surgery Department I | KNN | 11.43 | −20.15 | 9.72 | −20.21 |
| | GBDT | 25.71 | −35.35 | 23.61 | −35.86 |
| | RF | 24.29 | −33.95 | 19.44 | −34.97 |
| Surgery Department II | KNN | 10.00 | −19.27 | 8.45 | −18.82 |
| | GBDT | 21.43 | −34.04 | 19.72 | −33.10 |
| | RF | 20.00 | −32.74 | 14.08 | −33.29 |
As a control, the linear regression algorithm was used. Internal Medicine Department I included the general internal department. Internal Medicine Department II included the endocrinology department and pneumology department. Surgery Department I included the orthopedics department and general surgery department. Surgery Department II included the otolaryngology department and cardiothoracic surgery department. LR, linear regression; MAE, mean absolute error; KNN, K-nearest neighbor; GBDT, gradient boosting decision tree; RF, random forest.
The optimal prediction model for each category was used to predict the data from the external validation set, and the results are shown in Table 6. The MAE of all four models was within 14 minutes, and the MAE of the RF model for Internal Medicine Department I was only 2.46 minutes. The prediction results on the external validation set indicated that the optimal model for each category had good generalization performance. The variable coefficients and significance tables of the four category models are shown in Tables S1-S4.
Table 6 Performance of the optimal models on the external validation set (May 2021)

| Department | Number of patients | Average waiting time (min) | MAE (min) |
|---|---|---|---|
| Internal Medicine Department I | 13,413 | 25.60 | 2.46 |
| Internal Medicine Department II | 3,717 | 55.60 | 13.08 |
| Surgery Department I | 6,284 | 28.14 | 8.29 |
| Surgery Department II | 4,245 | 56.67 | 13.18 |
Internal Medicine Department I included the general internal department. Internal Medicine Department II included the endocrinology department and pneumology department. Surgery Department I included the orthopedics department and general surgery department. Surgery Department II included the otolaryngology department and cardiothoracic surgery department. MAE, mean absolute error of the optimal model of each category in the external validation set.
Visualization of predicted time versus real time
Figure 3 presents the relationship between actual and predicted waiting times. In the test set of Internal Medicine Department I, the waiting times predicted with the RF algorithm and the actual waiting times were distributed close to the axis of symmetry. Meanwhile, a similar tendency was observed for the RF algorithm in both the training and test sets, suggesting that there was no overfitting problem. Therefore, for Internal Medicine Department I, we chose the RF model as the final prediction model.
For Internal Medicine Department II, Surgery Department I, and Surgery Department II, the predicted and actual values of the LR method were dispersed on either side of the symmetry axis, whereas for the GBDT algorithm, the data were more evenly spread along the symmetry axis. As a result, GBDT was chosen as the best prediction model for Internal Medicine Department II, Surgery Department I, and Surgery Department II.
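A sketch of this diagnostic plot with matplotlib, assuming arrays of actual and predicted waiting times; points hugging the diagonal y = x (the “axis of symmetry”) indicate accurate predictions:

```python
import matplotlib.pyplot as plt

def plot_pred_vs_actual(y_true, y_pred, title):
    """Scatter predicted vs. actual waiting times with the y = x reference line."""
    fig, ax = plt.subplots()
    ax.scatter(y_true, y_pred, s=5, alpha=0.3)
    lim = max(max(y_true), max(y_pred))
    ax.plot([0, lim], [0, lim], "r--", label="y = x")
    ax.set_xlabel("Actual waiting time (min)")
    ax.set_ylabel("Predicted waiting time (min)")
    ax.set_title(title)
    ax.legend()
    return fig

# plot_pred_vs_actual(y_test, model.predict(X_test), "Internal Medicine I (RF)")
```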
Discussion
Waiting time is one of the indicators used to assess the overall quality of hospital health services. It is influenced by patients, hospitals, and society, and it has an element of unpredictability. In this work, waiting time prediction models were established using the LR, RF, GBDT, and KNN algorithms and compared using R2 and MAE. GBDT performed best in three of the four categories, while RF performed best for Internal Medicine Department I, which showed the greatest R2 of all departments, at 0.97.
The impact of data sets on results
The outpatient and emergency workflows of our hospital changed with the COVID-19 pandemic. To separate fever patients from nonfever patients, the hospital included emergency nonfever patients in the normal outpatient clinic, so the emergency nonfebrile patient queue overlapped with the outpatient queue. Following check-in, emergency patients would be positioned ahead of all outpatients, putting a strain on outpatient medical services (20). To account for these changes in outpatient and emergency visiting practices, this study did not include outpatient data from before COVID-19 but rather only data from after September 2020 (21). At the start of the pandemic, patient flow fluctuated but steadily stabilized by the second half of 2020 (22). Although a substantial amount of early data was discarded, the data from the second half of 2020 better reflected the current situation (23) and may be more easily integrated into the hospital’s outpatient system in the future, providing convenience for patients and aiding hospital decision-making. The waiting time was defined as the period from registration to the beginning of the consultation with a physician. After swiping the hospital card, the patient was added to the queue, which eliminated the disruption caused by scheduling an appointment.
Although this study did not cover all hospital departments, it focused on departments with higher volumes (>9,000 visits). Exploratory analysis showed that waiting times in the lower-volume departments (<9,000 visits) were brief, with virtually no queuing, so their data were not taken into consideration. An inadequate sample size may influence a model’s training effect. The general internal medicine department had a sizable outpatient population, and hence its training sample size was the largest; accordingly, the model for Internal Medicine Department I outperformed those of the other categories. As a significant factor determining waiting time, turn missing was also included as an independent variable in the model.
Patients who missed their turn not only had to wait longer for a doctor’s consultation but also shortened the waiting time of the patient directly behind them, who could see a doctor immediately. There were a significant number of patients who missed a turn in the data set, and in actuality, this scenario happened often. Since directly deleting these registered patients might affect the extrapolation of the findings, we retained these data.
The effect of the algorithm on the results
Among the models in the four categories, the optimal model for Internal Medicine Department I was the RF model, and the optimal model for the other three categories was the GBDT model. The training algorithm for RF applies bagging, in which random samples are repeatedly drawn with replacement from the training set and a tree is fitted to each sample. Although the predictions of a single tree are highly sensitive to noise in its training set, the average of many trees is not, as long as the trees are not correlated. The bagging technique thus reduces the influence of outliers and noise on the model, thereby reducing variance. Simply training many trees on a single training set would give strongly correlated trees; bagging decorrelates the trees by showing them different training sets. Internal Medicine Department I involved only one department, and there was a certain linear relationship between the independent variables and the dependent variable (Figure 3). Therefore, the RF algorithm performed best in this category because it decreased the variance without increasing the bias.
In the other three categories, each containing two smaller departments, there was no obvious linear relationship between the independent variables and the dependent variable. Therefore, we needed to train the model to reduce the bias between the predicted value and the true value. Gradient boosting combines weak learners into a single strong learner in an iterative fashion, aiming to reduce the residual between the predicted and observed values in each iteration. As a result, the GBDT algorithm yielded a smaller bias than did the RF algorithm, and the optimal models for Internal Medicine Department II, Surgery Department I, and Surgery Department II were the GBDT models.
After training and verification, the prediction performance for Internal Medicine Department I was the best among all categories, with R2 values above 0.9 for all four AI methods. The causes of this discrepancy can be deduced from the following observations. First, the data volume of the general internal medicine clinic was comparatively large, providing more training samples and thus higher accuracy. Second, Internal Medicine Department I included only a single department, general internal medicine, while the other categories consisted of multiple departments; the heterogeneity between departments might also be one of the reasons for the lower accuracy (24). In our earlier understanding, merging and modeling different departments might adequately expand the sample size and reduce the working time; however, it appears to have had a negative impact on the accuracy of the results (25). Therefore, each department should be modeled independently in the subsequent phase. Nevertheless, departments must first determine whether they need AI to estimate wait times before proceeding: if the waiting time in a department is unusually short or patients’ willingness to use AI is low, there is no urgent need to predict the waiting time (26).
In the future, we plan to embed the models into a mobile social media app, the WeChat mini-program, allowing patients to access the predicted wait time on their mobile phones. This will enable outpatients to plan their own schedules and participate in other activities during the waiting period. Moreover, it would allow hospitals to allocate doctors according to the different waiting times of each department, improving the hospital management process. We also plan to develop a patient feedback program to assess patients’ satisfaction with the prediction system, thereby improving the AI system for predicting patient waiting time (27).
Comparison with the average method
At present, Chinese hospitals only provide patients with the number of patients waiting in line ahead of them. Therefore, the most intuitive way to estimate waiting time is to multiply the average waiting time per patient by the number of patients waiting ahead in line. The average method predicts a patient’s waiting time by calculating the average waiting time per patient in each department in the dataset and then multiplying it by the number of patients waiting ahead in line; the average of the absolute differences between the predicted and true values is the MAE of the average method (a sketch of this baseline follows Table 7). The comparison of the optimal prediction model and the average method for each category is shown in Table 7. The MAE of the optimal model in each category was reduced by more than 35% compared with the MAE of the average method.
Table 7 MAE of the optimal model versus the average method for each category

| Department | MAE of optimal model (min) | MAE of average method (min) | Improvement ratio |
|---|---|---|---|
| Internal Medicine Department I | 5.06 | 10.03 | 50% |
| Internal Medicine Department II | 14.15 | 22.58 | 37% |
| Surgery Department I | 8.76 | 13.48 | 35% |
| Surgery Department II | 13.62 | 22.09 | 38% |
Internal Medicine Department I included the general internal department. Internal Medicine Department II included the endocrinology department and pneumology department. Surgery Department I included the orthopedics department and general surgery department. Surgery Department II included the otolaryngology department and cardiothoracic surgery department. Improvement ratio: (MAE of the average method − MAE of the optimal model)/MAE of the average method. MAE, mean absolute error.
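As referenced above, a sketch of the average-method baseline under one reading of its definition (a per-position wait estimated from the data and scaled by queue length); the arrays are toy values, not study data:

```python
import numpy as np

def average_method_mae(wait_min: np.ndarray, n_ahead: np.ndarray) -> float:
    """MAE of predicting wait = (mean wait per patient ahead) x patients ahead."""
    per_position = np.mean(wait_min / np.maximum(n_ahead, 1))
    predicted = per_position * n_ahead
    return float(np.mean(np.abs(predicted - wait_min)))

waits = np.array([30.0, 45.0, 12.0, 60.0])  # actual waits (min)
ahead = np.array([10, 15, 4, 20])           # patients ahead at registration
print(f"Average-method MAE: {average_method_mae(waits, ahead):.2f} min")
```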
Strengths and limitations
The strength of this study is that departments with large numbers of outpatient visits were selected based on real-world data, the possibility of using four algorithms to predict post-epidemic waiting time was explored, and the performance of different models was compared (28,29). The use of AI in predicting patient wait times holds significant clinical implications, providing valuable insights for healthcare providers. By accurately estimating waiting durations, hospital managers can optimize workflows and enhance the overall patient experience. The primary clinical significance of this predictive model lies in its ability to support proactive resource management and effective staff allocation: by leveraging precise waiting time predictions, healthcare facilities can ensure that personnel and facilities are available, minimizing overcrowding and reducing delays. Furthermore, patients can benefit from the transparency and predictability offered by this model. Access to estimated waiting times through a user-friendly interface, such as a mobile application, empowers individuals to plan their schedules accordingly. This informed decision-making not only reduces frustration and anxiety related to uncertain wait times but also improves patient satisfaction.
The main limitation of this study is its single-center design; validation using data from other external sources is needed. Second, some departments were combined to expand the sample size, which might have reduced prediction accuracy. Finally, the hospital’s outpatient procedures changed with the epidemic, so a massive amount of pre-epidemic data could not be used.
Conclusions
Machine learning can predict the outpatient waiting time of pediatric hospitals and ease patient anxiety when queuing without medical appointments.
Acknowledgments
We are grateful to all the children and their guardians who provided data for this study.
Funding: This study was supported by
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://tp.amegroups.com/article/view/10.21037/tp-23-58/rc
Data Sharing Statement: Available at https://tp.amegroups.com/article/view/10.21037/tp-23-58/dss
Peer Review File: Available at https://tp.amegroups.com/article/view/10.21037/tp-23-58/prf
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tp.amegroups.com/article/view/10.21037/tp-23-58/coif). S.L. reports grants paid for attending meetings. The other authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). This study was approved by the institutional review board of Shanghai Children’s Medical Center (No. SCMCIRB-K2019020-2). Informed consent was waived for this retrospective study because obtaining it was not practicable.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Tang L. The Chinese community patient's life satisfaction, assessment of community medical service, and trust in community health delivery system. Health Qual Life Outcomes 2013;11:18. [Crossref] [PubMed]
- Chaou CH, Chen HH, Tang P, et al. Traffic Intensity of Patients and Physicians in the Emergency Department: A Queueing Approach for Physician Utilization. J Emerg Med 2018;55:718-25. [Crossref] [PubMed]
- Bundgaard H, Bundgaard JS, Raaschou-Pedersen DET, et al. Effectiveness of Adding a Mask Recommendation to Other Public Health Measures to Prevent SARS-CoV-2 Infection in Danish Mask Wearers: A Randomized Controlled Trial. Ann Intern Med 2021;174:335-43. [Crossref] [PubMed]
- Cheng N, Kuo A. Using Long Short-Term Memory (LSTM) Neural Networks to Predict Emergency Department Wait Time. Stud Health Technol Inform 2020;272:199-202. [Crossref] [PubMed]
- Jancauskas V, Piontek T, Kopta P, et al. Predicting queue wait time probabilities for multi-scale computing. Philos Trans A Math Phys Eng Sci 2019;377:20180151. [Crossref] [PubMed]
- Pak A, Gannon B, Staib A. Predicting waiting time to treatment for emergency department patients. Int J Med Inform 2021;145:104303. [Crossref] [PubMed]
- Kuo YH, Chan NB, Leung JMY, et al. An Integrated Approach of Machine Learning and Systems Thinking for Waiting Time Prediction in an Emergency Department. Int J Med Inform 2020;139:104143. [Crossref] [PubMed]
- Sapiertein Silva JF, Ferreira GF, Perosa M, et al. A machine learning prediction model for waiting time to kidney transplant. PLoS One 2021;16:e0252069. [Crossref] [PubMed]
- Ayoobi N, Sharifrazi D, Alizadehsani R, et al. Time series forecasting of new cases and new deaths rate for COVID-19 using deep learning methods. Results Phys 2021;27:104495. [Crossref] [PubMed]
- Cha GW, Moon HJ, Kim YM, et al. Development of a Prediction Model for Demolition Waste Generation Using a Random Forest Algorithm Based on Small DataSets. Int J Environ Res Public Health 2020;17:6997. [Crossref] [PubMed]
- Hu Q, Tian F, Jin Z, et al. Developing a Warning Model of Potentially Inappropriate Medications in Older Chinese Outpatients in Tertiary Hospitals: A Machine-Learning Study. J Clin Med 2023;12:2619. [Crossref] [PubMed]
- Peng J, Chen C, Zhou M, et al. Peak Outpatient and Emergency Department Visit Forecasting for Patients With Chronic Respiratory Diseases Using Machine Learning Methods: Retrospective Cohort Study. JMIR Med Inform 2020;8:e13075. [Crossref] [PubMed]
- Raita Y, Goto T, Faridi MK, et al. Emergency department triage prediction of clinical outcomes using machine learning models. Crit Care 2019;23:64. [Crossref] [PubMed]
- Joseph JW. Queuing Theory and Modeling Emergency Department Resource Utilization. Emerg Med Clin North Am 2020;38:563-72. [Crossref] [PubMed]
- Kaur S, Aggarwal H, Rani R. Hyper-parameter optimization of deep learning model for prediction of Parkinson’s disease. Mach Vis Appl 2020;31:32.
- Wong TT, Yeh PY. Reliable Accuracy Estimates from k-Fold Cross Validation. IEEE Trans Knowl Data Eng 2020;32:1586-94.
- Liang W, Luo S, Zhao G, et al. Predicting Hard Rock Pillar Stability Using GBDT, XGBoost, and LightGBM Algorithms. Mathematics 2020;8:765.
- Zhu Z, Li J, Huang J, et al. An intelligent prediagnosis system for disease prediction and examination recommendation based on electronic medical record and a medical-semantic-aware convolution neural network (MSCNN) for pediatric chronic cough. Transl Pediatr 2022;11:1216-33. [Crossref] [PubMed]
- Taunk K, De S, Verma S, et al. A Brief Review of Nearest Neighbor Algorithm for Learning and Classification. 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India. 2019. doi: 10.1109/ICCS45141.2019.9065747.
- Macchi ZA, Ayele R, Dini M, et al. Lessons from the COVID-19 pandemic for improving outpatient neuropalliative care: A qualitative study of patient and caregiver perspectives. Palliat Med 2021;35:1258-66. [Crossref] [PubMed]
- Ayele R, Macchi ZA, Dini M, et al. Experience of Community Neurologists Providing Care for Patients With Neurodegenerative Illness During the COVID-19 Pandemic. Neurology 2021;97:e988-95. [Crossref] [PubMed]
- McCarthy ML, Ding R, Pines JM, et al. Comparison of methods for measuring crowding and its effects on length of stay in the emergency department. Acad Emerg Med 2011;18:1269-77. [Crossref] [PubMed]
- Begaz T, Elashoff D, Grogan TR, et al. Initiating Diagnostic Studies on Patients With Abdominal Pain in the Waiting Room Decreases Time Spent in an Emergency Department Bed: A Randomized Controlled Trial. Ann Emerg Med 2017;69:298-307. [Crossref] [PubMed]
- Grant RW, Lyles C, Uratsu CS, et al. Visit Planning Using a Waiting Room Health IT Tool: The Aligning Patients and Providers Randomized Controlled Trial. Ann Fam Med 2019;17:141-9. [Crossref] [PubMed]
- Harding KE, Snowdon DA, Lewis AK, et al. Staff perspectives of a model of access and triage for reducing waiting time in ambulatory services: a qualitative study. BMC Health Serv Res 2019;19:283. [Crossref] [PubMed]
- Ebert JF, Huibers L, Christensen B, et al. Does an emergency access button increase the patients' satisfaction and feeling of safety with the out-of-hours health services? A randomised controlled trial in Denmark. BMJ Open 2020;10:e030267. [Crossref] [PubMed]
- Mackert M, Mandell D, Donovan E, et al. Mobile Apps as Audience-Centered Health Communication Platforms. JMIR Mhealth Uhealth 2021;9:e25425. [Crossref] [PubMed]
- Almalki M, Giannicchi A. Health Apps for Combating COVID-19: Descriptive Review and Taxonomy. JMIR Mhealth Uhealth 2021;9:e24322. [Crossref] [PubMed]
- Fiol-DeRoque MA, Serrano-Ripoll MJ, Jiménez R, et al. A Mobile Phone-Based Intervention to Reduce Mental Health Problems in Health Care Workers During the COVID-19 Pandemic (PsyCovidApp): Randomized Controlled Trial. JMIR Mhealth Uhealth 2021;9:e27039. [Crossref] [PubMed]