MLH Workshop

Friday June 19th, 2020      10:00 am - 4:30 pm
The first OHSU - PSU Machine Learning for Health workshop is a free online workshop at the intersection of machine learning and health. Scholars from both institutions will present talks regarding innovative research. Students are invited to present their own research with the chance to win $300, $200, and $100 for 1st, 2nd, and 3rd prize. All registrants will receive an email invitation to the Zoom workshop.
Register for Free

More About the Workshop

Together we can make a difference

Talks from leading minds at the intersection of machine learning and health

The goal of the first OHSU-PSU workshop on Machine Learning for Health is to foster collaborations impacting medicine by bringing together clinicians, health data scientists, statisticians, and statistical learning researchers from OHSU and PSU. The workshop will feature invited talks from leading voices in both medicine and statistical learning. Feel free to join the meeting for any or all presentations.

Opportunity for students to present their work to industry leaders

We welcome short paper submissions highlighting research contributions at the intersection of machine learning and health. Accepted submissions will be featured as Zoom presentations.

Meet our Speakers

Dr. Meysam Asgari
OHSU
Topic-Based Measures of Conversation for Detecting Mild Cognitive Impairment
Dr. Alexander Kain
OHSU
Mispronunciation Detection and Feedback for Children with Speech Sound Disorders
Dr. Young Hwan Chang
OHSU
Seeing More with Deep Learning in Biomedical Imaging
Dr. Steven Bedrick
OHSU
Natural Language Processing and COVID-19: What is our Field Contributing?
Dr. Xubo Song
OHSU
Applications of Machine Learning in Biomedical Image Computing
Stephanie M. Cope
Intel
AI in Health and Life Sciences at Intel
Dr. Bruno Jedynak
PSU
From Reproducing Kernel Hilbert Spaces to Health and Back
Anastasia Adriano
PSU
The State of Machine Learning for Health Research at OHSU/PSU

Student Research Presentations

Presentation

The student research session provides an opportunity for students to present and discuss their research at the First OHSU-PSU Workshop on Machine Learning for Health. Qualified research provides a novel contribution to the field and extends beyond a literature review. Participants will make a brief (2-minute) presentation discussing their work via Zoom and answer audience questions on Friday, June 19th.

Prizes

Prizes will be awarded to the top 3 student researchers based on creativity/originality, skillful communication of research results, and research project design and methodology. The prizes are Amazon gift cards in the amounts of $300, $200, and $100.

Our Student Presentations
1 Anastasia Adriano
Biometric Markers for Parkinson's Data - a Kernel Method Point of View
2 Archana Machireddy
Early Prediction of Breast Cancer Therapy Response Using Multiresolution Fractal Analysis of DCE-MRI Parametric Maps
3 Brian R. Snider
Deep Neural Networks for Sleep-Disordered Breathing Event Detection and Severity Estimation
4 David Lovitz
Reducing sample size requirements for randomized control trials using high-frequency digital biomarkers

5 Elliot Gray
Elucidating intratumoral T and B cell functionality related to spatial metrics for pancreatic cancer patient stratification via interpretable machine learning 
6 Erik Burlingame
Balanced learning of cell state representations
7 Geoffrey Schau
Unsupervised Histological Feature Manifold Learning for Massively Parallel Whole Slide Annotation
8 Liu Chen
Improving the Assessment of Mild Cognitive Impairment in Advanced Age With a Novel Multi-Feature Automated Speech and Language Analysis of Verbal Fluency
9 Luke Ternes
VISTA: Virtual ImmunoSTAining for pancreatic disease quantification in murine cohorts

10 Michael Wells
Accelerated DC Algorithms for Image Reconstruction and Dictionary Learning
11 Tuan Dinh
Using conditional adversarial networks for intelligibility improvement for dysarthric speech and laryngectomees
12 Victor Rielly
Imaging and Support Vector Machines for the Early Prediction of Breast Cancer Response to Neoadjuvant Chemotherapy
13 Wei-Chun Lin
Predicting Late Patients in Pediatric Ophthalmology Outpatient Clinic Using Machine Learning

Biometric Markers for Parkinson's Data - a Kernel Method Point of View

Anastasia Adriano, Dr. Bruno Jedynak, Ethan Lew, David Lovitz, Andrew Sandall, David Sewell, and Taiyo Terada

Portland State University


Multivariate Kernel Ridge Regression (KMRR) techniques were applied to smartwatch accelerometer data from patients with Parkinson's disease (PD) in order to simultaneously predict medication status, dyskinesia severity, and tremor severity. Our work establishes a computer-based analysis technique that uses a convolutional neural network (CNN) and a Long Short-Term Memory (LSTM) network to create the kernel. The identity of each participant was available at the time of the test, and one regression was trained per participant. Evaluating on held-out data with the mean square error (MSE), we achieved MSEs of .54 for medication status, .24 for dyskinesia severity, and .21 for tremor severity. This compares to a baseline model that predicts the average training rating, with respective MSEs of .97, .43, and .44. Overall, the KMRR technique outperformed the null model and has the potential to support the treatment of PD while facilitating a deeper understanding of diagnosis using accelerometer data.
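
As a toy illustration of the core idea above (kernel ridge regression evaluated by MSE against a mean-predicting null model), the sketch below uses synthetic one-dimensional data and scikit-learn; the kernel, variable names, and data are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch: kernel ridge regression with a precomputed RBF kernel,
# compared against a null model that predicts the training mean.
# Synthetic 1-D data stands in for CNN/LSTM-derived features.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(80, 1))
y_train = np.sin(3 * X_train[:, 0]) + rng.normal(scale=0.1, size=80)
X_test = rng.uniform(-1, 1, size=(20, 1))
y_test = np.sin(3 * X_test[:, 0]) + rng.normal(scale=0.1, size=20)

def rbf_gram(A, B, gamma=5.0):
    # Gram matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# KernelRidge also accepts multi-output targets, so one fit could in
# principle predict medication status, dyskinesia, and tremor jointly.
model = KernelRidge(alpha=0.1, kernel="precomputed")
model.fit(rbf_gram(X_train, X_train), y_train)
pred = model.predict(rbf_gram(X_test, X_train))

mse_model = mean_squared_error(y_test, pred)
mse_null = mean_squared_error(y_test, np.full_like(y_test, y_train.mean()))
print(mse_model < mse_null)  # the kernel model should beat the null model
```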

Early Prediction of Breast Cancer Therapy Response Using Multiresolution Fractal Analysis of DCE-MRI Parametric Maps

Archana Machireddy, Wei Huang, and Xubo Song

Oregon Health & Science University


Tumor vasculature generally exhibits strong spatio-temporal heterogeneity, which reflects tumor progression and disease stage. A significant change in tumor metabolism usually precedes tumor size reduction in response to neoadjuvant chemotherapy (NACT). Image texture features that capture the change of heterogeneity in tumor microvasculature as measured by dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) have been shown to be capable of providing early prediction of breast cancer response to NACT. In this work we aimed to determine whether multiresolution fractal analysis of voxel-based DCE-MRI parametric maps can provide early prediction of breast cancer response to NACT. The proposed multiresolution fractal method and the more conventional methods of single-resolution fractal, gray-level co-occurrence matrix, and run-length matrix were used to extract features from the parametric maps. With training (N = 40) and testing (N = 15) data sets, a support vector machine (SVM) was used to assess the predictive abilities of the features in classification of pathologic complete response versus non-pathologic complete response. The SVM classification performance was evaluated by calculating the average over ten random partitions of the training and testing data. Generally, the multiresolution fractal features from individual maps and the concatenated features from all parametric maps showed better predictive performance than the conventional features. The differences in AUC were statistically significant (P < .05) for several parametric maps. Thus, multiresolution analysis that decomposes the texture at various spatial-frequency scales may more accurately capture changes in tumor vascular heterogeneity as measured by DCE-MRI, and therefore provide better early prediction of NACT response.
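
As an illustrative sketch of that evaluation protocol (not the authors' pipeline or data), the snippet below averages the test-set ROC AUC of a linear SVM over ten random 40/15 train/test partitions of a synthetic feature matrix:

```python
# Hedged sketch: average SVM ROC AUC over ten random partitions,
# with synthetic stand-ins for texture features and response labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Toy feature matrix: 55 "patients" (40 train / 15 test per split),
# 12 features, with class 1 shifted so the problem is learnable.
X = rng.normal(size=(55, 12))
y = (rng.uniform(size=55) < 0.4).astype(int)
X[y == 1] += 0.8

aucs = []
for seed in range(10):  # ten random partitions, as in the abstract
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=15, stratify=y, random_state=seed)
    clf = SVC(kernel="linear").fit(X_tr, y_tr)
    # decision_function scores suffice for ROC AUC
    aucs.append(roc_auc_score(y_te, clf.decision_function(X_te)))

print(round(float(np.mean(aucs)), 3))
```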

Deep Neural Networks for Sleep-Disordered Breathing Event Detection and Severity Estimation

Brian R. Snider

Oregon Health & Science University


Sleep-disordered breathing (SDB) is recognized as a widespread, under-diagnosed condition associated with many detrimental health problems.  The condition, with its numerous related comorbidities, places a significant burden on the individual and the healthcare system alike, with untreated SDB patients utilizing health resources at twice the usual rate.  The most common form of SDB is obstructive sleep apnea, characterized by frequent transient reductions of oxygen saturation, cessations of ventilatory airflow, and collapse or obstruction of the upper airway.

The current gold standard for diagnosis of SDB is full-night polysomnography (PSG).  This overnight procedure takes place in a sleep laboratory and is obtrusive, typically recording twelve or more physiological processes requiring dozens of sensor leads to be attached to the patient.  Scoring of study results is subjective and time-consuming, as an entire study must be manually assessed by a human expert to determine a diagnosis.

We hypothesize that deep neural network-based systems can detect individual SDB events with an acceptable level of inter-rater reliability with human experts, and predict overall SDB severity with a strong correlation to the clinically-derived apnea hypopnea index.  We apply these approaches to a large PSG corpus collected at the Oregon Health & Science University sleep lab.  We present our work on these approaches, including additional work on event scoring-specific issues such as sensor failure, oximetry sensor desaturation delay, and sensor baseline estimation, and outline remaining work toward our goal of automatic, objective, and accurate event scoring and severity estimation.


Reducing sample size requirements for randomized control trials using high-frequency digital biomarkers

David Lovitz, Daniel Taylor-Rodriguez, Nora Mattek, Chao-Yi Wu, Jeffrey Kaye, Hiroko H. Dodge, and Bruno M. Jedynak

Portland State University


High-frequency biomarkers (HFBs) are quantified biological characteristics that are collected daily, weekly, or more often. Examples of digital biomarkers are computer usage and sleep duration. HFBs can potentially reduce the sample size requirements for randomized controlled trials (RCTs) and aid the early detection of disease onset. Our methodology aims to use HFBs to detect differences between the placebo and experimental groups in an RCT. Current statistical analyses (for example, linear mixed-effect models with random intercepts) detect changes in the mean trajectory but fail to detect changes in the covariance structure. Using data from the Oregon Center for Aging and Technology (ORCATECH), we examined whether more careful modeling could reduce the sample size and trial length for an RCT. We present a novel statistical methodology for assessing the outcome of an RCT using HFB data, built on Gaussian process modeling, the Fisher kernel, and Maximum Mean Discrepancy (MMD). We demonstrate our approach using simulations and computer usage data from ORCATECH to show that a careful statistical procedure reduces the sample size needed for a 100-week RCT by taking advantage of HFB data. This method compares favorably with the linear mixed-effect model, the traditional approach. More experiments will be necessary to validate these findings.

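One building block of the methodology above, the Maximum Mean Discrepancy, can be sketched in a few lines. The RBF kernel, sample sizes, and data below are illustrative assumptions; the full approach's Gaussian process model and Fisher kernel are omitted.

```python
# Minimal sketch of the (biased) MMD^2 statistic under an RBF kernel,
# applied to synthetic "placebo" vs. "treated" biomarker summaries.
import numpy as np

def mmd_biased(X, Y, gamma=0.5):
    """Biased MMD^2 estimate between samples X and Y under an RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(1)
placebo = rng.normal(0.0, 1.0, size=(100, 3))   # e.g. weekly usage summaries
treated = rng.normal(0.7, 1.0, size=(100, 3))   # shifted mean under treatment
same = rng.normal(0.0, 1.0, size=(100, 3))

# MMD between different distributions exceeds MMD between matched ones.
print(mmd_biased(placebo, treated) > mmd_biased(placebo, same))
```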

Elucidating intratumoral T and B cell functionality related to spatial metrics for pancreatic cancer patient stratification via interpretable machine learning

Elliot Gray, Shannon Liudahl, Shamilene Sivagnanam, Courtney Betts, Jason Link, Dove Keith, Brett Sheppard, Rosalie Sears, Guillaume Thibault, Joe W. Gray, Lisa M. Coussens, and Young Hwan Chang

Oregon Health & Science University

 

Pancreatic ductal adenocarcinoma (PDAC) patients, who often present with stage 3 or 4 disease, face a dismal prognosis as the 5-year survival rate remains below 10%. Recent studies have revealed that CD4+ T, CD8+ T, and/or B cells in specific spatial arrangements relative to intratumoral regions correlate with clinical outcome for patients, but the complex functional states of those immune cell types remain to be incorporated into prognostic biomarker studies. Here, we developed an interpretable machine learning-based model to analyze the functional relationship between leukocyte-leukocyte or leukocyte-tumor cell spatial proximity, correlated with clinical outcome of 46 therapy-naïve PDAC patients following surgical resection. Using a multiplex immunohistochemistry data set focused on profiling leukocyte functional status, our model identified features that significantly distinguished patients in the fourth quartile from those in the first quartile of survival. CD4 T helper cell and CD68+ myeloid cell frequencies among CD45+ immune cells represented positive and negative prognostic stratifiers, respectively. The frequency of Granzyme B-positive CD4 and CD8 T cells, indicative of cytotoxicity, was significantly associated with patient survival in both univariate tests and in our multivariate model. Similarly, CD4 T- to B-cell proximity, as well as the frequency of PD1 and EOMES double-positivity on CD8 T cells, were significant prognostic features. Our analysis links the immune microenvironment of late-stage PDAC tumors to patient outcome, thus providing clues about intratumoral cues associated with more progressive disease.


Balanced learning of cell state representations

Erik Burlingame, Jennifer Eng, Guillaume Thibault, Geoffrey Schau, Koei Chin, Joe W. Gray, and Young Hwan Chang

Oregon Health & Science University


Cell state characterization is essential to patient diagnosis and treatment and can be defined by a cell's morphology or the markers it expresses. High-dimensional imaging methods like cyclic multiplexed immunofluorescence (cmIF) enable unprecedented in situ cell state characterization through iterative labeling of tens of markers within the same tissue. Awareness of cell state at this resolution can augment diagnostic and prognostic decision-making. To model such heterogeneity, we must uniformly balance cell state distributions between training and validation datasets. Operating under the assumption that cell morphology reflects features of cell state, here we present a balanced deep learning framework that leverages the nuclear morphology of cells as visualized by fluorescent DAPI staining to infer features of their state. Learned cell state representations can facilitate virtual staining of human biopsy tissues based on common tissue stains like DAPI or hematoxylin and eosin (H&E) alone. A model that infers even a limited set of cell state features from low-cost and widely available stains like DAPI and H&E could bring the benefits of cmIF to more patients in a clinically relevant timeframe.



Unsupervised Histological Feature Manifold Learning for Massively Parallel Whole Slide Annotation

Geoffrey Schau, Hassan Ghani, Erik Burlingame,  Joe Gray, Chris Corless, and Young Hwan Chang 

Oregon Health & Science University


Pathologists evaluate whole slide histology to render cancer diagnoses and infer the grade and stage of tumors, which inform the selection of appropriate follow-up biomarker tests with prognostic and therapeutic significance for the patient. Specialized scanners digitize these large and detailed sections of tissues, enabling the application of computer vision methods and giving rise to the emerging field of computational pathology. Recent deep learning approaches use pathologists' annotations to train models to recognize specific features of interest automatically. However, generating the requisite labeled datasets requires a pathologist to annotate dozens of whole slide images in great detail, a process in which a pathologist might use a single annotation label hundreds of times. Because this process is time intensive and thus cost prohibitive, the collection of pathologists' annotations presents a limiting bottleneck for computer vision research in digital pathology. This work seeks to resolve a critical research bottleneck and provide an avenue to significantly accelerate annotation by expert human pathologists within digital pathology research. We present DeepHAT: a deep learning-based Histology Annotation Tool that integrates unsupervised and semi-supervised feature-learning methods with an interactive web-based annotation tool to facilitate massively parallelized and simultaneous annotation of hundreds of whole slide images. We illustrate how a single pathologist can concurrently annotate hundreds of whole slide images with DeepHAT to segregate normal liver tissue from whole slide liver metastases and to identify primary tumor tissue from the context of microenvironments, such as necrosis, inflammation, hemorrhage, and healthy tissue.


Improving the Assessment of Mild Cognitive Impairment in Advanced Age With a Novel Multi-Feature Automated Speech and Language Analysis of Verbal Fluency

Liu Chen, Meysam Asgari, Robert Gale, Katherine Wild, Hiroko Dodge, and Jeffrey Kaye

Oregon Health & Science University


The animal fluency (AF) test is a cognitive test that requires examinees to retrieve as many words in the animal category as possible in a short duration, typically one minute; the final score is the number of uniquely generated animal names. Conventionally, cognitively intact (CI) participants achieve higher scores than those with mild cognitive impairment (MCI). Although this scoring method captures useful information, the simple count discards other potentially useful information. By applying advanced computing techniques, one can automate an otherwise labor-intensive process to capture other clinically relevant information, such as the pattern of retrieving words. Troyer et al. (1997) characterized this pattern by measuring the semantic relation between adjacent animal names.

Method: We designed features that capitalized on the temporal aspect of the semantic relation by utilizing an automatic speech recognition (ASR) system to generate the timestamp of each name in an answer. Our model semantically clusters animal names and automatically characterizes the semantic search strategy of subjects in retrieving words from those clusters. Extracted time-based measures along with standard count-based features are then used in a support vector machine (SVM) classifier to examine the utility of these measures in distinguishing those with MCI from CI controls.

Results: We experimentally showed that the combination of both count-based and time-based features, automatically derived from the test response, achieved an AUC-ROC of 77% with the SVM classifier, outperforming the model trained only on the AF test score (AUC, 65%) and well above the chance model (AUC, 50%).

VISTA: Virtual ImmunoSTAining for pancreatic disease quantification in murine cohorts

Luke Ternes, Ge Huang, Christian Lanciault, Guillaume Thibault, Rachelle Rigger, Joe W. Gray, John Muschler, and Young Hwan Chang

Oregon Health & Science University


Mechanistic disease progression studies using animal models require objective and quantifiable assessment of tissue pathology. Currently quantification relies heavily on staining methods which can be expensive, labor/time-intensive, inconsistent across laboratories and batch, and produce uneven staining that is prone to misinterpretation and investigator bias. We developed an automated semantic segmentation tool utilizing deep learning for rapid and objective quantification of histologic features relying solely on hematoxylin and eosin stained pancreatic tissue sections. The tool segments normal acinar structures, the ductal phenotype of acinar-to-ductal metaplasia (ADM), and dysplasia. Disease quantifications produced by our computational tool were correlated to the results obtained by immunostaining markers (DAPI, amylase, and cytokeratins; correlation score= 0.9, 0.95, and 0.91). Moreover, our tool distinguishes ADM from dysplasia, which are not reliably distinguished with immunostaining, and demonstrates generalizability across murine cohorts with pancreatic disease. We quantified the changes in histologic feature abundance for murine cohorts with oncogenic Kras-driven disease, and the predictions fit biological expectations, showing stromal expansion, a reduction of normal acinar tissue, and an increase in both ADM and dysplasia as disease progresses. Our tool promises to accelerate and improve the quantification of pancreatic disease in animal studies and become a unifying quantification tool across laboratories.


Accelerated DC Algorithms for Image Reconstruction and Dictionary Learning

Michael Wells, Dr. Mau Nam Nguyen, Dr. Thai An Nguyen and Lewis Hicks

Portland State University


We explore image dictionary learning via non-convex (difference of convex, DC) programming and its applications to image reconstruction. First, the image reconstruction problem is detailed and solutions are presented. Each such solution requires an image dictionary to be specified directly or to be learned via non-convex programming. The solutions explored are the DCA (DC algorithm) and the boosted DCA. These forms of dictionary learning are then compared on the basis of both image reconstruction accuracy and the number of iterations required to converge.
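
The DCA iteration itself is simple: given f = g - h with g and h convex, take a subgradient y_k of h at x_k and minimize the convex surrogate g(x) - y_k*x. The following is a minimal one-dimensional sketch on a toy objective, not the dictionary-learning problem above:

```python
# Toy DCA: minimize f(x) = (x**2 - 1)**2, written as g(x) - h(x) with
# g(x) = x**4 + 1 and h(x) = 2*x**2, both convex.
# DCA step: y_k = h'(x_k) = 4*x_k; minimize g(x) - y_k*x by solving
# g'(x) = y_k, i.e. 4*x**3 = 4*x_k, giving the update x <- cbrt(x_k).
import numpy as np

def dca(x0, iters=60):
    x = x0
    for _ in range(iters):
        x = np.cbrt(x)  # closed-form solution of the convex subproblem
    return float(x)

x_star = dca(0.5)
print(round(x_star, 6))  # converges to 1.0, a global minimizer of f
```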

Using conditional adversarial networks for intelligibility improvement for dysarthric speech and laryngectomees

Tuan Dinh and Alexander Kain

Oregon Health & Science University


We explored voice conversion systems to improve speech intelligibility of 1) dysarthric speech and 2) laryngectomees.

In the first case, we explore the potential of conditional generative adversarial networks (cGANs) to learn the mapping from habitual speech to clear speech. We evaluated the performance of cGANs in three tasks: 1) speaker-dependent one-to-one mappings, 2) speaker-independent many-to-one mappings, and 3) speaker-independent many-to-many mappings. In the first task, cGANs outperformed a traditional deep neural network (DNN) mapping in terms of average keyword recall accuracy and the number of speakers with improved intelligibility. In the second task, we showed that without clear speech, we can significantly improve the intelligibility of the habitual speech of one of three speakers. In the third task, which is the most challenging, we improved the keyword recall accuracy for two of three speakers.

In the second case, we aim to improve the speech of laryngectomees in terms of intelligibility and naturalness. We predict the voicing and voicing degree for laryngectomees from speech spectra using a deep neural network. We use a logarithmically falling synthetic F0 for statement phrases. Spectra are converted to synthetic target spectra using a cGAN.

Imaging and Support Vector Machines for the Early Prediction of Breast Cancer Response to Neoadjuvant Chemotherapy

Victor Rielly, Wei Huang,  and Bruno Jedynak

Portland State University


We trained a Support Vector Machine (SVM) to predict the effectiveness of neoadjuvant chemotherapy for treating breast cancer. Our models were trained on clinical and histopathological data as well as dynamic contrast-enhanced (DCE) MRI images taken at 4 separate visits during the course of chemotherapy treatment. We compared the ROC scores of classifiers trained using 3 different biological models in order to see which might be more useful for the early prediction of treatment outcome. Having a very limited dataset of 50 patients, great care was taken in performing feature selection and developing training procedures to maximize our model's performance. We focused on a very simple linear model with a small number of strong features. Although we lack the data size to make statistically significant claims, our initial results suggest a new biological model is better suited to the task of early prediction of treatment outcome than the standard models used in practice today.

Predicting Late Patients in Pediatric Ophthalmology Outpatient Clinic Using Machine Learning

Wei-Chun Lin, Jimmy Chen, Michelle Hribar, Michael Chiang

Oregon Health & Science University


Purpose: As healthcare shifts towards value-based care, there has been an increased focus on providing efficient and cost-effective clinical services. An important barrier for clinic efficiency is a patient's late arrival. Predicting which patients will be late can allow clinic schedulers to adjust and optimize the schedule to minimize the disruption of patient lateness. The purpose of this study was to develop machine learning models to predict late patients in pediatric ophthalmology clinics at OHSU.

Methods: The data were collected from office visits from 2012 to 2018 at OHSU. Time-stamp and office visit data were used to calculate time-related variables. Patients who checked in more than 10 minutes after their scheduled appointment time were considered late. Models using random forest, gradient boosting machine (GBM), support vector machine (SVM), and logistic regression were developed to predict whether the patient would arrive late. We used 10-fold cross-validation to reduce over-fitting. AUC-ROC scores were used to evaluate the accuracy of the prediction models.
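
A hedged sketch of that evaluation setup, using synthetic data and simplified features (not OHSU's actual code or EHR data), with scikit-learn's 10-fold cross-validated ROC AUC:

```python
# Illustrative sketch: scoring a late-arrival classifier with 10-fold
# cross-validated ROC AUC on synthetic visit data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)

# Synthetic visit features: clinic volume, late at previous visit,
# previous exam length (minutes) -- hypothetical stand-ins.
n = 400
X = np.column_stack([
    rng.poisson(30, n),               # clinic volume
    rng.binomial(1, 0.3, n),          # late at previous visit
    rng.normal(45, 10, n),            # previous exam length
]).astype(float)
# Lateness made partly predictable from the first two features
p = 1 / (1 + np.exp(-(0.05 * (X[:, 0] - 30) + 1.5 * X[:, 1] - 0.5)))
y = rng.binomial(1, p)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# For classifiers, cv=10 defaults to stratified 10-fold splitting
aucs = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
print(aucs.mean() > 0.5)  # better than chance on this synthetic signal
```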

Results: Figure 1 shows the ROC curves and AUC-ROC scores of four machine learning models for distinguishing late patients. The random forest had the best performance (AUC = 0.654), followed by the GBM model (AUC = 0.652). The top three important predictors identified in the GBM model were clinic volume, late arrival at the previous visit, and previous exam length.

Conclusions: Machine learning models making secondary use of EHR data can predict late patients with reasonable success. More work is needed to refine the models to improve accuracy.

Workshop Schedule

10:00 - 10:30 Welcome & Introductions
10:30 - 10:50 The Current State of Machine Learning for Health Research at OHSU/PSU Anastasia Adriano
10:55 - 11:15 From Reproducing Kernel Hilbert Spaces to Health and Back Dr. Bruno Jedynak
11:20 - 11:40 Mispronunciation Detection and Feedback for Children with Speech Sound Disorders Dr. Alexander Kain
12:00 - 1:30 Student Presentations and Networking
1:45 - 2:05 Natural Language Processing and COVID-19: What is our Field Contributing? Dr. Steven Bedrick
2:10 - 2:30 Topic-Based Measures of Conversation for Detecting Mild Cognitive Impairment Dr. Meysam Asgari
2:35 - 2:55 Applications of Machine Learning in Biomedical Image Computing Dr. Xubo Song
3:00 - 3:20 Seeing More with Deep Learning in Biomedical Imaging Dr. Young Hwan Chang
3:25 - 3:30 Student Presentation Prizes
3:30 - 3:45 AI in Health and Life Sciences at Intel Stephanie M. Cope
3:45 - 4:15 Panel Q&A
4:15 - 4:30 Closing Comments

Our Organizers

Eric Wan, PSU
James McNames, PSU
Steven Bedrick, OHSU
Alexander Kain, OHSU
Jong Kim, PSU
Bruno Jedynak, PSU
Daniel Taylor-Rodriguez, PSU
Wei Huang, OHSU
Jeffrey Kaye, OHSU
Zachary Beattie, OHSU
Meysam Asgari, OHSU
Hiroko Dodge, OHSU
Anastasia Adriano, PSU
Anthony Rhodes, PSU
Feng Liu, PSU
Xubo Song, OHSU
Jodi Lapidus, OHSU
Workshop Identity Design and Marketing performed by Leo Jedynak Branding Strategy
You can find more work at https://www.leojedynak.com/

If you have any concerns or trouble registering please contact bruno.jedynak@pdx.edu