The University of Florida Informatics Institute will host its 5th Annual Symposium on Friday, October 11th, 2019, at the Reitz Union. Students, researchers, faculty, and industry professionals from across the nation are invited to interact, share research, and collaborate. In keeping with the latest trends in informatics and our mission of cross-disciplinary collaborative research, we have invited guest speakers from across campus and beyond the University to present on topics including election data, text mining, network science, and machine learning.
Location and Date
J. Wayne Reitz Union
2nd Floor, Room 2365
Friday, October 11th, 2019
8:00 AM: Check-in & Coffee
9:00 AM: “State of the Institute”
Dr. George Michailidis
UFII Founding Director and Professor of Statistics
University of Florida
9:15 AM: “Big Data in Climate and Earth Sciences: Challenges and Opportunities for Data Science”
Dr. Vipin Kumar
Regents Professor and William Norris Chair in Large Scale Computing
Department of Computer Science and Engineering
University of Minnesota
The climate and earth sciences have recently undergone a rapid transformation from a data-poor to a data-rich environment. In particular, massive amounts of data about the Earth and its environment are now continuously generated by a large number of Earth-observing satellites as well as by physics-based earth system models running on large-scale computational platforms. These massive, information-rich datasets offer huge potential for understanding how the Earth’s climate and ecosystems have been changing and how they are being impacted by human actions. This talk will discuss the challenges involved in analyzing these massive datasets, as well as the opportunities they present both for advancing machine learning and for advancing the science of climate change, in the context of monitoring the state of tropical forests and surface water on a global scale.
10:15 AM: Coffee Break
10:30 AM: “Towards Honest Inference from Real-World Healthcare Data”
Dr. David Madigan
Professor of Statistics
Data Science Institute
In practice, our learning healthcare system relies primarily on observational studies generating one effect estimate at a time using customized study designs with unknown operating characteristics and publishing – or not – one estimate at a time. When we investigate the distribution of estimates that this process has produced, we see clear evidence of its shortcomings, including an apparent over-abundance of statistically significant effects. We propose a standardized process for performing observational research that can be evaluated, calibrated, and applied at scale to generate a more reliable and complete evidence base than previously possible. We demonstrate this new paradigm by generating evidence about all pairwise comparisons of 39 treatments for hypertension for a relevant set of 58 health outcomes using nine large-scale health record databases from four countries. In total, we estimate 1.3M hazard ratios, each using a comparative effectiveness study design and propensity score stratification on par with current one-off observational studies in the literature. Moreover, the process enables us to employ negative and positive controls to evaluate and calibrate estimates, ensuring, for example, that the 95% confidence interval includes the true effect size 95% of the time. The result set consistently reflects current established knowledge where known, and its distribution shows no evidence of the faults of the current process.
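The calibration step described above, using negative controls (exposure–outcome pairs with no true effect) to estimate systematic error and widen confidence intervals accordingly, can be sketched roughly as follows. The simulated negative-control estimates, the simple normal error model, and the function name are illustrative assumptions for this sketch, not the authors' actual procedure.

```python
import numpy as np

# Hypothetical log-hazard-ratio estimates for negative controls
# (true effect = 0). Any bias in the study process shows up as a
# shift/spread in these estimates.
rng = np.random.default_rng(1)
nc_estimates = rng.normal(loc=0.1, scale=0.25, size=50)

# Fit the systematic-error distribution (a simple normal fit here,
# ignoring per-estimate sampling error for brevity).
mu = nc_estimates.mean()
sigma = nc_estimates.std(ddof=1)

def calibrated_ci(log_hr, se, z=1.96):
    """95% CI for a new estimate that folds in systematic error.

    The estimate is shifted by the fitted bias mu, and the standard
    error is inflated by the fitted systematic spread sigma.
    """
    total_se = np.sqrt(se**2 + sigma**2)
    center = log_hr - mu
    return center - z * total_se, center + z * total_se
```

For example, `calibrated_ci(0.5, 0.1)` returns an interval that is both shifted and wider than the naive `0.5 ± 1.96 × 0.1`, reflecting the residual bias learned from the negative controls.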
11:30 AM: Poster Session and Lunch (lunch provided)
Reitz Union, Room 2335
1:00 PM: SEED Speaker: “Automatic selection of the number of clusters in highly multivariate biodiversity data using the truncated stick-breaking prior: applications in Bayesian mixture and mixed-membership models”
Dr. Denis Valle
School of Forest Resources and Conservation
Understanding and predicting how species composition has been and will be altered by anthropogenic stressors is key to sustaining biodiversity and ecosystem functioning. However, biodiversity data are challenging to interpret because they are highly multivariate, containing information on the presence/absence or abundance of tens to thousands of species in any given location and/or at any given time, requiring dimension reduction techniques (e.g., clustering) to generate interpretable findings.
This talk will focus on the truncated stick-breaking (TSB) prior (a commonly used prior in Bayesian nonparametrics) and how it can be used within a wide range of clustering models for biodiversity (e.g., stochastic block models, species archetype models, and modified Latent Dirichlet Allocation models) to determine the optimal number of clusters. The standard approach to determining the optimal number of clusters is for users to run the algorithm for different numbers of groups and then choose among them using a criterion such as AIC or BIC. The problems with this two-stage approach are that running these clustering algorithms multiple times for different numbers of groups can be computationally expensive, and that the uncertainty associated with the final clustering results is underestimated. Both problems are avoided through the use of the TSB prior. Given the ubiquity of clustering in the field, we believe that a range of applications in the ecological and environmental sciences will benefit from this TSB prior approach.
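The TSB construction itself can be sketched in a few lines of NumPy: each mixture weight takes a Beta-distributed fraction of the stick remaining after the previous breaks, and the truncation makes the final weight absorb whatever is left. The Beta(1, α) breaks, the truncation level K = 25, and the 1% occupancy threshold below are illustrative choices for this sketch, not the specifics of the models discussed in the talk.

```python
import numpy as np

def truncated_stick_breaking(alpha, K, rng):
    """Draw K mixture weights from a truncated stick-breaking prior.

    v_k ~ Beta(1, alpha) for k < K; the final break is set to 1 so the
    last weight absorbs the remaining stick and the weights sum to one.
    """
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: last stick takes all remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
weights = truncated_stick_breaking(alpha=1.0, K=25, rng=rng)

# The number of "occupied" clusters is inferred rather than fixed in
# advance: small alpha concentrates mass on few sticks, large alpha
# spreads it across many.
occupied = int((weights > 0.01).sum())
```

Because unneeded clusters simply receive negligible weight, a single model fit with a generous truncation K replaces the expensive loop over candidate numbers of groups.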
1:30 PM: SEED Speaker: “Phylodynamic analysis of pathogens and AI-based algorithms in the genomics era”
Dr. Marco Salemi
Holloway and McCalmma Professor in Experimental Pathology
Department of Pathology, Immunology, and Laboratory Medicine
UF College of Medicine
Twenty-first-century biology research is undergoing a profound paradigm shift, fostered by exponential advances in the ease and cost of DNA/RNA sequencing techniques. The advent, in 2005, of high-throughput next-generation (NextGen) sequencing made it possible to obtain large full-genome data sets in a matter of hours or days rather than months or years. At the same time, a novel computational framework combining phylogenetic analysis and coalescence theory (phylodynamics) has been developed to correlate the epidemiology and evolutionary behavior of pathogens, using time-consuming algorithms that can nowadays be run efficiently on high-performance computer clusters. Reconstructing viral or bacterial evolutionary history from full-genome sequences provides key insights into the dynamics underlying epidemics of pathogens such as HIV, Ebola, Vibrio cholerae, and MRSA. For example, statistical molecular clock and spatial diffusion models can be used to track the temporal origin of an outbreak and investigate the factors driving its spatial spread. Finally, new tools are currently being developed that couple phylodynamic inference and behavioral network data with artificial intelligence algorithms to predict the trajectory of viral transmission clusters and identify key determinants of new infections.
2:00 PM: “LSST: the greatest movie of all time is coming to you!”
Dr. Zeljko Ivezic
Professor of Astronomy
Department of Astronomy
University of Washington
The Large Synoptic Survey Telescope (LSST, www.lsst.org) will deliver the most comprehensive optical sky survey ever undertaken. Starting in 2022, LSST will take panoramic images of the entire visible sky twice each week for 10 years. The resulting hundred-petabyte imaging dataset for close to 40 billion objects will be used for scientific investigations ranging from the properties of potentially hazardous asteroids to characterizations of dark matter and dark energy. I will start with a brief overview of the LSST science drivers and system design, describe the status of its construction, and then touch on some Big Data research challenges that need to be tackled to make the best use of LSST data.
3:00 PM: Coffee Break
3:15 PM: SEED Speaker: “Optimized Wombling for LHC data”
The relevant information from collision events at the Large Hadron Collider (LHC) in Geneva, Switzerland can be represented as spatial point data in a suitable phase space. The observation of sharp discontinuities in the observed event number density would hint at the presence of new physics beyond the Standard Model. We apply, and further improve upon, wombling techniques known from other fields. We illustrate our method with simulated high-energy data.
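As a rough illustration of the idea (not the authors' optimized method), wombling looks for boundaries where a density surface changes abruptly. The toy point cloud, the histogram binning, and the gradient-magnitude criterion below are all assumptions made for this sketch.

```python
import numpy as np

# Toy 2D "event" point cloud with a sharp density drop at x = 0,
# standing in for collision events mapped into a 2D phase space.
rng = np.random.default_rng(2)
dense = rng.uniform([-1.0, -1.0], [0.0, 1.0], size=(4000, 2))
sparse = rng.uniform([0.0, -1.0], [1.0, 1.0], size=(200, 2))
points = np.vstack([dense, sparse])

# Bin the points into a density surface, then "womble": compute the
# gradient magnitude, which is large where the density changes sharply.
density, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1], bins=20)
gx, gy = np.gradient(density)  # gradients along the x- and y-bin axes
womble = np.sqrt(gx**2 + gy**2)

# Bins with large womble values flag candidate boundaries that can be
# inspected for evidence of a genuine discontinuity.
boundary_bins = np.argwhere(womble > 0.5 * womble.max())
```

In practice the density estimate would be smoothed and the threshold chosen against the statistical fluctuations expected under the Standard Model, but the core computation is this gradient-magnitude surface.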
3:30 PM: “Intelligent Narrative-Centered Learning Environments”
Dr. James Lester
Distinguished University Professor of Computer Science
North Carolina State University
Adaptive learning technologies offer significant promise for bringing about fundamental improvements in education and training. For the past decade we have been investigating a family of intelligent game-based learning environments focusing on narrative-centered learning and integrating intelligent tutoring systems with game technologies. Research on these narrative-centered learning environments seeks to combine the inferential capabilities of user-adaptive systems and intelligent user interfaces with the rich gameplay supported by game engines. This line of investigation has the dual objectives of increasing learning effectiveness and promoting student engagement. In this talk we will introduce the principles motivating the design of narrative-centered learning environments, describe their roots in intelligent interactive narrative, and discuss ongoing work exploring their role in formal settings (K-12 schools, training) and informal settings.
Video Recording of 2019 Speakers
Our previous symposia have featured posters, presenters, and attendees from animal science, biology, biochemistry, computer science, ecology, education, electrical engineering, health sciences, mathematics, and political science. We have hosted speakers from industry and from national organizations including the National Ecological Observatory Network (NEON) and the Defense Advanced Research Projects Agency (DARPA). Our goal is to reach out to and include an equally diverse population. To learn more, visit the 2015, 2016, 2017, and 2018 Symposia.