I'm a PhD student in the David R. Cheriton School of Computer Science at the University of Waterloo. I work with Edith Law, and am a member of the Human-Computer Interaction Lab. Before joining Waterloo, I was a research scientist in the Faculty of Classics at the University of Oxford, and spent two summers as a research intern in Oak Ridge National Laboratory's Biomedical Sciences and Engineering division.
My PhD work focuses on the intersection of curiosity, citizen science, and artificial intelligence. My research consolidates questions from each of these areas to better understand how we can get volunteers and crowd-workers more excited about and involved in participating in crowd-scale scientific research.
As a research scientist, I helped design and build a variety of computational tools for studying ancient manuscripts. These tools include Ancient Lives, Proteus, and Greek-BLAST.
Curiosity, Citizen Science, and Artificial Agents
My research focuses on the intersection of curiosity, citizen science, and artificial agents. Its central theme is how curiosity can be leveraged to make learners (i.e., volunteers or crowd-workers) more excited about and engaged with scientific crowdsourcing tasks.
Experimental Human-in-the-Loop Systems
My research centers on building systems that integrate human input as computational units. Building hybrid human-machine systems that perform complex tasks better than either human or machine could independently is non-trivial. This area of work is exploratory and open-ended.
CrowdCurio: an online crowdsourcing platform to facilitate climate change studies using herbarium specimens
Phenology is a key aspect of plant success. Recent research has demonstrated that herbarium specimens can provide important information on plant phenology. Massive digitization efforts have the potential to greatly expand herbarium-based phenological research, but also pose a serious challenge regarding efficient data collection. Here, we introduce CrowdCurio, a crowdsourcing tool for the collection of phenological data from herbarium specimens. We test its utility by having workers collect phenological data (number of flower buds, open flowers and fruits) from specimens of two common New England (USA) species - Chelidonium majus and Vaccinium angustifolium. We assess the reliability of using nonexpert workers (i.e. Amazon Mechanical Turk) against expert workers. We also use these data to estimate the phenological sensitivity to temperature for both species across multiple phenophases. We found no difference in the data quality of nonexperts and experts. Nonexperts, however, were a more efficient way of collecting more data at lower cost. We also found that phenological sensitivity varied across both species and phenophases. Our study demonstrates the utility of CrowdCurio as a crowdsourcing tool for the collection of phenological data from herbarium specimens. Furthermore, our results highlight the insight gained from collecting large amounts of phenological data to estimate multiple phenophases.
C. Willis, E. Law, A. Williams, B. Franzone, R. Bernardos, L. Bruno, C. Hopkins, C. Schorn, E. Weber, D. Park and C. Davis. (2017). New Phytologist.
Crowdsourcing as a Tool for Research: Implications of Uncertainty
Numerous crowdsourcing platforms are now available to support research as well as commercial goals. However, crowdsourcing is not yet widely adopted by researchers for generating, processing or analyzing research data. This study develops a deeper understanding of the circumstances under which crowdsourcing is a useful, feasible or desirable tool for research, as well as the factors that may influence researchers’ decisions around adopting crowdsourcing technology. We conducted semi-structured interviews with 18 researchers in diverse disciplines, spanning the humanities and sciences, to illuminate how research norms and practitioners’ dispositions were related to uncertainties around research processes, data, knowledge, delegation and quality. The paper concludes with a discussion of the design implications for future crowdsourcing systems to support research.
E. Law, A. Wiggins, K. Gajos, M. Gray, and A. Williams. (2017). Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW '17, pages 1544-1561, Portland, Oregon, USA. ACM.
Ensemble: A Hybrid Human-Machine System for Generating Melody Scores from Audio
Music transcription is a highly complex task that is difficult for automated algorithms, and equally challenging to people, even those with many years of musical training. Furthermore, there is a shortage of high-quality datasets for training automated transcription algorithms. In this research, we explore a semi-automated, crowdsourced approach to generate music transcriptions, by first running an automatic melody transcription algorithm on a (polyphonic) song to produce a series of discrete notes representing the melody, and then soliciting the crowd to correct this melody. We present a novel web-based interface that enables the crowd to correct transcriptions, report results from an experiment to understand the capabilities of non-experts to perform this challenging task, and characterize the characteristics and actions of workers and how they correlate with transcription performance.
T. Tse, J. Salamon, A. Williams, H. Jiang and E. Law. (2016). Proceedings of the 17th International Society for Music Information Retrieval Conference.
Proteus: A Platform for Born Digital Critical Editions of Literary and Subliterary Papyri
Today, digital libraries, or digital repositories, can be found in nearly any discipline studying collections of text, manuscripts, or other variations of literature. However, many of these digital libraries operate on a traditional model that offers the user little beyond fundamental operations (i.e., searching). Proteus, still a work in progress, leverages modern computing methods and techniques to aid the modern papyrologist in the study and analysis of papyrus fragments.
Williams, A.C., Santarsiero, A., Meccariello, C., Verhasselt, G., Wallin, J.F., Carroll, H.D., & Brusuelas, J.H. (2015). Proteus: A Platform for Born Digital Critical Editions of Literary and Subliterary Papyri. Proceedings of the Digital Heritage 2015.
Computationally Accelerated Papyrology
This thesis presents two computational approaches for accelerating papyrus transcription and identification. The first approach is a computational pipeline that aggregates millions of crowdsourced letter classifications into transcriptions of papyrus fragments. The second approach leverages genetic sequence alignment algorithms to rapidly match damaged papyrus fragments to known papyrus manuscripts. These approaches greatly improve upon the current state-of-the-art techniques and set a new standard for applying computation to the transcription and identification of ancient texts.
Williams, A.C. (2015). Computationally Accelerated Papyrology.
A Computational Pipeline for Crowdsourced Transcriptions of Ancient Greek Papyrus Fragments
To date, over 7 million letter identifications from users across the world have been recorded in the Ancient Lives database. In this paper, we present a computational pipeline for converting crowdsourced letter identifications made through the Ancient Lives interface into digital consensus transcriptions of papyrus fragments. We conclude by explaining the usefulness of the pipeline output in the context of additional computational projects that aim to further accelerate the identification process.
Williams, A.C., Wallin, J.F., Yu, H., Carroll, H.D., Lamblin, A.-F., Fortson, L., Obbink, D., Lintott, C.J., & Brusuelas, J.H. (2014). A Computational Pipeline For Crowdsourced Transcriptions of Ancient Greek Papyrus Fragments. Proceedings of the 2nd Workshop on Big Humanities Data.
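The aggregation idea behind such a pipeline can be illustrated with a small majority-vote sketch. This is not the published pipeline: it assumes that letter identifications have already been grouped by letter position on the fragment, and the function name, the `min_votes` parameter, and the `?` placeholder are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def consensus_transcription(identifications, min_votes=2):
    """Build a consensus string from crowdsourced letter identifications.

    identifications: iterable of (position, letter) pairs, where position is
    an integer index of the letter slot on the fragment (assumed to be
    precomputed; the real data are click coordinates that need clustering).
    min_votes: hypothetical minimum number of agreeing votes needed to
    accept a letter; weaker positions are written as "?".
    """
    votes = defaultdict(Counter)
    for position, letter in identifications:
        votes[position][letter] += 1

    letters = []
    for position in sorted(votes):
        # Take the most frequently chosen letter at this position.
        letter, count = votes[position].most_common(1)[0]
        letters.append(letter if count >= min_votes else "?")
    return "".join(letters)


if __name__ == "__main__":
    clicks = [(0, "α"), (0, "α"), (0, "λ"), (1, "β"), (1, "β"), (2, "γ")]
    print(consensus_transcription(clicks))
```

Marking low-agreement positions instead of guessing keeps uncertain letters visible to whoever reviews the transcription afterwards.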
Improving Retrieval Efficacy of Homology Searches using the False Discovery Rate
In genetic sequence alignment, comparative accuracy of retrieval algorithms (e.g., BLAST) has been rigorously studied for improvement. Unlike most components of retrieval algorithms, the E-value threshold criterion has yet to be thoroughly investigated. An investigation of the threshold is important as it exclusively dictates which sequences are declared relevant and irrelevant. In this paper, we introduce the false discovery rate (FDR) statistic as a replacement for the uniform threshold criterion in order to improve efficacy in retrieval systems.
Carroll, H.D., Williams, A.C., Davis, A.G., & Spouge, J.L. (2014). Improving Retrieval Efficacy Using the False Discovery Rate. Transactions on Computational Biology and Bioinformatics.
Identification of Ancient Greek Papyrus Fragments Using Genetic Sequence Alignment Algorithms
A key task performed by papyrologists is determining if an unknown fragment belongs to a literary manuscript. In this paper, we introduce a novel methodology that uses modern genetic sequence alignment algorithms as a method for identifying Ancient Greek text fragments. This application will offer papyrologists and other professionals in the humanities the ability to rapidly identify severely damaged texts. This approach leverages a new form of non-contextual, multi-line text identification for the Greek language that can greatly accelerate the tedious task of transcription and identification.
Williams, A.C., Carroll, H.D., Wallin, J.F., Brusuelas, J., Fortson, L., Lamblin, A.-F., & Yu, H. (2014). Identification of Ancient Greek Papyrus Fragments Using Genetic Sequence Alignment Algorithms. Proceedings of the 1st Workshop on Digital Humanities and e-Science.
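To give a feel for how sequence alignment can match a fragment to a known text, here is a minimal sketch that swaps in plain Smith-Waterman local alignment over characters rather than the BLAST-style algorithms the paper refers to. The scoring values, the function names, and the unaccented toy reference strings are all assumptions for illustration.

```python
def local_alignment_score(query, reference, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between two strings.

    The match/mismatch/gap values are illustrative defaults, not values
    taken from the paper.
    """
    prev = [0] * (len(reference) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, r in enumerate(reference, start=1):
            diagonal = prev[j - 1] + (match if q == r else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def identify_fragment(fragment, manuscripts):
    """Return the name of the manuscript whose text aligns best with the fragment.

    manuscripts: dict mapping a (hypothetical) manuscript name to its text.
    """
    return max(
        manuscripts,
        key=lambda name: local_alignment_score(fragment, manuscripts[name]),
    )


if __name__ == "__main__":
    # Toy, unaccented reference strings used only for this example.
    known = {"Iliad": "μηνιν αειδε θεα", "Odyssey": "ανδρα μοι εννεπε"}
    print(identify_fragment("μηνιν αειδε", known))
```

Because local alignment scores only the best-matching region, a short or damaged fragment can still score well against a much longer reference text.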
False Discovery Rate for Homology Searches
While many different aspects of retrieval algorithms (e.g., BLAST) have been studied in depth, the method for determining the retrieval threshold has not enjoyed the same attention. Furthermore, with genetic databases growing rapidly, the challenges of multiple testing are escalating. In order to improve search sensitivity, we propose the use of the false discovery rate (FDR) as the method to control the number of irrelevant (“false positive”) sequences. In this paper, we introduce BLAST-FDR, an extended version of BLAST that uses an FDR method for the threshold criterion.
Carroll, H.D., Williams, A.C., Davis, A.G., & Spouge, J.L. (2013). False Discovery Rate for Homology Searches. Advances in Bioinformatics and Computational Biology, 194-201.
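The difference between a fixed threshold and an FDR-based one can be sketched with the Benjamini-Hochberg step-up procedure. This is an assumption made for illustration: the exact FDR method used in BLAST-FDR may differ, and the function names and the `alpha` default below are hypothetical. The E-value to p-value conversion uses the standard relation P = 1 - exp(-E).

```python
import math


def evalue_to_pvalue(evalue):
    """Convert an E-value to a p-value via P = 1 - exp(-E)."""
    return -math.expm1(-evalue)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices of hits accepted at FDR level alpha.

    Implements the Benjamini-Hochberg step-up rule: find the largest rank k
    with p_(k) <= alpha * k / m and accept the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


if __name__ == "__main__":
    evalues = [1e-30, 1e-5, 0.04, 0.9, 10.0]
    pvalues = [evalue_to_pvalue(e) for e in evalues]
    print(benjamini_hochberg(pvalues, alpha=0.05))
```

Unlike a uniform cutoff, the acceptance threshold here adapts to how many hits are being tested and how their p-values are distributed.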
Automated assessment of bilateral breast volume asymmetry as a breast cancer biomarker during mammographic screening
The biological concept of bilateral symmetry as a marker of developmental stability and good health is well established. The study suggests that automated assessment of global bilateral asymmetry could serve as a breast cancer risk biomarker for women undergoing mammographic screening. Such a biomarker could be used to alert radiologists or computer-assisted detection (CAD) systems to exercise increased vigilance if higher than normal cancer risk is suspected.
Williams, A.C., Hitt, A., Voisin, S., & Tourassi, G. (2013, March). Automated assessment of bilateral breast volume asymmetry as a breast cancer biomarker during mammographic screening. In SPIE Medical Imaging (pp. 86701A-86701A). International Society for Optics and Photonics.
University of Waterloo
This course provides an introduction to contemporary user interfaces, including the basics of human-computer interaction, the user interface design/evaluation process, and the architectures within which user interfaces are developed. Students implement and evaluate portions of typical user interfaces in a series of programming assignments.
While at Waterloo, I've also had the opportunity to serve as an Instructional Apprentice or a Teaching Assistant for the following courses:
- CS349: User Interfaces
- CS330: Management Information Systems
Middle Tennessee State University
CS3130: Introduction to Computer Architecture (Fall 2013, Spring 2014)
This course provides an introduction to assembly language programming, hardware components of digital computers, microprogramming, and memory management. The course is accompanied by a mandatory laboratory/tutorial with exercises that include assembly language programming and the design and implementation of computer architecture components, ranging from simple logic gates to complex ALUs.
CS1150: Computer Orientation (Spring 2015)
This course provides a general introduction to computers with an emphasis on personal computing, database, word processing, presentation graphics, spreadsheets, and Internet tools. Students are introduced and exposed to the Microsoft Office product suite.
Digital Humanities Summer Institute
Crowdsourcing as a Tool for Research (Summer 2016, Summer 2017)
This short workshop provides participants with a practical, concise introduction to crowdsourcing as a tool for research and public engagement, through a series of discussions and tutorials. The workshop is designed to educate and train scholars who have no formal training in technology. In the final stage of the workshop, participants build their own crowdsourcing projects on the CrowdCurio
In my free time, I play classical guitar. A few of my favorite musicians that draw inspiration from the classical guitar literature are Django Reinhardt,
When I lived in England, I became a Baron of the Principality of Sealand.