ºÚÁϳԹÏÍø±¬ÍøÕ¾

Past Honors Projects

Below is a sampling of past honors projects in computer science. A print version of each honors project is deposited in the Bowdoin College Library's Department of Special Collections and Archives.

2024

Darien Gillespie

Selective Procedural Content Generation Using Multi-Discriminator Generative Adversarial Networks

Advisor: David Byrd

This work expands upon the use of machine learning for procedural content generation in video game worlds. We base our work on World-GAN, which uses Generative Adversarial Network (GAN) architectures in conjunction with vector embedding representations of material types to generate diverse and realistic terrains in Minecraft. Our work builds upon these single-biome terrain chunks (forest, desert, plains, ocean, etc.) to generate continuous, transitional world chunks containing multiple biomes. To achieve this, we propose using multiple discriminators selectively across a generated sample, allowing the model to simultaneously learn different generative styles. We do this by applying an alpha layer mask to teach the generative model how to blend and divide between these multiple styles. We hope to expand this work in the future by allowing blending between more than two biomes and by extending these selective, multi-discriminator GANs to other fields, such as natural language processing.
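
A minimal sketch of the selective-discriminator idea described above, assuming a PyTorch setting. The networks, tensor shapes, and loss form are illustrative assumptions, not the thesis (World-GAN-based) implementation; the point is only how an alpha mask can weight two discriminators over different regions of one generated chunk.

```python
# Minimal sketch of selective multi-discriminator training with an alpha mask.
# Hypothetical shapes and networks; not the thesis code.
import torch
import torch.nn as nn

class PatchDisc(nn.Module):
    """Tiny patch discriminator: one real/fake score per spatial location."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, 3, padding=1))
    def forward(self, x):
        return self.net(x)

def selective_gan_loss(fake, disc_a, disc_b, alpha):
    """Blend two discriminators' scores with an alpha mask in [0, 1].

    alpha == 1 means "judge this region as biome A", alpha == 0 as biome B;
    intermediate values form the transition zone the generator must learn.
    """
    score_a = disc_a(fake)            # (N, 1, H, W) per-location scores
    score_b = disc_b(fake)
    blended = alpha * score_a + (1 - alpha) * score_b
    # Non-saturating generator loss on the blended score map.
    return nn.functional.softplus(-blended).mean()

# Toy usage: a 2-channel "material embedding" chunk with a left/right biome split.
fake_chunk = torch.randn(1, 2, 16, 16, requires_grad=True)
alpha_mask = torch.zeros(1, 1, 16, 16)
alpha_mask[..., :8] = 1.0             # left half: biome A, right half: biome B
loss = selective_gan_loss(fake_chunk, PatchDisc(2), PatchDisc(2), alpha_mask)
loss.backward()
```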

Cassandra Goldberg

Statistically Principled Deep Learning for SAR Image Segmentation

Advisor: Jeova Farias

This project explores novel approaches for Synthetic Aperture Radar (SAR)image segmentation that integrate established statistical properties of SAR into deep learning models. First, Perlin Noise and G0 sampling methods were utilized to generate a synthetic dataset that effectively captures the statistical attributes of SAR data. Subsequently, deep learning segmentation architectures were developed that utilize average pooling and 1x1 convolutions to perform statistical moment computations. Finally, supervised and unsupervised disparity-based losses were incorporated into model training. The experimental outcomes yielded promising results: the synthetic dataset effectively trained deep learning models for real SAR data segmentation, the statistically-informed architectures demonstrated comparable or superior performance to benchmark models, and the unsupervised disparity-based loss facilitated the delineation of regions within the SAR data. These findings indicate that employing statistically-informed deep learning methodologies could enhance SAR image analysis, with broader implications for various remote sensing applications and the general field of computer vision.
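
A short sketch of the "statistical moments from standard layers" idea mentioned above: local means and variances computed with average pooling and mixed by a 1x1 convolution. The layer below is a hypothetical illustration in PyTorch, not the thesis architecture.

```python
# Local mean and variance via average pooling, combined by a 1x1 convolution.
# Hypothetical layer, not the thesis architecture.
import torch
import torch.nn as nn

class LocalMoments(nn.Module):
    def __init__(self, channels, window=7):
        super().__init__()
        self.pool = nn.AvgPool2d(window, stride=1, padding=window // 2)
        # The 1x1 conv learns how to combine the per-channel moment maps.
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        mean = self.pool(x)                      # E[X] over each local window
        var = self.pool(x * x) - mean * mean     # E[X^2] - E[X]^2
        return self.mix(torch.cat([mean, var], dim=1))

features = LocalMoments(channels=1)(torch.rand(1, 1, 64, 64))  # synthetic SAR-like patch
```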

Leopold Spieler

Distance Based Pre-Clustering for Deep Time-Series Forecasting: A Data Selection Approach

Advisor: David Byrd

High-frequency financial time series forecasting is only lightly explored in the academic literature. Challenges arise from the nature of the data, which is noisy, voluminous, time-dependent, and sequential. This paper proposes a clustering framework that partitions such data before training deep learning models. We perform a comparative analysis using multiple distance-based clustering methods and time series-specific distance metrics to select training data for recurrent neural forecasting models. Evaluating our approach over a three-year period for three large-cap technology stocks, we find that models trained on the partitioned data achieve lower loss values and increased directional prediction accuracy compared to equivalent models trained without partitioning.
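
A minimal sketch of the pre-clustering-as-data-selection step, assuming plain Euclidean k-means over fixed-length return windows; the thesis compares several clustering methods and time-series-specific distance metrics, which are not reproduced here.

```python
# Sketch of distance-based pre-clustering as a data-selection step: group
# fixed-length return windows, then train one forecaster per cluster.
import numpy as np
from sklearn.cluster import KMeans

def make_windows(returns, length=32):
    """Slice a 1-D return series into overlapping training windows."""
    return np.array([returns[i:i + length]
                     for i in range(len(returns) - length)])

rng = np.random.default_rng(0)
returns = rng.normal(scale=1e-3, size=5_000)        # toy high-frequency returns
windows = make_windows(returns)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(windows)
partitions = {k: windows[kmeans.labels_ == k] for k in range(4)}
# Each partition would then train its own recurrent forecaster; at prediction
# time a new window is routed to the model of its nearest cluster centroid.
```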


2023

Evan Albers

Agent-Based Modeling of Asset Markets: A Study of Risks, Preferences, and Shocks

Advisors: Mohammad Irfan and Matthew J. Botsch

Artificial neural networks and other predictive models have grown to dominate computational finance. One recent application of such networks is the generation of synthetic market data for machine learning models. Unfortunately, such generative models have limited application when it comes to studying interventions and explaining behavioral outcomes. In this thesis, I study agent-based modeling, an established technique with a significant history in computational finance, to create market simulations that have analytic validity and can also provide insights into the functioning of asset markets. In particular, I evaluate the model with respect to several traditional financial economic theories like Tobin's separation theorem and the capital asset pricing model (CAPM). I also investigate the emergence of different roles played by the agents due to their risk preferences. Furthermore, I perform intervention studies like shocks and explain the outcomes using my model. Finally, I study the effects of noise trading and show that noisy agents converge to a different equilibrium point due to the difference in beliefs. Put together, this thesis presents an agent-based model of asset markets that can be used to study the effects of risks, preferences, and shocks at a systemic level, thereby connecting localized agent and asset characteristics to global or collective outcomes.
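
A toy sketch of the kind of agent-based market described above: mean-variance traders with heterogeneous risk aversion and beliefs, a market-clearing price, a payoff shock, and a handful of noise traders. All parameters and the demand form are illustrative assumptions; the thesis model is richer.

```python
# Toy agent-based market: price clears aggregate mean-variance demand.
import numpy as np

def clearing_price(beliefs, risk_aversion, variance, supply):
    """Solve sum_i (mu_i - p) / (a_i * var) = supply for the price p."""
    weights = 1.0 / (risk_aversion * variance)
    return (np.dot(weights, beliefs) - supply) / weights.sum()

rng = np.random.default_rng(1)
true_value = 100.0
risk_aversion = rng.uniform(1.0, 4.0, size=50)           # heterogeneous preferences
beliefs = true_value + rng.normal(0, 2.0, size=50)        # informed but noisy beliefs
beliefs[:10] += rng.normal(0, 10.0, size=10)              # 10 "noise traders"

p0 = clearing_price(beliefs, risk_aversion, variance=4.0, supply=10.0)
p1 = clearing_price(beliefs - 5.0, risk_aversion, variance=4.0, supply=10.0)  # payoff shock
print(f"pre-shock price {p0:.2f}, post-shock price {p1:.2f}")
```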

Lily Smith

A Quad-tree Based, Multi-Resolution Algorithm for Computing Viewsheds on Grid Terrains

Advisor: Laura Toma

Digital elevation models (DEMs) at high spatial resolution and high vertical accuracy are available from LiDAR technology. LiDAR DEMs can easily reach billions of points, and their use in modeling requires efficient algorithms that work well in practice. A common use of DEMs is the computation of visibility: given an arbitrary viewpoint v and a grid terrain, the viewshed of v is the set of all points on the terrain that are visible from v. This thesis describes QuadVS, an output-sensitive algorithm that aims to compute viewsheds more efficiently by using a multi-resolution technique. The idea is to divide the grid into layers of blocks around the viewpoint using a quadtree subdivision. The intuition behind this multi-resolution approach is that a lower resolution will often be sufficient to filter out blocks that are guaranteed to be invisible, and using high resolution only when necessary will lead to improved efficiency. QuadVS is backed by an elegant theoretical approach, is guaranteed to preserve full accuracy, and in our experiments was over 10 times faster than the previous fastest algorithm.
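
For context, a brute-force baseline viewshed (line-of-sight test per cell) is sketched below; this is the computation QuadVS accelerates, and the quadtree multi-resolution pruning itself is not shown. The nearest-neighbor ray sampling is an illustrative simplification.

```python
# Baseline viewshed: test the line of sight from the viewpoint to every cell.
import numpy as np

def viewshed(dem, vr, vc, eye_height=2.0):
    rows, cols = dem.shape
    vz = dem[vr, vc] + eye_height
    visible = np.zeros_like(dem, dtype=bool)
    visible[vr, vc] = True
    for r in range(rows):
        for c in range(cols):
            if (r, c) == (vr, vc):
                continue
            steps = max(abs(r - vr), abs(c - vc))
            blocked = False
            for s in range(1, steps):
                t = s / steps
                ir, ic = vr + t * (r - vr), vc + t * (c - vc)
                terrain = dem[int(round(ir)), int(round(ic))]
                sight = vz + t * (dem[r, c] - vz)   # height of the sight line at t
                if terrain > sight:
                    blocked = True
                    break
            visible[r, c] = not blocked
    return visible

dem = np.random.default_rng(2).uniform(0, 50, size=(60, 60))
print(viewshed(dem, 30, 30).sum(), "cells visible")
```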

Angus Zuklie

A Machine Learning Approach to Sector-Based Market Efficiency

Advisor: David Byrd

In economic circles, there is an idea that the increasing prevalence of algorithmic trading is improving the information efficiency of electronic stock markets. This project sought to test that theory computationally. If an algorithm can accurately forecast near-term equity prices using historical data, there must be predictive information present in the data. Changes in the predictive accuracy of such algorithms should therefore correlate with increasing or decreasing market efficiency.

Using advanced machine learning approaches, including dense neural networks, Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN) models, I treated intraday predictive precision as a proxy for market efficiency, allowing for basic comparisons of the weak-form efficiency of four sectors over the same time period: utilities, healthcare, technology, and energy. Within these sectors, I was able to detect inefficiencies in the stock market up to four years closer to the present day than previous studies.
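
A small sketch of the evaluation idea behind the proxy: out-of-sample directional accuracy near 50% is consistent with weak-form efficiency, while persistently higher accuracy suggests exploitable structure. The naive baseline and data below are placeholders, not the thesis models.

```python
# Directional prediction accuracy as a simple (in)efficiency proxy.
import numpy as np

def directional_accuracy(predicted_returns, realized_returns):
    """Fraction of periods where the forecast gets the sign of the move right."""
    return float(np.mean(np.sign(predicted_returns) == np.sign(realized_returns)))

rng = np.random.default_rng(3)
realized = rng.normal(size=1000)
naive_forecast = rng.normal(size=1000)            # uninformed baseline, ~0.5 accuracy
print(directional_accuracy(naive_forecast, realized))
```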


2022

Stephen Crawford

Outlier Detection in Energy Datasets

Advisor: Sean Barker 

In the past decade, numerous datasets have been released with the explicit goal of furthering non-intrusive load monitoring (NILM) research. NILM is an energy measurement strategy that seeks to disaggregate building-scale loads, decomposing the energy consumption of a building into that of its constituent appliances. NILM algorithms require representative real-world measurements, which has led institutions to publish and share their own datasets. NILM algorithms are designed, trained, and tested using the data presented in a small number of these NILM datasets. Many of the datasets contain arbitrarily selected devices. Likewise, the datasets themselves report aggregate load information from buildings that are similarly selected arbitrarily. This raises the question of the representativeness of the datasets themselves, as well as of the algorithms based on them. One way to judge the representativeness of NILM datasets is to look for the presence of outliers in these datasets. This paper presents a novel method of identifying outlier devices in NILM datasets. With this identification process, it becomes possible to mitigate and measure the impact of outliers. This represents an important consideration for the long-term deployment of NILM algorithms.
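
To make the task concrete, here is one generic way to flag outlier devices: summarize each device trace with a few features and apply an off-the-shelf outlier detector. The feature set, threshold, and detector are illustrative assumptions and not the thesis method.

```python
# Generic outlier-device flagging over per-device summary features.
import numpy as np
from sklearn.ensemble import IsolationForest

def device_features(trace):
    """Per-device summary: mean power, peak power, and fraction of time 'on'."""
    return [trace.mean(), trace.max(), float((trace > 10.0).mean())]

rng = np.random.default_rng(4)
traces = [rng.gamma(2.0, 30.0, size=1440) for _ in range(20)]   # typical devices
traces.append(rng.gamma(2.0, 300.0, size=1440))                 # anomalous device

X = np.array([device_features(t) for t in traces])
labels = IsolationForest(random_state=0).fit_predict(X)         # -1 marks outliers
print(np.where(labels == -1)[0])
```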

2021

Jack Beckitt-Marshall

Improving Energy Efficiency through Compiler Optimizations

Advisor: Sean Barker

Energy efficiency is becoming increasingly important for computation, especially in the context of the current climate crisis. The aim of this experiment was to see whether the compiler could reduce energy usage without rewriting the programs themselves. The experimental setup consisted of compiling programs with the Clang compiler under various sets of compiler flags, and then measuring energy usage and execution time on an AMD Ryzen processor. Three experiments were performed: a random exploration of compiler flags, utilization of SIMD, and benchmarking of real-world applications. It was found that the compiler was able to reduce execution time, especially when optimizing for the specific architecture, to a degree that depends on the program being compiled. Faster execution time tended to correlate with reduced energy usage as well, further suggesting that optimizing programs for speed and architecture is the most effective way of decreasing their overall energy usage.
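
A sketch of the experimental loop under stated assumptions: build one benchmark with different Clang flag sets and time it. The source file name and flag sets are placeholders, and energy readings (which are platform-specific counters) are omitted here.

```python
# Build a benchmark with different Clang flag sets and time each build's runs.
import subprocess
import time

FLAG_SETS = {
    "baseline": ["-O0"],
    "speed":    ["-O3"],
    "native":   ["-O3", "-march=native"],
}

def build_and_time(source="benchmark.c", runs=3):
    results = {}
    for name, flags in FLAG_SETS.items():
        subprocess.run(["clang", *flags, source, "-o", "bench"], check=True)
        start = time.perf_counter()
        for _ in range(runs):
            subprocess.run(["./bench"], check=True)
        results[name] = (time.perf_counter() - start) / runs   # mean seconds per run
    return results

if __name__ == "__main__":
    print(build_and_time())
```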

Kim Hancock

Cascades and Overexposure in Networks

Advisor: Mohammad Irfan

No description available.

Liam R. Juskevice

The Congressional Database: Designing a Web Application Using an HCI Approach

Advisor: Mohammad Irfan

The activities of the United States Senate are a topic of interest for researchers and concerned members of the public alike. Websites such as GovTrack and Congress.gov allow people to research specific bills among many other offerings. However, they have significant weaknesses regarding their ease of use and the way they organize and store data. The Congressional Database Project aims to provide an intuitive user experience navigating government data while storing the data in a consistent database. We approach this project from an HCI perspective in order to determine the best ways to improve the user experience. We have conducted a qualitative user study to test the effectiveness of our design and identify potential areas of improvement. This paper provides an in-depth overview of the design of the Congressional Database on the front end and back end. It then explains the methodology of our user study and discusses the implications of its findings.

Yuto Yagi

A Comparative Study of Equilibria Computation in Graphical Polymatrix Games

Advisor: Mohammad Irfan

Graphical games are often used to model human networks such as social networks. In this project, we focus on one specific type of graphical game known as polymatrix games. In a polymatrix game, a player's payoff can be additively decomposed into payoffs coming from its network neighbors. It is well known that computing Nash equilibria, the main solution concept in game theory, is a provably hard problem in polymatrix games. Due to this, we focus on special graph structures like paths and trees. We compare several equilibrium computation algorithms at an implementation level. The two main algorithms compared are a fully polynomial-time approximation scheme (FPTAS) by Ortiz and Irfan [2017] and an algorithm by Kearns et al. [2001]. We evaluate the applicability of these algorithms based on the size of the network and the accuracy level desired.
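
A small sketch of the polymatrix payoff structure on a path graph: each edge carries its own payoff matrix, and a player's total payoff is the sum over its neighbors. The best-response sweep below is a simple heuristic for illustration, not the FPTAS or the Kearns et al. algorithm compared in the thesis.

```python
# Polymatrix payoffs on a path graph, with naive best-response dynamics.
import numpy as np

rng = np.random.default_rng(5)
n, actions = 6, 2
# edge_payoffs[(i, j)][a_i, a_j] = payoff to player i from neighbor j
edges = [(i, i + 1) for i in range(n - 1)]
edge_payoffs = {}
for i, j in edges:
    edge_payoffs[(i, j)] = rng.uniform(size=(actions, actions))
    edge_payoffs[(j, i)] = rng.uniform(size=(actions, actions))

def payoff(player, action, profile):
    """Additive decomposition: sum the edge payoffs over the player's neighbors."""
    return sum(edge_payoffs[(player, j)][action, profile[j]]
               for j in range(n) if (player, j) in edge_payoffs)

profile = [0] * n
for _ in range(20):                      # best-response sweeps
    for i in range(n):
        profile[i] = int(np.argmax([payoff(i, a, profile) for a in range(actions)]))
print(profile)
```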


2020

Dylan Hayton-Ruffner

Word Embedding Driven Concept Detection in Philosophical Corpora

Advisor: Fernando Nascimento

During the course of research, scholars often explore large textual databases for segments of text relevant to their conceptual analyses. This study proposes, develops, and evaluates two algorithms for automated concept detection in theoretical corpora: ACS and WMD Retrieval. Both novel algorithms are compared to keyword search, using a test set from the Digital Ricoeur corpus tagged by scholarly experts. WMD Retrieval outperforms keyword search on the concept detection task. Thus, WMD Retrieval is a promising tool for concept detection and information retrieval systems focused on theoretical corpora.
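
A toy sketch of embedding-driven retrieval in the spirit of WMD Retrieval: rank corpus segments by a relaxed Word Mover's Distance (average distance from each query word to its nearest document word). The two-dimensional vectors and segment names are hypothetical stand-ins for trained word embeddings and the real corpus.

```python
# Rank segments by a relaxed Word Mover's Distance over toy embeddings.
import numpy as np

embeddings = {                      # hypothetical 2-D embeddings for illustration
    "memory":       np.array([0.90, 0.10]),
    "forgetting":   np.array([0.80, 0.20]),
    "recollection": np.array([0.85, 0.15]),
    "ethics":       np.array([0.10, 0.90]),
    "justice":      np.array([0.15, 0.85]),
}

def relaxed_wmd(query, doc):
    """Average distance from each query word to its closest document word."""
    dists = []
    for q in query:
        best = min(np.linalg.norm(embeddings[q] - embeddings[d]) for d in doc)
        dists.append(best)
    return float(np.mean(dists))

corpus = {
    "seg1": ["memory", "forgetting"],
    "seg2": ["ethics", "justice"],
}
query = ["recollection"]
ranking = sorted(corpus, key=lambda s: relaxed_wmd(query, corpus[s]))
print(ranking)      # the memory-related segment ranks first
```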

Dani Paul Hove

Virtual Reality Accessibility with Predictive Trails

Advisor: Sarah Harmon

Comfortable locomotion in VR is an evolving problem. Given the high probability of vestibular-visual disconnect and subsequent simulator sickness, new users face an uphill battle in adjusting to the technology. While natural locomotion offers the lowest chance of simulator sickness, its space, economic, and accessibility barriers limit its effectiveness for a wider audience. Software-enabled locomotion circumvents many of these barriers but has the greatest need for simulator sickness mitigation. This is especially true for standing VR experiences, where sex-biased differences in mitigation effectiveness are amplified (postural instability due to vection disproportionately affects women).

Predictive trails were developed as a shareable Unity module to address some of the gaps in current mitigation methods. Predictive trails use navigation meshes and pathfinding to plot the user's available path according to their direction of vection. The more prominent software methods each face distinct problems: vignetting, while largely effective, restricts the user's field of vision (FoV), which, in prolonged scenarios, has been shown to disproportionately lower women's navigational ability; virtual noses, while effective without introducing FoV restrictions, require commercial licensing for use.

Early testing of predictive trails proved effective for the principal investigator, but a wider user study, though approved, could not be carried out due to the global health crisis. Because that study was planned around a seated experience, further study is required into the corresponding sex-biased effects on a standing VR experience. Additional investigation into performance is also required.

Luca Ostertag-Hill

Ideal Point Models with Social Interactions Applied to Spheres of Legislation

Advisor: Mohammad Irfan

We apply a recent game-theoretic model of joint action prediction to the congressional setting. The model, known as the ideal point model with social interactions, has been shown to be effective in modeling the strategic interactions of senators. In this project, we apply ideal point models with social interactions to different spheres of legislation. We first use a machine learning algorithm to learn ideal point models with social interactions for individual spheres using congressional roll call data and the subject codes of bills. Then, for a given polarity value of a bill, we compute the set of Nash equilibria. We use the Nash equilibrium predictions to compute a set of most influential senators. Our analysis is based on three components: the learned models, the sets of Nash equilibria, and the sets of most influential senators. We systematically study how the ideal points of senators change across spheres of legislation. We also study how the most influential senators, that is, a group of senators that can influence others to achieve a desirable outcome, change depending on the polarity of the desirable outcome as well as the sphere of legislation. Furthermore, we take a closer look at intra-party and inter-party interactions for different spheres of legislation and how these interactions change depending on whether or not we model the contextual parameters. Finally, we show how probabilistic graphical models can be used to extend the computational framework.
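
A toy sketch of the joint-vote prediction idea: each senator weighs alignment between their ideal point and the bill's polarity against social influence from peers, and a pure-strategy equilibrium is sought by best-response dynamics. The payoff form, weights, and data are illustrative assumptions, not the learned thesis model.

```python
# Ideal points plus social influence, with best-response dynamics over votes.
import numpy as np

rng = np.random.default_rng(6)
n = 8
ideal = rng.uniform(-1, 1, size=n)               # ideological ideal points
W = rng.uniform(0, 0.3, size=(n, n))             # pairwise influence weights
np.fill_diagonal(W, 0)
polarity = 0.4                                   # polarity of the bill

def utility(i, vote, votes):
    ideological = vote * ideal[i] * polarity          # alignment with the bill
    social = sum(W[i, j] for j in range(n) if votes[j] == vote)
    return ideological + social

votes = np.ones(n, dtype=int)                    # start from all "yes" (+1)
for _ in range(20):                              # best-response sweeps
    for i in range(n):
        votes[i] = max((+1, -1), key=lambda v: utility(i, v, votes))
print(votes)
```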


2019

Kevin Fakai Chen

GEM-PSO: Particle Swarm Optimization Guided by Enhanced Memory

Advisor: Stephen Majercik

Particle Swarm Optimization (PSO) is a widely-used nature-inspired optimization technique in which a swarm of virtual particles works together with limited communication to find a global optimum. PSO has been successfully applied to a wide variety of practical problems, such as optimization in engineering fields, hybridization with other nature-inspired algorithms, and general optimization problems. However, PSO suffers from a phenomenon known as premature convergence, in which the algorithm's particles all converge on a local optimum instead of the global optimum and cannot improve their solution any further. We seek to improve upon the standard PSO algorithm by mitigating this premature convergence behavior. We do so by storing and exploiting increased information in the form of past bests, which we term enhanced memory. We introduce three types of modifications to each new algorithm, which we call a GEM-PSO (Particle Swarm Optimization Guided by Enhanced Memory) because our modifications all deal with enhancing the memory of each particle: procedures for saving a found best, for removing a best from memory when a new one is to be added, and for selecting one or more bests to be used from those saved in memory. By using different combinations of these modifications, we can create many variants of GEM-PSO with a wide variety of behaviors and qualities. We analyze the performance of GEM-PSO, discuss the impact of PSO's parameters on the algorithms' performance, isolate different modifications in order to closely study their impact on the performance of any given GEM-PSO variant, and look at how multiple modifications perform together. Finally, we draw conclusions about the efficacy and potential of GEM-PSO variants and provide ideas for further exploration in this area of study. Many GEM-PSO variants are able to consistently outperform standard PSO on specific functions, and GEM-PSO variants show promise for both general and specific use cases.
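
A minimal PSO sketch with an "enhanced memory" twist: each particle keeps a short history of personal bests and is attracted to one sampled from that memory. The saving, removal, and selection rules below are simplified placeholders for the GEM-PSO variants, and the test function, coefficients, and memory size are assumptions.

```python
# PSO on the sphere function, with a per-particle memory of past bests.
import numpy as np

def sphere(x):
    return float(np.sum(x * x))

rng = np.random.default_rng(7)
dim, swarm, iters, mem_size = 5, 20, 200, 3
pos = rng.uniform(-5, 5, (swarm, dim))
vel = np.zeros((swarm, dim))
memory = [[pos[i].copy()] for i in range(swarm)]       # per-particle best history
gbest = min((p.copy() for p in pos), key=sphere)

for _ in range(iters):
    for i in range(swarm):
        pbest = memory[i][rng.integers(len(memory[i]))]  # select a remembered best
        r1, r2 = rng.random(dim), rng.random(dim)
        vel[i] = 0.7 * vel[i] + 1.5 * r1 * (pbest - pos[i]) + 1.5 * r2 * (gbest - pos[i])
        pos[i] += vel[i]
        if sphere(pos[i]) < min(sphere(m) for m in memory[i]):
            memory[i].append(pos[i].copy())              # save the new found best
            memory[i] = memory[i][-mem_size:]            # remove the oldest best
        if sphere(pos[i]) < sphere(gbest):
            gbest = pos[i].copy()
print(sphere(gbest))
```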

James I. Little

Teaching Computers to Teach Themselves: Synthesizing Training Data based on Human-Perceived Elements

Advisor: Eric Chown

Isolation-Based Scene Generation (IBSG) is a process for creating synthetic datasets made to train machine learning detectors and classifiers. In this project, we formalize the IBSG process and describe the scenarios (object detection and object classification given audio or image input) in which it can be useful. We then look at the Stanford Street View House Numbers (SVHN) dataset and build several different IBSG training datasets based on existing SVHN data. We try to improve the compositing algorithm used to build the IBSG dataset so that models trained with synthetic data perform as well as models trained with the original SVHN training dataset. We find that the SVHN datasets that perform best are composited from isolations extracted from existing training data, leading us to suggest that IBSG be used in situations where a researcher wants to train a model with only a small amount of real, unlabeled training data.
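
A short sketch of the compositing step in an IBSG-style pipeline: alpha-blend an isolated foreground (for example, a digit crop with an alpha mask) onto a background texture at a random location, yielding a synthetic image with a bounding-box label. The array shapes and stand-in "texture" and "digit" are illustrative, not the thesis compositing algorithm.

```python
# Paste an isolated foreground onto a background texture with alpha blending.
import numpy as np

def composite(background, foreground, alpha, rng):
    """Alpha-blend a small foreground patch onto a copy of the background."""
    out = background.copy()
    fh, fw = foreground.shape[:2]
    y = rng.integers(0, background.shape[0] - fh)
    x = rng.integers(0, background.shape[1] - fw)
    region = out[y:y + fh, x:x + fw].astype(float)
    blended = alpha[..., None] * foreground + (1 - alpha[..., None]) * region
    out[y:y + fh, x:x + fw] = blended.astype(background.dtype)
    return out, (x, y, fw, fh)                    # image plus bounding-box label

rng = np.random.default_rng(8)
brick = rng.integers(60, 120, size=(64, 64, 3), dtype=np.uint8)   # stand-in texture
digit = np.full((16, 12, 3), 255, dtype=np.uint8)                 # stand-in isolation
mask = np.ones((16, 12))                                          # opaque alpha
image, bbox = composite(brick, digit, mask, rng)
print(bbox)
```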
