Current Projects | College of Information Sciences and Technology

Current Projects

Faculty Researcher: Dinghao Wu
Sponsoring Agency: National Science Foundation

A major obstacle to binary-code-based retrofitting is the immaturity of reverse engineering tools. Current approaches to retrofitting legacy software systems, which are mostly based on binary code patching, have a number of drawbacks, including performance overhead and security issues. To the best of our knowledge, there are no binary reverse engineering tools that can disassemble a binary executable into assembly code that can be reassembled in a fully automated manner. This limitation has severely restricted the application of reverse engineering techniques to legacy software retrofitting. Further, the analysis and transformation tools and ecosystems are disconnected and fragmented. Connecting the dots between the tools, infrastructures, and ecosystems will have a great impact on software analysis and retrofitting.

To fill this gap, we are considering a radically different approach that places recompilability as the first and topmost goal. We will further develop our preliminary study on Reassembleable Disassembling, with a similar design goal: preserving recompilability while lifting the code to higher-level languages or intermediate representations. The proposed reverse engineering technology can help augment legacy software systems with modern security mechanisms, allowing us to address a problem space that was previously intractable.

Faculty Researcher: Dongwon Lee
Sponsoring Agency: Samsung Advanced Institute of Technology

The Social User Mining (SUM) project aims to develop novel algorithmic solutions, working prototypes, and innovative applications for mining "publicly-available" social media data from user profiles. (No illegally crawled or scraped data violating users' privacy or terms-of-service of social network sites will be used.) We aim to mine and automatically discover interesting information about social users including demographics, profile, temporal pattern, and spatial pattern, which bear important and practical implications in real settings.

In the SUM project, we will work to answer the following questions: (1) What is the technical landscape and solution space of the problem in general? (2) How can we create good-quality ground-truth data sets for the various kinds of profile information? (3) Which social media data are most useful for discovering user profile information? (4) How can we combine different social media data to improve the performance of the overall solution? (5) What is the holistic framework? (6) What are effective and scalable data mining solutions for mining such social user information?

Faculty Researcher: Zhenhui (Jessie) Li
Sponsoring Agency: University of Illinois

Research suggests that the environment influences lifestyle behaviors and contributes to racial and socioeconomic inequities in health. Overall, however, results are inconsistent and effect sizes are small. One potential explanation is measurement error in environmental exposures. Although many individuals spend considerable time outside their immediate neighborhood, most research has measured environmental exposures solely in a respondent’s residential space and has failed to take daily mobility into account.

This research will examine the joint spatial and temporal stability of activity spaces (AS) derived from GPS data in a racially, ethnically, and socioeconomically diverse sample of adults. Specifically, we aim to (1) determine the optimal number and combination of days of GPS tracking needed to represent an individual’s AS by testing the temporal stability within a person across days, types of days, and between weeks; (2) determine the sufficiency of using only one time period for GPS tracking by examining the temporal stability of an individual’s AS within and between seasons; and (3) compare participants’ AS derived from GPS tracking with those derived from questionnaires, based on AS overlap, size, and environmental attributes.
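To make "AS overlap" and "size" concrete, the sketch below shows one simple way they could be computed. It is a minimal illustration under stated assumptions, not the project's method: it assumes an activity space is the set of grid cells a person's GPS fixes fall into, that overlap is the Jaccard index of two such sets, and that day-to-day stability is the average overlap across all pairs of tracked days. The cell size and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical grid resolution in decimal degrees; not specified in the source.
CELL_SIZE = 0.01

def activity_space(points, cell_size=CELL_SIZE):
    """Assumed AS definition: the set of grid cells visited by (lat, lon) GPS fixes."""
    return {(int(lat // cell_size), int(lon // cell_size)) for lat, lon in points}

def overlap(a, b):
    """Jaccard overlap between two activity spaces (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_overlap(days):
    """Average overlap over every pair of tracked days: a simple stability score."""
    spaces = [activity_space(day) for day in days]
    pairs = list(combinations(spaces, 2))
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)

# Toy GPS traces for two days (hypothetical coordinates).
monday = [(40.005, -77.005), (40.015, -77.005)]
tuesday = [(40.005, -77.005)]
print(len(activity_space(monday)))  # AS size, measured in grid cells
print(overlap(activity_space(monday), activity_space(tuesday)))
print(mean_pairwise_overlap([monday, tuesday, tuesday]))
```

With a stability score like this, one could ask how many tracking days are needed before adding another day barely changes the estimate; the grid-cell definition is only one of several possible AS constructions.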


Research Tags: Big Data
Faculty Researcher: Peng Liu
Sponsoring Agency: George Mason University

Today’s cyberdefenses are largely static. They are governed by slow deliberative processes involving testing, security patch deployment, and human-in-the-loop monitoring. As a result, adversaries can systematically probe target networks, pre-plan their attacks, and ultimately persist for long periods inside compromised networks and hosts. A new class of technologies, called Adaptive Cyber Defense (ACD), is being developed that presents adversaries with optimally changing attack surfaces and system configurations, forcing adversaries to continually re-assess and re-plan their cyberoperations. Although these approaches (e.g., moving target defense, dynamic diversity, and bio-inspired defense) are promising, they assume stationary and stochastic, but non-adversarial, environments. This research aims to build the scientific foundations so that system resiliency and robustness in adversarial settings can be thoroughly defined, quantified, measured, and extrapolated in a rigorous and reliable manner.

Faculty Researcher: David Reitter
Sponsoring Agency: National Science Foundation

Language use in real-world dialogue happens in context; linguistic choices depend on previous ones. For example, chosen words and sentence structures tend to mirror what was used previously by a conversation partner, a process known as "alignment." Alignment appears to help people understand each other in dialogue, and it seems to extend to human-computer interfaces, too. The concrete functions of alignment in dialogue, however, are unclear.

The project will devise computational models that describe and quantify alignment and language change in natural-language dialogue. These models can then be used to detect alignment and language change in actual language use, such as in web forums. The computational models will explain and predict these processes in a way that makes them exploitable in modern social networks as well as for data science. The outcomes of the project may point to novel methods of prioritizing and filtering the most helpful content, and may address the quality of life and well-being of patients, such as members of the peer-support community whose conversations were studied in the investigator's work motivating the proposal.

Faculty Researcher: Vasant Honavar
Sponsoring Agency: Concurrent Technologies Corporation

The principal investigator of this project will provide subject matter expertise in artificial intelligence and thought leadership related to the development of the Analyst Virtual Assistant (AVA) and Cognitive Agents capabilities for the Boosting Innovation GEOINT Topic 2: Artificial Intelligence Automation Broad Agency Announcement. The project will provide research support in artificial intelligence, machine learning, and intelligent agents, including but not limited to effective methods for classifying actors in heterogeneous and dynamic social networks (node labeling) and for link prediction.
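Node labeling and link prediction can be illustrated with two classic baselines, sketched below. This is a minimal example, assuming an undirected graph stored as a dictionary of neighbor sets; the common-neighbors score and neighbor majority vote are standard textbook heuristics chosen for illustration, not the methods developed in this project, and the node names and labels are hypothetical.

```python
from collections import Counter

def common_neighbors(graph, u, v):
    """Number of neighbors shared by u and v (a classic link-prediction score)."""
    return len(graph[u] & graph[v])

def predict_links(graph, k):
    """Rank currently unlinked node pairs by common-neighbor count; return the top k."""
    nodes = sorted(graph)
    scored = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v not in graph[u]:
                scored.append((common_neighbors(graph, u, v), u, v))
    # Highest score first; break ties alphabetically for a deterministic result.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(u, v) for _, u, v in scored[:k]]

def label_node(graph, labels, node):
    """Assign the most common label among already-labeled neighbors (None if none)."""
    votes = Counter(labels[n] for n in graph[node] if n in labels)
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical toy network of four actors.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
print(predict_links(graph, 1))
print(label_node(graph, {"b": "analyst", "c": "analyst", "d": "vendor"}, "a"))
```

In heterogeneous, dynamic networks these static heuristics are only a starting point, since node types and edges change over time.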

Faculty Researcher: Andrea Tapia, Anna Squicciarini
Sponsoring Agency: National Science Foundation

The goal of this project is to develop means to improve the quality and use of information in emergency response, increasing the value of messaging and microblogged data from crowds of non-professional participants during disasters. Despite evidence of strong value to those experiencing a disaster and those seeking information about it, there has been very little effort to detect the relevance and veracity of messages in social media streams. The problem of data verification is one of the largest problems confronting emergency-response organizations contemplating the use of social media data. This research directly addresses this known problem by developing methods to measure relevant and verifiable information. The results of this research will have a direct pipeline to organizations involved in emergency response. The research therefore has the potential to help organizations that respond to emergencies make use of large amounts of citizen-produced data, which in turn may improve the speed, quality, and efficiency of emergency response, leading to better support for those who need it and more lives saved.

This research will contribute to the field of Emergency and Disaster Studies by mapping the key decisions made during an emergency response and the information needs, type, form, and flow at those decision points, and, most importantly, by assessing data quality and verifiability standards for each. It will also investigate relevant and verifiable identifiers (or features), assign them weights, incorporate these into an analytical framework, and use the results of the analysis as input to scalable computational models. The work will design algorithms that can estimate the relevance and veracity of messages in a high-volume stream of short text messages. Given the diverse backgrounds of the team, it will contribute to the use and development of socio-technical systems theory to analyze the integration of technical and social systems. The output of the models will match the needs of responding organizations.

Faculty Researcher: C. Lee Giles, Prasenjit Mitra
Sponsoring Agency: National Science Foundation

This project draws together a diverse interdisciplinary team of researchers to create a new training program in Social Data Analytics, aimed at producing a new type of scientist capable of meeting emerging big data challenges.

In response to massive new sources of data, "data science" and "analytics" are emerging as new fields of inquiry, merging statistics, computer science, and visualization. The greatest challenges and opportunities arise from socially-generated big data, observed as a result of human interactions that are increasingly recorded via web, mobile device, and distributed sensors, or revealed through the digitization of historical records. Society faces a transformative data deluge, from which new scientific, economic, and social value can be extracted. This project includes a new curriculum, training in advanced technologies of data science and analytics, a series of research rotations in both academic and non-academic settings, and a challenge mechanism under which interdisciplinary teams compete to innovate solutions to real social data analytics problems. Further, this project expands the participation of underrepresented groups in data science, by combining an exciting new field with a focus on diversity as a research theme.

Faculty Researcher: Dinghao Wu
Sponsoring Agency: National Science Foundation and the Office of Naval Research

Reverse engineering has many important applications in computer security, one of which is retrofitting software for safety and security hardening when source code is not available. However, no existing tool is able to disassemble executable binaries into assembly code that can be correctly reassembled in a fully automated manner. Previous efforts have tried to work around this limitation by patching executables or adding duplicated code sections, which is not only inefficient but also cumbersome, and restricts which retrofitting techniques can be applied. Our research is working toward a tool that can disassemble executables to the extent that the generated code can be assembled back into working binaries without manual effort.
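One reason reassembly is hard is that a disassembler must decide which numeric operands are really addresses (and so must become symbolic labels that survive relocation) and which are plain constants. The toy Python sketch below assumes a simple range check against section boundaries; the section map, the (mnemonic, operand) encoding, and the label naming scheme are all hypothetical, and real binaries require far more careful analysis than this.

```python
# Hypothetical section map: name -> (start address, end address), half-open range.
SECTIONS = {".text": (0x8048000, 0x8049000), ".data": (0x804A000, 0x804B000)}

def symbolize_operand(operand, sections=SECTIONS):
    """Turn an immediate that points into a known section into a symbolic label."""
    if isinstance(operand, int):
        for name, (start, end) in sections.items():
            if start <= operand < end:
                # Treated as an address: emit a relocatable symbol instead of a number.
                return f"{name.lstrip('.')}_{operand:x}"
        return hex(operand)  # Treated as a plain integer constant.
    return operand  # Registers and other non-numeric operands pass through unchanged.

def reassemblable_listing(instructions):
    """Render (mnemonic, operand) pairs as assembly text with symbolized operands."""
    return [f"{mnemonic} {symbolize_operand(operand)}" for mnemonic, operand in instructions]

code = [("push", 0x804A010), ("push", 0x10), ("call", 0x8048400)]
for line in reassemblable_listing(code):
    print(line)
```

The heuristic's weakness is visible in its design: an integer constant that happens to fall inside a section's range would be wrongly turned into a label, and a full tool would also have to emit the matching label definitions so the output actually reassembles.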


Research Tags: Network Science, Security and Privacy, Reverse Engineering, Binary Code Analysis, Software Retrofitting, Software Analysis, Software Security, Program Analysis
Faculty Researcher: Vasant Honavar
Sponsoring Agency: National Science Foundation

Because of their potential for massive parallelism and their tolerance of faults and noise, artificial neural networks offer an attractive approach to the design of associative memories, language processors, and trainable pattern classifiers. Evolutionary algorithms offer a powerful means of exploring large search spaces for solutions that optimize multiple objectives. Against this background, we explored several closely related topics in biologically inspired algorithms and architectures for knowledge representation and language processing.


Research Tags: Artificial Intelligence, Big Data, Health Informatics and Bioinformatics
Faculty Researcher: John Yen, Peng Liu, Vasant Honavar
Sponsoring Agency: College of Information Sciences and Technology

Cyberanalysts perform complex tasks in analyzing data regarding potential cyberattacks; however, our understanding of their fine-grained cognitive processes, and of how these processes evolve through training, is rather limited. The goals of this research project are twofold. First, we aim to design cognitive tasks that not only reflect some of the complexity of real-world intrusion detection tasks but are also suitable for fMRI studies. Second, we aim to apply brain network analysis and machine learning methods to brain imaging data of cyberanalysts from fMRI and EEG, for predictive modeling of the intrusion detection performance of cyberanalysts. The results of this research can provide initial evidence for larger-scale brain studies aimed at improving our understanding of the cognitive processes underlying complex decision-making tasks.


Research Tags: Neural Science, Brain Network, Cybersecurity, Cognitive Science, Big Data, Machine Learning
Faculty Researcher: Dongwon Lee
Sponsoring Agency: National Science Foundation

The significance of this project lies in the introduction of big data analytics into the education landscape. There is increasing demand for skilled personnel in big data industries, but existing big data curricula at the university level focus primarily on students with a strong computational background. They ignore a large segment of students who might otherwise pursue education and training in this vital area and who will nonetheless face big data issues in the workplace.

This project aims to address the national demand for professionals with knowledge of big data and to broaden the pool for a big data analytics workforce. Part of this effort will involve researching whether the newly developed learning modules are more effective at increasing students' big data competencies, skills, and analysis abilities. Specifically, this project aims to develop three innovative learning modules, which will be designed to (1) use both group-based and contextualized learning methods and (2) be applicable and accessible to students majoring in disciplines outside of, but related to, mainstream computer science.

Faculty Researcher: Dinghao Wu
Sponsoring Agency: National Science Foundation

Binary code analysis is very attractive from a security viewpoint. First, in many tasks, such as malware analysis, software plagiarism detection, and vulnerability exploration, the source code of the program under examination is often absent, and the analysis has to be done on binary code. Second, even if the source code is available, binary analysis allows us to reason about the actual instructions executed on hardware and avoid the well-known “What You See Is Not What You Execute” problem. Third, some program behaviors, such as cache accesses, are exhibited only in low-level code.

Binary code analysis faces an increasing challenge from emerging, readily available code obfuscation techniques. Traditional signature-based malware detection is often problematic, as it relies on file hashes and byte (or instruction) signatures, which are not very resilient to obfuscation. This project tackles the challenge by proposing several advanced methods that combine techniques from behavioral and semantic perspectives. The proposed methods leverage formal program semantics, symbolic execution, automated constraint solving, and algorithmic memorization of code semantics, which form solid foundations with rigorous resilience properties against the latest attacks.
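The fragility of hash-based signatures is easy to demonstrate. The sketch below is a toy, assuming a signature is simply a SHA-256 hash over a textual instruction listing (a stand-in for a file or byte hash) and that the obfuscator only inserts `nop` instructions; the listings and the `strip_nops` normalization are hypothetical and are not the semantics-based methods this project proposes.

```python
import hashlib

def signature(instructions):
    """Hash-based signature over an instruction listing (stand-in for a byte hash)."""
    return hashlib.sha256("\n".join(instructions).encode("utf-8")).hexdigest()

def strip_nops(instructions):
    """Toy normalization: drop no-op instructions before hashing."""
    return [ins for ins in instructions if ins.strip() != "nop"]

original = ["push rbp", "mov rbp, rsp", "xor eax, eax", "pop rbp", "ret"]
# Same behavior, but with junk no-ops inserted, as simple obfuscators do.
obfuscated = ["push rbp", "nop", "mov rbp, rsp", "xor eax, eax", "nop", "pop rbp", "ret"]

print(signature(original) == signature(obfuscated))              # exact hashes no longer match
print(signature(strip_nops(obfuscated)) == signature(original))  # match again after normalization
```

Normalization of this kind defeats only the one trick it anticipates: replacing `xor eax, eax` with the equivalent `mov eax, 0` would break the signature again. Handling such substitutions in general is what motivates reasoning about code semantics, for example with symbolic execution and constraint solving.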


Research Tags: Binary Diffing, Malware Analysis, Metamorphic Malware, Symbolic Execution, Weakest Pre-condition, Constraint Solving, Taint Analysis, Side-channel, Software Analysis, Software Security, Program Analysis
Faculty Researcher: Zhenhui (Jessie) Li
Sponsoring Agency: National Science Foundation

According to the 2010 U.S. Census, about 80.7% of the U.S. population lives in urban areas. Urbanization has modernized people's lives but has also generated many urban issues, such as traffic congestion, air pollution, and challenges in health, education, and quality of life. Meanwhile, with rapid progress in sensing technologies and widely used digital documentation, an increasing amount of urban data is being accumulated in digital form, including human traces, traffic, air quality, local events, vehicle collisions, noise reports, and much more. Many U.S. cities (e.g., New York City, Chicago, and Los Angeles) have joined the open data initiative and created websites to release city data to the public. Such big data carries rich knowledge about a city and could empower us to address many critical urban challenges. This project develops novel data mining techniques to help people uncover the complex correlations in big urban data.


Research Tags: Big Data
Faculty Researcher: John M. Carroll
Sponsoring Agency: National Science Foundation

This is a collaborative project also involving researchers from Carnegie-Mellon University and Xerox Palo Alto Research Center; it expands a previous NSF award in which we developed and investigated the first mobile computing infrastructure for timebanking. Timebanking is an activist movement and altruistic service exchange system in which time is voluntarily exchanged for services among local community members. This project investigates adding computational support for context awareness to timebanking infrastructures, that is, helping people take into account their own location, navigational trajectories, preferences, and current plans as they engage in time-based exchanges. We also are investigating coproduction-based systems in which services are mutually, interdependently and often reciprocally produced, producing “free” social goods.

Faculty Researcher: Anna Squicciarini
Sponsoring Agency: Texas A&M University

Social media enables the rapid harnessing and amplification of user interest, captivating the attention of huge numbers of users. However, because interest may quickly coalesce and then collectively focus on a particular phenomenon, new threats with potentially far-reaching consequences are emerging. These threats, also referred to as collective attention threats, involve manipulation of users’ opinions, fast spread of various forms of malware, and sharing of false information, amplified via manipulation of collective attention. In contrast with traditional Internet and computer threats (such as malware), in which users are targeted by agents and lured into taking some action, in collective attention threats users themselves are unwitting accomplices to the spread, infection rate, and success of the threat.

This project aims to develop the framework, algorithms, and systems for detecting, analyzing, modeling, and defending against emergent collective attention threats in large-scale social systems by (1) creating new explanatory socio-behavioral models of the dynamics of collective attention in tandem with the inherent threats against collective attention; (2) operationalizing our descriptive, data-driven models to provide predictive capabilities for emerging collective attention threats, rigorous stress-testing, parameter sensitivity analysis, and what-if analysis, all grounded in social-behavioral theories of collective attention; and (3) developing, deploying, testing, and refining a suite of threat awareness analytics to serve as a prototype for a collective attention early-warning system.

Faculty Researcher: Peter Forster
Sponsoring Agency:

This project focuses on engaging government entities and civil society through a series of tabletop exercises to build capacity in countering terrorism and extremism. Some direct expenses are covered by the US Department of Defense but there is no funding to cover personnel time.


Research Tags: Counterterrorism, Counter Violent Extremism, Foreign Terrorist Fighters, NATO, Returning Terrorist Fighters, Tabletop Exercises
Faculty Researcher: David Reitter
Sponsoring Agency: National Science Foundation

Within the human mind, there is something like a dictionary that tells people what words mean (semantics) and how words are combined to make grammatical sentences (syntax). How does the mind learn this dictionary from experience with a language? Computer simulations can help science better understand this learning process which can, in turn, help teach languages in the classroom and aid in the early detection of language deficits. Improving the ability of computers to simulate language learning processes can also lead to the development of better technology such as machine translation, web search, and virtual assistants.

This project considers how a better understanding of language learning can help us avoid common pitfalls of memory connected to the use of language through the design of a new model of human memory, the Hierarchical Holographic Model. This computational model helps explain certain aspects of how words and languages are learned, and will allow us to investigate the question of whether human memory has the ability to detect arbitrarily indirect associations between concepts. The researchers consider evidence that sensitivity to abstract relations between words improves the ability of the computer model to learn syntax, such as parts-of-speech, and to use words appropriately to construct grammatical sentences. This work will be assessed against human language data and competing computational models. The success of the computational model should provide evidence that (1) language acquisition depends on indirect associations, and (2) human memory must be able to form indirect associations to facilitate it.
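The distinction between direct and indirect associations can be shown with plain co-occurrence counts. The sketch below is not the Hierarchical Holographic Model; it is a minimal illustration, assuming that words appearing in the same sentence are directly associated and that two words are indirectly (second-order) associated when their co-occurrence profiles are similar. The sentences and function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def cooccurrence(sentences):
    """Count how often each pair of words appears in the same sentence."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            for j, context in enumerate(words):
                if i != j:
                    counts[word][context] += 1
    return counts

def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

counts = cooccurrence(["the cat chased a mouse", "the dog chased a ball"])
direct = counts["cat"].get("dog", 0)            # first-order: the words never co-occur
indirect = cosine(counts["cat"], counts["dog"])  # second-order: they share contexts
print(direct, indirect)
```

Here "cat" and "dog" never appear together, yet they are strongly associated at the second order because they occur in similar contexts; associations of this indirect kind are the sort of information that could help a model group words into parts of speech.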

Faculty Researcher: Peng Liu
Sponsoring Agency: National Science Foundation

Inherent vulnerabilities of information and communication technology systems to cyberattacks (e.g., malware) impose significant security risks on Cyber-Physical Systems (CPS), as evidenced by a number of recent accidents. Notably, current distributed control of CPS is not truly attack-resilient. Although provable resilience would significantly raise the trustworthiness of CPS, existing defenses are rather ad hoc and focus mainly on attack detection. In addition, while network attacks have been extensively studied, malware-resilient distributed control has rarely been investigated.

This research aims to bridge the gap by investigating provably correct distributed attack-resilient control of CPS. The project will focus on a representative class of CPS, namely unmanned-vehicle-operator networks. Its four main research thrusts are (1) the development of a distributed attack-resilient control framework to ensure task completion by multiple vehicles despite network attacks and malware attacks; (2) the synthesis of novel distributed attack-resilient control algorithms to deal with network attacks; (3) the design of estimation algorithms to detect malware attacks on vehicles, and of computationally efficient algorithms that allow clean vehicles to avoid collisions with vehicles compromised by malware; and (4) the validation of the cost-effectiveness of the proposed distributed attack-resilient control framework via a principled, systematic evaluation plan. The research findings are expected to have a profound impact on CPS security in a variety of engineering disciplines beyond unmanned-vehicle-operator networks, including smart grids, smart buildings, and intelligent transportation systems.

Faculty Researcher: Frank Ritter
Sponsoring Agency: Charles River Analytics, Inc.

This research will explore algorithms to help schedule new jobs, using artificial intelligence and industrial-organizational psychology.

Faculty Researcher: John Yen, Peng Liu
Sponsoring Agency: National Science Foundation

Cyberattacks, especially those involving Advanced Persistent Threats (APTs), have targeted organizations of all types. A key opportunity to counter large-scale cyberattacks is to establish a broad partnership toward the ultimate goal of protected cross-organization sharing of relevant cybersecurity data for enhanced operations, workforce development, and research. The potential impact of sharing cybersecurity data is immense, but sharing also raises complex issues, such as an institution's concerns about the potential risks involved in sharing such data.

This project will improve our understanding of these complex issues related to barriers to protected sharing of cybersecurity data through collaborative activities and a planned workshop. An improved, holistic understanding of these issues and their relationships provides a critical base on which best practices for agreements, frameworks, and cyberinfrastructures for sharing relevant cybersecurity data can be established. In addition, the workshop will identify options for addressing the complex issues of cross-organization sharing of cybersecurity data and uncover their tradeoffs. It is likely that this tacit knowledge will enhance formal knowledge regarding cybersecurity analysis, management, and tool development, especially for achieving cross-organization big data cyberattack awareness.

Faculty Researcher: Andrea Tapia
Sponsoring Agency: National Science Foundation

The problem of verifying data is one of the largest problems confronting emergency-response organizations contemplating the use of social media data (Tapia, Bajpai, Jansen, Yen, & Giles, 2011). Our research directly addresses this problem by designing both social and computational interventions that can estimate the relevance and veracity of citizen contributions in a way that better meets the needs of responders during a crisis. Through the support of this award, our research will gain further understanding of data analytics that support human processing of emergency management data, the types of information needed to help community members understand their own resilience capacity, and participatory processes to improve citizen science projects. This research has the potential to help organizations responding to emergencies make use of large amounts of citizen-produced data, which in turn may improve the speed, quality, and efficiency of emergency response, leading to better support for those who need it and, potentially, more lives saved.

This research proposes a method of quality control that introduces an iterative system of data curation, using human-centered data analytics to encourage citizen scientists to become stakeholders in the quality of the information they create. The system begins with indirect data aggregated from social media, which is later filtered and enhanced with direct contributions from citizen scientists engaged in the participatory design of a community alert system. We will use the results to design social and technical infrastructure for a community resilience operational picture (CROP) that would allow local community leaders to monitor the resilience of their community at various levels. This CROP prototype would draw in data from physical sensors, network sensors, and human sensors monitoring resilience indicators. The scientific approach of this work is to measure the engagement of the community in its own crisis response by observing the flow of information and indicators of community resilience during and after a crisis. If successful, this project will take a significant step forward in our understanding of methods to measure relevant and verifiable information in a process that 1) uses community participation in enhanced event detection and pre-response to crises, and 2) helps emergency response organizations leverage new layers of spatial, temporal, and social infrastructure.

Faculty Researcher: James Wang, Zihan Zhou
Sponsoring Agency: College of Information Sciences and Technology

This project aims to develop 3-D image analysis methods for root structures. The work is a collaboration with Penn State’s Department of Plant Science.


Research Tags: 3-D Image Analysis, Plant Root
Faculty Researcher: Lynette (Kvasny) Yarger
Sponsoring Agency: College of Information Sciences and Technology

Creating a diverse IT workforce has been a challenge for many major technology companies, including Apple, Facebook, and Yahoo. At these Silicon Valley companies, only 2%-3% of the technology workforce is Black, while Blacks make up 12% of the US workforce. Technology firms often argue that this disparity is due to a severe shortage of Blacks with degrees in computing. However, US universities turn out Black computer science and computer engineering graduates at twice the rate at which leading technology companies hire them. In addition, corporations, universities, and US government agencies continue to invest in programs to build a diverse pipeline of technology talent, but the representation of Blacks in tech remains stagnant. The purpose of this study is to examine the role of implicit bias by employers in the hiring of entry-level IT professionals.


Research Tags: IT Workforce, Diversity, Inclusion, Hiring, Implicit Bias
Faculty Researcher: Andrea Tapia
Sponsoring Agency: National Science Foundation

The project investigates the use of big-data analysis techniques for classifying crisis-related data in social media with respect to situational awareness categories, such as caution, advice, fatality, injury, support, with the goal of helping emergency response teams identify useful information. A major challenge is the scale of the data, where millions of short messages are continuously posted during a disaster, and need to be analyzed. The use of current technologies based on automated machine learning is limited due to the lack of labeled data for an emergent target disaster, and the fact that every event is unique in terms of geography, culture, infrastructure, technology, and the people involved. To tackle the above challenges, domain adaptation techniques that make use of existing labeled data from prior disasters and unlabeled data from a current disaster are designed. The resulting models are continuously updated and improved based on feedback from crowdsourcing volunteers. The research will provide real, usable solutions to emergency response organizations and will enable these organizations to improve the speed, quality and efficiency of their response. Educationally, this research will involve the integration of domain adaptation and emergency-related research into courses taught by the PIs, and training of undergraduate and graduate students, including underrepresented groups.

The goal of this research is to perform big data classification for an emergent disaster, called the target disaster, using domain adaptation algorithms that employ unlabeled data from the target disaster in addition to labeled data from prior source disasters. The research provides solutions based on deep neural networks to tackle the unique challenges of applying machine learning to crisis-related data analysis, specifically the volume and velocity challenges of big crisis data. The main contributions to the state of the art in both the computer and information sciences and the social sciences are as follows: (1) Enhancements to the field of Emergency/Disaster Studies by mapping the key decisions about transferring situational awareness knowledge, and the information needs, type, form, and flow at those decision points. This is an essential step toward providing data to response organizations at a time and in a form that is verifiable, actionable, and appropriate; (2) Novel domain adaptation approaches based on deep neural networks for transferring information from source crises to a target crisis. Deep learning approaches will make it possible to employ large amounts of labeled source data and unlabeled target data, and to incrementally update the models as more labeled target data becomes available; (3) Further use and development of socio-technical systems theory to analyze the integration of technical and social systems in the context of knowledge transfer from prior source crises to an emergent target crisis. Technical and social solutions to the problem of transferring situational awareness knowledge will be blended together for use in emergency response.

Faculty Researcher: Zhenhui (Jessie) Li
Sponsoring Agency: National Science Foundation

Recent improvements in computing capabilities, data collection, and data science have enabled tremendous advances in scientific data analysis. However, the relevant data are often highly sensitive (e.g., Census records, tax records, medical records). This project addresses an emerging and critical scientific problem: privacy concerns limit access to raw data that might reveal information about individuals. Techniques to "sanitize" such data (e.g., anonymization) can degrade the quality of the scientific results that use the data. How can we provide data that protect the privacy of individuals but also accurately support scientific analyses?


Research Tags: Big Data
Faculty Researcher: Guoray Cai, John M. Carroll
Sponsoring Agency: National Science Foundation

This project seeks to discover the new knowledge required to support geodeliberation in community geospatial decision-making contexts. Geodeliberation refers to democratic deliberation within local communities on complex and controversial geographically defined problems; it involves the use of geographical information and online, asynchronous deliberative technologies. This research addresses two key knowledge gaps. The first is the lack of understanding of human information and interaction behavior during online asynchronous geodeliberation, together with the methodological challenges of supporting community-scale deliberation of complex geospatial problems over sustained engagements. A more formidable gap lies between the desirable level of public involvement and the practical level of participation that current socio-technical solutions can support. To address these gaps, the research applies an ethnographically guided participatory research approach to: (1) identify opportunities and barriers in using geodeliberation to empower communities; and (2) investigate visual-computational methods that enable human participation in, and facilitation of, geodeliberation processes. Research activities include developing cognitively motivated designs of visual representations and interfaces; modeling deliberative discourse and decision-making in communities; and actively facilitating large-scale geodeliberation to improve its coherence and effectiveness. The research addresses three kinds of broader impact. First, this project demonstrates the potential of information technology to improve civic engagement at the community level. Second, the design research investigation of socio-technical support for geodeliberation will provide a concrete model for local governments across the nation.
Third, this project will prepare a generation of undergraduate and graduate students with the awareness and career potential to apply socio-technical solutions in the practice of democratic decision-making.

Faculty Researcher: Dongwon Lee
Sponsoring Agency: National Science Foundation and Economic and Social Sciences Research Council

We are researching how individuals present themselves across different social media platforms. That is, when users create profiles in different social networks, are they redundant expressions of the same persona or are they adapted to each platform? Our team is reviewing profile images and shared information to take a first look at how user profiles vary in the aggregate across different social networks.


Research Tags: Big Data, Community and Social Informatics, Social Network, Privacy Model, Privacy Enforcement, Internet Security
Faculty Researcher: Peter Forster
Sponsoring Agency: National Institute of Justice

The primary purpose of this project is to collaborate with the Pennsylvania State Police (PSP) and community organizations to identify and describe opiate distribution networks, discern ways to disrupt them in Pennsylvania, and develop a model for implementation across the United States.


Research Tags: Abuse, Prescription Pain Relievers, Heroin, Social Networking Analysis, Distribution Networks, Hubs, Prevention, Enforcement, Treatment
Faculty Researcher: Vasant Honavar
Sponsoring Agency: National Science Foundation

Today, individuals, society, and the United States critically depend on software to manage infrastructures for power, banking and finance, air traffic control, telecommunication, transportation, national defense, and healthcare. Specifications are critical for communicating the intended behavior of software systems to developers and users, and for making it possible for automated tools to verify whether a given piece of software behaves as intended. Safety-critical applications have traditionally enjoyed the benefits of such specifications, but at a great cost. The absence of precise, comprehensible, and efficiently verifiable specifications is a major hurdle to developing software systems that are reliable, secure, and easy to maintain and reuse.

This project brings together an interdisciplinary team of researchers with complementary expertise in formal methods, software engineering, machine learning, and big data analytics to develop automated or semi-automated methods for inferring specifications from code. The resulting methods and tools use analytics over large open-source code repositories to augment and improve upon the specifications obtained through program-analysis-based specification inference, with synergistic advances in both areas.

Faculty Researcher: Carleen Maitland
Sponsoring Agency: U.S. Fulbright Program

ICT-based technology incubators have become an important component of entrepreneurship and economic growth in the U.S. Yet it is unclear whether these models can transfer to emerging economies. Many studies have identified factors influencing incubators' success in launching businesses. However, these studies take only a narrow view of both success and the factors involved. This research identifies a novel set of success outcomes for incubators that takes into account the realities of less developed economies. It also develops a theory of boundary institutions that identifies the organizational and governmental institutions necessary for success.


Research Tags: ICT Incubators, Entrepreneurship, Rwanda
Faculty Researcher: C. Lee Giles
Sponsoring Agency: The National Bureau of Economic Research

This research will merge disambiguated and linked MEDLINE ProQuest data into UMETRICS and National Institutes of Health (NIH) Application data, all at scale. To gain access to NIH data, we will undergo a Public Trust Background Check and obtain clearance; once clearance is approved, NIH data will be accessed via an NIH laptop. We will also compare the disambiguated and linked MEDLINE ProQuest data to the Smalheiser-Torvik Authority author data.

Faculty Researcher: Zhenhui (Jessie) Li
Sponsoring Agency: National Science Foundation

This INSPIRE project addresses high-volume hydraulic fracturing, also called fracking, and its effects on groundwater resources. Fracking allows drillers to extract natural gas from shale deep within the earth. Methane gas sometimes escapes from shale gas wells and can contaminate water resources or leak into the atmosphere, where it contributes to greenhouse gas emissions. Monitoring for these potential leaks is difficult because methane is also released into aquifers naturally, and because monitoring is time- and resource-intensive. Such subsurface leakage may also be relatively rare. This project seeks to improve overall understanding of the impacts of natural gas drilling using advances in both computer science and geoscience, and to teach the public about such impacts. The project will elucidate both the effects of human activities such as shale gas development and the natural processes that release methane into natural waters. Results of the proposed research will lead to a better understanding of water quality in areas of shale gas development and will highlight problems and potentially problematic management practices. The research will advance both geoscience and computer science, train interdisciplinary graduate students, and involve citizen scientists in collecting data and understanding environmental data analysis. The project combines new hydro-geochemical strategies and data mining approaches to study the release of methane into streams and groundwater.


Research Tags: Big Data
Faculty Researcher: James Wang
Sponsoring Agency: National Science Foundation

This project seeks to determine how the carbohydrate-based cell walls of guard cells dynamically change shape to control stomatal pore size, thus allowing plants to control carbon dioxide (CO2) uptake and water loss. Stomata are small openings in the surfaces of plants that regulate the photosynthetic conversion of CO2 into plant biomass, which serves as a renewable source of food, materials, and bioenergy. A deeper understanding of cell wall structure, mechanics, and dynamics in stomatal guard cells will help identify plants that can more efficiently use water, a major limiting factor in global agricultural production. The computational image analysis and modeling tools that will be developed in this project will provide scientists with new ways of interpreting and understanding experimental data. Because stomatal guard cells are an amazing example of cellular engineering by plants and are accessible and observable by scientists of all ages, a learning module will be developed and deployed that allows 4th through 8th graders to observe stomatal dynamics first-hand and challenges them to construct and optimize functioning macro-scale models of stomatal guard cells, helping to inspire future scientists and engineers. This project will also train two PhD students and a research associate in interdisciplinary research skills that cross the boundaries of biology, computer and information science, and engineering.


Research Tags: Image Segmentation, 3-D, Plant Biology
Faculty Researcher: Vasant Honavar
Sponsoring Agency: National Science Foundation

Vast quantities of health, environmental, and behavioral data are being generated today, yet they remain locked in digital silos. For example, data from health care providers, such as hospitals, provide a dynamic view of the health of individuals and populations from birth to death. At the same time, government institutions and industry have released troves of economic, environmental, and behavioral datasets, such as indicators of income/poverty, adverse exposure (e.g., air pollution), and ecological factors (e.g., climate) to the public domain. How are economic, environmental, and behavioral factors linked with health?

This project will bring together numerous sources of large environmental and clinical data streams to enable the scientific community to address this question. By breaking down current data silos, the project is expected to have wide scientific impact. First, this effort will foster new routes of biomedical investigation for the big data community. Second, the project will enable discoveries that will have behavioral, economic, environmental, and public health relevance.

The project also aims to assemble a first-ever data warehouse containing numerous health/clinical, environmental, behavioral, and economic data streams to enable causal discovery across these data sources. Its ultimate goal is to facilitate community-led and collaborative causal discovery through the dissemination of integrated and open big data and analytics tools.

Faculty Researcher: James Wang
Sponsoring Agency: College of Information Sciences and Technology

This project aims to develop emotion-understanding capabilities for smart cities. The project is highly interdisciplinary, involving researchers from information sciences and technology, psychology, and statistics.


Research Tags: Emotion, Computer Vision
Faculty Researcher: Carleen Maitland
Sponsoring Agency: National Science Foundation

This project includes organizing a workshop designed to promote a three-cornered dialogue among policymakers, researchers, and data managers (implementation agencies) to collectively define a broadband research agenda that will ultimately answer questions relevant to each of these stakeholders. The organizers will seek the participation of broadband experts from multiple stakeholder groups, including universities and university-affiliated research institutes, regulatory agencies, policymakers, industry bodies, and implementation agencies. A publicly available workshop report will be created to disseminate the workshop findings.

Faculty Researcher: Frank Ritter
Sponsoring Agency: Office of Naval Research

Training is important for the Navy. In this research program, we will explore the implications of the KRK learning theory, particularly how it can help warfighters not only learn skills initially but also retain them. We will examine how different learning schedules influence retention, and how a range of learning schedules affect the learning and retention of different skills. We will also continue to develop a tutoring system to further test and apply this theory. We will use maintenance of Navy platforms as the domain of study and analysis, particularly for the tutor. More general skills and more robust learning are necessary because Navy platforms increasingly vary: the components of any given platform are less uniform, which makes maintenance and the associated problem solving more uncertain and requires more general and robust maintenance skills. Key research questions this program will attempt to answer include (1) identifying the learning theories and associated training protocols that promote the development of robust and general maintenance skills, (2) understanding which training architectures allow efficient practice of complex maintenance procedures, so that knowledge is proceduralized and generalized to improve its availability and retention, and (3) investigating how such an architecture can be realized in tutors that actually deliver the training, and whether those tutors are effective.

Faculty Researcher: Carleen Maitland
Sponsoring Agency: College of Information Sciences and Technology, Schreyer Honors College

ICT-based technology incubators have become an important component of entrepreneurship and economic growth in the U.S. Yet it is unclear whether these models can transfer to emerging economies. Many studies have identified factors influencing incubators' success in launching businesses. However, these studies take only a narrow view of both success and the factors involved. This research identifies a novel set of success outcomes for incubators that takes into account the realities of less developed economies. It also develops a theory of boundary institutions that identifies the organizational and governmental institutions necessary for success.


Research Tags: Cash-based Assistance, Humanitarian Aid, Socio-technical Systems, Rwanda
Faculty Researcher: James Wang
Sponsoring Agency: National Science Foundation

An award is made to The Pennsylvania State University, University Park campus to purchase a super-resolution microscope that will enable the capture of images of plant and animal cells, as well as complex chemical samples, at the scale of single molecules. This microscope will reveal new insights into how living and chemical systems are organized and work. The project will also generate new image analysis tools for the scientific community. The microscope will enable interdisciplinary research training and enhance education through coursework and outreach to other Penn State campuses and other institutions. Integration of this microscope into a core microscopy facility will make it available to undergraduate, graduate and postdoctoral trainees, and regular imaging workshops will be offered by Penn State. New teaching modules for K-12 and undergraduate educators demonstrating the science of size and the potential of super-resolution microscopy will be developed. Access and training will be assured for underrepresented students through programs including the Summer Experience in the Eberly College of Science, McNair Scholars, Women in Science and Engineering Research, and Minority Undergraduate Research Experience. Public understanding of super-resolution microscopy and its advantages will be catalyzed by multiple outreach activities and venues, including The Franklin Institute (science museum) and Penn State's Ag Progress Days, which together will expose this cutting-edge imaging technology to tens of thousands of people. The discoveries enabled by this microscope will advance the study of plant and animal development, sustainable agriculture and energy production, and the chemical interactions that define our physical environment.


Research Tags: Image Analysis, 3-D, Plant Biology
Faculty Researcher: Xiang Zhang
Sponsoring Agency: National Science Foundation

This project includes an integrated research, education, and outreach program that focuses on developing novel methods for mining large, complex networks. Networks (graphs) are ubiquitous in real-world applications. Although network analytics has had notable successes, methodology development in this area is still at an early stage. This project addresses fundamental questions essential to the advancement of large and complex network analytics, driven by real-world applications in social, biological, and medical domains. The research plan is complemented by a comprehensive education and outreach plan focused on (1) the development of new interdisciplinary courses, (2) direct undergraduate involvement in the research projects, and (3) outreach activities, including the STEM program targeting K-12 schools.

This research aims to improve the reliability and efficiency of large network analysis by (1) developing novel memory-based random walk proximity measures that can effectively capture the similarity between nodes, (2) studying the dual-network model and its applications, and (3) designing robust and flexible multi-network algorithms for clustering and ranking.

Faculty Researcher: John M. Carroll, Mary Beth Rosson
Sponsoring Agency: National Science Foundation

There are, and will continue to be, too many elderly people for society to sustain custodial care regimes, and such regimes waste the huge potential contributions of healthy elderly people, whose retirements can last 40 years. In collaboration with public and private retirement communities, “aging in place” elderly people, and key local nonprofits, we are investigating the practices of older adults and their interest in and use of technology to facilitate peer-based coproduction of health and wellbeing. Our project takes a pervasive participation approach to recognizing and facilitating aging as a social resource in modern society. Our original plan was to employ timebanking specifically, but we have since abandoned it in favor of more radical approaches that we invented.

Faculty Researcher: Vasant Honavar
Sponsoring Agency: National Library of Medicine

The Biomedical Big Data to Knowledge (B2D2K) Training Program at the Pennsylvania State University will bring together data science researchers and educators from the Eberly College of Science, the College of Engineering, the College of Health and Human Development, the College of Information Sciences and Technology, the College of Medicine, and the Geisinger Institute for Genomic Medicine to create a truly transformative multidisciplinary predoctoral training environment.

The B2D2K program aims to train a diverse cadre of next-generation biomedical data scientists with deep knowledge of data science to develop novel algorithmic and statistical methods for building predictive, explanatory, and causal models through integrative analyses of disparate types of biomedical data (including Electronic Health Records, genomics, behavioral, socio-economic, and environmental data) to advance science and improve health.

Faculty Researcher: Vasant Honavar
Sponsoring Agency: National Center for Advancing Translational Sciences

Over the past decade, Penn State’s Clinical and Translational Science Institute (CTSI) has developed into an active and visible entity within the University and the institutional home for clinical and translational research. The University has cutting-edge capabilities in a vast array of basic and applied disciplines that are relevant to health and critical to the discovery and development of innovative tools.

This project will catalyze multidisciplinary clinical and translational Team Science by engaging researchers, professionals, and communities across and outside the traditional boundaries of biomedicine, from within Penn State and beyond. It will also strengthen research ethics and increase attention to the health and healthcare needs of an increasingly diverse population. Finally, it will effectively share resources and expertise with other CTSI Hubs, the CTSA Consortium, and more broadly with health practitioners and the public at large.

The project aims to dramatically improve clinical trials processes, accelerating the rate at which discoveries are translated into clinical care. It will also educate a new generation of health professionals and investigators to successfully address ethical issues that arise when technological capabilities and societal imperatives meet with economic and practical constraints.

Faculty Researcher: Anna Squicciarini, Peter Forster, Nicklaus A. Giacobe, Dongwon Lee
Sponsoring Agency: National Science Foundation

This project will expand the capability and involvement of Penn State students in cyber-relevant disciplines. To support student needs, we have implemented a flexible and strong Scholarship for Service (SFS) program, based on customized mentoring for each student.

Through the program, we will (1) provide federal employers with exceptional entry-level professionals who have the skills and attributes required to meet the Nation’s cybersecurity challenges and rise to the top of their field, (2) enhance our relationship with federal entities to ensure the success of the SFS program, and (3) ignite interest in information assurance (IA) careers and IA programs at Penn State and continue to demonstrate our support for the Nation’s cybersecurity enterprise.

Faculty Researcher: Lynette (Kvasny) Yarger
Sponsoring Agency: National Science Foundation

This study traces the career pathways of successful African American male college students who have opted to pursue IT-related careers. Grounded in the theory of power and practice posited by Pierre Bourdieu, the study uncovers the practical logics that African American men employ when making career choices. The sample consists of 100 African American male students who have expressed interest in IT careers and who are currently enrolled at Historically Black Colleges and Universities (HBCUs). A small sample of African American men at Predominantly White Institutions is also included. The research includes conducting person-centered analysis that is well grounded in theory and employs rigorous qualitative approaches. The proposed work contributes to the limited literature on African American men's academic success and helps clarify some of the mixed and contradictory findings about their career choices that exist in the current literature.

The results of this research reveal how, and which, social structures enable and constrain African American men's IT-related career choices, information that may be useful to policymakers, teachers, and school counselors and that may inform the creation of innovative interventions. Findings may also contribute to increasing the STEM workforce while expanding the career options for African American men, who have historically been disproportionately affected by economic downturns. The research continues a collaboration between Washington State University and Pennsylvania State University and strengthens ongoing partnerships with the four participating HBCUs.

Faculty Researcher: Dongwon Lee, Peng Liu, Mary Beth Rosson
Sponsoring Agency: National Science Foundation

With increasing participation in online social networks, it is critical to preserve users’ privacy without preventing them from socializing and sharing information. Unfortunately, existing approaches fall short of meeting such requirements. In general, security and privacy problems in social networks can be viewed from two perspectives: human-oriented and technology-centered. These two camps, however, are largely isolated in the literature, and their findings are not integrated with each other.

This research will articulate a unifying framework that bridges the gap between the perspectives through which user security and privacy problems in social networks can be viewed. We will (1) study the threats and vulnerabilities of social networks and existing protection approaches; (2) detect the discrepancies between user expectations and actual information disclosure; (3) articulate a user-centered yet computationally efficient formal model of user privacy in social networks; and (4) implement a mechanism to effectively enforce privacy policies in the proposed model. The solutions will protect user privacy in a way that reconciles it with users' desire to share sensitive information freely in social networks.


Research Tags: Social Network, Privacy Model, Privacy Enforcement, Internet Security
Faculty Researcher: Carleen Maitland
Sponsoring Agency:

Harnessing data for rapid response in humanitarian crises is a pressing challenge as well as opportunity. Humanitarian relief organizations work to share data during response, but these efforts must overcome organizational, work, distance and technical barriers. This research examines the processes of institutionalization of norms around routines and work practices in the sharing of data during a humanitarian crisis. The research will generate knowledge of the process of proto-institutionalization, a theory that has heretofore lacked an empirical basis. It will also generate findings for practice with important implications for improving data sharing and collaborative analytics in humanitarian crises.


Research Tags: Institutional Theory, Collaborative Data Analytics, Humanitarian Relief
Faculty Researcher: Peter Forster, Dongwon Lee, Anna Squicciarini
Sponsoring Agency: National Science Foundation

This NSF-funded project supports the development of the federal cybersecurity workforce through a defined curriculum, internship opportunities in appropriate organizations, and engaged scholarship, including unique course offerings, federal connections, and, ultimately, job placement.


Research Tags: Cybersecurity, Education, Federal Work Force, National Security
Faculty Researcher: Peng Liu, John Yen
Sponsoring Agency: U.S. Army Research Laboratory

This research aims to achieve three goals in advancing cyber situation awareness when cyber operation centers perform cyber analysis to detect intrusions: (1) automatic data triage through data mining of analysts' operation traces, (2) context-aware experience retrieval, and (3) intelligent software agents that can interact with human analysts.

Faculty Researcher: Carleen Maitland
Sponsoring Agency: National Science Foundation and the United Nations High Commissioner for Refugees

Refugees are among the world’s most vulnerable people. However, for some, adversity results in an amazingly resourceful and innovative spirit, open to change. Our research examines how Information and Communication Technologies (ICT) can improve the lives of refugees as well as the operations of their service providers. Important questions include: How does refugees' use of IT change across the refugee life cycle? Can IT-based service provider systems be used to promote both upward and downward accountability? What new, innovative technologies can be developed for use in camps, or with urban refugees? How can data support community development among refugees?


Research Tags: Cognitive Science, Community and Social Informatics, Human-Computer Interaction
Faculty Researcher: Andrea Tapia
Sponsoring Agency: National Science Foundation

Recent natural disasters have challenged our traditional approaches to planning for and managing disruptive events. Today, social media provides an opportunity to make use of community-driven data to help us understand the resilience, or lack thereof, of community networks (e.g., friends, neighborhoods), physical infrastructure networks (e.g., transportation, electric power), and networks of service providers (e.g., emergency responders, restoration crews). This research integrates multiple disciplinary perspectives in engineering, computer science, and social science to address how community-driven data can help (i) understand the behavior of these interdependent networks before, during, and after disruptions, and (ii) more effectively reduce their vulnerability to, and enhance their recovery after, a disruption.

Two research components comprise the proposed effort in resilience analytics. The first component creates a network model of the interdependence among infrastructure networks, the community networks they serve, and the service networks engaged to respond after a disruption. We will explore the functional relationships between community resilience and infrastructure network performance. Model results will enable decision makers to understand the balance of resilience across the several networks and regions. The second component integrates the interdependent network model with community-sourced data to develop a data analytics framework for better understanding of, and planning for, resilience. This component builds on socio-technical systems research on the analysis of social media data monitored after a disruption. The methods will assess the value of information provided by crowdsourced data, drawing on the expertise of community social scientists. The project draws on multiple methods across several disciplines; such multidisciplinary methods are essential for a breakthrough in resilience analytics. This project aims to take a significant step forward in our understanding of how real-time data from social media and other sources can describe, predict, and prescribe practices to manage interdependent networks in crises.

Faculty Researcher: Dinghao Wu
Sponsoring Agency: Office of Naval Research

As cyberthreats become commoditized, with a broad range of tools on the market that are easily accessible to attackers, there is a critical need for automated defense tools. While cyberthreat techniques and the threat landscape are evolving, with the latest technologies adopted quickly, we are falling behind on the defense side, as software systems take a long time to mature and production systems have fairly long life cycles. It is costly and technically difficult to patch these legacy systems once widespread zero-day cyberthreats are discovered.

This research explores reverse-engineering-based diversification and transformation methods for defending against such widespread cyberthreats. Current computer systems are highly homogeneous, allowing attackers to compromise numerous systems once a common vulnerability is revealed. To reverse such asymmetric attacks, we propose a set of reverse-engineering-based diversification techniques, together with an iterative platform that can compose and amplify small, basic diversification techniques. Because we are able to generate thousands of heterogeneous variants of the same software, the proposed techniques and platform can defend against cyberattacks and keep threats from being widely exploitable. Our aim is to make reverse-engineered code recompilable or reassembleable, which can help augment legacy software systems with modern security mechanisms.


Research Tags: Reverse Engineering, Software Diversification, Cyber Fault Tolerance, Software Analysis, Software Security, Program Analysis
Faculty Researcher: Anna Squicciarini, Peter Forster
Sponsoring Agency: National Science Foundation

This program allows Penn State to provide grants through the federal CyberCorps Scholarship for Service (SFS) program to students studying information assurance (IA). Each scholarship recipient will complete either a bachelor's or a graduate degree in Security and Risk Analysis, Information Sciences and Technology, or Computer Science, and must be able to complete his or her academic program within a maximum of two years.

Faculty Researcher: Dinghao Wu, Peng Liu
Sponsoring Agency: Office of Naval Research

Modern software engineering practice relies heavily on third-party libraries, existing frameworks, high-level programming languages, and agile development methodologies, which allow us to build more complex software and deliver it faster. These practices, however, have negative consequences such as bloatware and feature creep. When such an application runs, its address space contains unused (library) code, which exposes extra attack surface and gives an attacker more choices in launching attacks. Unused yet shared library code also reduces software diversity across applications. Removing such unused code from each address space will not only lead to leaner and more efficient code but also enable computer systems to achieve better “vertical” application-to-application isolation, a reduced attack surface, and enhanced diversity. In this project, we aim to build infrastructure and technologies for software customization, especially for libraries at the binary code level. We aim to provide a set of new capabilities that achieve better isolation, less sharing, and fewer dependencies between code, and that implicitly diversify software.


Research Tags: Binary Code Analysis, De-bloating, Software Bloat, Software Analysis, Software Security, Program Analysis
Faculty Researcher: Zhenhui (Jessie) Li
Sponsoring Agency: National Science Foundation

Rapid advances in sensing and positioning technologies have provided us with an increasing amount of trajectory data collected from human movements, animal traces, and traffic. Understanding such large-scale trajectory data together with their surrounding contexts (e.g., location information, local events, weather, and environment) could benefit a number of important applications. For example, a semantic understanding of human trajectories can help profile a person's interests, socioeconomic status, and health conditions; mining traffic patterns with respect to local events and weather conditions can lead to a more resilient transportation system; and studying how animal movements respond to environmental changes can advance our understanding of ecological systems. This project investigates data mining algorithms and develops solutions for semantic trajectory mining with rich spatial-temporal contexts. Through interdisciplinary collaborations, the results will have broader impacts in other disciplines such as social science, health, transportation, and ecology.


Research Tags: Big Data
Faculty Researcher: James Wang
Sponsoring Agency: National Science Foundation

The emergence of massive amounts of human-rated and commented visual data has opened new avenues for exploring fundamental questions in artificial intelligence. This project tackles the challenge of automatically inferring visual aesthetics and emotions, and of inventing new systems that assist the creative and decision-making activities of the general public. An interdisciplinary team with expertise in visual modeling, data mining, psychology, and computational sciences will build tools to distill information from a combination of visual, textual, and numerical data. Visual features, selected based on the published literature and consultation with domain experts, will be extracted to discriminate between types of emotions. The resulting systems can select and rank visual information based on aesthetics and emotions.


Research Tags: Emotion, Affective Computing, Image Analysis, Machine Learning
Faculty Researcher: David Fusco
Sponsoring Agency: National Science Foundation

This project creates two workshops focused on the learning of science, technology, engineering, and mathematics (STEM) subjects in Virtual Worlds and on how other education technologies can be augmented with Virtual Worlds. At each workshop, STEM educators will share their research and learn from specialists in education and industry. Anticipated outcomes of the workshops include improved educational activities for college students in STEM fields, faculty with a better understanding of how students learn STEM subjects using Virtual Worlds, and stronger collaborations between educators and Virtual World designers in the computer and software industry.

Each four-day workshop will offer in-depth research presentations, hands-on experiences, engaging activities, and lively discussions. The principal investigators will invite approximately 30 attendees to each workshop. In addition, experts in curriculum design and assessment will help STEM educators learn how to develop activities with achievable, measurable learning objectives. Industry experts, such as programmers and representatives of companies that develop virtual reality hardware and software, will share their technical knowledge and learn about the needs of educators who use their products and services.

Faculty Researcher: C. Lee Giles
Sponsoring Agency: National Science Foundation

This is a collaborative project involving Ohio State University (lead institution), Pennsylvania State University, American Institutes for Research, University of Illinois-Urbana, and the University of Iowa. The project examines the impact of different research funding structures on the training of graduate students and postdoctoral fellows and on their subsequent outcomes. The rationale for the study is the recognition that research teams differ in composition, size, and reliance on graduate students versus postdoctoral fellows. In addition, funding agencies change the structure of science training by creating programs that encourage interdisciplinary groups, multi-university collaborations, or large research centers that focus on specific research questions. However, little research has examined how these factors shape the career preparation of STEM professionals.

The project will have broad implications for the entire field of STEM education policy and research. The underlying algorithms and tools will be made available to the academic research community and can be leveraged to link internal human resources data sets to external data sets. This new data infrastructure will also facilitate the assessment of the effects of research investments on research productivity, as well as undergraduate and graduate curriculum development.

Faculty Researcher: Peng Liu
Sponsoring Agency: CVS Health

As companies develop their serialization programs, it is imperative that the industry and its supply chain partners evolve in a harmonized manner. Current approaches to the serialization business process, however, tend to be driven in silos by specific interest groups rather than by an industry-wide approach. This research aims to encourage an end-to-end, industry-wide transformation, for which an understanding of serialization impacts on supply chain processes and IT systems is a vital first step. Specifically, we will (1) conceptualize a framework representing the pharmaceutical supply chain in the evolving serialization era, and (2) evaluate the serialization data architecture required for regulatory compliance, supply chain performance improvements, and potential business value to organizations.

Faculty Researcher: John M. Carroll
Sponsoring Agency: Intel Corp.

Local community is an important level of social structure at which people develop their identities and understand and practice critical skills and capacities such as participation, collective awareness, and giving and receiving social support. For most of human history, communities were mediated through face-to-face interaction, but community information infrastructures have evolved rapidly in recent decades. Our project investigates local-scale mobile computing support for making community heritage, values, news and opinion, municipal planning, and crisis response more visible and participatory.

Faculty Researcher: C. Lee Giles
Sponsoring Agency: National Science Foundation

In the era of big data, researchers are exposed to large numbers of research papers, which provide the technological basis for the worldwide collection, sharing, and dissemination of scientific discoveries. The most important parts or concepts of a paper's text are not always readily available; they are often hidden in the multitude of details that accompany them. Keyword extraction, the problem of automatically extracting the important words or concepts of a text, is central to coping with the overwhelming amount of information available in these papers.

The goal of this project is to explore robust and accurate approaches that uncover the most important parts of documents, allowing researchers to process more information in less time. In particular, this project will investigate models that take into account the links between citing and cited documents in a document network, and will explore various qualitative and quantitative aspects of the question: what are the key words or concepts in a document? The proposed models will be based on novel unsupervised approaches that combine the complementary strengths of topic models and graph-based algorithms.

In the big picture, this research has the potential to help researchers navigate through the large number of research papers that are available in this information age, which in turn may facilitate progress in science. The results of this proposed research will have a direct pipeline to the CiteSeer digital library. In particular, the proposed models will be tested for their robustness and utility in real world settings.

Faculty Researcher: Xiang Zhang
Sponsoring Agency: National Science Foundation

A fundamental challenge in the life sciences is characterizing the genetic factors that underlie phenotypic differences. Thanks to advanced sequencing technologies, an enormous number of genetic variants have been identified and cataloged. Such data hold great potential for understanding how genes affect phenotypes and contribute to susceptibility to environmental stimuli. However, existing computational methods for analyzing and interpreting high-throughput genetic data are still in their infancy.

This research will investigate the computational and statistical principles of modeling and discovering the genetic basis of complex phenotypes. Specifically, we aim to understand how to (1) effectively and efficiently assess the statistical significance of findings, (2) account for relatedness between samples in genetic association studies, and (3) accurately capture possible interactions between multiple genetic factors and their joint contribution to phenotypic variation. Collectively, the theoretical framework and algorithms will provide the research community with much better tools to dissect complex relationships between genotypes and phenotypes and to gain a deeper understanding of the roles of environmental stimuli.

Faculty Researcher: Vasant Honavar, John Yen, Yasser El-Manzalawy
Sponsoring Agency: National Science Foundation

In many applications, our ability to realize the full potential of big data to improve decisions and outcomes is currently limited by the lack of practical frameworks for analysis of sensitive data in a manner that does not violate applicable data access and use policies.

This project aims to explore a framework and software infrastructure for data access and use policy compliant analysis and visualization of sensitive data, centered on a novel “Query, Model, Evaluate and Deploy” (QMED) framework. The project seeks to test the feasibility of the proposed framework using predictive and causal modeling of data from an online health community as a test case. This research will yield a prototype open source software infrastructure to support the analysis and visualization of sensitive data. It will also accelerate data-driven advances in domains that involve sensitive data by broadly engaging talent in developing better algorithms, and will support the incorporation of hands-on experience with such applications into Data Sciences education through hackathons and competitions organized around specific sensitive data sets.

Faculty Researcher: Peng Liu
Sponsoring Agency: National Science Foundation

Cloud computing offers users many benefits, including increased availability and flexibility of resources and efficient use of equipment. However, privacy concerns are becoming a major barrier to users' transition to cloud computing. The privilege design of existing cloud platforms makes it very difficult to ensure the trustworthiness of the cloud, because it grants too much power to cloud administrators, who could launch serious insider attacks by abusing their administrative privileges.

This research applies a well-understood philosophy, separation of privilege, to the architectural design of a cloud platform. This architectural design and strong homomorphic cryptographic approaches protect data privacy in cloud environments from different angles. The project develops an innovative privacy-driven architectural design that focuses both on the privilege-level design of each software component of a cloud platform and on defending against insider attacks. It investigates new mechanisms to de-privilege the cloud administrator and to enable more fine-grained access control among the software components of a cloud platform. More specifically, the new mechanisms enable agile configuration of the platform, user-configurable privacy protection, and strong isolation in user space. The techniques developed under this project are immensely important as users place more of their data in the cloud and rely on cloud providers to keep that data private.

Faculty Researcher: Xinyu Xing
Sponsoring Agency: National Science Foundation

Despite the best efforts of developers, software inevitably contains flaws that may be leveraged as security vulnerabilities. Modern operating systems integrate various security mechanisms to prevent software faults from being exploited. To bypass these mechanisms and hijack program execution, an attacker therefore needs to constantly mutate an exploit and make many attempts. In these attempts, the exploit triggers a security vulnerability and causes the running process to terminate abnormally. After a program crashes, it typically leaves behind a snapshot of its crashing state in the form of a core dump. Although a core dump carries a large amount of information and has long been used for software debugging, it rarely serves as an informative aid in locating software faults, particularly memory corruption vulnerabilities. As a result, previous research has mainly relied on fully reproducible execution tracing to identify software vulnerabilities in crashes. Such techniques, however, are usually impractical for complex programs.

This research aims to explore, design, and develop lightweight, systematic, and automatic approaches that turn a core dump into an informative aid for tracking down memory corruption vulnerabilities. The project aims to (1) develop a technical approach to improve the quality of information extracted from core dumps, (2) explore a set of technical approaches to enhance this readily available information, and (3) develop a technical approach to automatically analyze enhanced core dumps and pinpoint the root cause of software crashes.

Faculty Researcher: Sencun Zhu, Peng Liu, Dinghao Wu
Sponsoring Agency: National Science Foundation

Software plagiarism is the act of incorporating someone else's code, in whole or in part, into one’s own program in a way that violates the terms of the original license. With the rapidly developing software industry and the explosion of open source projects, software plagiarism has become a very serious threat to intellectual property protection and to the health of the open-source-embracing software industry. Meanwhile, software plagiarism and “app repackaging” have become even more common in mobile app markets, where they are used for monetary profit or to propagate malware by inserting malicious payloads into the original apps.

This project will study the software plagiarism detection problem systematically. The proposed plagiarism detection methods for PC applications leverage program logic and longest semantically-equivalent-basic-block subsequences. They can detect partial program plagiarism and also provide formal guarantees of obfuscation resilience. The proposed method for mobile apps exploits the user interface for plagiarism detection, and this unique design angle enables it to defeat various code obfuscation techniques. Our research will significantly deter software plagiarism. It will not only serve as a useful tool for collecting strong evidence of plagiarism in intellectual property lawsuits, but will also promote a healthier and more trustworthy sharing environment for the open source community and for mobile app markets. Broader impact will also result from the education and dissemination initiatives.
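The idea of a longest semantically-equivalent basic-block subsequence can be illustrated with the classic longest-common-subsequence dynamic program, with exact equality replaced by a pluggable equivalence check. This is a minimal sketch under stated assumptions: basic blocks are treated as opaque values, and the `equivalent` predicate is a hypothetical stand-in for the project's semantic-equivalence checking, whose details (and obfuscation-resilience guarantees) are not reproduced here.

```python
from typing import Callable, Sequence, TypeVar

Block = TypeVar("Block")


def longest_equivalent_subsequence(
    original: Sequence[Block],
    suspect: Sequence[Block],
    equivalent: Callable[[Block, Block], bool],
) -> int:
    """Length of the longest common subsequence of two basic-block sequences.

    Two blocks "match" when the caller-supplied predicate says they are
    equivalent; `equivalent` is a hypothetical placeholder for a real
    semantic-equivalence check.
    """
    n, m = len(original), len(suspect)
    # table[i][j] = best match length using original[:i] and suspect[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if equivalent(original[i - 1], suspect[j - 1]):
                # Pair these two blocks and extend the best shorter match.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Skip a block from one of the two sequences.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

For instance, modeling blocks as tuples of normalized instruction names and using a toy predicate such as `lambda a, b: sorted(a) == sorted(b)` treats `("add", "mov")` and `("mov", "add")` as equivalent despite reordering. Dividing the result by the length of the original program's block sequence gives one possible (hypothetical) similarity score.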

Faculty Researcher: Anna Squicciarini
Sponsoring Agency: National Science Foundation

Images are now one of the key enablers of users’ connectivity. Every day, more than 4.5 million images are uploaded to Flickr, and nearly a billion images are posted on Facebook. Users share various types of images to represent their interests and show their experiences for social purposes. While extremely convenient, this level of pervasiveness introduces acute privacy concerns, such as disclosing unwanted information or sharing images with unintended audiences. Malicious attackers can take advantage of these unnecessary leaks to launch content-aware attacks or even impersonation attacks.

This research investigates approaches to protecting users’ online image privacy. We will develop new techniques to address image privacy based on image content as well as image and user metadata, by (1) inferring the sensitivity of a given image from its visual properties and the user's image sharing patterns, and then automatically applying the appropriate privacy settings for that image; and (2) using discovered user sharing patterns to define access policies according to the locally enforceable controls in the domain of interest. This work will entail a complex set of methodologies, including machine learning, access control, and information retrieval.

Faculty Researcher: Carleen Maitland
Sponsoring Agency: National Science Foundation

Tribal communities represent the final frontier of internet access in the U.S., with broadband internet access available to fewer than 10 percent of Native Americans on tribal reservation lands. The lack of broadband access is caused by a collection of challenges, including remote terrain, inadequate funding, and complex telecommunication policies. Yet Native Americans need reliable avenues for participating in and contributing to internet content to strengthen their communities.

This project investigates technologies that will increase internet availability on reservation lands. It will also develop new methods for disseminating internet content to reservation residents, prioritizing content by relevance during periods of limited connectivity. The goal of the research is to make critical inroads in addressing the lack of internet access on tribal reservations and to increase the number of Native American reservation residents who are able to engage with, create, and disseminate internet and online social network content.

Faculty Researcher: David Reitter
Sponsoring Agency: National Science Foundation

The Correlates of War Project's Militarized Interstate Dispute (MID) Data is the most prominent and most heavily used data collection in the study of international conflict. The most recent version (MID4), released in 2014, extends the period covered to 1816-2010. The MID4 project used automated text classification procedures to make the identification of relevant news stories more efficient. Over the course of that project, the principal investigators (PIs) determined that the primary bottleneck in the workflow was the coding of those news documents. To address this inefficiency, the PIs completed a pilot project to determine whether crowdsourcing techniques could be used to code these documents. In the pilot, non-expert workers were paid small sums to read documents and answer sets of questions, and their answers were used to identify features of possible militarized incidents (the events that comprise MIDs). A systematic comparison of the crowdsourced responses with those of the MID4 Project's trained coders revealed that the crowdsourced codings were completely accurate for 68 percent of the news reports coded; more importantly, high agreement among crowd responses on specific reports was strongly associated with correct coding. This enables the PIs to detect which documents require further expert involvement. As a result, the PIs can produce a majority of the MID data in near real time and at limited financial cost. These procedures are being applied in the MID5 Project, which will update the MID data for the period 2011-2017.
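The triage idea described above, accepting crowd codings where workers agree strongly and routing the rest to expert coders, can be sketched as follows. This is an illustrative sketch, not the MID5 Project's actual procedure: the function name `route_documents`, the input format (one list of worker answers per document for a single coding question), and the 0.8 agreement threshold are all hypothetical assumptions.

```python
from collections import Counter


def route_documents(crowd_codings, threshold=0.8):
    """Split documents into auto-accepted codings and an expert-review queue.

    crowd_codings: dict mapping a document ID to the list of answers that
        crowd workers gave to one coding question (hypothetical format).
    threshold: hypothetical minimum share of workers who must agree on
        the most common answer for it to be accepted without an expert.
    """
    accepted = {}
    needs_expert = []
    for doc_id, answers in crowd_codings.items():
        if not answers:
            # No crowd responses at all: an expert has to code it.
            needs_expert.append(doc_id)
            continue
        # Agreement = share of workers who gave the most common answer.
        answer, votes = Counter(answers).most_common(1)[0]
        if votes / len(answers) >= threshold:
            accepted[doc_id] = answer
        else:
            needs_expert.append(doc_id)
    return accepted, needs_expert
```

Agreement is measured here simply as the majority share; a real deployment would presumably calibrate the threshold against trained coders' results, in the spirit of the pilot comparison with MID4 coders.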

Faculty Researcher: Anna Squicciarini
Sponsoring Agency: National Science Foundation

This project will develop models and techniques to facilitate controlled information sharing of users' data in domains where the data is associated with and co-managed by multiple users, such as bio-repositories, remote teleworking, and social computing. Specifically, we will (1) build on prior work to develop a foundational model describing access control in terms of the decision making process of a single content manager or content owner, laying the groundwork for the second objective; (2) develop new models to support synchronous, asynchronous, and combined joint specification of access control policies for shared content for multiple users and site administrators; and (3) apply those solution concepts to two specific applications, group work and a biobank, and conduct user studies to test goodness of fit, suitability and feasibility of the resulting access setting mechanisms.

This project takes an innovative user-centric approach to ensure that the rigorous models developed result in enforceable mechanisms that can be used on a variety of existing platforms and in multiple domains. To accomplish this, the proposed work draws on multiple disciplines, including access control, game theory for security and privacy, and decision support systems. This research will give users the ability to express preferred access control settings for shared multi-owned data, with that input jointly influencing the final access settings, while taking into account organizational constraints and existing laws.

Faculty Researcher: James Wang
Sponsoring Agency: National Science Foundation

Severe weather causes an enormous amount of damage to life and property around the world. One indication of severe weather is a wind pattern known as a “bow echo,” which is currently identified manually by meteorologists. We are analyzing vast historical data collected by the National Oceanic and Atmospheric Administration (NOAA) to develop a framework that can automatically and accurately identify bow echoes as they begin to form, potentially providing better and earlier forecasting of severe weather.


Research Tags: Big Data, Information Fusion and Visualization, Weather, Satellite Imaging, Radar
Faculty Researcher: Vasant Honavar, Mary Beth Rosson, C. Lee Giles
Sponsoring Agency: Rutgers-The State University of New Jersey

This project develops a virtual data collaboratory that can be accessed by researchers, educators, and entrepreneurs across institutional and geographic boundaries, fostering community engagement and accelerating interdisciplinary research. A federated data system is created, using existing components and building upon existing cyberinfrastructure and resources in New Jersey and Pennsylvania.

The end product is a fully developed system for collaborative use by the research and education community. A data management and sharing system is constructed, based largely on commercial off-the-shelf technology. The storage system is based on the Hadoop Distributed File System (HDFS), a Java-based file system that provides scalable and reliable data storage and is designed to span large clusters of commodity servers. The Fedora and VIVO object-based storage systems are used, enabling linked data approaches.

The system will be integrated with existing research data repositories, such as the Ocean Observatories Initiative and Protein Data Bank repositories. The project also develops a custom site federation and data services layer. The data services layer provides services for data linking, search, and sharing; coupling to computation, analytics, and visualization; mechanisms to attach unique Digital Object Identifiers (DOIs), archive data, and publish broadly to internal and wider audiences; and management of the long-term data lifecycle, ensuring immutable and authentic data and reproducible research.

Faculty Researcher: John M. Carroll, Mary Beth Rosson
Sponsoring Agency: National Science Foundation

Expeditions in Computing awards are the top tier of research awards that NSF makes in Computer and Information Science; this project is broad and ambitious in many areas. The project leverages low-power, top-down “gist”-based computer vision hardware and algorithms to enable a new generation of smart-camera visual prosthetics. Our focus is on the potential use of these new technologies by visually impaired people. We created a permanent participatory research partnership with a dozen local people who have helped us understand their daily activities in great detail and have helped investigate a series of prototypes. We have also worked with them to clarify what assistive technology means in this context.