Wenke Lee's Research

Highlights of Recent Research Projects

Systems, Software, and Web Security, and Machine Learning for Security

We have studied the problem of defending against sophisticated web-based social engineering techniques where attackers leverage low-tier ad networks to inject social engineering components onto web pages to lure users into websites that the attackers control for further exploitation. Most of these exploitations are Web-based Social Engineering Attacks (WSEAs), such as reward and lottery scams. We have developed TRIDENT, a novel defense system that aims to detect and block generic WSEAs in real-time. TRIDENT stops WSEAs by detecting Social Engineering Ads (SE-ads), the entry point of general web social engineering attacks distributed by low-tier ad networks at scale. Our extensive evaluation showed that TRIDENT can detect SE-ads with an accuracy of 92.63% and a false positive rate of 2.57% and is robust against evasion attempts. We also evaluated TRIDENT against the state-of-the-art ad-blocking tools. The results show that TRIDENT outperforms these tools with a 10% increase in accuracy. Additionally, TRIDENT only incurs 2.13% runtime overhead as a median rate, which is small enough to deploy in production. This work was published in the 2023 USENIX Security Symposium.
We have been developing machine learning techniques for program analysis. We have studied the problem of automating the identification of security bugs, with the goal of both classifying a vulnerability and identifying the line on which it occurs in a program. We have developed VulChecker, a tool that can precisely locate vulnerabilities in source code (down to the exact instruction) as well as classify their type (CWE). To accomplish this, we developed a new program representation and a program slicing strategy, and we use a message-passing graph neural network to utilize all of the code's semantics and improve the reach between a vulnerability's root cause and manifestation points. We also developed a novel data augmentation strategy for cheaply creating strong datasets for vulnerability detection in the wild, using free synthetic samples available online. With this training strategy, VulChecker was able to identify 24 CVEs (10 from 2019 and 2020) in 19 projects taken from the wild, with nearly zero false positives, compared to a commercial tool that could only detect 4. VulChecker also discovered an exploitable zero-day vulnerability, which has been reported to developers for responsible disclosure. This work was published in the 2023 USENIX Security Symposium.
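To illustrate why message passing helps connect a root cause to a distant manifestation point, here is a minimal sketch of unweighted sum-aggregation message passing on a toy graph. The node names and the four-node chain are hypothetical, and a real graph neural network (including whatever VulChecker uses) applies learned weights and nonlinearities that are omitted here. The sketch only shows that information placed on one node reaches a node k hops away after k rounds.

```python
from typing import Dict, List, Tuple

# Hypothetical dependence chain: a root cause ("alloc") three hops
# away from the point where the bug would manifest ("store").
NODES: List[str] = ["alloc", "copy", "index", "store"]
EDGES: List[Tuple[str, str]] = [("alloc", "copy"), ("copy", "index"), ("index", "store")]


def message_passing_round(features: Dict[str, float],
                          edges: List[Tuple[str, str]]) -> Dict[str, float]:
    """One round: every node adds the sum of its predecessors' features to its own."""
    incoming = {node: 0.0 for node in features}
    for src, dst in edges:
        incoming[dst] += features[src]
    return {node: features[node] + incoming[node] for node in features}


# Mark only the root cause, then run three rounds and keep every intermediate state.
features = {node: 0.0 for node in NODES}
features["alloc"] = 1.0
history = [features]
for _ in range(3):
    features = message_passing_round(features, EDGES)
    history.append(features)

# After two rounds "store" has still seen nothing; the third round reaches it.
print(history[2]["store"], history[3]["store"])
```

The number of rounds bounds how far apart the two points can be while still influencing each other, which is one reason the choice of program representation and slicing matters.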
We have been developing symbolic and dynamic analysis techniques for vulnerability analysis. We proposed bug hunting using symbolically reconstructed states based on execution traces to achieve better detection and root cause analysis of overflow, use-after-free, double free, and format string bugs across user programs and their imported libraries. We discovered that with the right use of widely available hardware processor tracing and partial memory snapshots, powerful symbolic analysis can be used on real-world programs while managing path explosion. Better yet, data can be captured from production deployments of live software on end-host systems transparently, aiding in the analysis of user clients and long-running programs like web servers. We have implemented a prototype, Bunkerbuster, for Linux and evaluated it on 15 programs, where it found 39 instances of our target bug classes, 8 of which had never before been reported and have led to 1 EDB and 3 CVE IDs being issued. These 0-days were patched by developers using Bunkerbuster's reports, independently validating their usefulness. In a side-by-side comparison, our system uncovered 8 bugs missed by AFL and QSYM, and correctly classified 4 that were previously detected. Bunkerbuster accomplishes this with 7.21% recording overhead. This work was published in the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS).

Cyber-Physical Systems Security

We have studied the security of cyber-physical systems (CPS), in particular, industrial control systems (ICS) such as Supervisory Control and Data Acquisition (SCADA) systems. We have developed SCAPHY, a system to detect ICS attacks in SCADA by leveraging the unique execution phases of SCADA to identify the limited set of legitimate behaviors used to control the physical world in each phase, which differ from an attacker's activities. For example, it is typical for SCADA to set up ICS device objects during initialization, but anomalous for it to do so during process control. To extract unique behaviors of SCADA execution phases, SCAPHY first leverages open ICS conventions to generate a novel physical process dependency and impact graph (PDIG) to identify disruptive physical states. SCAPHY then uses PDIG to inform a physical process-aware dynamic analysis, whereby code paths of SCADA process-control execution are induced to reveal API call behaviors unique to legitimate process-control phases. Using this established behavior, SCAPHY selectively monitors an attacker's physical world-targeted activities that violate legitimate process-control behaviors. We evaluated SCAPHY at a U.S. national lab ICS testbed environment. Using diverse ICS deployment scenarios and attacks across 4 ICS industries, SCAPHY achieved 95% accuracy and 3.5% false positives (FP), compared to 47.5% accuracy and 25% FP of existing work. This work was published in the 2023 IEEE Symposium on Security and Privacy (Oakland).
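The phase-based idea can be sketched as a per-phase allowlist of API calls. This is a simplified illustration, not SCAPHY's implementation: the phase names, the API call names, and the hand-written allowlist are all hypothetical, whereas the system described above derives legitimate behaviors through PDIG-guided dynamic analysis.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical allowlist: which API calls are legitimate in which execution phase.
ALLOWED_CALLS: Dict[str, Set[str]] = {
    "initialization": {"create_device_object", "load_config", "read_tag"},
    "process_control": {"read_tag", "write_setpoint"},
}


def find_violations(events: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return every (phase, call) event whose call is not allowed in that phase."""
    return [(phase, call) for phase, call in events
            if call not in ALLOWED_CALLS.get(phase, set())]


# Hypothetical trace: device setup is normal during initialization,
# but the same call during process control is flagged.
EVENTS = [
    ("initialization", "create_device_object"),
    ("process_control", "read_tag"),
    ("process_control", "create_device_object"),
]
violations = find_violations(EVENTS)
print(violations)
```

Keying the check on the phase, rather than on the call alone, is what lets the same API call be benign in one context and suspicious in another.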

Privacy-Preserving Biometrics Based Authentication and Surveillance

The explosive growth of biometrics use (e.g., in surveillance) poses a persistent challenge: keeping biometric data private without sacrificing the apps' functionality. We consider private querying of a real-life biometric scan (e.g., a person's face) against a private biometric database. The querier learns only the label(s) of the matching scan(s) (e.g., a person's name), and the database server learns nothing. We formally define Fuzzy Labeled Private Set Intersection (FLPSI), a primitive computing the intersection of noisy input sets by considering closeness/similarity instead of equality. Our FLPSI protocol's communication is sublinear in database size and is concretely efficient. We have implemented it and applied it to facial search by integrating with our fine-tuned toolchain that maps face images into Hamming space. We have extensively tested our system, achieving high performance with concretely small network usage: for a 10K-row database, the query response time over WAN (resp. fast LAN) is 146ms (resp. 47ms), transferring 12.1MB; offline precomputation (with no communication) time is 0.94s. FLPSI scales well: for a 1M-row database, online time is 1.66s (WAN) and 1.46s (fast LAN), with 40.8MB of data transfer in the online phase and 37.5s of offline precomputation. This improves on the state-of-the-art work (SANNS) by 9 to 25 times (on WAN) and 1.2 to 4 times (on fast LAN). Our false non-matching rate is 0.75% for at most 10 false matches over a 1M-row database, which is comparable to the underlying plaintext matching algorithm. This work was published in the 2021 USENIX Security Symposium.
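As a point of reference, the functionality that FLPSI computes can be written down in the clear. The sketch below is only that plaintext functionality: it provides no privacy at all and involves none of the protocol's cryptography. The 16-bit templates, the labels, and the threshold are hypothetical values chosen for illustration.

```python
from typing import Dict, List


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two equal-length bit strings stored as ints."""
    return bin(a ^ b).count("1")


def fuzzy_labeled_match(query: int, database: Dict[str, int], threshold: int) -> List[str]:
    """Return only the labels of database templates within `threshold` bits of the query."""
    return [label for label, template in database.items()
            if hamming_distance(query, template) <= threshold]


# Hypothetical 16-bit templates; real templates would be much longer.
DATABASE = {
    "alice": 0b1011001110001111,
    "bob": 0b0100110001110000,
}

# A noisy rescan of alice's template with two bits flipped.
noisy_alice = DATABASE["alice"] ^ 0b0000000000100001
matches = fuzzy_labeled_match(noisy_alice, DATABASE, threshold=3)
print(matches)
```

Matching on closeness rather than equality is what makes the problem "fuzzy": two scans of the same person almost never produce identical bit strings.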
Biometric authentication has become increasingly popular because of its appealing usability and improvements in biometric sensors. At the same time, it raises serious privacy concerns since the common deployment involves storing bio-templates in remote servers. Current solutions propose to keep these templates on the client's device, outside the server's reach. This binds the client to the initial device. A more attractive solution is to have the server authenticate the client, thereby decoupling them from the device. Unfortunately, existing biometric template protection schemes suffer in either practicality or accuracy. The state-of-the-art deep learning (DL) solutions solve the accuracy problem in face- and voice-based verification. However, existing privacy-preserving methods do not accommodate the DL methods, as they are generally tailored to the hand-crafted feature spaces of specific modalities. We have developed a novel pipeline, Justitia, that makes DL inferences of face and voice biometrics compatible with standard privacy-preserving primitives, like fuzzy extractors (FE). For this, we first form a bridge between the Euclidean (or cosine) space of DL and the Hamming space of FE, while maintaining the accuracy and privacy of the underlying schemes. We also introduce efficient noise handling methods to keep the FE scheme practically applicable. We implemented an end-to-end prototype to evaluate our design, and showed how to improve security for sensitive authentications and usability for non-sensitive, day-to-day authentications. Justitia achieves the same error as the plaintext baseline on the YouTube Faces benchmark: 0.33% false rejection at zero false acceptance. Moreover, combining face and voice achieves 1.32% false rejection at zero false acceptance. According to our systematic security assessments, conducted through prior approaches and our novel black-box method, Justitia achieves ~25 bits and ~33 bits of security for the face-based and face-and-voice-based pipelines, respectively. This work was published in the 16th ACM ASIA Conference on Computer and Communications Security (ACM AsiaCCS 2021).
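One well-known way to map real-valued embeddings into Hamming space is random-hyperplane hashing (the sign of random projections), where embeddings with a small angle between them tend to receive bit strings with a small Hamming distance. The sketch below uses that technique purely as an illustration of such a bridge; the text above does not say how Justitia builds its bridge, and the embedding values, bit length, and seeds are hypothetical.

```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random Gaussian hyperplanes; enrollment and query must share the seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def to_bits(embedding: Sequence[float], hyperplanes: List[List[float]]) -> List[int]:
    """One bit per hyperplane: 1 if the embedding lies on its non-negative side, else 0."""
    return [1 if sum(w * x for w, x in zip(plane, embedding)) >= 0.0 else 0
            for plane in hyperplanes]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


N_BITS = 64
embedding = [0.3, -1.2, 0.8, 0.1, -0.5, 2.0, -0.7, 0.4]  # hypothetical DL output
planes = make_hyperplanes(dim=len(embedding), n_bits=N_BITS)

same_direction = [2.0 * x for x in embedding]   # cosine similarity 1
opposite = [-x for x in embedding]              # cosine similarity -1
noise_rng = random.Random(1)
noisy = [x + noise_rng.gauss(0.0, 0.05) for x in embedding]  # slightly perturbed rescan

reference = to_bits(embedding, planes)
dist_same = hamming(reference, to_bits(same_direction, planes))
dist_opposite = hamming(reference, to_bits(opposite, planes))
dist_noisy = hamming(reference, to_bits(noisy, planes))
print(dist_same, dist_noisy, dist_opposite)
```

Because each bit depends only on the direction of the embedding, rescaling leaves the bit string unchanged and negation flips every bit, while a slightly perturbed embedding is expected to differ in only a small fraction of bits. Those residual bit errors are the kind of noise a fuzzy extractor has to tolerate.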
We have developed a Real Time Captcha system called rtCaptcha, which stops or slows down attacks on face/voice-based authentication by turning the adversary's task from creating authentic video/audio of the target victim performing known authentication tasks (e.g., smile, blink) to figuring out what the authentication task is, which is encoded as a Captcha. Specifically, when a user tries to authenticate using rtCaptcha, they are presented with a Captcha and asked to take a "selfie" video while announcing the answer to the Captcha. As such, the security guarantee of our system comes from the strength of the Captcha, and not from how well we can distinguish real faces/voices from synthesized ones. This work was published in the 2018 Network and Distributed System Security Symposium (NDSS).
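The challenge-response flow can be sketched as follows. This is an illustrative outline under stated assumptions, not the rtCaptcha implementation: the digit-string challenge stands in for a rendered Captcha, the response time limit and its value are assumptions, and the face/voice match and the speech transcription are reduced to inputs (`biometric_ok`, `spoken_answer`) that a real system would have to compute.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Challenge:
    answer: str       # the secret the user must read out of the Captcha
    issued_at: float  # server-side timestamp, in seconds


def issue_challenge(now: float, n_digits: int = 6) -> Challenge:
    """Create a fresh random challenge; a real system would render it as a Captcha image."""
    answer = "".join(secrets.choice("0123456789") for _ in range(n_digits))
    return Challenge(answer=answer, issued_at=now)


def verify_response(challenge: Challenge, spoken_answer: str, responded_at: float,
                    biometric_ok: bool, time_limit: float = 5.0) -> bool:
    """Accept only a timely, correct answer that also passed the face/voice match."""
    in_time = 0.0 <= responded_at - challenge.issued_at <= time_limit
    correct = spoken_answer == challenge.answer
    return in_time and correct and biometric_ok


# Hypothetical session: the challenge is issued at t=100s and answered at t=102s.
challenge = issue_challenge(now=100.0)
accepted = verify_response(challenge, challenge.answer, responded_at=102.0, biometric_ok=True)
too_late = verify_response(challenge, challenge.answer, responded_at=110.0, biometric_ok=True)
print(accepted, too_late)
```

Because the challenge is random and fresh for every attempt, an attacker cannot prepare the required video or audio in advance and must first solve the Captcha.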

User Interface Security

We pioneered the study of security issues in modern UI designs. For example, we showed that accessibility support in all modern operating systems can be exploited by attackers to gain unauthorized access to privileged resources because the accessibility libraries are not integrated properly with the OS access control mechanisms. This work was published in the 2014 ACM CCS. As another example, we showed that an app with the two Android permissions SYSTEM_ALERT_WINDOW and BIND_ACCESSIBILITY_SERVICE can completely control the UI feedback loop and create devastating attacks that are hidden from the user. This work was published in the 2017 IEEE Symposium on Security and Privacy and won the Distinguished Practical Paper Award. In order to improve UI security, we developed a framework for eradicating clickjacking on Android, and this work was published in the 2018 ACM CCS.

Past Research Activities

Transparency of information access on the Internet: identifying censorship attempts and developing techniques to circumvent/defeat censorship, funded by NSF and industry.
PEASOUP: Preventing Exploits Against Software of Uncertain Provenance, funded by the Air Force (led by GrammaTech).
Botnet modeling, analysis, detection and attribution, funded by NSF, DHS, and ONR MURI.
"CLEANSE: Cross-Layer Large-Scale Efficient Analysis of Network Activities to Secure the Internet", funded by NSF (Large Team project).
Malware analysis algorithms and platforms, funded by NSF and industry.
Host-based Security, in particular, virtual machine monitoring techniques, funded by NSF, IARPA, and industry.
Web security and privacy, in particular, access control and information flow, funded by industry.
Foundational and Systems Support for Quantitative Trust Management, ONR MURI (led by U Penn).
An Information-Theoretic Framework for Evaluating and Optimizing Intrusion Detection Performance, funded by Army Research Office.
Preventing SQL Code Injection by Combining Static and Runtime Analysis, funded by Department of Homeland Security.
Anomaly and Misuse Detection in Network Traffic Streams - Checking and Machine Learning Approaches, funded by Office of Naval Research (ONR MURI).
Intrusion Detection Techniques for Mobile Ad Hoc Networks, funded by NSF.
CAREER: Adaptive Intrusion Detection Systems, funded by NSF.
Agile Security for Storing Sensitive and Critical Information, funded by NSF.
Guarding the Next Internet Frontier: Countering Denial of Information, funded by NSF.
Vulnerability Assessment Tools for Complex Information Networks, funded by Army Research Office (ARO MURI).
Cost-sensitive intrusion detection, funded by DARPA, 5/200-8/2003.

Technology Transfer Efforts

Co-founded Damballa in 2006, based on the botnet detection technologies developed by my research group.