Research

Work organized by lab and institution. Each entry outlines my contributions, methods, and future directions.

DRAM Processing-in-Memory for Fully Homomorphic Encryption

SAFARI Research Group · ETH Zürich

Prof. Onur Mutlu · with Ismail Emir Yuksel, Mayank Kabra

May 2025 – December 2025

Designed DRAM-based processing-in-memory architectures that run fully homomorphic encryption directly in memory, cutting latency and energy for large-scale polynomial multiplication (4096-element polynomials across 131,072 ciphertexts).

Key Contributions

Designed and benchmarked three DRAM data-placement strategies (FIGARO, LISA+FIGARO, LISA+RowCopy), reaching a 318× speedup over the pure in-memory FIGARO strategy and 6.06× lower latency than processor-centric baselines.
Cut memory energy by 41.5× with RowClone-optimized polynomial-shift strategies that minimize costly inter-bank data movement across a 16-bank hierarchy.
Enabled concurrent processing of 131,072 ciphertexts by combining column-granularity (FIGARO) and hierarchical row-layout (LISA) in-memory paradigms with minimal row-buffer contention.
Extended Ramulator 2.0 and CipherMatch simulators with subarray-level access tracking and RowCopy latency models to validate architectural feasibility.
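Polynomial multiplication in RLWE-based FHE is usually defined in the ring Z_q[X]/(X^N + 1), where multiplying by X^k is a rotation of the coefficients with a sign flip on every coefficient that wraps around. That is the kind of bulk data rearrangement that in-memory copy primitives can, in principle, perform. The Python sketch below assumes that ring (with a toy N = 4 instead of the project's 4096) and builds a full product purely from shifts, scalings, and additions. It is an O(N²) reference model of the arithmetic only; the function names are illustrative, and it does not model how FIGARO, LISA, or RowClone actually move the data.

```python
from typing import List


def mul_by_x_power(poly: List[int], k: int, q: int) -> List[int]:
    """Multiply poly by X^k in Z_q[X]/(X^N + 1), for 0 <= k < N.

    A shift by k moves each coefficient up k slots; because X^N = -1,
    every coefficient that wraps past degree N-1 changes sign.
    """
    n = len(poly)
    out = [0] * n
    for i, c in enumerate(poly):
        j = i + k
        if j < n:
            out[j] = c % q
        else:
            out[j - n] = (-c) % q  # wrapped term picks up the minus sign
    return out


def negacyclic_mul(a: List[int], b: List[int], q: int) -> List[int]:
    """Schoolbook product a*b in Z_q[X]/(X^N + 1), built only from shifts and adds.

    a*b = sum_k b[k] * (a * X^k), so each nonzero coefficient of b costs one
    shifted copy of a, one scalar multiply, and one vector addition.
    """
    n = len(a)
    acc = [0] * n
    for k, bk in enumerate(b):
        if bk % q == 0:
            continue  # zero coefficients contribute nothing
        shifted = mul_by_x_power(a, k, q)
        acc = [(x + bk * y) % q for x, y in zip(acc, shifted)]
    return acc
```

For example, `negacyclic_mul([1, 1, 0, 0], [0, 0, 0, 1], 17)` computes (1 + X)·X³ = X³ + X⁴ = X³ − 1, which is `[16, 0, 0, 1]` mod 17. Production FHE libraries typically replace this quadratic loop with the number-theoretic transform.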

Multimodal Neuromorphic Digit Recognition (NeuTNN)

NeuroAI Computer Architecture Lab (NCAL) · Carnegie Mellon University

Prof. John Shen · with Shanmuga Venkatachalam, Liam Carden

Ongoing

Extended the NeuTNN architecture to jointly classify visual and auditory digits using only biologically plausible reward-modulated STDP (R-STDP) learning, without backpropagation, building an edge-ready system that maintains stable multimodal representations under strict biological and hardware constraints.
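The core idea of R-STDP (local spike-timing updates gated by a global reward, with no backpropagated gradients) fits in a few lines. This is a generic sketch, not NeuTNN's exact rule: it assumes one input spike time per synapse, a single output spike time, integer weights clipped to a hypothetical [0, 7] range, and a scalar reward of +1 for a correct classification and −1 for an incorrect one.

```python
from typing import List, Optional

W_MAX = 7  # hypothetical weight ceiling (e.g., 3-bit synapses); not NeuTNN's actual value


def stdp_sign(t_in: Optional[int], t_out: Optional[int]) -> int:
    """Local STDP direction for one synapse.

    +1 if the input spike arrived at or before the output spike (causal),
    -1 if it arrived after (anti-causal), 0 if either neuron stayed silent.
    """
    if t_in is None or t_out is None:
        return 0
    return 1 if t_in <= t_out else -1


def r_stdp_update(weights: List[int], in_times: List[Optional[int]],
                  t_out: Optional[int], reward: int, step: int = 1) -> List[int]:
    """Reward-modulated STDP: a global scalar reward scales (and can flip) each local update.

    No gradients are propagated; every synapse uses only its own spike timing
    plus the one reward value broadcast to the whole layer.
    """
    updated = []
    for w, t_in in zip(weights, in_times):
        delta = step * reward * stdp_sign(t_in, t_out)
        updated.append(min(W_MAX, max(0, w + delta)))  # clip to [0, W_MAX]
    return updated
```

For example, with weights `[3, 3, 3]`, input spike times `[1, 5, None]`, and an output spike at time 3, a reward of +1 yields `[4, 2, 3]` (the causal input is strengthened, the late input weakened, the silent input untouched), while a reward of −1 reverses both changes to `[2, 4, 3]`.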

Key Contributions

Lifted audio-only accuracy from 21% to 90.26% with a log-mel spike-encoding pipeline (per-bin median thresholding) alone, without changing the architecture or learning rule.
Achieved 100% multimodal accuracy on a 72-segment model via a block-diagonal segment mask enforcing strict modality separation, stable across 2:1 and 1:1 visual-to-audio ratios.
Showed through systematic pruning analysis that accuracy stays at 100% at 9.7% synaptic density (~460K of 4.75M synapses), quantifying efficiency gains relevant to NCAL's edge hardware targets.
Proposed a dynamic confidence-weighting mechanism that uses the winning neuron's body-potential magnitude as an inference-time reliability signal, to improve robustness when a single modality is degraded, with no retraining or architecture change.
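Two pieces of the pipeline above can be sketched compactly: per-bin median thresholding of a log-mel spectrogram into a binary spike raster, and winner-potential confidence weighting across modalities. Both are illustrative sketches under stated assumptions, not code from the NeuTNN project: the median is taken over the frames of one utterance, a bin spikes only when it exceeds its own median, and the fusion rule is a hypothetical weighted vote in which each modality's winning class counts with the magnitude of its winning body potential. NeuTNN's actual encoder may emit spike times rather than a binary raster.

```python
from statistics import median
from typing import Dict, List, Sequence


def median_threshold_spikes(log_mel: Sequence[Sequence[float]]) -> List[List[int]]:
    """Binary spike raster from a log-mel spectrogram.

    log_mel[t][b] is the log-mel energy of frame t in mel bin b. Each bin gets
    its own threshold (its median over all frames), so a bin spikes only when
    it is louder than its own typical level.
    """
    n_bins = len(log_mel[0])
    thresholds = [median(frame[b] for frame in log_mel) for b in range(n_bins)]
    return [[1 if frame[b] > thresholds[b] else 0 for b in range(n_bins)]
            for frame in log_mel]


def confidence_weighted_vote(potentials_by_modality: Dict[str, Sequence[float]]) -> int:
    """Fuse per-modality winners, weighting each by its winning body potential.

    Assumes a degraded modality produces a weaker winner, so its vote counts less.
    Returns the index of the fused class.
    """
    scores: Dict[int, float] = {}
    for potentials in potentials_by_modality.values():
        winner = max(range(len(potentials)), key=lambda c: potentials[c])
        scores[winner] = scores.get(winner, 0.0) + potentials[winner]
    return max(scores, key=lambda c: scores[c])
```

One plausible reason per-bin thresholds help is that each mel band is compared against its own typical level, so consistently loud low-frequency bands do not dominate the raster. Under the fusion assumptions, a visual modality whose winner barely fires (potentials `[2, 1, 1]`) is outvoted by a confident audio winner (`[0, 0, 8]`), which is the behavior the weighting is meant to provide under single-modality degradation.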

3D CNFET Accelerator — Pin Allocation & PCB Bring-Up

Nexus Research Group · Carnegie Mellon University

Tathagata Srimani

January 2026 – Present

Supported bring-up of a monolithic-3D CNFET accelerator within the lab's 3D Integration program, focused on pin allocation and PCB placement-and-routing for the chip's high-speed external test interface.
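Keeping each bus together on one connector, as the signal-grouping requirement implies, is a small bin-packing problem. The sketch below uses first-fit-decreasing as an illustration; it is not the lab's actual allocation flow. It assumes 68 positions per VHDCI connector, a hypothetical 20 of them reserved for ground and power returns, and made-up group names and sizes.

```python
from typing import Dict

VHDCI_POSITIONS = 68    # assumed positions per VHDCI connector
RESERVED_PER_CONN = 20  # hypothetical: positions kept for ground/power returns
USABLE_PER_CONN = VHDCI_POSITIONS - RESERVED_PER_CONN


def allocate_groups(groups: Dict[str, int], n_connectors: int = 6,
                    usable: int = USABLE_PER_CONN) -> Dict[str, int]:
    """First-fit-decreasing placement of whole signal groups onto connectors.

    Each group (e.g., a bus) must land entirely on one connector so its
    members stay together; the largest groups are placed first.
    Returns {group_name: connector_index}.
    """
    free = [usable] * n_connectors
    placement: Dict[str, int] = {}
    for name, size in sorted(groups.items(), key=lambda kv: -kv[1]):
        for conn in range(n_connectors):
            if free[conn] >= size:
                free[conn] -= size
                placement[name] = conn
                break
        else:  # no connector had room for the whole group
            raise ValueError(f"group {name!r} ({size} pins) does not fit on any connector")
    return placement
```

With the example groups `{"instr_bus": 40, "data_bus": 40, "scan": 20, "redundancy_cfg": 16, "clk": 4}`, the two buses land on separate connectors and the smaller groups fill the remaining space; a group larger than one connector's usable positions raises `ValueError` instead of being split.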

Key Contributions

Refined the allocation of 220+ functional pins across six VHDCI connectors and the chip-on-board routing, preserving signal grouping for the dual instruction/data buses with zero routing conflicts in the mapped plan.
Applied SerDes and differential-pair PCB constraints for GHz+ signaling: 100Ω ±10% differential impedance, <50ps inter-pair skew, and 3× spacing for crosstalk isolation.
Supported design-for-test (DFT) pin planning across scan domains and 15+ redundancy-configuration pins, preserving yield and diagnostic coverage across the VHDCI channels.
Applied split-voltage power-distribution and ground-stitching constraints (250µm perimeter via spacing, <5% droop targets) for signal and power integrity under high-speed switching.
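The skew and droop targets above translate into concrete layout numbers through two first-order rules of thumb: a trace's propagation delay is about √ε_eff / c, so a skew budget caps the allowable length mismatch, and a PDN's target impedance is the allowed ripple voltage divided by the transient current step. The <50ps and <5% figures come from the text; the effective dielectric constant (4.0), the 1.0 V rail, and the 2 A current step used in the example are hypothetical.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per picosecond


def delay_ps_per_mm(er_eff: float) -> float:
    """Propagation delay of a PCB trace with effective dielectric constant er_eff."""
    return math.sqrt(er_eff) / C_MM_PER_PS


def max_length_mismatch_mm(skew_budget_ps: float, er_eff: float) -> float:
    """Largest trace-length difference that stays within a timing-skew budget."""
    return skew_budget_ps / delay_ps_per_mm(er_eff)


def target_impedance_ohm(vdd: float, droop_frac: float, step_current_a: float) -> float:
    """First-order PDN target impedance: allowed ripple voltage / transient current step."""
    return vdd * droop_frac / step_current_a
```

With ε_eff = 4.0 the delay is about 6.67 ps/mm, so a 50 ps budget allows roughly 7.5 mm of length mismatch; a 1.0 V rail with a 5% droop limit and a 2 A step needs a PDN impedance of 25 mΩ or less across the frequency band where that step occurs.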

Hardware-Aware Sensor Placement for Autonomous Vehicles

CMU ECE · Cyber-Physical Systems (course research)

Embedded Systems: CPS Design

Spring 2026

Optimizes which sensor — lidar, radar, camera, or disabled — fills each of five fixed AV chassis slots under a target SoC's ingest-bandwidth budget, balancing detection rate, time-to-detect, and cost.

Key Contributions (Phase 1 of 4 complete)

Reframed the problem as hardware/software co-design in which the "software" is the sensor suite and the binding constraint is ingest bandwidth rather than memory bandwidth (which never became binding in v1), making the hardware-aware claim substantive.
Built a CMA-ES optimizer over a continuous relaxation of the 5-slot assignment, with candidates evaluated in parallel MuJoCo ray-cast simulations honoring per-sensor FOV, range, and latency.
Defined the minimized objective L = −w_acc·detection + w_lat·latency + w_cost·cost, subject to a hard ingest-bandwidth cap, with two weight presets (safety_first, efficiency).
Delivered the Phase 1 evaluation infrastructure: spawn-safe parallel workers, Common Random Numbers for fair within-generation comparison, provenance logging (git SHA, MuJoCo version, host), and declarative RUN_CONFIGS for scenario/platform swaps.
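The optimization loop described above can be sketched in plain Python, minus the CMA-ES sampler and the MuJoCo simulator. Everything numeric here is hypothetical: the per-sensor bandwidth and cost figures, the preset weights, and the `simulate(suite, seed)` callback that stands in for the ray-cast episodes. The sketch also assumes a sign convention in which the detection term is negated, so minimizing L rewards detection while penalizing latency and cost. It shows three ideas: decoding a 20-dimensional continuous vector into one sensor per slot by per-slot argmax, treating the ingest-bandwidth cap as a hard constraint (infeasible suites score +inf), and Common Random Numbers, where every candidate in a generation is evaluated on the same episode seeds.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

OPTIONS = ["lidar", "radar", "camera", "disabled"]
N_SLOTS = 5
# Hypothetical per-sensor ingest bandwidth (MB/s) and relative cost; illustrative only.
BANDWIDTH = {"lidar": 70.0, "radar": 5.0, "camera": 120.0, "disabled": 0.0}
COST = {"lidar": 4.0, "radar": 1.0, "camera": 0.5, "disabled": 0.0}
# Hypothetical weights for the two named presets.
PRESETS = {
    "safety_first": {"acc": 10.0, "lat": 1.0, "cost": 0.1},
    "efficiency": {"acc": 3.0, "lat": 1.0, "cost": 1.0},
}


def decode(x: Sequence[float]) -> List[str]:
    """Continuous relaxation -> discrete suite: per-slot argmax over the 4 option scores."""
    k = len(OPTIONS)
    if len(x) != N_SLOTS * k:
        raise ValueError(f"expected {N_SLOTS * k} values, got {len(x)}")
    return [OPTIONS[max(range(k), key=lambda j: x[s * k + j])] for s in range(N_SLOTS)]


def objective(suite: List[str], detection: float, latency_s: float,
              bw_cap: float, preset: str) -> float:
    """Lower is better; detection is negated (assumed sign convention)."""
    if sum(BANDWIDTH[s] for s in suite) > bw_cap:
        return math.inf  # hard ingest-bandwidth cap: infeasible suites can never win
    w = PRESETS[preset]
    cost = sum(COST[s] for s in suite)
    return -w["acc"] * detection + w["lat"] * latency_s + w["cost"] * cost


def evaluate_generation(candidates: Sequence[Sequence[float]],
                        simulate: Callable[[List[str], int], Tuple[float, float]],
                        gen_seed: int, n_episodes: int,
                        bw_cap: float, preset: str) -> List[float]:
    """Common Random Numbers: every candidate in this generation sees the same episode seeds."""
    rng = random.Random(gen_seed)
    seeds = [rng.randrange(2**31) for _ in range(n_episodes)]
    scores = []
    for x in candidates:
        suite = decode(x)
        runs = [simulate(suite, s) for s in seeds]  # each run -> (detection, latency_s)
        det = sum(r[0] for r in runs) / n_episodes
        lat = sum(r[1] for r in runs) / n_episodes
        scores.append(objective(suite, det, lat, bw_cap, preset))
    return scores
```

In an ask/tell loop from a CMA-ES library such as pycma, `evaluate_generation` would be called on each batch returned by `ask()` and its scores passed to `tell()`. Sharing seeds within a generation means score differences come from the suites themselves rather than from different random episodes.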

Planned Work & Future Directions

Phase 2 — Reliability: 30-seed bootstrap confidence intervals; four scenarios (straight, urban-cluttered, highway-speed, adversarial-blind); sensor-noise models for rain (lidar), glare (camera), and multipath ghosts (radar).
Phase 3 — Ablations & SoC sweep: m_lidar, bandwidth-cap, and w_cost sweeps; a five-platform SoC sweep yielding a "bandwidth needed for a 4-sensor AV stack" figure.
Phase 4 — Manuscript: replace point estimates with CI-bracketed results and add a hardware-co-design section grounded in the SoC sweep.
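For the Phase 2 plan, one common way to turn 30 per-seed results into a confidence interval is the percentile bootstrap: resample the seeds with replacement, recompute the mean each time, and read off the 2.5th and 97.5th percentiles. The sketch below assumes the statistic of interest is the mean (for example, mean detection rate across seeds) and uses only the Python standard library; choosing the percentile method over BCa or studentized intervals is an assumption, not the project's stated method.

```python
import random
from typing import Sequence, Tuple


def bootstrap_ci(samples: Sequence[float], n_resamples: int = 10_000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of per-seed results."""
    rng = random.Random(seed)  # fixed seed so the interval itself is reproducible
    n = len(samples)
    boot_means = sorted(sum(rng.choices(samples, k=n)) / n for _ in range(n_resamples))
    k_lo = int(n_resamples * alpha / 2)   # e.g., index 250 of 10,000 for a 95% CI
    k_hi = n_resamples - 1 - k_lo         # symmetric upper index
    return boot_means[k_lo], boot_means[k_hi]
```

Reporting `(lo, hi)` alongside the point estimate gives the CI-bracketed results that Phase 4 calls for; a constant input collapses the interval to a single point, which is a quick sanity check.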