Home

A few paragraphs.

I'm a second-year Data Science & Statistics student at the University of Michigan. A few of the things I've shipped include a RAG system for dermatology literature, a CDC pipeline streaming Postgres changes through Kafka, a B2C AI copywriting product, and full-stack tools used by my own student community. Most of these started from a problem I'd actually run into: a gap in a tool I needed, or a workflow that felt unnecessarily painful. I build to figure out the answer, and I care about getting it right rather than just shipping something that looks right.

I treat working with AI as a system to design, not just a tool to prompt. That means tracking what changes when new primitives land (MCP servers, agent skills, hooks, context engineering) and adjusting how I work accordingly. I believe the interesting frontier isn't writing better prompts, but deciding what context the model sees, what tools it can reach, and where human judgment stays in the loop.

Right now, I'm splitting time between a research lab at Seoul National University, where I work on network propagation score cutoffs in genetic comorbidity analysis, and a handful of independent projects on questions I'd like answered (most recently, whether mandatory military service measurably shifts career trajectories). Both give me room to chase questions. Beyond that, I try to stay close to the broader community by attending conferences (e.g., ICML, KSID, Coex STK) and joining groups where I can share what I'm learning and pick up perspectives from people working on different problems.

Background.

B.S. Data Science & Statistics @ University of Michigan, Ann Arbor (C/O 2030)

F-1 student & CPT/OPT-authorized for US internships. Open to relocating.

Skills.

languages
Python · C++ · JavaScript · TypeScript · SQL · HTML/CSS
data & ml
pandas · scikit-learn · NumPy · Jupyter · Matplotlib · Seaborn · LangChain · ChromaDB · Kafka · Debezium
infra
Docker · AWS · Firebase · Git · Vercel · PostgreSQL
apps & viz
Streamlit · Next.js · React · TailwindCSS · Amplitude
other
Excel · LaTeX

Experience.

Undergraduate Researcher · Seoul National University (BH)
5.2026 — Present · Seoul, KR
Network Propagation · Genetic Comorbidity · Statistical Genetics
  • Will be updated soon.
Undergraduate Researcher · UM Medical School
6.2025 — Present · Remote
Python · Matrix Factorization · ETL
  • Built a parallel ETL pipeline pulling PheWAS GWAS data from multiple PheWeb instances (MGI-BioVU, UKB-TOPMed, MGI); applied schema normalization and allele-orientation alignment to keep β signs consistent across sources.
  • Constructed a sign-consistent variant × phenotype β-matrix (~1M rows × ~1.4K cols).
  • Applied truncated SVD (k = 50, ~96% variance retained) to obtain a shared 50-dimensional latent space; validated against 200+ known comorbidity pairs.
Undergraduate Researcher · UM Cutaneous Lab
9.2024 — 12.2025 · Ann Arbor, MI
LangChain · Sentence-BERT · ChromaDB · RAG
  • Built Quanta, a LangChain-based RAG system for source-grounded dermatology literature review (Python, Sentence-BERT, GPT-4, ChromaDB, Streamlit).
  • Designed a layered retrieval pipeline (query expansion → semantic + BM25 → MMR → cross-encoder reranking → parent-doc) that achieved F1 (p = 0.0165) and NDCG (p = 0.004) gains over the vector-similarity baseline, with ~97% memory reduction.
  • Tested on 55+ manuscripts with 10+ clinicians; cut literature-review time by ~75%.
  • Preprint on bioRxiv (2025.08.14.670384v1); abstract accepted at KSID × ISID APAC 2026 — Young Investigator Selection.
Data Engineer / Analyst Intern · Maetel
5.2025 — 9.2025 · Seoul, KR
Funnel Analysis · Data Analytics · Amplitude
  • Built Syndy.ai, an AI-native LinkedIn copywriting assistant that drove 500+ signups in 48h via Product Hunt; integrated LLM APIs and few-shot prompt engineering to capture each user's persona.
  • Mined behavioral data from 3,000+ user sessions via Amplitude; identified note-creation friction points and the features most correlated with retention, informing product changes that lifted activation-to-retention by 22%.
  • Designed a 3-layer document ingestion architecture with auto-diff for schema-level persona detection (Persona Chat), and an LLM-prompted lead scoring engine ranking prospects against client ICP criteria.

Projects.