
Pieces worth seeing.

001 / quantabot.py DA
2024 – 2025 · UM Cutaneous Lab

Quanta

A LangChain RAG system for source-grounded dermatology literature review. Layered retrieval (semantic search + BM25 + MMR + cross-encoder reranking) improves F1 and NDCG over the vector-similarity baseline and cuts memory use by ~97%.

Python LangChain GPT-4 ChromaDB
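
The MMR layer can be pictured with a minimal sketch. This is not Quanta's code. It assumes the query and documents are already embedded as plain float vectors and uses cosine similarity. Each pick trades relevance to the query (weight `lam`) against similarity to documents already chosen. The toy vectors are made up.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_score(i: int, query, docs, selected: list[int], lam: float) -> float:
    """lam * relevance to the query minus (1 - lam) * redundancy with picks so far."""
    relevance = cosine(query, docs[i])
    redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
    return lam * relevance - (1 - lam) * redundancy


def mmr(query, docs, k: int, lam: float = 0.5) -> list[int]:
    """Greedily pick k document indices by Maximal Marginal Relevance."""
    selected: list[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda i: mmr_score(i, query, docs, selected, lam))
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]  # toy embeddings; doc 1 nearly duplicates doc 0
print(mmr(query, docs, k=2, lam=1.0))  # [0, 1]
print(mmr(query, docs, k=2, lam=0.3))  # [0, 2]
```

With `lam=1.0`, MMR reduces to plain similarity ranking and keeps the near-duplicate. Lowering `lam` swaps that document for the more diverse third one.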
002 / seoul_restaurants.ipynb DA
2024 · 25 districts · 528K records

Seoul Restaurant Survival Analysis

Survival analysis of 528,000+ restaurant license records across Seoul's 25 districts, using Kaplan-Meier curves, log-rank tests, and Cox proportional hazards regression. Median lifespan is 6.1 years and only 35% of restaurants reach 10; location matters more than cuisine.

Python pandas lifelines Matplotlib
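
The Kaplan-Meier estimator behind those curves works as follows. At each observed closure time tᵢ, it multiplies in the fraction of still-open restaurants that survive that time: S(t) = ∏ (1 − dᵢ/nᵢ) over tᵢ ≤ t. Restaurants still open when the data ends are right-censored. They only shrink the risk set nᵢ. The sketch below builds the curve from scratch on made-up durations in years. The project's stack lists lifelines, whose `KaplanMeierFitter` fits the same estimator.

```python
from collections import Counter


def kaplan_meier(durations: list[float], observed: list[bool]) -> list[tuple[float, float]]:
    """Return the (time, S(t)) steps of the Kaplan-Meier survival curve.

    durations: years until closure, or until the data ends (censoring).
    observed:  True if the closure was observed, False if right-censored.
    """
    closures = Counter(t for t, e in zip(durations, observed) if e)
    leaving = Counter(durations)  # everyone who exits the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if closures[t]:
            # S(t) = S(t-) * (1 - d_t / n_t)
            survival *= 1 - closures[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve: list[tuple[float, float]]):
    """First time at which estimated survival drops to 0.5 or below."""
    return next((t for t, s in curve if s <= 0.5), None)


# Toy data: 8 hypothetical restaurants
durations = [2, 3, 3, 5, 6, 8, 10, 12]
observed = [True, True, False, True, True, False, True, False]
curve = kaplan_meier(durations, observed)
print(median_survival(curve))  # 6
```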
003 / military_career_analysis.ipynb DA
2026 · YP2021 panel · longitudinal

Military Career Longitudinal Analysis

Examining how military service relates to career development and career certainty in the multi-wave YP2021 panel, using fixed-effects models, group comparisons, and trajectories over time.

Python pandas statsmodels
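
The fixed-effects idea can be sketched with the one-regressor within estimator. It demeans each person's predictor and outcome across waves, then regresses the deviations. This is a simplified illustration, not the project's statsmodels specification. The respondent IDs and values are made up.

```python
from collections import defaultdict


def within_slope(person: list[str], x: list[float], y: list[float]) -> float:
    """Fixed-effects (within) slope of y on x for a single regressor.

    Subtracting each person's own means removes any time-invariant
    individual effect, so only within-person change across waves
    identifies the slope.
    """
    groups = defaultdict(list)
    for p, xi, yi in zip(person, x, y):
        groups[p].append((xi, yi))
    num = den = 0.0
    for obs in groups.values():
        x_bar = sum(xi for xi, _ in obs) / len(obs)
        y_bar = sum(yi for _, yi in obs) / len(obs)
        for xi, yi in obs:
            num += (xi - x_bar) * (yi - y_bar)
            den += (xi - x_bar) ** 2
    if den == 0:
        raise ValueError("x never varies within a person; slope is not identified")
    return num / den


# Toy 3-wave panel: two hypothetical respondents with different baselines
person = ["A", "A", "A", "B", "B", "B"]
x = [0, 1, 2, 1, 2, 3]
y = [10, 12, 14, 0, 2, 4]
print(within_slope(person, x, y))  # 2.0
```

In the toy panel, respondent B has higher x but lower y overall, which would pull a pooled slope negative. The within estimate recovers the common slope of 2.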
004 / visa_perm_analysis.py DA
2025 · DOL PERM disclosure

Visa PERM Data Analysis

Slicing U.S. labor certification data to surface trends in employer-sponsored work visas: who sponsors, in which roles, at what wages, and how that shifts across years.

Python pandas Matplotlib
005 / korean_price_monitor.py DE
2025 · KOSTAT × ECOS · ongoing

Korean Price Monitor Pipeline

Unified pipeline ingesting Korean consumer prices from KOSTAT (weekly, XML) and CPI from ECOS (monthly, JSON). Each run applies 6 quality checks, detects schema drift, and preserves the raw data, so that what consumers actually pay can be compared with what the official inflation index reports.

Python Docker cron Streamlit
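
Schema-drift detection can be as simple as diffing each parsed record against the expected fields and types. The minimal sketch below does that. Its field names and types are hypothetical and do not come from the pipeline's actual KOSTAT or ECOS schemas.

```python
def schema_drift(expected: dict[str, type], record: dict[str, object]) -> dict[str, list[str]]:
    """Diff one parsed record against an expected {field: type} schema."""
    missing = sorted(expected.keys() - record.keys())
    unexpected = sorted(record.keys() - expected.keys())
    type_changed = sorted(
        field
        for field in expected.keys() & record.keys()
        # None is treated as a missing value, not as a type change
        if record[field] is not None and not isinstance(record[field], expected[field])
    )
    return {"missing": missing, "unexpected": unexpected, "type_changed": type_changed}


# Hypothetical schema and a record whose source format shifted
EXPECTED = {"item": str, "price": float, "date": str}
record = {"item": "rice", "price": "4,980", "region": "Seoul"}
print(schema_drift(EXPECTED, record))
# {'missing': ['date'], 'unexpected': ['region'], 'type_changed': ['price']}
```

A run could then halt the load, or only log a warning, whenever any of the three lists is non-empty.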
006 / pheweb_beta_matrix.py DE
2025 · UM Medical School · genomics

PheWeb β-Matrix Builder

Parallel ETL pipeline that crawls GWAS summary statistics from PheWeb instances (MGI-BioVU, UKB-TOPMed, MGI) and builds a sign-consistent variant × phenotype β-matrix for downstream comorbidity research.

Python pandas scikit-learn
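
One common way to make β values sign-consistent across sources is to express every effect relative to a single canonical alt allele. The sign flips whenever a source reports the alleles swapped. The sketch below assumes that reading and a hypothetical `canonical` map from variant ID to (ref, alt). It also ignores strand flips and palindromic A/T or C/G SNPs, which real harmonization has to handle.

```python
from typing import Optional


def align_beta(canonical: tuple[str, str], ref: str, alt: str, beta: float) -> Optional[float]:
    """Express beta relative to the canonical alt allele, or None if alleles disagree."""
    c_ref, c_alt = canonical
    if (ref, alt) == (c_ref, c_alt):
        return beta
    if (ref, alt) == (c_alt, c_ref):
        return -beta  # alleles swapped: the source reports the other allele's effect
    return None


def build_beta_matrix(rows, canonical: dict[str, tuple[str, str]]) -> dict[str, dict[str, float]]:
    """rows: (variant_id, phenotype, ref, alt, beta) tuples from any source."""
    matrix: dict[str, dict[str, float]] = {}
    for variant, phenotype, ref, alt, beta in rows:
        if variant not in canonical:
            continue
        aligned = align_beta(canonical[variant], ref, alt, beta)
        if aligned is not None:
            matrix.setdefault(variant, {})[phenotype] = aligned
    return matrix


canonical = {"1:1000": ("A", "G")}  # hypothetical variant ID -> (ref, alt)
rows = [
    ("1:1000", "T2D", "A", "G", 0.12),
    ("1:1000", "BMI", "G", "A", 0.05),  # reported with alleles swapped
    ("1:1000", "LDL", "C", "T", 0.30),  # alleles match neither orientation
]
print(build_beta_matrix(rows, canonical))
# {'1:1000': {'T2D': 0.12, 'BMI': -0.05}}
```

Rows whose alleles match neither orientation (the LDL row here) are dropped rather than guessed.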
007 / cdc_event_pipeline.py DE
2026 · streaming · multi-DB

CDC Event Pipeline

Change Data Capture (CDC) event streaming with Kafka: real-time data synchronization from a source DB into multiple analytical sinks, with replay and idempotent processing.

Python Kafka PostgreSQL Docker
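
Idempotent apply can be sketched in a few lines. The sink remembers the highest source offset it has applied and skips anything at or below it. A redelivered or replayed event therefore cannot change state twice. The sketch assumes a single ordered partition and an in-memory table. `ChangeEvent` and `IdempotentSink` are hypothetical names. A real sink would persist the offset in the same transaction as the write.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    offset: int                 # position in the source log (e.g. a Kafka offset)
    op: str                     # "upsert" or "delete"
    key: str                    # primary key of the changed row
    row: Optional[dict] = None  # new row image for upserts


class IdempotentSink:
    """Applies change events so that redelivery or replay is harmless."""

    def __init__(self) -> None:
        self.table: dict[str, dict] = {}
        self.last_offset = -1  # highest offset already applied

    def apply(self, event: ChangeEvent) -> bool:
        """Apply one event; return False if it was already applied."""
        if event.offset <= self.last_offset:
            return False  # duplicate or replayed event: skip
        if event.op == "upsert":
            self.table[event.key] = dict(event.row or {})
        elif event.op == "delete":
            self.table.pop(event.key, None)
        else:
            raise ValueError(f"unknown op: {event.op}")
        self.last_offset = event.offset
        return True


events = [
    ChangeEvent(0, "upsert", "k1", {"v": 1}),
    ChangeEvent(1, "upsert", "k2", {"v": 2}),
    ChangeEvent(2, "upsert", "k1", {"v": 3}),
    ChangeEvent(3, "delete", "k2"),
]
sink = IdempotentSink()
for e in events:
    sink.apply(e)
replayed = [sink.apply(e) for e in events]  # redeliver everything
print(sink.table, any(replayed))  # {'k1': {'v': 3}} False
```

Replaying the same log into a fresh sink rebuilds an identical table, which is what makes replay safe.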
008 / food_ordering_platform.tsx SWE
2024 · real-time · 600+ transactions

Real-Time Food Ordering Platform

Full-stack ordering system built from scratch for pop-up events. The scalable backend supported 150+ concurrent users and 3,000+ orders and processed 600+ live Stripe transactions; real-time tracking via Socket.io cut manual labor by ~50%.

Next.js Flask Stripe Socket.io
009 / subway_route_optimization.tsx SWE
2026 · graph algorithms · web app

Subway Route Optimization

Computing the optimal subway path with classic graph algorithms (Dijkstra / A*), surfaced as a Next.js web app that visualizes the route over the live network.

TypeScript Next.js React
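
Dijkstra's algorithm is sketched below in Python with `heapq` rather than the app's TypeScript. It runs on a hypothetical four-station graph whose edge weights stand for travel minutes. A* uses the same loop with the heap ordered by distance plus an admissible heuristic, for example straight-line distance to the target divided by top train speed.

```python
import heapq


def dijkstra(graph: dict[str, list[tuple[str, float]]], source: str, target: str) -> tuple[float, list[str]]:
    """Minimum total weight and station path from source to target."""
    dist = {source: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry: a shorter distance was already settled
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Directed adjacency list; list both directions for two-way track
graph = {
    "A": [("B", 2), ("C", 5)],
    "B": [("C", 1), ("D", 7)],
    "C": [("D", 2)],
    "D": [],
}
print(dijkstra(graph, "A", "D"))  # (5.0, ['A', 'B', 'C', 'D'])
```

This sketch does not model transfers. One common approach is to make each node a (station, line) pair and add a transfer-penalty edge between lines at the same station.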

More coming soon. Get in touch if you'd like to chat about any of these.