Seoul Restaurant Survival
Of 528,000+ Seoul restaurants, only 35% reach 10 years. Where you open matters more than what you cook.
Overview
The median Seoul restaurant lifespan is 6.1 years. In Jongno-gu the median reaches 10 years; in Yangcheon-gu it is only 4.7.
I pulled 528,000+ restaurant license records from Seoul's open-data portal and treated business lifespan as a survival problem. I used Kaplan-Meier curves, log-rank tests, and Cox proportional hazards regression to isolate which covariates actually predict survival. The answer wasn't cuisine.
Technologies
The Problem
"Don't open a Korean place in Gangnam; the market's saturated." But is it?
Restaurant survival advice in Korea is mostly folklore: cuisine choice, neighborhood buzz, foot traffic. None of it is grounded in a large dataset of actual openings and closures. And the data alone isn't enough: without survival analysis, which handles right-censoring properly, you can't separate a restaurant that is "still open" (its lifespan so far is only a lower bound) from one that "lived a long life" and then closed.
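To make the censoring point concrete, here is a minimal sketch of turning a license record into the (duration, event) pair that survival methods expect. The cutoff date, the helper name `to_survival_record`, and the idea that a record carries an opening date and an optional closure date are assumptions for illustration; this write-up doesn't document the portal's schema.

```python
from datetime import date

# Hypothetical dataset cutoff; the real cutoff date isn't stated in this write-up.
CUTOFF = date(2024, 12, 31)

def to_survival_record(opened, closed, cutoff=CUTOFF):
    """Return (duration_years, event) for one license record.

    event = 1: the closure was observed.
    event = 0: the restaurant was still active at the cutoff, so the record is
               right-censored there and the duration is only a lower bound.
    """
    end = closed if closed is not None else cutoff
    if end < opened:
        raise ValueError("closure date precedes opening date")
    # Approximate years using an average year length of 365.25 days.
    duration_years = (end - opened).days / 365.25
    event = 1 if closed is not None else 0
    return duration_years, event
```

Both naive alternatives bias lifespans downward: dropping the censored rows keeps only restaurants that already closed, and treating the cutoff as a closure date cuts every active restaurant's life short.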
Product Vision
A district-and-cuisine survival explorer for anyone deciding where to open.
The end state is a public tool: pick a district + cuisine combination, see the Kaplan-Meier curve, the median life, and how the hazard compares to alternatives.
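The curve and median such an explorer would show come from the Kaplan-Meier product-limit estimator: S(t) is the product, over closure times t_i ≤ t, of (1 − d_i / n_i), where d_i is the number of closures at t_i and n_i the number of restaurants still at risk just before it. The pure-Python sketch below is illustrative, not the project's code; the function names are hypothetical. It reads the median lifespan off as the first time S(t) falls to 0.5 or below.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Kaplan-Meier product-limit estimate.

    durations: observed time for each restaurant (e.g. years open)
    events:    1 if the closure was observed, 0 if right-censored
    Returns a list of (time, survival) steps at each distinct closure time.
    """
    closures = Counter(t for t, e in zip(durations, events) if e == 1)
    # Records (closed or censored) leaving the risk set at each time.
    leaving = Counter(durations)
    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = closures.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Records censored at t still count as at risk for closures at t,
        # then leave the risk set together with that time's closures.
        at_risk -= leaving[t]
    return curve

def median_survival(curve):
    """First time at which estimated survival is 0.5 or below (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

For example, `kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])` drops to about 0.44 at t = 3, so `median_survival` returns 3. If fewer than half the restaurants in a stratum have closed, the median is undefined and the function returns None.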
Censoring-aware survival analysis on 528K licenses
Restaurants still active at the dataset cutoff are right-censored: their true lifespan is known only to exceed the observed one. The pipeline fits Kaplan-Meier per stratum (district × cuisine), runs log-rank tests for between-group differences, and uses a Cox PH model to quantify hazard ratios while controlling for confounders.
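Of the three methods, the Cox step is the least transparent, so here is a minimal sketch of what the fit does. It is not the project's code: it fits a Cox model with a single covariate by Newton-Raphson on the partial likelihood, using the Breslow approximation for tied closure times. The project's model controls for cuisine, size, and opening year together; restricting to one covariate, and the name `fit_cox_1d`, are simplifications for exposition. The fitted coefficient β gives the hazard ratio exp(β).

```python
import math

def fit_cox_1d(durations, events, x, iters=25, tol=1e-10):
    """Fit a Cox PH model with ONE covariate by Newton-Raphson (Breslow ties).

    Illustrative only: O(n^2) per iteration, no standard errors.
    Returns beta; math.exp(beta) is the hazard ratio per unit increase in x.
    """
    n = len(durations)
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for t in event_times:
            # Risk set: restaurants still open just before t (closed or censored at or after t).
            risk = [j for j in range(n) if durations[j] >= t]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            closed = [i for i in range(n) if durations[i] == t and events[i] == 1]
            d = len(closed)
            # Breslow: all d tied closures share the same risk-set denominator.
            score += sum(x[i] for i in closed) - d * s1 / s0
            info += d * (s2 / s0 - (s1 / s0) ** 2)
        if info <= 0:
            break  # x does not vary within the risk sets, so beta is not identifiable
        step = score / info  # Newton step on the concave log partial likelihood
        beta += step
        if abs(step) < tol:
            break
    return beta
```

With six restaurants closing at years 1 through 6 and a binary flag `x = [1, 0, 1, 0, 1, 0]`, the flagged restaurants close earlier on average, so the fit returns a positive β (hazard ratio above 1). At 528K records with several covariates, a dedicated library such as lifelines (`CoxPHFitter`) is the practical choice.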
My Contribution
End-to-end: data acquisition through statistical write-up.
Built the entire pipeline solo, from pulling raw open-data CSVs to fitting models and producing the figures. The interesting work wasn't running the model; it was cleaning license records that had inconsistent district codes, then designing comparisons that actually answered the "where should I open" question.
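The cleaning step is easier to show than describe. The sketch below normalizes district labels and mixed date formats; the alias table, the accepted formats, and the helper names are hypothetical examples of the inconsistencies described above, since the actual codes and formats in the Seoul export aren't documented here.

```python
from datetime import datetime

# Hypothetical alias table: several spellings that should map to one canonical district.
DISTRICT_ALIASES = {
    "종로구": "Jongno-gu",
    "jongno-gu": "Jongno-gu",
    "jongno": "Jongno-gu",
    "양천구": "Yangcheon-gu",
    "yangcheon-gu": "Yangcheon-gu",
    "yangcheon": "Yangcheon-gu",
}

# Hypothetical date formats assumed to appear in the raw records.
DATE_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%Y.%m.%d", "%Y/%m/%d")

def normalize_district(raw):
    """Map a raw district label to a canonical name, or None if unrecognized."""
    key = raw.strip()
    return DISTRICT_ALIASES.get(key) or DISTRICT_ALIASES.get(key.lower())

def parse_date(raw):
    """Parse a date in any known format; None for blank, ValueError if unparseable."""
    raw = raw.strip()
    if not raw:
        return None  # a blank closure date means the restaurant is still active
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            pass
    raise ValueError(f"unrecognized date format: {raw!r}")
```

Raising on unparseable dates, rather than returning None, matters here: a silently dropped closure date would turn a closed restaurant into a censored one and inflate its district's survival.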
What I worked on
- Cleaning 528K license records with inconsistent district codes and date formats
- Stratified Kaplan-Meier fits across district × cuisine combinations
- Pairwise log-rank tests with Bonferroni correction across all 25 districts (300 comparisons)
- Cox proportional hazards regression controlling for cuisine, size, and opening year
- Matplotlib visualizations of survival curves with confidence bands
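The pairwise comparison with Bonferroni correction can be sketched in pure Python. This is a textbook two-group log-rank test (chi-square with one degree of freedom) written out by hand for clarity, not speed; the helper names and the data layout (a dict from district name to duration and event lists) are hypothetical. With 25 districts there are 25 × 24 / 2 = 300 pairs, so at a family-wise α of 0.05 each pair must reach p < 0.05 / 300 ≈ 0.000167.

```python
import math
from itertools import combinations

def logrank_p(dur_a, ev_a, dur_b, ev_b):
    """Two-sample log-rank test; returns the chi-square (1 df) p-value."""
    times = sorted({t for t, e in zip(dur_a, ev_a) if e} | {t for t, e in zip(dur_b, ev_b) if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in times:
        n_a = sum(1 for d in dur_a if d >= t)  # group A still at risk at t
        n_b = sum(1 for d in dur_b if d >= t)
        d_a = sum(1 for d, e in zip(dur_a, ev_a) if d == t and e)
        d_b = sum(1 for d, e in zip(dur_b, ev_b) if d == t and e)
        n, d = n_a + n_b, d_a + d_b
        # Observed minus expected closures in group A under "same survival".
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a given the margins at time t.
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Chi-square (1 df) survival function: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

def bonferroni_pairs(groups, alpha=0.05):
    """Run every pairwise log-rank test and flag pairs significant after Bonferroni.

    groups: dict mapping district name -> (durations, events)
    Returns a list of (district_a, district_b, p_value, significant).
    """
    pairs = list(combinations(sorted(groups), 2))
    threshold = alpha / len(pairs)
    results = []
    for a, b in pairs:
        p = logrank_p(*groups[a], *groups[b])
        results.append((a, b, p, p < threshold))
    return results
```

Bonferroni is conservative with 300 tests: it controls the chance of any false positive but can miss real but modest district differences, which is one reason a Cox model is a useful complement.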
Key Achievements
[ Headline outcome — the one-line "this is what it delivered." ]
[ Slightly longer narrative on the most meaningful results. ]
Lessons Learned
[ The single biggest takeaway from this project. ]
[ What worked, what didn't, what you'd do differently next time. ]
Where It's Going
From a static notebook into an interactive survival explorer.
Next: a Streamlit front-end where users pick district + cuisine and get the curve. Past that: extend the methodology to other Korean metros and to non-restaurant small-business categories, since the survival framework itself is not restaurant-specific.