---
title: "Kivan Polimis"
page-layout: full
title-block-banner: false
---

<p style="font-size: 0.78rem; color: #aaa; letter-spacing: 0.04em; margin-top: -0.5rem; margin-bottom: 2rem;">Princeton  •  University of Washington  •  Data Science  •  Machine Learning  •  Population Research </p>

<div style="display: flex; flex-wrap: wrap; align-items: center; gap: 40px; margin-bottom: 40px;">
<div style="flex: 2; min-width: 300px;">

I'm a data scientist and ML engineer. At [Karna](https://karna.com) I build systems that inform public policy decisions. I consult independently through [Atlas Analytics](https://www.analyticsbyatlas.com/) and hold a Regional Affiliate position at the University of Washington's [Center for the Study of Demography and Ecology](https://csde.washington.edu).

My background is in demography and causal inference, and I spend most of my time building production ML systems. That combination tends to pull me toward problems where the statistical question and the engineering constraint are both hard: predicting patient risk in ways that hold up across demographic groups, building sports models that can update within minutes of a roster change, and classifying financial transactions where the rare categories carry the highest cost of error.

</div>
<div style="flex: 1; text-align: center; min-width: 250px;">
<img src="images/kivan_polimis_headshot.png" alt="Kivan Polimis" style="width: 250px; height: 250px; border-radius: 50%; object-fit: cover; border: 3px solid #eee;">
</div>
</div>

<hr>

### Selected Work

- **NBA analytics pipeline (private):** End-to-end [MLOps](https://en.wikipedia.org/wiki/MLOps) on AWS for real-time NBA performance prediction. The core problem is non-stationarity. A player injury announced at 10 PM changes the probability distribution for every subsequent game, so monthly retraining isn't fast enough.
- **forest-confidence-interval (open source):** One of the original developers of [scikit-learn-contrib/forest-confidence-interval](https://github.com/scikit-learn-contrib/forest-confidence-interval) and author of the [JOSS paper](https://joss.theoj.org/papers/10.21105/joss.00124). The package brings Stefan Wager's infinitesimal jackknife approach to Python's scikit-learn ecosystem. Wager developed the method with Hastie and Efron in his [2014 JMLR paper](https://jmlr.org/papers/v15/wager14a.html) and implemented it in the [randomForestCI R package](https://github.com/swager/randomForestCI).
- **Paratransit routing (open source):** Worked on a [Data Science for Social Good](https://github.com/DSSG-paratransit/main_repo) team of five students and two data scientists building routing systems for riders with disabilities. The constraints are harder than in standard routing: wheelchair requirements, medical time windows, and riders who can't easily rebook if the system gets it wrong.
- **Financial transaction classification (open source):** Fine-tuned [BERT](https://arxiv.org/abs/1810.04805) with a custom weighted loss function to categorize noisy financial text (truncated merchant names, payment processor prefixes). Class imbalance is the core problem: Legal Services transactions make up 0.3% of volume but produce the highest-cost errors.

<hr>

### Writing

[**Technical Articles →**](technical.qmd) Reproducible analyses, tutorials, and deep dives in Python and R.

[**Blog →**](blog.qmd) Notes, reviews, and shorter pieces.

<hr>

**Contact:** [kivan.polimis@gmail.com](mailto:kivan.polimis@gmail.com)