Kivan Polimis

Princeton  •  University of Washington  •  Data Science  •  Machine Learning  •  Population Research

I’m a data scientist and ML engineer. At Karna I build systems that inform public policy decisions. I consult independently through Atlas Analytics and hold a Regional Affiliate position at the University of Washington’s Center for the Study of Demography and Ecology.

My background is in demography and causal inference, and I spend most of my time building production ML systems. That combination tends to pull me toward problems where the statistical question and the engineering constraint are both hard: predicting patient risk in ways that hold up across demographic groups, building sports models that can update within hours of a roster change, classifying financial transactions where the rare categories carry the highest cost of error.


Selected Work

  • NBA analytics pipeline (private): End-to-end MLOps on AWS for real-time NBA performance prediction. The core problem is non-stationarity: a player injury announced at 10 PM changes the probability distribution of every subsequent game, so monthly retraining isn’t fast enough.
  • forest-confidence-interval (open source): One of the original developers of scikit-learn-contrib/forest-confidence-interval and author of the JOSS paper. The package brings Stefan Wager’s infinitesimal jackknife approach to Python’s scikit-learn ecosystem. The method was developed in his 2014 JMLR paper with Hastie and Efron and implemented in the randomForestCI R package.
  • Paratransit routing (open source): Worked at Data Science for Social Good on a team of five students and two data scientists building routing systems for riders with disabilities. The constraint set is harder than standard routing: wheelchair requirements, medical time windows, and a population that can’t easily rebook if the system gets it wrong.
  • Financial transaction classification (open source): Fine-tuned BERT with a custom weighted loss function to categorize noisy financial text (truncated merchant names, payment processor prefixes). Class imbalance is the core problem: Legal Services transactions make up 0.3% of volume but carry the highest cost of error.
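
The weighted-loss idea in the last item can be sketched in a few lines. This is a generic illustration, not the project's code: the inverse-frequency weighting rule, the function names, and the toy two-class data are all assumptions chosen to mirror the 0.3% figure above.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumed weighting rule for illustration; rare classes get large weights.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted negative log-likelihood, normalized by total weight.

    probs:   list of dicts mapping class name -> predicted probability
    labels:  true class name for each example
    weights: dict mapping class name -> weight
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Hypothetical toy data: 3 of 1,000 transactions (0.3%) are the rare class.
labels = ["groceries"] * 997 + ["legal_services"] * 3

# A lazy model that always leans toward the common class.
probs = [{"groceries": 0.9, "legal_services": 0.1} for _ in labels]

weights = inverse_frequency_weights(labels)
uniform = {cls: 1.0 for cls in weights}

weighted = weighted_cross_entropy(probs, labels, weights)
unweighted = weighted_cross_entropy(probs, labels, uniform)

print(f"unweighted loss: {unweighted:.4f}")
print(f"weighted loss:   {weighted:.4f}")
```

With uniform weights the lazy model looks acceptable, because the three rare examples barely move the average. With inverse-frequency weights each class contributes equally to the loss, so mistakes on the rare class dominate and the optimizer can no longer ignore it.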

Writing

Technical Articles → Reproducible analyses, tutorials, and deep dives in Python and R.

Blog → Notes, reviews, and shorter pieces.


Contact: kivan.polimis@gmail.com

© Kivan Polimis. Built with Quarto.