Leveraging Python and JAX in R workflows

Andrés Cruz (UT Austin)

NU Statistical Computing Workshop

Apr 22, 2026

Python won

  • (in ML/AI)
  • Are we missing out?

R vs Python

  • R: statistics/data analysis focused
    • Data wrangling
    • Plotting
    • Solid stats / econometrics / political methodology
  • Python: general-purpose
    • Interoperability (e.g., API calls)
    • Cutting-edge AI/ML
  • Question: are LLMs better at R or Python?

R vs Python: LLM performance

  • 🐍 “LLMs Love Python” (Twist et al. 2026)

    • In language-agnostic queries, “Python accounts for 90-97% of generated solutions” (7)
  • ®️ AutoCodeBench performance (Chou et al. 2025, 6)

A compromise

  • Integrate tidbits of Python into our R workflows

What we’ll cover today

  1. Using Python from R

    • Intro to reticulate

    • Example: sentence-level semantic similarity

  2. Leveraging JAX, a high-performance Python library

    • Automatic differentiation

    • Example: sensitivity analysis for error propagation

1. Using Python from R

reticulate

  • A package to interact with Python from R

    • Manages Python packages and environments

    • Translates between R and Python objects

  • Everything happens in R. You write R code!

    • e.g., (1) R data prep; (2) Python snippet; (3) R analysis

What’s (usually) better in Python?

  • Commercial APIs

    • LLMs (Anthropic, OpenAI, Google)

    • OCR (Mistral)

    • Data download (YouTube, Google Maps)

  • Pre-trained models for image/video/text

  • Cutting-edge scientific computing

Example: embeddings

  • Working with image/video/text usually involves:
    1. Wrangling
    2. Numerical representation
    3. Analysis (stats, plots, etc.)
  • For step 2, we often want to use pre-trained encoder models to generate “embeddings”
    • Wide availability in Python, both closed- and open-source

Constitutional consultation (Cruz et al. 2023)

  • ~250k citizen submissions to the Chilean constitutional process

  • Compare submissions with topics and text from the world’s constitutions

    1. Wrangle data

    2. Embed text using a multilingual encoder

    3. Compute “semantic” similarities
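
Step 3 above can be sketched in a few lines of Python. This is a minimal illustration, assuming cosine similarity as the "semantic" similarity measure (the slides do not name the measure) and using tiny made-up vectors in place of real encoder output. The names cosine_similarity, submission, and topics are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional "embeddings" (assumption: a real encoder would
# return vectors with hundreds of dimensions).
submission = [1.0, 2.0, 0.0]
topics = {
    "topic_a": [2.0, 4.0, 0.0],  # points in the same direction as the submission
    "topic_b": [0.0, 0.0, 3.0],  # orthogonal to the submission
}

# Compare the submission against every topic and keep the closest one.
similarities = {
    name: cosine_similarity(submission, vec) for name, vec in topics.items()
}
best_match = max(similarities, key=similarities.get)
```

Cosine similarity ignores vector length, so a short submission and a long constitutional article can still score as close if their embeddings point the same way.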

Script: 1_embeddings.R

Translating Python to reticulate

  • The package automatically handles loops, indexing, and other tricky parts

  • R lists do a lot: e.g., package.module.function() becomes package$module$function()

  • In my experience, LLMs are good for short Python-to-reticulate translations

    • except for infrastructure (setting, exit)
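
The translation rules above can be seen on a tiny standard-library example. The snippet below is ordinary Python, with the reticulate equivalent written in comments. The file name and word list are made up for illustration.

```python
import os.path

# Python: package.module.function()
joined = os.path.join("data", "submissions.csv")

# reticulate equivalent in R, with "." replaced by "$":
#   os <- reticulate::import("os")
#   joined <- os$path$join("data", "submissions.csv")

# Python indexing starts at 0; R indexing starts at 1.
words = ["constitution", "rights", "water"]
first_word = words[0]
# Once converted to an R character vector, the same element is words[1].

# R numbers such as 10 are doubles and arrive in Python as floats;
# write 10L in R when a Python function needs an integer.
```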

2. Leveraging JAX from R

JAX

  • A high-performance Python package/ecosystem

  • By Google: powers some of their ML/AI

  • Good CPU and GPU performance

  • Key features:

    • Just-in-time compilation

    • Automatic vectorization

    • Automatic differentiation

How to calculate derivatives?

  • Analytic differentiation

    • e.g., if \(f(x)=x^2\), we know that \(f'(x)=2x\).
  • Numeric differentiation

    • Approximate the derivative by evaluating the function at nearby points, again and again…
  • Automatic differentiation

    • Under the hood: boil functions down to their elementary operations; use the chain rule.

    • In JAX: differentiate (pretty much) arbitrary functions, e.g., most things in NumPy (example: marginaleffects).
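
The three approaches can be compared on \(f(x)=x^2\). The sketch below is plain Python, not JAX. The Dual class is a hypothetical, stripped-down forward-mode implementation that supports only addition and multiplication, meant to show the chain-rule bookkeeping that automatic differentiation performs. In JAX itself the corresponding call would be jax.grad(f).

```python
class Dual:
    """A value paired with its derivative (forward-mode autodiff)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Plain numbers are constants, so their derivative is 0.
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(
            self.value * other.value,
            self.deriv * other.value + self.value * other.deriv,
        )

    __rmul__ = __mul__


def f(x):
    return x * x  # f(x) = x^2


def analytic_derivative(x):
    # Derived by hand: f'(x) = 2x
    return 2 * x


def numeric_derivative(func, x, h=1e-5):
    # Central finite difference: evaluate the function at two nearby points.
    return (func(x + h) - func(x - h)) / (2 * h)


def automatic_derivative(func, x):
    # Seed dx/dx = 1, run the function, read the derivative off the result.
    return func(Dual(x, 1.0)).deriv


x0 = 3.0
analytic = analytic_derivative(x0)
numeric = numeric_derivative(f, x0)
automatic = automatic_derivative(f, x0)
```

The numeric result is only approximate and depends on the step size h, while the automatic result matches the analytic one without anyone deriving \(f'\) by hand.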

Script: 2_jax.R (part one)

Example: sensitivity analysis

  • Examine the degree to which our assumptions affect our results

  • Often by applying extreme pressure to those assumptions

  • Tools: differentiation, optimization

A sensitivity approach to measurement error

  • How much measurement error would invalidate a downstream estimate?
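
One way to make this question concrete is sketched below. It is a toy illustration under loud assumptions, not the method from the talk: the downstream estimate is an OLS slope, the measurement error is added to the outcome y, "invalidate" means pushing the slope to zero, and the data are made up. Derivatives are taken by finite differences in plain Python; with JAX they could come from automatic differentiation instead. All function and variable names are hypothetical.

```python
def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def slope_gradient(x, y, h=1e-6):
    """Finite-difference derivative of the slope with respect to each y_i."""
    grad = []
    for i in range(len(y)):
        up = list(y)
        up[i] += h
        down = list(y)
        down[i] -= h
        grad.append((ols_slope(x, up) - ols_slope(x, down)) / (2 * h))
    return grad


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

slope = ols_slope(x, y)
grad = slope_gradient(x, y)

# The slope is linear in y, so the smallest (Euclidean-norm) error vector
# that drives it to zero points directly against the gradient.
grad_sq_norm = sum(g * g for g in grad)
error = [-slope * g / grad_sq_norm for g in grad]
error_norm = sum(e * e for e in error) ** 0.5

# Check: adding that error to y should wipe out the estimated slope.
y_perturbed = [yi + ei for yi, ei in zip(y, error)]
slope_perturbed = ols_slope(x, y_perturbed)
```

Here error_norm is the answer to the question in this toy setting: the smaller it is, the less measurement error it takes to erase the estimate.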