CS · CORNELL

Saiakhil
Chilaka

Building full-stack products and ML systems for people to use.

01 — ABOUT

Builder. Researcher.

I'm a CS student at Cornell focused on AI software engineering — building products where the intelligence is the feature, not an afterthought.

I've shipped four products end-to-end: a vehicle marketplace, a recruitment automation platform, a private aviation ops dashboard, and a GNN-powered ingredient pairing app. At Cornell, I'm part of Generative AI @ Cornell and Cornell Data & Strategy, where I build full-stack AI applications and help upstate New York companies make data-driven decisions by developing ML models.

I also do research: a first-author IEEE paper on embedded ML systems, and a webcam-based cognitive load mapper that reaches 3.7 MAE across 86 users. But the goal is always the same: something that works in production, for real people.

TECHNICAL SKILLS

Languages

Python

TypeScript

JavaScript

Java

SQL

ML / AI

PyTorch

TensorFlow

scikit-learn

XGBoost

LightGBM

OpenCV

MediaPipe

pandas

Frontend

React

Next.js

Tailwind CSS

Three.js

Claude API

Backend & Infra

FastAPI

Node.js

PostgreSQL

Redis

Supabase

AWS

Docker

Celery

CAMPUS

Developer · Generative AI at Cornell

Technology Implementation Associate · Cornell Data & Strategy

COURSEWORK

Data Structures & Algorithms, Functional Programming, Discrete Math, Linear Algebra, Multivariable Calculus, Statistical Theory & Application

02 — PROJECTS

Selected
Work

03 — EXPERIENCE

Timeline

WORK

Full-stack engineering at a national infrastructure and environmental consulting firm — building internal tools, data pipelines, and client-facing applications.

—Designed a PDF-to-form pipeline for human-car crash reports, cutting analyst time by 66% across 10k+ crashes annually.
—Engineered a RAG-style LLM classification layer using the Claude API with methodology docs as context, automating crash type and location coding at ~90% accuracy and passing outputs into a PyQt6 GUI for Playwright-driven form automation.

Building and deploying AI solutions for enterprise clients, focusing on LLM integration, prompt engineering, and production rollout of AI-powered features.

—Built a multi-stage GPT-4 pipeline that generates synthetic research personas & simulates real-time interviews with context-aware trait injection, replacing 2–3 week manual testing cycles with synthesized testing data in under 5 minutes.
—Engineered a serverless backend across 20+ Supabase Edge Functions with a 3-stage LLM pipeline (synthesis, compression, & generation), cutting per-request token usage by 40% through dynamic context selection and summary compression.

Developed and evaluated machine-learning models for health applications, contributing to data pipelines, model optimization, and experimental analysis.

Conducted topic modeling and large-scale text analysis on academic and policy corpora, using NLP techniques to study citation networks, thematic evolution, and scholarly influence.

—Used NLP & Latent Dirichlet Allocation to model topic distributions across citation networks of up to 5 research papers.
—Achieved 0.71 coherence and 0.88 diversity scores, & used similarity scores to recommend relevant future research topics.

Published ML research on embedded medical systems and built real-time cognitive load inference from webcam input.

—Published a first-author paper in IEEE Xplore, under Dr. Zishan Guo, on a CNN ensemble framework that uses knowledge distillation for Multiple Sclerosis diagnosis on embedded systems; achieved 97.5% accuracy & 97.8% AUC-ROC.
—Improved accuracy by 19%, demographic fairness by 37%, & latency by 32% on a skin cancer classification framework for embedded devices with knowledge distillation, generative adversarial networks, & Gaussian white noise augmentation.

CLUBS

Leading the engineering team building AI-powered applications and research projects for one of Cornell's premier AI student organizations.

—Built an outreach pipeline surfacing 10k+ startup leads from 5 sources for job seekers, reducing outreach time by 95%.
—Engineered an agentic Claude API pipeline to scrape leads & generate 1k+ personalized, humanized cold emails/month.
—Implemented a PostgreSQL schema for lead management & Redis/BullMQ job queues for scalable background processing.

Technology consultant advising client organizations across industries on data strategy, tooling, and technical implementation.

—Built a multi-class ML classifier using LightGBM to disambiguate 37.6% of unresolved payer records across Albany Med's hospital system, enabling the first reliable estimate of government payer exposure across 100k+ discharge records.
—Trained a gradient boosting regressor on 210k SPARCS discharge records to predict inpatient costs by payer type, achieving R²=0.90 on held-out data; applied counterfactual simulation to quantify payer-mix revenue gaps across competitors.
—Modeled asset demand for Sciencecenter's $12.3M portfolio by computing Markov transition probabilities over a location graph spanning 50+ East Coast sites, enabling data-driven resource reallocation & optimizing asset utilization.

04 — CONTACT

Let's talk

Open to interesting projects and opportunities.

GITHUB ↗ · LINKEDIN ↗
