ARTIFICIAL INTELLIGENCE LEAD ENGINEER @ UNIVERSITY OF ILLINOIS URBANA-CHAMPAIGN | ENTERPRISE DATA STRATEGY

AI-Driven Faculty Research Classification

Developed an agentic MVP using LangGraph and Gemini to categorize unstructured university-wide research descriptions into standardized taxonomies.

01. The Business Challenge

At a research institution as large as the University of Illinois, tracking and mapping faculty research was an operational hurdle. With hundreds of distinct majors and thousands of researchers, information about who was working on what was fundamentally siloed.

The Ambiguity of Interdisciplinary Data

While research is increasingly interdisciplinary, the university lacked a unified system capable of interpreting cross-disciplinary interests. Faculty profiles consisted of unstructured, free-form text, which made programmatic indexing nearly impossible and cross-departmental collaboration hard to support.

LangGraph

Stateful, cyclic workflow

Pydantic

Strict taxonomy validation

Gemini Flash

Optimized API resource usage

02. Pipeline Architecture

To solve this, I engineered an automated, agentic pipeline that accepts natural language inputs and returns only research classifications that have been validated against the standardized taxonomy.

Input Processing

Description Cleaner

Normalizes raw, unstructured faculty text inputs before passing them into the classification chain.
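As a minimal sketch of what this normalization step might involve: the exact cleaning rules used in the project are not described here, so the function name `clean_description` and the specific steps (HTML entity decoding, tag stripping, Unicode folding, whitespace collapsing) are assumptions chosen for illustration.

```python
import html
import re
import unicodedata

_TAG_RE = re.compile(r"<[^>]+>")   # stray HTML tags left over from profile pages
_WS_RE = re.compile(r"\s+")        # any run of whitespace, including newlines


def clean_description(raw: str) -> str:
    """Normalize a free-form research description (hypothetical helper)."""
    text = html.unescape(raw)                   # "&amp;" -> "&", "&nbsp;" -> U+00A0
    text = _TAG_RE.sub(" ", text)               # replace tags with a space
    text = unicodedata.normalize("NFKC", text)  # fold compatibility characters
    text = _WS_RE.sub(" ", text)                # collapse whitespace runs
    return text.strip()


print(clean_description("  <p>Machine&nbsp;learning &amp; robotics</p>\n"))
```

Cleaning before classification keeps markup and spacing noise out of the prompt, so the classifier sees only the research description itself.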

Agentic Logic

Tiered Classification

Passes each description through a High-Level Field Classifier and then a granular Subfield Classifier. Edge cases that do not fit an existing subfield trigger a Subfield Proposer.
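The tiered flow can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's code: the `TAXONOMY` contents are invented, and the three `stub_*` functions are keyword-based stand-ins for what would be LLM calls in the real pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical two-level taxonomy; the real university taxonomy is not shown here.
TAXONOMY = {
    "Engineering": ["Robotics", "Materials Science"],
    "Life Sciences": ["Genomics", "Ecology"],
}


@dataclass
class Result:
    high_level_field: str
    subfield: str
    proposed: bool  # True when the subfield is a new proposal awaiting review


def classify(
    description: str,
    pick_field: Callable[[str, List[str]], str],
    pick_subfield: Callable[[str, List[str]], Optional[str]],
    propose_subfield: Callable[[str, str], str],
) -> Result:
    """Tier 1 picks a field, tier 2 picks a subfield, the proposer handles misses."""
    high_level_field = pick_field(description, list(TAXONOMY))
    subfield = pick_subfield(description, TAXONOMY[high_level_field])
    if subfield is not None:
        return Result(high_level_field, subfield, proposed=False)
    return Result(high_level_field, propose_subfield(description, high_level_field), proposed=True)


# Keyword stubs standing in for the model calls (assumptions for this demo only).
def stub_pick_field(description: str, fields: List[str]) -> str:
    return "Engineering" if "robot" in description.lower() else "Life Sciences"


def stub_pick_subfield(description: str, subfields: List[str]) -> Optional[str]:
    lowered = description.lower()
    return next((s for s in subfields if s.lower() in lowered), None)


def stub_propose_subfield(description: str, high_level_field: str) -> str:
    return "Swarm Systems"


print(classify("Swarm robot coordination", stub_pick_field, stub_pick_subfield, stub_propose_subfield))
```

Passing the classifiers in as callables keeps the control flow testable without any model access. The `proposed` flag marks results that still need review before they count as part of the taxonomy.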

Output

Strict Validation

Output must pass rigorous Pydantic validation to ensure exact alignment with official university taxonomy.
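One way such a validation gate could look with Pydantic is sketched below. The enum members and the `Classification` model are hypothetical placeholders, since the official taxonomy and the project's actual schema are not reproduced here.

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, ValidationError


# Placeholder labels; the real taxonomy would supply these values.
class HighLevelField(str, Enum):
    ENGINEERING = "Engineering"
    LIFE_SCIENCES = "Life Sciences"


class Subfield(str, Enum):
    ROBOTICS = "Robotics"
    GENOMICS = "Genomics"


class Classification(BaseModel):
    high_level_field: HighLevelField
    subfield: Subfield


def parse_classification(payload: dict) -> Optional[Classification]:
    """Return a validated Classification, or None if any label is off-taxonomy."""
    try:
        return Classification(**payload)
    except ValidationError:
        return None


print(parse_classification({"high_level_field": "Engineering", "subfield": "Robotics"}))
print(parse_classification({"high_level_field": "Engineering", "subfield": "Robotic"}))
```

Enum-typed fields reject near-miss labels such as "Robotic" outright, which is what exact alignment requires. This sketch does not check that a subfield belongs to its parent field; that would need an additional cross-field validator.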

03. Strategic Trade-Offs

Framework: LangGraph vs. LangChain

Opted for LangGraph over standard linear LangChain chains. This allowed us to build the stateful, cyclic graphs required to implement strict validation gates and iterative workflow loops.
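To show why a cycle is needed, here is a dependency-free sketch of a validate-and-retry loop. It deliberately does not use LangGraph's API. It only mirrors the shape of a stateful graph: nodes that update a shared state, plus a routing function that can send control back to an earlier node. The label set, the retry cap, and the canned answers that replace the model are all assumptions.

```python
from typing import Callable, Dict, Iterable

ALLOWED = {"Robotics", "Genomics"}  # hypothetical taxonomy labels
MAX_ATTEMPTS = 3                    # assumed retry cap

State = Dict[str, object]


def make_classify(answers: Iterable[str]) -> Callable[[State], State]:
    """Build a classify node that replays canned answers in place of an LLM."""
    canned = iter(answers)

    def classify(state: State) -> State:
        return {**state, "label": next(canned), "attempts": state["attempts"] + 1}

    return classify


def validate(state: State) -> State:
    """Validation gate: record whether the label is in the taxonomy."""
    return {**state, "valid": state["label"] in ALLOWED}


def route_after_validate(state: State) -> str:
    """Conditional edge: finish on success, otherwise loop back until the cap."""
    if state["valid"]:
        return "end"
    return "classify" if state["attempts"] < MAX_ATTEMPTS else "end"


def run(nodes: Dict[str, Callable[[State], State]], state: State) -> State:
    current = "classify"
    while current != "end":
        state = nodes[current](state)
        current = "validate" if current == "classify" else route_after_validate(state)
    return state


nodes = {"classify": make_classify(["Robots", "Robotics"]), "validate": validate}
print(run(nodes, {"description": "legged robots", "attempts": 0}))
```

A linear chain has no way to express the edge from the validation step back to the classifier. In a graph framework that back-edge is an ordinary conditional edge, and the attempt counter lives in the shared state.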

Model Selection: Gemini Flash

Chose the Gemini Flash model to reduce the risk of hitting the rate limits associated with Pro tiers, balancing model performance against API resource constraints.

Human-in-the-Loop Integration

Designed a "Taxonomy Curator" function to allow for human review of edge-case proposed additions. Keeping academic oversight over taxonomy changes addressed stakeholder expectations and protects the taxonomy's long-term accuracy.
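A review queue of this kind could be sketched as below. The class name borrows the "Taxonomy Curator" label, but its methods, its data layout, and the example labels are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class TaxonomyCurator:
    """Holds proposed subfields until a human reviewer approves or rejects them."""

    taxonomy: Dict[str, Set[str]]  # high-level field -> approved subfields
    pending: List[Tuple[str, str]] = field(default_factory=list)

    def propose(self, field_name: str, subfield: str) -> None:
        """Queue a proposal unless it is already approved or already queued."""
        if subfield in self.taxonomy.get(field_name, set()):
            return
        if (field_name, subfield) not in self.pending:
            self.pending.append((field_name, subfield))

    def review(self, field_name: str, subfield: str, approve: bool) -> None:
        """Apply a human decision; only approved proposals enter the taxonomy."""
        self.pending.remove((field_name, subfield))
        if approve:
            self.taxonomy.setdefault(field_name, set()).add(subfield)


curator = TaxonomyCurator({"Engineering": {"Robotics"}})
curator.propose("Engineering", "Swarm Systems")
print(curator.pending)
curator.review("Engineering", "Swarm Systems", approve=True)
print(curator.taxonomy)
```

Nothing the model proposes reaches the taxonomy until `review` is called with `approve=True`, so the taxonomy only grows through explicit human decisions.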

04. Outcome & Impact

Institutional Standardization

Provided university administration with a structured method for organizing faculty research, replacing inconsistent departmental terminology with a shared, scalable classification language.

Cross-Departmental Connectivity

Enabled the university to map individual research profiles to the shared taxonomy, breaking down information silos and making interdisciplinary connections across campus easier to identify.