About

Skills
  • Programming Languages
    • Python, SQL, Bash, R, C, YAML, Jinja2, JavaScript/TypeScript, HTML, CSS, Visual Basic, MATLAB
  • Data Analysis Tools
    • NumPy, Pandas, Matplotlib, Seaborn, Plotly, Tableau, ArcGIS, Excel
  • Machine Learning Frameworks
    • Scikit-learn, PyTorch, TensorFlow, Keras, LangChain, OpenAI API (LLMs & embeddings), Vector Databases (Chroma), Prompt Engineering, Retrieval-Augmented Generation (RAG)
  • Data Engineering Tools
    • PySpark, Databricks, Delta Lake, dbt, Kafka, Snowflake, PostgreSQL, MongoDB, GraphQL, Hadoop, Apache Airflow, FastAPI microservices, Data Modeling (Bronze → Silver → Gold), Semantic Layer Design
  • Cloud, CRM, & Version Control
    • GCP (GCS, Dataproc, IAM), AWS (S3, EMR), Azure, Docker, Terraform (IaC), Git, GitHub Actions (CI/CD), REST APIs, React/TypeScript (Vite), Full-stack Integration (API + Vector DB + LLM + UI), Salesforce, SharePoint, LaTeX, Apricot
  • Soft Skills
    • Problem solving, attention to detail, teamwork, business, scientific, and technical communication, ethical critical thinking, adaptability, leadership, conflict resolution

I’m a Data Engineer and researcher with a background in physics and philosophy, a combination that shapes how I approach every system I build: analytical rigor, ethical intentionality, and a genuine commitment to the communities the data represents.

My work sits at the intersection of modern data engineering, scientific research, and social impact. I design cloud-native ETL/ELT pipelines, real-time streaming architectures, and AI-assisted analytics platforms that help mission-driven organizations turn fragmented, complex data into trustworthy, decision-ready insights. Recent work includes a real-time school climate and social vulnerability research platform investigating whether school climate indicators predict community vulnerability, independent citizen science research on SARS-CoV-2 RNA concentration patterns across 14 NYC wastewater sites, and peer-reviewed NLP research on large-scale image captioning datasets.

Technically, I specialize in Python, SQL, Bash, PySpark, Snowflake, Databricks, Kafka, dbt, and Airflow, with production experience across GCP, AWS, and Azure. I’ve designed FERPA- and HIPAA-aligned data governance frameworks, built RAG services using LangChain and vector databases, and developed NLP and classification models in PyTorch and scikit-learn.

I also believe deeply in science communication and accessible education. I’ve taught cybersecurity to adult learners, ecology and environmental science in wilderness settings, and robotics and programming to students of all ages. Whether I’m building a data platform or facilitating a hands-on workshop, the goal is the same: translate complexity into something that empowers people to understand and act on the world around them.

I care about data transparency, equity, and infrastructure that serves people rather than just processes. Whether I’m reconciling multi-source datasets to within a 0.5% margin of error for grant reporting, migrating ten years of protected student records for social workers, or building dashboards that help leadership understand program impact across underserved communities, the through line is always the same: data should strengthen trust and enable meaningful action.

Outside of engineering, you’ll find me running long distances, scuba diving, surfing, swimming, or practicing mindfulness. I’m always exploring unconventional learning pathways and seeking out curious, interdisciplinary conversations.

If you’d like to collaborate, brainstorm, or connect over data, AI, or community-focused innovation, feel free to reach out — I’d love to chat.