Dillon Shearer
Data Engineer
Data engineer with 4+ years of experience building production pipelines, standardizing multi-source datasets, and automating data operations in regulated healthcare environments. Architected ETL workflows processing millions of clinical and genomic records, built automation that saves 15+ hours weekly, and maintain data infrastructure serving 470+ researchers. Strong foundation in Python, SQL, and Azure cloud services, with hands-on experience in OMOP CDM and healthcare terminology mapping.
Technical Skills
Experience
Data Scientist
Answer ALS - Remote
Feb 2022 - Present
- Architected and maintain ETL pipelines processing 1,200+ participant records from 9 disparate clinical sources, unifying data into a standardized master index that powers the research data portal.
- Built automated user management pipeline running 3x weekly, eliminating 15+ hours of weekly manual processing through Python scripts handling agreement tracking, renewal monitoring, and committee reporting.
- Engineered OMOP Common Data Model transformations mapping raw clinical data to SNOMED CT, LOINC, and RxNorm terminologies, enabling cross-institutional research compatibility.
- Designed and implemented Azure data infrastructure including blob storage organization, automated file inventory systems, and cross-source validation pipelines for clinical data releases.
- Created Looker semantic layer with intermediate tables standardizing external dataset curation, reducing analyst onboarding time and ensuring consistent metric definitions.
- Own the data dictionary across 9 source systems, validating schema changes each release cycle and maintaining documentation that serves as the source of truth for 470+ researchers.
Data Analyst
Equity Quotient - Remote
Sep 2022 - Apr 2023
- Designed and built Snowflake data models ingesting 12 datasets with 150+ standardized fields, powering 7 client dashboards and 30 KPI widgets with sub-second query performance.
- Engineered ETL pipelines processing 15M+ rows of U.S. Census and HMDA data, creating tract-to-county crosswalks and pre-aggregated tables that reduced dashboard load times by 35%.
- Built SQL views and staging tables with documented transformation logic, establishing patterns adopted by data engineering for production workflows.
Data Standards Intern
RARE-X - Remote
Jun 2021 - Feb 2022
- Mapped internal data structures to standardized health terminologies, ensuring interoperability across rare disease datasets and external research platforms.
- Built JSON schemas for the Data Collection Platform defining field types, validation rules, and entity relationships that enforced data quality at ingestion.
Software Quality Assurance Intern
Across Healthcare - Atlanta, GA
May 2021 - Feb 2022
- Integrated survey response data into normalized database structures, handling schema mapping and data migration for clinical data collection workflows.
- Created and maintained data dictionaries ensuring compliance with HIPAA requirements and HL7 data standards across the platform.
Key Projects
Variant Reporting Application
Production genomic variant reporting tool serving ALS researchers. Built with Panel (Bokeh) and backed by Azure SQL pipelines with BCP bulk loading. Processes 50K+ variants across 939 participants with millions of genome calls, replacing manual workflows with on-demand report generation.
DUA Tracking System
Automation platform managing 470+ data use agreements for research compliance. Handles user lifecycle tracking, generates committee reports, monitors renewal deadlines, and maintains audit trails. Saves 15+ hours of manual administrative work weekly.
Education
Bachelor of Business Administration, Management Information Systems
University of West Georgia
2022
Certifications
Protecting Human Research Participants
PHRP Online Training, Inc. | ID: 3004648
Apr 2025