Research Scientist
Zendrive
Jul'2019 - May'2021
- Developed low-level Scala APIs and high-level Python APIs to enable large-scale GIS queries with GeoSpark,
using quad-tree partitioning. Achieved a >5x speedup over PostGIS queries on a PostgreSQL database. This
helped scale the enrichment of geospatial features, such as reverse geocoding and zone-based trip
segmentation, to millions of trips. Geo data is a primary feature in insurance scoring.
- Leveraged the geo-platform above to refactor and scale an existing GeoPandas pipeline that predicts stop signs
on roads in the USA based on GPS trails near road intersections.
- Scraped worldwide geographic information, such as boundaries and roads, from OpenStreetMap, converted
it into Scala-compatible formats, and designed a hierarchical storage layout to support the geo-platform APIs.
- Automated large dataset generation and validation tasks, processing hundreds of millions of rows, on Airflow.
- Migrated in-house libraries to Python 3 to enable the use of newer machine-learning frameworks.