Data Engineer
Shippo · San Francisco · $100k – $130k · Long term · Any
Flexible Hours
Health Insurance
About Shippo

Shippo lowers the barriers to shipping for businesses around the world. As free and fast shipping becomes the norm, better access to shipping is a competitive advantage for businesses. Through Shippo, e-commerce businesses, marketplaces, and platforms are able to connect to multiple shipping carriers around the world from one API and dashboard. Businesses can get shipping rates, print labels, automate international documents, track shipments, and facilitate returns. Internally, we think of Shippo as providing the building blocks of shipping.

Join us to build the foundations of something great, roll up your sleeves, and get important work done every day. Founded in 2013, we are a diverse set of individuals based out of San Francisco. Shippo’s investors include Bessemer Venture Partners, Union Square Ventures, Uncork Capital, VersionOne Ventures, FundersClub and others.

As a Data Engineer, you will be responsible for building systems to collect, process and store events at massive scale to gain operational and business insights into the performance and optimization of shipping services.

Job Description

Implement and maintain data extraction, processing, and storage workflows in large-scale data systems (data pipelines, data warehouses) for internal and customer-facing analytics and reporting features.

Implement and maintain machine learning systems (feature generation, learning, evaluation, publishing) primarily using Spark for our data scientists. 

Integrate data from various data sources, internal and external, to ensure consistency, quality, integrity, and availability of data sets and insights. 

Work closely with engineers, product managers, data scientists and data analysts to understand needs and requirements. 

Design, build and launch new data models and datasets in production. 

Define and manage SLAs for datasets across the different storage layers.

Maintain and improve existing systems and processes in production.

Requirements

2+ years of work experience as a data engineer.

Ability to implement ETL processes using batch and streaming frameworks such as Hadoop (HDFS, MapReduce) and Spark.

Work experience with RDBMSs such as PostgreSQL or MySQL, as well as NoSQL and columnar data stores.

Investigate, analyze, identify, and debug data-related issues to ensure the stability, quality, and integrity of datasets.

Familiarity with columnar data warehouse technologies, in particular Redshift.

Understand business processes, overall application components, and how data is gathered, and design a data model that ties application telemetry data to metadata and transactional data.

Build expertise and own data quality for various datasets. 

Fluent in scripting languages such as Python, Ruby or Perl. 

Collaborate with multiple teams in high-visibility roles and own the solution end-to-end.

Self-starter who truly enjoys a fast-paced, innovative software start-up environment, with a focus on delivering business value through teamwork and groundbreaking technology.

Excellent written and oral communication and presentation skills.

BS or MS in Computer Science or a related technical discipline, or equivalent job experience.


Preferred

Building, monitoring, managing, and maintaining large data processing pipelines using frameworks and patterns such as MapReduce, Spark, and Pig, as well as distributed columnar data warehouses including but not limited to Redshift and Druid.

Batch and streaming data transport using traditional ETL, AWS Kinesis and Kafka. 

Workflow management tools such as Airflow; data serialization formats such as Avro and Parquet; and data modeling concepts, methodologies, and best practices.

Machine learning infrastructure such as TensorFlow or MXNet.

Cloud environments and DevOps tools; working experience with AWS and its associated products.
