Data Engineer

bigspark, a growing consultancy focused on exciting technologies including Apache Spark and Apache Kafka, and on projects across AI, Machine Learning, Data Engineering and Event Streaming, is looking for a technical consultant to join the team. The role calls for skills in Java, Scala or Python and is a remote, work-from-home position with some attendance at client sites where necessary.

We provide the backbone for modern analytics to our clients through expertise in DevOps, distributed computing, AI and machine learning, adopting proven open source projects. We specialise in backend development, infrastructure automation and performance engineering for data workloads at scale.

Typical duties:

  • Interacting with senior stakeholders and programme management to understand client requirements
  • Solution design - using predefined architectural components to agree an acceptable design
  • Leading the delivery of software development on client sites, often managing team members across the globe with varying skill levels, and building creative, cutting-edge solutions for our clients
  • Helping bigspark build the next generation of AI-backed assets and accelerators to help drive value for our clients

Skills, experience and qualifications:

  • Educated to at least BSc (Hons) level - preferably in Computer Science, Mathematics or Physics
  • At least 2 years in a commercial software environment and experience in technology consulting
  • Big data stack experience - SQL, Hadoop, Spark, Hive/Impala, Kafka
  • Cloud experience - AWS (preferred), Azure
  • Linux expertise
  • Proficiency in object-oriented programming languages (C#, Java, Python, Scala)
  • Proficiency in scripting languages (PowerShell, bash/sh/ksh)
  • Experience with database integrations (SQL/NoSQL) and processing data at scale
  • Understanding of source control management e.g. GitHub, GitLab
  • Experience in use of CI tools (Jenkins) or an understanding of their role
  • Flexible in learning and working with new technologies

For Data Engineering engagements the company would be particularly interested in the following experience:

  • Deep knowledge of the Hadoop ecosystem (Apache Spark, Hive metastore, YARN, HDFS, S3)
  • SQL expertise - modelling, advanced query techniques and performance optimisation
  • Cloud tooling - AWS EMR, AWS Glue, Databricks, Snowflake
  • Orchestration - Airflow, dbt
  • Python / Java / Scala knowledge - including relevant development frameworks
  • Open storage formats - Parquet, Iceberg, Delta
  • Search and caching - Elasticsearch, OpenSearch, Redis

For Event Streaming engagements the company would be particularly interested in the following experience:

  • Deep knowledge of Apache Kafka (e.g. Confluent) - Kafka Streams, Kafka Connect
  • Knowledge of stream processing - Apache Flink and Spark Streaming
  • CDC technologies - IBM InfoSphere, Debezium
  • Java / Scala knowledge - including relevant development frameworks (e.g. Spring, Akka)
  • Knowledge of StreamSets Data Collector, StreamSets Transformer and Control Hub or equivalent tooling (e.g. Apache NiFi)

For Machine Learning / AI engagements the company would be particularly interested in the following experience:

  • Proficiency in Machine Learning Frameworks and deep Python expertise
  • Spark ML including both batch and structured streaming
  • Python ML e.g. scikit-learn, SHAP, LightGBM/XGBoost and underlying mathematical theories
  • AWS SageMaker or similar MLOps frameworks

In return you will receive:

  • Top-end equipment - MacBook Pro, 23-inch LED screen
  • Certified training
  • 2 mandatory certifications in the first 12 months (funded by bigspark)
  • Elective tracks of training thereafter in preferred topics
  • Competitive salary and benefits
  • Exposure to cutting edge technologies and open source projects
  • Remote, work-from-home position