Data Engineer

bigspark, a growing consultancy focused on exciting technologies including Apache Spark and Apache Kafka, and on projects across AI, Machine Learning, Data Engineering and Event Streaming, is looking for a technical consultant to join the team. The role calls for skills in Java, Scala or Python and is a remote, work-from-home position with some attendance at client sites where necessary.

We provide the backbone for modern analytics to our clients through expertise in DevOps, distributed computing, AI and machine learning, adopting proven open source projects. We specialise in backend development, infrastructure automation and performance engineering for data workloads at scale.

Typical duties:

  • Interacting with senior stakeholders and programme management to understand client requirements
  • Solution design - using predefined architectural components to agree an acceptable design
  • Leading the delivery of software development on client sites, often managing team members across the globe with varying skill levels, and building creative, cutting-edge solutions for our clients
  • Helping bigspark build the next generation of AI-backed assets and accelerators to help drive value for our clients

Skills, experience and qualifications:

  • Educated to at least BSc (Hons) level - preferably in Computer Science, Mathematics or Physics
  • At least 2 years in a commercial software environment and experience in technology consulting
  • Big data stack experience - SQL, Hadoop, Spark, Hive/Impala, Kafka
  • Cloud experience - AWS (preferred), Azure
  • Linux expertise
  • Proficiency in object-oriented programming languages (C#, Java, Python, Scala)
  • Proficiency in scripting languages (PowerShell, bash/sh/ksh)
  • Experience with database integrations (SQL/NoSQL) and processing data at scale
  • Understanding of source control management e.g. GitHub, GitLab
  • Experience in use of CI tools (Jenkins) or an understanding of their role
  • Flexible in learning and working with new technologies

For Data Engineering engagements the company would be particularly interested in the following experience:

  • Deep knowledge of the Hadoop ecosystem (Apache Spark, Hive metastore, YARN, HDFS, S3)
  • SQL expertise - modelling, advanced query techniques and performance optimisation
  • Cloud tooling - AWS EMR, AWS Glue, Databricks, Snowflake
  • Orchestration - Airflow, dbt
  • Python / Java / Scala knowledge - including relevant development frameworks
  • Open storage formats - Parquet, Iceberg, Delta
  • Search and caching - Elasticsearch, OpenSearch, Redis

For Event Streaming engagements the company would be particularly interested in the following experience:

  • Deep knowledge of Apache Kafka (e.g. Confluent) - Kafka Streams, Kafka Connect
  • Knowledge of stream processing - Apache Flink and Spark Streaming
  • CDC technologies - IBM InfoSphere, Debezium
  • Java / Scala knowledge - including relevant development frameworks (e.g. Spring, Akka)
  • Knowledge of StreamSets Data Collector, StreamSets Transformer and Control Hub or equivalent tooling (e.g. Apache NiFi)

For Machine Learning / AI engagements the company would be particularly interested in the following experience:

  • Proficiency in Machine Learning Frameworks and deep Python expertise
  • Spark ML including both batch and structured streaming
  • Python ML e.g. scikit-learn, SHAP, LightGBM/XGBoost and underlying mathematical theories
  • AWS SageMaker or similar MLOps frameworks

In return you will receive:

  • Top-end equipment - MacBook Pro, 23-inch LED screen
  • Certified training
  • 2 mandatory certifications in the first 12 months (funded by bigspark)
  • Elective tracks of training thereafter in preferred topics
  • Competitive salary and benefits
  • Exposure to cutting edge technologies and open source projects
  • Remote, work-from-home position