Google Software Engineer, Machine Learning Performance

Minimum qualifications:

  • Bachelor's degree or equivalent practical experience.
  • Candidates will typically have experience with software development in one or more programming languages, or experience with an advanced degree.
  • Typically experience with data structures or algorithms.
  • Typically experience with machine learning algorithms and tools (e.g., TensorFlow), artificial intelligence, deep learning, or natural language processing.

Preferred qualifications:

  • Master's degree or PhD in Computer Science or a related technical field.
  • Experience with performance analysis and optimization, including system architecture, performance modeling, or similar.
  • Experience with distributed development and large-scale data processing.
  • Experience with compiler optimizations or related fields.

About the job

Google Cloud's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google Cloud's needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. You will anticipate our customers' needs and be empowered to act like an owner, take action and innovate. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

In this role, you will focus on performance analysis and optimization of Large Language Models (LLMs), e.g., Google DeepMind Gemini, Bard, Search Magi, and Cloud LLM APIs.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

Responsibilities

  • Identify and maintain LLM training and serving benchmarks that are representative of Google production, industry, and the machine learning community; use them to identify performance opportunities, drive TensorFlow/JAX TPU performance toward the state of the art, and gate TF/JAX releases.
  • Engage with Google product teams to solve their LLM performance problems. Onboard new LLMs and products onto Google's new TPU hardware, enabling LLMs to train efficiently at large scale.
  • Analyze performance and efficiency metrics to identify bottlenecks, then design and implement solutions.
  • Work with tooling and fleet metrics subteams to build tools that track performance and efficiency and extract metrics from workloads running at Google.
  • Explore model and data efficiency techniques, including new model architectures, optimizers, and training techniques for solving a machine learning task, and new techniques for reducing the labeled or unlabeled data needed to train a model.

If you have any extenuating circumstances or require further information about this role, please contact EmployAbility on +44 (0)7776 090 508 or +44 (0)7852 764684, or email us at info@employ-ability.org.uk.

 

