The Unofficial Google Data Science Blog

Posts

Quantifying the statistical skills needed to be a Google Data Scientist

March 24, 2025

by DAVID MEASE and AMIR NAJMI

What does someone need to know in order to be a successful data scientist at Google? This blog post shares a set of questions that were answered by Google data scientists and reports how they did. See how much you agree with the authors’ view of the importance of these questions in assessing practical data science ability.

Defining "Data Scientist"

If you look through job listings at Google for data scientists, you will find a role called “Data Scientist - Research” (DS-R for short). This role has several explicit requirements, including statistical expertise, programming/ML, communication, and data analysis/intuition. Focusing narrowly on the first of these, the description currently states that candidates “will bring scientific rigor and statistical methods to the challenges of product creation”. Internally, more detailed text descriptions exist to parallel these external descriptions. For this DS-R role at Google, part of the internal text description i...


Towards optimal experimentation in online systems

April 23, 2024

by CHRIS HAULK

It is sometimes useful to think of a large-scale online system (LSOS) as an abstract system with parameters $X$ affecting responses $Y$. Here, $X$ is a vector of tuning parameters that control the system's operating characteristics (e.g., the weight given to Likes in our video recommendation algorithm), while $Y$ is a vector of outcome measures such as different metrics of user experience (e.g., the fraction of video recommendations that resulted in positive user experiences). If we wish to tune the system parameters $X$ for optimal performance of $Y$, there are several challenges:

The relationship between $X$ and $Y$ may be complex and poorly understood.

It may be impossible to simultaneously maximize every element of $Y$, requiring trade-offs.

There may be hard constraints on $X$ and $Y$, either individually or in combination, that limit what we deem to be acceptable operating points for the system.

One approach to this problem is to experiment with one or two system p...

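To make the setup in this excerpt concrete, here is a minimal sketch of tuning $X$ against a multi-element $Y$ under hard constraints. Everything in it is an assumption made for illustration: the response surface `responses`, the two parameter names, the constraint thresholds in `feasible`, and the weighted-sum trade-off are all hypothetical, and the brute-force grid search is a stand-in rather than the approach the post goes on to describe.

```python
from itertools import product


def responses(x):
    """Hypothetical response surface Y(X) with two competing outcome metrics."""
    like_weight, freshness_weight = x
    # Each metric peaks at a different setting, so both cannot be maximized at once.
    satisfaction = 1.0 - (like_weight - 0.6) ** 2 - (freshness_weight - 0.3) ** 2
    diversity = 1.0 - (like_weight - 0.2) ** 2 - (freshness_weight - 0.7) ** 2
    return satisfaction, diversity


def feasible(x, y):
    """Hypothetical hard constraints on X and Y (thresholds are made up)."""
    weights_ok = x[0] + x[1] <= 1.0   # constraint on X
    diversity_ok = y[1] >= 0.8        # constraint on Y
    return weights_ok and diversity_ok


def best_operating_point(trade_off=0.5, steps=11):
    """Grid-search X, keeping the feasible point with the best weighted score."""
    grid = [i / (steps - 1) for i in range(steps)]
    best = None
    for x in product(grid, grid):
        y = responses(x)
        if not feasible(x, y):
            continue
        # Scalarize Y: trade_off weights satisfaction, the remainder weights diversity.
        score = trade_off * y[0] + (1.0 - trade_off) * y[1]
        if best is None or score > best[0]:
            best = (score, x, y)
    return best


if __name__ == "__main__":
    score, x, y = best_operating_point(trade_off=0.5)
    print("X =", x, "Y =", y, "score =", round(score, 4))
```

Changing `trade_off` moves the chosen operating point between the two metrics' peaks, which is the trade-off the excerpt refers to. In a real system the response surface is unknown and has to be estimated from experiments, so an exhaustive search like this one only works for a toy function.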