With interest in Data Science studies growing steadily, one commonly asked question is, 'What does a Data Scientist actually do?' We at GreyAtom aim to demystify the discipline by presenting a transparent look into the day-to-day activities of Data Scientists in our blog series: A Day in the Life of a Data Scientist.
This month, we had the opportunity to meet Tristan Bergh, Data Scientist at aYo Holdings. He shared some clear, on-point insights, and it was a pleasure interviewing him.
Tristan Bergh - Data Scientist, aYo Holdings
What are the different business problems that you solve with the help of Data Science on a day-to-day basis?
Every day brings different problems and demands different solutions, but broadly, these are the most common:
- Communications uplift modeling
- Insurance product propensity modeling
- Big data processing
What does your typical workday look like? How do you distribute your time across different activities?
The day typically starts with a team stand-up meeting, often remote, after which I work through the activities below in whatever order the day demands:
- Check that daily data ingestions completed successfully
- Build new data ingestion, using cloud-specific tools
- Add to my data pre-processing code base, in PySpark, to create optimised data sets
- Run cluster-based feature selection and autoML on preprocessed data
- Run Databricks notebooks that use the models I have built to produce operational datasets
- Assist colleagues in their data processing in PySpark
- Assist colleagues in their reporting code in PySpark
- Set up business value strategies, for marketing and operations, as we parse our data sets to build a single-view-of-customer data model
- Integrate with other business data sources on an ad hoc basis
- Interview and hire data scientists
Which teams and stakeholders do you work with?
My work usually revolves around:
- Heads of Customer Value Management, Marketing and Business Operations
- Data management
- Core infotech systems and database administrators
How do you build your own capabilities and those of your team? How do you learn newer technologies?
- Setting up learning arcs and development plans for ourselves
- Assisting and mentoring
- Learning, always learning from my colleagues
- Following multiple Data Science and interest accounts on social media
- Reading about state-of-the-art techniques in Data Science
- Trying out new libraries in R and Python
- Staying in touch with a network of colleagues, friends and people I have worked with or met at events, so that I can learn from them and offer help in return
- Visiting Kaggle and trying out the competitions, building my submissions and reading the kernel discussions
What are the most rewarding/frustrating moments in your journey as a data scientist?
When a model tests and calibrates really well on all the test data, including out-of-time data, that is a hooray moment for a Data Scientist.
It can be really frustrating when a Business Executive or Manager ignores our work, delays deployment and does not implement data-driven methodologies. It is also a pain when data sources prove very difficult and excessively time-consuming to ingest.
What, according to you, are the typical behavioural and technical traits needed in a data scientist?
Behavioural: Curiosity and playful joy at exploring data; being happy to find negative as well as positive results, e.g. that features a, b, c and d are NOT useful; and being team-oriented, in that a data scientist recognises that a diverse, rich set of perspectives is optimal for delivering robust, ethical and valuable models and processes to the business.
Technical:
- Ability to solve problems by code
- Basic knowledge of programming elements in any language
- Strong understanding of the limitations of statistics, models and analytics
- Basic understanding of distributed systems architecture
- Basic understanding of transactional business systems
- Understanding of techniques to access data tables
- Understanding of what supervised and unsupervised models are
- Being able to decide on which algorithms are suited for which problems
- Selecting models for performance, especially within the limitations of each metric applied
- Understanding of model performance testing on both in-time hold-out data and out-of-time data sets
- Knowledge of the full life-cycle of model deployment, including initial limitations, dataset applicability, monitoring of model impact, return of results into model refresh processes and ongoing refinement and enrichment of models
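For readers new to the difference between in-time hold-out data and out-of-time data mentioned in the list above, here is a minimal sketch in plain Python. It is an illustration only, not Tristan's actual code: the function name `split_in_time_and_out_of_time`, the `date` and `customer_id` fields, the cutoff date and the 20% hold-out fraction are all assumptions made for the example.

```python
import random
from datetime import date


def split_in_time_and_out_of_time(rows, cutoff, holdout_fraction=0.2, seed=0):
    """Split records into train, in-time hold-out and out-of-time sets.

    rows: list of dicts, each with a 'date' key (hypothetical schema).
    cutoff: records dated on or after this day are kept aside as out-of-time.
    """
    # Records before the cutoff belong to the modelling window.
    in_window = [r for r in rows if r["date"] < cutoff]
    # Records on or after the cutoff are never seen during model building.
    out_of_time = [r for r in rows if r["date"] >= cutoff]

    # Randomly hold out a fraction of the modelling window (in-time hold-out).
    shuffled = in_window[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    holdout = shuffled[:n_holdout]
    train = shuffled[n_holdout:]
    return train, holdout, out_of_time


# Toy data: 10 made-up customer records for each month of 2023.
rows = [
    {"customer_id": month * 100 + i, "date": date(2023, month, 1)}
    for month in range(1, 13)
    for i in range(10)
]

train, holdout, out_of_time = split_in_time_and_out_of_time(
    rows, cutoff=date(2023, 10, 1)
)
print(len(train), len(holdout), len(out_of_time))
```

The in-time hold-out comes from the same period as the training data, so it shows how well the model generalises to unseen customers. The out-of-time set comes from a later period, so it shows whether the model still holds up as behaviour drifts over time.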
Interested in being featured in our series? Write a comment below, or drop us a line, and we'll be in touch!