So, is an analytics engineer just a buzzword or a new distinct role in data-driven landscapes?
In this post, we’ll explain the role’s main responsibilities and list the skills that should be under the belt of this data Jedi. We’ll also discuss how your existing specialists can transition into the new role and whether you need this new role on your data science team in the first place.
What is an analytics engineer?
An analytics engineer is a modern data team member who is responsible for modeling data to provide clean, accurate datasets so that different users within the company can work with them. The role entails transforming, testing, and documenting data.
In addition to understanding data and how it is going to be used, an analytics engineer has to be tech-savvy enough to apply software engineering best practices to analytics.
There's no question that an analytics engineer is officially a thing. But it’s worth explaining what changes have led to the creation of the new role and comparing it to other positions on data teams.
The story behind a new data job: Analytics engineer vs data engineer vs data analyst vs data scientist vs ML engineer
The data management landscape has been changing dramatically in recent years. If we take the more traditional approach to data-related jobs used by larger companies, there are different specialists doing narrowly-focused tasks on different sides of the project.
- Data analysts are responsible for building reports and dashboards on top of pre-processed data and drawing out insights from it. They work with Excel, SQL code, and analytics tools to perform ad-hoc analyses and forecasting.
- Data scientists also sit on the side of data analysis and business logic, but their work is more sophisticated. They commonly prepare data and build machine learning (ML) models. A big chunk of their work includes helping businesses get better insights and make predictions based on data. Such specialists use Python as well as statistical programming languages like R and SAS.
- Machine learning (ML) engineers combine software engineering expertise and machine learning. Unlike data scientists, they don’t normally work with analytical tasks but focus on training ML models and deploying them to the production environment.
The development of new tools and the improvement of old ones have led to the transformation of the traditional data roles, simplifying some aspects of the job and complicating others.
Here are the key changes:
- the advancement of cloud data warehouses (like Snowflake, Redshift, BigQuery) that rely on MPP (Massively Parallel Processing);
- the shift to the ELT approach, meaning the data is loaded into a warehouse before it's transformed into any useful information;
- the arrival of SaaS tools like Fivetran, Stitch, and Hevo to automatically integrate data from different sources;
- the introduction of multi-functional BI tools like Mode and Looker; and
- the emergence of dbt (data build tool), a dedicated transformation layer that runs on top of modern data warehouses.
As tech know-how increases, data roles change and merge. Companies start hiring that one data guy or gal who wears all the hats and knows the complete data stack from both the engineering and analytics sides. Such specialists can use SaaS platforms to continuously extract the underlying data from existing databases and centralize it in cloud data warehouses. They can perform complex data transformations in SQL using dbt. They know enough Python to work with data orchestration services. And of course, they can analyze data and build reports in BI tools.
These multitasking specialists have come to be called analytics engineers.
Data roles compared
The credit for coining the new role goes to Michael Kaminsky, a former Director of Analytics at Harry’s Grooming and a founder of Recast, who wrote the article about analytics engineering on the Locally Optimistic blog in 2019. This way of thinking was accepted by the dbt community, which contributed to the popularity of the term.
As Michael Kaminsky describes the role, “The analytics engineer sits at the intersection of the skill sets of data scientists, analysts, and data engineers. They bring a formal and rigorous software engineering practice to the efforts of analysts and data scientists, and they bring an analytical and business-outcomes mindset to the efforts of data engineering. It’s their job to build tools and infrastructure to support the efforts of the analytics and data team as a whole.”
Let’s get into more detail about the responsibilities of analytics engineers.
Analytics engineer duties and responsibilities
An analytics engineer role blurs the line between technology and business, so the responsibilities may differ greatly from company to company. Below we list the core duties that this data specialist may undertake.
Data modeling. One of the core responsibilities of an analytics engineer is to model raw data into clean, tested, and reusable datasets. In doing so, they make it easier for business analysts and other stakeholders to view and understand data in a data warehouse or database. Data modeling is the process of building visual representations of data and communicating connections between different information points and structures. Since data models are created around business needs, the job of analytics engineers is to define the rules and requirements for the formats and attributes of data.
Data transformation. Since not all data is useful as is, analytics engineers need to apply various transformations to make it fit the task at hand.
The ELT paradigm allows for loading raw data right into a cloud warehouse, data lake, or lakehouse, so transformations can happen afterward.
Transformations may include:
- removing inaccurate or corrupted data;
- aggregating data items into a summarized version;
- filtering information to get rid of irrelevant, duplicated, or overly sensitive data;
- joining two or more database tables by their matching attributes; and
- splitting a single column into multiple ones, to name a few.
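The transformations listed above can be sketched in a few lines of SQL. The example below is a minimal illustration rather than a production pattern: it uses Python's built-in sqlite3 module as a stand-in for a cloud warehouse, and the table and column names (raw_orders, raw_customers, full_name, and so on) as well as the sample rows are hypothetical.

```python
import sqlite3

def load_sample_data(conn):
    """Load hypothetical raw tables, as an ELT tool would before any transformation."""
    conn.executescript("""
        CREATE TABLE raw_customers (customer_id INTEGER, full_name TEXT);
        CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    """)
    conn.executemany(
        "INSERT INTO raw_customers VALUES (?, ?)",
        [(1, "Ada Lovelace"), (2, "Alan Turing")],
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [
            (10, 1, 20.0),
            (10, 1, 20.0),   # exact duplicate
            (11, 1, 5.5),
            (12, 2, None),   # corrupted: missing amount
            (13, 2, -3.0),   # corrupted: negative amount
            (14, 2, 7.0),
        ],
    )

def build_customer_revenue(conn):
    """Clean, de-duplicate, join, split, and aggregate raw data into one table."""
    conn.executescript("""
        DROP TABLE IF EXISTS customer_revenue;
        CREATE TABLE customer_revenue AS
        SELECT
            c.customer_id,
            -- split a single column into two
            substr(c.full_name, 1, instr(c.full_name, ' ') - 1) AS first_name,
            substr(c.full_name, instr(c.full_name, ' ') + 1)    AS last_name,
            -- aggregate order rows into a per-customer summary
            COUNT(o.order_id) AS order_count,
            SUM(o.amount)     AS total_amount
        FROM raw_customers AS c
        JOIN (
            -- remove corrupted rows and duplicates before joining
            SELECT DISTINCT order_id, customer_id, amount
            FROM raw_orders
            WHERE amount IS NOT NULL AND amount >= 0
        ) AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.full_name;
    """)
    return conn.execute(
        "SELECT customer_id, first_name, last_name, order_count, total_amount "
        "FROM customer_revenue ORDER BY customer_id"
    ).fetchall()
```

Calling load_sample_data and then build_customer_revenue on an in-memory connection (sqlite3.connect(":memory:")) yields one summary row per customer. Note that the raw tables stay untouched, which is the point of the ELT approach: the transformation can be changed and rerun at any time.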
Data-associated documentation. Analytics engineers are often tasked with maintaining data documentation to ensure that everyone on the team uses the same definitions and language. This involves providing identifiable and understandable descriptions of data as well as exposing them so that all consumers can easily find answers to their questions. Experts document data at every stage, specifying the details of data features.
Defining data quality rules, standards, and metrics. It is not rare for analytics engineers to take responsibility for data quality management. To that end, they define the metrics to be used and the measures to be taken to guarantee data is accurate enough to fit operational and analytics needs. The same goes for defining data quality standards, that is, agreements on how data should be formatted, shown, and used across an organization. Analytics engineers may also take care of writing cleansing algorithms to further improve the quality of data.
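As a rough illustration, data quality rules can be expressed as small, reusable checks that run against a dataset and report violations. The sketch below is hypothetical: the rule functions, the check_quality helper, and the allowed status values are invented for this example, and a real setup would more likely rely on the testing features of the team's transformation or data quality tool.

```python
from collections import Counter

def not_null(rows, column):
    """Return the indexes of rows where the column is missing."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]

def unique(rows, column):
    """Return the non-null values that appear more than once in the column."""
    counts = Counter(row.get(column) for row in rows)
    return sorted(v for v, n in counts.items() if n > 1 and v is not None)

def accepted_values(rows, column, allowed):
    """Return the indexes of rows whose value is outside the allowed set."""
    return [i for i, row in enumerate(rows) if row.get(column) not in allowed]

def check_quality(rows):
    """Run all rules and return only those that found violations."""
    report = {
        "order_id not null": not_null(rows, "order_id"),
        "order_id unique": unique(rows, "order_id"),
        "status accepted": accepted_values(
            rows, "status", {"placed", "shipped", "returned"}
        ),
    }
    return {rule: found for rule, found in report.items() if found}
```

An empty result from check_quality means the dataset passed every rule; anything else names the rule that failed and points at the offending rows or values.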
Setting software engineering best practices for analytics. Another crucial duty that separates an analytics engineer from a data analyst is applying software engineering best practices. Such an approach is called DataOps, a methodology that ties together data engineering, data analytics, and DevOps.
Here are a few best practices analytics engineers can refer to:
- version control to trace the history of changes in analytics code and data models and roll back to older versions if something goes wrong;
- data unit testing to check small pieces of transformation logic for correctness and fitness for the task at hand; and
- continuous integration and continuous delivery (CI/CD) to ensure up-to-date and reliable data.
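To make the idea of data unit testing more concrete, here is a minimal sketch using Python's built-in unittest module. The normalize_email transformation and its validation rules are hypothetical; the point is only that a small piece of transformation logic gets its own automated test, which a CI pipeline could then run on every change kept under version control.

```python
import unittest

def normalize_email(raw):
    """Trim and lowercase an email address; return None if it is clearly invalid."""
    if raw is None:
        return None
    email = raw.strip().lower()
    # Deliberately simple rule: exactly one "@", not at either end.
    if email.count("@") != 1 or email.startswith("@") or email.endswith("@"):
        return None
    return email

class NormalizeEmailTest(unittest.TestCase):
    def test_trims_and_lowercases(self):
        self.assertEqual(normalize_email("  Ada@Example.COM "), "ada@example.com")

    def test_rejects_invalid_values(self):
        for bad in (None, "", "no-at-sign", "@example.com", "ada@", "a@@b"):
            self.assertIsNone(normalize_email(bad))

def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeEmailTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A CI job would typically fail the build when run_tests().wasSuccessful() is false, so a broken transformation never reaches the datasets that analysts rely on.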
Close collaboration with other team members. An important part of an analytics engineer’s job is to work collaboratively with all stakeholders, namely data engineers, business analysts, and data scientists, to align business requirements with data assets. That’s where the business-oriented side of analytics engineers should show up first and foremost. They often communicate with business clients to collect or develop data requirements.
In some cases, analytics engineers may also be responsible for migrating enterprise data into a warehouse or other centralized repository. However, this job is still more relevant for data engineers, especially in larger companies.
Analytics engineer skills and toolkit knowledge
Having dealt with the responsibilities, it’s time to define the skillset and expertise this data unicorn should possess. Again, skills required for this position may differ greatly depending on the company. We’ll list the key aspects you should look for in an analytics engineer.
Experience working in the data space. Data is the cornerstone of the whole thing, so analytics engineers must have experience working in data-driven landscapes. Normally, these are either data engineers who want to get closer to the business side or data analysts who are more tech-oriented.
Educational background. Analytics engineers are expected to have a bachelor’s, master’s, or PhD degree in a relevant field, e.g., statistics, mathematics, computer science, software engineering, or IT.
Strong SQL skills. Since the lion’s share of an analytics engineer’s duties will be creating logic for data transformations, writing lots of queries, and building data models, being an expert in SQL is a must.
Experience in programming languages. Apart from SQL, it is a big plus for such a specialist to know more advanced “data” languages like R and Python to handle various data orchestration tasks.
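Data orchestration, at its simplest, means running pipeline steps in an order that respects their dependencies. The following sketch assumes Python 3.9 or later for the standard graphlib module; the step names and the run_pipeline helper are hypothetical and stand in for what a dedicated orchestration service would manage.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical names).
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "customer_revenue": {"clean_orders", "extract_customers"},
    "revenue_dashboard": {"customer_revenue"},
}

def run_pipeline(pipeline, run_step):
    """Run every step after all of its dependencies; return the execution order."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    order = list(TopologicalSorter(pipeline).static_order())
    for step in order:
        run_step(step)
    return order
```

For example, run_pipeline(PIPELINE, print) prints the step names in a valid execution order. The exact order among independent steps may vary, but a step never runs before the steps it depends on.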
Knowledge of dbt. As a rule, analytics engineers are expected to know how to work with dbt, a command-line transformation tool that lets them implement analytics code using SQL.
Knowledge of software engineering best practices. Analytics engineers must be well aware of how to adopt software engineering best practices and apply them to analytics code.
Git expertise. As the most commonly used version control system, Git should be among the tools a good analytics engineer is comfortable working with. It keeps track of any changes made to analytics code and allows multiple users to work on the same project.
Data engineering and BI tools knowledge. It is a big plus if your future analytics engineer has hands-on experience with tools for building data pipelines. The list may include data warehouses like Snowflake, Amazon Redshift, and Google BigQuery; ETL tools like AWS Glue, Talend, or others; and Business Intelligence tools like Tableau, Looker, or equivalent.
Interpersonal and communication skills. Being able to ask the right questions in an appropriate way is crucial for analytics engineers to excel in this career. They interact with different team members and other stakeholders on a regular basis, so employers should always check interpersonal skills.
While an analytics engineer seems to be a jack-of-all-trades, keep in mind that this employee can’t and shouldn’t do all the data-related tasks.
How to become or get an analytics engineer
The interest in analytics engineering is growing by leaps and bounds. At the same time, the resources and certifications needed to gain knowledge in the field don’t keep up with the pace of the role’s popularity. Here are a few ways to become or transition to an analytics engineer, provided you are familiar with SQL, Python, GitHub, and other essentials for working with data.
- The dbt community offers a couple of free self-learning courses as well as paid ones for enterprise customers.
- The Analytics Engineers Club, initiated by Michael Kaminsky and Claire Carroll from dbt, provides a 10-week program that teaches the main skills.
- Udemy has a special course to learn dbt from scratch.
- Northeastern University in Boston offers the Master of Science in Data Analytics Engineering program.
For the companies that want to expand their data teams with this new role, there are two scenarios available.
- Educate or redirect existing specialists who come from data engineering, business analytics, or data science backgrounds.
- Hire a specialist who already has experience working in an analytics engineering position (admittedly, it’s difficult to find people with expertise in this role).
Do you really need an analytics engineer?
It depends on your data team, the amount of data your business processes rely on, and the insights you’d like to get from that data. Basically, both small and large companies can benefit from having an analytics engineer as part of their staff.
As a small company, you often don’t need to hire the complete data science team from data engineers to data analysts. You simply want your data to be collected, transformed, and turned into valuable assets. The role of an analytics engineer may be a perfect option in this case.
As a large company, you may have data engineers, data analysts, and data scientists but still be in need of additional data talent. Analytics engineers are universal soldiers, so they make a good hire for big organizations that want to increase the amount of insight they can draw from data.
To recap, the decision to go in the direction of analytics engineering is up to you. The topic is still hotly debated. Some see the role of analytics engineer as full of potential, while others call it “a marketing gimmick selling old wine in a new bottle.” Time will tell.