Learning Tracks: Working With Data Teams
If you work with data teams as part of your day-to-day, you'll need a strong technical foundation. This learning track breaks down the concepts and tools you'll need to understand to be a great partner to all different types of data teams (and impress your boss).
The basics
Whether you're working with analytics, data science, or ML, there are some important basics that all data work starts with. Nail these down and you'll be ready to get into more role-specific stuff.
- What do data teams even do? Start by reading about the basic jobs to be done for data teams.
- At SaaS companies, product analytics is a big part of what data teams do.
- You can read an overview of different parts of the data stack here.
Where data comes from
To get powerful models and nice dashboards, the data needs to come from somewhere, and it's usually a mishmash of sources from around your business.
- Data for analytics comes from across your business: your user and app data, and third party tools like Stripe and Salesforce.
- Relational databases are the ABCs of backends: they're where you store the data your app needs, like your users and their settings (there's a small sketch of one after this list).
- NoSQL databases are another popular way to store data, with less structure and more flexibility.
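To make that concrete, here's a minimal sketch of a relational table an app backend might use. SQLite stands in for a production database like Postgres, and the users table and its columns are made up for illustration.

```python
import sqlite3

# SQLite stands in here for a production relational database like Postgres.
conn = sqlite3.connect(":memory:")

# A typical app table: one row per user, with typed, structured columns.
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        plan TEXT DEFAULT 'free'
    )
""")
conn.execute(
    "INSERT INTO users (email, plan) VALUES (?, ?)",
    ("ada@example.com", "pro"),
)

# The app (and later, the data team) reads it back with plain SQL.
for row in conn.execute("SELECT id, email, plan FROM users"):
    print(row)  # (1, 'ada@example.com', 'pro')
```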
Where data is stored
Once data teams have their source data in order, they usually store it in a special database designed specifically for analytics and data science.
- A popular but less structured storage format is called a data lake.
- Snowflake is the most popular cloud data warehouse, and its 2020 debut was the biggest software IPO in history at the time (there's a sketch of a typical warehouse query after this list).
- Elastic is an analytics database specifically built for searching through unstructured data.
- MongoDB is a popular NoSQL document database for applications.
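Warehouses like Snowflake are queried with SQL, much like the relational databases above. As a rough sketch of the kind of question an analytics database answers, here's a revenue-per-customer query; SQLite stands in for the warehouse, and the orders table is hypothetical.

```python
import sqlite3

# SQLite stands in for a cloud warehouse like Snowflake; the SQL shape is similar.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, placed_on TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("acme", 120.0, "2023-03-01"),
        ("acme", 80.0, "2023-03-09"),
        ("globex", 45.0, "2023-03-02"),
    ],
)

# A typical analytics question: how much revenue came from each customer?
query = """
    SELECT customer, SUM(amount) AS revenue
    FROM orders
    GROUP BY customer
    ORDER BY revenue DESC
"""
for customer, revenue in conn.execute(query):
    print(customer, revenue)  # acme 200.0, then globex 45.0
```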
How data gets moved around
Source data is rarely in the format data teams need, so they have to transform it into the right shape. This is sometimes done before moving it into the warehouse (ETL) and sometimes after (ELT); there's a tiny sketch of the transform step after the list below.
- Transforming data usually gets called ETL, short for extract, transform, and load.
- dbt is an increasingly popular tool for transforming and organizing your warehouse data.
- Kafka is a powerful tool built at LinkedIn for streaming event data in real time.
- Segment helps data teams collect analytics events and send them to the tools that need them.
- Databricks is a tool for running Spark jobs, basically ETL for big data.
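To show what "transform" actually means, here's a tiny, hypothetical ETL sketch in plain Python: extract raw records from a source, reshape them, and load them into a warehouse table. Real pipelines run on tools like dbt or Spark; the payment fields below are made up.

```python
import sqlite3

# Extract: raw records as they might arrive from a source like Stripe.
raw_payments = [
    {"id": "ch_1", "amount_cents": 1999, "status": "succeeded"},
    {"id": "ch_2", "amount_cents": 500,  "status": "failed"},
]

# Transform: keep only successful payments and convert cents to dollars.
clean = [
    (p["id"], p["amount_cents"] / 100)
    for p in raw_payments
    if p["status"] == "succeeded"
]

# Load: write the cleaned rows into a warehouse table (SQLite stands in).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE payments (id TEXT, amount_dollars REAL)")
warehouse.executemany("INSERT INTO payments VALUES (?, ?)", clean)
print(warehouse.execute("SELECT * FROM payments").fetchall())  # [('ch_1', 19.99)]
```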
How data gets used
Once clean, organized data is in the warehouse, you can do almost anything with it, from dashboards to operations to ML models (there's a small sketch after this list).
- A language-based ML model named GPT-3 took the world by storm.
- As anyone who has seen or used ChatGPT or DALL-E knows, ML and AI have been advancing quickly over the past few years.
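To make "using the data" concrete, here's a small sketch of the usual first step behind dashboards and models: pulling a warehouse table into a pandas DataFrame. SQLite stands in for the warehouse, and the signups table is hypothetical.

```python
import sqlite3
import pandas as pd

# SQLite stands in for the warehouse; signups is a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (week TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?)",
    [("2023-W01", 40), ("2023-W02", 55), ("2023-W03", 72)],
)

# Pulling warehouse data into a DataFrame is the typical first step
# for dashboards, ad hoc analysis, and feeding ML models.
df = pd.read_sql_query("SELECT week, count FROM signups ORDER BY week", conn)
print(df)
print("Week-over-week growth:", df["count"].pct_change().round(2).tolist())
```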
Technically learning tracks help make the world of software simple and digestible, so you can be better at your job. There are more on the way!
Ideas for other learning tracks? Ways we can improve this one? Let us know.