This repository is a curated list of good blog posts and books for Analytics Engineers. It can also be very useful for Data Analysts and Data Scientists.
I really appreciate any contribution. Just make sure to describe the theme and why you found the resource useful.
- SQL
- Python
- Infrastructure
- Analytics Skills
- Data Warehousing
- Data Pipelines
- Starting analytics in a company
- Testing data
- Success Stories
- Organisation
- Data Visualisation
- Marketing and data
- More rigor for the analyst
- Other reading lists
- Top bloggers/blogs
Definition of the Analytics Engineer: The Analytics Engineer.
- Learning SQL 201: Optimizing Queries, Regardless of Platform by Randy Au. I finally found a complete post on advanced SQL.
- Python for Data Analysis. A very comprehensive book on using Python for data work.
- Pandas Cheatsheet. I use it every day!
- TIL Python by Vicki Boykis. Tricks for frequent tasks when manipulating data.
- Modern pandas. A series of blog posts on intermediate/advanced pandas written by one of the maintainers.
- The Startup Founder's Guide to Analytics. An excellent introduction to the stack necessary for analytics and its evolution following the growth of the start-up.
- The missing layer of Analytics Stack.
- Choosing a Data Warehouse. Many excellent answers on which data warehouse to choose.
- Data science for start-ups. You can find some useful information in this free book.
Comparison of tools by Stephen Levin
- Looker vs Tableau vs Mode. Data visualisation tools compared.
- Segment vs Fivetran vs Stitch: Which Data Ingest Should You Use?
- One analyst's guide for going from good to great
- Succeeding as the first data person in a small company/startup. A must-read for anyone working in data, even in a big company.
- Prioritizing data science work. Too many engineers like building ivory towers. Make sure you don't fall into that trap.
- The beginner guide to data engineering series. Start here if you don't know what a star schema or Airflow is, or if you want some basic practices for writing data pipelines.
- Best practices for data modeling. A lot of practical tips on naming, grain, permissions and materialization.
- The Data Warehouse Toolkit by Ralph Kimball. A classic in Business Intelligence. Some chapters are gold for modeling your data warehouse.
- Functional Data Engineering — a modern paradigm for batch data processing. You will learn the spirit behind good data pipelines and a well-designed data warehouse.
- The rise of the Data Engineer. Explains recent changes in the role and in data practices.
- Five principles that will keep your data warehouse organized
- For Data Warehouse Performance, One Big Table or Star Schema? A discussion of an alternative to the star schema.
- Maintainable ETL: Tips for Making Your Pipelines Easier to Support and Extend. Best practices for writing good ETL.
- Building a data practice from scratch. Very useful for your first weeks as a data person.
- Automated Testing In The Modern Data Warehouse. Practical advice on testing data, useful for everyone building data pipelines. It's rare to find a post dealing with such an unglamorous part of data work.
- Engineers Shouldn't Write ETL. It's more focused on data science, but it's a classic.
- Does my startup data team need a data engineer?
- Data Driven Marketing. Reading a few chapters can help you think like a marketer with a data-driven approach. It's a gem: I haven't found this kind of insight elsewhere.
These books/articles helped me to think better when analysing data.
- Common Data Mistakes to Avoid. Excellent summary of the most common fallacies when analyzing data. Very clear and well-explained.
- Thinking, Fast and Slow. Learning about biases can be super useful. For instance, I didn't use to have the reflex of thinking about the base rate whenever I see a figure.
- Fooled by Randomness. Nassim Taleb taught me so much, both professionally and personally. In Fooled by Randomness, you will learn about major pitfalls when dealing with data in real life.
- Why you should care about the Nate Silver vs. Nassim Taleb Twitter war. Great chess players learn from high-Elo games. Great data people learn from debates between data experts.
- Five books every data scientist should read that are not about data science. I haven't read them all yet, but these suggestions seem judicious.
- Fundamentals of Data Visualisation. A complete guide to visualisation, with a free version online.
I really love Reading in Applied Data Science, but it is aimed more at data scientists.
The GitLab data team also made an excellent list (close to mine).
- Randy Au. You can read almost all of his posts; they are all very relevant for analytics engineers.
- Locally Optimistic. A blog dedicated to data in organizations.
- Tristan Handy. I also love his newsletter: Data Science Roundup.
- dbt blog