3. Batch vs Streaming: Understand the Trade-Offs

Reis and Housley outline the trade-offs between batch and streaming data systems. Batch processing (e.g., with Apache Spark or dbt) is robust and suits most business use cases. Streaming (e.g., with Apache Kafka, Flink, or Beam) enables real-time decisions but adds complexity in design and fault tolerance. Engineers must weigh latency needs, processing cost, and system reliability. The key takeaway: don’t adopt real-time pipelines for novelty. Use streaming only when real business value demands it, and choose tools based on data freshness requirements, not hype.
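
The decision rule above can be sketched as a small check. This is only an illustration, not something taken from the book: the names `PipelineRequirements`, `recommend_mode`, and the 15-minute `BATCH_FLOOR_SECONDS` threshold are all assumptions chosen for the example, and a real evaluation would also weigh cost and operational reliability.

```python
from dataclasses import dataclass


@dataclass
class PipelineRequirements:
    # How old the data may be when it is used, in seconds (hypothetical field).
    max_staleness_seconds: float
    # Whether fresher data would actually change a business decision (hypothetical field).
    freshness_changes_decisions: bool


# Assumed threshold for illustration: if data may be at least this stale,
# a scheduled batch job can meet the requirement.
BATCH_FLOOR_SECONDS = 15 * 60


def recommend_mode(req: PipelineRequirements) -> str:
    """Return "streaming" only when low latency is both required and valuable."""
    needs_low_latency = req.max_staleness_seconds < BATCH_FLOOR_SECONDS
    if needs_low_latency and req.freshness_changes_decisions:
        return "streaming"
    # Default to batch: simpler to design, operate, and recover.
    return "batch"


if __name__ == "__main__":
    daily_report = PipelineRequirements(24 * 60 * 60, True)
    fraud_check = PipelineRequirements(5, True)
    print(recommend_mode(daily_report))
    print(recommend_mode(fraud_check))
```

The function defaults to batch and returns streaming only when both conditions hold, which mirrors the advice to let freshness requirements, not novelty, drive the choice.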

CURATED FROM

Fundamentals of Data Engineering

by Joe Reis, Matt Housley

IDEAS CURATED BY

Henderson Costa

@hendo4books2

Computer scientist and data scientist from Brazil. Instagram: @hendosousa

This book is a modern manifesto for data engineers. It teaches how to build scalable, reliable, and modular data systems in the cloud era — going beyond ETL to embrace data as a product, observability, and the principles of resilient architecture.
