Data Engineer, localsearch.ch
Today almost all companies have some sort of data processing solution in place, for example to track users, calculate KPIs, or do machine learning (ML).
In most setups, batch processing is still the norm, even though everyone wants to move to stream processing now, especially for fancy real-time ML use cases.
Wouldn't it be nice to have a solution that can do both, so that switching from one to the other later is easy, and that runs on many different runners? And wouldn't it be even nicer if the cronjob hell of dependent batch jobs were finally organized cleanly? Meet Apache Beam and Apache Airflow (incubating).
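To give a flavor of what "organizing dependent jobs cleanly" means: Airflow models batch jobs as a directed acyclic graph (DAG) of tasks rather than a pile of independently timed cronjobs. A minimal sketch of the underlying idea, using only the Python standard library (job names are purely illustrative; a real Airflow DAG also adds scheduling, retries, and monitoring on top):

```python
from graphlib import TopologicalSorter

# Hypothetical nightly jobs with explicit dependencies:
# "aggregate" must wait for both imports, "report" must wait for "aggregate".
jobs = {
    "aggregate": {"import_users", "import_events"},
    "report": {"aggregate"},
}

# A topological sort yields an execution order that respects every dependency,
# instead of guessing with staggered cron start times.
order = list(TopologicalSorter(jobs).static_order())
print(order)
```

With cron, you would approximate this by starting `aggregate` "late enough" after the imports and hoping they finished; the DAG makes the dependency explicit and lets a scheduler enforce it.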
DISCLAIMER: This talk will be 80% about batch-processing and 20% about streaming :)
Full-stack engineer, focusing on backend and (big) data processing systems.