Cloudera User Group Meetup

Written by Matthias Vallaey | Nov 7, 2016 11:33:44 AM

Big Industries is the main sponsor and driving force behind the Belgian chapter of the Cloudera User Group. The group brings together Cloudera customers and anyone in Belgium interested in Cloudera solutions to network, share best practices, and exchange ideas around the Cloudera Big Data platform and ecosystem.

On October 25, 2016, we organised a Meetup on StreamSets, Datameer and Apache Kudu.

Agenda:

19:00: Apache Kudu: Fast Analytics on Fast Data - Mike Percy, Cloudera

Apache Kudu is a fast new columnar data store for the Hadoop ecosystem designed to enable high-performance, flexible analytic pipelines.

19:45: Datameer: Make Big Data Analytics easy for everyone - Eelco Jan Boonstra & Erik Stalpers, Datameer

A joint Cloudera/Datameer use case on customer segmentation, followed by a demonstration.

20:30: Rapid data ingestion pipelines with StreamSets - Robert Gibbon, Big Industries

In this talk, Rob Gibbon will turn the microscope on StreamSets, a new, open-source streaming data ingestion system for the Hadoop ecosystem and friends.

Rob will give us an overview of this useful tool, guide us through the process of developing a data ingestion pipeline, and look at options for extending the base functionality.

More Info on Apache Kudu: Fast Analytics on Fast Data

Apache Kudu is a fast new columnar data store for the Hadoop ecosystem designed to enable high-performance, flexible analytic pipelines. Because it is optimized for lightning-fast scans, Kudu is particularly well suited to hosting time-series data such as metrics, to machine learning model-building workloads, and to data warehousing applications. Despite its emphasis on scan speed, Kudu also supports operations common in traditional data stores, including real-time inserts, updates, and deletes. Kudu follows a "bring your own SQL" model and can be queried by multiple SQL engines, including Apache Spark SQL, Apache Impala (incubating), and Apache Drill. This talk will discuss what Kudu is, why we decided to build it, and what makes it fast, and will walk through an example of how it can be used for a time-series use case.

Bio:

Mike Percy is a Software Engineer at Cloudera and a PMC member / committer on Apache Kudu and Apache Flume. Prior to joining Cloudera, Mike worked at Yahoo! on big data infrastructure for machine learning at scale. Mike holds a BSCS from UC Santa Cruz and an MSCS from Stanford.
