Real Time Big Data With Storm, Cassandra, and In-Memory Computing
Nati Shalom @natishalom
DeWayne Filppi @dfilppi
Introduction to Real Time Analytics
Homeland Security
Real Time Search
Social
eCommerce
User Tracking & Engagement
Financial Services
® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
® Copyright 2013 Gigaspaces Ltd. All Rights Reserved
The Two Vs of Big Data
Velocity, Volume
The Flavors of Big Data Analytics
Counting, Correlating, Research
It’s All about Timing
• Event driven / stream processing
  – High resolution: every tweet gets counted
• Ad-hoc querying
  – Medium resolution (aggregations)
• Long running batch jobs (ETL, map/reduce)
  – Low resolution (trends & patterns)
This is what we’re here to discuss
Facebook & Twitter Real Time Analytics
FACEBOOK REAL-TIME ANALYTICS SYSTEM
(LOGGING CENTRIC APPROACH)
The actual analytics..
• Like button analytics
• Comments box analytics
Facebook architecture..
[Diagram: Facebook logs, Scribe, PTail, Puma, HBase, and HDFS; real-time and long-term batch paths; labels: 1.5 sec, 10,000 writes/sec per server]
TWITTER REAL-TIME ANALYTICS SYSTEM
(EVENT DRIVEN APPROACH)
URL Mentions – Here’s One Use Case
Twitter Real Time Analytics based on Storm
Comparing the two approaches..
Facebook
• Relies on Hadoop for both real time and batch
• Real-time latency: tens of seconds
• Suited to simple processing
• Low parallelization
Twitter
• Uses Hadoop for batch and Storm for real time
• Real-time latency: milliseconds to seconds
• Suited to complex processing
• Extremely parallel
This is what we’re here to discuss
Introduction to Storm
Storm Background
• Popular open source, real time, in-memory, streaming computation platform.
• Includes a distributed runtime and an intuitive API for defining distributed processing flows.
• Scalable and fault tolerant.
• Developed at BackType and open sourced by Twitter.
Storm Cluster
Storm Concepts
• Streams
  – Unbounded sequence of tuples
• Spouts
  – Source of streams (queues)
• Bolts
  – Functions, filters, joins, aggregations
• Topologies
[Diagram labels: Spouts, Bolt, Topologies]
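The concepts above can be sketched without Storm itself. The following is a minimal, single-process illustration in plain Python, not the Storm API: `sentence_spout`, `split_bolt`, `filter_bolt`, and `run_topology` are hypothetical names. It models a spout as a generator of tuples, a bolt as a function from one tuple to zero or more tuples, and a topology as a spout wired to a chain of bolts.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Tup = Tuple  # a stream element; Storm calls these tuples


def sentence_spout(sentences: Iterable[str]) -> Iterator[Tup]:
    """Spout: the source of a stream. A real spout would read from a queue."""
    for s in sentences:
        yield (s,)


def split_bolt(tup: Tup) -> Iterator[Tup]:
    """Bolt acting as a function: one sentence in, one tuple per word out."""
    for word in tup[0].split():
        yield (word.lower(),)


def filter_bolt(tup: Tup) -> Iterator[Tup]:
    """Bolt acting as a filter: drop words of two characters or fewer."""
    if len(tup[0]) > 2:
        yield tup


def _apply(bolt: Callable[[Tup], Iterator[Tup]], stream: Iterable[Tup]) -> Iterator[Tup]:
    """Feed every tuple of the incoming stream through one bolt."""
    for tup in stream:
        yield from bolt(tup)


def run_topology(spout: Iterable[Tup],
                 bolts: List[Callable[[Tup], Iterator[Tup]]]) -> List[Tup]:
    """Topology: push every tuple from the spout through each bolt in order."""
    stream: Iterable[Tup] = spout
    for bolt in bolts:
        stream = _apply(bolt, stream)
    return list(stream)


result = run_topology(sentence_spout(["Storm is fast", "so is RAM"]),
                      [split_bolt, filter_bolt])
```

In a real cluster each bolt would run as many parallel tasks on different machines; the chain here only shows how the pieces relate.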
Challenge – Word Count
[Diagram labels: Tweets, Count, Word:Count]
• Hottest topics
• URL mentions
• etc.
Streaming word count with Storm
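A minimal sketch of a streaming word count, in plain Python rather than Storm code. It assumes the counting step runs as several parallel tasks and that each word is routed to a task by hashing, similar in spirit to Storm's fields grouping, so that every occurrence of a word reaches the same counter. `CounterTask`, `route`, `word_count`, and `merged` are hypothetical names, and `zlib.crc32` is used only to get a hash that is stable across runs.

```python
import zlib
from collections import Counter
from typing import Iterable, List


class CounterTask:
    """One parallel instance of the counting step; it owns a slice of the words."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, word: str) -> int:
        self.counts[word] += 1
        return self.counts[word]  # would be emitted downstream as (word, count)


def route(word: str, num_tasks: int) -> int:
    """Pick the task for a word; the same word always maps to the same task."""
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def word_count(tweets: Iterable[str], num_tasks: int = 4) -> List[CounterTask]:
    tasks = [CounterTask() for _ in range(num_tasks)]
    for tweet in tweets:                       # the incoming stream of tweets
        for word in tweet.lower().split():     # split step
            tasks[route(word, num_tasks)].process(word)  # count step
    return tasks


def merged(tasks: List[CounterTask]) -> Counter:
    """Combine per-task counts for inspection; no word lives in two tasks."""
    total: Counter = Counter()
    for task in tasks:
        total.update(task.counts)
    return total


tasks = word_count(["storm is fast", "storm and cassandra", "fast fast"], num_tasks=3)
totals = merged(tasks)
```

Routing by word is what lets each task keep its counts locally without coordinating with the others.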
Computing Reach with Event Streams
But where is my Big Data?
[Diagram labels: Spout, Bolt, Bolt]
The Big Picture …
[Diagram: data feeds (Kafka, Twitter, ...) carrying Twitter feeds and web activity; Storm; data stores (Cassandra, MongoDB, HBase, ...) holding analytics data, research data, counters, and reference data]
End to End Latency
• Storm performance and reliability
  – Assumes success is normal
  – Uses batching and pipelining for performance
• Storm plug-ins have a significant effect on performance and reliability
  – A spout must be able to replay tuples on demand in case of error
• Storm uses topology semantics to ensure consistency through event ordering
  – Can be tedious for handling counters
  – Doesn’t ensure the state of the counters
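The counter problem can be illustrated with a small sketch. It assumes at-least-once delivery: when a batch fails or its acknowledgement is lost, the spout replays it, so a naive counter may apply the same batch twice. One common remedy, similar in spirit to Trident's transactional state, is to store the id of the last applied batch next to the count and skip batches that were already applied. This is plain Python, not Storm code, and the class and field names are hypothetical.

```python
from typing import Dict, Tuple


class NaiveCounter:
    """Adds every batch it sees; a replayed batch is counted twice."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def apply(self, batch_id: int, word: str, delta: int) -> None:
        self.counts[word] = self.counts.get(word, 0) + delta


class IdempotentCounter:
    """Stores (last_batch_id, count) per key and ignores already-applied batches.

    Assumes batches for a key arrive in increasing batch_id order.
    """

    def __init__(self) -> None:
        self.state: Dict[str, Tuple[int, int]] = {}

    def apply(self, batch_id: int, word: str, delta: int) -> None:
        last_id, count = self.state.get(word, (-1, 0))
        if batch_id <= last_id:
            return  # replay of a batch that was already applied
        self.state[word] = (batch_id, count + delta)

    def count(self, word: str) -> int:
        return self.state.get(word, (-1, 0))[1]


# Batch 1 is delivered, its acknowledgement is lost, and it is replayed.
deliveries = [(0, "storm", 2), (1, "storm", 3), (1, "storm", 3)]
naive, safe = NaiveCounter(), IdempotentCounter()
for batch_id, word, delta in deliveries:
    naive.apply(batch_id, word, delta)
    safe.apply(batch_id, word, delta)
```

The naive counter ends up over-counting the replayed batch, while the id check keeps the second counter at the intended total.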
You’re only as strong as your weakest link
Typical user experience…
Now, Kafka is *fast*. When running the Kafka Spout by itself, I easily reproduced Kafka's claim that you can consume "hundreds of thousands of messages per second".
When I first fired up the topology, things went well for the first minute, but then quickly crashed as the Kafka spout emitted too fast for the Cassandra Bolt to keep up. Even though Cassandra is fast as well, it is still orders of magnitude slower than Kafka.
Source: A Big Data Trifecta: Storm, Kafka and Cassandra. Brian O'Neill's blog
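The failure described in the quote is a fast producer outrunning a slow consumer. The sketch below shows one general remedy, assuming a bounded buffer between the two stages so that the fast stage blocks instead of flooding the slow one. Plain Python threads stand in for the Kafka spout and the Cassandra bolt; `run_pipeline`, the buffer size, and the simulated write delay are hypothetical. Storm has its own setting for capping in-flight tuples, `topology.max.spout.pending`, which this sketch does not use.

```python
import queue
import threading
import time
from typing import List


def run_pipeline(num_messages: int, buffer_size: int) -> List[int]:
    """Fast producer, slow consumer, bounded queue in between."""
    buf: queue.Queue = queue.Queue(maxsize=buffer_size)
    written: List[int] = []
    done = object()  # sentinel marking the end of the stream

    def producer() -> None:           # stands in for the fast spout
        for i in range(num_messages):
            buf.put(i)                # blocks while the buffer is full
        buf.put(done)

    def consumer() -> None:           # stands in for the slow store writer
        while True:
            item = buf.get()
            if item is done:
                break
            time.sleep(0.001)         # simulated slow write
            written.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written


written = run_pipeline(num_messages=50, buffer_size=5)
```

Because `put` blocks when the queue is full, the producer is throttled to the consumer's pace and the backlog stays bounded regardless of how fast the source is.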
What if we could put everything In Memory?
An Alternative Approach
Did you know?
Facebook keeps 80% of its data in memory (Stanford research)
RAM is 100-1000x faster than disk (random seek)
• Disk: 5-10 ms
• RAM: ~0.001 ms
In Memory Data Grid Review
• RAM is the new disk
• Data partitioned across a cluster
• Large “virtual” memory space
• Transactional
• Highly available
• Code with data
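Two of the data grid properties, partitioning and "code with data", can be sketched as follows. This is a toy, single-process model in plain Python, not the GigaSpaces XAP API: `Grid`, `put`, `get`, `execute`, and `increment` are hypothetical names. Keys are hashed to partitions, and `execute` runs a function against the partition that owns the key instead of pulling the value to the caller.

```python
import zlib
from typing import Any, Callable, Dict, List


class Grid:
    """Toy data grid: a list of in-memory partitions addressed by key hash."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[Dict[str, Any]] = [{} for _ in range(num_partitions)]

    def _owner(self, key: str) -> Dict[str, Any]:
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index]

    def put(self, key: str, value: Any) -> None:
        self._owner(key)[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._owner(key).get(key, default)

    def execute(self, key: str, fn: Callable[[Dict[str, Any], str], Any]) -> Any:
        """Code with data: run fn inside the partition that owns the key."""
        return fn(self._owner(key), key)


def increment(partition: Dict[str, Any], key: str) -> int:
    """Runs where the data lives; only the new count travels back."""
    partition[key] = partition.get(key, 0) + 1
    return partition[key]


grid = Grid(num_partitions=4)
for word in ["storm", "grid", "storm"]:
    grid.execute(word, increment)
```

In a real grid each partition would sit on a different machine with a backup copy, which is where the availability and transactional properties come in; the sketch omits both.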
Integrating with Storm
[Diagram: web activity feeds; In Memory Data Stream (via Storm spout plug-in); spout and bolts; In Memory Data Grid (via Storm Trident state plug-in) holding analytics data, research data, counters, and reference data]
In Memory Streaming Word Count with Storm
Storm has a simple builder interface for creating stream processing topologies
Storm delegates persistence to external providers
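Both points can be illustrated with a toy builder. This is plain Python and deliberately not Storm's `TopologyBuilder` or the Trident API; `StreamBuilder`, `each`, `persistent_count`, `State`, and `DictState` are hypothetical names. The point is the shape: processing steps are chained fluently, and the counting step writes to whatever state object it is handed, which could wrap an in-memory grid or a database client.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Protocol


class State(Protocol):
    """What the builder needs from a persistence provider."""

    def add(self, key: str, delta: int) -> None: ...


class DictState:
    """In-memory provider; a real one might wrap a data grid or a database."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}

    def add(self, key: str, delta: int) -> None:
        self.data[key] = self.data.get(key, 0) + delta


class StreamBuilder:
    def __init__(self, source: Iterable[str]) -> None:
        self._source = source
        self._steps: List[Callable[[str], Iterator[str]]] = []

    def each(self, fn: Callable[[str], Iterator[str]]) -> "StreamBuilder":
        """Append a processing step and return self so calls can be chained."""
        self._steps.append(fn)
        return self

    def persistent_count(self, state: State) -> None:
        """Run the stream and delegate the counts to the supplied state."""
        items: List[str] = list(self._source)
        for step in self._steps:
            items = [out for item in items for out in step(item)]
        for item in items:
            state.add(item, 1)


def split(sentence: str) -> Iterator[str]:
    yield from sentence.lower().split()


state = DictState()
StreamBuilder(["Storm and XAP", "storm"]).each(split).persistent_count(state)
```

Swapping `DictState` for another object with the same `add` method changes where the counts live without touching the pipeline definition.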
Integrating with Hadoop, NoSQL DB..
[Diagram: web activity feeds; In Memory Data Stream and In Memory Data Grid (Storm plug-in); spout and bolts; analytics data, research data, counters, and reference data; backing stores (Hadoop, NoSQL, RDBMS, …) connected through write-behind with an LRU-based policy]
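Write-behind with an LRU-based policy can be sketched as follows, assuming a cache that acknowledges writes from memory, flushes them to a slower backing store later, and evicts the least recently used entry when it is full. This is plain Python; `WriteBehindCache`, `flush`, and the dictionary standing in for Hadoop, NoSQL, or an RDBMS are hypothetical. In this sketch a dirty entry is written to the store before it is evicted so that no update is lost.

```python
from collections import OrderedDict
from typing import Any, Dict, Set


class WriteBehindCache:
    def __init__(self, capacity: int, store: Dict[str, Any]) -> None:
        self.capacity = capacity
        self.store = store                       # slow backing store
        self.cache: "OrderedDict[str, Any]" = OrderedDict()
        self.dirty: Set[str] = set()             # keys not yet written to the store

    def put(self, key: str, value: Any) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)              # mark as most recently used
        self.dirty.add(key)
        self._evict_if_needed()

    def get(self, key: str) -> Any:
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.store[key]                  # cache miss: read from the store
        self.cache[key] = value
        self._evict_if_needed()
        return value

    def flush(self) -> None:
        """Write-behind step; a real system would run this asynchronously."""
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()

    def _evict_if_needed(self) -> None:
        while len(self.cache) > self.capacity:
            key, value = self.cache.popitem(last=False)  # least recently used
            if key in self.dirty:                # do not lose an unflushed write
                self.store[key] = value
                self.dirty.discard(key)
```

A short usage example: writes land in memory first, reach the store on `flush`, and older entries fall out of memory once capacity is exceeded.

```python
store: Dict[str, Any] = {}
cache = WriteBehindCache(capacity=2, store=store)
cache.put("a", 1)
cache.put("b", 2)      # store is still empty at this point
cache.flush()          # both entries are now in the store
cache.put("c", 3)      # "a" is evicted from memory but remains in the store
```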
Live Demo – Word Count At In Memory Speed
Recent Benchmarks..
Gresham Computing plc achieved over 50,000 equity trade transactions per second of load and match into a database.
References
• Try the Cloudify recipe
  – Download Cloudify: http://www.cloudifysource.org/
  – Download the recipe (apps/xapstream, services/xapstream): https://github.com/CloudifySource/cloudify-recipes
• XAP – Cassandra interface details:
  – http://wiki.gigaspaces.com/wiki/display/XAP95/Cassandra+Space+Persistency
• Check out the source for the XAP spout, a sample state implementation backed by XAP, and a Storm-friendly streaming implementation on GitHub:
  – https://github.com/Gigaspaces/storm-integration
• For more background on the effort, check out my recent blog posts at http://blog.gigaspaces.com/
  – http://blog.gigaspaces.com/gigaspaces-and-storm-part-1-storm-clouds/
  – http://blog.gigaspaces.com/gigaspaces-and-storm-part-2-xap-integration/
  – Part 3 coming soon.

 
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStack
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStackReal World Example of Orchestrating Docker, Node JS, NFV on OpenStack
Real World Example of Orchestrating Docker, Node JS, NFV on OpenStack
 
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
 
OpenStack Juno The Complete Lowdown and Tales from the Summit
OpenStack Juno The Complete Lowdown and Tales from the SummitOpenStack Juno The Complete Lowdown and Tales from the Summit
OpenStack Juno The Complete Lowdown and Tales from the Summit
 
Application and Network Orchestration using Heat & Tosca
Application and Network Orchestration using Heat & ToscaApplication and Network Orchestration using Heat & Tosca
Application and Network Orchestration using Heat & Tosca
 
Introduction to Cloudify for OpenStack users
Introduction to Cloudify for OpenStack users Introduction to Cloudify for OpenStack users
Introduction to Cloudify for OpenStack users
 
Software Defined Operator
Software Defined OperatorSoftware Defined Operator
Software Defined Operator
 
Complex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real TimeComplex Analytics with NoSQL Data Store in Real Time
Complex Analytics with NoSQL Data Store in Real Time
 
Is Orchestration the Next Big Thing in DevOps
Is Orchestration the Next Big Thing in DevOpsIs Orchestration the Next Big Thing in DevOps
Is Orchestration the Next Big Thing in DevOps
 
When networks meets apps (open stack atlanta)
When networks meets apps (open stack atlanta)When networks meets apps (open stack atlanta)
When networks meets apps (open stack atlanta)
 
Application Centric Approach to Devops
Application Centric Approach to DevopsApplication Centric Approach to Devops
Application Centric Approach to Devops
 
Case Studies for moving apps to the cloud - DLD 2013
Case Studies for moving apps to the cloud - DLD 2013Case Studies for moving apps to the cloud - DLD 2013
Case Studies for moving apps to the cloud - DLD 2013
 
Application Centric DevOps
Application Centric DevOpsApplication Centric DevOps
Application Centric DevOps
 

Recently uploaded

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 

Recently uploaded (20)

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 

Real-Time Big Data at In-Memory Speed, Using Storm

  • 1. Real Time Big Data With Storm, Cassandra, and In-Memory Computing Nati Shalom @natishalom DeWayne Filppi @dfilppi
  • 2. Introduction to Real Time Analytics: Homeland Security, Real Time Search, Social, eCommerce, User Tracking & Engagement, Financial Services. ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 3. The Two Vs of Big Data: Velocity, Volume. ® Copyright 2013 Gigaspaces Ltd. All Rights Reserved
  • 4. The Flavors of Big Data Analytics: Counting, Correlating, Research
  • 5. It’s All about Timing • Event driven / stream processing • High resolution – every tweet gets counted • Ad-hoc querying • Medium resolution (aggregations) • Long running batch jobs (ETL, map/reduce) • Low resolution (trends & patterns). This is what we’re here to discuss.
  • 6. Facebook & Twitter Real Time Analytics
  • 8. The actual analytics: • Like button analytics • Comments box analytics
  • 9. Facebook architecture (diagram): Facebook logs, Scribe, PTail, Puma, HBase, HDFS. Labels: Real Time, Long Term, Batch 1.5 sec, 10,000 writes/sec per server
  • 11. URL Mentions – Here’s One Use Case
  • 12. Twitter Real Time Analytics based on Storm
  • 13. Comparing the two approaches. Facebook: • Relies on Hadoop for real time and batch • RT = tens of seconds • Suited to simple processing • Low parallelization. Twitter: • Uses Hadoop for batch and Storm for real time • RT = milliseconds to seconds • Suited to complex processing • Extremely parallel. This is what we’re here to discuss.
  • 14. Introduction to Storm
  • 15. Storm Background • Popular open source, real time, in-memory, streaming computation platform. • Includes distributed runtime and intuitive API for defining distributed processing flows. • Scalable and fault tolerant. • Developed at BackType, and open sourced by Twitter
  • 16. Storm Cluster
  • 17. Storm Concepts • Streams: unbounded sequence of tuples • Spouts: source of streams (queues) • Bolts: functions, filters, joins, aggregations • Topologies
  • 18. Challenge – Word Count (diagram: Tweets, Count, Word:Count) • Hottest topics • URL mentions • etc.
  • 19. Streaming word count with Storm
  • 20. Computing Reach with Event Streams
  • 21. But where is my Big Data?
  • 22. The Big Picture (diagram): Data feeds (Kafka, Twitter, ..): Twitter Feed, Web Activity. Storm: Spout, Bolt, Bolt. Cassandra, MongoDB, HBase, ..: Analytics Data, Research Data, Counters, Reference Data. End to End Latency
  • 23. You’re as strong as your weakest link. Storm performance and reliability: • Assumes success is normal • Uses batching and pipelining for performance • Storm plug-ins have a significant effect on performance and reliability • Spout must be able to replay tuples on demand in case of error • Storm uses topology semantics for ensuring consistency through event ordering • Can be tedious for handling counters • Doesn’t ensure the state of the counters
  • 24. Typical user experience… Now, Kafka is *fast*. When running the Kafka Spout by itself, I easily reproduced Kafka's claim that you can consume "hundreds of thousands of messages per second". When I first fired up the topology, things went well for the first minute, but then quickly crashed as the Kafka spout emitted too fast for the Cassandra Bolt to keep up. Even though Cassandra is fast as well, it is still orders of magnitude slower than Kafka. Source: A Big Data Trifecta: Storm, Kafka and Cassandra. Brian O'Neill's Blog
  • 25. An Alternative Approach: What if we could put everything In Memory?
  • 26. Did you know? Facebook keeps 80% of its data in Memory (Stanford research). RAM is 100-1000x faster than Disk (Random seek) • Disk: 5-10 ms • RAM: ~0.001 ms
  • 27. In Memory Data Grid Review • RAM is the new disk • Data partitioned across a cluster • Large “virtual” memory space • Transactional • Highly available • Code with Data
  • 28. Integrating with Storm (diagram): Web Activity feeds; In Memory Data Stream (via Storm Spout plug-in); Spout, Bolt, Bolt; In Memory Data Grid (via Storm Trident State plug-in): Analytics Data, Research Data, Counters, Reference Data
  • 29. In Memory Streaming Word Count with Storm. Storm has a simple builder interface for creating stream processing topologies. Storm delegates persistence to external providers.
  • 30. Integrating with Hadoop, NoSQL DB.. (diagram): Web Activity feeds; In Memory Data Stream; Spout, Bolt, Bolt; Storm Plugin; In Memory Data Grid: Analytics Data, Research Data, Counters, Reference Data; Write Behind; LRU based Policy; Hadoop, NoSQL, RDBMS, …
  • 31. Live Demo – Word Count At In Memory Speed
  • 32. Recent Benchmarks.. Gresham Computing plc achieved over 50,000 equity trade transactions per second of load and match into a database.
  • 34. References • Try the Cloudify recipe • Download Cloudify: http://www.cloudifysource.org/ • Download the recipe (apps/xapstream, services/xapstream): https://github.com/CloudifySource/cloudify-recipes • XAP – Cassandra interface details: http://wiki.gigaspaces.com/wiki/display/XAP95/Cassandra+Space+Persistency • Check out the source for the XAP Spout, a sample state implementation backed by XAP, and a Storm-friendly streaming implementation on GitHub: https://github.com/Gigaspaces/storm-integration • For more background on the effort, check out my recent blog posts at http://blog.gigaspaces.com/ • http://blog.gigaspaces.com/gigaspaces-and-storm-part-1-storm-clouds/ • http://blog.gigaspaces.com/gigaspaces-and-storm-part-2-xap-integration/ • Part 3 coming soon.
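
The Storm concepts in slide 17 (spouts, bolts, topologies) and the word count challenge in slides 18 and 19 can be illustrated without a cluster. The following is a minimal sketch in plain Python, not the Storm API: the function names (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) are hypothetical, the "stream" is an ordinary generator, and everything runs in one process, so it shows only the data flow, not distribution, batching, or replay.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Spout: the source of the stream (here, a fixed list standing in for a queue)."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Bolt: a function that turns each sentence tuple into word tuples."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Bolt: a running aggregation that emits (word, count) after every word."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(sentences: Iterable[str]) -> Dict[str, int]:
    """Topology: wire spout -> split -> count and keep the latest count per word."""
    latest: Dict[str, int] = {}
    for word, count in count_bolt(split_bolt(sentence_spout(sentences))):
        latest[word] = count
    return latest


if __name__ == "__main__":
    tweets = ["storm is fast", "kafka is fast", "storm scales"]
    print(run_topology(tweets))
```

In this sketch the counts live in the memory of `count_bolt`; per slide 29, Storm delegates persistence of such state to external providers.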

Editor's Notes

  1. ActiveInsight
  2. ActiveInsight
  3. http://developers.facebook.com/blog/post/476/
  4. http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html
     The Winner: HBase + Scribe + Ptail + Puma
     At a high level: HBase stores data across distributed machines. Use a tailing architecture: new events are stored in log files, and the logs are tailed. A system rolls the events up and writes them into storage. A UI pulls the data out and displays it to users.
     Data Flow: User clicks Like on a web page. This fires an AJAX request to Facebook. The request is written to a log file using Scribe. Scribe handles issues like file roll over. Scribe is built on the same HDFS file store Hadoop is built on. Write extremely lean log lines. The more compact the log lines, the more can be stored in memory.
     Ptail: Data is read from the log files using Ptail. Ptail is an internal tool built to aggregate data from multiple Scribe stores. It tails the log files and pulls data out. Ptail data is separated out into three streams so they can eventually be sent to their own clusters in different datacenters: plugin impressions, news feed impressions, and actions (plugin + news feed).
     Puma: Batch data to lessen the impact of hot keys. Even though HBase can handle a lot of writes per second, they still want to batch data. A hot article will generate a lot of impressions and news feed impressions, which will cause huge data skews, which will cause IO issues. The more batching the better. Batch for 1.5 seconds on average. They would like to batch longer, but they have so many URLs that they run out of memory when creating a hashtable. Wait for the last flush to complete before starting a new batch to avoid lock contention issues.
     UI Renders Data: Frontends are all written in PHP. The backend is written in Java, and Thrift is used as the messaging format so PHP programs can query Java services. Caching solutions are used to make the web pages display more quickly. Performance varies by the statistic. A counter can come back quickly. Finding the top URL in a domain can take longer. Response times range from .5 to a few seconds. The more and longer data is cached, the less realtime it is. Set different caching TTLs in memcache.
     MapReduce: The data is then sent to MapReduce servers so it can be queried via Hive. This also serves as a backup plan, as the data can be recovered from Hive. Raw logs are removed after a period of time.
     HBase: HBase is a distributed column store and a database interface to Hadoop. Facebook has people working internally on HBase. Unlike a relational database, you don't create mappings between tables. You don't create indexes. The only index you have is a primary row key. From the row key you can have millions of sparse columns of storage. It's very flexible. You don't have to specify the schema. You define column families to which you can add keys at any time. The key feature for scalability and reliability is the WAL, write ahead log, which is a log of the operations that are supposed to occur. Based on the key, data is sharded to a region server. It is written to the WAL first. Data is put into memory. At some point in time, or if enough data has been accumulated, the data is flushed to disk. If the machine goes down you can recreate the data from the WAL, so there's no permanent data loss. Using a combination of the log and in-memory storage, they can handle an extremely high rate of IO reliably. HBase handles failure detection and automatically routes across failures. Currently HBase resharding is done manually. Automatic hot spot detection and resharding is on the roadmap for HBase, but it's not there yet. Every Tuesday someone looks at the keys and decides what changes to make in the sharding plan.
     Schema: Store on a per URL basis a bunch of counters. A row key, which is the only lookup key, is the MD5 hash of the reverse domain. Selecting the proper key structure helps with scanning and sharding. A problem they have is sharding data properly onto different machines. Using an MD5 hash makes it easier to say this range goes here and that range goes there. For URLs they do something similar, plus they add an ID on top of that. Every URL in Facebook is represented by a unique ID, which is used to help with sharding. A reverse domain, com.facebook/ for example, is used so that the data is clustered together. HBase is really good at scanning clustered data, so if they store the data so it's clustered together they can efficiently calculate stats across domains. Think of every row as a URL and every cell as a counter; you are able to set different TTLs (time to live) for each cell. So if keeping an hourly count, there's no reason to keep that around for every URL forever, so they set a TTL of two weeks. Typically set TTLs on a per column family basis. Per server they can handle 10,000 writes per second. Checkpointing is used to prevent data loss when reading data from log files. Tailers save log stream checkpoints in HBase. These are replayed on startup so data won't be lost. Useful for detecting click fraud, but it doesn't have fraud detection built in.
     Tailer Hot Spots: In a distributed system there's a chance one part of the system can be hotter than another. One example is region servers that can be hot because more keys are being directed that way. One tailer can lag behind another too. If one tailer is an hour behind and the others are up to date, what numbers do you display in the UI? For example, impressions have a way higher volume than actions, so CTR rates were way higher in the last hour. The solution is to figure out the least up to date tailer and use that when querying metrics.
  5. Storm (quite rationally) assumes success is normal. Storm uses batching and pipelining for performance. Therefore the spout must be able to replay tuples on demand in case of error. Any kind of quasi-queue-like data source can be fashioned into a spout. No persistence is ever required, and speed is attained by minimizing network hops during topology processing.
  6. http://www.gigaspaces.com/greshams-ctc-reconciles-transactions-orders-magnitude-faster-competitive-offerings-intel-benchmarkin http://natishalom.typepad.com/nati_shaloms_blog/2013/01/in-memory-computing-data-grid-for-big-data.html
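
Two ideas from note 4 lend themselves to a short illustration: the row key built from an MD5 hash of the reverse domain, and querying metrics only up to the least up-to-date tailer. The sketch below is a rough Python illustration under stated assumptions, not Facebook's implementation. The notes do not give the exact key layout, so hashing the whole reversed domain string is an assumption, and the function names, the tailer names, and the integer checkpoint values are all hypothetical.

```python
import hashlib
from typing import Dict


def reverse_domain(domain: str) -> str:
    """Reverse the dot-separated labels, e.g. www.facebook.com -> com.facebook.www."""
    return ".".join(reversed(domain.lower().split(".")))


def domain_row_key(domain: str) -> str:
    """Assumed key layout: hex MD5 digest of the reversed domain string."""
    return hashlib.md5(reverse_domain(domain).encode("utf-8")).hexdigest()


def query_cutoff(tailer_checkpoints: Dict[str, int]) -> int:
    """Return the checkpoint of the least up-to-date tailer.

    Metrics are only compared up to this point, so a lagging stream
    does not distort ratios such as CTR.
    """
    return min(tailer_checkpoints.values())


if __name__ == "__main__":
    print(reverse_domain("www.facebook.com"))
    print(domain_row_key("facebook.com"))
    # Hypothetical checkpoints (e.g. seconds since some epoch) per tailer stream.
    print(query_cutoff({"plugin": 1000, "newsfeed": 940, "actions": 1000}))
```

Fixed-length hashed keys spread evenly over the key space, which is what makes it easy to assign ranges to machines, as the note describes.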