Apache Cassandra is a NoSQL database ideal for high-speed, online transactional data, while Hadoop is a big data analytics system that focuses on data warehousing and data lake use cases.
What is Hadoop?
Apache Hadoop, an Apache Software Foundation project, is a big data analytics framework that focuses on near-real-time and batch-oriented analytics of historical data. Hadoop runs analytics on high volumes of historical and line-of-business data using commodity hardware.
Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: storing massive amounts of data and processing that data quickly in parallel. It is designed to scale from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the stack itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
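To make the "local computation and storage" idea concrete, here is a minimal single-process sketch of the map, shuffle, and reduce pattern commonly associated with Hadoop batch jobs. It does not use any Hadoop API. The function names (map_phase, shuffle, reduce_phase, run_job) and the sample input are hypothetical, and each inner list merely stands in for the data held on one node.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (word, 1) pairs for one input record (a line of text)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(splits):
    """Run the whole job in one process.

    Assumption: each split stands in for the data stored on one node. On a
    real cluster the map work would run next to that data, in parallel.
    """
    emitted = []
    for split in splits:
        for record in split:
            emitted.extend(map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    # Hypothetical log lines spread across two "nodes".
    splits = [["error disk full", "ok"], ["error timeout", "ok ok"]]
    print(run_job(splits))
```

The sketch only shows the shape of the computation; scheduling, data placement, and failure handling are what the framework adds on top.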
How does Cassandra complement Hadoop?
As with legacy relational database applications, modern Web, mobile, and IoT applications typically need both a database devoted to online operations (including analytics on hot data) and a batch-oriented data warehouse environment that processes colder data for analytic purposes.
Apache Cassandra™ is well suited to online Web and mobile applications, whereas Hadoop targets the processing of colder data in data lakes, warehouses, and similar stores. Using the two together allows an IT organization to support the different analytic “tempos” needed to satisfy customer requirements and run the business. Many companies have deployed and benefited from Apache Cassandra, including large organizations such as Apple, Comcast, Instagram, eBay, Rackspace, and Netflix.
When is Cassandra required for an application?
Cassandra is well suited to big data applications and can be used in many different data management situations. Some of the most common use cases for Cassandra include:
• Time series data management
• High-velocity device data ingestion and analysis
• Media streaming (e.g., music, movies)
• Social media input and analysis
• Online web retail (e.g., shopping carts, user transactions)
• Web log management / analysis
• Web click-stream analysis
• Real-time data analytics
• Online gaming (e.g., real-time messaging)
• Write-intensive transaction systems
• Buyer event analytics
• Risk analysis and management
When should I not use Cassandra?
Cassandra is typically not the right choice for transactional data that needs per-transaction commit/rollback capabilities. Note that Cassandra does provide atomic writes on a per-row (per-insert) basis, but with no rollback capability.
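The practical consequence of "atomic per row, but no rollback" can be illustrated with a toy in-memory model. This is not the Cassandra API or driver: ToyRowStore, write_row, and apply_in_order are hypothetical names, and rejecting a write that contains a None value is just an assumed way to simulate a failure partway through a sequence of writes.

```python
class ToyRowStore:
    """In-memory stand-in for a table. Illustrative only, NOT Cassandra."""

    def __init__(self):
        self.rows = {}

    def write_row(self, key, columns):
        """Apply one row write all-or-nothing.

        The write is validated first and the merged row is swapped in as a
        whole, so a single row is never left half-updated.
        """
        if any(value is None for value in columns.values()):
            # Assumed failure condition, used only to simulate a failed write.
            raise ValueError(f"rejected write for row {key!r}")
        self.rows[key] = {**self.rows.get(key, {}), **columns}


def apply_in_order(store, writes):
    """Issue several independent row writes, one after another.

    Nothing groups these writes into a transaction: if one fails, the writes
    that already succeeded stay applied.
    """
    applied = []
    for key, columns in writes:
        store.write_row(key, columns)
        applied.append(key)
    return applied


if __name__ == "__main__":
    store = ToyRowStore()
    store.write_row("acct-1", {"balance": 100})
    try:
        apply_in_order(
            store,
            [("acct-1", {"balance": 70}), ("acct-2", {"balance": None})],
        )
    except ValueError as exc:
        print("second write failed:", exc)
    # The first write is still in place; there is no rollback to undo it.
    print(store.rows)
```

In a design like this, any compensation for the already-applied write (for example, writing the old value back) has to be done by the application itself.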