Big Data/Cassandra
Apache Cassandra is a distributed, scalable NoSQL wide-column database management system. By 2015, it had become one of the world's most popular database management systems (DBMS)[1].
Installation
The Java sources are available at https://github.com/apache/cassandra, and a tarball can be downloaded from http://cassandra.apache.org/download/.
- On macOS:
brew install cassandra && brew services start cassandra
See also http://cassandra.apache.org/doc/latest/getting_started/installing.html for more information.
To launch the server:
- On Linux:
/cassandra/bin/cassandra
- On Windows: \cassandra\bin\cassandra.bat
Graphical user interface
Several GUIs are available to manage Cassandra. One example is Helenos: its Java sources are available at https://github.com/tomekkup/helenos, and a compiled version at http://sourceforge.net/projects/helenos-gui/.
It bundles an Apache Tomcat server, which can be started with \helenos\bin\startup.bat. The web interface should then be reachable at http://localhost:8080 (login: admin / password: admin).
NB: it can create column families, but it cannot display those that were created through CQL.
Data manipulation
In 2011, Cassandra introduced the Cassandra Query Language (CQL)[2][3]. You can interact with CQL through the cqlsh client, which lets you create keyspaces and tables, insert rows, and query tables, among other operations.
The CQL 3.0 syntax looks like this[4]:
CREATE KEYSPACE MyBase1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE MyBase1;
CREATE TABLE MyTable1 (
id text,
FirstName text,
LastName text,
PRIMARY KEY(id));
INSERT INTO MyTable1 (id, LastName) VALUES ('1', 'Test');
SELECT * FROM MyTable1;
DROP TABLE MyTable1;
Additional Notes:
- There isn't any autoincrement option.
- Field names are not case-sensitive, unless they are enclosed in double quotes.
- Inserting a record with an existing primary key silently overwrites the supplied columns of the existing row (an "upsert"), without any warning.
- When inserting more than 1,000 records at once, cqlsh may ignore the rest. For bulk loads, the sstableloader tool is recommended instead.
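Since there is no autoincrement and an INSERT on an existing key overwrites silently, one common workaround is to generate unique keys on the client side and to guard the statement with CQL's IF NOT EXISTS clause. The following is a minimal Python sketch of that idea, using only the standard library. The helper names new_row_id and build_insert are hypothetical, and the table is the MyTable1 example above; it only builds the statement text and does not connect to a server.

```python
import uuid


def new_row_id() -> str:
    """Return a random UUID as text, usable as a client-generated primary key."""
    return str(uuid.uuid4())


def quote(value: str) -> str:
    """Quote a CQL string literal; single quotes are escaped by doubling them."""
    return "'" + value.replace("'", "''") + "'"


def build_insert(row_id: str, last_name: str, guard: bool = True) -> str:
    """Build an INSERT for the MyTable1 example (hypothetical helper).

    With guard=True the statement ends with IF NOT EXISTS, so an existing
    row with the same id is left untouched instead of being overwritten.
    """
    stmt = (
        "INSERT INTO MyTable1 (id, LastName) "
        f"VALUES ({quote(row_id)}, {quote(last_name)})"
    )
    if guard:
        stmt += " IF NOT EXISTS"
    return stmt + ";"
```

For example, build_insert(new_row_id(), 'Test') yields a statement that can be pasted into cqlsh. In application code, a driver's bound parameters are preferable to assembling statement strings by hand.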
Cassandra port usage
- 7000: cluster communication[5]
- 7001: cluster communication if SSL is enabled[6]
- 7199: JMX (8080 before Cassandra 0.8.xx)[7]
- 9042: CQL native clients
- 9160: Thrift client API[8]
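To see which of these ports actually accept connections on a given host, a simple TCP probe can help. The Python sketch below is only an illustration using the standard library; the names CASSANDRA_PORTS, is_port_open and check_ports are hypothetical, and a successful connection only shows that something is listening, not that it is Cassandra.

```python
import socket
from typing import Dict, Optional

# Ports taken from the list above.
CASSANDRA_PORTS = {
    7000: "cluster communication",
    7001: "cluster communication (SSL)",
    7199: "JMX",
    9042: "CQL native clients",
    9160: "Thrift client API",
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host: str,
                ports: Optional[Dict[int, str]] = None,
                timeout: float = 1.0) -> Dict[int, bool]:
    """Probe each port and map it to True (open) or False (closed)."""
    if ports is None:
        ports = CASSANDRA_PORTS
    return {port: is_port_open(host, port, timeout) for port in ports}
```

Calling check_ports("127.0.0.1") on a machine running a default single-node installation would typically report 9042 as open, depending on the version and configuration.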
How to use several nodes
To communicate with one another, Cassandra servers need the following ports to be open[9]: 7000, 7001 (SSL), 7199, 9042 and 9160.
There is no master node, so failover is automatic. Each node must list at least one "seed node" in its configuration (the seeds parameter in cassandra.yaml) so that it can join the distributed cluster. The datacenter and rack of a node are described in \cassandra\conf\cassandra-rackdc.properties.
To let the nodes communicate, set the endpoint_snitch parameter in cassandra.yaml to RackInferringSnitch (instead of the default SimpleSnitch).
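As a rough illustration of what RackInferringSnitch does: it assumes that the second octet of a node's IPv4 address identifies its datacenter and the third octet its rack. The Python sketch below mimics that rule; the function name infer_location is hypothetical, and this is a simplified model, not Cassandra's own code.

```python
import ipaddress
from typing import Tuple


def infer_location(ip: str) -> Tuple[str, str]:
    """Return (datacenter, rack) inferred from an IPv4 address.

    Mirrors the RackInferringSnitch convention: the second octet names
    the datacenter and the third octet names the rack.
    """
    # IPv4Address validates the input and raises ValueError if it is malformed.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return octets[1], octets[2]


def same_rack(ip_a: str, ip_b: str) -> bool:
    """Return True if both addresses map to the same datacenter and rack."""
    return infer_location(ip_a) == infer_location(ip_b)
```

With this rule, infer_location("10.20.30.40") gives ('20', '30'), so the addressing plan of the cluster has to follow the datacenter and rack layout for the snitch to be meaningful.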
Once the nodes are configured, the list of nodes can be displayed with:
- On Linux: /cassandra/bin/nodetool status
- On Windows: \cassandra\bin\nodetool.bat status
NB: when a keyspace is created with a replication_factor greater than one, each row is stored on several nodes, which makes the data redundant (mirroring).
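The effect of replication_factor with SimpleStrategy can be pictured as a ring: the partition key is hashed to a token, the first replica is the node owning that token, and further replicas are the next nodes clockwise. The Python sketch below is a toy model under stated assumptions: it uses MD5 as a stand-in for Cassandra's real partitioner, gives each node a single token, and the names token and replicas_for are hypothetical.

```python
import bisect
import hashlib
from typing import List


def token(value: str) -> int:
    """Hash a string to a 64-bit token (MD5 stand-in, not Cassandra's partitioner)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def replicas_for(key: str, nodes: List[str], replication_factor: int) -> List[str]:
    """Return the nodes that would store `key` in this toy ring.

    The first replica is the first node whose token is >= the key's token
    (wrapping around the ring); the others are the following nodes clockwise.
    """
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    # A row cannot have more replicas than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

For instance, replicas_for("1", ["10.0.0.1", "10.0.0.2", "10.0.0.3"], 2) returns two distinct nodes, so the row survives the loss of either one.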
Related Technologies
- Amazon Dynamo[10] - uses similar concepts, such as data distribution and fault tolerance
- BigTable - uses similar data model (column-families)
- Redis - in-memory key-value database[11]
- MongoDB
References
1. http://db-engines.com/en/ranking
2. https://grokbase.com/t/cassandra/user/1162fkpwx2/release-0-8-0
3. https://docs.datastax.com/en/cql/3.3/cql/cqlIntro.html
4. https://cassandra.apache.org/doc/cql3/CQL.html
5. http://cassandra.apache.org/doc/latest/faq/index.html#what-ports
6. http://cassandra.apache.org/doc/latest/faq/index.html#what-ports
7. https://stackoverflow.com/questions/2359159/cassandra-port-usage-how-are-the-ports-used
8. https://stackoverflow.com/questions/2359159/cassandra-port-usage-how-are-the-ports-used
9. http://docs.datastax.com/en/cassandra/2.0/cassandra/initialize/initializeSingleDS.html
10. https://en.wikipedia.org/wiki/Amazon_DynamoDB
11. https://en.wikipedia.org/wiki/Redis
- Apache Cassandra - home page
- A. Lakshman and P. Malik "Cassandra: a decentralized structured storage system" ACM SIGOPS Operating Systems Review, Volume 44 Issue 2, April 2010, Pages 35-40, ACM New York, NY, USA