Working with Amazon Keyspaces Cassandra distribution using Python

17 September, 2020

Cassandra is a popular NoSQL database that handles massive amounts of data by spreading it across a distributed cluster of commodity hardware. After that boring introduction, here's the fun part - Cassandra is a Facebook contribution to the world of open source. It was developed to power the inbox search feature in Facebook over a decade ago, and since being handed over to the Apache Software Foundation, it has been among the top open source projects under the organization.

Cassandra powers the highly intensive queries in the applications of several major tech players - Instagram, Netflix, Facebook (before replacing it with HBase and further HBase with MyRocks), Twitter (before replacing it with in-house Manhattan), Walmart Labs, CERN, Cisco WebEx, and many more. You can read an extensive praise-o-logy of Cassandra on this Ubuntu blog titled What is Cassandra and why are big tech companies using it?

While it can be a hassle to install and maintain a Cassandra database server online, the Amazon Keyspaces offering from Amazon Web Services makes it a no-brainer to use. A similarly powerful offering is provided by DataStax under the name Astra. Both provide a managed Cassandra database service which you can readily use for both development and production needs. Big plus - you can try both for free!

In this tutorial, I shall be moving ahead with Amazon Keyspaces, considering the dearth of complete examples of how to use it on the Internet. You can very easily modify it to work with DataStax Astra.

First off, before we make any queries, head over to the Amazon Keyspaces dashboard and create a Keyspace and a sample Table using the Dashboard or the CQL Editor. Both actions being fairly simple, I shall leave it up to you to explore how to get it done. Let's move ahead with how you can use them with a Python application.

For convenience, I shall assume that the keyspace you created is named test_keyspace and the table is named users. The users table is expected to have the following definition -

Column   Type
id       uuid
name     varchar
age      int
city     varchar
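
If you go the CQL Editor route, the table above can be written as a single CQL statement. The sketch below only builds that statement as text in Python so you can paste it into the editor. The helper name build_create_table is made up for illustration, and using id as the primary key is my assumption, since the table above does not say which column is the key. Note that CQL spells the integer type as int.

```python
# Column definitions from the table above, using CQL type names.
USER_COLUMNS = {
    "id": "uuid",
    "name": "varchar",
    "age": "int",
    "city": "varchar",
}


def build_create_table(keyspace, table, columns, primary_key):
    """Return a CQL CREATE TABLE statement as a string (hypothetical helper)."""
    if primary_key not in columns:
        raise ValueError(f"unknown primary key column: {primary_key}")
    column_defs = ", ".join(
        f"{name} {cql_type}" for name, cql_type in columns.items())
    return (
        f"CREATE TABLE {keyspace}.{table} "
        f"({column_defs}, PRIMARY KEY ({primary_key}))"
    )


# Assumption: id is the primary key of the users table.
create_users_cql = build_create_table(
    "test_keyspace", "users", USER_COLUMNS, "id")
print(create_users_cql)
```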

You'll need to download a digital certificate provided by Amazon to be able to connect to Keyspaces, since the service only accepts connections over TLS. To do so, use the following command in a terminal window -

curl https://www.amazontrust.com/repository/AmazonRootCA1.pem -O

Create a new folder in your working directory named .cassandra and move the AmazonRootCA1.pem file there. This is just to clean up the working space, tbh. You can keep it wherever you wish, as long as it's accessible to your Python script.

Next, we shall need the cassandra-driver library for quick functionality to use Cassandra with Python. Run the following command to install it -

pip install cassandra-driver

Now, we shall write a small barebones wrapper object for our connection with Amazon Keyspaces. Create a file called db.py in your working directory.

Add the following code to make all necessary imports -

from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster
from ssl import SSLContext, PROTOCOL_TLSv1_2, CERT_REQUIRED
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

The above imports indicate a few things -

    1. We're going to be using the PlainTextAuthProvider, which means at some point we'll need a username and password combination for our connection with the database. We shall come back to this.
    2. Cluster indicates that we shall be creating an object representing the Cassandra cluster we connect to. It provides cluster-level operations which can be useful to you, but not much in the course of this tutorial.
    3. SSLContext, PROTOCOL_TLSv1_2, CERT_REQUIRED are all required for making a TLS connection to the Amazon Keyspaces service.
    4. ConsistencyLevel - if this one is a bouncer for you, we shall get to tackling it further down in this tutorial.
    5. SimpleStatement is a simple CQL statement, no wonders here.

Next, let's begin using these imports.

Create a class Cassandra and add an initialization constructor which creates a connection to the database service -

class Cassandra:
    def __init__(self):
        ssl_context = SSLContext(PROTOCOL_TLSv1_2)
        ssl_context.load_verify_locations('.cassandra/AmazonRootCA1.pem')
        ssl_context.verify_mode = CERT_REQUIRED
        auth_provider = PlainTextAuthProvider(
            username='ServiceUsername',
            password='ServicePassword')
        self.cluster = Cluster(
            ['cassandra.us-east-1.amazonaws.com'],
            ssl_context=ssl_context,
            auth_provider=auth_provider,
            port=9142)
        self.session = self.cluster.connect("test_keyspace")

Notice ServiceUsername and ServicePassword - you do not have them yet. To create a pair of credentials to use, follow the instructions in this tutorial.
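
Once you have the credentials, you may not want them sitting in db.py in plain sight. One option is to read them from environment variables. This is only a sketch - the variable names KEYSPACES_USERNAME and KEYSPACES_PASSWORD and the helper load_keyspaces_credentials are names I made up for illustration, not anything Amazon Keyspaces requires.

```python
import os


def load_keyspaces_credentials(env=None):
    """Read the service credentials from environment variables.

    KEYSPACES_USERNAME and KEYSPACES_PASSWORD are hypothetical names chosen
    for this sketch; use whatever names suit your setup.
    """
    env = os.environ if env is None else env
    try:
        return env["KEYSPACES_USERNAME"], env["KEYSPACES_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(
            f"missing environment variable: {missing.args[0]}") from None


# In db.py the pair would then feed PlainTextAuthProvider, for example:
#   username, password = load_keyspaces_credentials()
#   auth_provider = PlainTextAuthProvider(username=username, password=password)
```

Passing a plain dictionary as env makes the helper easy to try out without touching your real environment.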

Your Cluster endpoint (cassandra.us-east-1.amazonaws.com) could differ from the one I have used in my example. You can find out your endpoint by visiting this list and use the endpoint corresponding to your AWS Region.
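
If you switch regions often, a tiny helper can build the host name for you. This sketch assumes your region follows the same naming pattern as the example endpoint above, which may not hold for every AWS Region, so do check the result against the list. The function name keyspaces_endpoint is hypothetical.

```python
def keyspaces_endpoint(region):
    """Build an endpoint host name following the pattern of the example above.

    Assumption: the region uses the cassandra.<region>.amazonaws.com pattern.
    """
    if not region or not region.strip():
        raise ValueError("region must be a non-empty string such as 'us-east-1'")
    return f"cassandra.{region.strip()}.amazonaws.com"


print(keyspaces_endpoint("us-east-1"))  # cassandra.us-east-1.amazonaws.com
```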

Now, let's create a method for the Cassandra class that we can use to execute queries -

class Cassandra:
    ...
    def execute(self, query):
        return self.session.execute(SimpleStatement(
            query, consistency_level=ConsistencyLevel.LOCAL_QUORUM))

Look, we used the ConsistencyLevel object here! What is this bird?

Consistency level is the number of replica nodes which need to acknowledge a read or write operation before Cassandra reports it as successful. In simpler words, since Cassandra is a distributed database and every piece of data is copied to multiple nodes, an operation counts as successful only once the number of replicas demanded by the chosen consistency level has responded. Some of the available consistency levels are LOCAL_ONE, LOCAL_QUORUM, ALL, etc. You can read more about consistency levels in this blog by Ashish Rana on GeeksforGeeks.

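Currently, Amazon Keyspaces accepts only the LOCAL_QUORUM consistency level for writes (reads can also use ONE or LOCAL_ONE), so the wrapper above uses LOCAL_QUORUM for every query.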
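
To put a number on the QUORUM part: in Cassandra a quorum is a majority of the replicas, that is, half the replication factor rounded down, plus one. The small sketch below just does that arithmetic. The replication factors in the loop are example values I picked, not settings read from Amazon Keyspaces.

```python
def quorum(replication_factor):
    """Number of replicas that make up a quorum: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


# Example replication factors, chosen only for illustration.
for rf in (1, 3, 5):
    print(f"replication factor {rf}: quorum of {quorum(rf)}")
```

So with three copies of the data, two of them have to acknowledge an operation before it counts as done.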

Now, we can delve into some action!

Create a new file called main.py (or whatever fancy wording you can think of in 1 second), and put the following lines in it to try inserting a new entry to the database -

from db import Cassandra

csql = Cassandra()

# Insert Query
results = csql.execute("INSERT INTO users (id, name, age, city) \
                        VALUES (6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, \
                       'John', 24, 'Delhi')")

Now, save the file and try executing it from the terminal using the following command -

python main.py

If the query is successful, you shall see no errors.
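
The insert above has its values typed straight into the query string. When the values come from your application, it is safer to let the driver fill them in through %s placeholders. The sketch below prepares the query text and the values. The names INSERT_USER_CQL and new_user_params are made up for illustration, and the commented lines assume you call the driver session directly, since the execute method we wrote takes only a query.

```python
import uuid

# %s placeholders are filled in by the driver, not by Python string formatting.
INSERT_USER_CQL = (
    "INSERT INTO users (id, name, age, city) VALUES (%s, %s, %s, %s)")


def new_user_params(name, age, city):
    """Return the values for INSERT_USER_CQL, with a freshly generated id."""
    return (uuid.uuid4(), name, age, city)


params = new_user_params("John", 24, "Delhi")

# With the Cassandra class from db.py, this could be run as follows
# (main.py would also need to import SimpleStatement and ConsistencyLevel):
#   statement = SimpleStatement(
#       INSERT_USER_CQL, consistency_level=ConsistencyLevel.LOCAL_QUORUM)
#   csql.session.execute(statement, params)
```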

Let's try reading the database and see if we've got the entry right. Comment out the Insert query code in the main.py file and add the following lines -

# Read query
results = csql.execute("SELECT * FROM users")
print([x for x in results])

You should see output similar to this -

[Row(id=UUID('6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47'), city='Delhi', age=24, name='John')]

If not, you need to observe the error message and try fixing the code! You can find the full code for this tutorial at - https://github.com/xprilion/python-amazon-keyspaces
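
With the driver's default settings, each Row in the results is a named tuple, so you can read columns as attributes or turn a row into a dictionary. The sketch below uses a stand-in Row type so that it runs without a database. The helper name rows_to_dicts is hypothetical.

```python
from collections import namedtuple
from uuid import UUID

# Stand-in for the driver's Row type so the sketch runs without a database.
Row = namedtuple("Row", ["id", "city", "age", "name"])


def rows_to_dicts(rows):
    """Convert named-tuple rows into plain dictionaries."""
    return [row._asdict() for row in rows]


sample = [Row(UUID("6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47"), "Delhi", 24, "John")]
users = rows_to_dicts(sample)
print(users[0]["name"], users[0]["age"])  # John 24
```

In main.py you would pass the results of csql.execute("SELECT * FROM users") in place of sample.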

Cassandra can be a great tool if you're looking to build highly scalable and mission-critical applications (given you need something like Cassandra, at all) and Amazon Keyspaces makes it very simple and efficient to use and manage.

It is possible to use Amazon Keyspaces with other backends as well, and I shall leave it to you to explore them as your interest takes you.

Thanks for reading this!


© Anubhav Singh 2024