I gave a talk at a recent Cassandra meetup on the data structure that Cassandra's read/write path is conceptually based on. A solid understanding of it is critical to debugging and to designing an appropriate data model for Cassandra. DataStax Academy has a couple of highly recommended courses that cover this in a lot more detail. Something they don't mention much is the actual data structure it is based on: the Log-Structured Merge-Tree (LSM-tree).
An LSM-tree is composed of two or more tree-like components, each optimized for its type of storage. In the case of Cassandra, that means a small in-memory tree and one or more on-disk trees. LSM-trees are used in Cassandra, HBase, LevelDB, Google Bigtable, SQLite4, and more.
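To make the in-memory/on-disk split concrete, here is a rough toy sketch in Python. It is not how Cassandra implements it: a plain dict stands in for the in-memory tree, sorted lists stand in for the on-disk trees, and every name (`TinyLSM`, `memtable_limit`, `runs`) is made up for illustration. It also leaves out things like the commit log and deletes.

```python
import bisect


class TinyLSM:
    """Toy LSM-tree: one in-memory table plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}  # stands in for the small in-memory tree
        self.runs = []      # stand-ins for on-disk trees, oldest first

    def put(self, key, value):
        # Writes only ever touch memory; a full memtable is flushed.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as one sorted, immutable run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check memory first, then runs from newest to oldest,
        # so the most recent write for a key wins.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping the newest value per key.
        merged = {}
        for run in self.runs:  # oldest first, so newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)  # memtable is full, so it is flushed to the first run
db.put("a", 3)  # newer value for "a" now lives in memory
print(db.get("a"), db.get("b"))
```

The point to notice is that a write never modifies an existing run. A read may have to look in several places, and `compact` is what keeps that number of places small over time.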
I walked through some examples of how it works at a high level during my talk; you can see more at: