Developing a High-Performance Data Model for Cassandra

How do you build a high-performance data model for Apache Cassandra, and how do you do it correctly? Artyom Chebotko, Solutions Architect at DataStax, answered these questions in his talk at the Cassandra Day Russia 2021 conference.







The talk covers data modeling for Apache Cassandra, drawing on DataStax experience and real use cases.

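It starts with how Cassandra organizes and distributes data, then presents a data modeling methodology and applies it to worked examples.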







Data organization in Cassandra



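In Cassandra, tables live inside a KEYSPACE, the top-level container for data. When you create a keyspace, you specify a replication strategy and, for each data center, a replication factor.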







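In this example, the DC-WEST data center has replication factor 3 and DC-EAST has replication factor 5. These settings belong to the KEYSPACE: every table created in it uses the keyspace's replication strategy.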
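
A small sketch of what per-data-center replication factors imply, assuming the RF 3 / RF 5 layout above: how many copies of each partition exist in total, and how many replicas a LOCAL_QUORUM request waits for in each data center (floor(RF/2) + 1). The keyspace name `library` is hypothetical, and the CQL string is shown for reference only.

```python
# Hypothetical keyspace name; per-DC replication factors as in the example.
# NetworkTopologyStrategy is the strategy that accepts a factor per data center.
KEYSPACE_CQL = """
CREATE KEYSPACE library
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC-WEST': 3, 'DC-EAST': 5};
"""

REPLICATION = {"DC-WEST": 3, "DC-EAST": 5}


def local_quorum(rf: int) -> int:
    """Replicas a LOCAL_QUORUM request waits for in a DC with this RF."""
    return rf // 2 + 1


def summarize(replication: dict) -> dict:
    return {
        "total_copies": sum(replication.values()),
        "local_quorum": {dc: local_quorum(rf) for dc, rf in replication.items()},
    }


print(summarize(REPLICATION))
# {'total_copies': 8, 'local_quorum': {'DC-WEST': 2, 'DC-EAST': 3}}
```

Every partition is therefore stored 8 times across the cluster, and a quorum inside each data center needs a majority of that data center's replicas only.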







Once a KEYSPACE exists, tables are created in it with the CREATE TABLE statement.







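The syntax is close to SQL: the table has 4 columns, each with its own type. The primary key consists of 2 columns. The first one, year, is the partition key. The second one, name, is the clustering key, which orders rows within a partition.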







Because YEAR is the partition key, all rows with the same YEAR value are stored together in one partition. For example, every row with year 2015 goes into the 2015 partition.
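
The grouping described above can be sketched in a few lines: rows that share a partition key value land in the same partition, and inside each partition they are kept sorted by the clustering key. This is an in-memory illustration only; the row values are hypothetical.

```python
from collections import defaultdict


def build_partitions(rows, partition_key, clustering_key):
    """Group rows by partition key value; keep each partition sorted by clustering key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for key in partitions:
        partitions[key].sort(key=lambda r: r[clustering_key])
    return dict(partitions)


rows = [  # hypothetical rows
    {"year": 2015, "name": "Conference B"},
    {"year": 2016, "name": "Conference A"},
    {"year": 2015, "name": "Conference A"},
]
parts = build_partitions(rows, "year", "name")
print([r["name"] for r in parts[2015]])  # ['Conference A', 'Conference B']
```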







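Cassandra distributes partitions across the nodes of the cluster and stores each partition on as many nodes as the replication factor specifies. In our example, every partition has 3 replicas in one data center and 5 in the other; in the first data center, each partition is stored on 3 nodes. Cassandra uses the partition key to determine which nodes store a partition.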
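
A simplified sketch of how a partition key maps to replica nodes: the key is hashed to a token on a ring, the first node whose token is at or after it owns the partition, and the next RF-1 nodes clockwise hold the other replicas. This is a single-ring, SimpleStrategy-style walk; MD5 is only a stand-in hash (Cassandra's default partitioner uses Murmur3), and real clusters also account for vnodes, racks and data centers.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2**64


def token(partition_key: str) -> int:
    # Stand-in hash; Cassandra's default Murmur3Partitioner uses MurmurHash3.
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")


def make_ring(nodes):
    """Evenly spaced tokens, one per node (no vnodes, racks or DCs)."""
    step = RING_SIZE // len(nodes)
    return [(i * step, node) for i, node in enumerate(nodes)]


def replicas(partition_key, ring, rf):
    """Owner is the first node with token >= the key's token (wrapping around);
    the next rf-1 nodes clockwise hold the remaining replicas."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token(partition_key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


ring = make_ring([f"node{i}" for i in range(6)])
print(replicas("2015", ring, 3))  # three consecutive nodes, the same on every call
```

Because placement depends only on the partition key, any coordinator can compute which replicas to contact without looking anything up.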







Keyspaces and tables are defined in CQL, the Cassandra Query Language, which deliberately resembles SQL (Structured Query Language).







Let's look at CREATE TABLE more closely.







Every primary key has a partition key and may also have a clustering key. The partition key comes first; the remaining primary key columns form the clustering key.







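CLUSTERING ORDER BY specifies how rows are sorted within a partition: ascending or descending by each clustering key column. Cassandra stores rows in this order, so reading them back in that order requires no extra sorting.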
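
To illustrate why CLUSTERING ORDER BY matters, here is an in-memory sketch: rows of one partition are kept in the declared order (here descending by a hypothetical `ts` column), so "the latest N rows" is simply the first N, and the reverse order is just the stored list read backwards.

```python
from datetime import datetime


def store_partition(rows, clustering_key, descending=False):
    """Rows in the order implied by CLUSTERING ORDER BY (ASC or DESC)."""
    return sorted(rows, key=lambda r: r[clustering_key], reverse=descending)


def read_partition(stored, reverse=False, limit=None):
    """Read in stored order or exactly reversed; neither needs a new sort."""
    rows = stored[::-1] if reverse else stored
    return rows if limit is None else rows[:limit]


readings = [  # hypothetical rows of one partition
    {"ts": datetime(2021, 6, 1, 10), "value": 20.5},
    {"ts": datetime(2021, 6, 1, 12), "value": 22.0},
    {"ts": datetime(2021, 6, 1, 11), "value": 21.1},
]
stored = store_partition(readings, "ts", descending=True)
latest_two = read_partition(stored, limit=2)
print([r["value"] for r in latest_two])  # [22.0, 21.1]
```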







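There are two kinds of partitions, depending on the primary key. If the primary key is just an ID, that ID is also the partition key, so each partition holds exactly one row. These are called single-row partitions. Partitions that can hold more than one row are called multi-row partitions.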







The choice of partition key and clustering key is central to data modeling in Cassandra: it determines how data is split into partitions and how large those partitions grow.







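Next, consider a partition key made of two columns, venue and year: a conference and the year it was held, for example DataStax Accelerate. Together they form the partition key, so each edition of a conference gets its own partition. Other columns, such as title, describe the individual rows.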







Country is a static column: it stores a single value per partition, shared by all rows of that partition.
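
A minimal in-memory sketch of static-column behavior, assuming country is a static column (the S marker in a Chebotko diagram): the partition keeps one value for it, and every row read from the partition carries that value. The clustering column name `artefact` and all values are hypothetical.

```python
class Partition:
    """One partition: static columns hold one shared value, rows are clustered."""

    def __init__(self, clustering_column):
        self.clustering_column = clustering_column
        self.static = {}  # static column -> single value for the whole partition
        self.rows = {}    # clustering key value -> regular columns

    def write(self, clustering=None, static=None, **regular):
        if static:
            self.static.update(static)  # replaces the one shared value
        if clustering is not None:
            self.rows.setdefault(clustering, {}).update(regular)

    def read(self):
        """Each returned row carries the partition's static values."""
        return [
            {**self.static, self.clustering_column: key, **cols}
            for key, cols in sorted(self.rows.items())
        ]


p = Partition("artefact")  # hypothetical clustering column name
p.write("a1", title="Talk 1")
p.write("a2", title="Talk 2")
p.write(static={"country": "Country A"})  # one write, visible in every row
print([(r["artefact"], r["country"]) for r in p.read()])
# [('a1', 'Country A'), ('a2', 'Country A')]
```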







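How do you read such a diagram? Each column is marked with its role: K denotes a partition key column; C denotes a clustering key column, with an arrow for ascending (↑) or descending (↓) order; S denotes a static column.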







Now let's look at queries in CQL. A SELECT statement has familiar SQL clauses: SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT. ALLOW FILTERING is specific to Cassandra.







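SELECT lists the columns to return, and FROM names the table. A Cassandra query reads from exactly one table: there are no joins, unions or intersections. If you need data from 2 tables, you either run two queries and combine the results in the application or, better, design a table that already contains the joined data.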







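WHERE can restrict only primary key columns. The partition key must be restricted with equality. Clustering key columns can additionally be restricted with equality or with range conditions (&lt;, &gt;). These rules cover most use cases.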
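
The WHERE rules above can be sketched as a small checker. It is simplified: it ignores secondary indexes and ALLOW FILTERING, and treats IN like equality. The layout (venue and year as the partition key, artefact as the clustering key) is a hypothetical example.

```python
def where_is_allowed(partition_keys, clustering_keys, eq_columns, range_column=None):
    """Simplified WHERE check:
    - every partition key column is restricted with '=';
    - '='-restricted clustering columns form a prefix of the declared order;
    - a range condition may apply only to the next clustering column;
    - no other column may be restricted."""
    eq = set(eq_columns)
    if not set(partition_keys) <= eq:
        return False
    if eq - set(partition_keys) - set(clustering_keys):
        return False
    prefix = [c for c in clustering_keys if c in eq]
    if prefix != clustering_keys[: len(prefix)]:
        return False
    if range_column is not None:
        if len(prefix) == len(clustering_keys):
            return False
        return range_column == clustering_keys[len(prefix)]
    return True


PK, CK = ["venue", "year"], ["artefact"]  # hypothetical table layout
print(where_is_allowed(PK, CK, ["venue", "year"]))           # True
print(where_is_allowed(PK, CK, ["venue"]))                   # False: partition key incomplete
print(where_is_allowed(PK, CK, ["venue", "year", "title"]))  # False: title is not in the primary key
```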







GROUP BY is allowed only on primary key columns, in the order in which they are declared.

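ORDER BY is restricted as well. Rows are stored in clustering order, so ORDER BY can only return them in that order or in exactly the reverse order.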







LIMIT caps the number of returned rows.







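ALLOW FILTERING lets Cassandra execute a query it would otherwise reject, by reading data and filtering it. Such queries can be very expensive and are best avoided in production.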







As an example, take the table artefacts_by_venue.







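The table stores artefacts. Venue is the conference and year is the year it was held; together they form the partition key. Rows within a partition are identified and ordered by the clustering key. So the primary key has two parts: the partition key and the clustering key.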







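Suppose a query restricts only venue. That is only part of the partition key, so Cassandra cannot tell which partition to read, and the query is rejected. A query must specify the whole partition key; restrictions on the clustering key are optional.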







Venue and year form the partition key. Title is not part of the primary key, so it cannot be used in WHERE; the same applies to country.







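The primary key serves two purposes. First, it uniquely identifies a row. Second, through the partition key, it determines which partition the row belongs to, and, through the clustering key, where the row sits within that partition.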
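
Because the primary key uniquely identifies a row, an INSERT in Cassandra behaves as an upsert: writing the same primary key again updates the existing row instead of creating a duplicate. A minimal in-memory sketch with hypothetical values; real Cassandra resolves competing writes by write timestamp, whereas here the later call simply wins.

```python
def upsert(table, partition_key, clustering_key, **columns):
    """The (partition key, clustering key) pair identifies the row;
    writing it again overwrites the given columns."""
    partition = table.setdefault(partition_key, {})
    partition.setdefault(clustering_key, {}).update(columns)


table = {}
upsert(table, ("Venue X", 2019), "a1", title="Draft title")
upsert(table, ("Venue X", 2019), "a1", title="Final title")  # same primary key
upsert(table, ("Venue X", 2019), "a2", title="Other talk")
print(table)
# {('Venue X', 2019): {'a1': {'title': 'Final title'}, 'a2': {'title': 'Other talk'}}}
```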







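The clustering key (and its sort order) also matters. Because Cassandra has no joins, data that the application reads together must be stored together, in one table.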









In Cassandra, the data model is driven by the application's queries, that is, by its access patterns.







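The price is denormalization: the same data is often stored in several tables, and keeping those copies consistent becomes the application's job. Denormalization is what replaces joins in Cassandra.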







The methodology pursues 4 objectives:







  1. Understand the data.
  2. Identify the access patterns.
  3. Apply the query-first approach.
  4. Optimize and implement.


Each objective has a corresponding model:







  1. Conceptual Data Model.
  2. Application Workflow Model.
  3. Logical Data Model.
  4. Physical Data Model.


Each model has its own notation: the Entity-Relationship Diagram (ER diagram), the Application Workflow Diagram, the Chebotko Diagram, and the Chebotko Diagram &amp; CQL.







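A frequent question: "Why are both the Conceptual Data Model and the Application Workflow Model needed?" Because the logical data model is derived from both: the conceptual model describes the data, and the workflow model describes how the application accesses it.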







Question: What if a query does not use the partition key? And does the consistency level play a role?



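Answer: If a query specifies the partition key, Cassandra knows exactly which nodes hold the data. In a cluster of 100 nodes with replication factor 3, a query by partition key goes only to those 3 replica nodes. A query through a secondary index, without the partition key, may have to contact many of the 100 nodes, which is expensive.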



What can be done instead?

  1. Design a table whose partition key matches the query.
  2. Use the right tool for the job. Cassandra is built for OLTP workloads; for analytical queries over Cassandra data, use Spark.




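The consistency level determines how many replicas must respond to a read or write; it does not change which nodes store the data.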



DataStax Academy has courses on this methodology with more examples. Below are 2 of them.









The first example comes from the Internet of Things: networks of sensors that measure temperature.







Each entity in the conceptual model has a key; a sensor, for example, is identified by its ID.







A temperature measurement is identified by the sensor ID together with a timestamp.







This is the Entity-Relationship (ER) diagram, the conceptual data model of our example.







The Application Workflow Model describes how the application works with the data.







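The Application Workflow diagram shows the steps of the application. Each step is associated with a data access pattern, that is, a query the application must run; some access patterns can be executed as a batch.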







Our application has 4 access patterns, which will be served by 4 tables:







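  1. Find information about all networks.
  2. Find hourly average temperatures for every sensor in a network for a date range.
  3. Find information about all sensors in a network.
  4. Find raw measurements for a sensor on a given date.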


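Each access pattern also states which parameters are given and in what order the results are needed (ascending/descending). The given parameters become the partition key, and the required order becomes the clustering key.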







From the conceptual model and the Application Workflow we derive the logical data model. The details of this derivation are covered in the DataStax Academy course.







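In sensors_by_network, network is the partition key, so all sensors of a network are stored in one partition. In temperatures_by_sensor, the partition key combines the sensor and the date, and timestamp is the clustering key, so measurements within a partition are ordered by time.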







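Why does the networks table need a bucket column? There are only a few networks, so all of them can live in one partition: bucket is the partition key and name is the clustering key. Bucket is an artificial column that always holds the same value, which puts every network into the same partition.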







So networks is a small table that fits in a single partition.







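What is week for? It is a bucketing column that is part of the partition key. Without it, all measurements of a network would accumulate in one partition that grows without limit. With week in the partition key, each network starts a new partition every week.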
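
A sketch of weekly bucketing, assuming the partition key is (network, week) and that a week is identified by the Monday it starts on (the exact week definition is an assumption; any consistent rule works). All dates in the same week map to the same partition; the next Monday starts a new one.

```python
from datetime import date, timedelta


def week_bucket(day: date) -> date:
    """Hypothetical bucket value: the Monday that starts the day's week."""
    return day - timedelta(days=day.weekday())


def partition_key(network: str, day: date) -> tuple:
    """Assumed (network, week) partition key for per-network measurements."""
    return (network, week_bucket(day))


print(partition_key("forest-net", date(2021, 6, 9)))
# ('forest-net', datetime.date(2021, 6, 7))
```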







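As a rule of thumb, a partition should hold no more than 100,000 values and no more than 100 MB. For example, if each value takes about 100 bytes, 100,000 values add up to about 10 MB.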







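How large does a partition get here? Each sensor reports an hourly average, 24 values a day. With 1,000 sensors in a network, that is 24 × 1,000 = 24,000 rows per day.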
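
The partition-size arithmetic above as a small helper. The 100,000-value and 100 MB limits are the rule-of-thumb figures from the text, not hard Cassandra limits; the byte count uses decimal megabytes for simplicity.

```python
MAX_VALUES = 100_000     # rule-of-thumb limit on values per partition
MAX_BYTES = 100 * 10**6  # 100 MB (decimal megabytes, for simplicity)


def partition_bytes(values: int, avg_value_bytes: int) -> int:
    return values * avg_value_bytes


def rows_per_day(values_per_sensor_per_day: int, sensors: int) -> int:
    return values_per_sensor_per_day * sensors


def within_rule_of_thumb(values: int, avg_value_bytes: int) -> bool:
    return values <= MAX_VALUES and partition_bytes(values, avg_value_bytes) <= MAX_BYTES


print(partition_bytes(100_000, 100))       # 10000000 bytes, about 10 MB
print(rows_per_day(24, 1_000))             # 24000 rows per day for one network
print(within_rule_of_thumb(100_000, 100))  # True
```

Multiplying the daily figure by the number of days in a bucket gives the expected partition size for a given bucketing choice, which is how the bucket width can be checked against the rule of thumb.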







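In temperatures_by_sensor, the date plays the same bucketing role, and timestamp orders the measurements within each partition.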







Question: Is there anything like LIKE, for searching by part of a string?



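Cassandra has secondary indexes, and some secondary index implementations support this kind of search. For full-text search there are also Solr indexes (DataStax Enterprise Search), which work on top of Cassandra.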

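The last step is the physical data model, written in CQL: we create a KEYSPACE and the tables, specifying the partition key, the clustering key and the column types. The schema can then be used through CQL or through APIs such as Stargate.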
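
The step from a Chebotko diagram (K, C↑/C↓, S markers) to a physical model is mechanical enough to sketch as a small generator that emits CREATE TABLE statements. The column types and the DESC order used for temperatures_by_sensor below are assumptions for illustration, not the talk's exact schema.

```python
def create_table_cql(name, columns, partition_keys, clustering=(), static=()):
    """columns: [(name, cql_type)]; clustering: [(name, 'ASC' | 'DESC')];
    static: names of static columns (S in the diagram)."""
    col_lines = []
    for col, typ in columns:
        suffix = " STATIC" if col in static else ""
        col_lines.append(f"  {col} {typ}{suffix},")
    key = ", ".join(["(" + ", ".join(partition_keys) + ")"] + [c for c, _ in clustering])
    stmt = f"CREATE TABLE {name} (\n" + "\n".join(col_lines) + f"\n  PRIMARY KEY ({key})\n)"
    if clustering:
        order = ", ".join(f"{c} {d}" for c, d in clustering)
        stmt += f" WITH CLUSTERING ORDER BY ({order})"
    return stmt + ";"


print(create_table_cql(
    "temperatures_by_sensor",
    [("sensor", "TEXT"), ("date", "DATE"), ("timestamp", "TIMESTAMP"), ("value", "FLOAT")],
    partition_keys=["sensor", "date"],  # K, K
    clustering=[("timestamp", "DESC")],  # C↓ (assumed order)
))
```

For this input the generator prints a table with `PRIMARY KEY ((sensor, date), timestamp)` followed by `WITH CLUSTERING ORDER BY (timestamp DESC);`.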







There are 2 kinds of queries here. The first reads all networks: they are all in one partition, i.e. bucket = 'all', so this is a single-partition query.







For example, take the network forest-net. The query specifies network = 'forest-net'.







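What if the requested date range spans two weeks? Then the data lives in 2 partitions. There are 2 options: use IN on the week column, or send 2 separate queries, one per partition. IN works, but 2 separate queries executed in parallel are often the better choice.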
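
A sketch of the "one query per partition" option: compute every weekly bucket a date range touches, then issue one single-partition query per bucket concurrently. It assumes Monday-based weekly buckets; `query_partition` is a hypothetical stand-in for a real driver call, not an actual driver API.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def week_bucket(day: date) -> date:
    """Hypothetical weekly bucket: the Monday that starts the day's week."""
    return day - timedelta(days=day.weekday())


def buckets_for_range(start: date, end: date) -> list:
    """All weekly buckets (partitions) that the [start, end] range touches."""
    buckets, current = [], week_bucket(start)
    while current <= end:
        buckets.append(current)
        current += timedelta(weeks=1)
    return buckets


def query_partition(network, week):
    # Hypothetical stand-in for one single-partition query sent by a driver.
    return f"rows of ({network}, {week.isoformat()})"


def query_range(network, start, end):
    """One query per partition, run concurrently, instead of a single IN query."""
    weeks = buckets_for_range(start, end)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda w: query_partition(network, w), weeks))


print(query_range("forest-net", date(2021, 6, 9), date(2021, 6, 15)))
# ['rows of (forest-net, 2021-06-07)', 'rows of (forest-net, 2021-06-14)']
```

Separate queries let each request go straight to the replicas of its own partition, whereas a multi-partition IN makes a single coordinator gather everything.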







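The second example is an investment portfolio. A user has investment accounts; an account holds positions in stocks, mutual funds and ETFs (exchange-traded funds); a trade is either a "buy" or a "sell".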







In the conceptual model, each entity has a key; a user, for example, is identified by username.







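The workflow has 3 steps: the user sees their accounts, then the positions in a selected account, then the trades in that account. Trades can be searched in several ways: by date range; by date range and type; by date range, type and symbol; and by date range and symbol.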







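This gives 4 tables for the variants of query 3 (3.1, 3.2 and so on). trade_id is the identifier of a trade. In each table, the partition is the account, and trades within it are ordered by trade_id.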







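Could a partition of trades_by_a_d become too large? It holds all trades of one account, and even an active account is unlikely to reach 100,000 trades, so one partition per account is acceptable.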







Note that trade_id is a TIMEUUID: a UUID that embeds a timestamp. It both uniquely identifies a trade and orders trades by time.
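
A sketch of what "a UUID that embeds a timestamp" means, using Python's standard `uuid` module: it builds a version-1 (time-based) UUID for a given moment and reads the moment back. In practice you would generate such IDs with `uuid.uuid1()` or the driver's TIMEUUID helper; the explicit builder here only exists to make the layout visible, and the clock sequence and node values are arbitrary.

```python
import uuid
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_at(dt: datetime, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Version-1 (time-based) UUID whose embedded timestamp is dt (UTC-aware)."""
    delta = dt - UNIX_EPOCH
    ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    ticks += UUID_EPOCH_OFFSET
    return uuid.UUID(fields=(
        ticks & 0xFFFFFFFF,                 # time_low
        (ticks >> 32) & 0xFFFF,             # time_mid
        ((ticks >> 48) & 0x0FFF) | 0x1000,  # time_hi with version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,   # clock_seq_hi with RFC 4122 variant
        clock_seq & 0xFF,                   # clock_seq_low
        node,                               # 48-bit node id
    ))


def timestamp_of(u: uuid.UUID) -> datetime:
    """Embedded time of a TIMEUUID, similar to what CQL's toTimestamp() returns."""
    return UNIX_EPOCH + timedelta(microseconds=(u.time - UUID_EPOCH_OFFSET) // 10)


noon = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
t1 = timeuuid_at(noon)
t2 = timeuuid_at(noon + timedelta(minutes=10), clock_seq=7)
# Sort by embedded time, as Cassandra does for TIMEUUID; comparing raw UUID
# values in Python is not chronological in general.
trades = sorted([t2, t1], key=lambda u: u.time)
print([timestamp_of(u).isoformat() for u in trades])
# ['2021-06-01T12:00:00+00:00', '2021-06-01T12:10:00+00:00']
```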







How do we query a date range when trade_id is a TIMEUUID? CQL has functions that convert between TIMEUUID and timestamp values.







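To filter by time, compare the TIMEUUID column with the result of these functions. maxTimeuuid and minTimeuuid take a timestamp and return the largest and smallest possible TIMEUUID for that moment, so trade_id &gt; maxTimeuuid(t) selects the trades made after timestamp t.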
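
An emulation of range filters built on minTimeuuid and maxTimeuuid. It uses a simplified TIMEUUID ordering (embedded time first, then clock sequence and node); Cassandra's real comparator breaks timestamp ties on the raw low-order bytes, so treat this as an illustration of the semantics rather than a byte-exact model. The trade IDs are hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UUID_EPOCH_OFFSET = 0x01B21DD213814000  # 100-ns ticks from 1582-10-15 to 1970-01-01


def ticks(dt: datetime) -> int:
    delta = dt - UNIX_EPOCH
    return ((delta.days * 86_400 + delta.seconds) * 10_000_000
            + delta.microseconds * 10 + UUID_EPOCH_OFFSET)


def timeuuid_at(dt, clock_seq=0, node=0):
    t = ticks(dt)
    return uuid.UUID(fields=(t & 0xFFFFFFFF, (t >> 32) & 0xFFFF,
                             ((t >> 48) & 0x0FFF) | 0x1000,
                             ((clock_seq >> 8) & 0x3F) | 0x80, clock_seq & 0xFF, node))


def order_key(u: uuid.UUID) -> tuple:
    """Simplified TIMEUUID ordering: embedded time, then clock_seq and node."""
    return (u.time, u.clock_seq, u.node)


def min_timeuuid_key(dt):  # plays the role of minTimeuuid(dt)
    return (ticks(dt), 0, 0)


def max_timeuuid_key(dt):  # plays the role of maxTimeuuid(dt)
    return (ticks(dt), 0x3FFF, 0xFFFFFFFFFFFF)


def after(trade_ids, t):
    """Emulates: WHERE trade_id > maxTimeuuid(t)."""
    return [u for u in trade_ids if order_key(u) > max_timeuuid_key(t)]


def between(trade_ids, start, end):
    """Emulates: WHERE trade_id >= minTimeuuid(start) AND trade_id <= maxTimeuuid(end)."""
    lo, hi = min_timeuuid_key(start), max_timeuuid_key(end)
    return [u for u in trade_ids if lo <= order_key(u) <= hi]


noon = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
ids = [timeuuid_at(noon, clock_seq=5, node=42),
       timeuuid_at(noon + timedelta(minutes=1)),
       timeuuid_at(noon + timedelta(hours=1))]
print(len(after(ids, noon)))                                  # 2: the trade at noon itself is excluded
print(len(between(ids, noon, noon + timedelta(minutes=30))))  # 2
```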







Question: How do you keep the denormalized tables consistent with each other?



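Answer: In Cassandra, updates and inserts work the same way: both are upserts. In our example, every trade has to be written to all 4 trades tables. To keep them in sync, use batches: a logged batch guarantees that either all of its statements are eventually applied or none are. Batches are most efficient when all of their statements target the same partition.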
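
A sketch of the batch described above: one logged batch that writes the same trade into every trades table. Only trades_by_a_d is named in the text; the other three table names and the column list are hypothetical. Each table holds its own partition, so this is a multi-partition logged batch: it buys atomicity, not speed.

```python
# Hypothetical names and columns: only trades_by_a_d appears in the text; the
# other three tables follow the same naming pattern purely for illustration.
TRADE_TABLES = ["trades_by_a_d", "trades_by_a_td", "trades_by_a_std", "trades_by_a_sd"]
TRADE_COLUMNS = ["account", "trade_id", "type", "symbol", "shares", "price", "amount"]


def trade_batch(tables=TRADE_TABLES, columns=TRADE_COLUMNS) -> str:
    """CQL for one logged batch writing the same trade into every trades table.
    A logged batch guarantees that either all of its writes are eventually
    applied or none are; '?' are bind markers filled in by the driver."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    inserts = [f"  INSERT INTO {t} ({cols}) VALUES ({marks});" for t in tables]
    return "\n".join(["BEGIN BATCH", *inserts, "APPLY BATCH;"])


print(trade_batch())
```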



If an insert fails, the application can simply retry it, because repeating the same insert produces the same result. For analytics, including joins across tables, use Spark.


