Riak Cloud Storage. Part 1. Setting up Riak KV

Riak CS (Cloud Storage) is easy-to-use object storage software that runs on top of Riak KV, a distributed NoSQL key-value database. Riak CS is designed to provide simple, highly available, distributed cloud storage at any scale. It can be used to build public or private cloud architectures, or as the storage infrastructure for high-load applications and services. The Riak CS API is compatible with Amazon S3, and Riak CS supports per-tenant reporting, for example for billing and metering.





This article is a free translation of the official manual for Riak CS version 2.1.1.



In a Riak CS storage system, three components work together, so each component must be configured to work with the others:



  • Riak (KV) is the database system that serves as the underlying data store.
  • Riak CS is a cloud storage layer on top of Riak that provides the storage and API capabilities. It stores files and metadata in Riak and serves them to end users.
  • Stanchion manages requests involving globally unique entities, such as buckets and users, in a Riak instance. Examples are creating users and creating or deleting buckets.


You can also configure an S3 client to communicate with the Riak CS system.



You should plan to have one Riak node for each Riak CS node in your system. Riak and Riak CS nodes can run on different physical machines, but in most cases it is preferable to run one Riak node and one Riak CS node on the same machine. If a single machine has enough capacity for both nodes, you will generally see better performance because of reduced network latency.



If your system consists of multiple nodes, configuration is primarily about setting up communication between the components. Other settings, such as where log files are stored, have default values and only need to be changed if you want to use non-default values.



Configuring system components. Setting up Riak KV for CS



Since Riak CS is an application built on top of Riak, it is very important to pay attention to your Riak configuration when running Riak CS. This document is both a guide to configuring Riak and a reference for the most important configuration parameters.



Make sure Riak KV and Riak CS are installed on each node in your cluster before configuring. Stanchion, on the other hand, should only be installed on one node in the entire cluster.





Backends for Riak CS



By default, Riak uses the Bitcask backend. However, the Riak CS package includes a special backend that must be used by the Riak cluster in a Riak CS system. It is a customized version of the standard Multi backend that ships with Riak.



Some of the Riak buckets used internally by Riak CS rely on secondary indexes, which currently require the LevelDB backend. Other parts of the Riak CS system benefit from the Bitcask backend. The custom Multi backend included with Riak CS lets it use both backends to achieve the best combination of performance and features. The next section describes how to configure Riak to use this Multi backend correctly.
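The routing idea can be modelled in a few lines of shell. This is a simplified sketch, not the code of riak_cs_kv_multi_backend. It assumes the backend is chosen by matching the bucket name against the `0b:` prefix from the `multi_backend_prefix_list` setting in the configuration below. Block data then goes to Bitcask and everything else goes to the default LevelDB backend. The function name `pick_backend` is hypothetical.

```shell
# Toy model of prefix-based routing in the Riak CS Multi backend (hypothetical helper).
# Assumption: buckets whose names start with "0b:" hold file blocks and go to
# Bitcask (be_blocks); every other bucket goes to LevelDB (be_default),
# which supports the secondary indexes that Riak CS relies on.
pick_backend() {
  case "$1" in
    0b:*) echo be_blocks ;;
    *)    echo be_default ;;
  esac
}

# Example: pick_backend "0b:some-hash" prints be_blocks
```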



The backend is the component Riak uses to store data. Riak KV ships with several backends: Bitcask, LevelDB, Memory and Multi.


In addition, the storage calculation system uses Riak MapReduce to sum up the files in each bucket. This means you must tell every Riak node where to find the compiled Riak CS files before storage can be calculated.



Several other parameters must be changed to configure a Riak node as part of a Riak CS system. These include the node's IP address and the IP address and port used for Protocol Buffers messaging. The remaining settings can be changed if necessary. The following sections describe how to configure a Riak node to operate as part of a Riak CS system.



Setting up the Riak backend



First, edit the riak.conf file or the advanced.config / app.config file. These files are located in the /etc/riak or /opt/riak/etc directory. By default, Riak uses the Bitcask backend, so the first thing to do is remove the following line from the config file:



RIAK.CONF



## Delete this line:
storage_backend = bitcask


ADVANCED.CONFIG



{riak_kv, [
    %% Delete this line:
    {storage_backend, riak_kv_bitcask_backend},
]}


APP.CONFIG



{riak_kv, [
    %% Delete this line:
    {storage_backend, riak_kv_bitcask_backend},
]}
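If you prefer to script the riak.conf part of this step, the minimal sketch below comments the line out instead of deleting it and keeps a backup copy. A commented-out line is ignored just like a deleted one. The sketch assumes GNU or BSD `sed` and that the line is spelled as shown above. The function name `disable_bitcask_default` is hypothetical.

```shell
# Comment out "storage_backend = bitcask" in a riak.conf file (hypothetical helper).
# A backup of the original file is written to FILE.bak first.
disable_bitcask_default() {
  conf=$1
  cp "$conf" "$conf.bak" || return 1
  # "&" re-inserts the whole matched line after the "## " comment marker.
  sed 's/^[[:space:]]*storage_backend[[:space:]]*=[[:space:]]*bitcask[[:space:]]*$/## &/' \
    "$conf.bak" > "$conf"
}
```

For example, as root: `disable_bitcask_default /etc/riak/riak.conf`. The advanced.config / app.config variants still need to be edited by hand.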


Next, we need to tell Riak where to find the Riak CS modules and to use the custom Riak CS backend. This is done in the advanced.config or app.config file by adding the following options:



ADVANCED.CONFIG



{eleveldb, [
    {total_leveldb_mem_percent, 30}
    ]},
{riak_kv, [
    %% Other configs
    {add_paths, ["/usr/lib/riak-cs/lib/riak_cs-2.1.1/ebin"]},
    {storage_backend, riak_cs_kv_multi_backend},
    {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
    {multi_backend_default, be_default},
    {multi_backend, [
        {be_default, riak_kv_eleveldb_backend, [
            {data_root, "/var/lib/riak/leveldb"}
        ]},
        {be_blocks, riak_kv_bitcask_backend, [
            {data_root, "/var/lib/riak/bitcask"}
        ]}
    ]},
    %% Other configs
]}


APP.CONFIG



{eleveldb, [
    {total_leveldb_mem_percent, 30}
    ]},
{riak_kv, [
    %% Other configs
    {add_paths, ["/usr/lib/riak-cs/lib/riak_cs-2.1.1/ebin"]},
    {storage_backend, riak_cs_kv_multi_backend},
    {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
    {multi_backend_default, be_default},
    {multi_backend, [
        {be_default, riak_kv_eleveldb_backend, [
            {data_root, "/var/lib/riak/leveldb"}
        ]},
        {be_blocks, riak_kv_bitcask_backend, [
            {data_root, "/var/lib/riak/bitcask"}
        ]}
    ]},
    %% Other configs
]}


Note that many of these values depend on directory layouts specific to your operating system, so adjust them accordingly. For example, the add_paths parameter assumes that Riak CS is installed in /usr/lib/riak-cs, while the data_root parameters assume that Riak keeps its data under /var/lib/riak. (Translator's note: in my case, the add_paths location was /usr/lib64/riak-cs/.)
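Because the correct add_paths value differs between distributions, a small helper can list the candidate ebin directories. This sketch assumes the `<root>/lib/riak_cs-<version>/ebin` layout used in the configuration above. The default roots /usr/lib/riak-cs and /usr/lib64/riak-cs are the two locations mentioned in this article. The function name `find_riak_cs_ebin` is hypothetical.

```shell
# List riak_cs-*/ebin directories that could be used as the add_paths value.
# Roots default to the two locations mentioned in this article (an assumption:
# pass other roots as arguments if Riak CS is installed elsewhere).
find_riak_cs_ebin() {
  [ "$#" -eq 0 ] && set -- /usr/lib/riak-cs /usr/lib64/riak-cs
  for root in "$@"; do
    for dir in "$root"/lib/riak_cs-*/ebin; do
      # An unmatched glob stays literal, so check that the directory exists.
      [ -d "$dir" ] && echo "$dir"
    done
  done
  return 0
}
```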



This configuration assumes that Riak CS is installed on the same machine as Riak. If it is not, the Riak CS package needs to be copied to the Riak host.



Enabling sibling creation



Now, we need to set the allow_mult parameter to true. We can add a line in the riak.conf config file, or a riak_core section in advanced.config or app.config.



RIAK.CONF



buckets.default.allow_mult = true




ADVANCED.CONFIG

{riak_core, [
    %% Other configs
    {default_bucket_props, [{allow_mult, true}]},
    %% Other configs
]}


APP.CONFIG



{riak_core, [
    %% Other configs
    {default_bucket_props, [{allow_mult, true}]},
    %% Other configs
]}


This allows Riak to create the siblings that Riak CS needs in order to function. If you connect to Riak CS using a client library, don't worry: you will not have to resolve conflicts, because all Riak CS operations are strongly consistent by definition.



Siblings are multiple values stored under a single key. They appear when conflicting versions of an object are written, for example on different nodes.


Note: allow_mult

Any Riak node that also supports Riak CS must always have allow_mult set to true. Riak CS will refuse to start if the value is false.
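Because Riak CS refuses to start when allow_mult is false, it can be worth checking riak.conf before starting the node. This is a minimal sketch that assumes the setting lives in riak.conf in the `key = value` form shown above. It does not inspect advanced.config or app.config. The function name `check_allow_mult` is hypothetical.

```shell
# Succeed only if riak.conf enables siblings for the default bucket type.
# Commented-out lines (starting with #) do not match the pattern.
check_allow_mult() {
  if grep -Eq '^[[:space:]]*buckets\.default\.allow_mult[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$1"; then
    echo "allow_mult = true: OK"
  else
    echo "buckets.default.allow_mult = true not found in $1 (Riak CS needs it)" >&2
    return 1
  fi
}
```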


Setting the hostname and IP address



Each Riak node has a name, which can be set in riak.conf with the nodename option. If you are using the app.config configuration file, create a file called vm.args in the same directory as app.config and specify the node name with the -name flag. We recommend naming nodes in the name@host format. For example, three nodes running on the same host 100.0.0.1 could be named riak1@100.0.0.1, riak2@100.0.0.1 and riak3@100.0.0.1. You could also give them more descriptive names, such as test_cluster1@100.0.0.1, user_data3@100.0.0.1 and so on. The example below changes the node name to riak1@127.0.0.1, which works on the local host.



RIAK.CONF



 nodename = riak1@127.0.0.1 


VM.ARGS



 -name riak1@127.0.0.1


You must name all nodes before starting and adding them to the cluster.
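A quick sanity check on node names can catch typos, such as a stray space in the host part, before the node joins a cluster. This sketch only checks the name@host shape recommended above. It does not verify that the host resolves. The function name `valid_nodename` is hypothetical.

```shell
# Return 0 if the argument looks like name@host: exactly one "@",
# a non-empty part on each side, and no whitespace.
valid_nodename() {
  case "$1" in
    *@*@* | @* | *@ | *[[:space:]]*) return 1 ;;
    *@*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: check the names you plan to use before starting the nodes.
for name in riak1@100.0.0.1 riak2@100.0.0.1 riak3@100.0.0.1; do
  valid_nodename "$name" || echo "invalid node name: $name" >&2
done
```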



Testing the configuration



Now that all the necessary node settings have been completed, we can try to start Riak:



SHELL



 riak start 


Translator's note. The output in my case: [screenshot of the riak start output]



Startup may take a little while. Then you can test the running node:



SHELL



 riak ping


If the response is pong, Riak is running. If the response is "Node not responding to pings", something went wrong.



Translator's note. The output in my case: [screenshot of the riak ping output]
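Instead of waiting by hand, the ping can be wrapped in a retry loop. In this sketch, the retry count and the 2-second pause are arbitrary choices, not Riak defaults. It assumes `riak ping` prints `pong` on a line of its own when the node is up. The function name `wait_for_riak` is hypothetical.

```shell
# Poll `riak ping` until it answers "pong" or the retry budget runs out.
# Usage: wait_for_riak [RETRIES]   (RETRIES defaults to 15, an arbitrary value)
wait_for_riak() {
  retries=${1:-15}
  while [ "$retries" -gt 0 ]; do
    if riak ping 2>/dev/null | grep -qx pong; then
      echo "Riak is up"
      return 0
    fi
    retries=$((retries - 1))
    sleep 2
  done
  echo "Node not responding to pings" >&2
  return 1
}
```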



If the node did not start correctly, check the erlang.log.1 file in the node's /log directory to see whether the problem can be identified. One of the most common errors is invalid_storage_backend. It indicates that the path to the Riak CS library in advanced.config or app.config is incorrect, or that Riak CS is not installed on the server. If you see this error, also make sure you haven't changed riak_cs_kv_multi_backend to riak_kv_multi_backend.
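The log check can also be scripted. This is a minimal sketch that looks only for the invalid_storage_backend error mentioned above. Pass it the path of the node's erlang.log.1, whose exact location depends on your installation. The function name `diagnose_riak_log` is hypothetical.

```shell
# Scan a Riak log for invalid_storage_backend and print a hint if it is found.
diagnose_riak_log() {
  log=$1
  if grep -q 'invalid_storage_backend' "$log"; then
    echo "invalid_storage_backend found in $log:"
    echo "  check the add_paths directory and that storage_backend is riak_cs_kv_multi_backend"
    return 1
  fi
  echo "no invalid_storage_backend errors in $log"
}
```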



Configuring Riak to Use Protocol Buffers



Riak's Protocol Buffers settings are located in riak.conf, or in the riak_api section of the advanced.config or app.config file, in the /etc/riak/ directory. By default, the host IP address is 127.0.0.1 and the port is 8087. You will need to change these if you plan to run Riak and Riak CS outside your local environment. Replace 127.0.0.1 with the IP address of the Riak host and 8087 with a suitable port.



RIAK.CONF



 listener.protobuf.internal = 10.0.2.10:10001


ADVANCED.CONFIG



{riak_api, [
    %% Other configs
    {pb, ["10.0.2.10", 10001]},
    %% Other configs
]}




APP.CONFIG

{riak_api, [
    %% Other configs
    {pb, ["10.0.2.10", 10001]},
    %% Other configs
]}


Note: the value of listener.protobuf.internal in riak.conf (or of the pb parameter in advanced.config / app.config) must match the riak_host value in the Riak CS riak-cs.conf and Stanchion stanchion.conf files (or riak_host in their respective advanced.config / app.config files).


Note on port number

A different port number may be required if the port conflicts with ports used by another application, or if you are using a load balancer or proxy server.


It is also recommended that the Riak protobuf.backlog value (pb_backlog in advanced.config / app.config) be equal to or greater than the pool.request.size value specified for Riak CS in riak-cs.conf (request_pool_size in advanced.config / app.config).



If the value of pool.request.size in Riak CS has been changed, then the value of protobuf.backlog in Riak must also be updated.
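Both rules in this section can be checked with a short script: riak_host must match listener.protobuf.internal, and protobuf.backlog must be at least pool.request.size. The sketch makes three assumptions. All files use the `key = value` syntax. riak-cs.conf and stanchion.conf hold the address as `riak_host = IP:PORT`. Both numeric settings are written explicitly. The helper names `conf_get`, `check_pb_match` and `check_backlog` are hypothetical.

```shell
# Print the last value of KEY from a riak.conf-style file; "#" lines are ignored.
conf_get() {
  awk -v k="$2" '
    /^[ \t]*#/ { next }
    (i = index($0, "=")) > 0 {
      key = substr($0, 1, i - 1); val = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key); gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == k) v = val
    }
    END { if (v != "") print v }
  ' "$1"
}

# riak_host in each Riak CS / Stanchion file must equal Riak's protobuf listener.
# Usage: check_pb_match RIAK_CONF OTHER_CONF...
check_pb_match() {
  pb=$(conf_get "$1" listener.protobuf.internal); shift
  rc=0
  for f in "$@"; do
    host=$(conf_get "$f" riak_host)
    [ "$host" = "$pb" ] || { echo "$f: riak_host=${host:-unset}, expected $pb" >&2; rc=1; }
  done
  return "$rc"
}

# Riak's protobuf.backlog must be >= the Riak CS pool.request.size.
# Usage: check_backlog RIAK_CONF RIAK_CS_CONF
check_backlog() {
  backlog=$(conf_get "$1" protobuf.backlog)
  pool=$(conf_get "$2" pool.request.size)
  if [ -z "$backlog" ] || [ -z "$pool" ]; then
    echo "protobuf.backlog or pool.request.size is not set explicitly" >&2
    return 2
  fi
  [ "$backlog" -ge "$pool" ] || { echo "protobuf.backlog ($backlog) < pool.request.size ($pool)" >&2; return 1; }
}
```

For example: `check_pb_match /etc/riak/riak.conf /etc/riak-cs/riak-cs.conf /etc/stanchion/stanchion.conf`. The file paths depend on how your packages were installed.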



Other Riak settings



The riak.conf and advanced.config files include other settings that control how log files are generated and where they are saved. These settings have default values and should work in most cases. For more information, see the Riak documentation on configuration files.



Setting up an IP address for Riak



When configuring an IP address for Riak, make sure that each Riak node has a unique IP address, whether you are working with a single node or adding more nodes to the system. Riak's IP address is set in riak.conf. If you are using app.config, it is set in the vm.args file instead. Both files are located in the /etc/riak directory (or /opt/riak/etc/ on other operating systems).



By default, the line containing Riak's IP address points to the local host:



RIAK.CONF



 nodename = riak@127.0.0.1


VM.ARGS



 -name riak@127.0.0.1


Replace 127.0.0.1 with your preferred IP address or hostname of the Riak host.
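When preparing several nodes, the host part of nodename can be replaced with `sed`. This sketch assumes riak.conf contains a single `nodename = name@host` line. It keeps the name, swaps the host and saves a backup. It does not handle vm.args, and the new host must not contain `/` or `&`, which `sed` would interpret. The function name `set_node_host` is hypothetical.

```shell
# Replace the host part of "nodename = name@host" in riak.conf, keeping the name.
# Usage: set_node_host FILE NEW_HOST   (a backup is written to FILE.bak)
set_node_host() {
  conf=$1; new_host=$2
  cp "$conf" "$conf.bak" || return 1
  # Group 1 captures "nodename = <name>"; everything from "@" onward is replaced.
  sed "s/^\([[:space:]]*nodename[[:space:]]*=[[:space:]]*[^@[:space:]]*\)@.*/\1@$new_host/" \
    "$conf.bak" > "$conf"
}
```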



Performance and bandwidth settings



For performance reasons, we strongly recommend adding the following values to riak.conf or vm.args, located in the /etc/riak/ or /opt/riak/etc directory.



RIAK.CONF



 erlang.max_ports = 65536


VM.ARGS



## This setting should already be present for recent Riak installs.
 -env ERL_MAX_PORTS 65536
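The check below confirms that riak.conf carries this value. This is a sketch: it reads riak.conf only (not vm.args) and treats a missing key as a failure. The function name `check_max_ports` is hypothetical.

```shell
# Succeed if riak.conf sets erlang.max_ports to at least 65536.
check_max_ports() {
  # Take the value after "=" from the last matching line, with blanks removed.
  ports=$(awk -F= '/^[ \t]*erlang\.max_ports[ \t]*=/ { gsub(/[ \t]/, "", $2); v = $2 } END { print v }' "$1")
  if [ -n "$ports" ] && [ "$ports" -ge 65536 ]; then
    echo "erlang.max_ports = $ports: OK"
  else
    echo "erlang.max_ports is ${ports:-unset}; 65536 is recommended" >&2
    return 1
  fi
}
```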


Disable JavaScript MapReduce



Using the deprecated JavaScript MapReduce with any version of Riak CS is not recommended. For performance reasons, you should disable the JavaScript virtual machines that run JavaScript MapReduce operations. Set the following in riak.conf, or in the riak_kv section of advanced.config or app.config:



RIAK.CONF



 javascript.map_pool_size = 0
 javascript.reduce_pool_size = 0
 javascript.hook_pool_size = 0 


ADVANCED.CONFIG



{riak_kv, [
    %% Other configs
    {map_js_vm_count, 0},
    {reduce_js_vm_count, 0},
    {hook_js_vm_count, 0}
    %% Other configs
]}


APP.CONFIG



{riak_kv, [
    %% Other configs
    {map_js_vm_count, 0},
    {reduce_js_vm_count, 0},
    {hook_js_vm_count, 0}
    %% Other configs
]}
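Since riak.conf may already contain some of these keys, a small idempotent helper can set each one without creating duplicates. This sketch assumes the riak.conf `key = value` syntax. An existing uncommented line for the key is rewritten, otherwise the line is appended, and a backup is kept. The function name `ensure_setting` is hypothetical.

```shell
# Set KEY = VALUE in a riak.conf-style file: rewrite an existing uncommented
# line for KEY, or append one if none exists. A backup goes to FILE.bak.
ensure_setting() {
  conf=$1; key=$2; value=$3
  cp "$conf" "$conf.bak" || return 1
  awk -v k="$key" -v v="$value" '
    {
      i = index($0, "=")
      if (i > 0 && $0 !~ /^[ \t]*#/) {
        name = substr($0, 1, i - 1); gsub(/^[ \t]+|[ \t]+$/, "", name)
        if (name == k) { print k " = " v; found = 1; next }
      }
      print
    }
    END { if (!found) print k " = " v }
  ' "$conf.bak" > "$conf"
}
```

For example, call `ensure_setting /etc/riak/riak.conf KEY 0` once each for javascript.map_pool_size, javascript.reduce_pool_size and javascript.hook_pool_size. Then restart the node.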


Next, we need to configure the remaining components of the Riak CS system.



Links



Riak Cloud Storage. Part 1. Configuring Riak KV

Riak Cloud Storage. Part 2. Configuring the Riak CS component

Riak Cloud Storage. Part 3. Stanchion, Proxy and Load Balancing, S3 Client



Original manual.


