In Tarantool, you can combine a super-fast database with the application that works with it. Here's how easy it is.

Five years ago I tried working with Tarantool, but back then it did not click for me. Recently, though, I hosted a webinar where I talked about Hadoop and how MapReduce works, and someone asked: "Why not use Tarantool for this task?"



Out of curiosity, I decided to come back to it and test the latest version, and this time I really liked the project. Now I will show you how to write a simple application in Tarantool, put it under load, and test its performance, and you will see how easy and cool it all is.







What is Tarantool



First, Tarantool positions itself as a super-fast database. You can put any data you want into it. You can also replicate and shard it (that is, split a huge amount of data across several servers and combine the results from them) and build fault-tolerant master-master setups.



Second, it is an application server. You can write your applications on it and work with the data, for example, delete old records in the background according to certain rules. You can write an HTTP server directly in Tarantool that works with the data: it returns the number of records, writes new data, and reduces it all on the master.



I read an article about how some developers built a message queue in 300 lines that simply flies: its minimum throughput is 20,000 messages per second. Here you really have room to write a very large application, and it will not be a pile of stored procedures as in PostgreSQL.



In this article I will try to describe a server of this kind, only a simple one.



Installation



For the test, I started three standard virtual machines, each with a 20 GB disk, Ubuntu 18.04, 2 virtual CPUs, and 4 GB of memory.



To install Tarantool, either run the installer script or add the repository and run apt-get install tarantool. The script is invoked like this: curl -L https://tarantool.io/installer.sh | VER=2.4 sudo -E bash. After installation we have the following command and directories:



tarantoolctl - the main command for managing Tarantool instances.

/etc/tarantool - all configuration lives here.

/var/log/tarantool - the logs are located here.

/var/lib/tarantool - the data is stored here, split into per-instance directories.



In /etc/tarantool there are two folders, instances.available and instances.enabled. They contain what will be launched: an instance configuration file with Lua code that describes which ports the instance listens on, how much memory is available to it, the Vinyl engine settings, and the code that runs at server startup: sharding, queues, deletion of obsolete data, and so on.



Instances work as they do in PostgreSQL. For example, you may want to run several copies of a database listening on different ports. You end up with several database instances on the same server, each on its own port. They can have completely different settings: one instance implements one kind of logic, another implements something else.



Instance management



The tarantoolctl command lets you manage Tarantool instances. For example, tarantoolctl check example checks the configuration file and reports that the file is OK if it has no syntax errors.



You can see the status of an instance with tarantoolctl status example. Start, stop, and restart work the same way.



Once an instance is running, there are two ways to connect to it.



1. Administrative Console



By default, Tarantool opens a socket over which plain ASCII text is sent to control the instance. A console connection always runs as the admin user with no authentication, so the console port must not be exposed externally.



To connect this way, run tarantoolctl enter <instance name>. The command launches the console and connects as the admin user. Never expose the console port to the outside; it is best to leave it as a Unix socket. Then only those who have write access to the socket can connect to Tarantool.



This method is needed for administrative things. To work with data, use the second method - the binary protocol.



2. Using a binary protocol to connect to a specific port



The configuration has a listen directive, which opens a port for external communication. This port speaks the binary protocol, and authentication is enabled on it.



For this kind of connection, use tarantoolctl connect <port number>. With it you can connect to remote servers, use authentication, and grant various access rights.
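From another Tarantool instance or script, the same binary-protocol connection can be made with the built-in net.box module. A minimal sketch, assuming the instance listens on port 3301 and that a user named example with the password secret exists (as in the application later in this article); the address and credentials are illustrative and should be adjusted:

```lua
-- Run inside a Tarantool console or script (not plain Lua).
local net_box = require('net.box')

-- Assumed address and credentials; adjust to your instance.
local conn = net_box.connect('example:secret@127.0.0.1:3301')

-- ping() returns true when the server answers over the binary protocol.
if conn:ping() then
    -- Remote spaces are exposed under conn.space, mirroring box.space.
    local rows = conn.space.example:select({}, { limit = 10 })
    for _, tuple in ipairs(rows) do
        print(tuple)
    end
end

conn:close()
```

Unlike the admin console, this connection is subject to the access rights granted to the user it authenticates as.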



Writing data and the box module



Since Tarantool is both a database and an application server, it has various modules. We are interested in the box module - it implements work with data. When you write something to box, Tarantool writes the data to disk, stores it in memory, or does something else with it.



Writing



For example, we go into the box module and call the box.once function. It makes Tarantool run our code once, when the server is first initialized. In that code we create a space in which our data will be stored.



local function bootstrap()
    local space = box.schema.create_space('example')
    space:create_index('primary')
    box.schema.user.grant('guest', 'read,write,execute', 'universe')

    -- Keep things safe by default
    --  box.schema.user.create('example', { password = 'secret' })
    --  box.schema.user.grant('example', 'replication')
    --  box.schema.user.grant('example', 'read,write,execute', 'space', 'example')
end


After that, we create a primary index, named primary, by which we can search the data. By default, if you specify no parameters, the primary index uses the first field of each record.



Then we grant rights to the guest user, under which we connect via the binary protocol. We allow read, write, and execute across the entire instance.



Compared to conventional databases, everything here is quite simple. We have a space, an area that simply stores our data. Each record is called a tuple and is packed in MessagePack. This is a very cool format: it is binary and takes up less space, for example 18 bytes versus 27 for the equivalent JSON.
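To see where numbers like these come from, here is a small pure-Lua sketch that hand-encodes one record as MessagePack and compares its size with the JSON text. The sample record ({"compact": true, "schema": 0}) is an assumption chosen for illustration and is not taken from the data in this article. Only the few MessagePack type tags needed for this record are implemented.

```lua
-- Hand-rolled encoder for a tiny subset of MessagePack, for illustration only:
-- fixmap (up to 15 pairs), fixstr (up to 31 bytes), booleans,
-- and positive fixint (0..127).
local function pack_value(v)
    if type(v) == 'boolean' then
        return string.char(v and 0xc3 or 0xc2)
    elseif type(v) == 'number' then
        assert(v >= 0 and v <= 127 and v % 1 == 0, 'only small non-negative integers')
        return string.char(v)
    elseif type(v) == 'string' then
        assert(#v <= 31, 'only short strings')
        return string.char(0xa0 + #v) .. v
    end
    error('unsupported type: ' .. type(v))
end

-- pairs_list is an ordered array of {key, value} pairs,
-- so the output is deterministic.
local function pack_map(pairs_list)
    assert(#pairs_list <= 15, 'only small maps')
    local out = { string.char(0x80 + #pairs_list) }
    for _, kv in ipairs(pairs_list) do
        out[#out + 1] = pack_value(kv[1])
        out[#out + 1] = pack_value(kv[2])
    end
    return table.concat(out)
end

local packed = pack_map({ { 'compact', true }, { 'schema', 0 } })
local as_json = '{"compact":true,"schema":0}'

print(#packed, #as_json)  -- 18   27
```

The saving comes from replacing quotes, colons, commas, and braces with one-byte type and length prefixes.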







It is quite convenient to work with. Each row, each data record, can have completely different columns.



We can look at all the spaces with the box.space command. To pick a specific space, write box.space.example and get full information about it.



Tarantool has two built-in engines: memtx (in-memory) and vinyl. Memtx keeps all data in memory, so everything works simply and quickly. The data is periodically dumped to disk, and there is also a write-ahead log mechanism, so we will not lose anything if the server crashes.



Vinyl stores data on disk in a more familiar form. That means you can store more data than you have memory, and Tarantool will read it from disk.



For now we will use memtx.



unix/:/var/run/tarantool/example.control> box.space.example
---
- engine: memtx
  before_replace: 'function: 0x41eb02c8'
  on_replace: 'function: 0x41eb0568'
  ck_constraint: []
  field_count: 0
  temporary: false
  index:
    0: &0
      unique: true
      parts:
      - type: unsigned
        is_nullable: false
        fieldno: 1
      id: 0
      space_id: 512
      type: TREE
      name: primary
    primary: *0
  is_local: false
  enabled: true
  name: example
  id: 512
...

unix/:/var/run/tarantool/example.control>


Index:



A primary index must be created for every space, because nothing works without it. As in any database, we make the first field the record ID.



Parts:



Here we specify what our index consists of. It has one part: the first field, of type unsigned, a non-negative integer. As I recall from the documentation, the maximum value is about 18 quintillion (2^64 - 1). That is an awful lot.



Then we can insert data using the insert command.



unix/:/var/run/tarantool/example.control> box.space.example:insert{1, 'test1', 'test2'}
---
- [1, 'test1', 'test2']
...

unix/:/var/run/tarantool/example.control> box.space.example:insert{2, 'test2', 'test3', 'test4'}
---
- [2, 'test2', 'test3', 'test4']
...

unix/:/var/run/tarantool/example.control> box.space.example:insert{3, 'test3'}
---
- [3, 'test3']
...

unix/:/var/run/tarantool/example.control> box.space.example:insert{4, 'test4'}
---
- [4, 'test4']
...

unix/:/var/run/tarantool/example.control>


The first field is used as the primary key, so it must be unique. We are not limited in the number of columns, so we can insert as much data as we like. It is stored in the MessagePack format that I described above.



Data output



Then we can display data with the select command.



box.space.example:select with the key {1} displays the matching record. If we omit the key, we see all the records we have. They differ in the number of columns, but here there is, in principle, no concept of columns, only field numbers.
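In the admin console, the two forms of the call look roughly like this. This is a sketch of the calls only, assuming the example space and the four records inserted earlier; the output is omitted.

```lua
-- Run in the Tarantool admin console.

-- Look up a single record by primary key (field 1 equals 1).
box.space.example:select{1}

-- An empty key matches everything: returns all records in the space.
box.space.example:select{}

-- get() is a shortcut for unique indexes: returns one tuple or nil.
box.space.example:get{1}
```

On a large space, an unrestricted select returns every record, so passing a limit option is usually the safer habit.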



There can be a huge amount of data. Suppose, for example, that we need to search it by the second field. For this we create a secondary index.




box.space.example:create_index('secondary', { type = 'TREE', unique = false, parts = {{field = 2, type = 'string'}}})


We use the create_index command.

We name the index secondary.



After that, we specify the parameters. The index type is TREE. The values may repeat, so we set unique = false.



Then we specify which parts our index consists of. field is the number of the field the index is bound to, and type is set to string. And with that the index is created.



unix/:/var/run/tarantool/example.control> box.space.example:create_index('secondary', { type = 'TREE', unique = false, parts = {{field = 2, type = 'string'}}})
---
- unique: false
  parts:
  - type: string
    is_nullable: false
    fieldno: 2
  id: 1
  space_id: 512
  type: TREE
  name: secondary
...

unix/:/var/run/tarantool/example.control>


Now we can query it like this:



unix/:/var/run/tarantool/example.control> box.space.example.index.secondary:select('test1')
---
- - [1, 'test1', 'test2']
...


Persistence



If we restart the instance and try to read the data again, we will see that it is gone and everything is empty. This happens because Tarantool makes checkpoints and saves the data to disk, but if we stop before the next save, we lose all operations made since the last one, because we recover from the last checkpoint, which might have been, say, two hours ago.



Saving every second will not work either, because constantly dumping 20 GB to disk is not a great idea.



This is why the write-ahead log was invented and implemented. For every change to the data, an entry is created in a small write-ahead log file.



Every change made before the next checkpoint is saved in these files. We set a size for them, for example 64 MB. When a file fills up, writing continues in the next one. After a restart, Tarantool restores from the last checkpoint and then replays all later transactions up to the point where it stopped.
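The recovery scheme can be illustrated with a toy model in plain Lua. This is a conceptual sketch and not Tarantool's actual implementation: the Store type and its methods are hypothetical names invented for this example, and the "files" are ordinary in-memory tables.

```lua
-- Toy model of checkpoint + write-ahead log recovery (names are hypothetical).
local Store = {}
Store.__index = Store

function Store.new()
    return setmetatable({
        data = {},        -- live in-memory state
        checkpoint = {},  -- last snapshot "on disk"
        wal = {},         -- changes logged since that snapshot
    }, Store)
end

-- Every change is appended to the log before it is applied in memory.
function Store:put(key, value)
    self.wal[#self.wal + 1] = { key = key, value = value }
    self.data[key] = value
end

-- A checkpoint copies the full state and lets the log start afresh.
function Store:make_checkpoint()
    self.checkpoint = {}
    for k, v in pairs(self.data) do self.checkpoint[k] = v end
    self.wal = {}
end

-- After a crash the in-memory state is gone: rebuild it from the
-- snapshot, then replay the logged changes in order.
function Store:recover()
    self.data = {}
    for k, v in pairs(self.checkpoint) do self.data[k] = v end
    for _, entry in ipairs(self.wal) do
        self.data[entry.key] = entry.value
    end
end

local s = Store.new()
s:put(1, 'test1')
s:make_checkpoint()
s:put(2, 'test2')   -- written after the checkpoint, present only in the log
s.data = {}         -- simulate a crash
s:recover()
print(s.data[1], s.data[2])  -- test1   test2
```

The full snapshot is expensive and rare, while the log entry per change is cheap and frequent; together they bound how much work is needed after a restart.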







To enable this logging, specify the following option in the box.cfg settings (in the example.lua file):



wal_mode = "write";


Using the data



With what we have written so far, you can use Tarantool to store data, and it will work very quickly as a database. And now the cherry on top: what you can do with all of this.



Writing an application



For example, let's write the following application for Tarantool.



The full application:
box.cfg {
    listen = '0.0.0.0:3301';
    io_collect_interval = nil;
    readahead = 16320;
    memtx_memory = 128 * 1024 * 1024; -- 128Mb
    memtx_min_tuple_size = 16;
    memtx_max_tuple_size = 128 * 1024 * 1024; -- 128Mb
    vinyl_memory = 128 * 1024 * 1024; -- 128Mb
    vinyl_cache = 128 * 1024 * 1024; -- 128Mb
    vinyl_max_tuple_size = 128 * 1024 * 1024; -- 128Mb
    vinyl_write_threads = 2;
    wal_mode = "write";
    wal_max_size = 256 * 1024 * 1024;
    checkpoint_interval = 60 * 60; -- one hour
    checkpoint_count = 6;
    force_recovery = true;
    log_level = 5;
    log_nonblock = false;
    too_long_threshold = 0.5;
    read_only   = false
}

local function bootstrap()
    local space = box.schema.create_space('example')
    space:create_index('primary')

    box.schema.user.create('example', { password = 'secret' })
    box.schema.user.grant('example', 'read,write,execute', 'space', 'example')

    box.schema.user.create('repl', { password = 'replication' })
    box.schema.user.grant('repl', 'replication')
end

-- for first run create a space and add set up grants
box.once('replica', bootstrap)

-- enabling console access
console = require('console')
console.listen('127.0.0.1:3302')

-- http config
local charset = {}  do -- [0-9a-zA-Z]
    for c = 48, 57  do table.insert(charset, string.char(c)) end
    for c = 65, 90  do table.insert(charset, string.char(c)) end
    for c = 97, 122 do table.insert(charset, string.char(c)) end
end

local function randomString(length)
    if not length or length <= 0 then return '' end
    math.randomseed(os.clock()^5)
    return randomString(length - 1) .. charset[math.random(1, #charset)]
end

local http_router = require('http.router')
local http_server = require('http.server')
local json = require('json')

local httpd = http_server.new('0.0.0.0', 8080, {
    log_requests = true,
    log_errors = true
})

local router = http_router.new()

local function get_count()
 local cnt = box.space.example:len()
 return cnt
end

router:route({method = 'GET', path = '/count'}, function()
    return {status = 200, body = json.encode({count = get_count()})}
end)

router:route({method = 'GET', path = '/token'}, function()
    local token = randomString(32)
    local last = box.space.example:len()
    box.space.example:insert{ last + 1, token }
    return {status = 200, body = json.encode({token = token})}
end)

prometheus = require('prometheus')

fiber = require('fiber')
tokens_count = prometheus.gauge("tarantool_tokens_count",
                              "API Tokens Count")

function monitor_tokens_count()
  while true do
    tokens_count:set(get_count())
    fiber.sleep(5)
  end
end
fiber.create(monitor_tokens_count)

router:route( { method = 'GET', path = '/metrics' }, prometheus.collect_http)

httpd:set_router(router)
httpd:start()




We declare a Lua table that holds the allowed characters. This table is needed to generate a random string.



local charset = {}  do -- [0-9a-zA-Z]
    for c = 48, 57  do table.insert(charset, string.char(c)) end
    for c = 65, 90  do table.insert(charset, string.char(c)) end
    for c = 97, 122 do table.insert(charset, string.char(c)) end
end


After that, we declare the randomString function, which takes the length as its argument.



local function randomString(length)
    if not length or length <= 0 then return '' end
    math.randomseed(os.clock()^5)
    return randomString(length - 1) .. charset[math.random(1, #charset)]
end
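
The recursive version reseeds the generator on every call and builds the string by repeated concatenation. An alternative sketch, which is not the code used in the rest of this article, seeds once and collects the characters in a table. The name random_string is hypothetical, and math.random is not a cryptographically secure source, so treat this as an illustration rather than a way to issue real secrets.

```lua
-- Allowed characters: [0-9A-Za-z], built the same way as in the article.
local charset = {}
for c = 48, 57  do charset[#charset + 1] = string.char(c) end
for c = 65, 90  do charset[#charset + 1] = string.char(c) end
for c = 97, 122 do charset[#charset + 1] = string.char(c) end

-- Seed once at load time instead of on every call.
math.randomseed(os.time())

-- Hypothetical iterative replacement for randomString.
local function random_string(length)
    if not length or length <= 0 then return '' end
    local out = {}
    for i = 1, length do
        out[i] = charset[math.random(1, #charset)]
    end
    return table.concat(out)
end

print(#random_string(32))  -- 32
```

Collecting the pieces in a table and joining them once avoids creating an intermediate string for every character.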


Then we load the HTTP router and HTTP server modules into our Tarantool server, along with the json module for the JSON that we will send to the client.



local http_router = require('http.router')
local http_server = require('http.server')
local json = require('json')


After that, we create an HTTP server on port 8080 on all interfaces, which will log all requests and errors.



local httpd = http_server.new('0.0.0.0', 8080, {
    log_requests = true,
    log_errors = true
})


Next, we declare a route: if a GET request arrives on port 8080 at /count, we call a one-line function. It returns a status: 200, 404, 403, or whatever we specify.



router:route({method = 'GET', path = '/count'}, function()
    return {status = 200, body = json.encode({count = get_count()})}
end)


In the body we return the result of json.encode, to which we pass count set to get_count(). That function is called and returns the number of records in our database.



The second route



router:route({method = 'GET', path = '/token'}, function() 
    local token = randomString(32) 
    local last = box.space.example:len() 
    box.space.example:insert{ last + 1, token } 
    return {status = 200, body = json.encode({token = token})}
end)


In the line router:route({method = 'GET', path = '/token'}, function() we declare the handler function that generates a token.



The line local token = randomString(32) produces a random string of 32 characters.

In the line local last = box.space.example:len() we get the current number of records, which here doubles as the last ID.

And in the line box.space.example:insert{ last + 1, token } we write to our database: we simply increase the ID by 1 and store the token with it. By the way, this does not have to be done in such a clumsy way. Tarantool has sequences for this case.
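A sketch of the sequence-based alternative. It assumes a fresh space named tokens and a sequence named token_id (both names are invented for this example and do not appear in the application above); box.schema.sequence.create and the sequence option of create_index are available in Tarantool 2.x.

```lua
-- Run inside Tarantool (console or instance file).
box.once('tokens_with_sequence', function()
    -- The sequence hands out 1, 2, 3, ... and does not reuse values,
    -- unlike len() + 1, which can collide with an existing ID after a delete.
    box.schema.sequence.create('token_id')

    local space = box.schema.create_space('tokens')
    -- Attach the sequence to the primary key.
    space:create_index('primary', { sequence = 'token_id' })
end)

-- Passing box.NULL for the key lets Tarantool take the next sequence value.
local tuple = box.space.tokens:insert{ box.NULL, 'some-token' }
print(tuple[1])
```

With this in place the /token handler would no longer need to call len() before each insert.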



So we have written an application in a single file. In it, you can work with the data directly, and the box module does all the dirty work for you.



It listens on HTTP and works with the data, and everything lives in a single instance: both the application and the data. That is why everything happens quite fast.



To run, we install the http module:



root@test2:/# tarantoolctl rocks install http
Installing http://rocks.tarantool.org/http-scm-1.src.rock
Missing dependencies for http scm-1:
   checks >= 3.0.1 (not installed)

http scm-1 depends on checks >= 3.0.1 (not installed)
Installing http://rocks.tarantool.org/checks-3.0.1-1.rockspec

Cloning into 'checks'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 28 (delta 1), reused 16 (delta 1), pack-reused 0
Receiving objects: 100% (28/28), 12.69 KiB | 12.69 MiB/s, done.
Resolving deltas: 100% (1/1), done.
Note: checking out '580388773ef11085015b5a06fe52d61acf16b201'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

No existing manifest. Attempting to rebuild...
checks 3.0.1-1 is now installed in /.rocks (license: BSD)

-- The C compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Found TARANTOOL: /usr/include (found version "2.4.2-80-g18f2bc82d")
-- Tarantool LUADIR is /.rocks/share/tarantool/rocks/http/scm-1/lua
-- Tarantool LIBDIR is /.rocks/share/tarantool/rocks/http/scm-1/lib
-- Configuring done
-- Generating done
CMake Warning:
  Manually-specified variables were not used by the project:

    version


-- Build files have been written to: /tmp/luarocks_http-scm-1-V4P9SM/http/build.luarocks
Scanning dependencies of target httpd
[ 50%] Building C object http/CMakeFiles/httpd.dir/lib.c.o
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:32:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c: In function ‘tpl_term’:
/usr/include/tarantool/lauxlib.h:144:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
    (*(B)->p++ = (char)(c)))
    ~~~~~~~~~~~^~~~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:62:7: note: in expansion of macro ‘luaL_addchar’
       luaL_addchar(b, '\\');
       ^~~~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:63:6: note: here
      default:
      ^~~~~~~
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:39:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h: In function ‘tpe_parse’:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h:147:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
    type = TPE_TEXT;
    ~~~~~^~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h:149:3: note: here
   case TPE_LINECODE:
   ^~~~
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:40:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h: In function ‘httpfast_parse’:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:372:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
                 code = 0;
                 ~~~~~^~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:374:13: note: here
             case status:
             ^~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:393:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
                 state = message;
                 ~~~~~~^~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:395:13: note: here
             case message:
             ^~~~
[100%] Linking C shared library lib.so
[100%] Built target httpd
[100%] Built target httpd
Install the project...
-- Install configuration: "Debug"
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/VERSION.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lib/http/lib.so
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/server/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/server/tsgi_adapter.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/nginx_server/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/fs.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/matching.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/middleware.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/request.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/response.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/tsgi.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/utils.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/mime_types.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/codes.lua
http scm-1 is now installed in /.rocks (license: BSD)

root@test2:/#




We also need the prometheus module to run:



root@test2:/# tarantoolctl rocks install prometheus
Installing http://rocks.tarantool.org/prometheus-scm-1.rockspec

Cloning into 'prometheus'...
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 19 (delta 2), reused 5 (delta 0), pack-reused 0
Receiving objects: 100% (19/19), 10.73 KiB | 10.73 MiB/s, done.
Resolving deltas: 100% (2/2), done.
prometheus scm-1 is now installed in /.rocks (license: BSD)

root@test2:/#


We start the instance and can now access the endpoints:



root@test2:/# curl -D - -s http://127.0.0.1:8080/token
HTTP/1.1 200 Ok
Content-length: 44
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive

{"token":"e2tPq9l5Z3QZrewRf6uuoJUl3lJgSLOI"}

root@test2:/# curl -D - -s http://127.0.0.1:8080/token
HTTP/1.1 200 Ok
Content-length: 44
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive

{"token":"fR5aCA84gj9eZI3gJcV0LEDl9XZAG2Iu"}

root@test2:/# curl -D - -s http://127.0.0.1:8080/count
HTTP/1.1 200 Ok
Content-length: 11
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive

{"count":2}root@test2:/#


/count returns status 200 and the number of records.

/token issues a token and writes it to the database.



Testing speed



Let's run a benchmark with 50,000 requests, 500 of them concurrent.



root@test2:/# ab -c 500 -n 50000 http://127.0.0.1:8080/token
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests


Server Software:        Tarantool
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /token
Document Length:        44 bytes

Concurrency Level:      500
Time taken for tests:   14.578 seconds
Complete requests:      50000
Failed requests:        0
Total transferred:      7950000 bytes
HTML transferred:       2200000 bytes
Requests per second:    3429.87 [#/sec] (mean)
Time per request:       145.778 [ms] (mean)
Time per request:       0.292 [ms] (mean, across all concurrent requests)
Transfer rate:          532.57 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   10 103.2      0    3048
Processing:    12   69 685.1     15   13538
Waiting:       12   69 685.1     15   13538
Total:         12   78 768.2     15   14573

Percentage of the requests served within a certain time (ms)
  50%     15
  66%     15
  75%     16
  80%     16
  90%     16
  95%     16
  98%     21
  99%     42
 100%  14573 (longest request)
root@test2:/#


Tokens are issued, and data is being written the whole time. 99% of requests completed within 42 milliseconds. Overall, we get about 3,400 requests per second on a small machine with 2 cores and 4 gigabytes of memory.
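The summary figures in the ab report follow from three inputs: total requests, concurrency, and elapsed time. A small pure-Lua check using the numbers printed in the report (the variable names are my own; ab works with unrounded time, so the last digits can differ slightly):

```lua
local requests    = 50000    -- "Complete requests"
local concurrency = 500      -- "Concurrency Level"
local seconds     = 14.578   -- "Time taken for tests"

-- Throughput: how many requests finished per second overall.
local rps = requests / seconds

-- Mean time per request as seen by one of the 500 concurrent clients.
local per_request_ms = concurrency * seconds / requests * 1000

-- Mean time per request across all concurrent clients.
local per_request_all_ms = seconds / requests * 1000

print(string.format('%.1f req/s, %.1f ms, %.3f ms',
    rps, per_request_ms, per_request_all_ms))
```

Note that the mean of about 146 ms is pulled up by a few very slow requests; the percentile table shows that the typical request took about 15 ms.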



You can also select, say, token number 50,000 and look at its value.



HTTP is not the only option: you can run background functions that process your data. There are also various triggers. For example, you can call functions on updates to check something or resolve conflicts.
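As a sketch of both ideas, the snippet below attaches an on_replace trigger and starts a background fiber. It assumes the example space from this article; the log messages and the one-minute interval are arbitrary choices for illustration.

```lua
-- Run inside Tarantool (console or instance file).
local fiber = require('fiber')
local log = require('log')

-- Trigger: called on every insert, update, or delete in the space.
-- old_tuple is nil on insert; new_tuple is nil on delete.
box.space.example:on_replace(function(old_tuple, new_tuple)
    if old_tuple == nil and new_tuple ~= nil then
        log.info('new record with id %s', tostring(new_tuple[1]))
    end
end)

-- Background job: a fiber that wakes up periodically and reports the size.
fiber.create(function()
    while true do
        log.info('records in example: %d', box.space.example:len())
        fiber.sleep(60)
    end
end)
```

The same fiber pattern is what the application above uses to refresh its Prometheus gauge every five seconds.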



You can write application scripts right in the database server itself, without limits: connect any modules and implement any logic.



An application server can access external servers, collect data and store it in its database. Data from this database will be used by other applications.



Tarantool itself will do this, and there is no need to write a separate application.



Finally



This is only the first part of a larger piece of work. The second one will be published very soon on the Mail.ru Group blog, and we will definitely add a link to it in this article.



If you are interested in attending events where we create such things online and asking questions in real time, join the DevOps by REBRAIN channel.



If you need a move to the cloud or have questions about your infrastructure, feel free to leave a request.



P.S. We have 2 free audits per month, perhaps your project will be among them.


