Out of curiosity, I decided to come back to Tarantool and test the latest version, and this time I really liked the project. In this article I will show you how to write a simple application in Tarantool, put it under load and measure its performance, and you will see how easy and pleasant it all is.
What is Tarantool
First, Tarantool positions itself as a super-fast database. You can put any data you want into it. On top of that, you can replicate the data, shard it (that is, split a huge amount of data across several servers and combine the results from them) and build fault-tolerant master-master setups.
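To make the sharding idea concrete, here is a minimal Python sketch of the general principle: every key is routed to exactly one server by a stable hash, and a query over all the data asks every server and combines the answers. This is only an illustration under my own assumptions (the shard names are hypothetical); in Tarantool, sharding is provided by the separate vshard module, which works with virtual buckets and has its own API.

```python
import zlib
from collections import defaultdict

# Hypothetical shard names, one per virtual machine used in this article.
SHARDS = ["test1", "test2", "test3"]


def shard_for(key: str) -> str:
    """Route a key to one shard using a stable hash (CRC32), so the same key
    always lands on the same server."""
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]


def distribute(records: dict) -> dict:
    """Split key/value records across shards."""
    shards = defaultdict(dict)
    for key, value in records.items():
        shards[shard_for(key)][key] = value
    return dict(shards)


def total_count(shards: dict) -> int:
    """Scatter-gather: ask every shard for its local count and combine."""
    return sum(len(data) for data in shards.values())


if __name__ == "__main__":
    records = {f"user:{i}": i for i in range(1000)}
    shards = distribute(records)
    for name in SHARDS:
        print(name, len(shards.get(name, {})))
    print("total", total_count(shards))  # total 1000
```

CRC32 is used instead of Python's built-in hash() because string hashing in Python is randomized per process, and a router must send the same key to the same server every time.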
Second, it is an application server. You can write your applications on it and work with the data, for example, delete old records in the background according to certain rules. You can write an HTTP server directly in Tarantool that works with the data: returns the number of records, writes new data and funnels it all to the master.
I read an article about a team that built a message queue in 300 lines of code that is blazingly fast: its minimum throughput is 20,000 messages per second. You can really go big here and write a very large application, and unlike in PostgreSQL, it will not be a pile of stored procedures.
In this article I will try to describe such a server, a simple one.
Installation
For the test, I started three standard virtual machines, each with a 20 GB hard drive, Ubuntu 18.04, 2 virtual CPUs and 4 GB of memory.
We install Tarantool either by running the bash script or by adding the repository and running apt-get install tarantool. The script is run like this: curl -L https://tarantool.io/installer.sh | VER=2.4 sudo -E bash. After installation we have the following:
tarantoolctl - the main command for managing Tarantool instances.
/etc/tarantool - all configuration lives here.
/var/log/tarantool - this is where the logs are located.
/var/lib/tarantool - this is where the data is stored, split into per-instance directories.
In /etc/tarantool there are two directories, instances.available and instances.enabled. instances.enabled contains what will actually be launched: instance configuration files with Lua code that describe which ports the instance listens on, how much memory it gets, Vinyl engine settings, code triggered at server startup, sharding, queues, deletion of obsolete data and so on.
Instances work much like in PostgreSQL: if you want several copies of a database listening on different ports, you launch several instances on the same server, each on its own port. They can have completely different settings: one instance implements one piece of logic, another implements something else.
Instance management
The tarantoolctl command lets you manage your Tarantool instances. For example, tarantoolctl check example checks the configuration file and reports that the file is OK if there are no syntax errors.
You can check the status of an instance with tarantoolctl status example. The start, stop and restart commands work the same way.
When an instance is running, there are two ways to connect to it.
1. Administrative Console
By default, Tarantool opens a socket that accepts plain-text commands for controlling the instance. The console always connects as the admin user and has no authentication.
To connect this way, run tarantoolctl enter followed by the instance name (for example, tarantoolctl enter example). The command launches the console and connects as the admin user. Never expose the console port to the outside; it's best to leave it as a Unix socket. Then only those who have write access to the socket file will be able to connect to Tarantool.
This method is meant for administrative tasks. To work with data, use the second method: the binary protocol.
2. Using a binary protocol to connect to a specific port
The listen directive in the configuration opens a port for external connections. This port speaks the binary protocol, and authentication is enabled on it.
To connect this way, use tarantoolctl connect with the port number (for example, tarantoolctl connect 3301). With it, you can connect to remote servers and authenticate, and different users can be granted different access rights.
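What happens on that port before authentication can be sketched in Python. As I understand the binary protocol, the server first sends a 128-byte greeting: a 64-byte line with the version banner and a 64-byte line with a base64-encoded salt. With the default chap-sha1 scheme, the client never sends the password itself, only a 20-byte scramble derived from the password and the first 20 bytes of the salt. The sketch builds a made-up greeting instead of talking to a real server; treat the layout details as assumptions and check them against the Tarantool binary protocol documentation.

```python
import base64
import hashlib


def parse_greeting(greeting: bytes) -> tuple:
    """Split a 128-byte greeting into (banner, salt).

    Assumed layout: bytes 0-63 hold the version banner, bytes 64-127 hold
    the base64-encoded salt; both lines are space-padded and end with a newline.
    """
    if len(greeting) != 128:
        raise ValueError("greeting must be exactly 128 bytes")
    banner = greeting[:64].decode("ascii").strip()
    salt = base64.b64decode(greeting[64:].decode("ascii").strip())
    return banner, salt


def chap_sha1_scramble(password: str, salt: bytes) -> bytes:
    """scramble = sha1(password) XOR sha1(salt[:20] + sha1(sha1(password)))"""
    step1 = hashlib.sha1(password.encode("utf-8")).digest()
    step2 = hashlib.sha1(step1).digest()
    step3 = hashlib.sha1(salt[:20] + step2).digest()
    return bytes(a ^ b for a, b in zip(step1, step3))


if __name__ == "__main__":
    # A made-up greeting: the banner text and the salt are not from a real server.
    line1 = b"Tarantool 2.4.2 (Binary) 00000000-0000-0000-0000-000000000000".ljust(63) + b"\n"
    line2 = base64.b64encode(bytes(range(32))).ljust(63) + b"\n"
    banner, salt = parse_greeting(line1 + line2)
    print(banner)
    print(chap_sha1_scramble("secret", salt).hex())  # 20 bytes, 40 hex digits
```

The server can check the scramble without knowing the password: it stores sha1(sha1(password)), recomputes step3 from the salt, XORs it with the scramble to recover sha1(password), and hashes that once more to compare.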
Data storage and the box module
Since Tarantool is both a database and an application server, it has various modules. We are interested in the box module, which implements work with data. When you write something to box, Tarantool writes the data to disk, keeps it in memory or does something else with it.
Writing data
For example, we call the box.once function from the box module. It makes Tarantool run our function only once, when the database is initialized for the first time; on later restarts the call is skipped. In that function we create a space in which our data will be stored:
local function bootstrap()
    local space = box.schema.create_space('example')
    space:create_index('primary')
    box.schema.user.grant('guest', 'read,write,execute', 'universe')

    -- Keep things safe by default
    -- box.schema.user.create('example', { password = 'secret' })
    -- box.schema.user.grant('example', 'replication')
    -- box.schema.user.grant('example', 'read,write,execute', 'space', 'example')
end

-- run bootstrap() only once, on the very first start (the key string is arbitrary)
box.once('bootstrap', bootstrap)
Next, we create the primary index, named primary, which we use to look up data. If you do not specify any parameters, the index is built on the first field of each tuple, with type unsigned.
Then we grant the guest user, which binary-protocol connections use when no credentials are given, read, write and execute privileges on the entire instance (universe).
Compared to conventional databases, everything is quite simple here. A space is simply an area that stores our data, and each record in it is called a tuple. Tuples are packed in MessagePack, a compact binary format: for example, the JSON document {"compact":true,"schema":0} takes 27 bytes, while the same data in MessagePack takes 18.
It is also convenient to work with: every tuple can have a completely different set of fields.
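To see where the size difference comes from, here is a deliberately incomplete MessagePack encoder in Python. It covers only nil, booleans, non-negative integers, strings, arrays and maps, and it is not what Tarantool uses internally. Small integers and short strings carry their type and length in a single prefix byte, which is why {"compact":true,"schema":0} shrinks from 27 bytes of JSON to 18 bytes. The uint 64 branch also shows the largest unsigned integer MessagePack can hold, 2^64 - 1.

```python
import json
import struct


def msgpack_encode(obj) -> bytes:
    """Encode a small subset of MessagePack (sketch, not a full encoder)."""
    if obj is None:
        return b"\xc0"                                # nil
    if obj is True:                                   # check bools before ints:
        return b"\xc3"                                # bool is a subclass of int
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int):
        if obj < 0:
            raise ValueError("negative ints are not handled in this sketch")
        if obj < 0x80:
            return bytes([obj])                       # positive fixint: 1 byte
        if obj <= 0xFF:
            return b"\xcc" + struct.pack(">B", obj)   # uint 8
        if obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)   # uint 16
        if obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)   # uint 32
        return b"\xcf" + struct.pack(">Q", obj)       # uint 64; errors above 2**64 - 1
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        n = len(data)
        if n < 32:
            head = bytes([0xA0 | n])                  # fixstr: length in the type byte
        elif n <= 0xFF:
            head = b"\xd9" + struct.pack(">B", n)     # str 8
        elif n <= 0xFFFF:
            head = b"\xda" + struct.pack(">H", n)     # str 16
        else:
            head = b"\xdb" + struct.pack(">I", n)     # str 32
        return head + data
    if isinstance(obj, (list, tuple)):                # up to 65535 elements here
        n = len(obj)
        head = bytes([0x90 | n]) if n < 16 else b"\xdc" + struct.pack(">H", n)
        return head + b"".join(msgpack_encode(x) for x in obj)
    if isinstance(obj, dict):                         # up to 65535 pairs here
        n = len(obj)
        head = bytes([0x80 | n]) if n < 16 else b"\xde" + struct.pack(">H", n)
        return head + b"".join(msgpack_encode(k) + msgpack_encode(v) for k, v in obj.items())
    raise TypeError(f"unsupported type: {type(obj).__name__}")


if __name__ == "__main__":
    doc = {"compact": True, "schema": 0}
    print(len(json.dumps(doc, separators=(",", ":"))))  # 27
    print(len(msgpack_encode(doc)))                      # 18
    print(msgpack_encode([1, "test1", "test2"]).hex())   # a tuple as a MessagePack array
```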
We can look at all spaces with box.space. To inspect a specific space, run box.space.example, and you get full information about it.
Tarantool has two built-in storage engines: memtx and vinyl. Memtx keeps all data in memory, so everything works simply and quickly. The data is also dumped to disk, and there is a write-ahead log mechanism, so we will not lose anything if the server crashes.
Vinyl stores data on disk in a more familiar way, so you can store more data than fits in memory, and Tarantool reads it from disk.
Here we will use memtx.
unix/:/var/run/tarantool/example.control> box.space.example
---
- engine: memtx
  before_replace: 'function: 0x41eb02c8'
  on_replace: 'function: 0x41eb0568'
  ck_constraint: []
  field_count: 0
  temporary: false
  index:
    0: &0
      unique: true
      parts:
      - type: unsigned
        is_nullable: false
        fieldno: 1
      id: 0
      space_id: 512
      type: TREE
      name: primary
    primary: *0
  is_local: false
  enabled: true
  name: example
  id: 512
...
unix/:/var/run/tarantool/example.control>
Index:
Every space needs a primary index; without it nothing works, you cannot even insert data. As in most databases, the key is the first field, the record ID.
Parts:
Here we specify what the index consists of. It has a single part: the first field, of type unsigned, that is, a non-negative integer. The maximum value is 2^64 - 1, roughly 18 quintillion. That is an awful lot.
Then we can insert data using the insert command.
unix/:/var/run/tarantool/example.control> box.space.example:insert{1, 'test1', 'test2'}
---
- [1, 'test1', 'test2']
...
unix/:/var/run/tarantool/example.control> box.space.example:insert{2, 'test2', 'test3', 'test4'}
---
- [2, 'test2', 'test3', 'test4']
...
unix/:/var/run/tarantool/example.control> box.space.example:insert{3, 'test3'}
---
- [3, 'test3']
...
unix/:/var/run/tarantool/example.control> box.space.example:insert{4, 'test4'}
---
- [4, 'test4']
...
unix/:/var/run/tarantool/example.control>
The first field is the primary key, so it must be unique. The number of fields is not limited, so we can insert as many as we like; they are stored in the MessagePack format described above.
Reading data
Then we can read data with the select command.
box.space.example:select{1} returns the record with key 1. If we omit the key, we get all the records. They all have a different number of fields; in fact, there is no concept of columns here, only field numbers.
There can be a huge amount of data, and suppose we need to search by the second field. For that, we create a secondary index:
box.space.example:create_index('secondary', { type = 'TREE', unique = false, parts = {{field = 2, type = 'string'}}})
We use the create_index method.
We name the index secondary.
Then we specify the parameters. The index type is TREE. The values may repeat, so we set unique = false.
Then we specify which parts the index consists of: field is the number of the field the index is built on, and type is its type, here string. And that's it, the index is created:
unix/:/var/run/tarantool/example.control> box.space.example:create_index('secondary', { type = 'TREE', unique = false, parts = {{field = 2, type = 'string'}}})
---
- unique: false
  parts:
  - type: string
    is_nullable: false
    fieldno: 2
  id: 1
  space_id: 512
  type: TREE
  name: secondary
...
unix/:/var/run/tarantool/example.control>
Now we can search by it:
unix/:/var/run/tarantool/example.control> box.space.example.index.secondary:select('test1')
---
- - [1, 'test1', 'test2']
...
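The behaviour of the two indexes can be modelled in a few lines of Python: a dictionary plays the unique primary index on field 1, and a sorted list of (field 2, primary key) pairs plays the non-unique TREE index on field 2, so equal values sit next to each other and can be found by binary search. This is a toy model for intuition only, not how memtx is implemented; the class and method names are hypothetical.

```python
import bisect


class ToySpace:
    """Toy model of a space with a unique primary index on field 1 and a
    non-unique ordered (TREE-like) index on field 2 of type string."""

    def __init__(self):
        self.primary = {}    # field 1 -> tuple
        self.secondary = []  # sorted (field 2, field 1) pairs; ties ordered by field 1

    def insert(self, tup):
        key = tup[0]
        if len(tup) < 2 or not isinstance(tup[1], str):
            raise TypeError("field 2 must be a string: it is covered by the secondary index")
        if key in self.primary:
            raise KeyError(f"duplicate key {key!r} in unique index 'primary'")
        self.primary[key] = tup
        bisect.insort(self.secondary, (tup[1], key))
        return tup

    def select(self, key=None):
        """Look up by primary key; without a key, return every tuple in key order."""
        if key is None:
            return [self.primary[k] for k in sorted(self.primary)]
        return [self.primary[key]] if key in self.primary else []

    def select_secondary(self, value):
        """Return all tuples whose field 2 equals value, using binary search."""
        start = bisect.bisect_left(self.secondary, (value,))
        result = []
        for field2, key in self.secondary[start:]:
            if field2 != value:
                break
            result.append(self.primary[key])
        return result


if __name__ == "__main__":
    space = ToySpace()
    space.insert([1, "test1", "test2"])
    space.insert([2, "test2", "test3", "test4"])
    space.insert([3, "test3"])
    space.insert([4, "test4"])
    print(space.select(1))                  # [[1, 'test1', 'test2']]
    print(space.select_secondary("test1"))  # [[1, 'test1', 'test2']]
```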
Persistence
If the write-ahead log is disabled (wal_mode = 'none') and we restart the instance and query the data again, we will see that it is gone and the space is empty. This happens because Tarantool periodically makes checkpoints that save the data to disk, but if the server stops before the next checkpoint, all operations since the previous one are lost: recovery starts from the last checkpoint, which might have been made, say, two hours ago.
Making a checkpoint every second is not an option either: constantly dumping, say, 20 GB of data to disk is a bad idea.
This is why the write-ahead log (WAL) was invented: every change to the data is first recorded as an entry in a small WAL file.
The WAL files hold every change made since the last checkpoint. Their size is limited, for example to 64 MB; when a file fills up, writing continues in the next one. After a restart, Tarantool restores the last checkpoint and then replays all later transactions from the WAL, up to the moment it stopped.
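The checkpoint-plus-WAL idea can be illustrated with a toy Python model: changes are appended to a log before they are applied in memory, a checkpoint copies the current state and starts a fresh log, and recovery loads the checkpoint and replays the log. It is a simplification under my own assumptions (Tarantool itself retains older files for a while, which this model does not), and all names are hypothetical.

```python
import copy


class ToyStore:
    """Toy model of checkpoint + write-ahead log recovery (not Tarantool's code)."""

    def __init__(self, wal_enabled=True):
        self.wal_enabled = wal_enabled
        self.memory = {}      # live data, lost on a crash
        self.checkpoint = {}  # last snapshot "on disk"
        self.wal = []         # changes since the last checkpoint, "on disk"

    def put(self, key, value):
        if self.wal_enabled:
            self.wal.append(("put", key, value))  # log the change first...
        self.memory[key] = value                  # ...then apply it in memory

    def delete(self, key):
        if self.wal_enabled:
            self.wal.append(("delete", key, None))
        self.memory.pop(key, None)

    def make_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.memory)
        self.wal.clear()  # this model simply drops log records older than the checkpoint

    def crash_and_recover(self):
        self.memory = copy.deepcopy(self.checkpoint)  # 1. load the last checkpoint
        for op, key, value in self.wal:               # 2. replay the later changes
            if op == "put":
                self.memory[key] = value
            else:
                self.memory.pop(key, None)


if __name__ == "__main__":
    store = ToyStore(wal_enabled=True)
    store.put(1, "a")
    store.make_checkpoint()
    store.put(2, "b")          # only in memory and in the WAL
    store.crash_and_recover()
    print(store.memory)        # {1: 'a', 2: 'b'}

    no_wal = ToyStore(wal_enabled=False)
    no_wal.put(1, "a")
    no_wal.make_checkpoint()
    no_wal.put(2, "b")         # only in memory
    no_wal.crash_and_recover()
    print(no_wal.memory)       # {1: 'a'}
```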
The WAL is controlled by the wal_mode option in the box.cfg settings (in the example.lua file). 'write' is the default, but it can be set explicitly:
wal_mode = 'write';
Using the data
With what we have written so far, you can already use Tarantool to store data, and it works very fast as a database. And now the icing on the cake: what else you can do with it.
Writing an application
For example, let's write the following application for Tarantool:
box.cfg {
    listen = '0.0.0.0:3301';                   -- binary protocol port
    io_collect_interval = nil;
    readahead = 16320;
    memtx_memory = 128 * 1024 * 1024;          -- 128 MB
    memtx_min_tuple_size = 16;
    memtx_max_tuple_size = 128 * 1024 * 1024;  -- 128 MB
    vinyl_memory = 128 * 1024 * 1024;          -- 128 MB
    vinyl_cache = 128 * 1024 * 1024;           -- 128 MB
    vinyl_max_tuple_size = 128 * 1024 * 1024;  -- 128 MB
    vinyl_write_threads = 2;
    wal_mode = "write";                        -- log every change to the WAL
    wal_max_size = 256 * 1024 * 1024;          -- start a new WAL file after 256 MB
    checkpoint_interval = 60 * 60;             -- one hour
    checkpoint_count = 6;
    force_recovery = true;
    log_level = 5;
    log_nonblock = false;
    too_long_threshold = 0.5;
    read_only = false
}

local function bootstrap()
    local space = box.schema.create_space('example')
    space:create_index('primary')
    box.schema.user.create('example', { password = 'secret' })
    box.schema.user.grant('example', 'read,write,execute', 'space', 'example')
    box.schema.user.create('repl', { password = 'replication' })
    box.schema.user.grant('repl', 'replication')
end

-- on the first run, create the space and set up users and grants
box.once('replica', bootstrap)

-- enable console access (on localhost only)
local console = require('console')
console.listen('127.0.0.1:3302')

-- http config
local charset = {} do -- [0-9a-zA-Z]
    for c = 48, 57 do table.insert(charset, string.char(c)) end
    for c = 65, 90 do table.insert(charset, string.char(c)) end
    for c = 97, 122 do table.insert(charset, string.char(c)) end
end

local function randomString(length)
    if not length or length <= 0 then return '' end
    math.randomseed(os.clock()^5)
    return randomString(length - 1) .. charset[math.random(1, #charset)]
end

local http_router = require('http.router')
local http_server = require('http.server')
local json = require('json')

local httpd = http_server.new('0.0.0.0', 8080, {
    log_requests = true,
    log_errors = true
})
local router = http_router.new()

-- number of records in the space
local function get_count()
    local cnt = box.space.example:len()
    return cnt
end

-- GET /count: return the number of stored tokens
router:route({method = 'GET', path = '/count'}, function()
    return {status = 200, body = json.encode({count = get_count()})}
end)

-- GET /token: generate a token, store it and return it
router:route({method = 'GET', path = '/token'}, function()
    local token = randomString(32)
    local last = box.space.example:len()
    box.space.example:insert{ last + 1, token }
    return {status = 200, body = json.encode({token = token})}
end)

-- Prometheus metrics: a background fiber refreshes the gauge every 5 seconds
local prometheus = require('prometheus')
local fiber = require('fiber')
local tokens_count = prometheus.gauge("tarantool_tokens_count",
    "API Tokens Count")

local function monitor_tokens_count()
    while true do
        tokens_count:set(get_count())
        fiber.sleep(5)
    end
end
fiber.create(monitor_tokens_count)

router:route( { method = 'GET', path = '/metrics' }, prometheus.collect_http)

httpd:set_router(router)
httpd:start()
First, we declare a Lua table of the characters allowed in a token, [0-9a-zA-Z]. We need this table to generate random strings.
local charset = {} do -- [0-9a-zA-Z]
    for c = 48, 57 do table.insert(charset, string.char(c)) end
    for c = 65, 90 do table.insert(charset, string.char(c)) end
    for c = 97, 122 do table.insert(charset, string.char(c)) end
end
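Next, we declare the randomString function, which takes the desired length as its argument. It reseeds the random generator on every call and builds the string recursively, appending one random character from the table per call.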
local function randomString(length)
    if not length or length <= 0 then return '' end
    math.randomseed(os.clock()^5)
    return randomString(length - 1) .. charset[math.random(1, #charset)]
end
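For comparison, the same token generator in Python might look like the sketch below. It is not part of the application: it builds the string iteratively and draws characters from the secrets module, a cryptographically secure source, instead of reseeding Lua's math.random on every call. If tokens are used as credentials, that choice matters. With 62 possible characters, a 32-character token carries about 190 bits of randomness.

```python
import secrets
import string

# Same alphabet as the Lua charset table: [0-9A-Za-z], 62 characters.
CHARSET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def random_string(length: int) -> str:
    """Return a random string of `length` characters drawn from CHARSET."""
    if length <= 0:
        return ""
    return "".join(secrets.choice(CHARSET) for _ in range(length))


if __name__ == "__main__":
    print(random_string(32))
```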
Then we load the http.router and http.server modules into our Tarantool server, plus the json module for encoding the responses we send to the client.
local http_router = require('http.router')
local http_server = require('http.server')
local json = require('json')
After that, we create an HTTP server listening on port 8080 on all interfaces, which logs all requests and errors.
local httpd = http_server.new('0.0.0.0', 8080, {
    log_requests = true,
    log_errors = true
})
Next, we declare a route: when a GET request arrives at /count on port 8080, we call a one-line handler. It returns a status (200, 404, 403 or whatever we specify) and a body.
router:route({method = 'GET', path = '/count'}, function()
    return {status = 200, body = json.encode({count = get_count()})}
end)
For the body, we return json.encode of a table with a count key; its value comes from get_count(), which returns the number of records in our database.
The second route
router:route({method = 'GET', path = '/token'}, function()
    local token = randomString(32)
    local last = box.space.example:len()
    box.space.example:insert{ last + 1, token }
    return {status = 200, body = json.encode({token = token})}
end)
In router:route({method = 'GET', path = '/token'}, function() ... end) we register the handler that generates a token.
The line local token = randomString(32) produces a random string of 32 characters.
The line local last = box.space.example:len() gets the current number of records in the space (not the last element),
and box.space.example:insert{last + 1, token} writes the token to the database with an ID one greater than that count. By the way, this is a rather clumsy way to generate IDs: if a record is ever deleted, len() + 1 can clash with an existing ID. Tarantool has sequences for exactly this case.
Finally, the handler returns the token in the response body.
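Why len() + 1 is fragile is easy to show with a small Python model (not Tarantool code): once any record is deleted, the count drops and the next computed ID collides with one that already exists, while a counter that only moves forward, which is the idea behind a sequence, never reuses an ID. In Tarantool itself, as far as I know, a sequence can be attached to the primary index with the sequence = true option of create_index; check the documentation for your version. The class names below are hypothetical.

```python
import itertools


class LenPlusOneIds:
    """IDs computed as 'number of records + 1', like the /token handler."""

    def __init__(self):
        self.rows = {}

    def insert(self, value):
        new_id = len(self.rows) + 1
        if new_id in self.rows:
            raise KeyError(f"duplicate key {new_id}")
        self.rows[new_id] = value
        return new_id


class SequenceIds:
    """IDs from a counter that only moves forward (the idea behind a sequence)."""

    def __init__(self):
        self.rows = {}
        self._next = itertools.count(1)

    def insert(self, value):
        new_id = next(self._next)
        self.rows[new_id] = value
        return new_id


if __name__ == "__main__":
    naive, seq = LenPlusOneIds(), SequenceIds()
    for store in (naive, seq):
        for token in ("t1", "t2", "t3"):
            store.insert(token)
        del store.rows[1]          # delete the first record
    print(seq.insert("t4"))        # 4
    try:
        naive.insert("t4")         # len() + 1 == 3, which already exists
    except KeyError as err:
        print("collision:", err)
```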
So we have written an application in a single file. It handles the data directly, and the box module does all the dirty work for you.
It listens for HTTP requests and works with the data, and everything, both the application and the data, lives in a single instance. That is why everything is fast.
To run it, we install the http module:
root@test2:/# tarantoolctl rocks install http
Installing http://rocks.tarantool.org/http-scm-1.src.rock
Missing dependencies for http scm-1:
checks >= 3.0.1 (not installed)
http scm-1 depends on checks >= 3.0.1 (not installed)
Installing http://rocks.tarantool.org/checks-3.0.1-1.rockspec
Cloning into 'checks'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 28 (delta 1), reused 16 (delta 1), pack-reused 0
Receiving objects: 100% (28/28), 12.69 KiB | 12.69 MiB/s, done.
Resolving deltas: 100% (1/1), done.
Note: checking out '580388773ef11085015b5a06fe52d61acf16b201'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
No existing manifest. Attempting to rebuild...
checks 3.0.1-1 is now installed in /.rocks (license: BSD)
-- The C compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Found TARANTOOL: /usr/include (found version "2.4.2-80-g18f2bc82d")
-- Tarantool LUADIR is /.rocks/share/tarantool/rocks/http/scm-1/lua
-- Tarantool LIBDIR is /.rocks/share/tarantool/rocks/http/scm-1/lib
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
version
-- Build files have been written to: /tmp/luarocks_http-scm-1-V4P9SM/http/build.luarocks
Scanning dependencies of target httpd
[ 50%] Building C object http/CMakeFiles/httpd.dir/lib.c.o
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:32:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c: In function 'tpl_term':
/usr/include/tarantool/lauxlib.h:144:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
(*(B)->p++ = (char)(c)))
~~~~~~~~~~~^~~~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:62:7: note: in expansion of macro 'luaL_addchar'
luaL_addchar(b, '\\');
^~~~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:63:6: note: here
default:
^~~~~~~
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:39:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h: In function 'tpe_parse':
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h:147:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
type = TPE_TEXT;
~~~~~^~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h:149:3: note: here
case TPE_LINECODE:
^~~~
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:40:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h: In function 'httpfast_parse':
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:372:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
code = 0;
~~~~~^~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:374:13: note: here
case status:
^~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:393:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
state = message;
~~~~~~^~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:395:13: note: here
case message:
^~~~
[100%] Linking C shared library lib.so
[100%] Built target httpd
[100%] Built target httpd
Install the project...
-- Install configuration: "Debug"
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/VERSION.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lib/http/lib.so
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/server/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/server/tsgi_adapter.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/nginx_server/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/fs.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/matching.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/middleware.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/request.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/response.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/tsgi.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/utils.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/mime_types.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/codes.lua
http scm-1 is now installed in /.rocks (license: BSD)
root@test2:/#
We also need the prometheus module:
root@test2:/# tarantoolctl rocks install prometheus
Installing http://rocks.tarantool.org/prometheus-scm-1.rockspec
Cloning into 'prometheus'...
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 19 (delta 2), reused 5 (delta 0), pack-reused 0
Receiving objects: 100% (19/19), 10.73 KiB | 10.73 MiB/s, done.
Resolving deltas: 100% (2/2), done.
prometheus scm-1 is now installed in /.rocks (license: BSD)
root@test2:/#
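The application also publishes the token count as the Prometheus gauge tarantool_tokens_count at /metrics, refreshed every 5 seconds by a background fiber. The article does not show what that endpoint returns, so the sample below assumes the standard Prometheus text exposition format (# HELP and # TYPE comments followed by "name value" lines). The Python helper, parse_prometheus_text (a hypothetical name), is a minimal sketch: it does not handle timestamps or label values that contain spaces.

```python
def parse_prometheus_text(text: str) -> dict:
    """Parse simple 'name value' and 'name{labels} value' lines of the
    Prometheus text format into {series: float}, skipping comment lines."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank lines, # HELP, # TYPE
            continue
        series, _, value = line.rpartition(" ")
        metrics[series] = float(value)
    return metrics


if __name__ == "__main__":
    # Assumed sample output of /metrics, not captured from the real server.
    sample = (
        "# HELP tarantool_tokens_count API Tokens Count\n"
        "# TYPE tarantool_tokens_count gauge\n"
        "tarantool_tokens_count 2\n"
    )
    print(parse_prometheus_text(sample)["tarantool_tokens_count"])  # 2.0
```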
We start the instance and can query the endpoints:
root@test2:/# curl -D - -s http://127.0.0.1:8080/token
HTTP/1.1 200 Ok
Content-length: 44
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive
{"token":"e2tPq9l5Z3QZrewRf6uuoJUl3lJgSLOI"}
root@test2:/# curl -D - -s http://127.0.0.1:8080/token
HTTP/1.1 200 Ok
Content-length: 44
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive
{"token":"fR5aCA84gj9eZI3gJcV0LEDl9XZAG2Iu"}
root@test2:/# curl -D - -s http://127.0.0.1:8080/count
HTTP/1.1 200 Ok
Content-length: 11
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive
{"count":2}root@test2:/#
/count returns status 200 and the number of records.
/token issues a token and writes it to the database.
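The same checks can be scripted. Below is a minimal Python client using only the standard library; get_json is a hypothetical helper name. So that it can be tried without Tarantool, the block also contains StubTokenAPI, a hypothetical stand-in that mimics the /token and /count routes in memory. Against the real application you would call get_json("http://127.0.0.1:8080", "/count") instead.

```python
import json
import secrets
import string
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def get_json(base_url: str, path: str) -> dict:
    """GET base_url + path and decode the JSON body (what curl did above)."""
    with urllib.request.urlopen(base_url + path, timeout=5) as resp:
        if resp.status != 200:
            raise RuntimeError(f"unexpected status {resp.status}")
        return json.loads(resp.read().decode("utf-8"))


class StubTokenAPI(BaseHTTPRequestHandler):
    """Hypothetical in-memory stand-in for the Tarantool app (for local tries only)."""
    tokens = []                # shared by all requests (class attribute)
    lock = threading.Lock()

    def do_GET(self):
        if self.path == "/token":
            alphabet = string.ascii_letters + string.digits
            token = "".join(secrets.choice(alphabet) for _ in range(32))
            with self.lock:
                self.tokens.append(token)
            body = {"token": token}
        elif self.path == "/count":
            with self.lock:
                body = {"count": len(self.tokens)}
        else:
            self.send_error(404)
            return
        data = json.dumps(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def start_stub(port: int = 0) -> ThreadingHTTPServer:
    """Start the stub on localhost (port 0 = any free port) in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), StubTokenAPI)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_stub()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    print(get_json(base, "/token"))
    print(get_json(base, "/token"))
    print(get_json(base, "/count"))
    server.shutdown()
```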
Testing speed
Let's run a benchmark of 50,000 requests to /token, with 500 concurrent requests.
root@test2:/# ab -c 500 -n 50000 http://127.0.0.1:8080/token
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests
Server Software: Tarantool
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /token
Document Length: 44 bytes
Concurrency Level: 500
Time taken for tests: 14.578 seconds
Complete requests: 50000
Failed requests: 0
Total transferred: 7950000 bytes
HTML transferred: 2200000 bytes
Requests per second: 3429.87 [#/sec] (mean)
Time per request: 145.778 [ms] (mean)
Time per request: 0.292 [ms] (mean, across all concurrent requests)
Transfer rate: 532.57 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 10 103.2 0 3048
Processing: 12 69 685.1 15 13538
Waiting: 12 69 685.1 15 13538
Total: 12 78 768.2 15 14573
Percentage of the requests served within a certain time (ms)
50% 15
66% 15
75% 16
80% 16
90% 16
95% 16
98% 21
99% 42
100% 14573 (longest request)
root@test2:/#
All tokens were issued, and every request wrote data to the database. 99% of requests completed within 42 milliseconds, although a few took much longer (the longest took about 14.6 seconds). Overall, we get about 3,400 requests per second on a small machine with 2 cores and 4 GB of memory.
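ApacheBench's headline figures follow directly from its raw totals, which is a handy sanity check when comparing runs. The Python helper below (ab_summary is a hypothetical name) recomputes them from the numbers above. The results match ab's report up to rounding, since ab uses the unrounded test duration: 50,000 requests in 14.578 s is about 3,430 requests per second, and with 500 concurrent clients each one waits about 145.8 ms per request on average.

```python
def ab_summary(requests: int, concurrency: int, seconds: float, total_bytes: int) -> dict:
    """Recompute ApacheBench's summary figures from its raw totals."""
    return {
        "requests_per_second": requests / seconds,
        # mean latency seen by one client, since `concurrency` requests run at once
        "time_per_request_ms": concurrency * seconds * 1000 / requests,
        # mean time per request across all concurrent requests
        "time_per_request_all_ms": seconds * 1000 / requests,
        "transfer_kbytes_per_s": total_bytes / 1024 / seconds,
        "bytes_per_response": total_bytes / requests,  # headers + 44-byte body
    }


if __name__ == "__main__":
    # Totals taken from the ab run above.
    for name, value in ab_summary(50000, 500, 14.578, 7950000).items():
        print(f"{name}: {value:.3f}")
```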
You can also select, say, the token with ID 50000 (box.space.example:select{50000}) and look at its value.
You are not limited to HTTP: you can run background functions that process your data. There are also various triggers; for example, you can call functions on updates, validate data or resolve conflicts.
You can write application scripts right in the database server itself, without limits: connect any modules and implement any logic.
An application server can access external servers, collect data and store it in its database, and other applications can then use that data.
Tarantool does all of this itself, so there is no need to write a separate application.
Finally
This is only the first part of a larger body of work. The second part will be published very soon on the Mail.ru Group blog, and we will definitely add a link to it in this article.
If you are interested in attending events where we build such things live and ask questions in real time, join the DevOps by REBRAIN channel.
If you need to move to the cloud or have questions about your infrastructure, feel free to leave a request.
P.S. We offer 2 free audits per month; perhaps your project will be one of them.