14 things I wish I had known before getting started with MongoDB

The translation of this article was prepared on the eve of the start of the course "Non-relational databases".










Highlights:



  • It is extremely important to design the schema even though it is optional in MongoDB.
  • Likewise, indexes must match your schema and access patterns.
  • Avoid using large objects and large arrays.
  • Be careful with MongoDB settings, especially when it comes to security and reliability.
  • MongoDB does not have a query optimizer, so you must be careful about how you order query operations.


I have been working with databases for a very long time, but only recently discovered MongoDB. There are a few things I wish I had known before getting started with it. People who already have experience in a field bring preconceived ideas about what databases are and what they do. In the hope of making things easier for others, here is a list of common mistakes.



Creating a MongoDB Server Without Authentication



Unfortunately, MongoDB is installed without authentication by default. That is fine on a workstation that is only accessed locally. But since MongoDB is a multi-user system that likes to use as much memory as it can, it is best installed on a server with as much RAM as possible, even if you are only going to use it for development. Installing it on a server on the default port without authentication is asking for trouble, especially if arbitrary JavaScript code can be executed within a query (for example, $where as an injection vector).



There are several authentication methods, but the easiest is to set up a user ID and password. Do this while you think about fancy LDAP-based authentication. In terms of security, MongoDB should be kept up to date and the logs should be checked regularly for unauthorized access. I also prefer to use a port other than the default one.
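As a rough illustration of the user ID / password setup with a non-default port, the sketch below builds a connection string with escaped credentials. The user name, password, host and port are made-up placeholders, and the helper name is hypothetical; only the standard library is used.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, port, auth_db="admin"):
    """Build a mongodb:// connection string with percent-escaped credentials.

    Characters such as '@', ':' and '/' in a password would otherwise
    break the URI, so both user and password are escaped.
    """
    return "mongodb://{}:{}@{}:{}/?authSource={}".format(
        quote_plus(user), quote_plus(password), host, port, quote_plus(auth_db)
    )


# Hypothetical values: a dedicated application user and a non-default port.
uri = build_mongo_uri("app_user", "p@ss:word/1", "db.example.internal", 27117)
print(uri)
```

The resulting string can be passed to a driver's client constructor; the point is simply that credentials and the port are explicit rather than left at their defaults.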



Forgetting to lock down MongoDB's attack surface



The MongoDB Security Checklist contains good advice for reducing the risk of network intrusion and data leakage. It is easy to dismiss it and say that a development server does not need a high level of security. However, things are not that simple, and the checklist applies to all MongoDB servers. In particular, unless there is a compelling reason to use mapReduce, group, or $where, you should disable arbitrary JavaScript execution by setting javascriptEnabled: false in the configuration file. Since data files are not encrypted in standard MongoDB, it makes sense to run MongoDB under a dedicated user account that has full access to the data files, to restrict access to that user alone, and to rely on the operating system's own file access controls.



Failing to design a schema



MongoDB does not enforce a schema, but that does not mean you do not need one. If you just want to store documents without any consistent layout, saving them can be quick and easy, but retrieving them later can be darn hard.



The classic article "6 rules of thumb for MongoDB schema design" is worth reading, and features like the Schema Explorer in the third-party tool Studio 3T are worth using for regular schema checks.



Forgetting about collations (sort order)



Forgetting about collation can cause more frustration and wasted time than any other misconfiguration. MongoDB uses binary collation by default, which is unlikely to be useful to anyone. Case-sensitive, accent-sensitive, binary collations were already considered curious anachronisms, along with beads, caftans and curly mustaches, back in the 1980s. Now their use is unforgivable. In real life, "motorcycle" is the same as "Motorcycle", and "Britain" and "britain" are one and the same place. A lowercase letter is simply another form of its capital letter. And don't get me started on accent (diacritic) collation. When you create a collection in MongoDB, use a case-insensitive, accent-insensitive collation that matches the language and culture of the users of the system. This will greatly simplify searching through string data.
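A minimal sketch of what such a collation might look like from Python. The helper builds a collation document using MongoDB's locale and strength fields; the collection name "people" and both function names are hypothetical, and `db` is assumed to be a pymongo Database object.

```python
def case_insensitive_collation(locale="en", ignore_accents=True):
    """Return a collation document for case-insensitive comparison.

    strength 1 compares base letters only (ignores case and accents);
    strength 2 still ignores case but distinguishes accents.
    """
    return {"locale": locale, "strength": 1 if ignore_accents else 2}


def create_people_collection(db):
    """Create a collection whose default collation ignores case and accents.

    `db` is assumed to be a pymongo Database; "people" is a made-up name.
    """
    return db.create_collection(
        "people", collation=case_insensitive_collation("en")
    )


print(case_insensitive_collation("en"))
print(case_insensitive_collation("fr", ignore_accents=False))
```

The locale should be chosen to match the users of the system; "en" and "fr" here are only examples.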



Creating Collections with Large Documents



MongoDB is happy to host large documents of up to 16MB in collections, and GridFS is designed for documents larger than 16MB. But just because large documents can be stored does not mean that storing them is a good idea. MongoDB works best if you keep individual documents to a few kilobytes in size, treating them more like rows in a wide SQL table. Large documents will be a source of performance problems.



Creating documents with large arrays



Documents can contain arrays. It is best to keep the number of elements in an array well below four digits. If elements are added to an array frequently, the array will outgrow the document containing it, and the document will need to be moved, which means that the indexes will need to be updated as well. When a document with a large array is re-indexed, the indexes will often be rewritten extensively, since there is an index entry for every array element. This re-indexing also occurs when a document is inserted or deleted.



MongoDB has a so-called "padding factor" that leaves room for documents to grow, in order to minimize this problem.

You might think that you can get by without indexing arrays. Unfortunately, the lack of indexes can cause other problems. Since documents are scanned from beginning to end, it takes longer to find elements at the end of an array, and most operations involving such a document will be slow.



Forgetting that the order of stages in an aggregation matters



In a database system with a query optimizer, the queries you write describe what you want to get, not how to get it. It works like ordering in a restaurant: usually you just order a dish and do not give the chef detailed instructions.



In MongoDB, you instruct the cook. For example, you need to make sure that the data is reduced as early as possible in the pipeline using $match and $project, that sorting happens only after the data has been reduced, and that lookups occur in exactly the order you intend. Having a query optimizer that eliminates unnecessary work, orders the stages optimally, and selects the join type can spoil you. In MongoDB, you get more control at the cost of convenience.
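The ordering described above can be sketched as a pipeline that filters first, trims fields second, and sorts last. The field names ("status", "customer", "total") and the helper name are hypothetical; the result is a plain list of stage documents of the kind a driver's aggregate call accepts.

```python
def build_pipeline(status, fields, sort_field):
    """Build an aggregation pipeline that reduces the data before sorting.

    The sort field should be one of the projected fields, otherwise the
    $sort stage would operate on a field that $project has removed.
    """
    if sort_field not in fields:
        raise ValueError("sort_field must be among the projected fields")
    return [
        {"$match": {"status": status}},              # 1. drop unwanted documents
        {"$project": {name: 1 for name in fields}},  # 2. keep only needed fields
        {"$sort": {sort_field: 1}},                  # 3. sort the reduced data
    ]


pipeline = build_pipeline("shipped", ["customer", "total"], "total")
for stage in pipeline:
    print(stage)
```

With pymongo, such a list could be passed to `collection.aggregate(pipeline)`.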



Tools like Studio 3T make it easier to build aggregation queries in MongoDB. Its Aggregation Editor lets you apply pipeline operators one stage at a time and check the input and output of each stage to simplify debugging.



Using fast writes



Never configure MongoDB writes for high speed at the cost of low reliability. This "fire-and-forget" mode seems fast because the command returns before the write has actually been made. If the system crashes before the data is written to disk, the data will be lost and the database left in an inconsistent state. Fortunately, 64-bit MongoDB has journaling enabled.



The MMAPv1 and WiredTiger storage engines use journaling to prevent this, although WiredTiger can recover to the last consistent checkpoint if journaling is disabled.



Journaling ensures that the database is in a consistent state after recovery and retains all data up to the point at which the journal was last written. The interval between journal writes is configured using the commitIntervalMs parameter.



To be sure of your writes, make sure that journaling is enabled in the configuration file (storage.journal.enabled) and that the interval between journal writes matches the amount of data you can afford to lose.
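On the client side, the fire-and-forget mode described above corresponds to an unacknowledged write concern (w=0). The opposite choice can be sketched as connection-string options that ask for acknowledgement from a majority of replica set members and for the write to reach the journal. The helper name and the timeout value are assumptions made for illustration.

```python
from urllib.parse import urlencode


def durable_write_options(wtimeout_ms=5000):
    """Return connection-string options that favour durability over speed.

    w=majority   : wait for acknowledgement from a majority of members
    journal=true : wait until the write has reached the on-disk journal
    wtimeoutMS   : give up waiting after this many milliseconds
    """
    return urlencode(
        {"w": "majority", "journal": "true", "wtimeoutMS": wtimeout_ms}
    )


options = durable_write_options()
print("mongodb://localhost:27017/?" + options)
```

This trades some write latency for the guarantee that an acknowledged write survives a crash.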



Sorting without an index



When searching and aggregating, it is often necessary to sort the data. Ideally, this is done in one of the final stages, after the result has been filtered, in order to reduce the amount of data being sorted. Even so, you need an index to support the sort. You can use a single-field or a compound index.



If there is no suitable index, MongoDB will sort without one. There is a 32MB memory limit on the total size of all documents in a sort operation, and if MongoDB reaches this limit, it will either throw an error or return an empty record set.
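Whether a compound index can serve a given sort can be approximated with a simple rule: the sort keys must form a prefix of the index keys, with directions either all matching or all reversed. The sketch below encodes that rule as a simplification; it ignores the case where equality conditions on leading index keys allow a non-prefix sort. Field and function names are hypothetical.

```python
def index_supports_sort(index_keys, sort_keys):
    """Check whether a sort can be served by a compound index.

    Both arguments are lists of (field, direction) pairs with direction
    1 or -1. Simplified rule: the sort must be a prefix of the index,
    in the same direction or in exactly the reverse direction.
    """
    if not sort_keys or len(sort_keys) > len(index_keys):
        return False
    prefix = index_keys[: len(sort_keys)]
    same = all(i == s for i, s in zip(prefix, sort_keys))
    reverse = all(
        i[0] == s[0] and i[1] == -s[1] for i, s in zip(prefix, sort_keys)
    )
    return same or reverse


index = [("status", 1), ("created", -1)]
print(index_supports_sort(index, [("status", 1), ("created", -1)]))
print(index_supports_sort(index, [("status", 1), ("created", 1)]))
```

With pymongo, an index such as the one above could be created with `collection.create_index([("status", 1), ("created", -1)])`.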



Lookups without index support



Lookups (the $lookup stage) perform a function similar to a JOIN in SQL. To perform well, they need an index on the key value used as the foreign key. This is not obvious, since the use of that index is not reported in explain(). Such indexes are in addition to the index reported in explain(), which in turn is used by the $match and $sort pipeline operators when they occur at the beginning of the pipeline. Indexes can now cover any stage of an aggregation pipeline.



Forgetting to opt in to multi-updates



The db.collection.update() method is used to change part of an existing document or the whole document, up to a complete replacement, depending on the update parameter you specify. What is less obvious is that it will not process all matching documents in the collection unless you set the multi option, which makes it update every document that meets the query criteria.
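The difference can be illustrated with a toy in-memory model of the update semantics. This is not MongoDB code, only a sketch of the behaviour described above, with a hypothetical function name and made-up documents.

```python
def apply_update(docs, query, changes, multi=False):
    """Toy model of update(query, {"$set": changes}, {"multi": multi}).

    Returns the number of documents modified. Without multi, only the
    first matching document is changed, mirroring MongoDB's default.
    """
    modified = 0
    for doc in docs:
        if all(doc.get(key) == value for key, value in query.items()):
            doc.update(changes)
            modified += 1
            if not multi:
                break
    return modified


orders = [
    {"_id": 1, "status": "new"},
    {"_id": 2, "status": "new"},
    {"_id": 3, "status": "done"},
]
print(apply_update(orders, {"status": "new"}, {"status": "seen"}))
print(orders)
```

In pymongo the same distinction is made explicit by two separate methods, `update_one` and `update_many`.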



Forgetting the significance of the order of keys in a hash object



In JSON, an object consists of an unordered collection of zero or more name / value pairs, where the name is a string and the value is a string, number, boolean, null, object, or array.



Unfortunately, BSON attaches great importance to order when searching. In MongoDB, the order of keys within embedded objects matters, i.e. { firstname: "Phil", surname: "factor" } is not the same as { surname: "factor", firstname: "Phil" }. That is, you must preserve the order of the name / value pairs in your documents if you want to be sure of finding them.
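One way to sidestep the ordering problem in queries is to match embedded fields individually with dot notation instead of comparing the whole embedded object. The helper below is a sketch that flattens a nested filter into dot-notation form; the function name and the enclosing field "name" are hypothetical.

```python
def to_dot_notation(filter_doc, prefix=""):
    """Flatten nested filter documents into dot-notation keys.

    Embedded operator documents such as {"$gt": 5} and empty documents
    are left untouched, since they are not field paths.
    """
    flat = {}
    for key, value in filter_doc.items():
        path = f"{prefix}.{key}" if prefix else key
        is_plain_subdocument = (
            isinstance(value, dict)
            and value
            and not any(k.startswith("$") for k in value)
        )
        if is_plain_subdocument:
            flat.update(to_dot_notation(value, path))
        else:
            flat[path] = value
    return flat


query = to_dot_notation({"name": {"surname": "factor", "firstname": "Phil"}})
print(query)
```

A filter of this shape matches each field independently, so it no longer depends on the order in which the embedded keys were stored.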



Confusing "null" and "undefined"



The value "undefined" has never been valid in JSON according to the official JSON standard (ECMA-404, Section 5), even though it is used in JavaScript. Moreover, in BSON it is deprecated and converted to $null, which is not always a good solution. Avoid using "undefined" in MongoDB.



Using $limit() without $sort()



Very often, when you are developing in MongoDB, it is helpful to see just a sample of the result that a query or aggregation will return. $limit() is useful for this, but it should never appear in the final version of the code unless it is preceded by $sort. Otherwise you cannot guarantee the order of the result, and you cannot reliably page through the data. The records you get at the top of the result depend on the sort order. To work reliably, queries and aggregations must be deterministic, that is, they must produce the same results each time they are executed. Code that uses $limit() without $sort will not be deterministic and can later cause errors that are difficult to track down.
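A deterministic "top N" can be sketched as a $sort stage placed immediately before $limit. Adding _id as a tie-breaker is an extra assumption made here so that documents with equal sort values still come back in a stable order; the field name "score" and the helper name are hypothetical.

```python
def top_n_stages(sort_field, n, descending=True):
    """Return $sort and $limit stages that yield a repeatable result.

    _id is appended as a secondary sort key (unless it is already the
    sort field) so that ties are always broken the same way.
    """
    sort_spec = {sort_field: -1 if descending else 1}
    if sort_field != "_id":
        sort_spec["_id"] = 1
    return [{"$sort": sort_spec}, {"$limit": n}]


stages = top_n_stages("score", 10)
print(stages)
```

These two stages can be appended to the end of an existing pipeline list.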



Conclusion



The only way to be disappointed by MongoDB is to compare it directly with another type of database, such as an RDBMS, or to approach it with fixed expectations. It is like comparing an orange to a fork. Database systems serve specific purposes, and it is best simply to understand and appreciate these differences for what they are. It would be a shame to pressure the MongoDB developers down a route that forces them to do things the RDBMS way. I want to see new and exciting ways of solving old problems, such as ensuring data integrity and building data systems that are resilient to failure and to attack by malicious users.



The introduction of ACID transactionality in MongoDB 4.0 is a good example of important improvements being introduced in an innovative way. Multi-document, multi-statement transactions are now atomic. It has also become possible to adjust the time allowed for acquiring locks, to expire hung transactions, and to change the isolation level.




