Hosting Management: A Tactician's View from the Top

We constantly clean IP addresses, deal with equipment supplies, manage the team, set development priorities, and do a bunch of other things inside the hosting company. I want to tell you how it all looks from the position of the chief operating officer.



There are three levels of VDS hosting management. The strategic level is where you choose what kind of product you make, what hardware you buy and from whom, and in which data centers and on what terms you set up; in effect, this shapes the DNA of the hosting. The operational level is routine such as "win back the IP address range that Sony tried to block", "the hardware is stuck at the border again", "the admin is sick, who do we put on the shift" or "a complicated third-line ticket". For example, at the DNA level we decide that support needs to be replaced with development (so that clients can resolve everything themselves from their personal account), and at the tactical level we work out how to do it.



Even a trifle like having cables of different lengths at hand is the result of someone thinking about it in advance.



And there is an intermediate tactical level, which is dealt with by the COO. This is not a solution to a specific formalized problem, but it is not a global strategy either.



The first thing you need is a clear picture of all available resources. For that, you need to know the marketing parameters and the parameters of the financial model. In other words, you need to know the state of the hosting's accounts a week, a month and six months ahead, based on planned payments and planned expenses. That means knowing the rate of hardware failures and repairs, the payroll, the forecast inflow and outflow of customers, and so on. The second thing is managing development: choosing which features to implement and finding time for refactoring.
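As an illustration only, here is a minimal Python sketch of such a forecast. It assumes planned payments and expenses are available as dated amounts. The function name `project_balance`, the data shape and every figure below are hypothetical and do not describe our actual financial model.

```python
from datetime import date, timedelta


def project_balance(opening_balance, planned_items, start, horizons_days=(7, 30, 182)):
    """Return {horizon_in_days: projected balance} from dated planned cash flows.

    planned_items is a list of (date, amount) pairs: positive amounts are
    expected customer payments, negative amounts are planned expenses.
    """
    result = {}
    for days in horizons_days:
        cutoff = start + timedelta(days=days)
        # Sum every planned item that falls between the start date and the horizon.
        flow = sum(amount for when, amount in planned_items if start <= when <= cutoff)
        result[days] = opening_balance + flow
    return result


# Hypothetical figures, for illustration only.
start = date(2024, 1, 1)
items = [
    (date(2024, 1, 3), 500),     # customer renewals
    (date(2024, 1, 20), -800),   # salaries
    (date(2024, 3, 15), -1200),  # server purchase
    (date(2024, 4, 1), 2000),    # quarterly renewals
]
forecast = project_balance(1000, items, start)
```

A real model would also feed in probabilistic inputs such as hardware failure rates and customer churn; the sketch covers only the deterministic part, the planned cash flows.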



So welcome to the "Hoster finally writes about hosting in his corporate blog" column ...



A bad COO starts collecting data when a decision has to be made. A good one already has the data. We make all decisions very quickly because we are automation and report-generation geeks. Every time we need a certain indicator, we do not pull it by hand; we write a script that builds a report on it, or we add the indicator to an existing report. Having information about everything in the company is one of the most important things in business management.



▍ IP cleaning



IP addresses are assigned to a client statically, once and for all. By the way, this is why promotional plans for 30 rubles are popular (when they are available): the package includes an IP address, and the plan costs less than a separate address allocation service. At the same time, if a client "dirties" his address somewhere, we do not swap it with a snap of our fingers. So two questions come up regularly:



  1. What to do if a complaint comes in about a client.
  2. What to do if a leased IP address previously belonged to a spammer or the like.


Blocking comes in different forms. The first thing a hoster runs into is Spamhaus. In principle, this organization is supposed to fight spam worldwide. In practice, it independently monitors spammers, botnets and, in general, anything prohibited that can be detected automatically by fairly simple algorithms. Its main function is to block an address when it sees spam or a DDoS attack coming from it. But the block does not happen immediately. First, it sends a notice to the company that owns the addresses. These are large companies that buy IP addresses.

They do not buy them for themselves, but to lease them to smaller companies. We rent our address pools from such companies. As a result, the complaint travels down the chain to us. The letter comes from the company that leases us the large segment, and we need to write to Spamhaus that measures have been taken. Spamhaus is rarely wrong, so the "measures" usually mean investigating and blocking. By the way, based on their letters we periodically update the list of prohibited software in the contract, so that there is no room for doubt. We keep the list of prohibited software secret so that it is harder to circumvent.



If we receive an address that is already in the Spamhaus database, it is usually removed within 24 hours of our letter. But in the latest lease, for example, there was a blacklisted subnet that took us almost a week to clear. If such an address reaches a client, not everything on the Internet will work well for him. It will not be cut off entirely, but at the very least various heisenbugs will show up.
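Such a check can be automated before an address is handed to a client. Below is a minimal sketch, assuming the standard DNSBL lookup convention (reverse the octets, append the list's zone, resolve the name) and Spamhaus's public `zen.spamhaus.org` zone. The function names are hypothetical, and this is not a description of our internal tooling.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="zen.spamhaus.org"):
    """Return True if the DNSBL answers with a listing code for this address."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No such record: the address is not listed (or the lookup failed).
        return False
    # Listings come back as 127.0.0.x; to my knowledge Spamhaus reserves
    # 127.255.255.x for query errors, so those are not treated as listings.
    return answer.startswith("127.") and not answer.startswith("127.255.255.")


# Example query name for a documentation-range address (no network needed).
example_name = dnsbl_query_name("192.0.2.10")
```

Note that queries sent through large public resolvers may be refused by the list operator, so a check like this should run through your own resolver.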



The second key player in the Russian jurisdiction is Roskomnadzor (RKN). In practice, though, ending up on its blocklists is rare. A notification arrives saying that such-and-such malicious activity is taking place at such-and-such address. We must forward this notification to the person responsible for the activity. There are 24 hours to fix it. After that, the owner either eliminates the violation, or RKN puts the address on the blacklist. The way the blacklist works has changed several times, but one way or another it always comes down to automatically sending addresses to all telecom operators. It is simply a pool of addresses loaded onto the switching equipment in the form of rules. At the time, this system became the largest impromptu "advertising campaign" for VPN use in Russia.



If a client does not respond to RKN and does not stop the malicious activity, we block that client on our side. In practice, there has not been a single case of blocking: the client always did everything right. There was one case when we received a leased address with a block already applied (usually the lessor clears his subnets himself before handing them to another client). We had to write a letter about the lease to RKN along the lines of "there is no such activity there, please remove it". It was removed fairly quickly.



▍ Development Aspects



We are trying to replace support with development. The first line of support essentially performs the function of a script: it reads tickets, parses them, and either passes them to the second line or does something that automation could do. At the very beginning of the hosting, when we first added the ability to quickly change machine parameters (disk, memory, stop-start) from the personal account, it relieved support considerably. Now we want to get rid of the first line altogether by making sure technical issues are resolved with buttons in the personal account.
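To make the "first line as a script" idea concrete, here is a deliberately naive sketch of keyword-based ticket triage. The keyword table, the action names and the `triage` function are all hypothetical; a real system would need far more careful parsing than substring matching.

```python
# Hypothetical mapping from phrases in a ticket to self-service actions.
AUTOMATABLE = {
    "reboot": "restart_vm",
    "restart": "restart_vm",
    "more disk": "resize_disk",
    "more memory": "resize_memory",
}


def triage(ticket_text):
    """Return ("automate", action) or ("escalate", "second_line") for a ticket."""
    text = ticket_text.lower()
    for keyword, action in AUTOMATABLE.items():
        if keyword in text:
            # Something a button in the personal account could do.
            return ("automate", action)
    # Anything unrecognized goes to a human on the second line.
    return ("escalate", "second_line")
```

Every rule that lands in the "automate" branch is a candidate for a button in the personal account, which is the point of replacing the first line with development.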



We are talking about the first line. Complex tickets and tickets related to organization and customer relationships will certainly remain.



Based on this model, we treat development both as a means of growing the hosting (new features) and as a means of offloading support (more and more automation). Both functions are business critical, which means they require minimal communication losses. That is why we have a flat hierarchy in development. There are no team leads, no "cell leaders" of the "I'm a senior herding two middles and a junior" kind, and no managerial roles within development. Management functions are taken on directly by the head of the hosting. Clearly, once we grow severalfold, a dedicated CTO will most likely be needed, but for now the technical state of the whole company should be the director's responsibility: this is critical for efficiency.



▍ A few words about marketing



Hosting is not our first business. I can say that in marketing, the main difference between hosting and other service businesses is that IT specialists buy rationally. Traditional advertising mechanisms do not work, completely different criteria for judging value for money come into play, and people understand the importance of stability and support. As a result, many people have a favorite hosting provider, and it is quite hard to "transplant" an IT specialist to a new one (ours). This "if it works, don't touch it" attitude creates a healthy conservatism. Yes, our clients are not only IT specialists (especially in recent years), but in many ways we still sell to rational people.



▍ Aspects of system administration



At first, we had team leads among the admins, since many operations had to be done manually. As the hosting became automated, the need for them gradually disappeared, and the same peer-to-peer structure formed as in development. Then it turned out that every time the developers build something for internal automation, some of the admins' resources are freed up, that is, their life gets easier overall. As this trend developed, the development department effectively became the head of the administration department. And it automated most of the administrator's functions.



This gave us one more important thing: ordinary administrators no longer have direct access to the hardware, but work through our internal control panel. This also made it possible to solve many security issues. As I said, we are not interested in which guest OS is installed on a machine, and we do not see what is on it. Since we did not want incidents involving errors or leaks, we put a lot of care into logging all admin actions, into automatic checks of whether those actions are appropriate, and into reports on them. And into access control, of course. Several years ago, one of the traditional vulnerabilities of hosting, where human administrators have unrestricted access to host machines, was eliminated.



We never liked the very technical possibility of getting into user data, and even though it is a given and a market standard, we moved away from it. To gain access to a particular host machine, the administrator asks the system for a password. There are several restrictions: for example, on the number of machines over a period of time, on the actions performed on those machines, and so on. We minimized actions through the console (in effect, this remained the prerogative of developers) and protected business processes from unauthorized access by employees.
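The "number of machines over a period of time" restriction can be sketched as a sliding-window limiter with an audit trail. This is an illustrative guess at the general shape, not our actual control panel: the class name `HostAccessLimiter`, its parameters and the log format are hypothetical, and the sketch only decides whether access may be granted; it does not issue passwords.

```python
import time
from collections import defaultdict, deque


class HostAccessLimiter:
    """Limit how many distinct hosts one admin may access within a time window."""

    def __init__(self, max_hosts, window_seconds):
        self.max_hosts = max_hosts
        self.window_seconds = window_seconds
        self._grants = defaultdict(deque)  # admin -> deque of (timestamp, host)
        self.audit_log = []                # every request, granted or not

    def request_access(self, admin, host, now=None):
        now = time.time() if now is None else now
        grants = self._grants[admin]
        # Drop grants that have fallen out of the window.
        while grants and now - grants[0][0] >= self.window_seconds:
            grants.popleft()
        hosts_in_window = {h for _, h in grants}
        # Re-accessing a host already granted in the window does not count twice.
        allowed = host in hosts_in_window or len(hosts_in_window) < self.max_hosts
        if allowed:
            grants.append((now, host))
        self.audit_log.append((now, admin, host, allowed))
        return allowed
```

Logging denied requests alongside granted ones matters here: an unusual burst of denials is exactly the kind of signal that automatic appropriateness checks and reports can pick up.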



As a result, qualified admins focus on hardware and infrastructure issues, while day-to-day admin functions are in effect performed by support. The very same support that is gradually being replaced by scripts.



The result of this approach is a serious saving in the number of people needed in the data center, which lets us keep prices low while improving the quality of the hosting. That is, we are not in the lowest segment of the market, but we have found the level where the price-to-quality ratio appeals strongly to customers.



In addition, on the cheapest plans we do not provide some kinds of admin support (for example, we do not install Apache, but give a link to it in our internal marketplace along with instructions), while on expensive projects we can provide personal support. As a rule, a client in that situation already understands perfectly well what he is doing: the last two times, the administrator was needed to select locations, hardware and configurations for the task and then run tests. All the same, every month we hear from a Chinese client who writes a ticket in broken Russian saying that some Chinese application will not start for him.



▍ Managing people



Any large IT system is about managing chaos with blurred risks. No element of such a system is completely reliable; sooner or later they all fail. The most striking example is components that have to be replaced regularly without downtime. Yet the system as a whole can remain quite reliable if it has mechanisms for self-healing and redundancy. This may sound seditious, but from a management point of view people are also points of failure and therefore also require redundancy.



And this requires redesigning the architecture of departments and the approach to hiring in general. Some time ago, we looked for universal people who could solve any hosting problem single-handedly. But from the point of view of business continuity, that is wrong. It was hard when such a person was sick or went on vacation. Nothing worked without him, especially in support and administration. So we hired an additional shift worker who normally works a 5/2 schedule and temporarily switches to the shift schedule when someone is on vacation or leaves. As a result, every department always has one more person than strictly necessary, or the functions of this "additional person" are spread across several other people. If someone gets sick, not a single process or task will stop. They may slow down, but they will not stop. It sounds a little strange for a company that works in the lower price segment, but we ran the numbers over long periods, and it turned out that redundant specialists cost less overall (in their systemic effect on the company) than their absence.



What do you do with positions that require very unusual people? For example, a marketing executive? Here we also got burned a couple of times. At first, we thought we should take a person from a similar field, ideally with experience in promoting hosting. In principle, the approach is reasonable. The person does not need to be introduced to the subject. He sees what needs to be done. Perhaps he already has some experience that he can multiply by ours and get something more in the end. It all ended sadly, because we did not take into account that if such a person then leaves for a competitor, he copies our business processes there. We eventually concluded that all creative tasks should go through the same kind of automation as administration. That is, each business process must be described and covered at least by instructions, or better still by instructions, examples and tools for working with it. The point is that everything that works should be something another person can pick up almost seamlessly.



It sounds like an attempt to eliminate irreplaceable employees and turn everything into cookie-cutter work, but in fact a different effect is achieved. Clearly, a good specialist can only be replaced by another good specialist, and in the process some of the company's overall competencies will be lost and some new ones will appear. But the most important part is not letting anyone get bogged down in routine. Routine is for robots. If more than half of your working time goes to something you have already done before, or to something that could in principle be automated or delegated to someone whose time costs less, that is a pain for the company. To grow, you need to do the things that give the maximum effect, and for that you need to keep shedding routine, first to the level below you and then to automation.



Therefore, by the way, we have a very low turnover. Perhaps because there is no immersion in the routine and there is constant progress.



▍ Selection of specialists



Salaries in IT are a constantly growing function. They are very high and are probably now at their maximum for the entire existence of the industry. Now imagine the situation: you have a developer on your team who earns 100 notional "dead raccoons". A developer at another company earns the same 100 raccoons. This is the market standard. To lure a developer from another company, you need to offer him better conditions. In the end, that means either 110 raccoons plus some non-monetary perks, or, say, 120 raccoons. The process repeats over several iterations, and salaries level out according to the market's need for new people and the expectations of the richest companies. This is the normal process of balancing supply and demand.



For a long time we used the services of HR agencies and paid them fees for finding staff. Who didn't we end up with! In recent years this has become very expensive, and you still get a pig in a poke. In general, we are fine with the level of market salaries; we just want to find a person who will stay with the company for more than a year and who has no fundamental gaps in education. It is expensive to retrain a person for a month or two only for him to leave within a year.



As a result, we concluded that it is much more practical to grow our own specialists who are already familiar with hosting. For example, a student from a technical university joins as an admin. If he wants to, we take him on as an intern in development. Yes, it takes him two years to grow, but what you get is a person who clearly understands what works and how. He knows all the problems of the hosting. Now watch closely: for the same market salary, we get a person who is much less likely to change jobs (since he has no incentive to) and who understands the product much more deeply.



Another principle of ours is not to hire people into senior positions without a trial or an internship in a simpler role.



We also have collegial management. There are companies with totalitarian rules of governance. They are very effective in crises and very effective while they have up to 50 people, but after that they start losing people and market efficiency. At that point, they either choose a structure with strong department heads or introduce some kind of democratic process. We have collegial management. As a person who is good at growing small projects into large ones, I find it infuriating at times on a purely emotional level, but it is a good business model.



For example, from my understanding of the market I know that we need some new feature. I can't walk into development and say, "We're doing it like this." I need to sell the idea to my colleagues. It is annoying, really, when you understand from the very start that something needs to be done, yet you still have to explain to others why. It is also a test of your ability to explain: it may turn out that the failure happened exactly there. Nevertheless, if you skip this step, the team will not stay healthy.



When there are 100–150 of us, these principles will clearly no longer apply, but for now they do.



▍ Aspects of equipment supply



The supply of equipment is well automated and covered by typical processes, but only up to a certain limit. Then all this starts to fail on specific orders, and each delivery still needs to be monitored manually.



The general principle is this: we order equipment to the office in Moscow and, after a short setup, deliver it to MSK-IX, Ostankino and Korolev. Equipment for Russia is also ordered to Moscow. We prepare it in advance; at the destination data center, it only needs to be mounted in the rack and connected to the network and power. Delivery to the data center is handled by DHL or CDEK, with specially trained loaders. Sometimes a vendor or distributor sends the equipment straight to the Russian Federation on their own and sends the documents to us. In any case, the final configuration is done from the office once the hardware has entered our administration loop.



In Europe, we now buy in Amsterdam and ship to all countries from there. At first glance this is not always the cheapest or the fastest option, but, as it turned out, it gives the best systemic effect. Once, we found much lower prices in the Russian Federation and wanted to take one specific server across the border, because even with customs operations it worked out much faster and cheaper. It turned out that things are not so simple: to export hardware, you need permission from the FSB. We turned to logistics specialists for this. They made mistakes in the paperwork (or decided to cut a corner; there is no telling now). In any case, special forces surrounded the vehicle right at the border and put the driver face down on the ground. Because you cannot take server equipment out of the country without notifying the FSB. And they, apparently, had not notified anyone. The server sat for a year as physical evidence, then we had to pay a fine equal to the server's declared value. Since then, we have understood that moving servers back and forth across the border is not worth it. We buy in Europe no matter what.



Within Europe, there are almost no delivery issues: servers for Frankfurt, Zurich, London and so on travel from Amsterdam. The "almost" is Switzerland, which is a different customs area. While a delivery from Amsterdam to Frankfurt simply goes from one region to another, Switzerland requires a separate customs broker. First, he gets the export tax refunded, then we pay VAT again in Switzerland. But this refund-and-pay cycle takes about a month. At one point, the servers elsewhere in Europe were already installed and filled with clients while the Swiss ones were still on their way to the data center. Even so, it is still cheaper to buy at a single point.



▍ In general



If you know how to manage the selection of people, know all your resources and understand exactly how to implement the company's strategy, that is ideal operational management. Obviously, the real world is nowhere near that. In some ideal state, there are instructions for everything, everything is clear and predictable, and routine is automated. But many things have to be done by hand before the processes can be described. I remember hauling the hardware myself, our correspondence about the first IP address cleanup, and going to our first negotiations (a couple of years later, negotiations were abolished altogether). Back then it was not at all clear that we had to head toward exactly the state the company is in today. So there is another important question: what exactly you are doing and how, that is, the strategic level. There are no tactics without a good goal.







