Building a Dream Company: No Hype

Surely, more than once your company has been visited by smooth-talking guys in expensive suits who tell captivating stories about how the company won't survive even a few more years without the latest IT trends!



Data lakes (data swamps), CDWs (corporate data warehouses, or rather corporate data graveyards), data mining (careful not to dig yourself into a hole), data governance (become a slave to your data) and the like keep cropping up in their stories, periodically taking turns. Each hype rarely lives longer than a year or two, but if you ask, they will gladly dig up some half-forgotten technology for you.



Meanwhile, big data is sold as a magic chest from which you can pull all sorts of wonders: a flying carpet, seven-league boots, or even the Shamakhan Queen (for those who are into that). But as a rule, the flying carpet gets eaten by a magic moth and no longer flies, the soles of the boots fall off and walking in them is uncomfortable, and the less said about the aging queen, the better.



In this article I will try to talk about the good old technologies that still work, about what can be learned from the hyped technologies above, and about how mere mortals like us can use all of this without hiring a crowd of data scientologists on salaries of over $10,000 a month.








This article continues the series:

Building a Dream Company: Master Data and Integration

Building a Dream Company: Managing Data Quality



Contents



1. Big data: problem statement

2. Master data: an immortal classic

3. How to store data: do you need a CDW?

4. Normalization, or why would you need data swamps?

5. Why does a data scientist earn more than an analyst and do less?

6. Data bus vs microservices

7. How not to fall for the hype at all?



1. Big data: problem statement



The role of big data in the development of modern civilization is impressive. But not for the reason you might think.



If the Internet reached every village and every phone thanks to porn and social networks (and messengers), then big data has handed trillions of dollars to manufacturers of hard drives and RAM.



The problem is that the real benefit of modern big data (in the broadest sense of the term) for all of humanity is roughly the same as the benefit of pornography, that is, with a few exceptions... zero!



How so, you may ask? After all, any consultant or salesperson can cite a dozen examples, from General Electric's diagnostics of aircraft engine condition to Google's targeted advertising!



To be more precise, the problem is the repeatability of results. Here's a secret: big data salespeople have a short bench. Ask them for a few more examples, and the list runs out before it reaches twenty. I'm sure they could name far more messengers and porn sites :) simply because there are physically more of them.



Of course, data scientists do produce results; they just rarely satisfy the customer. After a year of work and several million spent on hardware and salaries, they deliver utterly trivial conclusions and patterns that are obvious to any line manager or domain expert. For example, that the best-selling product should be placed at eye level.



As for General Electric, it built its competitive advantage on methods of calculus and statistics that can be found in any university mathematics course. The concept of big data didn't even exist back then.



But you can't build hype on calculus, and top managers are unlikely to get excited about the two-hundred-year-old methods of Fourier and Cauchy. It's all boring, you have to think a lot, and there is definitely no silver bullet or magic pill.



What to do? Work! Long, boring, dreary work, while trying to create an atmosphere that encourages active thinking, as in the canonical examples of Bell Labs or that same GE. It is entirely feasible, and the most ordinary people, like you and me, are capable of it if they are motivated the right way.



And you need to start with ...



2. Master data: an immortal classic



Master data management is an approach to structuring a company's information. If at some point you find that an entity is used in two or more of your company's systems at once (for example, the list of employees on the intranet, in the 1C:Accounting database, and in the CRM system), you should move it into a separate master data management (MDM) system and make all other systems use only this directory. Along the way, all parties will have to agree on the required fields and attributes and come up with a set of rules for controlling the quality of this data.
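To make the idea concrete, here is a minimal Python sketch of a single master directory for employees. The `Employee` entity, its fields, and the two quality rules are hypothetical and illustrative; the sketch is not modeled on any particular MDM product. The only point it shows is that records are created and validated in one place, while consuming systems store nothing but the master key.

```python
import re
from dataclasses import dataclass

# Hypothetical "Employee" entity: fields and rules are illustrative assumptions.
@dataclass(frozen=True)
class Employee:
    employee_id: str   # the single master key that every other system references
    full_name: str
    email: str
    department: str

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate(emp: Employee) -> list[str]:
    """Return the list of data-quality violations (empty if the record is clean)."""
    errors = []
    if not emp.employee_id.strip():
        errors.append("employee_id is empty")
    if not emp.full_name.strip():
        errors.append("full_name is empty")
    if not EMAIL_RE.match(emp.email):
        errors.append(f"invalid email: {emp.email!r}")
    return errors

class EmployeeDirectory:
    """The one place where employee records are created and changed."""

    def __init__(self) -> None:
        self._records: dict[str, Employee] = {}

    def upsert(self, emp: Employee) -> None:
        # Reject records that break the agreed quality rules.
        errors = validate(emp)
        if errors:
            raise ValueError("; ".join(errors))
        self._records[emp.employee_id] = emp

    def get(self, employee_id: str) -> Employee:
        return self._records[employee_id]

# Consuming systems (intranet, accounting, CRM) keep only the key
# and look everything else up in the master directory.
directory = EmployeeDirectory()
directory.upsert(Employee("E-001", "Ivan Petrov", "ivan.petrov@example.com", "Sales"))
crm_deal = {"deal": "D-42", "owner_id": "E-001"}  # hypothetical CRM record
print(directory.get(crm_deal["owner_id"]).full_name)
```

The hard part in real life is not this code but the negotiation it encodes: which fields are required and which rules `validate` enforces are exactly what all system owners have to agree on.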



Among data scientists under 30 there is a belief that the window for MDM adoption opened around 2008 and closed somewhere between 2012 and 2015, and that since then so many new tools have appeared (all sorts of Hadoop and Spark) that you no longer need to bother with master data: no need to go and negotiate with the owners of every system, or to think through the consequences of choosing an MDM architecture and every specific attribute in every directory.



Unfortunately for them and fortunately for you, this window has not closed. MDM systems are still as relevant as accounting or customer relationship management systems. And you still have to think and negotiate.



3. How to store data: do you need a CDW?



No, you don't need corporate data graveyards.



The idea that, for analytical purposes, you need specially prepared datasets of all the data in your company (CDW ideologists not only put the word "all" in bold but double-underline it) is absurd. The actual utilization of this data is minimal: 99% of it is never used.



However, the idea of prepared datasets is fine in itself. They just need to be prepared right before they are actually used, not earlier. And, of course, you need a working methodology for preparing them.



4. Normalization, or why would you need data swamps?



This section is about the "data lake", also known as the "data swamp". Legend has it that you can dump all your data indiscriminately into one big heap: no need to convert it to a single format, no need to normalize or clean it!



And that there is special software that can draw useful conclusions from this data dump and produce the patterns you need, like a magician pulling cards out of his sleeve.



In practice, the most "valuable" conclusion you can draw from a data lake is that your company barely works during the January holidays.



The main question is how some crooks managed to convince anyone at all that this approach works. I lean toward hypnosis :)
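A small illustration of why skipping normalization hurts even the simplest analysis: below, hypothetical exports of the same customers from a CRM and a billing system are counted as-is and after normalization. The records, the field names, and the phone rule (treating a leading 8 as equivalent to +7, as in Russian numbering) are assumptions made for this sketch.

```python
import re

# Hypothetical raw exports of the same customers from two systems.
crm_rows = [
    {"name": "OOO Romashka", "phone": "+7 (495) 123-45-67"},
    {"name": "IP Ivanov", "phone": "8 916 000 11 22"},
]
billing_rows = [
    {"name": "ooo romashka ", "phone": "84951234567"},
]

def normalize_phone(raw: str) -> str:
    digits = re.sub(r"\D", "", raw)  # keep digits only
    # Assumption: Russian numbering, where a leading 8 means the same as +7.
    if len(digits) == 11 and digits.startswith("8"):
        digits = "7" + digits[1:]
    return digits

def normalize_name(raw: str) -> str:
    # Case-fold and collapse whitespace.
    return " ".join(raw.lower().split())

raw = crm_rows + billing_rows
naive_count = len({(r["name"], r["phone"]) for r in raw})
clean_count = len({(normalize_name(r["name"]), normalize_phone(r["phone"])) for r in raw})
print(naive_count, clean_count)  # 3 2
```

Dumped "as is", the lake reports three customers; after normalization there are two. No clever software downstream can recover that fact without someone first deciding what counts as the same customer.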



5. Why does a data scientist earn more than an analyst and do less?



Marketing, skillful self-presentation, maximum self-confidence. I don't rule out hypnosis either :)



6. Data bus vs microservices



My favorite example of misused technology. In any sufficiently large company, a data bus appears at a certain stage of growth. It isn't necessarily a single one or built "by the book", but the function itself gets implemented successfully. You can read a more detailed and systematic account of this approach in the previous article.



As an alternative, young and fast-growing companies are offered microservices or sets of open APIs, a different one for each system in use.



Yes, microservices are very handy when you are building a single product that others integrate with. They are usually fairly easy to write and test, and require no coordination with other teams during development. That's why both developers and managers love them.



Practice shows that any two systems integrate perfectly through microservices. Any three, well. Any five, tolerably, if you document everything very carefully and cover it with automated tests.



By ten systems, an architecture that looked great at the start turns into a tangled web in which some data flows break and stay broken for months.






With several dozen systems (the number only seems large; any enterprise runs far more information systems than that), the approach collapses under its own weight. A few years later, some form of centralization and a bus appears anyway. As a rule, it is built by other people.
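The arithmetic behind this is simple. Assuming the worst case, in which every pair of n systems exchanges data directly, the number of point-to-point integrations is n(n-1)/2, whereas with a bus each system needs only one adapter, i.e. n connections. A quick sketch (real landscapes usually sit somewhere between the two extremes):

```python
def point_to_point_links(n: int) -> int:
    """Worst case: every pair of systems has its own integration."""
    return n * (n - 1) // 2

def bus_links(n: int) -> int:
    """Each system has exactly one adapter to the shared bus."""
    return n

for n in (2, 3, 5, 10, 30):
    print(f"{n:>3} systems: point-to-point {point_to_point_links(n):>4}, bus {bus_links(n):>3}")
```

Up to three systems the two approaches cost about the same; at ten it is 45 integrations against 10, and at thirty, 435 against 30, each of which has to be documented, tested, and kept alive.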



7. How not to fall for the hype at all?



You have seen several examples of hype around approaches and technologies that may turn out to be useless. And that is on top of the fact that, according to global statistics, the share of IT development and implementation projects completed successfully rarely exceeds 40%.



The aftertaste of failed or useless projects can be so bad that the company temporarily abandons IT initiatives altogether, until some other influential manager jumps on the next hype bandwagon.



To avoid falling for the hype, check the following before your next implementation:



- the technology has a deep "bench": there should be more than a couple dozen examples of successful use, and none of them should leave the impression that "some kind of magic is happening here";

- the technology must pass the "grandmother test" (its essence must be explained so clearly that even your grandmother can grasp it; again, no magic);

- the technology should come with a specific, quantified list of benefits your company will get from it. Implementers of MDM, CRM, or that same 1C:Accounting can talk for hours about the benefits of their solution using your specific tasks as examples. Big data implementers, speaking "in general", start explaining that first we will collect a pile of data and then we'll see what to do with it;

- and, finally, the technology must be falsifiable (in the sense of Popper's criterion), i.e. the implementer must clearly understand its scope and applicability, and be able to argue against(!) implementing it. There's no need to hammer nails with a microscope: for example, if you have few clients, do you really need a super-duper CRM?



By and large, this is enough to keep working and not be distracted by hype.



Can you suggest other criteria?

Join the discussion!


