The Apache Software Foundation has released Apache Hadoop 3.3.0



The Apache Software Foundation has released a new version of its platform, Apache Hadoop 3.3.0. About a year and a half has passed since the previous update. The platform is a tool for organizing distributed processing of large volumes of data using the MapReduce model. Hadoop includes a set of utilities, libraries, and frameworks for developing and running distributed programs on clusters of thousands of nodes.
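For readers new to the MapReduce model, here is a minimal single-process sketch in Python. It only illustrates the map, shuffle, and reduce steps on a word count. It does not use Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster each document (split) would be mapped on a different node.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

On a cluster the map and reduce calls run in parallel on many nodes, and the framework handles the grouping step and the movement of data between them.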



Hadoop has its own specialized file system, the Hadoop Distributed File System (HDFS), which provides data redundancy and is optimized for MapReduce applications. HDFS is designed for storing large files distributed across the individual nodes of a compute cluster. Thanks to these capabilities, Hadoop is used by some of the largest companies and organizations. Google even granted Hadoop the right to use technologies covered by its patents on the MapReduce method.



So, meet Apache Hadoop 3.3.0.





Here is a list of the most important changes in the new version:



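  • Support for ARM-based platforms (by the way, Selectel offers ARM servers if you want to try it).
  • Protobuf (Protocol Buffers) has been updated to version 3.7.1.
  • S3A improvements: Delegation Token support, better handling of 404 responses, and S3Guard enhancements.
  • ABFS enhancements.
  • Java 11 support.
  • Support for the Tencent Cloud COS file system, for accessing COS object storage.
  • DNS Resolution support, which lets clients discover servers through DNS.
  • An application catalog for YARN (Yet Another Resource Negotiator).
  • Support for scheduling OPPORTUNISTIC containers through the Resource Manager.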


Because Hadoop is under active development, the market for solutions based on it is growing rapidly. In 2019 the market was worth about $1.7 billion; according to experts, it will reach $9.4 billion by 2024.



Hadoop now ranks first among Apache repositories by number of changes. The platform's codebase is about 4 million lines. The largest Hadoop-based data stores are run by Netflix, Twitter, and Facebook.


