AWS re:Invent 2020 Keynotes: Analytics and Networking

Here is another batch of announcements and new products from AWS re:Invent 2020, the annual large-scale cloud conference, this time in analytics and network infrastructure. Many of the features are already available in preview; see below for which ones. AWS architects will discuss the new items in more detail in the Russian-language Twitch stream they hold regularly on re:Invent days. The link to the stream is at the end of the article.







Analytics



AWS Lake Formation New Features (Preview)



New AWS Lake Formation features such as transactions, row-level security, and performance improvements are available for preview. The functionality works through new, open and public APIs for updating and accessing data lakes.



Transactions are implemented using "governed tables", a new type of Amazon S3 table that supports ACID transactions. Transactions simplify ETL scripts and let multiple users reliably add, delete, and modify records in different governed tables at the same time.



AWS Lake Formation automatically compacts and optimizes governed table storage in the background to improve query performance.



More details here



Redshift



RA3.xlplus nodes and additional announcements for Amazon Redshift



RA3.xlplus is the third and smallest node type in the RA3 family. RA3 nodes let you scale compute and storage separately, and the new node type expands the compute choices for Amazon Redshift clusters.







More details here



Ability to move a cluster between Availability Zones (AZ)



The cluster relocation feature moves a cluster to another AZ in one step, with no changes to the application. The relocated cluster keeps the same endpoint, so applications can continue to run unchanged. The feature is free and available for RA3 clusters.



More details here



Automatic table optimization



Automatic Table Optimization constantly monitors how queries interact with tables and uses machine learning to select the best sort and distribution keys to optimize query performance across the cluster.



More details here



Sharing Data Between Amazon Redshift Clusters (Preview)



A new data sharing feature in Amazon Redshift is available in preview. It lets you securely and easily share data between Redshift clusters in real time. Data sharing simplifies data processing, increases productivity, and reduces costs: everything you are used to when working with data within a single Redshift cluster is now available across multiple clusters.



Because the RA3 family uses managed storage that is separate from the compute nodes, multiple clusters get instant, high-performance access to the data without having to copy or move it. Stale reads are also ruled out: all clusters work on a single, always up-to-date copy of the data with all the latest changes. There is no additional cost to share data across Amazon Redshift clusters.
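As a rough illustration of the producer/consumer flow described above, the sketch below assembles the SQL statements a producer cluster and a consumer cluster might run. The statement forms follow the Redshift data sharing SQL as we understand it (the preview syntax may differ), and the share, schema, table, and database names as well as the namespace placeholders are hypothetical.

```python
def producer_statements(share, schema, tables, consumer_namespace):
    """SQL for the cluster that owns the data (run on the producer)."""
    stmts = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
    ]
    # Expose individual tables through the share; no data is copied.
    stmts += [f"ALTER DATASHARE {share} ADD TABLE {schema}.{t};" for t in tables]
    stmts.append(
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';"
    )
    return stmts


def consumer_statements(share, local_db, producer_namespace):
    """SQL for the cluster that reads the shared data (run on the consumer)."""
    return [
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';"
    ]


if __name__ == "__main__":
    # Placeholder namespace identifiers; real ones are cluster-specific.
    for sql in producer_statements(
        "sales_share", "public", ["orders"], "<consumer-namespace>"
    ):
        print(sql)
    for sql in consumer_statements("sales_share", "sales_db", "<producer-namespace>"):
        print(sql)
```

Once the consumer has created the database, shared tables would be queried with three-part names such as sales_db.public.orders (again, a hypothetical name).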





More details here



Amazon Redshift Federated Queries for Amazon RDS for MySQL and Amazon Aurora MySQL (Preview)



Amazon Redshift Federated Queries let you include data from transactional databases in BI and reporting applications for operational analytics. The Amazon Redshift optimizer pushes down and distributes part of the computation to the remote databases, which reduces network traffic and improves performance. Today, we are extending federated query capabilities to Amazon RDS for MySQL and Amazon Aurora MySQL. The feature is available in preview.



Built-in JSON Support (Preview)



Today, we are introducing native support for JSON and semi-structured data in Amazon Redshift, available in preview. A new data type, SUPER, stores semi-structured data in Redshift tables. Support for the PartiQL query language has also been added for querying and processing such data.
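To give a feel for the kind of access PartiQL offers on a SUPER column, here is a small local emulation of dot-and-bracket path navigation over a parsed JSON document. It does not talk to Redshift. The SQL in the comment is an assumed usage with hypothetical table and column names, and returning None for a missing step is a simplification chosen for this sketch.

```python
import json
import re

# Assumed Redshift usage (hypothetical table and column names):
#   CREATE TABLE events (id INT, payload SUPER);
#   INSERT INTO events VALUES (1, JSON_PARSE('{"customer": {...}}'));
#   SELECT payload.customer.address.city, payload.products[0].sku FROM events;

# A path step is either an attribute name or an array index like [0].
_STEP = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")


def navigate(doc, path):
    """Follow a dotted/bracketed path; return None if any step is missing."""
    current = doc
    for name, index in _STEP.findall(path):
        if name:
            if not isinstance(current, dict) or name not in current:
                return None
            current = current[name]
        else:
            i = int(index)
            if not isinstance(current, list) or i >= len(current):
                return None
            current = current[i]
    return current


if __name__ == "__main__":
    payload = json.loads(
        '{"customer": {"name": "Ann", "address": {"city": "Riga"}},'
        ' "products": [{"sku": "A-1", "qty": 2}]}'
    )
    print(navigate(payload, "customer.address.city"))
    print(navigate(payload, "products[0].sku"))
    print(navigate(payload, "customer.phone"))
```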



More details here



Amazon EMR Studio (Preview)

Amazon EMR Studio, a Jupyter-based IDE, has been announced. It supports authentication with enterprise SSO providers and enables analysts and data engineers to develop analytical applications and data processing systems in R, Python, Scala, and PySpark. The Spark UI and YARN Timeline Service are also available to make debugging easier. EMR Studio notebooks can run on existing EMR clusters or launch new ones using ready-made CloudFormation templates for EMR.



Details here



Amazon EMR on Amazon EKS







With the new EMR deployment option (Amazon EMR on Amazon EKS), customers can automate the provisioning and management of open-source big data frameworks on Amazon EKS. Customers can now run Spark applications alongside other types of applications in the same EKS cluster, improving resource utilization and simplifying infrastructure management.



Amazon EMR automatically packages your application into a big data container and provides out-of-the-box connectors for integration with other AWS services. EMR then deploys the application to the EKS cluster and manages logging and monitoring. With EMR on EKS, you can get 3x the performance using the performance-optimized Spark runtime included in EMR versus standard Apache Spark on EKS.
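For readers who want to see what submitting a Spark job to EMR on EKS roughly looks like, the sketch below builds the parameters for a StartJobRun request. The field names follow the EMR on EKS API as we understand it. The virtual cluster ID, role ARN, S3 path, and release label are placeholder assumptions, and the actual call is shown only in a comment so the snippet runs without AWS credentials.

```python
def build_job_run_request(name, virtual_cluster_id, role_arn, entry_point,
                          args=(), spark_conf=None,
                          release_label="emr-6.2.0-latest"):
    """Assemble the parameters for an EMR on EKS StartJobRun call."""
    conf = spark_conf or {}
    # Spark settings are passed as a single spark-submit style string.
    submit_params = " ".join(
        f"--conf {key}={value}" for key, value in sorted(conf.items())
    )
    return {
        "name": name,
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": role_arn,
        "releaseLabel": release_label,
        "jobDriver": {
            "sparkSubmitJobDriver": {
                "entryPoint": entry_point,
                "entryPointArguments": list(args),
                "sparkSubmitParameters": submit_params,
            }
        },
    }


if __name__ == "__main__":
    request = build_job_run_request(
        name="nightly-aggregation",                      # hypothetical job name
        virtual_cluster_id="<virtual-cluster-id>",       # placeholder
        role_arn="<job-execution-role-arn>",             # placeholder
        entry_point="s3://<bucket>/jobs/aggregate.py",   # placeholder path
        args=["--date", "2020-12-10"],
        spark_conf={"spark.executor.instances": 2, "spark.executor.memory": "2G"},
    )
    print(request)
    # With boto3 installed and credentials configured, the request could be
    # submitted like this:
    #   import boto3
    #   boto3.client("emr-containers").start_job_run(**request)
```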



More details here



Networking



VPC Reachability Analyzer



The new VPC Reachability Analyzer service lets you diagnose network reachability between two endpoints without sending any network packets. The service reads the configuration of all resources in the VPC and uses automated reasoning to determine which network paths are feasible, analyzing all possible traffic paths within the network. To learn more about how the automated reasoning algorithms work, see the re:Invent session or read this document.
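The idea of deciding reachability from configuration alone can be illustrated with a toy model. The sketch below is not how the service is implemented (it relies on automated reasoning over the real VPC configuration). It is a simple breadth-first search over a hypothetical, hand-written set of hops, each guarded by a destination CIDR and a port list, and all resource names and rules are made up for the example.

```python
from collections import deque
from ipaddress import ip_address, ip_network

ANY_PORT = range(0, 65536)

# Hypothetical configuration: for each hop, the next hops and the rule
# (destination CIDR and ports) that traffic must match to be forwarded.
LINKS = {
    "instance-a": [("route-table-a", {"cidr": "0.0.0.0/0", "ports": ANY_PORT})],
    "route-table-a": [("peering-ab", {"cidr": "10.0.1.0/24", "ports": ANY_PORT})],
    "peering-ab": [("security-group-b", {"cidr": "10.0.1.0/24", "ports": ANY_PORT})],
    "security-group-b": [("instance-b", {"cidr": "10.0.1.0/24", "ports": [443]})],
}


def find_path(links, source, destination, destination_ip, port):
    """Return the list of hops from source to destination, or None."""
    ip = ip_address(destination_ip)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == destination:
            return path
        for next_hop, rule in links.get(node, []):
            allowed = ip in ip_network(rule["cidr"]) and port in rule["ports"]
            if allowed and next_hop not in seen:
                seen.add(next_hop)
                queue.append(path + [next_hop])
    return None  # no hop sequence permits this traffic


if __name__ == "__main__":
    print(find_path(LINKS, "instance-a", "instance-b", "10.0.1.15", 443))
    print(find_path(LINKS, "instance-a", "instance-b", "10.0.1.15", 22))
```

No packet is ever sent: the answer comes entirely from inspecting the rules, which is also why a blocked path can be traced back to the specific hop that rejected it.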







More details here



AWS Transit Gateway Connect



Overlay SD-WANs (software-defined wide area networks) are used to connect offices or data centers over the public Internet. Cloud infrastructure now needs to be connected to the same networks, and customers often use AWS Transit Gateway at the network edge to connect their networks to the AWS backbone.



With the addition of AWS Transit Gateway Connect, there is now an easy way to extend your SD-WAN infrastructure into the AWS Cloud. Instead of multiple IPsec VPN tunnels between Transit Gateway and SD-WAN network devices, Transit Gateway Connect uses GRE tunnels. It also supports dynamic routing with BGP and integrates with the AWS Transit Gateway Network Manager monitoring service and a suite of partner solutions.



All of this simplifies network design, improves performance, and makes it easier to expand SD-WANs to AWS.







More details here



IGMP support in AWS Transit Gateway



AWS Transit Gateway introduces Internet Group Management Protocol (IGMP) support, making it easier to manage applications that use IP multicast.



Customers have previously used AWS Transit Gateway to run multicast applications in the cloud. Now, with IGMP support, it is easier to scale and manage multicast group membership. You no longer need to configure static multicast groups, sources, and receivers; Transit Gateway automatically adds and removes group members using IGMP.



IGMP is an open standard and many multicast applications rely on it. It's now easier to migrate them to the cloud.
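On the application side, IGMP membership is usually triggered simply by joining a multicast group on a UDP socket: the operating system then sends the IGMP membership report on the application's behalf. The sketch below shows that standard socket pattern. The group address and port are arbitrary examples, and nothing in it is specific to Transit Gateway.

```python
import ipaddress
import socket


def build_membership_request(group, interface="0.0.0.0"):
    """Pack the ip_mreq structure: group address followed by local interface."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_group(group, port, interface="0.0.0.0"):
    """Open a UDP socket and join the group; the OS emits the IGMP report."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        build_membership_request(group, interface),
    )
    return sock


if __name__ == "__main__":
    # Example group and port; requires a multicast-capable network interface.
    receiver = join_group("239.1.1.1", 5000)
    data, sender = receiver.recvfrom(1500)  # blocks until a datagram arrives
    print(sender, data)
    receiver.close()  # closing the socket also leaves the group
```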



More details here



Russian-language Twitch session



All the news in analytics and network infrastructure will be discussed today in the Russian-language Twitch stream. Leading AWS solutions architects have picked the most interesting announcements, have already tried many of them, and will share their impressions of the new products and answer all your questions. If you have not joined the streams yet, here is the link to registration. By the way, recordings of previous Russian-language streams are available on Twitch if you missed them.



Previous news from AWS re:Invent 2020:

AWS re:Invent. Day 1 Top Announcements (Andy Jassy, Business Applications)

AWS re:Invent. AWS re:Invent 2020 Keynotes - Machine Learning Day 1 (Storage) Main Announcements



