Isolation and silos for data stores in multitenant solutions



In one of our previous articles, we examined several key points of setting up a multi-tenant (hereinafter multitenant) Amazon EKS cluster. Security there is a very broad topic, and it is important to understand that it applies not only to the application cluster but also to the data store. As a platform for SaaS solutions, AWS offers a wide variety of data storage options. But, as elsewhere, configuring security properly, designing a multitenant architecture for it, and setting up the various isolation levels all require knowledge and an understanding of the specifics.







Multitenant data store



Multitenant data is conveniently managed through silos. Their main feature is the separation of tenant data in multitenant SaaS solutions. But before moving on to specific cases, let's cover a little general theory.



Note: the term "silo" (literally "bunker") has not yet taken root in the Russian slang of IT specialists, but we use it here by analogy with "data lake".



Only the tenant should have access



Data security is a priority for SaaS solutions. Data must be protected not only from external intrusion but also from access by other tenants. This holds even when two tenants cooperate with each other: access to shared data must still be controlled and configured according to business logic.



Industry Standards for Encryption and Security



Tenant standards may vary by industry. Some require data encryption with a well-defined key rotation frequency, while others require tenant-specific keys instead of shared keys. Once datasets are associated with specific tenants, different encryption standards and security settings can be applied to individual tenants as an exception.



Performance tuning based on tenant subscription



Usually, SaaS providers recommend a common workflow for all tenants. In practice, this is not always convenient for specific business logic, so it can be done differently: each tenant is assigned its own set of properties and performance limits based on its tier. For customers to get the performance stated in the SaaS agreement, providers have to monitor the usage of individual tenants. This gives all customers fair access to resources.



Naturally, this is reflected in the customer's bills: whoever uses more resources pays more.



Data management



As SaaS services grow, so does the number of tenants. If a customer changes providers, they usually want all their data moved to the new resource and the old copies deleted. While the first wish can be disputed, fulfilment of the second is guaranteed by the EU General Data Protection Regulation (GDPR). To comply with it, the SaaS provider must be able to identify the datasets of individual tenants from the outset.





How to turn a regular data store into a multitenant one



Note right away that there is no magic code. You can't just go and set up a tenant data silo. The following aspects should be considered:



  • Service Agreement;
  • Access patterns for reading and writing;
  • Compliance with regulations;
  • Expenses.


But there are a number of generally accepted practices for separating and isolating data. Let's look at these cases using the Amazon Aurora relational database as an example.



Partitioning tenant data in shared stores and instances





The table is shared by all tenants. Each tenant's data is separated and identified by the tenant_id key. Authorization in the relational database is implemented with row-level security. Application access is governed by an access policy that takes the specific tenant into account.
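The shared-table idea can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for Amazon Aurora. SQLite has no row-level security, so here the tenant filter lives in a hypothetical application helper rather than in a database policy. The orders table, the item column, and the function names are invented for illustration; only tenant_id comes from the text.

```python
import sqlite3


def make_store():
    """Create an in-memory shared table that holds rows for every tenant."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (tenant_id TEXT NOT NULL, item TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO orders (tenant_id, item) VALUES (?, ?)",
        [("tenant1", "laptop"), ("tenant2", "phone"), ("tenant1", "monitor")],
    )
    return conn


def items_for_tenant(conn, tenant_id):
    """Return only the rows that belong to the given tenant.

    The tenant_id is passed as a bound parameter, never concatenated
    into the SQL string.
    """
    rows = conn.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY item",
        (tenant_id,),
    )
    return [item for (item,) in rows]


conn = make_store()
tenant1_items = items_for_tenant(conn, "tenant1")
print(tenant1_items)
```

With real row-level security the same predicate would be enforced by the database itself, so a query that forgets the filter still could not see another tenant's rows.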



Pros:



  • It's not expensive.


Cons:



  • Authorization happens at the database level. This implies several authorization mechanisms within one solution: AWS IAM and database policies;
  • Application logic has to be developed to identify the tenant;
  • Without complete isolation, the tier-based service agreement cannot be enforced;
  • Database-level authorization limits access tracking with AWS CloudTrail. This can only be compensated for by adding information from outside, although it would be preferable to have tracking and troubleshooting available directly.


Data isolation on shared instances





Tenancy is still shared at the instance level, but the data is siloed at the database level. This makes AWS IAM authentication and authorization possible.



Pros:



  • It's not expensive;
  • AWS IAM is fully responsible for authentication and authorization;
  • AWS IAM lets you keep audit logs in AWS CloudTrail without workarounds such as separate applications.


Cons:



  • The underlying database instances are shared between tenants, so one tenant can drain resources from the others, which means the tier-based service agreement cannot be fully enforced.


Database instance isolation per tenant





The diagram shows an implementation of a tenant database with isolated instances. Today, this is probably the best solution for combining security and reliability. It offers AWS IAM, auditing through AWS CloudTrail, and complete tenant isolation.



Pros:



  • AWS IAM provides both authentication and authorization;
  • There is a full audit;
  • Complete tenant isolation.


Cons:



  • A dedicated instance for each tenant is expensive.


Access to multitenant data



Ensuring that applications have the right access to data is more important than storing the data in a tenant model that meets business requirements. This is not difficult if you use AWS IAM for access control (see the examples above). Applications that give tenants access to data can also use AWS IAM. Amazon EKS is a good example.



To provide pod-level IAM access in EKS, OpenID Connect (OIDC) works well, together with Kubernetes service account annotations. As a result, the JWT is exchanged with STS, which issues temporary credentials that give the application access to the necessary cloud resources. With this approach, you do not need to grant extended permissions to the underlying Amazon EKS worker nodes. Instead, you can configure IAM permissions only for the service account associated with the pod, based on the permissions actually needed by the application running in that pod. As a result, we get full control over the permissions of applications and pods.
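The trust relationship behind this JWT-for-credentials exchange can be sketched as follows. This is a minimal illustration, not the article's own tooling: in practice a tool such as eksctl generates the role for you. The function name, the account ID, the OIDC provider string, and the default namespace are hypothetical placeholders. The sketch only builds the trust policy document that allows STS to accept a token issued for one specific Kubernetes service account.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build an IAM role trust policy for one Kubernetes service account.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix.
    All argument values used below are placeholders, not real resources.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": (
                        f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                    )
                },
                # STS exchanges the pod's JWT for temporary credentials.
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only this service account may assume the role.
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


trust_policy = irsa_trust_policy(
    account_id="111122223333",
    oidc_provider="oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE",
    namespace="default",
    service_account="tenant1-service-account",
)
print(json.dumps(trust_policy, indent=2))
```

Pinning the sub condition to a single service account is what keeps the permissions scoped to one pod identity instead of the whole worker node.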



In addition, AWS CloudTrail logs the API calls made by each EKS pod.


IAM integration supports a comprehensive authorization system for tenant access to data stores. In this case, access to the database is controlled only through authentication, which means that an additional layer of security has to be introduced.



Amazon EKS access to multitenant Amazon DynamoDB







Let's take a closer look at multitenant access, where an application running on Amazon EKS gains access to a multitenant Amazon DynamoDB database. In many cases, multitenancy in Amazon DynamoDB is implemented at the table level (tables and tenants in a 1:1 ratio). As an example, consider the AWS IAM policy (aws-dynamodb-tenant1-policy), which illustrates the access pattern where all data is tied to Tenant1.



{
   ...
   "Statement": [
       {
           "Sid": "Tenant1",
           "Effect": "Allow",
           "Action": "dynamodb:*",
           "Resource": "arn:aws:dynamodb:${region}:${account_id}:table/Tenant1"
       }
   ]
}
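With one table per tenant, the same policy shape repeats for every tenant, so it can be generated. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the table is assumed to be named after the tenant as in the Tenant1 example, the region and account ID are placeholders, and the "Version" field is an assumption, since that part of the policy above is elided.

```python
import json


def tenant_table_policy(region, account_id, tenant):
    """Build an IAM policy granting access to a single tenant's table.

    Assumes a 1:1 mapping where the DynamoDB table carries the tenant's name.
    """
    return {
        "Version": "2012-10-17",  # assumed; elided in the policy above
        "Statement": [
            {
                "Sid": tenant,
                "Effect": "Allow",
                "Action": "dynamodb:*",
                # DynamoDB table ARN: region and account ID are
                # separated by colons.
                "Resource": (
                    f"arn:aws:dynamodb:{region}:{account_id}:table/{tenant}"
                ),
            }
        ],
    }


policies = {
    tenant: tenant_table_policy("eu-west-1", "111122223333", tenant)
    for tenant in ("Tenant1", "Tenant2")
}
print(json.dumps(policies["Tenant1"], indent=2))
```

Because each policy names exactly one table, a role that carries the Tenant1 policy has no path to the Tenant2 table.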




The next step is to associate this policy with a service account in the EKS cluster that uses OpenID Connect.



eksctl utils associate-iam-oidc-provider \
      --cluster my-cluster \
      --approve \
      --region ${region}



eksctl create iamserviceaccount \
      --name tenant1-service-account \
      --cluster my-cluster \
      --attach-policy-arn arn:aws:iam::xxxx:policy/aws-dynamodb-tenant1-policy \
      --approve \
      --region ${region}


A pod definition that includes the required serviceAccountName field lets you use the new tenant1-service-account service account.



apiVersion: v1
kind: Pod
metadata:
 name: my-pod
spec:
 serviceAccountName: tenant1-service-account
 containers:
 - name: tenant1
…


While the IAM service account and policy for a tenant are fixed, static, and managed by tools like Terraform and Ansible, the pod specification can be configured dynamically. If you use a template generator such as Helm, serviceAccountName can be set as a variable that points to the appropriate tenant service account. As a result, each tenant gets its own dedicated deployment of the same application. In fact, each tenant should have a dedicated namespace where its applications run.
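The templating idea can be sketched without Helm. The example below is an illustration under assumptions, not a Helm chart: the function name and the container image are hypothetical, and naming the namespace after the tenant is one possible convention rather than something the text prescribes. It produces one pod manifest per tenant, each bound to that tenant's service account.

```python
import json


def pod_manifest(tenant, image):
    """Build a Pod manifest for one tenant as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "my-pod",
            # Assumption: one namespace per tenant, named after the tenant.
            "namespace": tenant,
        },
        "spec": {
            # The only tenant-specific link to IAM permissions.
            "serviceAccountName": f"{tenant}-service-account",
            "containers": [{"name": tenant, "image": image}],
        },
    }


manifests = {
    tenant: pod_manifest(tenant, "example.com/my-app:latest")
    for tenant in ("tenant1", "tenant2")
}
print(json.dumps(manifests["tenant1"], indent=2))
```

The printed JSON can be saved to a file and applied as-is, since kubectl accepts JSON manifests as well as YAML.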



The same approach also works for Amazon Aurora Serverless, Amazon Neptune, and Amazon S3.


Conclusion



For SaaS services, it is important to think carefully about how data will be accessed. Consider the requirements for storage, encryption, performance, and tenant management. In multitenancy there is no single preferred method of data partitioning. The advantage of running multitenant workloads on AWS is AWS IAM, which can be used to simplify access control for tenant data. In addition, AWS IAM can help you configure application access to data dynamically.



The features and techniques described here touch on a bit of theory and may come in handy. In specific cases, however, you always need to analyze the source information yourself and create a tailored solution.


