
A multi-tenant application is a complex piece of software that can end in disaster if it is not designed with these considerations in mind.

A few years ago, I was privileged to take a team of engineers and “learn how to build in the cloud”. There were (mostly) no guidelines, just a vague idea of what kind of app we should build. Sounds like a developer’s dream, doesn’t it?
It was.
But figuring out how to build in the cloud is a much harder task than it sounds from an enterprise perspective. We had to find out what the cloud even is, figure out what on earth CI/CD is all about, discover the ins and outs of serverless, learn how to design NoSQL data models, make decisions about structuring microservices, learn about cloud cost analysis and forecasting, and above all, learn what multi-tenancy was and how to build an app that could handle it.
The list of things we had to deal with goes on and on, along with all the nuances we had to understand. But we loved it.
Of all the ideas, concepts, architectures, frameworks, and design patterns we worked through, perhaps one of the most fundamentally difficult was multi-tenancy. In theory this was not a new concept for us, but putting it into practice was tedious.
When discussing multi-tenancy, I always get the same question: “What is a tenant?”
Honestly, there isn’t a single correct answer to this question. For us, a tenant was a paying customer. For others, it may be something completely different.
That’s not because our multi-tenancy implementation was unique; it is simply how we added segmentation to our app.
A tenant is a group of users who share common access to a set of data.
In a multi-tenant environment, your system has many (potentially unlimited) groups of users who share the same instance of software. These groups of users have access to sets of data similar to each other, but do not have access to data from other groups.
This paradigm is great for large applications because it reduces the complexity of release management. A shared instance implies that you don’t have a deployment for each customer.
Instead, you have one instance of your software deployed that everyone consumes. This increases the risk of disaster because there is only one release: if one tenant sees a problem, so does everyone else.
A serverless implementation of a multi-tenant environment is not significantly different from a traditional software deployment. However, when we talk about scaling, there are some considerations to be made about how we authorize users and how we structure the data.
To help illustrate the concepts, I have a GitHub repo of a fully functional multi-tenant app that we will walk through. This repo deploys a role-based serverless application that manages state parks.
Each of the system-defined roles in the app allows access to different endpoints. An end user can be assigned multiple roles to extend their access to new features.
Users can belong to multiple tenants but can only have one “active” tenant at a time. This reduces the overall complexity around authentication, while helping to maintain each tenant’s data boundary. By changing their tenant, the user is effectively changing the data they have access to.
With our reference application, a user can be assigned different roles depending on the tenant. This allows varied access control, where a user can have elevated privileges in one tenant but not in another.
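One way to picture this model is a user record that stores a role list per tenant alongside a single active tenant. The sketch below is illustrative only: the record shape, the `ranger` role, and the `switchTenant` and `activeRoles` helpers are assumptions made for this example, not code taken from the repo.

```javascript
// Hypothetical user record: roles are stored per tenant, with one active tenant.
const user = {
  userId: "testuserid",
  activeTenant: "texas",
  tenants: {
    texas: { roles: ["admin"] },
    colorado: { roles: ["ranger"] },
  },
};

// Return a copy of the user with a new active tenant.
// Tenants the user does not belong to are rejected.
function switchTenant(user, tenantId) {
  if (!Object.prototype.hasOwnProperty.call(user.tenants, tenantId)) {
    throw new Error(`User ${user.userId} does not belong to tenant ${tenantId}`);
  }
  return { ...user, activeTenant: tenantId };
}

// Roles always resolve against the active tenant only.
function activeRoles(user) {
  return user.tenants[user.activeTenant].roles;
}

console.log(activeRoles(user));                           // [ 'admin' ]
console.log(activeRoles(switchTenant(user, "colorado"))); // [ 'ranger' ]
```

Because roles are looked up through the active tenant, switching tenants changes both the data and the privileges in a single step.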
Authorization
While Amazon Cognito offers a range of multi-tenancy options, our example today will focus on a request-based Lambda authorizer to evaluate the identity of the caller and determine their active tenant. A Lambda authorizer sits in front of an API Gateway, evaluates the provided auth token, and returns an IAM policy for the user.

In our example, we will use the authorizer context, which is enriched data passed to downstream services as part of the authorizer policy. This data can be anything you want, and in our example it includes details about the calling user.
Here is an example of the request context data returned by our authorizer:
{
  "userId": "testuserid",
  "tenantId": "texas",
  "email": "testuser@mailinator.com",
  "roles": "[\"admin\"]",
  "firstName": "Test",
  "lastName": "User"
}
The business process flow of our Lambda Authorizer is outlined in the diagram below:

- Validate JWT – The authentication mechanism in this workflow is a JWT provided in the Authorization header.
- Load user details from DynamoDB – After parsing the user ID from the JWT, load the full user data from the database. This includes the user’s active tenant, roles, and demographic information.
- Define access policy – Based on the roles in the active tenant, create an IAM policy of allowed endpoints that the user can invoke.
- Create authorization context – Create a data object containing user information to pass to downstream services such as Lambda, DynamoDB, and Step Functions.
- Return policy and context – Pass the access policy back to the API Gateway to evaluate and determine whether the caller has access to the endpoint they are invoking.
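The last three steps above can be sketched as a single function. This is a minimal illustration, not the repo's implementation: it assumes the JWT has already been validated and the user record already loaded, and the `ROLE_ROUTES` mapping, the user record shape, and the `buildAuthorizerResponse` name are hypothetical. The returned object follows the shape API Gateway expects from a Lambda authorizer (`principalId`, `policyDocument`, `context`).

```javascript
// Hypothetical role-to-route mapping; a real app defines its own roles and endpoints.
const ROLE_ROUTES = {
  admin: ["GET/parks", "POST/parks", "DELETE/parks/*"],
  ranger: ["GET/parks", "PUT/parks/*"],
};

// Turn a user record (assumed to be loaded from the database after the JWT was
// validated) into an authorizer response.
// apiArnPrefix has the form arn:aws:execute-api:<region>:<account>:<apiId>/<stage>
function buildAuthorizerResponse(user, apiArnPrefix) {
  const tenantId = user.activeTenant;
  const membership = user.tenants[tenantId];
  const roles = membership ? membership.roles : [];

  // Collect the distinct routes granted by the roles held in the active tenant.
  const routes = [...new Set(roles.flatMap((role) => ROLE_ROUTES[role] || []))];
  const allowed = routes.length > 0;

  return {
    principalId: user.userId,
    policyDocument: {
      Version: "2012-10-17",
      Statement: [
        {
          Action: "execute-api:Invoke",
          Effect: allowed ? "Allow" : "Deny",
          Resource: allowed
            ? routes.map((route) => `${apiArnPrefix}/${route}`)
            : [`${apiArnPrefix}/*`],
        },
      ],
    },
    // Context values must be strings, numbers, or booleans, so roles are serialized.
    context: {
      userId: user.userId,
      tenantId,
      email: user.email,
      roles: JSON.stringify(roles),
      firstName: user.firstName,
      lastName: user.lastName,
    },
  };
}

const response = buildAuthorizerResponse(
  {
    userId: "testuserid",
    activeTenant: "texas",
    tenants: { texas: { roles: ["admin"] } },
    email: "testuser@mailinator.com",
    firstName: "Test",
    lastName: "User",
  },
  "arn:aws:execute-api:us-east-1:123456789012:abc123/dev"
);
console.log(response.context.tenantId); // texas
console.log(response.context.roles);    // ["admin"]
```

Serializing the roles explains why the context example shows `roles` as a JSON string rather than an array.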
Once the authorizer completes and the API Gateway evaluates the IAM policy, the backing services behind the endpoint are invoked and provided with information about the caller.
Using an authorizer context provides an additional layer of security in multi-tenant environments. It prevents malicious users from providing invalid tenant information and gaining access to data they do not own.
Since the tenant ID comes from the authorizer, we have protected our API from some malicious attacks where users try to spoof their tenant by passing it in with their access token. The authorizer we created loads tenant information from the database, so any attempts to escalate access are discarded.
The authorizer context is accessed differently depending on the downstream service. With Lambda functions, we can access the enriched information from the requestContext object in the event.
exports.handler = async (event) => {
  // The tenant ID is read from the authorizer context, not from caller-supplied input
  const tenantId = event.requestContext.authorizer.tenantId;
};
However, in VTL, which we use when connecting API Gateway directly to services like DynamoDB and Step Functions, it is accessed in a slightly different way.
#set($tenantId = $context.authorizer.tenantId)
Now that we have access to tenant ID for the caller, let’s discuss how we use it.
Data access control
The most important aspect of multi-tenancy is preventing users of one tenant from viewing data that belongs to another tenant. Another way of saying it is strong data isolation.
When data for a tenant is properly isolated, attackers cannot manipulate API calls to return data from another tenant. When it comes to the structure of the data, this means that we prefix all index keys with the tenant ID.
Take the following data set as an example:

In this dataset, you have parks for three different tenants: texas, washington, and colorado. Both the partition key of the primary key and the pk of the GSI are prefixed with the tenant ID.
In a multi-tenant application, always prefix the partition key of your primary key and of your indexes with the tenant ID.
This pattern forces you to provide the tenant ID when doing a GetItem or Query operation. By requiring the call to include the tenant ID, you guarantee that you will only receive data from a single tenant. You are also guaranteed to get the data for the correct tenant, because the ID you are using comes from the Lambda authorizer and cannot be spoofed by the caller.
Note: This pattern falls apart in a table scan. You cannot guarantee access to only a single tenant’s data when scanning. This is another reason you should scan as rarely as possible.
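As a rough sketch of the key-prefix pattern, the helper below refuses to build a key without a tenant ID and produces parameters shaped like the input of a DynamoDB document client Query. The `#` separator, the table name `parks`, the index name `GSI1`, and the attribute name `GSI1PK` are all assumptions for illustration; the actual repo may use a different layout.

```javascript
// Hypothetical key layout: every partition key starts with "<tenantId>#".
function tenantKey(tenantId, ...parts) {
  if (typeof tenantId !== "string" || tenantId.length === 0) {
    throw new Error("A tenant ID is required to build a key");
  }
  return [tenantId, ...parts].join("#");
}

// Build Query parameters for listing one tenant's parks through a GSI.
// Table, index, and attribute names are assumptions for this example.
function buildListParksQuery(event) {
  // The tenant ID is read from the authorizer context, never from the request body.
  const tenantId = event.requestContext.authorizer.tenantId;
  return {
    TableName: "parks",
    IndexName: "GSI1",
    KeyConditionExpression: "#pk = :pk",
    ExpressionAttributeNames: { "#pk": "GSI1PK" },
    ExpressionAttributeValues: { ":pk": tenantKey(tenantId, "parks") },
  };
}

const event = { requestContext: { authorizer: { tenantId: "texas" } } };
console.log(buildListParksQuery(event).ExpressionAttributeValues[":pk"]); // texas#parks
```

Funnelling every key through one helper means a missing tenant ID fails loudly instead of silently producing a cross-tenant lookup.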
Infrastructure scaling and service limits
As you take on more tenants, your infrastructure in a serverless environment will naturally grow. But there are some components you need to consider when doing your initial design.
In a previous post about avoiding serverless service limits, I talk about a strange discrepancy between the number of SNS topics you can have in your AWS account and the number of SNS subscription filters. You can create up to 100,000 SNS topics but only 200 subscription filters. Both are soft limits, but they are important to consider with a multi-tenant application.
Rather than having a set of static SNS topics that allow users to subscribe with a tenant filter, a safer solution would be to create SNS topics dynamically per event type per tenant. You get a slightly more complicated solution, but you get the freedom to scale without worry.

These topics are created on demand as subscriptions come in, so you may never need to create topics for some event/tenant combinations. This pattern is shown in the subscription webhook workflow example in the repository.
When publishing an event to a dynamic topic, a simple lookup is performed to find the topic if it exists. If it does, we publish it, and if it doesn’t exist the publication is discarded.
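The lookup-then-publish behavior might look something like the sketch below. It is only an in-memory illustration: the `TopicRegistry` class, the `<tenantId>-<eventType>` naming scheme, and the injected `createTopic` and `publishToTopic` callbacks are assumptions standing in for real SNS calls and a persistent lookup store.

```javascript
// SNS topic names may only contain letters, digits, hyphens, and underscores.
function topicName(tenantId, eventType) {
  return `${tenantId}-${eventType}`.replace(/[^A-Za-z0-9_-]/g, "-");
}

class TopicRegistry {
  constructor(createTopic, publishToTopic) {
    this.createTopic = createTopic;       // (name) => topicArn
    this.publishToTopic = publishToTopic; // (topicArn, message) => void
    this.topics = new Map();              // stand-in for a database lookup
  }

  // Called when a subscription comes in: create the topic only if it is missing.
  topicForSubscription(tenantId, eventType) {
    const name = topicName(tenantId, eventType);
    if (!this.topics.has(name)) {
      this.topics.set(name, this.createTopic(name));
    }
    return this.topics.get(name);
  }

  // Called when an event occurs: publish if a topic exists, otherwise discard.
  publish(tenantId, eventType, message) {
    const topicArn = this.topics.get(topicName(tenantId, eventType));
    if (!topicArn) return false;
    this.publishToTopic(topicArn, message);
    return true;
  }
}

const sent = [];
const registry = new TopicRegistry(
  (name) => `arn:aws:sns:us-east-1:123456789012:${name}`,
  (topicArn, message) => sent.push({ topicArn, message })
);
registry.topicForSubscription("texas", "park-created");
console.log(registry.publish("texas", "park-created", { parkId: "1" }));    // true
console.log(registry.publish("colorado", "park-created", { parkId: "2" })); // false
```

Only tenant and event combinations with at least one subscriber ever get a topic, which keeps the topic count well below what a full cross product would need.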

When we talk about scaling the infrastructure for high traffic, your architecture changes drastically. By and large, your application becomes more about batching, caching, and queuing than anything else.
When you do batching and queuing, you lose the synchronous nature of the standard REST API request/response paradigms. This is an important consideration as your tenant ID is being injected into the system from your lambda authorizer. Don’t lose it!
When saving messages to a queue for later batch processing, you need to keep track of which requests come from which tenant. If you enable end users to change tenants at their discretion, looking up the active tenant at processing time is not an option.
It is possible that the user has changed tenants between the time of the request and the time of processing. So your only option is to save the tenant ID with the request when it is received.
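A small sketch of that idea: wrap each request in an envelope that records the tenant at the moment it arrives, then rely on the envelope (not the user's current tenant) during batch processing. The envelope fields and the `toQueueMessage` and `groupByTenant` helpers are hypothetical, and the example works on plain strings rather than a real queue.

```javascript
// Wrap an incoming API request in an envelope that records the tenant at request time.
function toQueueMessage(event) {
  const { tenantId, userId } = event.requestContext.authorizer;
  return JSON.stringify({
    tenantId,
    userId,
    receivedAt: new Date().toISOString(),
    payload: JSON.parse(event.body),
  });
}

// Later, group a batch of queued messages by the tenant saved in each envelope.
function groupByTenant(messageBodies) {
  const groups = new Map();
  for (const body of messageBodies) {
    const message = JSON.parse(body);
    if (!groups.has(message.tenantId)) groups.set(message.tenantId, []);
    groups.get(message.tenantId).push(message.payload);
  }
  return groups;
}

// Helper that fakes an API Gateway event for the example.
const request = (tenantId, name) => ({
  requestContext: { authorizer: { tenantId, userId: "testuserid" } },
  body: JSON.stringify({ name }),
});

const queued = [
  toQueueMessage(request("texas", "Big Bend Ranch")),
  toQueueMessage(request("colorado", "Eldorado Canyon")),
  toQueueMessage(request("texas", "Palo Duro Canyon")),
];
console.log(groupByTenant(queued).get("texas").length); // 2
```

Even if the same user switched from texas to colorado between requests, each queued message still carries the tenant it was submitted under.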
Working on a large scale from a multi-tenancy standpoint is not much different from a small scale. As long as you inject the tenant ID at the entry point of the request, not much is different.
Going multi-tenant in your application certainly has trade-offs. You reduce complexity and maintenance overhead by supporting only one instance of your application, but you add complexity by building tenancy logic into it.
Data security is also a concern with multi-tenancy. You must guarantee that there are no accidental “slip ups” returning data to the wrong tenant. However, by prefixing tenant IDs on all of our lookups and avoiding table scans at all costs, we can greatly reduce the risk.
Also watch out for service limits. It’s easy to reach some lesser-known limits as your application grows with more tenants. It is always good practice to review the service limits of every service you are consuming before you go to production.
A tenant is a logical construct. How you implement it is entirely up to you. As long as you keep your data isolated and build your app in such a way that it can scale, there is no wrong way to do it. You can use Amazon Cognito and its incredible feature set, or take it upon yourself to build fine-grained access control like I did.
Designing a multi-tenant application takes significant advance planning, but the end justifies the means. Hopefully, the reference architecture provided will help spark some ideas for your implementation.
Happy coding!
#building #multitenant #serverless #app #Alan #Helton #August