Hello and welcome to the Azure Kubernetes Service Best Practices
series. In this session, we will discuss cluster management and
multi-tenancy in Kubernetes.
This is an intermediate-level session, so you are expected to
have familiarity with basic Kubernetes concepts. My name is
Mohammed Nofal, I am a solution architect in the Azure
Global Black Belt team. My role is focused on enabling
Kubernetes in enterprise and web-scale organizations.
So what's on our agenda for today?
First, cluster isolation patterns. One of the top
questions we get from our customers is how many clusters
should I have, or how should I isolate my workloads within the same cluster?
So part one of this session will discuss the
isolation patterns, then I'll explain scheduling and resource
management in Kubernetes and how you
can make sure your pods are not contending for resources.
Finally, we will end with a demo, so let's get started.
There are 2 isolation patterns that you can use in Kubernetes.
The first pattern is physical isolation, in which you dedicate
clusters based on the environment, like Dev or Staging,
or a cluster per team or per project, so each workload is
contained within its own
cluster. In this example we have 2 clusters, one for Dev and one
for Staging, and in production we have 2 clusters, one for team one
and one for team 2.
The second isolation pattern is logical isolation. In
logical isolation, you typically group workloads and clusters
based on some commonalities between them, like the
environment, the team, or the criticality of the workload.
In this example, we have 2 clusters. One cluster is for both
Dev and Staging workloads, where you have different teams
sharing the same cluster. Then the second cluster is dedicated
to production workloads, where you have 3 teams in it.
So what enables us to divide or subdivide our clusters in
a way that is safe, secure, and efficient?
We are going to use Kubernetes Namespaces as our logical
isolation boundary. Just a quick note for those who don't know:
a Kubernetes Namespace is an object that allows you to create
a virtual cluster within your cluster. Namespaces provide a scope
for names, so you can use the same name for objects
like pods or services across
different namespaces. Also, we need to be aware that not everything is
namespaced. For example, nodes are not namespaced, which is worth
keeping in mind for this session.
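For reference, here is a minimal sketch of that name scoping (the namespace and service names below are purely illustrative): the same Service name can exist in two namespaces without any conflict.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: Namespace
metadata:
  name: team-b
---
apiVersion: v1
kind: Service
metadata:
  name: web           # same object name...
  namespace: team-a   # ...scoped to team-a
spec:
  selector:
    app: web
  ports:
    - port: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web           # ...reused in team-b without conflict
  namespace: team-b
spec:
  selector:
    app: web
  ports:
    - port: 80
```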
Now we have multiple workloads or teams coexisting in the same
cluster. To safely isolate these workloads from each
other, there are 4 different areas that we need to consider.
The first thing to think of is scheduling. We need to
protect the cluster's compute resources and we need to make
sure that pods are not contending for them.
This is where resource quotas and some other Kubernetes
features come in handy; this will be the focus of this session.
Then we need to think of network isolation. Kubernetes
has a flat network by default, which means that any pod can
access any other pod, even across different namespaces. To
prevent this from happening, you will need to use Kubernetes
network policies, which are a way to secure pod-to-pod
communication by applying
network policies to pods based on their labels.
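As a rough sketch of what such a policy looks like (the namespace, labels, and port here are hypothetical), this network policy only allows pods labelled app: frontend to reach the backend pods on port 8080:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: team-a
spec:
  podSelector:
    matchLabels:
      app: backend            # the pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend   # only frontend pods may connect
      ports:
        - protocol: TCP
          port: 8080
```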
The other area that we need to consider is making sure that
access to the Kubernetes API is protected and only
authenticated and authorized identities can interact with
the cluster. This can be achieved by using Kubernetes
RBAC with Azure Active Directory integration, pod
identity, secrets encryption, and some other features.
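As a hedged illustration of what this looks like with Azure Active Directory integration enabled (the namespace and the group object ID below are placeholders), a RoleBinding can scope an Azure AD group to a single namespace:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-developers
  namespace: team-a
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: "00000000-0000-0000-0000-000000000000"  # placeholder Azure AD group object ID
roleRef:
  kind: ClusterRole
  name: edit        # built-in role: read/write access within the namespace only
  apiGroup: rbac.authorization.k8s.io
```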
Lastly, you need to protect your containers
in order to make sure that a compromised
container will not result in elevated access to the nodes.
You need to consider things like scanning the container images
against your organization's security baseline. You also need to
protect the containers during runtime; there are some Linux features
that you can consider, like AppArmor and SELinux, and
some others that you can leverage to accomplish this.
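To give a flavour of that runtime hardening (the pod name and image are just for illustration, and this is only one of several options), a pod can opt into the node's default AppArmor profile and lock down its security context like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-demo
  annotations:
    # apply the node's default AppArmor profile to the "app" container
    container.apparmor.security.beta.kubernetes.io/app: runtime/default
spec:
  containers:
    - name: app
      image: busybox:1.32
      command: ["sleep", "3600"]
      securityContext:
        allowPrivilegeEscalation: false   # a compromised process cannot gain extra privileges
        readOnlyRootFilesystem: true      # the container cannot modify its own filesystem
```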
The focus of this session will be only on the
scheduling part; network and security related topics will
be discussed in other sessions.
So now let's delve deeper into resource management in
Kubernetes.
First, we will discuss resource
quotas. Resource quotas are a way to assign resource limits to
namespaces. With resource quotas you can assign compute resource
limits, like CPU, memory, disk, etc., on your namespaces, and you
can assign object limits, like team X can only create 100 pods
in this namespace.
When you apply resource quotas on namespaces, only
deployments that satisfy the quota requirements and policy
will pass. Otherwise, if the deployment doesn't specify
resource requests and limits, it will fail and a 403 Forbidden error will
be returned for violating the quota constraints.
Here on the right we have an example of a resource
quota manifest.
Its name is mem-cpu-demo and it has requests for one
CPU and one gigabyte of RAM; a request is what will be reserved
and guaranteed for the namespace. Then it has a limit
of 2 CPUs and 2 gigabytes of RAM.
A limit is what we can burst to, assuming we have resources
available in the cluster.
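The manifest itself isn't reproduced here, but based on the values just described, it would look roughly like this:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
spec:
  hard:
    requests.cpu: "1"      # reserved and guaranteed for the namespace
    requests.memory: 1Gi
    limits.cpu: "2"        # what the namespace can burst to, capacity permitting
    limits.memory: 2Gi
```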
Now let's discuss limit ranges, which are an admission controller.
An admission controller, for those who don't know, is a piece
of code that intercepts requests to the Kubernetes API server
prior to persistence of the object. It does a couple of things:
it either validates the request, mutates the request,
or both validates and mutates.
With resource quotas, we discussed that by applying them
to namespaces, all deployments that do not specify resource
requests and limits will fail.
Now, if you don't want to be that aggressive, you can use the
LimitRange admission controller, which will apply default limits
to pods that do not specify any.
In the example on the right, any pod with no memory request or
limit specified will be assigned 256 MB as guaranteed
and can burst up to 512 MB of memory.
Limit ranges are loaded by default in AKS, and all you
have to do is configure your own policies.
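A limit range matching the values described above might look like this (the object name is illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
spec:
  limits:
    - type: Container
      defaultRequest:
        memory: 256Mi   # guaranteed memory for containers that specify nothing
      default:
        memory: 512Mi   # default limit they can burst up to
```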
Moving on.
Pod Disruption Budget is a way to protect our deployment from
voluntary actions. Voluntary actions are actions performed by
an admin or automation, like removing a node, removing a pod,
or scaling-down operations.
This object helps you decide the number of pods that can be
down simultaneously due to voluntary disruptions.
In the example on the right, we are applying a pod
disruption budget policy, nginx-pdb, that says my nginx
deployment should have at least 90%
of its pods available.
Which means if I have 10 pods, I can tolerate
the failure of only one.
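A pod disruption budget along the lines described might look like this (the selector label is an assumption about how the nginx deployment's pods are labelled):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: "90%"     # at least 90% of matching pods must stay available
  selector:
    matchLabels:
      app: nginx          # assumed label on the nginx deployment's pods
```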
Finally, note that pod disruption budget doesn't
protect from involuntary actions like node failure or shut down.
Node selectors: now, if you have an application with specific
requirements, for example a GPU workload, and you need it
to be deployed on the GPU nodes only, you can use node
selectors for this purpose.
With node selectors, you label the node, as in our
example on the right, with hardware: gpu, and in your
pod manifest you can say that this pod will only
land on a node that has the hardware: gpu label.
When using node selectors, the pod will only be able to be
deployed when there are nodes with the label and they have
capacity. Otherwise, the pod will fail to deploy. You should
note that node selectors do not reserve the nodes, which means other
pods with no node selector
can still land on those nodes.
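Assuming the node has already been labelled with hardware=gpu (for example with kubectl label node), a pod using that node selector looks roughly like this; the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  nodeSelector:
    hardware: gpu            # only schedule onto nodes carrying this label
  containers:
    - name: trainer
      image: busybox:1.32
      command: ["sleep", "3600"]
```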
Node affinity is a successor to node selectors, but it
introduces soft rules. requiredDuringSchedulingIgnoredDuringExecution
is a hard rule, just like node selectors, and
then the second one, preferredDuringSchedulingIgnoredDuringExecution,
is the soft rule, which means that if
there is no capacity, the scheduler will try to place the pods
on some other nodes that don't have the label. We
recommend using node affinity over node selectors.
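A sketch of the soft rule (pod name and image are placeholders): the scheduler prefers nodes labelled hardware=gpu, but will still place the pod elsewhere if no such node has capacity:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-preferred
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:   # the soft rule
        - weight: 1
          preference:
            matchExpressions:
              - key: hardware
                operator: In
                values: ["gpu"]
  containers:
    - name: trainer
      image: busybox:1.32
      command: ["sleep", "3600"]
```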
Inter-pod affinity and anti-affinity are similar to node
affinity, but now we can specify where my pod should land based
on the labels of already existing pods. For example, I
need my front-end pod to land on the nodes where my back-end pods
are. It also has the same
rules as node affinity.
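As a sketch of that front-end/back-end example (the labels and names are assumptions), pod affinity would co-locate the front-end pod on a node that already runs a pod labelled app: backend:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:     # hard rule, like node affinity
        - labelSelector:
            matchLabels:
              app: backend                                # co-locate with these pods
          topologyKey: kubernetes.io/hostname             # "same node" is the co-location unit
  containers:
    - name: web
      image: nginx:1.19
```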
So now let's move to the demo section. In this demo
we're going to create a namespace, apply a resource quota
to this namespace, deploy a container
with no resource limits, which will fail, deploy a simple container within
the limits, which should pass,
then apply a limit range policy on this namespace and then test
the limit range policy.
So now let's create the namespace called cm-webinar.
If I describe this namespace,
I can see that there are no resource quotas or limits
applied to this namespace.
Let's change the default namespace to the newly
created one.
Now I need to apply a resource quota to this namespace, and
this is the manifest that I'm using. In this manifest I'm
requesting one CPU and one gigabyte of memory, and my limits
are equal to my requests, which means that I will not be able to burst
in this namespace.
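The manifest shown on screen isn't included in the recording; a reconstruction based on the values described (the quota name is illustrative) would be:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: cm-webinar-quota
  namespace: cm-webinar
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "1"        # limits equal requests, so no bursting in this namespace
    limits.memory: 1Gi
```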
Right, the policy was applied.
If I describe the namespace again, I can see that the
resource quotas are starting to be tracked.
Now let's create a pod that has no resource limits specified in
it, and this is the pod manifest that I'll be using.
Right.
And I can see that the pod deployment has failed because
I'm violating the quota constraints. If I look at the
pod manifest, VS Code is already warning me that I'm not
specifying limits in my pod manifest, and this was enabled
using the Kubernetes extension for VS Code, which is
highly recommended to use.
Now let's deploy the deployment that has resource limits in it.
You can see that it passed. Let's check the manifest
for this deployment.
This is the manifest. It's an nginx deployment,
it has 4 replicas in it,
and each replica is asking for 250 MB of RAM and a quarter
of a CPU, with the same values set as limits.
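A deployment along those lines (names and image tag are illustrative) would look like this; the four replicas together roughly fill the namespace's quota:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: cm-webinar
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.19
          resources:
            requests:
              memory: 250Mi   # 4 x 250Mi stays within the 1Gi request quota
              cpu: 250m       # 4 x 250m adds up to the 1 CPU request quota
            limits:
              memory: 250Mi   # limits set equal to the requests
              cpu: 250m
```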
If I get the pods, I can see that all 4 were
deployed. If I describe the
namespace, I can see now that the used resources equal
the hard limits, which means that I can't deploy more
in this namespace.
So let's delete the deployment now.
After the deletion of the deployment, Kubernetes will
claim back the resources for my namespace, so if I describe
the namespace again I can see that 3 pods were already
terminated; as such, I have 250 MB left. This number will be
zero once the last pod is deleted and terminated.
So now let's apply the limit range policy to this namespace
so we don't fail the deployments that have
no resource limits in them.
Right, and this is the manifest that I used. In this
manifest I'm just saying that if there is no limit, then apply
a default limit of 300 MB of memory and 0.5 CPU, and the
request will default to 256 MB of RAM and a
quarter of a CPU.
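A limit range matching those defaults (the object name is illustrative) would look like this:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cm-webinar-limits
  namespace: cm-webinar
spec:
  limits:
    - type: Container
      default:            # applied as the limit when a container specifies none
        memory: 300Mi
        cpu: 500m
      defaultRequest:     # applied as the request when a container specifies none
        memory: 256Mi
        cpu: 250m
```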
If I describe the namespace again, I can see that the
resource quotas are tracked and the resource
limits are being tracked too.
So let's create a pod that has no limits; this
is the pod I'm using.
Right, it has no limits specified in it.
And I can see that it passed.
If I describe the pod,
I can see that the pod
was assigned default limits and requests, which were
applied from the limit range policy, which is quite handy.
And this sums up our demo. Let's go back to our presentation.
Just a quick note.
Now, if you want to scan your cluster to see if you have pods
with no resource quotas applied, you can use kube-advisor,
which is a pod that scans your cluster and reports pods
with missing resource requests and limits. That's the current
implementation; in the future more features will be added. For
more information you can access the link here.
The tool is open source and contributions are welcome.
During the demo, I showed that if you have a pod or
deployment with resource requests and limits missing, VS Code would warn you
about this. You can get this if you are using the Kubernetes
extension for VS Code.
So what to use, physical or logical isolation? Below I'll
try to summarize the pros and cons of each option, so you can
make an informed decision.
First, pod density: with logical isolation you get higher pod density,
as you have more users sharing the cluster and better
utilization. While with physical isolation, you're going to end up with some
underutilized clusters, since most of the teams or workloads
will not be utilizing their clusters to the fullest.
When it comes to cost, physical isolation is
typically higher because of the underutilized resources.
Now, from a Kubernetes experience perspective, logical isolation
requires more experience, as you have to apply all the isolation
and security controls you want for your production workloads to
make sure it's safe, while with physical isolation you
get isolation for free. This doesn't mean that you don't
have to apply security controls, but it means that you have time
to do it.
On security, both are high and in both you need the same
security controls. Physical has an advantage over logical
as you have a smaller surface, while with logical you have a
bigger surface. Hence, you need to be more strict in your
security controls from the very beginning.
On the blast radius of changes, we just discussed the smaller
surface. As such, physical has a smaller blast radius, which
makes cluster-level changes easier and more manageable.
Then management and operations.
This one, if you are an enterprise, is the one that normally decides
physical or logical. With physical clusters you can go for a "you
build it, you run it" approach, where each development team
can own their own cluster. While with logical isolation you need
to have a single team, or a cross-functional one, to manage the cluster.
Finally, cost management. With physical isolation it's
business as usual: you tag the VMs with the desired cost center and
charge based on this. But with logical isolation it's going to
be complex, as you need to charge per namespace. The work
is still ongoing in the community to make this easier,
and we in Azure are helping and following this closely. Once
the efforts mature, we will bring an implementation
forward.
So to summarize, there's no one answer that fits all for how
many clusters you should use. You will need to think it
through: think of how critical the workload is, whether you are cost
sensitive, how your organization is structured, who operates what,
and of course, the blast radius. You can end up with a mixture
of logical and physical, and that's
totally fine and what we see most of the customers doing.
Then, never use the default namespace for anything close
to production.
And lastly, resource quotas must be applied even with
physical isolation.
On the resources: if you are at the beginning of your
Kubernetes journey, then please check our conceptual or 101
docs. If you are looking for some recommendations and best
practices, then we have lined up some great docs for you, which
are based on our collective experience from dealing with
customers using Kubernetes in Azure for the past 2 years. And lastly,
this session and more related sessions can be found
in the sessions link.
With this, I'm going to end the session. I hope you have
enjoyed it, and thanks for watching. Bye bye.