Containers are like "clothes", while virtual machines are "houses"


I have been doing container research and containerization work for a few years, from my initial understanding of containers to accumulating a good deal of container migration experience.


Source: 51CTO Technology Stack

Link: http://mp.weixin.qq.com/s/_Zy7zV_ssdrzhnaz8K-0Nw

When I recently explained container technology to a customer, I found that he had many misunderstandings about containers. Containers are not a substitute for virtual machines; each has its own specific application scenarios.





Myth 1: Containers start fast, in seconds


This is something people often say when evangelizing containers, and indeed, if you start an application such as Nginx, it does come up quickly. Why do containers start fast? First, there is no kernel to boot; second, the image is relatively small. However, a container has a main process, the Entrypoint, and the container is only really started when that main process is fully up. To use a metaphor: a container is more like a person's clothes. When the person stands up, the clothes stand up; when the person lies down, the clothes lie down too. Clothes provide a certain degree of isolation, though not a very strong one. Clothes have no root (kernel), but they go everywhere with the person. So does it make sense to judge container startup speed by an Nginx? For a Java application, Tomcat is installed inside the container; only once Tomcat has started and the WAR has been loaded has the real application started.
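
To make the distinction concrete, here is a minimal sketch that measures when the application inside a container actually becomes usable, assuming it exposes a hypothetical HTTP endpoint at http://localhost:8080/; the container itself may report "running" long before this loop returns.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url: str, timeout: float = 300.0, interval: float = 1.0) -> float:
    """Poll `url` until the application answers, returning the elapsed seconds."""
    start = time.monotonic()
    while True:
        try:
            with urllib.request.urlopen(url, timeout=2):
                # The application (e.g. Tomcat with its WAR loaded), not just the
                # container's main process, is now actually serving requests.
                return time.monotonic() - start
        except (urllib.error.URLError, OSError):
            pass  # the container may be "up" while the application is still loading
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"{url} not ready within {timeout}s")
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical endpoint of the application inside the container.
    print(f"application ready after {wait_until_ready('http://localhost:8080/'):.1f}s")
```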


If you watch Tomcat's log, that still takes some time, not seconds at all. If the application needs a minute or two to start, talking only about the container's second-level startup is meaningless. Meanwhile, VM startup in OpenStack has been optimized to be faster and faster. Starting a VM originally required downloading the virtual machine image from Glance; later a technique appeared whereby, when Glance and the system disk share Ceph storage, the image no longer needs to be downloaded and startup is much faster. Moreover, one reason containers start fast is that a very small image, such as Alpine, is usually recommended, with many things cut out; an OpenStack virtual machine image can also be trimmed heavily to achieve fast startup.


We can also measure each step of virtual machine startup in detail, cut out the corresponding modules and steps, and greatly reduce the startup time. For example, in a UnitedStack blog post (https://www.ustack.com/blog/build-block-storage-service) we can see such an implementation described: creating a virtual machine takes 1-3 minutes with native OpenStack, but less than 10 seconds with the modified OpenStack. This is because nova-compute no longer needs to download the entire image over HTTP; the virtual machine can boot by reading the image data directly from Ceph. With good optimization, the overall startup time of a virtual machine can therefore generally be brought to between ten seconds and half a minute. Compared with Tomcat's startup time, this is hardly a burden, and compared with container startup there is no qualitative difference. Some may say the container is still faster, and that for self-healing in a production environment every second counts. We discuss self-healing separately below.


However, the virtual machine has one advantage: good isolation. If the container is the clothes, the virtual machine is the house. The house stands there whether the people inside are standing or lying down; it does not follow them around. Using virtual machines is like people living in an apartment building: each person has a room and they do not interfere with each other. Using containers is like everyone wearing their own clothes while squeezing into the same bus: it looks isolated, but if anyone breaks the bus, nobody can move.


To sum up, the startup speed of containers is not enough to form an obvious advantage over OpenStack virtual machines, while the isolation of virtual machines beats that of containers hands down.

Myth 2: Containers are lightweight, so each host can run hundreds of them


Many people run experiments, and even tell customers, to show how awesome the container platform is: look, we can run hundreds of containers on one machine, and virtual machines simply cannot do that. But is there a real application scenario in which one machine runs hundreds of containers? For containers, what matters is the application inside, and the core requirements of an application are stability and support for high concurrency, not density. At many conferences I have heard well-known speakers who handle Double Eleven and 618 traffic report that the current standard for a Java application is 4 cores and 8 GB; when capacity is insufficient, a small portion scale up vertically and the vast majority scale out horizontally. If 4 cores and 8 GB is the standard, fewer than 20 services fill a physical server, so what is interesting about running hundreds of Nginx instances on one machine?


Of course, there is now the very popular serverless architecture, in which all custom code is written and executed as isolated, independent, often fine-grained functions running in a stateless compute service such as AWS Lambda. The compute units behind such services can be virtual machines or containers. Stateless functions need to be created and deleted quickly, and the time to execute a single function is likely to be very short. In this scenario, containers do retain a certain advantage over virtual machines.


The current serverless architecture is better suited to running task-style batch jobs, using process-level horizontal elasticity to offset the higher cost of creating and destroying processes. For example, the integration of Spark and Mesos has a Fine-Grained mode. Normally, by the time a big-data job executes, its execution processes have already requested their resources; in Fine-Grained mode, resources are allocated only when each task is assigned. The advantage is the ability to request and release resources flexibly; the disadvantage is that creating and destroying processes at such a fine granularity makes Spark perform worse in this mode. The idea behind this mode is similar to that of serverless. You may also recall that when we first studied operating systems, we learned that process granularity is too coarse: creating and destroying a process every time is too slow, so for high concurrency threads appeared, whose creation and destruction are much lighter. That still felt slow, so thread pools appeared: threads are created in advance, borrowed when needed rather than created on the spot, and returned when no longer needed. That still felt slow, because creating a thread has to go through the kernel, so coroutines appeared, with all switching done in user space; Akka and Go, for example, use coroutines. The trend for high concurrency has been toward ever finer granularity, yet now many scenarios want process-level units again, which feels like the wheel coming full circle.
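
As a rough illustration of the granularity argument, the sketch below contrasts a pre-created thread pool with coroutines (Python's asyncio standing in for the user-space scheduling that Akka and Go provide); the task count and sleep duration are arbitrary, chosen only to make the difference in scheduling cost visible.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

N = 10_000  # number of concurrent "tasks"; purely illustrative


def blocking_task(_: int) -> None:
    time.sleep(0.01)  # stands in for a short I/O wait


async def coroutine_task() -> None:
    await asyncio.sleep(0.01)  # the same wait, but it yields to the event loop


def run_with_thread_pool() -> float:
    start = time.monotonic()
    # The pool is created in advance and reused -- exactly the "thread pool" idea.
    with ThreadPoolExecutor(max_workers=200) as pool:
        list(pool.map(blocking_task, range(N)))
    return time.monotonic() - start


async def run_with_coroutines() -> float:
    start = time.monotonic()
    # Coroutines are created and switched in user space, so spawning N of them is cheap.
    await asyncio.gather(*(coroutine_task() for _ in range(N)))
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"thread pool: {run_with_thread_pool():.2f}s")
    print(f"coroutines : {asyncio.run(run_with_coroutines()):.2f}s")
```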


Myth 3: Containers have images, which carry version numbers and can be upgraded and rolled back


Containers have two key characteristics: encapsulation and standardization. A container image encapsulates the application's configuration, file paths, and permissions; then, like Sun Wukong calling "freeze", everything is fixed at the moment of encapsulation. The image is also a standard: whichever container runtime it runs in, the same image restores that frozen moment. Container images carry version numbers, so we can upgrade based on the version number and, if an upgrade goes wrong, roll back to the previous version, after which the container is back in its original state. However, OpenStack virtual machines also have images. Virtual machine images can be snapshotted, and a snapshot preserves all the state at that moment; snapshots can also carry version numbers and can likewise be upgraded and rolled back. It seems OpenStack virtual machines have all these features too, so what is the difference? Virtual machine images are large, typically tens or even hundreds of gigabytes, while container images are small, typically a few hundred megabytes.

Virtual machine images are not suitable for cross-environment migration. For example, the development environment is local, the test environment is on one OpenStack, and the production environment is on another OpenStack; migrating a virtual machine image then means copying very large files, which is very difficult. Containers do much better here: because their images are small, they can be moved quickly between different environments.


Virtual machine images are also not suitable for cross-cloud migration. At present no public cloud platform supports downloading and uploading virtual machine images (for security and anti-piracy reasons), so an image cannot be migrated between different clouds, or even between regions of the same cloud; you can only build a new image, and consistency of the environment cannot be guaranteed.


In contrast, the container image registry is independent of any particular cloud. As long as a cloud can reach the registry, images can be pulled onto it. Because the images are small, downloads are fast; and because images are layered, only the layers that differ need to be downloaded each time. OpenStack images basically work within one cloud; once multiple environments are involved, container images are far more convenient.
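
A minimal sketch of the layering idea, with made-up layer identifiers rather than real sha256 digests: when a new image tag is pulled, only the layers missing from the local cache actually need to be downloaded.

```python
def layers_to_pull(manifest_layers: list[str], local_cache: set[str]) -> list[str]:
    """Return the layer digests that actually have to be downloaded."""
    return [digest for digest in manifest_layers if digest not in local_cache]


# Hypothetical digests: the new tag shares its base layers with the old one.
old_image = ["sha256:base-os", "sha256:jdk", "sha256:app-v1"]
new_image = ["sha256:base-os", "sha256:jdk", "sha256:app-v2"]

cache = set(old_image)                   # layers already present locally
print(layers_to_pull(new_image, cache))  # ['sha256:app-v2'] -- only the changed layer
```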


Myth 4: The container platform can automatically restart containers to achieve self-healing


This advantage of containers is often touted. Because the container is the clothes, when the person lies down the clothes lie down too, so the container platform immediately notices that the person is down and can quickly wake them up to get back to work. The virtual machine is the house: the person lies down but the house keeps standing, and the virtual machine management platform has no idea whether the person inside can still work. So a container is automatically restarted when it dies, whereas when the application inside a virtual machine dies, nobody notices as long as the virtual machine itself stays up.


These statements are true, but people slowly discovered another scenario: the application in the container dies without the container dying, so the container still appears to be up while the application no longer works and stops responding. Also, when a container starts, although its state is "up", it takes some time before the application inside can serve requests. For these scenarios, container platforms provide health checks for the applications inside containers: they check not only whether the container exists but whether the application in it is usable, and restart it automatically if it is not. With health checks, though, the difference from virtual machines shrinks again, because a virtual machine with health checks can also see whether the application inside is working and restart it if not. Some then argue that since containers start in seconds, automatic restart means second-level repair and therefore higher availability for the application. This view is incorrect: an application's high availability is not directly related to how fast it restarts. High availability must be achieved through multiple replicas; when one replica dies, the answer is not to restart it quickly, but to have the other replicas take over its work immediately during the outage. Both virtual machines and containers can run multiple replicas, and with multiple replicas it does not matter whether a restart takes 1 second or 20 seconds. What matters is what the program was doing when it died. If it was doing something insignificant, being down for 20 seconds does not matter; if it was in the middle of a transaction or a payment, even 1 second is unacceptable and the state must be repaired properly.
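
A sketch of what such an application-level health check might look like from the platform's side, assuming the application exposes a hypothetical /healthz endpoint; the failure threshold and probe period are arbitrary, and the restart message is a placeholder for whatever action the platform actually takes.

```python
import time
import urllib.error
import urllib.request


def probe(url: str) -> bool:
    """One probe: the application must answer, not just the container be 'up'."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except (urllib.error.URLError, OSError):
        return False


def watch(url: str, failure_threshold: int = 3, period: float = 10.0) -> None:
    failures = 0
    while True:
        if probe(url):
            failures = 0
        else:
            failures += 1
            if failures >= failure_threshold:
                print(f"{url} failed {failures} probes in a row -> restart the container")
                failures = 0
        time.sleep(period)


# watch("http://localhost:8080/healthz")  # hypothetical health endpoint
```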


Therefore, the high availability of an application depends on retries and idempotence at the application layer, not on fast restarts at the infrastructure layer. For stateless services, automatic restart is fine once a retry mechanism is in place, because stateless services do not hold critical state across operations. For stateful services, restarting the container is not only not recommended, it may be the beginning of a disaster. If a service has state, such as a database, then in a high-concurrency scenario, once it goes down, even for just one second, we must find out what happened in that second: which data was saved and which was lost. Blindly restarting is likely to cause data inconsistencies that cannot be repaired later. For example, if the database behind high-frequency trading goes down, the DBA should carefully examine what data was lost, rather than the system blindly restarting without the DBA's knowledge; if the DBA assumes nothing happened, it will take a long time to find the problem.
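
A minimal sketch of retries plus idempotence at the application layer: transfer() is a hypothetical payment call whose server side is assumed to deduplicate on the idempotency key, so a retry after an outage cannot result in paying twice.

```python
import time
import uuid


def transfer(src: str, dst: str, amount: int, idempotency_key: str) -> None:
    """Hypothetical payment call; the server deduplicates on idempotency_key."""
    ...


def transfer_with_retry(src: str, dst: str, amount: int,
                        attempts: int = 5, backoff: float = 0.5) -> None:
    # One key for all attempts: if the first call succeeded but the response was
    # lost while an instance restarted, the retry is recognised as a duplicate
    # instead of moving the money twice.
    key = str(uuid.uuid4())
    for attempt in range(attempts):
        try:
            transfer(src, dst, amount, idempotency_key=key)
            return
        except ConnectionError:
            time.sleep(backoff * 2 ** attempt)  # exponential backoff between retries
    raise RuntimeError("transfer failed after retries")
```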


So containers are better suited to deploying stateless services, which can be restarted at will. Deploying stateful services in containers is not impossible, but it must be done very carefully and is generally not recommended. Although many container platforms support stateful containers, the platform usually cannot solve the data problem for you. Unless you know the application in the container very well, so that when it dies you know exactly what was lost, what matters and what does not, and you have written code to handle these situations, you should not enable automatic restart. NetEase's database service, for example, modifies the MySQL source code to guarantee that data is fully synchronized between primary and standby, so that when the primary dies the standby can be promoted automatically. Promoting automatic restart of stateful containers to customers is very uneconomical, because customers often do not understand the application's logic, or have even bought the application from a third party; if you run it as a stateful container with automatic restart and the customer later discovers lost data, you will still take the blame. In short, automatic restart of stateful services is not impossible, but it requires real expertise.


Myth 5: Containers can use the container platform for service discovery


Container platforms such as Swarm, Kubernetes, and Mesos all support service discovery: when one service calls another, the service name is translated into a VIP, which then routes to a specific container. However, people find that for applications written in Java, calls between services do not use the container platform's service discovery but that of Dubbo or Spring Cloud. The reason is that service discovery at the container platform layer is still fairly basic, essentially a domain-name mapping, with no good support for circuit breaking, rate limiting, or degradation; and once you are using service discovery at all, you usually want the middleware to handle those too. As a result, the container platform's service discovery is used less for service-to-service calls, and the higher the concurrency requirements of the application, the more this is the case.

Is the container platform's service discovery therefore useless? No. You slowly find that internal service discovery is one thing, which Dubbo and Spring Cloud can handle, while external service discovery is another. Take access to databases and caches: should the configuration contain a database service name, or an IP address? If you use IP addresses, the configuration becomes very complex, because many applications have complex configuration and depend on too many external services, and this is the hardest part to manage. With external service discovery, the configuration becomes much simpler: you only need to configure the name of the external service, and if the external service's address changes, you can change it flexibly in the service discovery layer.
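
A small sketch of configuring by service name instead of IP address; the environment variable names and the DNS name here are hypothetical, and resolution could just as well go through a VIP or a discovery service rather than plain DNS.

```python
import os
import socket

# The application only knows a name; the address behind it can change freely.
DB_SERVICE = os.environ.get("DB_SERVICE_NAME", "mysql.internal.example")
DB_PORT = int(os.environ.get("DB_SERVICE_PORT", "3306"))


def db_address() -> tuple[str, int]:
    """Resolve the service name at connect time instead of baking an IP into config."""
    host = socket.gethostbyname(DB_SERVICE)  # DNS-based discovery (or a VIP)
    return host, DB_PORT


# conn = driver.connect(*db_address())  # hypothetical database driver call
```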


Myth 6: Containers can be elastically scaled based on images


On a container platform, if a service has a replica count, changing it from 5 to 10 makes the platform scale out automatically based on the image. But virtual machines can do this too: AWS Auto Scaling is based on virtual machine images. Within a single cloud there is no real difference. Of course, for elastic scaling of stateless containers across clouds, containers are much more convenient and enable a hybrid-cloud mode: in a high-concurrency scenario, stateless containers can scale out onto the public cloud, which virtual machines cannot do.
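
A toy reconciliation sketch of replica-count-based scaling for a stateless service; start and stop are hypothetical hooks standing in for the platform's API, and every new copy is assumed to be launched from the same image.

```python
from typing import Callable, List


def reconcile(desired: int, running: List[str],
              start: Callable[[], None], stop: Callable[[str], None]) -> None:
    """Drive the number of running copies toward the desired replica count."""
    diff = desired - len(running)
    if diff > 0:
        for _ in range(diff):
            start()            # each new copy is created from the same image
    elif diff < 0:
        for name in running[:-diff]:
            stop(name)         # stateless copies can be removed in any order


# reconcile(10, current_containers, start=launch_from_image, stop=remove_container)
```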


Summary of container misunderstandings


As shown in the figure above, on the left are the frequently touted advantages of containers, yet virtual machines can answer each of them one by one.


If you are deploying a traditional application, one that starts slowly, has few processes, and is rarely updated, then a virtual machine can fully meet the requirements:

  • Slow startup: the application takes 15 minutes to start; the container itself starts in seconds, and on many platforms virtual machines can be optimized to start in ten-odd seconds. The difference between the two barely matters.

  • Large memory footprint: at 32 GB or 64 GB of memory per instance, a machine cannot run many of them anyway.

  • Rarely updated: with an update every six months, a virtual machine image can also be upgraded and rolled back.

  • Stateful application: second-level startup does not help recovery, and a blind restart may lose data instead of repairing anything.

  • Few processes: two or three processes configured to point at each other directly need no service discovery or configuration center.


For such a traditional application, there is no need to spend the effort on containerization: you pay the cost but enjoy none of the benefits.



Containerization, microservices, and DevOps: a trinity




Under what circumstances should you consider making a change?


The traditional business is suddenly hit by Internet business: the application keeps changing and needs updates every few days, and traffic has grown. The original payment system handled cash withdrawals and card swipes; now it has to handle online payments, and traffic has grown n-fold. There is nothing for it; in a word: split. The monolith is split apart, and each sub-module changes independently, with less impact on the others. Once split open, where one process used to carry the traffic, multiple processes now carry it together. This is called microservices. In the microservice scenario there are many processes and frequent updates, so you have 100 processes and a new image every day. The container is happy: each container image is small, no problem at all. The virtual machine cries, because every virtual machine image is too large. So in the microservice scenario you can start to consider using containers. The virtual machine gets angry: "I don't need containers; once the microservices are split out, automated deployment with Ansible does the same job."


From a purely technical point of view that is fair enough; the problem arises from the organizational point of view. In the average company there are many more developers than operations staff. Once developers finish writing code, they stop worrying about it; deploying the environment is entirely the responsibility of operations. For automation, operations write Ansible scripts to solve the problem. However, with so many processes being split and merged and updated so quickly, the configuration keeps changing, the Ansible scripts have to be changed constantly, and releases happen every day; operations would be run into the ground. Under such a workload, operations easily make mistakes even with automated scripts. At this point the container becomes a very good tool. Beyond the technical benefit of putting most of the internal configuration into the image, the more important change is one of process: environment configuration is pushed forward to development. After finishing development, developers have to think about environment deployment instead of acting as hands-off shopkeepers. The benefit is that although there are many processes, frequent configuration changes and frequent updates, the amount of work is small for the development team of any given module, because the 5-10 people who specialize in maintaining that module's configuration and updates are unlikely to make mistakes. If all of this workload were handed to a small operations team instead, the information hand-offs would make environment configuration inconsistent, and the deployment volume would be enormous.


This is where the container is a very good tool: roughly 5% more work for each developer can save 200% of the operations work, and it is less error-prone. However, development is now doing what operations used to do; is the development boss willing? Will the development boss complain to the operations boss? This is not a technical problem. This, in fact, is DevOps. DevOps does not mean erasing the distinction between development and operations; it means the company, from organization to process, working out how to cooperate and where to draw the boundary in a way that benefits the stability of the system.

So microservices, DevOps, and containers complement one another and are inseparable. Without microservices there is no need for containers at all: virtual machines do the job, DevOps is unnecessary, and with one deployment a year development and operations can communicate at their leisure.


Therefore, the essence of containers is cross-environment migration based on images. The image is the fundamental invention of containers and the standard for encapsulation and execution; the rest, namespaces and cgroups, existed long before. That is the technical side.


On the process side, the image is a good tool for DevOps. Containers exist for cross-environment migration. The first migration scenario is between development, test, and production environments. If there is no need to migrate, or migration is infrequent, a virtual machine image is fine; but if migration is constant, a virtual machine image of hundreds of gigabytes is too large. The second migration scenario is cross-cloud migration: migrating virtual machines across public clouds, across regions, and across OpenStack installations is very troublesome or even impossible, because public clouds do not provide download and upload of virtual machine images, and the images are so large that transferring one would take a whole day. Containers are therefore also a good fit for cross-cloud and hybrid-cloud scenarios, which in turn solves the problem of a private cloud alone not having enough resources to carry the traffic.



Recommended scenarios for containers



Based on the above analysis, we find that containers are recommended in the following scenarios:

  • Deploying stateless services, used together with virtual machines to achieve isolation.

  • Deploying stateful services only if you know the application inside very well.

  • Migrating smoothly between development, test, and production environments.

  • Application deployment and auto scaling in cross-cloud, cross-region, cross-data-center, and hybrid-cloud scenarios.

  • Using container images as the delivery artifact of applications, keeping environments consistent and establishing the idea of immutable infrastructure.

  • Running process-level, task-style batch jobs.

  • Managing change: for frequently changing applications, container images and version numbers are lightweight and convenient.

  • When using containers, you must manage the applications properly and design health checks and fault tolerance.



Created on: March 22, 2018, 10:56