2020. 1. 25. 05:44 · Uncategorized
I'm using Elasticsearch 2.2 inside a Docker container. I have a three-node setup, each node on a separate machine. I'm using Elasticsearch both as the main data repository and, of course, as a search engine. I've run into a problem multiple times: I'm bulk indexing a huge number of documents via the bulk processor using the…
Ask HN: Is it just me, or why does Docker suck so much? (Hacker News, 97 points)
For all the massive hype about Docker, I am hugely disappointed. Not being able to change /etc/hosts or /etc/resolv.conf of a container, ugh. It requires some really ugly hacks just to actually provide real 'containment' of an entire application's environment: 'uh yeah, except hosts and resolv, can't do that'. The command syntax lies: docker rmi can untag an image, not really remove it, and who came up with the name rmi anyway?
The docker images command already exists; docker images -rm someId would be sane. The biggest flaw, though, is that it's a pain in the ass to set up a private repository and actually use it.
Aren't there saner alternatives, like LXC with images and sharing?
I've been waiting for the hype to die down a bit and for the project to stabilise before properly playing with it but, from the outside looking in, I must admit I struggle to see how it's gaining as much attention as it is. I can see the advantage for dev boxes, where a developer might want to set up a load of containers on their machine to emulate a staging or production environment. But I don't really understand why you'd want to base your entire production infrastructure on it. What's wrong with setting up KVM 'gold' images for your various server types (db server, redis instance, HAProxy server etc.) and then just dd'ing those images to host machines and using ansible/puppet/chef to do any final role configuration on first boot? At least that way you've got all the security and flexibility a proper VM implies, with not much more admin overhead than if you'd used Docker.
The principal difference is that the virtual machine contains everything, whereas you can spawn a container for each component of your infrastructure. It's easy to restart a container if it crashed, and containers can be tested individually before being pushed to production. An application will not mess with the configuration of another app (which solves the same problem as virtualenv, rvm and apt incompatibilities). The application is not tied to a physical machine; a container can connect to another machine instead of the local one just by changing the routing or environment variables, if everything is done properly.
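As a loose illustration of that last point (the image, container and variable names below are made up, not from the thread), each component runs in its own container, and the app is pointed at its dependencies through environment variables rather than a hard-coded local address:
  # run redis as its own container
  docker run -d --name redis redis
  # the app learns where redis lives from environment variables; repointing it
  # at another machine is just a different -e value, with no change to the image
  docker run -d --name webapp -e REDIS_HOST=redis.internal -e REDIS_PORT=6379 example/webapp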
It's the UNIX philosophy applied to applications: everything must be as small as possible, do only one task and do it properly, and I think that's part of Docker's popularity. However, like most things, it's not really magic, and it must be designed and maintained properly to get all these benefits.
What's wrong with that is that now your application (and, by extension, your developers) has to care about a lot more stuff and needs to have its testing surface expanded appropriately. The potential interactions, and therefore potential failures, between, say, ops's cron jobs, the wire-up of stuff like logstash, changes to your chef configs, etc. mean there's a ton more testing to be done. In a Docker world, the amount of stuff running in a context that the application can see is minimal, and it's all related directly to the application in question. Your inputs and outputs can be clearly defined, and this lets you present a stable API to the outside world (via volumes and links) while, as or even more importantly, providing a known-good, stripped-down environment for the application to run on. (chroot jails are a fine solution too, but I do prefer the relative handholding and easier application independence of Docker and aufs.) The other problem with your suggestion - dd'ing and then running Chef or whatever - is boot time.
I can spin up an AMI ready to deploy Docker apps in under sixty seconds. We work in AWS, so I can't speak to the actual time of deploying a physical server, but I can download and deploy a Docker image from an AWS-hosted private registry in ten seconds.
Chef won't even get its act together in thirty seconds, let alone do your stuff; between downloading a full VM disk image, the startup time for that virtual machine, and the time to run Chef, you've wasted a lot of time. Here at Localytics we're trying to transition to an environment that is very comfortable with automatic scaling and can do so fast (seconds, rather than minutes), and Docker is a big part of that effort. Right now we deploy to AWS autoscaling groups (with AMIs defined via Packer). I'm pushing hard for us to use a clusterization system like Mesos/Deimos, because it'll let us save money by introducing safe, sane multi-tenancy, and it'll let us react to changes in system load in mere seconds. (Obligatory: we're hiring. We work on fun stuff.)
I've been playing around with Docker, so I'll lend my two cents. If KVM gold images work for you, then by all means use them! What I like about Docker: the layered filesystem is easy to work with and seems pretty smart. I can easily tweak an image, push it to Docker Hub and pull it back down. Only the delta is being pushed and pulled, not the whole filesystem. That seems like a pretty big win to me!
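One way to see that sharing (the image names here are hypothetical, not from the comment): docker history lists an image's layers, and two images built from the same base share their bottom layers, which is exactly what push and pull get to skip.
  # both images are assumed to have been built FROM the same base, e.g. ubuntu:14.04
  docker history mydb-image
  docker history myredis-image
  # the bottom layers (the shared base) are identical in both listings;
  # docker push/pull only transfers the layers the other side is missing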
Your db server and redis instance can share the same base system image. This makes updates and redeploying the images much faster! Also the containers are much lighter weight to run than a vm. You don't need a beefy machine to run a few containers.
I'm under the impression that one can run 20-30 containers on a modern laptop, but I haven't verified this for myself.
What's wrong with setting up KVM 'gold' images for your various server types (db server, redis instance, HAProxy server etc.) and then just dd'ing those images to host machines and using ansible/puppet/chef to do any final role configuration on first boot?
You're leaving out the other attractive aspects of Docker/containers: performance, lower CPU utilization, instant spin-up, less disk space, etc. Those factors lead to a higher density of images on servers.
Puppet/Chef + hypervisors can't compete on those particular factors. In summary: containers have less isolation but can be packed more densely; VMs have more isolation but pack less densely. The two technologies have different tradeoffs and economics.
I set up Docker for our server deployments here at lever.co. We deploy our application multiple times a day. We test our changes in our staging environment first, and it's important that we deploy the exact same code to production that we tested.
It's also important that we can roll back any changes we make. An application image is the right way to do that (with compiled dependencies and whatnot). And we could make an OS image every time we deploy code, but making a 600MB Ubuntu image to deploy a few lines of code change is ridiculous. There are certainly lots of things that Docker could do better, but I haven't seen any tools that let me deploy so conveniently, easily and reliably.
You can create a 'base' image of your application and update only the code. Part of our fabric deploy script:
with api.settings(**envvars):
    api.run("docker run baseimage sh -c 'git pull && cp /src/webapp/settings/pathsdocker.py /src/webapp/settings/paths.py'")
    d = datetime.datetime.now().strftime('%Y-%m-%d,%H-%M')
    # commit a new image from the last running container, with the updates applied
    api.run('docker commit $(docker ps -lq) baseimage:%s' % d)
    api.run('docker stop $(docker ps -aq)')
    api.run('docker run -d -p 127.0.0.1:8073:8083 baseimage:%s sh runindocker.sh' % d)
Doing the KVM plus ansible/puppet/chef approach has the overhead of making testing harder. I have Docker containers that I rebuild on every test/dev run for some applications, because it is so fast. It means I know the Dockerfile is 100% up to date with my dependencies etc. By the time I'm done testing the application, I'm done testing the container, and it's ready to deploy. If you need the added security of KVM or similar to isolate your app components, nothing stops you from isolating your Docker containers in a very minimal KVM - even just one container per VM if you prefer - and still getting the containerization benefits (which to me are more about having 'portable', deployable, known fixed units with a repeatable build process than about how it is virtualised; the virtualisation/isolation is a bonus - we could have used Docker without it by, as mentioned, just running one container per VM).
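Concretely, that rebuild-on-every-test-run loop can be as small as the following (the image name and test command are placeholders, not from the comment):
  # rebuild the image from the Dockerfile before every test run; cached layers keep this fast
  docker build -t myapp-test .
  # run the test suite in a throwaway container that is removed when it exits
  docker run --rm myapp-test ./run_tests.sh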
Isn't it just as much about deployment though? The thing about Docker is that deploying your new app basically means stopping your running instance and starting a new one. You actually don't even NEED to stop the currently running one if you have something above it that knows which instances/ports to route traffic to. I think Docker is about ease of deployment. You still use the same server; you just run a different command, as opposed to creating a new image and spinning up said image. You can imagine that rollback/migration is pretty easy in this case.
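In practice that deploy step is only a couple of commands (the tags, names and ports here are invented for illustration):
  # start the new version next to the old one; whatever routes traffic sits above both
  docker pull example/webapp:v2
  docker run -d --name webapp-v2 -p 8081:8080 example/webapp:v2
  # once traffic has been switched over, stop the old container
  docker stop webapp-v1
  # rolling back is just routing traffic to webapp-v1 again, or re-running the old tag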
We should not be too harsh on people who inadvertently post their secret key. They may be very nice, well-meaning, smart people who, you know, make a mistake and post their key. Maybe it is in an environment file, and they post it to their public GitHub. Could happen. Maybe they forgot and went to sleep, and found out their key got jacked. Maybe - hypothetical here - someone ran up thousands of dollars producing bitcoins on this innocent person's account, who made one mistake.
One tiny mistake. You never know.
Yes, I have run that command. I have a Docker registry running; then I wanted to get rid of that port, so I put nginx in front and let it proxy_pass. And I have a single-name alias for the host that serves the registry, so it's used by all the machines. But then you find the hostname is mistaken for a username by Docker. Argh, crap, so it can't be clean; it at least has to have an FQDN or a port number.
It's stuff like this, man. The documentation should just say it: 'we don't want you to use any registry other than our docker.io; it's possible, but nah'. Another issue: how would you set Docker to use a private repo by default?
And never hit docker.io? The private registry story is horrible in docker.
You have to tag your image with your registry's FQDN, which is an absolutely braindead idea. It's nice to default to docker.io in the absence of options, but I should really be able to do 'docker pull -registry myregistry.local ubuntu:latest' and have it work.
Instead I have to do 'docker pull myregistry.local/ubuntu', which pulls the image and tags it as 'myregistry.local/ubuntu'. Great, now my registry FQDN is in my image tag.
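The resulting tag dance looks roughly like this (only the registry name comes from the comment; the rest is a sketch of the workflow being described):
  # pulling from the private registry bakes its FQDN into the tag
  docker pull myregistry.local/ubuntu
  # strip the registry prefix so the rest of your tooling isn't tied to it
  docker tag myregistry.local/ubuntu ubuntu:latest
  # ...and remember to put it back on before every push
  docker tag ubuntu:latest myregistry.local/ubuntu
  docker push myregistry.local/ubuntu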
For any decent automation you now have to re-tag it without the 'myregistry.local' prefix so you don't depend on your registry FQDN everywhere. But then you'd better remember to re-tag it with your registry before you push! In our case we wrote an HDFS driver for the registry so we could store images everywhere, and a service discovery layer to discover registry endpoints (using a load balancer just plain didn't work). It's an unholy nightmare (but at least we've automated it) to continually re-tag images to go to the endpoint you want.
Libvirt has support for LXC these days, if memory serves. I'd recommend it - Docker just seems heavily marketed.
What you're mentioning with hosts/resolv etc. is a problem that has been 'solved' with tools like etcd and zookeeper, as someone else mentioned.
I tried Docker with a couple of things, and found that it is an environment that (at the time I experienced it, maybe six months ago) was so unhelpful as to appear completely broken. It isn't for systems administrators, or anyone who knows how to do things the unix way; it's for developers who can't be bothered to learn how to do things sensibly.
Half the unix ecosystem has been reimplemented, probably not that well, by people who didn't know it existed in the first place. That's my conclusion so far. (Prepares to be flamed.)
I'm with you on the re-inventing-the-wheel thing. It happens, over and over again, pretty much everywhere. Heck, a significantly large portion of the technology we see here on HN these days is, to put it bluntly, a lot of re-invention. But this is really a normal aspect of a healthy technological ecosphere.
Kids grow up, they get interested in a subject, they ignore all the prior art, and they get on with doing things that they think are interesting - including fixing 'what's broke' (which often translates to 'what's not well-known'). All technology culture suffers from this. Why complain: it's a principal driver of the state of the art, because only the good technology survives this onslaught. If it's known about in the first place, it rarely gets re-invented.
Discussion about the issue is here. That said: with a DHCP server, you get this warning when you try to edit /etc/resolv.conf: 'Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8). DO NOT EDIT THIS FILE BY HAND - YOUR CHANGES WILL BE OVERWRITTEN.' Docker works in a similar fashion to assign IPs, so this shouldn't be surprising. You are supposed to modify /etc/default/docker and use a consistent group of DNS servers per host. It's simple and it works, honestly.
DOCKER_OPTS="--dns 8.8.8.8"
Can you tell I disagree? ;) /etc/hosts shouldn't need modification if you control your DNS server, since you can just place whatever you need there.
As for alternatives: Docker is popular because all the alternatives are a much larger PIA to manage. I'd suggest, if you dislike their private repo system, that you just use git to manage the files for each Docker image and create it locally on the host. Git clone, cd, docker build. I honestly find that works well enough, and it means I don't have to maintain more than a single GitLab instance for my projects.
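That local-build workflow is just the following (the repo URL and image name are placeholders):
  # keep the Dockerfile and its build context in git, and build the image on the host itself
  git clone git@gitlab.example.com:ops/myservice-image.git
  cd myservice-image
  docker build -t myservice .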
Some problems I currently have with Docker:
Dockerfiles are too static. You cannot start the docker build process with a variable (e.g. a var that holds a specific branch to check out) in your Dockerfiles.
Managing the build process, and the starting and linking of multiple images/containers. I started with bash scripts, then switched to a tool called fig. Even though fig keeps a whole setup in a simple config file, I cannot use it because it does not wait for the DB container to be ready to accept connections before starting a container that links to it. So I'm back to writing bash scripts.
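The usual bash workaround for that ordering problem is a small wait loop along these lines (container names and port are hypothetical):
  # start the database container first
  docker run -d --name db postgres
  # poll until the DB actually accepts connections before starting the linked container
  until nc -z "$(docker inspect -f '{{ .NetworkSettings.IPAddress }}' db)" 5432; do
    sleep 1
  done
  docker run -d --name app --link db:db myapp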
Replacing running containers. Restarted containers get a new IP, so all the links stop working. I had to set up a DNS server and wrap the restarting of containers in bash scripts again.
I had no big issues creating a private registry once the SSL certificate was created correctly. Docker certainly has its pain points, but I've actually really enjoyed working it into our build and deployment process.
The command syntax is sometimes a bit clunky but they are making regular improvements. Not to repeat lgbr, but the registry is pretty easy to get running.
I had some problems initially, but it was mostly because I didn't really understand how to use a container all that well. That sounds somewhat silly, but it ultimately was true. Finally, the hype is ultimately a good thing.
There's a lot of focus on the project right now, which (hopefully) means we can expect a good deal of improvement and stability in the near future.
After having used Docker for 'real' things for the past 8 months or so, I definitely agree with you that it kinda sucks. Docker's strengths come from the workflow you get when you use it: 'run /bin/bash in ubuntu' and it just works.
For developers that's great. For a backend that does the heavy lifting when you're building a lot of operations automation (like a PaaS), it starts to break down. Just some of the things I've come across:
Running a private registry is awkward. You have to tag images with the FQDN of your registry as a prefix (which is braindead) for Docker to 'detect' that it's supposed to push the image to your registry. 'Tags' as an abstraction shouldn't work that way; they should be independent of where you want to store them.
Pushes and pulls, even over LAN (hell, even to localhost), are god-awful slow. I don't know whether they're doing some naive I/O where they're only sending a byte at a time, or what, but it's much, much, much slower than a cURL download from the same endpoint. Plus, if you're using devmapper, there's a nice 10-second pause between each layer that downloads. Btrfs and aufs are better, but good luck getting those into a CentOS 6 install. This is a major drawback because if you want to use Docker as a Mesos containerizer, or otherwise for tasks that require fast startup time on a machine that hasn't pulled your image yet (i.e. a PaaS), you have to wait far too long for the image to download.
Tarballs extracted into a read-only chroot/namespace are faster and simpler. Docker makes a huge horrible mess of your storage.
In the devmapper world (where we're stuck if we're using CentOS), containers take up tons of space (not just the images, but the containers themselves), and you have to be incredibly diligent about 'docker rm' when you're done. You can't do 'docker run -rm' when using '-d' either, since the flags conflict. In a similar vein, images are way bigger than they ought to be (my Dockerfile should have spit out maybe 10 megs tops, why is this layer 800MB?).
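In practice that diligence turns into a periodic cleanup script, something like this (a sketch, not taken from the comment):
  # remove containers that have exited (the ones you couldn't start with -rm because of -d)
  docker rm $(docker ps -aq --filter status=exited)
  # remove dangling image layers left behind by rebuilds
  docker rmi $(docker images -q --filter dangling=true)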
The docker daemon. I hate using a client/server model for Docker. Why can't the lxc/libcontainer run command be a child process of my docker run command? Why does docker run have to talk to a daemon that then runs my container? It breaks a lot of expectations for things like systemd and Mesos; now we have to jump through hoops to get our container in the same cgroup as the script running docker run. It also becomes a single point of failure: if the Docker daemon crashes, so do all of your containers.
They 'fix' this by forwarding signals from the run command to the underlying container but it's all a huge horrible hack when they should just abandon client/server and decentralize it. (It's all the same binary anyway).
The other issues we've seen have mostly just been bugs that got fixed over time: things like containers just not getting network connections at all any more (nc -vz shows them as listening, but no data gets sent or received), the 'ADD /tarball.tgz' behavior changing repeatedly across releases, random docker daemon hangs, etc. As we use Docker for more and more serious things, we get an odd suspicion that we're outgrowing it. We're sticking with it for now because we don't have the time to develop an alternative, but I really wish it were faster and more mature.
Yeah, Docker is god damn slow. The layered filesystem is a great idea, but its implementation sucks. When you push an image whose base image has already been pushed, you will see tons of 'Image already pushed, skipping' messages, over and over. This is really stupid; it should just compare the lists of layers and push or pull only the missing parts. And the push and pull operations are slow as hell.
It's really painful to use in production. It slows down your whole deployment process, and it eventually becomes the bottleneck. It's really funny that they picked Go, a language that advertises performance, yet failed to make a very basic task work efficiently.
I personally would say 'It's just you.'
Docker has been huge for us. We can run the same containers locally that we run in production.
Dockerfiles are SIMPLE and easy for people to create and understand. Fig has really made using Docker for local dev rather pleasant. Are there some hiccups here and there? Sure, but the project is young, and they are actively trying to smooth over a lot of issues and pain points. I feel like most people who dislike Docker have not actually tried it or used it. That could just be a wrong opinion, though.