BigW Consortium Gitlab

Commit d8f33c0a by Achilleas Pipinellis

Move operations/ to new location

[ci skip]
parent 6602b917
......@@ -34,7 +34,7 @@
- [Libravatar](customization/libravatar.md) Use Libravatar instead of Gravatar for user avatars.
- [Log system](administration/logs.md) Log system.
- [Environment Variables](administration/environment_variables.md) to configure GitLab.
- [Operations](operations/README.md) Keeping GitLab up and running.
- [Operations](administration/operations.md) Keeping GitLab up and running.
- [Raketasks](raketasks/README.md) Backups, maintenance, automatic webhook setup and the importing of projects.
- [Repository checks](administration/repository_checks.md) Periodic Git repository checks.
- [Repository storages](administration/repository_storages.md) Manage the paths used to store repositories.
......
# GitLab operations
- [Sidekiq MemoryKiller](operations/sidekiq_memory_killer.md)
- [Cleaning up Redis sessions](operations/cleaning_up_redis_sessions.md)
- [Understanding Unicorn and unicorn-worker-killer](operations/unicorn.md)
- [Moving repositories to a new location](operations/moving_repositories.md)
# Cleaning up stale Redis sessions
Since version 6.2, GitLab stores web user sessions as key-value pairs in Redis.
Prior to GitLab 7.3, user sessions did not automatically expire from Redis. If
you have been running a large GitLab server (thousands of users) since before
GitLab 7.3, we recommend cleaning up stale sessions to compact the Redis
database after you upgrade to GitLab 7.3. You can also perform a cleanup while
still running GitLab 7.2 or older, but in that case new stale sessions will
start building up again after you clean up.
In GitLab versions prior to 7.3.0, the session keys in Redis are 16-byte values
written as 32 hexadecimal characters, such as '976aa289e2189b17d7ef525a6702ace9'.
Starting with GitLab 7.3.0, the keys are prefixed with 'session:gitlab:', so they
look like 'session:gitlab:976aa289e2189b17d7ef525a6702ace9'. Below we describe
how to remove the keys in the old format.
First we define a shell function with the proper Redis connection details.
```
rcli() {
  # This example works for Omnibus installations of GitLab 7.3 or newer. For an
  # installation from source you will have to change the socket path and the
  # path to redis-cli.
  sudo /opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket "$@"
}
# test the new shell function; the response should be PONG
rcli ping
```
Now we do a search to see if there are any session keys in the old format for
us to clean up.
```
# returns the number of old-format session keys in Redis
rcli keys '*' | grep '^[a-f0-9]\{32\}$' | wc -l
```
If the number is larger than zero, you can proceed to expire the keys from
Redis. If the number is zero there is nothing to clean up.
```
# Tell Redis to expire each matched key after 600 seconds.
rcli keys '*' | grep '^[a-f0-9]\{32\}$' | awk '{ print "expire", $0, 600 }' | rcli
# This will print '(integer) 1' for each key that gets expired.
```
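To preview what the pipeline above would send to Redis, the same `grep` and
`awk` stages can be run against sample input instead of a live server. The
sketch below is only an illustration: `sample_keys` is a hypothetical stand-in
for `rcli keys '*'`, and its two keys are the example values used earlier in
this document.

```shell
# Hypothetical stand-in for `rcli keys '*'`: one old-format key and one
# new-format key.
sample_keys() {
  printf '%s\n' \
    '976aa289e2189b17d7ef525a6702ace9' \
    'session:gitlab:976aa289e2189b17d7ef525a6702ace9'
}

# Build the EXPIRE commands without piping them into Redis.
# Only the old-format key matches the 32-hex-character pattern.
expire_commands=$(sample_keys | grep '^[a-f0-9]\{32\}$' | awk '{ print "expire", $0, 600 }')
echo "$expire_commands"
# prints: expire 976aa289e2189b17d7ef525a6702ace9 600
```

The new-format key is filtered out because the pattern is anchored at both
ends, so the `session:gitlab:` prefix prevents a match.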
Over the next 15 minutes (10 minutes expiry time plus 5 minutes Redis
background save interval) your Redis database will be compacted. If you are
still using GitLab 7.2, users who are not clicking around in GitLab during the
10 minute expiry window will be signed out of GitLab.
# Moving repositories managed by GitLab
Sometimes you need to move all repositories managed by GitLab to
another filesystem or another server. In this document we will look
at some of the ways you can copy all your repositories from
`/var/opt/gitlab/git-data/repositories` to `/mnt/gitlab/repositories`.
We will look at three scenarios: the target directory is empty, the
target directory contains an outdated copy of the repositories, and
how to deal with thousands of repositories.
**Each of the approaches we list can/will overwrite data in the
target directory `/mnt/gitlab/repositories`. Do not mix up the
source and the target.**
## Target directory is empty: use a tar pipe
If the target directory `/mnt/gitlab/repositories` is empty the
simplest thing to do is to use a tar pipe. This method has low
overhead and tar is almost always already installed on your system.
However, it is not possible to resume an interrupted tar pipe: if
that happens then all data must be copied again.
```
# As the git user
tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\
tar -C /mnt/gitlab/repositories -xf -
```
If you want to see progress, replace `-xf` with `-xvf`.
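If you would like to try the tar pipe before running it on real data, a
minimal sketch along the following lines may help. It assumes nothing about
your GitLab installation: `src` and `dst` are hypothetical scratch directories
created with `mktemp`, and the fake repository layout is made up for the
example.

```shell
# Hypothetical scratch directories standing in for the real source and target.
src=$(mktemp -d)
dst=$(mktemp -d)

# Create a tiny fake repository layout in the source.
mkdir -p "$src/group/project.git"
echo 'ref: refs/heads/master' > "$src/group/project.git/HEAD"

# Same tar pipe as above, with the scratch paths substituted.
tar -C "$src" -cf - -- . | tar -C "$dst" -xf -

# diff -r exits with a non-zero status if the two trees differ.
diff -r "$src" "$dst" && echo "copy matches source"
```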
### Tar pipe to another server
You can also use a tar pipe to copy data to another server. If your
'git' user has SSH access to the new server as 'git@newserver', you
can pipe the data through SSH.
```
# As the git user
tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\
ssh git@newserver tar -C /mnt/gitlab/repositories -xf -
```
If you want to compress the data before it goes over the network
(which will cost you CPU cycles) you can replace `ssh` with `ssh -C`.
## The target directory contains an outdated copy of the repositories: use rsync
If the target directory already contains a partial / outdated copy
of the repositories it may be wasteful to copy all the data again
with tar. In this scenario it is better to use rsync. This utility
is either already installed on your system or easily installable
via apt, yum etc.
```
# As the 'git' user
rsync -a --delete /var/opt/gitlab/git-data/repositories/. \
/mnt/gitlab/repositories
```
The `/.` in the command above is very important: without it you can
easily get the wrong directory structure in the target directory.
If you want to see progress, replace `-a` with `-av`.
### Single rsync to another server
If the 'git' user on your source system has SSH access to the target
server you can send the repositories over the network with rsync.
```
# As the 'git' user
rsync -a --delete /var/opt/gitlab/git-data/repositories/. \
git@newserver:/mnt/gitlab/repositories
```
## Thousands of Git repositories: use one rsync per repository
Every time you start an rsync job it has to inspect all files in
the source directory, all files in the target directory, and then
decide what files to copy or not. If the source or target directory
contains many files this startup phase of rsync can become a burden
for your GitLab server. In cases like this you can make rsync's
life easier by dividing its work into smaller pieces, and sync one
repository at a time.
In addition to rsync we will use [GNU
Parallel](http://www.gnu.org/software/parallel/). This utility is
not included in GitLab so you need to install it yourself with apt
or yum. Also note that the GitLab scripts we use below were added
in GitLab 8.1.
**This process does not clean up repositories at the target location that no
longer exist at the source.** If you start using your GitLab instance with
`/mnt/gitlab/repositories`, you need to run `gitlab-rake gitlab:cleanup:repos`
after switching to the new repository storage directory.
### Parallel rsync for all repositories known to GitLab
This will sync repositories with 10 rsync processes at a time. We keep
track of progress so that the transfer can be restarted if necessary.
First we create a new directory, owned by 'git', to hold transfer
logs. We assume the directory is empty before we start the transfer
procedure, and that we are the only ones writing files in it.
```
# Omnibus
sudo mkdir /var/opt/gitlab/transfer-logs
sudo chown git:git /var/opt/gitlab/transfer-logs
# Source
sudo -u git -H mkdir /home/git/transfer-logs
```
We seed the process with a list of the directories we want to copy.
```
# Omnibus
sudo -u git sh -c 'gitlab-rake gitlab:list_repos > /var/opt/gitlab/transfer-logs/all-repos-$(date +%s).txt'
# Source
cd /home/git/gitlab
sudo -u git -H sh -c 'bundle exec rake gitlab:list_repos > /home/git/transfer-logs/all-repos-$(date +%s).txt'
```
Now we can start the transfer. The command below is idempotent, and
the number of jobs done by GNU Parallel should converge to zero. If it
does not, some repositories listed in `all-repos-1234.txt` may have been
deleted/renamed before they could be copied.
```
# Omnibus
sudo -u git sh -c '
cat /var/opt/gitlab/transfer-logs/* | sort | uniq -u |\
/usr/bin/env JOBS=10 \
/opt/gitlab/embedded/service/gitlab-rails/bin/parallel-rsync-repos \
/var/opt/gitlab/transfer-logs/success-$(date +%s).log \
/var/opt/gitlab/git-data/repositories \
/mnt/gitlab/repositories
'
# Source
cd /home/git/gitlab
sudo -u git -H sh -c '
cat /home/git/transfer-logs/* | sort | uniq -u |\
/usr/bin/env JOBS=10 \
bin/parallel-rsync-repos \
/home/git/transfer-logs/success-$(date +%s).log \
/home/git/repositories \
/mnt/gitlab/repositories
'
```
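The restart behaviour appears to rest on the `sort | uniq -u` stage: a path
that occurs in exactly one file of the transfer-logs directory is still
pending, while a path that occurs in both the seed list and a success log is
dropped. The sketch below illustrates this with made-up data. It assumes that
each success log holds one line per copied repository, written in the same
form as the seed list; the file names and repository paths are hypothetical.

```shell
# Hypothetical scratch directory standing in for the transfer-logs directory.
logs=$(mktemp -d)

# Seed list: three repositories to copy.
printf '%s\n' group/a.git group/b.git group/c.git > "$logs/all-repos-1.txt"

# Success log from a first, interrupted run: two repositories were copied.
printf '%s\n' group/a.git group/c.git > "$logs/success-2.log"

# Lines that occur exactly once across all files are the repositories
# that still need to be copied.
pending=$(cat "$logs"/* | sort | uniq -u)
echo "$pending"
# prints: group/b.git
```

Once every seeded path also appears in a success log, the pipeline prints
nothing, which matches the job count converging to zero.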
### Parallel rsync only for repositories with recent activity
Suppose you have already done one sync that started after 2015-10-1 12:00 UTC.
Then you might only want to sync repositories that were changed via GitLab
_after_ that time. You can use the 'SINCE' variable to tell 'rake
gitlab:list_repos' to only print repositories with recent activity.
```
# Omnibus
sudo gitlab-rake gitlab:list_repos SINCE='2015-10-1 12:00 UTC' |\
sudo -u git \
/usr/bin/env JOBS=10 \
/opt/gitlab/embedded/service/gitlab-rails/bin/parallel-rsync-repos \
success-$(date +%s).log \
/var/opt/gitlab/git-data/repositories \
/mnt/gitlab/repositories
# Source
cd /home/git/gitlab
sudo -u git -H bundle exec rake gitlab:list_repos SINCE='2015-10-1 12:00 UTC' |\
sudo -u git -H \
/usr/bin/env JOBS=10 \
bin/parallel-rsync-repos \
success-$(date +%s).log \
/home/git/repositories \
/mnt/gitlab/repositories
```
# Sidekiq MemoryKiller
The GitLab Rails application code suffers from memory leaks. For web requests
this problem is made manageable using
[unicorn-worker-killer](https://github.com/kzk/unicorn-worker-killer) which
restarts Unicorn worker processes in between requests when needed. The Sidekiq
MemoryKiller applies the same approach to the Sidekiq processes used by GitLab
to process background jobs.
Unlike unicorn-worker-killer, which is enabled by default for all GitLab
installations since GitLab 6.4, the Sidekiq MemoryKiller is enabled by default
_only_ for Omnibus packages. The reason for this is that the MemoryKiller
relies on Runit to restart Sidekiq after a memory-induced shutdown and GitLab
installations from source do not all use Runit or an equivalent.
With the default settings, the MemoryKiller will cause a Sidekiq restart no
more often than once every 15 minutes, with the restart causing about one
minute of delay for incoming background jobs.
## Configuring the MemoryKiller
The MemoryKiller is controlled using environment variables.
- `SIDEKIQ_MEMORY_KILLER_MAX_RSS`: if this variable is set, and its value is
greater than 0, then after each Sidekiq job, the MemoryKiller will check the
RSS of the Sidekiq process that executed the job. If the RSS of the Sidekiq
process (expressed in kilobytes) exceeds `SIDEKIQ_MEMORY_KILLER_MAX_RSS`, a
delayed shutdown is triggered. The default value for Omnibus packages is set
[in the omnibus-gitlab
repository](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/files/gitlab-cookbooks/gitlab/attributes/default.rb).
- `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`: defaults to 900 seconds (15 minutes). When
a shutdown is triggered, the Sidekiq process will keep working normally for
another 15 minutes.
- `SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT`: defaults to 30 seconds. When the grace
time has expired, the MemoryKiller tells Sidekiq to stop accepting new jobs.
Existing jobs get 30 seconds to finish. After that, the MemoryKiller tells
Sidekiq to shut down, and an external supervision mechanism (e.g. Runit) must
restart Sidekiq.
- `SIDEKIQ_MEMORY_KILLER_SHUTDOWN_SIGNAL`: defaults to `SIGKILL`. The name of
the final signal sent to the Sidekiq process when we want it to shut down.
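As a rough illustration of how these settings combine, the sketch below adds
up the default timings listed above. The two values are the documented
defaults; the sum is only an estimate of the time between the RSS check that
triggers a shutdown and the final signal, and `total` is a name chosen for
this example.

```shell
# Documented defaults, in seconds.
SIDEKIQ_MEMORY_KILLER_GRACE_TIME=900
SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT=30

# Grace time (Sidekiq keeps working normally) plus shutdown wait
# (no new jobs are accepted, existing jobs may finish).
total=$(( SIDEKIQ_MEMORY_KILLER_GRACE_TIME + SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT ))
echo "final signal after about ${total} seconds"
# prints: final signal after about 930 seconds
```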
# Understanding Unicorn and unicorn-worker-killer
## Unicorn
GitLab uses [Unicorn](http://unicorn.bogomips.org/), a pre-forking Ruby web
server, to handle web requests (web browsers and Git HTTP clients). Unicorn is
a daemon written in Ruby and C that can load and run a Ruby on Rails
application; in our case the Rails application is GitLab Community Edition or
GitLab Enterprise Edition.
Unicorn has a multi-process architecture to make better use of available CPU
cores (processes can run on different cores) and to have stronger fault
tolerance (most failures stay isolated in only one process and cannot take down
GitLab entirely). On startup, the Unicorn 'master' process loads a clean Ruby
environment with the GitLab application code, and then spawns 'workers' which
inherit this clean initial environment. The 'master' never handles any
requests; that is left to the workers. The operating system network stack
queues incoming requests and distributes them among the workers.
In a perfect world, the master would spawn its pool of workers once, and then
the workers handle incoming web requests one after another until the end of
time. In reality, worker processes can crash or time out: if the master notices
that a worker takes too long to handle a request it will terminate the worker
process with SIGKILL ('kill -9'). No matter how the worker process ended, the
master process will replace it with a new 'clean' process again. Unicorn is
designed to be able to replace 'crashed' workers without dropping user
requests.
This is what a Unicorn worker timeout looks like in `unicorn_stderr.log`. The
master process has PID 56227 below.
```
[2015-06-05T10:58:08.660325 #56227] ERROR -- : worker=10 PID:53009 timeout (61s > 60s), killing
[2015-06-05T10:58:08.699360 #56227] ERROR -- : reaped #<Process::Status: pid 53009 SIGKILL (signal 9)> worker=10
[2015-06-05T10:58:08.708141 #62538] INFO -- : worker=10 spawned pid=62538
[2015-06-05T10:58:08.708824 #62538] INFO -- : worker=10 ready
```
### Tunables
The main tunables for Unicorn are the number of worker processes and the
request timeout after which the Unicorn master terminates a worker process.
See the [omnibus-gitlab Unicorn settings
documentation](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/doc/settings/unicorn.md)
if you want to adjust these settings.
## unicorn-worker-killer
GitLab has memory leaks. These memory leaks manifest themselves in long-running
processes, such as Unicorn workers. (The Unicorn master process is not known to
leak memory, probably because it does not handle user requests.)
To make these memory leaks manageable, GitLab comes with the
[unicorn-worker-killer gem](https://github.com/kzk/unicorn-worker-killer). This
gem [monkey-patches](https://en.wikipedia.org/wiki/Monkey_patch) the Unicorn
workers to do a memory self-check after every 16 requests. If the memory of the
Unicorn worker exceeds a pre-set limit then the worker process exits. The
Unicorn master then automatically replaces the worker process.
This is a robust way to handle memory leaks: Unicorn is designed to handle
workers that 'crash' so no user requests will be dropped. The
unicorn-worker-killer gem is designed to only terminate a worker process _in
between requests_, so no user requests are affected.
This is what a Unicorn worker memory restart looks like in `unicorn_stderr.log`.
You see that worker 4 (PID 125918) inspects itself and decides to exit.
The threshold memory value was 254802235 bytes, about 250 MB. With GitLab this
threshold is a random value between 200 and 250 MB. The master process (PID
117565) then reaps the worker process and spawns a new 'worker 4' with PID
127549.
```
[2015-06-05T12:07:41.828374 #125918] WARN -- : #<Unicorn::HttpServer:0x00000002734770>: worker (pid: 125918) exceeds memory limit (256413696 bytes > 254802235 bytes)
[2015-06-05T12:07:41.828472 #125918] WARN -- : Unicorn::WorkerKiller send SIGQUIT (pid: 125918) alive: 23 sec (trial 1)
[2015-06-05T12:07:42.025916 #117565] INFO -- : reaped #<Process::Status: pid 125918 exit 0> worker=4
[2015-06-05T12:07:42.034527 #127549] INFO -- : worker=4 spawned pid=127549
[2015-06-05T12:07:42.035217 #127549] INFO -- : worker=4 ready
```
One other thing that stands out in the log snippet above, taken from
GitLab.com, is that 'worker 4' was serving requests for only 23 seconds. This
is a normal value for our current GitLab.com setup and traffic.
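When reading many such log lines, it can help to convert the byte counts into
larger units. The sketch below extracts the two numbers from the first log
line shown above and prints them in MiB. It is only an illustration: treating
a megabyte as 1048576 bytes is an assumption made here, and the variable names
are hypothetical.

```shell
# First line of the log snippet above, copied verbatim.
line='[2015-06-05T12:07:41.828374 #125918] WARN -- : #<Unicorn::HttpServer:0x00000002734770>: worker (pid: 125918) exceeds memory limit (256413696 bytes > 254802235 bytes)'

# Pull out "used" and "limit" in bytes, then convert both to MiB.
summary=$(echo "$line" \
  | sed -n 's/.*(\([0-9]*\) bytes > \([0-9]*\) bytes).*/\1 \2/p' \
  | awk '{ printf "used %.1f MiB, limit %.1f MiB", $1 / 1048576, $2 / 1048576 }')
echo "$summary"
# prints: used 244.5 MiB, limit 243.0 MiB
```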
The high frequency of Unicorn memory restarts on some GitLab sites can be a
source of confusion for administrators. Usually they are a [red
herring](https://en.wikipedia.org/wiki/Red_herring).
# GitLab operations
- [Sidekiq MemoryKiller](sidekiq_memory_killer.md)
- [Cleaning up Redis sessions](cleaning_up_redis_sessions.md)
- [Understanding Unicorn and unicorn-worker-killer](unicorn.md)
This document was moved to [administration/operations](../administration/operations.md).
This document was moved to [administration/operations/cleaning_up_redis_sessions](../administration/operations/cleaning_up_redis_sessions.md).
This document was moved to [administration/operations/moving_repositories](../administration/operations/moving_repositories.md).
This document was moved to [administration/operations/sidekiq_memory_killer](../administration/operations/sidekiq_memory_killer.md).
This document was moved to [administration/operations/unicorn](../administration/operations/unicorn.md).