Josh's Blog

GitLab CE Upgrade Woes

One of the first things I did when I joined my current employer was stand up a GitLab instance for my work. The organization at that point had not standardized on a version control system, outside of the occasional use of RCS (yep). Since I make heavy use of Puppet in my environments, and I leverage r10k to automate my module deployments, I made getting some form of centralized, on-premises Git repository in place a priority.

Fast forward a few months, and I now have a GitLab instance in place hosting my puppet modules, various configuration files, operations scripts, and even some random in-house applications that previously rested in some corner of a random server somewhere.

This week I was asked to share one of these projects with one of our developers. “No problem!” I thought. I’ll set up a GitLab account for him, send the link to the repo, and we’ll be on our way. Unfortunately, I got the itch to upgrade the GitLab platform first. Note that I now also use the GitLab Continuous Integration system, so there are a few components to consider when performing any update, especially since I have observed odd issues between the GitLab server and its CI instance when the two are not running the same release version.

Upgrading GitLab

The first step in this process was to upgrade the main GitLab instance. I run this on a CentOS 6.6 VM (we use VMware, but that doesn’t really matter). The upgrade procedure is straightforward:

# yum install gitlab-ce

That right there will pull down the updated RPMs, install them, and then execute the gitlab reconfigure process, which rebuilds the stack. Note that I was upgrading from GitLab 7.12.0 to 7.13.5. This process ultimately failed during the reconfigure step, where one of GitLab’s Chef recipes attempts to apply updated sysctl settings. Here’s the error output I saw:

================================================================================
Error executing action `run` on resource 'execute[sysctl]'
================================================================================

Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '255'
---- Begin output of /sbin/sysctl -p /etc/sysctl.conf ----
STDOUT: net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.core.somaxconn = 1024
STDERR: error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
error: "net.bridge.bridge-nf-call-iptables" is an unknown key
error: "net.bridge.bridge-nf-call-arptables" is an unknown key
---- End output of /sbin/sysctl -p /etc/sysctl.conf ----
Ran /sbin/sysctl -p /etc/sysctl.conf returned 255

Resource Declaration:
---------------------
# In /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/unicorn.rb

 39:   execute "sysctl" do
 40:     command "/sbin/sysctl -p /etc/sysctl.conf"
 41:     action :nothing
 42:   end
 43: 

Compiled Resource:
------------------
# Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/unicorn.rb:39:in `from_file'

execute("sysctl") do
  action :nothing
  retries 0
  retry_delay 2
  default_guard_interpreter :execute
  command "/sbin/sysctl -p /etc/sysctl.conf"
  backup 5
  returns 0
  declared_type :execute
  cookbook_name "gitlab"
  recipe_name "unicorn"
end

[2015-08-18T16:55:54+00:00] INFO: Running queued delayed notifications before re-raising exception
[2015-08-18T16:55:54+00:00] INFO: template[/var/opt/gitlab/gitlab-rails/config.ru] sending restart action to service[unicorn] (delayed)
[2015-08-18T16:55:54+00:00] INFO: service[unicorn] restarted
[2015-08-18T16:55:54+00:00] INFO: template[/var/opt/gitlab/gitlab-rails/etc/gitlab.yml] sending run action to execute[clear the gitlab-rails cache] (delayed)
[2015-08-18T16:56:08+00:00] INFO: execute[clear the gitlab-rails cache] ran successfully
[2015-08-18T16:56:08+00:00] INFO: remote_file[/var/opt/gitlab/gitlab-rails/VERSION] sending run action to bash[migrate gitlab-rails database] (delayed)
[2015-08-18T16:56:17+00:00] INFO: bash[migrate gitlab-rails database] ran successfully
[2015-08-18T16:56:17+00:00] INFO: template[/var/opt/gitlab/gitlab-ci/etc/application.yml] sending run action to execute[clear the gitlab-ci cache] (delayed)
[2015-08-18T16:56:21+00:00] INFO: execute[clear the gitlab-ci cache] ran successfully
[2015-08-18T16:56:21+00:00] INFO: remote_file[/var/opt/gitlab/gitlab-ci/VERSION] sending run action to bash[migrate gitlab-ci database] (delayed)
[2015-08-18T16:56:25+00:00] INFO: bash[migrate gitlab-ci database] ran successfully
[2015-08-18T16:56:25+00:00] ERROR: Running exception handlers
[2015-08-18T16:56:25+00:00] ERROR: Exception handlers complete
[2015-08-18T16:56:25+00:00] FATAL: Stacktrace dumped to /opt/gitlab/embedded/cookbooks/cache/chef-stacktrace.out
[2015-08-18T16:56:25+00:00] ERROR: Found 1 errors, they are stored in the backtrace
[2015-08-18T16:56:25+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
warning: %posttrans(gitlab-ce-7.13.5-ce.0.el6.x86_64) scriptlet failed, exit status 1
  Verifying  : gitlab-ce-7.13.5-ce.0.el6.x86_64                                                                                                1/2 
  Verifying  : gitlab-ce-7.12.0~omnibus.1-1.x86_64                                                                                             2/2 

Updated:
  gitlab-ce.x86_64 0:7.13.5-ce.0.el6                                                                                                               

Complete!

With this, the gitlab reconfigure process failed, and the GitLab system was left in an offline state. GitLab’s issue tracker has a case open regarding this issue, which you can find here. Basically, the upgrade inserts the following entries into /etc/sysctl.conf:

# Disable netfilter on bridges.
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

…followed by an attempt to apply these new settings to the system. The problem is that these keys belong to the bridge kernel module, which I do not use. If the bridge module is not loaded, sysctl exits with an error on the unknown keys (exit code 255 in the output above), and the reconfigure process fails in turn. There are a variety of ways to deal with this, referenced here by the GitLab team. One is to modify the Chef recipe to exclude those new sysctl values. Another is to simply load the bridge module and re-run the reconfigure, which is what I did.

[jpreston@gitlab ~]$ sudo modprobe bridge
[jpreston@gitlab ~]$ sudo lsmod|grep bridge
bridge                 82775  0 
stp                     2218  1 bridge
llc                     5578  2 bridge,stp

After that, I executed the reconfigure and the process completed successfully.

[jpreston@gitlab ~]$ sudo gitlab-ctl reconfigure
....
....
Running handlers:
Running handlers complete
Chef Client finished, 5/231 resources updated in 6.093207744 seconds
gitlab Reconfigured!
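
For anyone who wants to spot this problem before running the reconfigure, here is a minimal sketch of a pre-flight check. It assumes that every key in a sysctl.conf-style file maps to a path under /proc/sys with the dots replaced by slashes, which holds for the keys shown above. The function name unknown_sysctl_keys and its two optional arguments are my own invention for illustration, not part of GitLab or sysctl.

```shell
#!/bin/sh
# List keys in a sysctl.conf-style file that the running kernel does not expose.
# Hypothetical helper. Usage: unknown_sysctl_keys [conf_file] [proc_sys_root]
unknown_sysctl_keys() {
    conf="${1:-/etc/sysctl.conf}"
    root="${2:-/proc/sys}"
    # Strip comments, drop blank lines, then split each line on '='.
    sed -e 's/[#;].*$//' -e '/^[[:space:]]*$/d' "$conf" |
    while IFS='=' read -r key rest; do
        # Remove the whitespace around the key name.
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        [ -n "$key" ] || continue
        # Assumption: dots in the key correspond to slashes under /proc/sys.
        path="$root/$(printf '%s' "$key" | tr '.' '/')"
        # Print the key only when the kernel has no matching entry.
        [ -e "$path" ] || printf '%s\n' "$key"
    done
}
```

Called with no arguments, it checks /etc/sysctl.conf against /proc/sys. On a system without the bridge module loaded, I would expect it to print the three net.bridge keys, and nothing once the module is in place.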

GitLab was now updated and ready for use. On to upgrading the CI system.

Upgrading the GitLab-CI Platform

I should note that we run the GitLab Continuous Integration component in a separate VM from the main GitLab code repository. The procedure starts out the same:

# yum install gitlab-ce

Again, during the gitlab reconfigure stage, the process errors out, this time complaining about a missing git group.

Recipe: gitlab::redis
  * group[gitlab-redis] action create (up to date)
  * user[gitlab-redis] action create (up to date)
  * directory[/var/opt/gitlab/redis] action create
    * cannot determine group id for 'git', does the group exist on this system?
    ================================================================================
    Error executing action `create` on resource 'directory[/var/opt/gitlab/redis]'
    ================================================================================
    
    Chef::Exceptions::GroupIDNotFound
    ---------------------------------
    cannot determine group id for 'git', does the group exist on this system?
    
    Resource Declaration:
    ---------------------
    # In /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/definitions/redis_service.rb
    
     38:   directory redis_dir do
     39:     owner redis_user
     40:     group params[:socket_group]
     41:     mode "0750"
     42:   end
     43: 
    
    Compiled Resource:
    ------------------
    # Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/definitions/redis_service.rb:38:in `block in from_file'
    
    directory("/var/opt/gitlab/redis") do
      params {:socket_group=>"git", :name=>"redis"}
      action :create
      retries 0
      retry_delay 2
      default_guard_interpreter :default
      path "/var/opt/gitlab/redis"
      declared_type :directory
      cookbook_name "gitlab"
      recipe_name "redis"
      owner "gitlab-redis"
      group "git"
      mode "0750"
    end
    

Running handlers:
[2015-08-18T17:22:33+00:00] ERROR: Running exception handlers
Running handlers complete
[2015-08-18T17:22:33+00:00] ERROR: Exception handlers complete
Chef Client failed. 1 resources updated in 3.824717689 seconds
[2015-08-18T17:22:33+00:00] FATAL: Stacktrace dumped to /opt/gitlab/embedded/cookbooks/cache/chef-stacktrace.out
[2015-08-18T17:22:33+00:00] ERROR: Found 1 errors, they are stored in the backtrace
[2015-08-18T17:22:33+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)

I took a look, and there wasn’t a git user or group on my system, most likely because I only installed the CI component of the GitLab omnibus package. The simple resolution was to just add the git user manually, which on this system also created the matching git group:

[root@gitlab-ci jpreston]# id git
id: git: No such user

[root@gitlab-ci jpreston]# useradd git
[root@gitlab-ci jpreston]# id git
uid=502(git) gid=502(git) groups=502(git)
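
The Chef error was about the group, while id only tells you about the user. As a rough sketch, a check like the one below reports on both without changing anything on the system. The function name account_status is hypothetical; id and getent are the standard tools it relies on.

```shell
#!/bin/sh
# Report whether a user and a group of the given name exist.
# Hypothetical helper; it prints one line per lookup and makes no changes.
account_status() {
    name="$1"
    # id -u succeeds only when the user can be resolved.
    if id -u "$name" >/dev/null 2>&1; then
        echo "user $name: present"
    else
        echo "user $name: missing"
    fi
    # getent group succeeds only when the group can be resolved.
    if getent group "$name" >/dev/null 2>&1; then
        echo "group $name: present"
    else
        echo "group $name: missing"
    fi
}
```

Running account_status git before and after the useradd would show both lines flip from missing to present.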

Now manually execute the reconfigure:

[root@gitlab-ci jpreston]# gitlab-ctl reconfigure
Starting Chef Client, version 12.4.0.rc.2

.....
.....

Running handlers:
Running handlers complete
Chef Client finished, 7/155 resources updated in 6.390756993 seconds
gitlab Reconfigured!

Yay, it all appears to have completed without issue! I hop back on to the gitlab-ci web interface, but find the synchronization with the GitLab system just spinning, never importing the project/repo lists. After scratching my head for a bit and checking the OAuth settings between the two systems to ensure that they were still in place, I simply logged out of the CI instance, then jumped back in, and the list synced. Silly that it took me about an hour before I tried that.

Cool. Now that the projects are in place, I decide to poke around and look at some of the previous commits. I click on the build ID link and am met with an Error 500 page. Each link fails similarly. For the next couple of hours, I tail the GitLab logs and remove and re-add projects from the CI interface. I try everything, but nothing seems to work. I do see one error in the logs whenever I attempt to test the build process:

ActiveRecord::UnknownAttributeError (unknown attribute: allow_failure):
  app/models/commit.rb:108:in `block in create_builds_for_type'
  app/models/commit.rb:107:in `map'
  app/models/commit.rb:107:in `create_builds_for_type'
  app/models/commit.rb:136:in `block in create_builds'
  app/models/commit.rb:135:in `each'
  app/models/commit.rb:135:in `any?'
  app/models/commit.rb:135:in `create_builds'
  app/services/create_commit_service.rb:43:in `execute'
  app/controllers/projects_controller.rb:90:in `build'

Researching the “unknown attribute: allow_failure” error got me one result on Google. Fortunately, the conversation between the submitter and the developer led me down the right path: check the database migration. To see the status of the GitLab CI database migrations after an upgrade, one would run:

[root@gitlab-ci nginx]# gitlab-ci-rake db:migrate:status

database: gitlab_ci_production

 Status   Migration ID    Migration Name
--------------------------------------------------
   up     20121004140911  Create projects
   up     20121004165038  Create builds
   up     20121101091638  Devise create users
   up     20121101121639  Add token to project
   up     20121106143042  Add ref functionality
   up     20121108160657  Add gitlab url to project
   up     20121108174237  Add started at to build
   up     20121115094430  Increate trace colunm limit
   up     20121115132252  Add tmp file to build
   up     20121116144312  Add before sha to build
   up     20121224092350  Add schedule to projects
   up     20130114153451  Change schedule invertal
   up     20130129121754  Add public flag to project
   up     20130531112551  Add data field to build
   up     20130531122131  Remove path field from project
   up     20130531125905  Create runners
   up     20130531133603  Add runner id to build
   up     20130603130920  Remove users table
   up     20130603144030  Add more fields to project
   up     20130603144959  Create runner projects
   up     20130603161449  Add project gitlab id to project
   up     20130628142321  Add index project id to builds
   up     20130705171042  Add description to runner
   up     20130710164015  Add db index
   up     20130816201200  Change push data limit
   up     20130906175737  Add sessions table
   up     20131023103430  Add allow git fetch to project
   up     20131120155545  Add email notification fields to project
   up     20140130121538  Rename project fields
   up     20140222210357  Create web hook
   up     20140506091853  Remove public key from runner
   up     20140823225019  Create commits from builds
   up     20140909142245  Add skip refs to projects
   up     20141001125939  Add coverage parser
   up     20141001132129  Add coverage to build
   up     20141028162820  Add sha index to build
   up     20141031114419  Migrate build to commits
   up     20141031141708  Add commit indicies
   up     20141103135037  Add parallel to build
   up     20141103151359  Add commands to build
   up     20141103162726  Add job id to build
   up     20141104130024  Migrate jobs
   up     20141104153744  Add name to job
   up     20141127153745  Remove scripts from project
   up     20141201153755  Remove invalid build
   up     20141204133321  Create service
   up     20150111062026  Add filter to jobs
   up     20150113001832  Acts as taggable on migration.acts as taggable on engine
   up     20150113001833  Add missing unique indices.acts as taggable on engine
   up     20150113001834  Add taggings counter cache to tags.acts as taggable on engine
   up     20150113001835  Add missing taggable index.acts as taggable on engine
   up     20150204001035  Build missing services
   up     20150226001835  Add job type to job
   up     20150306131416  Add contacted at to runner
   up     20150306135341  Add active to runner
   up     20150310001733  Rename committer to pusher
   up     20150320001810  Create event table
   up     20150324001123  Add settings for shared runners
   up     20150324001227  Migrate shared runners
   up     20150330001111  Disable shared runners
   up     20150415142013  Add deleted at to jobs
   up     20150417000045  Cleanup the build model
   up     20150504010150  Migrate url to path
   up     20150504010250  Rename gitlab url to path
   up     20150508011360  Add info fields to runner
   up     20150528011001  Add fields to builds
   up     20150528011012  Move job name to build
   up     20150529012113  Add tag to commits
   up     20150601043220  Add yaml to projects
   up     20150601043231  Migrate jobs to yaml
   up     20150602000240  Change default build timeout
   up     20150605002131  Create variables
   up     20150616001155  Add errors to commit
  down    20150630091815  Add options to build
  down    20150703125244  Add encrypted value to variables
  down    20150703125325  Encrypt variables
  down    20150707134456  Add allow failure to builds
  down    20150710113836  Add job type to builds
  down    20150710113851  Migrate deploy to job type for builds
  down    20150721204649  Truncate sessions

Note that towards the end of the output, there are seven “down” entries, one of which references “allow failure”, the same wording found in my error message. To run the pending migrations, I executed:

[root@gitlab-ci nginx]# gitlab-ci-rake db:migrate
== 20150630091815 AddOptionsToBuild: migrating ================================
-- add_column(:builds, :options, :text)
   -> 0.0334s
== 20150630091815 AddOptionsToBuild: migrated (0.0335s) =======================

== 20150703125244 AddEncryptedValueToVariables: migrating =====================
-- add_column(:variables, :encrypted_value, :text)
   -> 0.0013s
-- add_column(:variables, :encrypted_value_salt, :string)
   -> 0.0010s
-- add_column(:variables, :encrypted_value_iv, :string)
   -> 0.0003s
== 20150703125244 AddEncryptedValueToVariables: migrated (0.0028s) ============

== 20150703125325 EncryptVariables: migrating =================================
== 20150703125325 EncryptVariables: migrated (0.0426s) ========================

== 20150707134456 AddAllowFailureToBuilds: migrating ==========================
-- add_column(:builds, :allow_failure, :boolean, {:default=>false, :null=>false})
   -> 0.1768s
== 20150707134456 AddAllowFailureToBuilds: migrated (0.1769s) =================

== 20150710113836 AddJobTypeToBuilds: migrating ===============================
-- add_column(:builds, :job_type, :string)
   -> 0.0008s
== 20150710113836 AddJobTypeToBuilds: migrated (0.0008s) ======================

== 20150710113851 MigrateDeployToJobTypeForBuilds: migrating ==================
-- execute("UPDATE builds SET job_type='test' WHERE NOT deploy")
   -> 0.0019s
-- execute("UPDATE builds SET job_type='deploy' WHERE deploy")
   -> 0.0032s
== 20150710113851 MigrateDeployToJobTypeForBuilds: migrated (0.0052s) =========

== 20150721204649 TruncateSessions: migrating =================================
-- execute("DELETE FROM sessions")
   -> 0.3063s
== 20150721204649 TruncateSessions: migrated (0.3064s) ========================

Then I restarted the GitLab services:

[root@gitlab-ci nginx]# gitlab-ctl restart
ok: run: ci-redis: (pid 31667) 0s
ok: run: ci-sidekiq: (pid 31674) 0s
ok: run: ci-unicorn: (pid 31683) 0s
ok: run: logrotate: (pid 31685) 0s
ok: run: nginx: (pid 31699) 0s
ok: run: postgresql: (pid 31705) 0s
ok: run: redis: (pid 31713) 0s

…and with that, all my build links worked. Commits to git triggered gitlab-ci, which performed its scripted procedure as expected. Everything works now.
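
In hindsight, checking for pending migrations right after the upgrade would have saved me a couple of hours. Here is a small sketch of a filter for that, assuming the status output keeps the same column layout shown earlier, with “up” or “down” as the first field. The function name pending_migrations is made up for this post.

```shell
#!/bin/sh
# Read `db:migrate:status` output on stdin and print only the "down" rows.
# Hypothetical helper. Exit status: 0 when nothing is pending, 1 otherwise.
pending_migrations() {
    # awk ignores leading whitespace, so $1 is the Status column.
    awk '$1 == "down" { print; n++ } END { exit (n > 0) }'
}
```

I would use it as gitlab-ci-rake db:migrate:status | pending_migrations and treat a non-zero exit status as the cue to run gitlab-ci-rake db:migrate.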

I hope that this helps anyone else who finds themselves in the same position.

-Josh