BigW Consortium Gitlab

  1. 03 Nov, 2015 1 commit
  2. 30 Oct, 2015 2 commits
    • Adjusted ips/sec for find_by_any_email benchmarks · 6d3068be
      Yorick Peterse authored
      While these benchmarks run at roughly 1500 i/sec, setting the threshold
      to 1000 leaves some room for deviations (e.g. due to different DB
      setups).
    • Improve performance of User.find_by_any_email · 49c081b9
      Yorick Peterse authored
      This query used to rely on a JOIN, effectively producing the following
      SQL:
      
          SELECT users.*
          FROM users
          LEFT OUTER JOIN emails ON emails.user_id = users.id
          WHERE (users.email = X OR emails.email = X)
          LIMIT 1;
      
      The use of a JOIN means having to scan over all emails and users, join
      them together and then filter out the rows that don't match the criteria
      (though this step may be taken into account already when joining).
      
      In the new setup this query instead uses a sub-query, producing the
      following SQL:
      
          SELECT *
          FROM users
          WHERE id IN (SELECT user_id FROM emails WHERE email = X)
          OR email = X
          LIMIT 1;
      
      This query has the benefit that it:
      
      1. Doesn't have to JOIN any rows
      2. Only has to operate on a relatively small set of rows from the
         "emails" table.
      
      Since most users will only have a handful of emails associated
      (certainly not hundreds or even thousands) the size of the set returned
      by the sub-query is small enough that it should not become problematic.
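
      To illustrate why both queries return the same user, here is a minimal
      in-memory sketch in plain Ruby. It only mirrors the logic of the SQL
      above: the helper names and the hash-based rows are hypothetical, and a
      real database would use indexes instead of scanning arrays.

```ruby
# Mirrors the old query: LEFT OUTER JOIN every user with their emails,
# then filter the joined rows.
def find_by_any_email_join(users, emails, address)
  joined = users.flat_map do |user|
    secondary = emails.select { |row| row[:user_id] == user[:id] }
    # A LEFT OUTER JOIN keeps users that have no secondary emails.
    secondary = [nil] if secondary.empty?
    secondary.map { |row| [user, row] }
  end

  match = joined.find do |user, row|
    user[:email] == address || (row && row[:email] == address)
  end

  match && match.first
end

# Mirrors the new query: first collect the matching user IDs from
# "emails" (the sub-query), then filter "users" on ID or primary email.
def find_by_any_email_subquery(users, emails, address)
  ids = emails.select { |row| row[:email] == address }.map { |row| row[:user_id] }

  users.find { |user| ids.include?(user[:id]) || user[:email] == address }
end

# Hypothetical sample rows standing in for the two tables.
users = [
  { id: 1, email: 'alice@example.com' },
  { id: 2, email: 'bob@example.com' }
]
emails = [
  { user_id: 1, email: 'alice@work.example.com' },
  { user_id: 2, email: 'bob@home.example.com' }
]

puts find_by_any_email_subquery(users, emails, 'bob@home.example.com')[:id]
```

      The sub-query variant only ever touches the email rows that match the
      address, which is the property the new SQL relies on.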
      
      Performance of the old versus new version can be measured using the
      following benchmark:
      
          # Save this in ./bench.rb
          require 'benchmark/ips'
      
          email = 'yorick@gitlab.com'
      
          def User.find_by_any_email_old(email)
            user_table = arel_table
            email_table = Email.arel_table
      
            query = user_table.
              project(user_table[Arel.star]).
              join(email_table, Arel::Nodes::OuterJoin).
              on(user_table[:id].eq(email_table[:user_id])).
              where(user_table[:email].eq(email).or(email_table[:email].eq(email)))
      
            find_by_sql(query.to_sql).first
          end
      
          Benchmark.ips do |bench|
            bench.report 'original' do
              User.find_by_any_email_old(email)
            end
      
            bench.report 'optimized' do
              User.find_by_any_email(email)
            end
      
            bench.compare!
          end
      
      Running this locally using "bundle exec rails r bench.rb" produces the
      following output:
      
          Calculating -------------------------------------
                      original     1.000  i/100ms
                     optimized    93.000  i/100ms
          -------------------------------------------------
                      original     11.103  (± 0.0%) i/s -     56.000
                     optimized    948.713  (± 5.3%) i/s -      4.743k
      
          Comparison:
                     optimized:      948.7 i/s
                      original:       11.1 i/s - 85.45x slower
      
      In other words, the new setup is 85x faster compared to the old setup,
      at least when running this benchmark locally.
      
      For GitLab.com these improvements result in User.find_by_any_email
      taking only ~170 ms to run, instead of around 800 ms. While this is
      "only" an improvement of about 4.5 times (instead of 85x) it's still
      significantly better than before.
      
      Fixes #3242
  3. 29 Oct, 2015 1 commit
  4. 20 Oct, 2015 1 commit
  5. 19 Oct, 2015 1 commit
    • Improve performance of sorting milestone issues · 4ff75e31
      Yorick Peterse authored
      This cuts down the time it takes to sort issues of a milestone by about
      10x. In the previous setup the code would run a SQL query for every
      issue that had to be sorted. The new setup instead runs a single SQL
      query to update all the given issues at once.
      
      The attached benchmark used to run at around 60 iterations per second;
      with the new setup it hovers around 600 iterations per second. In terms
      of timing, a request to update a milestone with 40-something issues took
      about 760 ms; in the new setup this only takes about 130 ms.
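
      One way to update all positions in a single statement is a CASE
      expression. The sketch below only builds the SQL string in plain Ruby.
      The "issues" table, the "position" column and the helper name are
      assumptions made for illustration, not necessarily what this commit
      uses.

```ruby
# Builds a single UPDATE statement that assigns a position to every
# given issue ID, in the order in which the IDs were passed.
#
# NOTE: the "issues" table and "position" column are assumed names.
def build_sort_issues_sql(ids)
  # Integer() raises on anything that is not a valid integer, so no
  # unescaped input ends up in the SQL string.
  ids = ids.map { |id| Integer(id) }
  return nil if ids.empty?

  cases = ids.each_with_index.map do |id, index|
    "WHEN #{id} THEN #{index + 1}"
  end

  "UPDATE issues SET position = CASE id #{cases.join(' ')} END " \
    "WHERE id IN (#{ids.join(', ')})"
end

puts build_sort_issues_sql([12, 7, 30])
```

      Sorting N issues this way costs one round trip to the database instead
      of N, which is where a saving like the one described above would come
      from.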
      
      Fixes #3066
  6. 15 Oct, 2015 2 commits
    • Improve ProjectTeam#max_member_access performance · 3025b711
      Yorick Peterse authored
      By comparing objects in Ruby we can greatly improve the performance of
      this method. In the worst case (should no data be eager loaded) this
      will run the same number of queries as before; in the best case (when
      data _is_ eager loaded) it requires no queries at all.
      
      The added benchmark used to produce around 273 iterations per second.
      With this commit this has been increased to almost 40 000 iterations per
      second: a speedup of roughly 145 times.
      
      Combined with eager loading Note associations this results in about 30
      fewer queries when viewing a single issue, which in turn cuts down the
      loading time by 30-40%.
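
      The idea of comparing already-loaded objects in Ruby can be sketched as
      follows. The hash-based rows, the numeric access levels and the
      standalone helper are hypothetical; the real method works on
      ActiveRecord associations.

```ruby
# Returns the highest access level the given user has among members
# that are already loaded in memory, or nil if the user is no member.
# No database query is needed because only Ruby objects are compared.
def max_member_access(members, user_id)
  members
    .select { |member| member[:user_id] == user_id }
    .map { |member| member[:access_level] }
    .max
end

# Hypothetical, already eager-loaded membership rows.
members = [
  { user_id: 1, access_level: 20 },
  { user_id: 1, access_level: 40 },
  { user_id: 2, access_level: 30 }
]

puts max_member_access(members, 1) # prints 40
```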
    • Improve performance of User.by_login · 72f428c7
      Yorick Peterse authored
      Performance is improved in two steps:
      
      1. On PostgreSQL an expression index is used for checking lower(email)
         and lower(username).
      2. The check to determine if we're searching for a username or email is
         moved to Ruby. Thanks to @haynes for suggesting and writing the
         initial implementation of this.
      
      Moving the check to Ruby makes this method an additional 1.5 times
      faster compared to doing the check in the SQL query.
      
      With performance being improved I've now also tweaked the number of
      iterations required by the User.by_login benchmark. This method now runs
      between 900 and 1000 iterations per second.
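
      The two steps can be sketched in plain Ruby as follows. The check on
      "@" and the helper name are assumptions used for illustration, and the
      SQL fragments assume "users.email" and "users.username" columns backed
      by expression indexes on lower(email) and lower(username).

```ruby
# Decides in Ruby which column to search, so the SQL query only has to
# compare a single lower(...) expression that an index can serve.
#
# Returns a [sql_fragment, bind_value] pair, or nil without a login.
def by_login_condition(login)
  return nil if login.nil? || login.empty?

  # Assumption: a login containing "@" is treated as an email address.
  column = login.include?('@') ? 'email' : 'username'

  ["lower(users.#{column}) = ?", login.downcase]
end

p by_login_condition('Yorick@Example.com')
p by_login_condition('YorickPeterse')
```

      Because the query filters on only one column, the database does not
      have to evaluate an OR across both columns.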
  7. 08 Oct, 2015 1 commit
    • Revamp finding projects by namespaces · 03417456
      Yorick Peterse authored
      By using a JOIN we can remove the need for using 2 separate queries to
      find a project by its namespace. Combined with an index (only needed for
      PostgreSQL) this reduces the query time from ~245 ms (~520 ms for the
      first call) down to roughly 10 ms (~15 ms for the first call).
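
      A single JOIN-based lookup could look roughly like the sketch below,
      which only builds a parameterized SQL string. The "projects.path",
      "projects.namespace_id" and "namespaces.path" columns and the helper
      name are assumptions for illustration.

```ruby
# Splits a "namespace/project" path and returns a [sql, binds] pair for
# a single query that joins "projects" with "namespaces".
# Returns nil when the path does not contain both parts.
def project_by_full_path_query(full_path)
  namespace_path, project_path = full_path.to_s.split('/', 2)
  return nil if namespace_path.to_s.empty? || project_path.to_s.empty?

  sql = 'SELECT projects.* FROM projects ' \
        'INNER JOIN namespaces ON namespaces.id = projects.namespace_id ' \
        'WHERE namespaces.path = ? AND projects.path = ? ' \
        'LIMIT 1'

  [sql, [namespace_path, project_path]]
end

sql, binds = project_by_full_path_query('gitlab-org/gitlab-ce')
puts sql
p binds
```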
  8. 06 Oct, 2015 1 commit
  9. 05 Oct, 2015 1 commit
  10. 02 Oct, 2015 1 commit
    • Basic setup for an RSpec based benchmark suite · 19893a1c
      Yorick Peterse authored
      This benchmark suite uses benchmark-ips
      (https://github.com/evanphx/benchmark-ips) behind the scenes. Specs can
      be turned into benchmark specs by setting "benchmark" to "true" in the
      top-level describe block like so:
      
          describe SomeClass, benchmark: true do
      
          end
      
      Writing benchmarks can be done using custom RSpec matchers, for example:
      
          describe MaruTheCat, benchmark: true do
            describe '#jump_in_box' do
              it 'should run 1000 iterations per second' do
                maru = described_class.new
      
                expect { maru.jump_in_box }.to iterate_per_second(1000)
              end
            end
          end
      
      By default the "iterate_per_second" expectation requires a standard
      deviation under 30% (this is just an arbitrary default for now). You can
      change this by chaining "with_maximum_stddev" on the expectation:
      
          expect { maru.jump_in_box }.to iterate_per_second(1000)
            .with_maximum_stddev(10)
      
      This will change the expectation to require a maximum deviation of 10%.
      
      Alternatively you can use the it block style to write specs:
      
          describe MaruTheCat, benchmark: true do
            describe '#jump_in_box' do
              subject { -> { described_class.new.jump_in_box } }
      
              it { is_expected.to iterate_per_second(1000) }
            end
          end
      
      Because "iterate_per_second" operates on a block, as opposed to a static
      value, the "subject" method must return a Proc. This looks a bit goofy
      but I have been unable to find a nice way around this.
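
      Why a Proc is needed can be shown with a minimal sketch that does not
      depend on RSpec or benchmark-ips. The helper below is hypothetical and
      much cruder than benchmark-ips; it only shows that the measured code
      has to be callable so that it can be run repeatedly.

```ruby
# Calls the given callable repeatedly for roughly `seconds` seconds and
# returns the measured number of iterations per second.
def iterations_per_second(callable, seconds = 0.1)
  clock    = Process::CLOCK_MONOTONIC
  started  = Process.clock_gettime(clock)
  deadline = started + seconds
  count    = 0

  while Process.clock_gettime(clock) < deadline
    callable.call
    count += 1
  end

  count / (Process.clock_gettime(clock) - started)
end

# A static value such as `[3, 1, 2].sort` would be evaluated once, before
# the helper runs. Wrapping it in a lambda defers the work so the helper
# can repeat it.
work = -> { [3, 1, 2].sort }

puts iterations_per_second(work).round
```

      The printed number depends on the machine, which is also why the
      matcher described above allows for a configurable deviation.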