BigW Consortium Gitlab

  1. 18 Nov, 2015 8 commits
    • Use "GitLab.com" instead of "gitlab.com" · efd5d937
      Yorick Peterse authored
    • Don't pluck project IDs in User#owned_projects · 26482bdd
      Yorick Peterse authored
      Plucking the IDs into Ruby doesn't scale well when a user has a lot of projects.
    • Apply CI scope changes to the User model · 73f302ed
      Yorick Peterse authored
      These changes are based on those from commit
      03f5ff75, except they use a UNION
      instead of plucking IDs into memory.
    • Refactor User#authorized_groups/projects · e116a356
      Yorick Peterse authored
      These methods no longer include public groups/projects that don't
      belong to the user, as the various finder classes now handle this.
      This also removes the need to pass extra arguments.
      
      Note that memoization was removed _explicitly_. For reasons not yet
      understood, memoizing these methods confuses the users controller to the
      point where it claims a user does _not_ have access to certain
      groups/projects when they actually do. Existing code shouldn't be
      affected, as these methods are only called in ways that run queries
      anyway (e.g. a combination of "any?" and "each", which runs 2 queries
      regardless of memoization).
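
      The point about "any?" followed by "each" can be illustrated with a toy
      model. ToyRelation and the projects_* getters below are hypothetical
      stand-ins, not ActiveRecord or GitLab code; the sketch only assumes that,
      like an unloaded relation, "any?" and "each" each hit the database once.

      ```ruby
      # Toy model, NOT ActiveRecord: a relation that only counts the queries
      # it would send to the database.
      class ToyRelation
        class << self
          attr_accessor :queries
        end
        self.queries = 0

        def initialize(rows)
          @rows = rows
        end

        # Like an unloaded relation, any? runs its own existence query
        # instead of loading the rows.
        def any?
          self.class.queries += 1
          !@rows.empty?
        end

        # Loading the rows for iteration is a second, separate query.
        def each(&block)
          self.class.queries += 1
          @rows.each(&block)
        end
      end

      ROWS = %w[gitlab-ce gitlab-ee].freeze

      # Hypothetical getter without memoization: a new relation on every call.
      def projects_unmemoized
        ToyRelation.new(ROWS)
      end

      # Hypothetical getter with memoization: the same relation object is reused.
      def projects_memoized
        @projects_memoized ||= ToyRelation.new(ROWS)
      end

      # Counts the queries for the "any?" followed by "each" pattern.
      def queries_for(getter)
        ToyRelation.queries = 0
        send(getter).each { |_name| } if send(getter).any?
        ToyRelation.queries
      end

      puts queries_for(:projects_unmemoized) # prints 2
      puts queries_for(:projects_memoized)   # prints 2
      ```

      Memoization would only start saving queries if the same relation were
      iterated more than once, which this usage pattern doesn't do.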
    • Refactor getting user groups/projects/contributions · 5fcd9986
      Yorick Peterse authored
      This new setup no longer loads any IDs into memory using "pluck",
      instead using SQL UNIONs to merge the various datasets together. This
      results in greatly improved query performance as well as a reduction of
      memory usage.
      
      The old setup was particularly problematic when requesting the
      authorized projects _including_ public/internal projects, as this
      loaded roughly 65000 project IDs into memory. These IDs were then
      passed on to other queries.
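
      A rough sketch of why plucked IDs hurt: the query text itself grows with
      every ID, whereas a sub-query stays the same size. The issues/projects
      queries and the visibility_level values (10, 20) are assumptions made
      for illustration, not the SQL GitLab actually generates.

      ```ruby
      # Hypothetical sketch; table names, columns and values are illustrative.

      # Old shape: IDs are plucked into Ruby first, then pasted into the next
      # query as a literal IN list.
      def issues_query_with_ids(project_ids)
        "SELECT * FROM issues WHERE project_id IN (#{project_ids.join(', ')})"
      end

      # New shape: the project set stays inside the database as a sub-query,
      # so Ruby never holds the individual IDs.
      def issues_query_with_subquery(projects_sql)
        "SELECT * FROM issues WHERE project_id IN (#{projects_sql})"
      end

      plucked_ids = (1..65_000).to_a # roughly the number of IDs mentioned above
      old_sql = issues_query_with_ids(plucked_ids)
      new_sql = issues_query_with_subquery(
        'SELECT id FROM projects WHERE visibility_level IN (10, 20)'
      )

      puts old_sql.bytesize # over 400 KB of SQL text, growing with every project
      puts new_sql.bytesize # constant, no matter how many projects match
      ```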
    • Prefix table names for User UNIONs · bfd9855a
      Yorick Peterse authored
    • Use SQL::Union for User#authorized_groups · 189c40c3
      Yorick Peterse authored
      This removes the need for plucking any IDs into Ruby.
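
      A minimal sketch of the UNION idea, assuming the helper simply joins the
      SQL of several queries with UNION. GitLab's SQL::Union works on
      ActiveRecord relations; the SqlUnion class here is a hypothetical
      plain-Ruby stand-in that takes SQL strings, and the namespaces/members
      queries are illustrative.

      ```ruby
      # Hypothetical plain-Ruby stand-in for a UNION helper: it takes SQL
      # fragments (instead of ActiveRecord relations) and glues them together.
      class SqlUnion
        def initialize(fragments)
          @fragments = fragments
        end

        # UNION (unlike UNION ALL) also removes duplicate rows, e.g. a group
        # the user both owns and is a member of.
        def to_sql
          @fragments.join("\nUNION\n")
        end
      end

      user_id = 42 # hypothetical user

      owned = "SELECT namespaces.id FROM namespaces WHERE namespaces.owner_id = #{user_id}"
      member = "SELECT members.source_id FROM members WHERE members.user_id = #{user_id}"

      union = SqlUnion.new([owned, member])

      # The whole union becomes a sub-query, so no IDs are plucked into Ruby.
      # Columns are prefixed with their table names to avoid ambiguity.
      groups_sql = "SELECT namespaces.* FROM namespaces WHERE namespaces.id IN (#{union.to_sql})"
      puts groups_sql
      ```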
    • Use SQL::Union for User#authorized_projects · 028bd227
      Yorick Peterse authored
      This allows retrieving the list of authorized projects using a single
      query, without having to load any IDs into Ruby. This in turn also means
      we can remove the method User#authorized_projects_id.
  2. 16 Nov, 2015 1 commit
  3. 13 Nov, 2015 1 commit
  4. 30 Oct, 2015 4 commits
    • Use a subquery with IDs only for find_by_any_email · a9df7147
      Yorick Peterse authored
      This further improves performance of User.find_by_any_email and is
      roughly twice as fast as the previous UNION setup.
      
      Thanks again to @dlemstra for suggesting this.
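
      One plausible shape of an "IDs only" query is sketched below: a UNION
      that returns only user IDs, used as a sub-query. This is an assumption
      about the query's structure, not GitLab's exact SQL, and the quote helper
      is a hypothetical simplification.

      ```ruby
      # Hypothetical sketch of an "IDs only" sub-query; not GitLab's exact SQL.

      # Minimal quoting for the illustration only; real code should let the
      # database adapter quote values.
      def quote(value)
        "'#{value.to_s.gsub("'", "''")}'"
      end

      def find_by_any_email_sql(email)
        e = quote(email)
        # The UNION only carries user IDs; full rows are read once at the end.
        <<~SQL.strip
          SELECT users.*
          FROM users
          WHERE users.id IN (
            SELECT emails.user_id FROM emails WHERE emails.email = #{e}
            UNION
            SELECT users.id FROM users WHERE users.email = #{e}
          )
          LIMIT 1
        SQL
      end

      puts find_by_any_email_sql('yorick@gitlab.com')
      ```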
    • Fixed UNION syntax for MySQL · bba46623
      Yorick Peterse authored
      MySQL doesn't support the previous syntax.
    • Use a UNION for User.find_by_any_email · 24c8f422
      Yorick Peterse authored
      This is significantly faster than using a sub-query, at least when run
      on the GitLab.com production database. The local benchmarks are a lot
      slower with these changes, most likely because PostgreSQL chooses a
      different (and less efficient) plan based on the amount of data present
      in the test database.
      
      Thanks to @dlemstra for suggesting the use of a UNION.
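
      A sketch of what a UNION over full user rows might look like, assuming
      one branch matches the primary email and the other the secondary emails.
      The exact SQL of this commit may differ, and quote is a hypothetical
      helper.

      ```ruby
      # Hypothetical sketch of a UNION over full user rows; the exact SQL
      # GitLab generated may differ.

      # Minimal quoting for the illustration only; real code should let the
      # database adapter quote values.
      def quote(value)
        "'#{value.to_s.gsub("'", "''")}'"
      end

      def find_by_any_email_union_sql(email)
        quoted = quote(email)

        [
          # Users whose primary email matches.
          "SELECT users.* FROM users WHERE users.email = #{quoted}",
          'UNION',
          # Users that have a matching secondary email.
          'SELECT users.* FROM users ' \
            'INNER JOIN emails ON emails.user_id = users.id ' \
            "WHERE emails.email = #{quoted}",
          # A trailing LIMIT applies to the combined result of the UNION.
          'LIMIT 1'
        ].join("\n")
      end

      puts find_by_any_email_union_sql('yorick@gitlab.com')
      ```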
    • Improve performance of User.find_by_any_email · 49c081b9
      Yorick Peterse authored
      This query used to rely on a JOIN, effectively producing the following
      SQL:
      
          SELECT users.*
          FROM users
          LEFT OUTER JOIN emails ON emails.user_id = users.id
          WHERE (users.email = X OR emails.email = X)
          LIMIT 1;
      
      The use of a JOIN means having to scan over all emails and users, join
      them together and then filter out the rows that don't match the criteria
      (though this step may already be taken into account when joining).
      
      In the new setup this query instead uses a sub-query, producing the
      following SQL:
      
          SELECT *
          FROM users
          WHERE id IN (SELECT user_id FROM emails WHERE email = X)
          OR email = X
          LIMIT 1;
      
      This query has the benefit that it:
      
      1. Doesn't have to JOIN any rows
      2. Only has to operate on a relatively small set of rows from the
         "emails" table.
      
      Since most users only have a handful of emails associated with them
      (certainly not hundreds or even thousands), the set returned by the
      sub-query is small enough that it should not become problematic.
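
      The difference in intermediate data can be shown with a toy in-memory
      model. It ignores indexes, the query planner and LIMIT short-circuiting,
      so it says nothing about real timings; UserRow, EmailRow and the data set
      (three emails per user) are made up for illustration.

      ```ruby
      # Toy in-memory model of the two query plans; not how PostgreSQL works
      # internally, it only counts the intermediate data each plan produces.
      UserRow = Struct.new(:id, :email)
      EmailRow = Struct.new(:user_id, :email)

      users = (1..1_000).map { |i| UserRow.new(i, "user#{i}@example.com") }

      # Every user gets three secondary emails in this made-up data set.
      emails = users.flat_map do |user|
        (1..3).map { |n| EmailRow.new(user.id, "user#{user.id}+#{n}@example.com") }
      end

      target = 'user500+2@example.com'

      # JOIN plan: pair every user with each of their emails (a LEFT OUTER JOIN
      # keeps users without emails as [user, nil]), then filter the pairs.
      emails_by_user = emails.group_by(&:user_id)
      joined = users.flat_map do |user|
        (emails_by_user[user.id] || [nil]).map { |email| [user, email] }
      end
      join_match = joined.find { |user, email| user.email == target || email&.email == target }

      # Sub-query plan: reduce "emails" to matching user IDs first, then check
      # users against that small set.
      candidate_ids = emails.select { |email| email.email == target }.map(&:user_id)
      sub_match = users.find { |user| candidate_ids.include?(user.id) || user.email == target }

      puts joined.size         # prints 3000 (rows built by the JOIN plan)
      puts candidate_ids.size  # prints 1 (IDs returned by the sub-query)
      puts sub_match.id        # prints 500
      puts join_match.first.id # prints 500
      ```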
      
      Performance of the old versus new version can be measured using the
      following benchmark:
      
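          # Save this in ./bench.rb
          require 'benchmark/ips' # provided by the benchmark-ips gem
      
          email = 'yorick@gitlab.com'
      
          # The old JOIN-based implementation, redefined under a new name so it
          # can be benchmarked against the current User.find_by_any_email.
          def User.find_by_any_email_old(email)
            user_table = arel_table
            email_table = Email.arel_table
      
            # Builds the LEFT OUTER JOIN query (without a LIMIT; .first below
            # picks the first returned row).
            query = user_table.
              project(user_table[Arel.star]).
              join(email_table, Arel::Nodes::OuterJoin).
              on(user_table[:id].eq(email_table[:user_id])).
              where(user_table[:email].eq(email).or(email_table[:email].eq(email)))
      
            find_by_sql(query.to_sql).first
          end
      
          # Runs each report repeatedly and prints iterations per second.
          Benchmark.ips do |bench|
            bench.report 'original' do
              User.find_by_any_email_old(email)
            end
      
            bench.report 'optimized' do
              User.find_by_any_email(email)
            end
      
            # Prints how the two reports compare relative to each other.
            bench.compare!
          end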
      
      Running this locally using "bundle exec rails r bench.rb" produces the
      following output:
      
          Calculating -------------------------------------
                      original     1.000  i/100ms
                     optimized    93.000  i/100ms
          -------------------------------------------------
                      original     11.103  (± 0.0%) i/s -     56.000
                     optimized    948.713  (± 5.3%) i/s -      4.743k
      
          Comparison:
                     optimized:      948.7 i/s
                      original:       11.1 i/s - 85.45x slower
      
      In other words, the new setup is roughly 85x faster than the old setup,
      at least when running this benchmark locally.
      
      For GitLab.com these improvements result in User.find_by_any_email
      taking only ~170 ms to run, instead of around 800 ms. While this is
      "only" an improvement of about 4.7 times (instead of 85x), it's still
      significantly better than before.
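
      The speedups follow directly from the reported figures: iterations per
      second for the local benchmark, and the approximate run times for
      GitLab.com.

      ```ruby
      # Ratios computed from the numbers reported above.
      local_speedup = 948.713 / 11.103   # optimized i/s divided by original i/s
      gitlab_com_speedup = 800.0 / 170.0 # old ~800 ms divided by new ~170 ms

      puts local_speedup.round(2)      # prints 85.45
      puts gitlab_com_speedup.round(1) # prints 4.7
      ```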
      
      Fixes #3242
  5. 26 Oct, 2015 1 commit
  6. 17 Oct, 2015 1 commit
  7. 15 Oct, 2015 1 commit
    • Improve performance of User.by_login · 72f428c7
      Yorick Peterse authored
      Performance is improved in two steps:
      
      1. On PostgreSQL an expression index is used for checking lower(email)
         and lower(username).
      2. The check to determine if we're searching for a username or email is
         moved to Ruby. Thanks to @haynes for suggesting and writing the
         initial implementation of this.
      
      Moving the check to Ruby makes this method an additional 1.5 times
      faster than doing the check in the SQL query.
      
      With performance improved, I've also adjusted the number of iterations
      required by the User.by_login benchmark. This method now runs between
      900 and 1000 iterations per second.
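
      A sketch of moving the username/email decision into Ruby, assuming an
      "@" in the login means an email address. by_login_condition is a
      hypothetical helper that returns a condition array of the kind
      ActiveRecord's where accepts; the index name in the comment is
      illustrative.

      ```ruby
      # Hypothetical sketch; not GitLab's actual User.by_login implementation.
      #
      # Each branch compares against lower(column), which is what allows
      # PostgreSQL to use an expression index such as (name illustrative):
      #   CREATE INDEX index_users_on_lower_email ON users (lower(email));
      def by_login_condition(login)
        return nil if login.nil?

        # Decided in Ruby, so the SQL only ever checks a single column
        # instead of "lower(email) = ? OR lower(username) = ?".
        column = login.include?('@') ? 'email' : 'username'
        ["lower(#{column}) = ?", login.downcase]
      end

      p by_login_condition('Yorick@GitLab.com') # prints ["lower(email) = ?", "yorick@gitlab.com"]
      p by_login_condition('YorickPeterse')     # prints ["lower(username) = ?", "yorickpeterse"]
      ```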
  8. 05 Oct, 2015 3 commits
  9. 03 Oct, 2015 1 commit
  10. 02 Oct, 2015 1 commit
  11. 01 Oct, 2015 1 commit
  12. 29 Sep, 2015 2 commits
  13. 26 Sep, 2015 2 commits
  14. 15 Sep, 2015 1 commit
  15. 06 Sep, 2015 1 commit
  16. 27 Aug, 2015 1 commit
  17. 21 Aug, 2015 1 commit
  18. 20 Aug, 2015 1 commit
  19. 04 Aug, 2015 1 commit
  20. 02 Aug, 2015 1 commit
  21. 30 Jul, 2015 1 commit
  22. 29 Jul, 2015 1 commit
  23. 28 Jul, 2015 1 commit
  24. 23 Jul, 2015 1 commit
  25. 13 Jul, 2015 1 commit
  26. 10 Jul, 2015 1 commit