BigW Consortium Gitlab

  1. 19 Nov, 2015 12 commits
    • Use a JOIN in IssuableFinder#by_project · 8591cc02
      Yorick Peterse authored
      When using IssuableFinder/IssuesFinder to find issues for multiple
      projects it's more efficient to use a JOIN + a "WHERE project_id IN"
      condition as opposed to running a sub-query.
      
      This change means that when finding issues without labels we're now
      using the following SQL:
      
          SELECT issues.*
          FROM issues
          JOIN projects ON projects.id = issues.project_id
      
          LEFT JOIN label_links ON label_links.target_type = 'Issue'
                                AND label_links.target_id  = issues.id
      
          WHERE (
              projects.id IN (...)
              OR projects.visibility_level IN (20, 10)
          )
          AND issues.state IN ('opened','reopened')
          AND label_links.id IS NULL
          ORDER BY issues.id DESC;
      
      instead of:
      
          SELECT issues.*
          FROM issues
          LEFT JOIN label_links ON label_links.target_type = 'Issue'
                                AND label_links.target_id  = issues.id
      
          WHERE issues.project_id IN (
              SELECT id
              FROM projects
              WHERE id IN (...)
              OR visibility_level IN (20,10)
          )
          AND issues.state IN ('opened','reopened')
          AND label_links.id IS NULL
          ORDER BY issues.id DESC;
      
      The big benefit here is that with the old query PostgreSQL can't properly
      use all available indexes. In particular it ends up performing a
      sequential scan on the "label_links" table (processing around 290 000
      rows). The new query is roughly 2x as fast as the old query.
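
      A rough in-memory sketch of what the new query selects, written in
      plain Ruby rather than SQL. The method name, the hash keys and the
      argument names are hypothetical and only mirror the columns named
      above; this is not the IssuableFinder code.

      ```ruby
      # Hypothetical in-memory analogue of the new query; not GitLab code.
      def issues_without_labels(issues, projects, label_links, member_project_ids)
        # WHERE projects.id IN (...) OR projects.visibility_level IN (20, 10)
        visible_ids = projects.select do |project|
          member_project_ids.include?(project[:id]) ||
            [20, 10].include?(project[:visibility_level])
        end.map { |project| project[:id] }

        # LEFT JOIN label_links ... WHERE label_links.id IS NULL keeps only
        # issues that have no matching label link.
        labelled_ids = label_links
          .select { |link| link[:target_type] == 'Issue' }
          .map { |link| link[:target_id] }

        issues
          .select { |issue| visible_ids.include?(issue[:project_id]) }
          .select { |issue| %w[opened reopened].include?(issue[:state]) }
          .reject { |issue| labelled_ids.include?(issue[:id]) }
          .sort_by { |issue| -issue[:id] } # ORDER BY issues.id DESC
      end
      ```

      The sketch only illustrates which rows qualify; the performance
      difference described above comes from how PostgreSQL plans the JOIN.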
    • Memoize IssuableFinder#projects · e9cd58f5
      Yorick Peterse authored
      Since this method's returned data doesn't change between calls on the
      same IssuableFinder instance we can just memoize this similar to the
      "project" method.
    • Added benchmark for IssuesFinder · 45840426
      Yorick Peterse authored
    • Added index on projects.visibility_level · d63da890
      Yorick Peterse authored
    • Added index on issues.state · 50f65416
      Yorick Peterse authored
      This field is queried when filtering issues and, due to the lack of an
      index, would end up triggering a sequential scan.
    • Merge branch 'atom-feed-latest-update' into 'master' · a42d469a
      Yorick Peterse authored
      Improve performance of user profiles, finding groups, and finding projects
      
      This MR improves the following:
      
      * Rendering of profile pages and Atom feeds
      * Finding groups (using GroupsFinder & friends)
      * Finding projects (using ProjectsFinder & friends)
      
      Initially this MR was intended to only improve rendering of Atom feeds, but over time other fixes were added because the same code was the cause of all these problems.
      
      See merge request !1790
    • Merge branch 'check_if_it_should_be_archived_in_backup' into 'master' · 5b302854
      Dmitriy Zaporozhets authored
      Check which folders and archives should be packed before passing them to the tar command.
      
      If a user runs the backup task with SKIP and skips one of the listed archives (uploads, builds, artifacts), creating the backup fails with the error: `Cannot stat: No such file or directory`.
      
      This MR fixes that by checking for skipped items.
      Additionally, compact everything to avoid `TypeError: no implicit conversion of nil into String` errors.
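
      A sketch of the check described above, under stated assumptions: the
      constant names, the method name, the ".tar.gz" suffix and the
      comma-separated SKIP format are illustrative guesses, not the code
      from this merge request.

      ```ruby
      # Hypothetical names; the real backup task may be organised differently.
      FOLDERS  = %w[repositories db].freeze
      ARCHIVES = %w[uploads builds artifacts].freeze

      # Returns the entries to hand to tar, leaving out anything listed in SKIP.
      def backup_contents(skip_value)
        skipped  = skip_value.to_s.split(',').map(&:strip)
        folders  = FOLDERS.reject { |name| skipped.include?(name) }
        archives = ARCHIVES.reject { |name| skipped.include?(name) }
                           .map { |name| "#{name}.tar.gz" }

        # compact drops stray nils so tar never receives a nil argument.
        (folders + archives).compact
      end
      ```

      Filtering before building the tar arguments means a skipped archive
      is never referenced, so tar has no missing file to complain about.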
      
      See merge request !1824
    • Fix CHANGELOG · caf1c9d8
      Kamil Trzcinski authored
  2. 18 Nov, 2015 28 commits
    • Use "GitLab.com" instead of "gitlab.com" · efd5d937
      Yorick Peterse authored
    • Use reject. · bf3c0aaf
      Marin Jankovski authored
    • Added an index on namespaces.public · ee2739e6
      Yorick Peterse authored
    • Merge branch 'fix-diff-stats-ui' into 'master' · 61867abe
      Dmitriy Zaporozhets authored
      Fix huge line height for diff files list
      Signed-off-by: Dmitriy Zaporozhets <dmitriy.zaporozhets@gmail.com>
      
      See merge request !1826
    • Merge branch 'ce-mirror-backport' into 'master' · c7fde6a0
      Douwe Maan authored
      Backport relevant changes from gitlab-org/gitlab-ee!51
      
      To do:
      
      - [x] Update gitlab-shell
      
      See merge request !1822
    • Return internal projects in PersonalProjectsFinder · f486b06c
      Yorick Peterse authored
      When getting the projects of a user we should get the public _and_
      internal projects, not just the public ones.
    • Fix UNION syntax for MySQL · 9eefae69
      Yorick Peterse authored
      Apparently MySQL doesn't support this syntax:
      
          (...) UNION (...)
      
      instead it only supports:
      
          ...
          UNION
          ...
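
      As a sketch, the second form can be produced by joining the SELECT
      fragments with a bare UNION keyword. The helper name union_sql is
      hypothetical and the fragments are assumed to be plain SQL strings.

      ```ruby
      # Hypothetical helper: joins SELECT fragments without wrapping them in
      # parentheses, giving the "... UNION ..." form shown above.
      def union_sql(fragments)
        fragments.join("\nUNION\n")
      end
      ```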
    • Align hash literals to keep Rubocop happy · cc11c44b
      Yorick Peterse authored
    • Specs that failed before the fix. · d9c4625c
      Marin Jankovski authored
    • Don't pluck project IDs in User#owned_projects · 26482bdd
      Yorick Peterse authored
      This won't work efficiently if you happen to have a lot of projects.
    • Apply CI scope changes to the User model · 73f302ed
      Yorick Peterse authored
      These changes are based on those from commit
      03f5ff75, except they use a UNION
      instead of plucking IDs into memory.
    • Refactor UsersController to not kill the database · fbdf3767
      Yorick Peterse authored
      Previously this controller would in multiple places load tons (read:
      around 65000) of project and/or group IDs into memory. These changes in
      combination with the previous commits significantly cut down loading
      times of user profile pages and the Atom feeds of users.
    • Refactor User#authorized_groups/projects · e116a356
      Yorick Peterse authored
      These methods no longer include public groups/projects (that don't
      belong to the actual user) as this is handled by the various finder
      classes now. This also removes the need for passing extra arguments.
      
      Note that memoizing was removed _explicitly_. For whatever reason doing
      so messes up the users controller to a point where it claims a certain
      user does _not_ have access to certain groups/projects when it does have
      access. Existing code shouldn't be affected as these methods are only
      called in ways that they'd run queries anyway (e.g. a combination of
      "any?" and "each" which would run 2 queries regardless of memoizing).
    • Added Project.visible_to_user · a4fc8112
      Yorick Peterse authored
      This method can be used to filter projects to those visible to a given
      user.
    • Group methods for filtering public/visible groups · a74d6d20
      Yorick Peterse authored
      These methods will be used to get a list of groups, optionally
      restricted to only those visible to a given user.
    • Added Event.limit_recent · 01620dd7
      Yorick Peterse authored
      This will be used to move some querying logic from the users controller
      to the Event model (where it belongs).
    • Refactor ProjectsFinder to not pluck IDs · fbcf3bd3
      Yorick Peterse authored
      This class now uses a UNION (when needed) instead of plucking tens of
      thousands of project IDs into memory. The tests have also been
      re-written to ensure all different use cases are tested properly
      (assuming I didn't forget any cases).
      
      The finder has also been broken up into 3 different finder classes:
      
      * ContributedProjectsFinder: class for getting the projects a user
        contributed to.
      * PersonalProjectsFinder: class for getting the personal projects of a
        user.
      * ProjectsFinder: class for getting generic projects visible to a given
        user.
      
      Previously a lot of the logic of these finders was handled directly in
      the users controller.
    • Refactored GroupsFinder into two separate classes · 2110247f
      Yorick Peterse authored
      In the previous setup the GroupsFinder class had two distinct tasks:
      
      1. Finding the groups user A could see
      2. Finding the groups of user A that user B could see
      
      Task two was actually handled outside of the GroupsFinder (in the
      UsersController) by restricting the returned list of groups to those the
      viewed user was a member of. Moving all this logic into a single finder
      proved to be far too complex and confusing, hence there are now two
      finders:
      
      * GroupsFinder: for finding groups a user can see
      * JoinedGroupsFinder: for finding groups that user A is a member of,
        restricted to either public groups or groups user B can also see.
    • Refactor getting user groups/projects/contributions · 5fcd9986
      Yorick Peterse authored
      This new setup no longer loads any IDs into memory using "pluck",
      instead using SQL UNIONs to merge the various datasets together. This
      results in greatly improved query performance as well as a reduction of
      memory usage.
      
      The old setup was in particular problematic when requesting the
      authorized projects _including_ public/internal projects as this would
      result in roughly 65000 project IDs being loaded into memory. These IDs
      would in turn be passed to other queries.
    • Prefix table names for User UNIONs · bfd9855a
      Yorick Peterse authored
    • Use SQL::Union for User#authorized_groups · 189c40c3
      Yorick Peterse authored
      This removes the need for plucking any IDs into Ruby.
    • Make it easier to re-apply default sort orders · 656d9ff6
      Yorick Peterse authored
      By moving the default sort order into a separate scope (and calling this
      from the default scope) we can more easily re-apply a default order
      without having to specify the exact column/ordering all over the place.
    • Use SQL::Union for User#authorized_projects · 028bd227
      Yorick Peterse authored
      This allows retrieving the list of authorized projects using a single
      query, without having to load any IDs into Ruby. This in turn also means
      we can remove the method User#authorized_projects_id.
    • Added Gitlab::SQL::Union class · d769596a
      Yorick Peterse authored
      This class can be used to join multiple ActiveRecord::Relation objects
      together using a SQL UNION statement. ActiveRecord < 5.0 sadly doesn't
      support UNION and existing Gems out there don't handle prepared
      statements (e.g. they never incremented the variable bindings).
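
      One reading of "incrementing the variable bindings" is that numbered
      placeholders ($1, $2, ...) have to be shifted when several prepared
      fragments are merged into one statement. The sketch below illustrates
      that idea only; Fragment and union_with_renumbered_binds are
      hypothetical and this is not the Gitlab::SQL::Union implementation.

      ```ruby
      # Hypothetical value object: a SQL fragment plus its bind values.
      Fragment = Struct.new(:sql, :binds)

      # Joins fragments with UNION and shifts each fragment's $N placeholders
      # by the number of binds that precede it, so they stay unique.
      def union_with_renumbered_binds(fragments)
        offset = 0
        parts = fragments.map do |fragment|
          shift = offset
          offset += fragment.binds.length
          fragment.sql.gsub(/\$(\d+)/) { "$#{Regexp.last_match(1).to_i + shift}" }
        end

        [parts.join("\nUNION\n"), fragments.flat_map(&:binds)]
      end
      ```

      Without the shift, both fragments would refer to $1 and the second
      one would silently pick up the first fragment's value.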
    • Faster way of obtaining latest event update time · 054f2f98
      Yorick Peterse authored
      Instead of using MAX(events.updated_at) we can simply sort the events in
      descending order by the "id" column and grab the first row. In other
      words, instead of this:
      
          SELECT max(events.updated_at) AS max_id
          FROM events
          LEFT OUTER JOIN projects   ON projects.id   = events.project_id
          LEFT OUTER JOIN namespaces ON namespaces.id = projects.namespace_id
          WHERE events.author_id IS NOT NULL
          AND events.project_id IN (13083);
      
      we can use this:
      
          SELECT events.updated_at AS max_id
          FROM events
          LEFT OUTER JOIN projects   ON projects.id   = events.project_id
          LEFT OUTER JOIN namespaces ON namespaces.id = projects.namespace_id
          WHERE events.author_id IS NOT NULL
          AND events.project_id IN (13083)
          ORDER BY events.id DESC
          LIMIT 1;
      
      This has the benefit that on PostgreSQL a backwards index scan can be
      used, which due to the "LIMIT 1" can stop at the first matching row.
      This in turn greatly speeds up the process of grabbing the latest update
      time. This can be confirmed by looking at the query plans. The first
      query produces the following plan:
      
          Aggregate  (cost=43779.84..43779.85 rows=1 width=12) (actual time=2142.462..2142.462 rows=1 loops=1)
            ->  Index Scan using index_events_on_project_id on events  (cost=0.43..43704.69 rows=30060 width=12) (actual time=0.033..2138.086 rows=32769 loops=1)
                  Index Cond: (project_id = 13083)
                  Filter: (author_id IS NOT NULL)
          Planning time: 1.248 ms
          Execution time: 2142.548 ms
      
      The second query in turn produces the following plan:
      
          Limit  (cost=0.43..41.65 rows=1 width=16) (actual time=1.394..1.394 rows=1 loops=1)
            ->  Index Scan Backward using events_pkey on events  (cost=0.43..1238907.96 rows=30060 width=16) (actual time=1.394..1.394 rows=1 loops=1)
                  Filter: ((author_id IS NOT NULL) AND (project_id = 13083))
                  Rows Removed by Filter: 2104
          Planning time: 0.166 ms
          Execution time: 1.408 ms
      
      According to the above plans the 2nd query is around 1500 times faster.
      However, re-running the first query produces timings of around 80 ms,
      making the 2nd query "only" around 55 times faster.
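
      The ratios quoted above can be checked with a few lines of Ruby. The
      speedup helper is hypothetical; the timings are the execution times
      from the two plans plus the approximate 80 ms re-run figure.

      ```ruby
      # Hypothetical helper: how many times faster the new timing is.
      def speedup(old_ms, new_ms)
        old_ms / new_ms
      end

      cold = speedup(2142.548, 1.408) # first run of the MAX() query
      warm = speedup(80.0, 1.408)     # approximate re-run timing

      puts format('cold: %.0fx, warm: %.0fx', cold, warm)
      ```

      With exactly 80 ms the second ratio comes out a little above 55,
      which is in line with the rough estimate given above.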