BigW Consortium Gitlab

  1. 15 May, 2016 1 commit
  2. 12 May, 2016 1 commit
    • Removed tracking of total method execution times · 945c5b3f
      Yorick Peterse authored
      Because method call timings are inclusive (that is, they include the
      time of any sub method calls) this would lead to the total method
      execution time often being far greater than the total transaction time.
      Because this is incredibly confusing it's best to simply _not_ track the
      total method execution time; after all, it's not that useful to begin
      with.
      
      Fixes gitlab-org/gitlab-ce#17239
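
To illustrate why inclusive timings add up to more than the transaction itself, here is a minimal sketch. The `timed` helper and the `:outer`/`:inner` names are hypothetical and are not taken from the GitLab code base; they only mimic a method that calls another instrumented method.

```ruby
# Hypothetical helper: records the inclusive wall time (in milliseconds)
# of a block under the given name.
def timed(timings, name)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond)
  yield
ensure
  finish = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond)
  timings[name] = finish - start
end

method_timings = {}
transaction    = {}

timed(transaction, :total) do
  timed(method_timings, :outer) do
    sleep(0.01)
    # The time spent here is counted for :inner *and* for :outer.
    timed(method_timings, :inner) { sleep(0.02) }
  end
end

method_total = method_timings.values.sum
```

Because the time spent in `:inner` is counted twice, `method_total` ends up larger than `transaction[:total]`, which is the confusing result the commit describes.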
  3. 18 Apr, 2016 2 commits
  4. 11 Apr, 2016 1 commit
    • Store block timings as transaction values · 16926a67
      Yorick Peterse authored
      This makes it easier to query, simplifies the code, and makes it
      possible to figure out what transaction the data belongs to (simply
      because it's now stored _in_ the transaction).
      
      This new setup keeps track of both the real/wall time _and_ CPU time
      spent in a block, both measured using milliseconds (to keep all units
      the same).
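
A minimal sketch of measuring both wall time and CPU time of a block in milliseconds. The `measure_block` name and the `values` hash are assumptions made for illustration; the commit does not show how the transaction actually stores these values.

```ruby
# Hypothetical helper: yields to the block and stores the real (wall) time
# and CPU time it took, both in milliseconds, in the given values hash.
def measure_block(values, name)
  real_start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond)
  cpu_start  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID, :float_millisecond)

  result = yield

  real_stop = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond)
  cpu_stop  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID, :float_millisecond)

  values["#{name}_real_time"] = real_stop - real_start
  values["#{name}_cpu_time"]  = cpu_stop - cpu_start

  result
end

# Stand-in for the values of a single transaction.
values = {}
result = measure_block(values, :example) { sleep(0.05); :done }
```

Sleeping consumes wall time but almost no CPU time, so the two values differ noticeably in this example.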
  5. 08 Apr, 2016 2 commits
    • Instrument Rails cache code · c56f702e
      Yorick Peterse authored
      This allows us to track how much time of a transaction is spent in
      dealing with cached data.
    • Use more accurate timestamps for InfluxDB. · aa7cddc4
      Yorick Peterse authored
      This changes the timestamp of metrics to be more accurate/unique by
      using Time#to_f combined with a small random jitter value. This
      combination hopefully reduces the amount of collisions, though there's
      no way to fully prevent any from occurring.
      
      Fixes gitlab-com/operations#175
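
The idea can be sketched as follows. The jitter range and the conversion to nanoseconds are assumptions chosen for illustration; the commit message only states that `Time#to_f` is combined with a small random jitter.

```ruby
# Hypothetical helper: turns a Time into an integer nanosecond timestamp,
# adding a few microseconds of random jitter so that two points created at
# (nearly) the same moment are less likely to collide.
def jittered_timestamp(time = Time.now)
  jitter = rand(0.000001..0.000009) # assumed range, in seconds

  ((time.to_f + jitter) * 1_000_000_000).to_i
end

fixed = Time.at(1_460_000_000)
stamp = jittered_timestamp(fixed)
```

As the commit notes, this only reduces the chance of collisions; two calls can still produce the same value.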
  6. 25 Jan, 2016 1 commit
    • Correct arity for instrumented methods w/o args · b74308c0
      Yorick Peterse authored
      This ensures that an instrumented method that doesn't take arguments
      reports an arity of 0, instead of -1.
      
      If Ruby had a proper method for finding out the required arguments of a
      method (e.g. Method#required_arguments) this would not have been an
      issue. Sadly the only two methods we have are Method#parameters and
      Method#arity, and both are equally painful to use.
      
      Fixes gitlab-org/gitlab-ce#12450
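
The arity problem can be reproduced with a small sketch. The `instrument` helper and the `Example` class below are hypothetical and much simpler than real instrumentation code; they only show how a wrapper taking `*args` reports an arity of -1 unless methods without arguments are handled separately.

```ruby
class Example
  def no_args
    :ok
  end

  def with_args(a, b = 1)
    [a, b]
  end
end

# Hypothetical instrumentation helper: replaces a method with a wrapper.
# Methods that take no arguments get a wrapper without parameters, so the
# wrapper also reports an arity of 0.
def instrument(klass, name)
  original = klass.instance_method(name)

  if original.arity.zero?
    klass.send(:define_method, name) do
      original.bind(self).call
    end
  else
    klass.send(:define_method, name) do |*args, &block|
      original.bind(self).call(*args, &block)
    end
  end
end

instrument(Example, :no_args)
instrument(Example, :with_args)

no_args_arity   = Example.instance_method(:no_args).arity
with_args_arity = Example.instance_method(:with_args).arity
```

In this sketch `no_args_arity` is 0, while the wrapper around `with_args` reports -1 because it accepts `*args`.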
  7. 13 Jan, 2016 2 commits
    • Randomize metrics sample intervals · 057eb824
      Yorick Peterse authored
      Sampling data at a fixed interval means we can potentially miss data
      from events occurring between sampling intervals. For example, say we
      sample data every 15 seconds but Unicorn workers get killed after 10
      seconds. In this particular case it's possible to miss interesting data
      as the sampler will never get to actually submitting data.
      
      To work around this (at least for the most part) the sampling interval
      is randomized as follows:
      
      1. Take the user specified sampling interval (15 seconds by default)
      2. Divide it by 2 (referred to as "half" below)
      3. Generate a range (using a step of 0.1) from -"half" to "half"
      4. Every time the sampler goes to sleep we'll grab the user provided
         interval and add a randomly chosen "adjustment" to it while making
         sure we don't pick the same value twice in a row.
      
      For a specified timeout of 15 this means the actual intervals can be
      anywhere between 7.5 and 22.5, but never can the same interval be used
      twice in a row.
      
      The rationale behind this change is that on dev.gitlab.org I'm sometimes
      seeing certain Gitlab::Git/Rugged objects being retained, but only for a
      few minutes every 24 hours. Knowing the code of Gitlab and how much
      memory it uses/leaks I suspect we're missing data due to workers getting
      terminated before the sampler can write its data to InfluxDB.
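
The four steps listed in this commit can be sketched roughly as below. The `IntervalRandomizer` class and its method names are assumptions made for this example and do not come from the GitLab sampler.

```ruby
# Hypothetical class implementing the randomization steps described above.
class IntervalRandomizer
  def initialize(interval = 15)
    @interval = interval
    half      = interval.to_f / 2

    # Adjustments from -half to +half using a step of 0.1. Values are
    # rounded to one decimal to avoid floating point noise.
    @adjustments = (-half).step(half, 0.1).map { |value| value.round(1) }
    @last        = nil
  end

  # Returns the next sleep interval, never using the same adjustment twice
  # in a row (unless there is only a single possible adjustment).
  def next_interval
    candidates = @adjustments - [@last]
    @last      = candidates.sample || @last

    @interval + @last
  end
end

randomizer = IntervalRandomizer.new(15)
intervals  = Array.new(1000) { randomizer.next_interval }
```

For an interval of 15 every value in `intervals` lies between 7.5 and 22.5, and no two neighbouring values are equal.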
  8. 12 Jan, 2016 2 commits
    • Stop tracking call stacks for instrumented views · 355c341f
      Yorick Peterse authored
      Where a view is called from doesn't matter as much. We already know what
      action they belong to and this is more than enough information. By
      removing the file/line number from the list of tags we should also be
      able to reduce the number of series stored in InfluxDB.
    • Track memory allocated during a transaction · 5679ee01
      Yorick Peterse authored
      This gives a very rough estimate of how much memory is allocated during
      a transaction. This only works reliably when using a single-threaded
      application server and a Ruby implementation with a GIL as otherwise
      memory allocated by other threads might skew the statistics. Sadly
      there's no way around this as Ruby doesn't provide a reliable way of
      gathering accurate object sizes upon allocation on a per-thread basis.
  9. 11 Jan, 2016 1 commit
    • Tag all transaction metrics with an "action" tag · 35b501f3
      Yorick Peterse authored
      Without this it's impossible to find out what methods/views/queries are
      executed by a certain controller or Sidekiq worker. While this will
      increase the total number of series it should stay within reasonable
      limits due to the number of "actions" being small enough.
  10. 07 Jan, 2016 3 commits
  11. 06 Jan, 2016 1 commit
  12. 04 Jan, 2016 5 commits
  13. 31 Dec, 2015 4 commits
    • Removed tracking of hostnames for metrics · cafc784e
      Yorick Peterse authored
      This isn't hugely useful and mostly wastes InfluxDB space. We can re-add
      this whenever needed (but only once we really need it).
    • Use separate series for Rails/Sidekiq transactions · bd9f86bb
      Yorick Peterse authored
      This removes the need for tagging all metrics with a "process_type" tag.
    • Removed tracking of raw SQL queries · a6c60127
      Yorick Peterse authored
      This particular setup had 3 problems:
      
      1. Storing SQL queries as tags is very inefficient as InfluxDB ends up
         indexing every query (and they can get pretty large). Storing these
         as values instead means we can't always display the SQL as easily.
      2. We already instrument ActiveRecord query methods, thus we already
         have timing information about database queries.
      3. SQL obfuscation is difficult to get right and I'd rather not expose
         sensitive data by accident.
    • Removed various default metrics tags · c936e4e3
      Yorick Peterse authored
      While it's useful to keep track of the different versions (Ruby, GitLab,
      etc) doing so for every point wastes disk space and possibly also RAM
      (which InfluxDB is all too eager to gobble up). If we want to see the
      performance differences between different GitLab versions simply looking
      at the performance since the last release date should suffice.
  14. 29 Dec, 2015 2 commits
    • Write to InfluxDB directly via UDP · 620e7bb3
      Yorick Peterse authored
      This removes the need for Sidekiq and any overhead/problems introduced
      by TCP. There are a few things to take into account:
      
      1. When writing data to InfluxDB you may still get an error if the
         server becomes unavailable during the write. Because of this we're
         catching all exceptions and just ignoring them (for now).
      2. Writing via UDP apparently requires the timestamp to be in
         nanoseconds. Without this, data isn't written properly.
      3. Due to the restrictions on UDP buffer sizes we're writing metrics one
         by one, instead of writing all of them at once.
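
A rough sketch of the three points above, using Ruby's standard `UDPSocket`. The `to_line` and `write_points` helpers, the series names, and the single `value` field are hypothetical; only the general shape of InfluxDB's line protocol (measurement, fields, timestamp) is assumed.

```ruby
require 'socket'

# Hypothetical helper: formats one point in InfluxDB's line protocol with
# a timestamp in nanoseconds.
def to_line(series, value, time = Time.now)
  nanoseconds = time.to_i * 1_000_000_000 + time.nsec

  "#{series} value=#{value} #{nanoseconds}"
end

# Hypothetical helper: writes every line as its own UDP packet and ignores
# any error raised while sending, mirroring the behaviour described above.
def write_points(lines, host, port)
  socket = UDPSocket.new

  lines.each do |line|
    begin
      socket.send(line, 0, host, port)
    rescue StandardError
      # Deliberately ignored: losing a metric is preferable to failing the
      # request that produced it.
    end
  end
ensure
  socket.close if socket
end

# Local UDP socket standing in for InfluxDB, bound to a random free port.
server = UDPSocket.new
server.bind('127.0.0.1', 0)
port = server.addr[1]

time  = Time.at(1_451_347_200)
lines = [
  to_line('rails_transactions', 12.5, time),
  to_line('sidekiq_transactions', 3, time)
]

write_points(lines, '127.0.0.1', port)

received = [server.recvfrom(1024).first, server.recvfrom(1024).first]
server.close
```

Sending one packet per point keeps each datagram small, at the cost of more packets per transaction.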
    • Strip newlines from obfuscated SQL · 03478e6d
      Yorick Peterse authored
      Newlines aren't really needed and they may mess with InfluxDB's line
      protocol.
  15. 28 Dec, 2015 1 commit
  16. 17 Dec, 2015 11 commits