Improve AutolinkFilter#text_parse performance
By using clever XPath queries we can quite significantly improve the
performance of this method. The actual improvement depends a bit on the
amount of links used but in my tests the new implementation is usually
around 8 times faster than the old one. This was measured using the
following benchmark:
require 'benchmark/ips'
text = '<p>' + Note.select("string_agg(note, '') AS note").limit(50).take[:note] + '</p>'
document = Nokogiri::HTML.fragment(text)
filter = Banzai::Filter::AutolinkFilter.new(document, autolink: true)
puts "Input size: #{(text.bytesize.to_f / 1024 / 1024).round(2)} MB"
filter.rinku_parse
Benchmark.ips(time: 15) do |bench|
bench.report 'text_parse' do
filter.text_parse
end
bench.report 'text_parse_fast' do
filter.text_parse_fast
end
bench.compare!
end
Here the "text_parse_fast" method is the new implementation and
"text_parse" the old one. The input size was around 180 MB. Running this
benchmark outputs the following:
Input size: 181.16 MB
Calculating -------------------------------------
text_parse 1.000 i/100ms
text_parse_fast 9.000 i/100ms
-------------------------------------------------
text_parse 13.021 (±15.4%) i/s - 188.000
text_parse_fast 112.741 (± 3.5%) i/s - 1.692k
Comparison:
text_parse_fast: 112.7 i/s
text_parse: 13.0 i/s - 8.66x slower
Again the production timings may (and most likely will) vary depending
on the input being processed.
Showing
Please
register
or
sign in
to comment