I recently needed to process some tasks in parallel for a non-critical app, but really didn’t want it to get out of hand. I wrote this as a simple way to multi-thread without the pain of worrying about synchronization, queues, and thread safety.
It achieves quick-and-dirty thread pooling by chunking the data set into slices of whatever size is needed and spinning up a thread per item in the slice; once that slice is done, it repeats with the next one until all the data is processed.
It ended up working out really well as a direct drop-in, and sped up the processing greatly. I wouldn’t recommend this for production code, however.
require 'thread'

# Stupid simple "multi-threading" - it doesn't use mutexes or queues, but
# the blocks do have access to local variables, which is convenient. This
# will break a data set into equal slices and process each slice in its
# own batch of threads, but it is not perfect: it will not start the next
# slice until the current one is completely processed -- so one slow item
# loses you the benefit.
# NOTE: this is not thread-safe!
class ThreadPool
  def self.process!(data, size = 2, &block)
    Array(data).each_slice(size) do |slice|
      slice.map { |item| Thread.new { block.call(item) } }.each(&:join)
    end
  end

  def initialize(size)
    @size = size
  end

  def process!(data, &block)
    self.class.process!(data, @size, &block)
  end
end
# Playing around with it on the alphabet; adjust +size+ to change how
# many threads are being used at once.
if $0 == __FILE__
  require 'benchmark'

  size  = 10
  words = ('a'..'z').to_a
  list  = []
  pool  = ThreadPool.new(size)

  puts "Starting (P: %d W: %d)" % [size, words.size]
  total = Benchmark.realtime {
    pool.process!(words) do |word|
      # +elapsed+ is first assigned inside the block, so each invocation
      # gets its own copy -- unlike a variable from the enclosing scope,
      # which all the threads would race on.
      elapsed = Benchmark.realtime {
        list << word
        sleep rand * 2
      }
      puts "\t\tFinished: #{word} -- %0.2f seconds" % elapsed
    end
  }

  # Output some times
  puts "Finished all: %0.2f seconds" % total
  puts "\nList: %s\n\n" % list.join(', ')
end
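
As the comment notes, the main weakness is that one slow item stalls its whole slice. If that ever matters, a queue-based pool avoids the barrier: long-lived workers pull items off a shared Queue, so a slow item only occupies one worker. Here's a rough sketch in the same spirit (the QueuePool name and shape are mine, not part of the code above):

```ruby
# Sketch of a queue-based alternative to slice-based batching. +size+
# workers drain a shared Queue; Ruby's Queue handles the locking, so we
# still don't manage any mutexes ourselves.
class QueuePool
  def initialize(size)
    @size = size
  end

  def process!(data, &block)
    queue = Queue.new
    Array(data).each { |item| queue << item }

    workers = Array.new(@size) do
      Thread.new do
        loop do
          item = begin
            queue.pop(true)   # non-blocking pop raises ThreadError when empty
          rescue ThreadError
            break             # queue drained; this worker is done
          end
          block.call(item)
        end
      end
    end
    workers.each(&:join)
  end
end
```

It drops in the same way as the instance form above: `QueuePool.new(10).process!(words) { |word| ... }`.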