This is a followup to yesterday’s ramblings about initial work screwing with std::thread.
In talking with Quintosh on Twitter, he asked why I didn’t go with some sort of thread pool. Frankly, it’s because I’m lazy. However, it did give me an idea. The data that I’m working on is fairly parallelizable without memory collisions. Rather than creating a thread pool where I hand work to threads, it would be a lot lower amount of work for a simple program if I let the threads directly request work themselves. This gave me a few things:
- A unit of work could be much smaller, so cores that are running slower because of other system processes don’t present blocks.
- Once I implement other things that scale the program non-linearly (ex: reflection/refraction), I don’t have to worry about intelligently breaking up the work into equal sizes.
- The only member that is accessed across many threads is the piece of data controlling the work request. This keeps my actual locks to a minimum.
Much to my surprise, I also managed to get Intel VTune working completely. This helped confirm some of my assumptions from yesterday’s article, so I’ll cover that in some detail later on.