Overview
Precise and accurate performance measurement is hard. As the figure below shows, even a seemingly insignificant difference in the measurement context can significantly affect the measurement results.
The figure shows how the measured execution time (y axis) of a benchmark can vary by more than 5% across different measurement contexts (x axis). Each point with whiskers corresponds to a set of measurements in the same context. The point represents the mean over 15 runs, while the whiskers represent the 95% confidence interval for the mean.
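As a rough illustration of what one point with whiskers summarizes, the sketch below computes a mean and the half-width of a 95% confidence interval from 15 run times. The text does not say how the interval was constructed; a Student's t interval is assumed here, and the function name `mean_with_ci` is hypothetical.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 14 degrees of freedom
# (15 runs - 1). Hardcoded so the sketch needs only the standard library.
# Assumption: a t-based interval; the source does not name the method.
T_CRIT_95_DF14 = 2.145


def mean_with_ci(times):
    """Return (mean, half_width) of a 95% CI for the mean of 15 run times."""
    if len(times) != 15:
        raise ValueError("the hardcoded critical value is only valid for 15 runs")
    mean = statistics.mean(times)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(times) / math.sqrt(len(times))
    return mean, T_CRIT_95_DF14 * sem
```

The whiskers would then span `mean - half_width` to `mean + half_width`. Such an interval only captures variability between runs within one context; it says nothing about how the mean shifts when the context changes.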
In this figure, the measurement contexts differ only in the size of their UNIX environment: between two adjacent points, the total size of the environment variables differs by exactly one character (one byte).
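A sweep over environment sizes like the one described above could be sketched as follows. This is not the harness used to produce the figure: the padding variable name `PADDING`, the function `time_with_padding`, and the placeholder benchmark command are all assumptions, and the wall-clock timing includes process startup.

```python
import os
import subprocess
import sys
import time


def time_with_padding(cmd, pad_bytes, runs=15):
    """Run cmd `runs` times with a padding variable of pad_bytes characters.

    Returns the list of wall-clock times in seconds.
    """
    env = dict(os.environ)
    # Hypothetical variable; only its length matters, not its name or value.
    env["PADDING"] = "x" * pad_bytes
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # Placeholder benchmark: replace with the program under test.
    benchmark = [sys.executable, "-c", "pass"]
    # Adjacent contexts differ by exactly one byte of environment size.
    for pad in range(4):
        runs = time_with_padding(benchmark, pad, runs=3)
        print(pad, min(runs))
```

Each call to `time_with_padding` corresponds to one measurement context, so its results would be summarized as a single point with whiskers.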
The state of the art in performance evaluation prescribes multiple measurement runs and the use of confidence intervals to account for the variability between runs. Each point with whiskers in the above figure is the outcome of that approach. However, the figure shows that this outcome can vary by more than 5% depending on the context in which the experiment was performed. Which of these results is correct? Suppose a researcher used the state-of-the-art approach to evaluate a compiler optimization and found a speedup of 4%. Could we trust that result, given that it is smaller than the variation caused by the measurement context alone?
Research Questions
We are interested in approaches to improve the accuracy and precision of performance measurements. We aim to answer questions such as:
- How can we overcome the above measurement context bias?
- Which aspects of measurement contexts (beyond environment variable sizes) affect measurements, and how?
- How do measurement infrastructures perturb system behavior?
- How accurate and precise are commonly used performance measurement infrastructures?
We focus in particular on measurements of:
- the benefit of compiler optimizations
- the overhead of instrumentation and dynamic analyses
- the cost and benefit of online optimizations in virtual execution environments