Workshop on Experimental Evaluation of Software and Systems in Computer Science
Co-located with SPLASH'10 in Reno/Tahoe, Nevada, October 17-21, 2010.
- See you on Monday, October 18, in Reno!
- Now open: Evaluate Collaboratory (a collaborative web site for workshop participants and the broader community)
- Keynote: Cliff Click
- Panel: Experimental evaluation in different areas of computer science. The panelists are leaders in evaluation methodologies in their respective fields: Chris Drummond (machine learning), Ioana Manolescu (databases), Ellen Voorhees (information retrieval)
We call ourselves 'computer scientists', but are we scientists? If we are, then we must practice the scientific method, which includes sound experimental evaluation. In our experience, our experimental methodology is ad hoc at best and nonexistent at worst.
In the last few years, researchers have identified disturbing flaws in the way that experiments are performed in computer science. For example, in the area of performance evaluation of computer systems, our measurements on one system are rarely reproducible on another. As hardware and software grow more complex, this problem just gets worse.
This workshop brings together experts from different areas of computer science to discuss, explore, and attempt to identify the principles of sound experimental evaluation.
The workshop will consist of discussion sessions focused on themes such as data collection, data analysis, and reproducibility, with the goal of answering the following questions:
- What are the issues that are preventing proper experimental evaluation?
- How can we resolve these issues?
- We need more research in evaluation methodology. What should that research be?
- We need better tools to do sound experimental evaluation. How do we encourage investment in such tools?
- What are the principles and best practices that people are using in the different areas of computer science?
- How does the computer science curriculum need to be changed to prepare the next generation of computer scientists?
Submissions are closed.
You do not need to write a fully formatted paper. Just submit your short, text-only position statement in the "Abstract" field on the EasyChair submission system: http://www.easychair.org/conferences/?conf=evaluate2010. Do not upload any file; check the "Abstract Only" check box instead. In the "Keywords" section, specify the topics you would be most interested in discussing during the workshop.
The organizing committee will use the submitted position statements to determine who to invite to the workshop. In your position statement, please also answer the following two questions: (1) Why do I want to participate in the workshop? (2) What can I contribute to the workshop?
If you are invited to attend the workshop, your position statement will be released on a password-protected web page that will be accessible to all workshop participants.
- Submission deadline: Monday, August 9, 2010, 23:59:59 (Apia, Samoa Time)
- Notification of acceptance: Monday, August 30, 2010
- Early registration: mid-September 2010
- Workshop: Monday, October 18, 2010
Cliff Click, Chief JVM Architect and Distinguished Engineer at Azul Systems, will give the keynote on experimental evaluation.
To learn from the experience and insights of our colleagues in different areas of computer science, we will host a virtual panel discussion with renowned experts on experimental evaluation in three distinct areas:
Chris Drummond (National Research Council of Canada and University of Ottawa)
Co-Organizer of Workshop Series on Evaluation Methods for Machine Learning.
Ioana Manolescu (INRIA Saclay -- Ile-de-France)
Co-Chair of the SIGMOD Repeatability and Workability Initiative. Founding co-chair of the ExpDB workshop (Performance and Evaluation of Database Management Systems), in conjunction with ACM SIGMOD.
Ellen Voorhees (NIST)
Manager of the Retrieval Group at NIST, which is home to the TREC, TRECVid, and TAC evaluation workshop series.
- Steve Blackburn (Australian National University)
- Amer Diwan (University of Colorado / Google)
- Matthias Hauswirth (University of Lugano, Switzerland)
- Atif Memon (University of Maryland)
- Peter F. Sweeney (IBM Research)
Here is a short initial list of prior work related to the experimental evaluation of software and systems in computer science. This list is by no means meant to be exhaustive. One goal of our workshop is to initiate work towards a resource, perhaps in the form of a Wiki or an annotated bibliography, that can serve as a reference for researchers who need to experimentally evaluate software and systems. If you would like to contribute, post to one of our groups, contact one of the workshop organizers, or submit a position statement to the workshop.
S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. OOPSLA'06
This paper pointed out that the complex interactions between Java applications and the architecture, compiler, virtual machine, and memory management require more extensive evaluation than C, C++, and Fortran applications. The authors took steps towards improving methodologies for choosing and evaluating benchmarks.
A. Georges, D. Buytaert, and L. Eeckhout. Statistically Rigorous Java Performance Evaluation. OOPSLA'07
This paper pointed out that the then-current state-of-the-art approach to evaluating Java applications was flawed. The authors went on to propose a more rigorous methodology based on statistical confidence intervals, and then used this methodology to demonstrate that previously reported evaluations were erroneous.
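The core idea of that methodology, reporting a confidence interval over repeated runs rather than a single best time, can be sketched as follows. This is a minimal illustration, not the paper's exact protocol: `run_benchmark` is a placeholder that simulates run-to-run variation, and the run count and normal approximation are simplifying assumptions (the paper discusses when Student's t-distribution is required).

```python
import math
import random
import statistics

def run_benchmark():
    """Placeholder for one benchmark invocation; returns a time in ms.
    Here run-to-run variation is simulated with Gaussian noise."""
    return 100.0 + random.gauss(0, 2.0)

def confidence_interval(samples, z=1.96):
    """95% confidence interval for the mean execution time,
    assuming enough samples for the normal approximation."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - z * sem, mean + z * sem

random.seed(42)  # fixed seed so the sketch is repeatable
times = [run_benchmark() for _ in range(30)]
lo, hi = confidence_interval(times)
print(f"mean = {statistics.mean(times):.2f} ms, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Two systems are then considered distinguishable only if their confidence intervals do not overlap, rather than by comparing single measurements.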
T. Mytkowicz, A. Diwan, M. Hauswirth, and P.F. Sweeney. Producing Wrong Data Without Doing Anything Obviously Wrong! ASPLOS'09
This paper pointed out that there are common artifacts in one's experimental setup, such as the size of the UNIX environment and the link order of object files, that are typically ignored but can significantly change the outcome of an experiment on C applications.
T. Mytkowicz, A. Diwan, M. Hauswirth, and P.F. Sweeney. Evaluating the Accuracy of Java Profilers. PLDI'10
This paper demonstrated that current state-of-the-art Java profilers, which are widely used to understand the performance of Java applications, are fundamentally broken. Specifically, the profilers often disagree on the identity of the hot methods; if two profilers disagree, at least one of them must be incorrect.
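The kind of disagreement at issue can be made concrete with a small sketch: given two profiles of the same run, compare the sets of hottest methods each profiler reports. The profile data and method names below are entirely hypothetical; this is an illustration of the comparison, not the paper's measurement infrastructure.

```python
def top_hot(profile, n=3):
    """Return the n methods with the largest reported share of time."""
    return [m for m, _ in sorted(profile.items(), key=lambda kv: -kv[1])[:n]]

# Hypothetical output of two profilers on the same run (method -> % of time).
profiler_a = {"foo": 40, "bar": 30, "baz": 20, "qux": 10}
profiler_b = {"bar": 45, "qux": 25, "foo": 20, "baz": 10}

hot_a, hot_b = top_hot(profiler_a), top_hot(profiler_b)
# Jaccard similarity of the two hot-method sets: 1.0 means full agreement.
agreement = len(set(hot_a) & set(hot_b)) / len(set(hot_a) | set(hot_b))
print(f"profiler A: {hot_a}, profiler B: {hot_b}, agreement = {agreement:.2f}")
```

Since both profilers observed the same execution, any agreement score below 1.0 means at least one of them is attributing time to the wrong methods.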
R. Jain. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Wiley-Interscience, New York, NY, USA, April 1991, ISBN 0471503361.
D. J. Lilja. Measuring Computer Performance: A Practitioner's Guide, Cambridge University Press, New York, NY, USA, 2000, ISBN 0-521-64105-5.
Lieven Eeckhout. Computer Architecture Performance Evaluation Methods, Morgan & Claypool Publishers, San Rafael, CA, USA, June 2010, doi:10.2200/S00273ED1V01Y201006CAC010.