SAMPL submissions and evaluation
Goals of SAMPL
SAMPL focuses on advancing computational methods, so we aim to ensure that participants and the broader community learn as much as possible from the challenges we pose. We seek to keep the focus on advancing science rather than on “winning”. However, participants who excel at SAMPL do attract considerable attention, so we are making some changes to ensure fairness while still maximizing opportunities to learn from participation.
Multiple submissions
Previously (SAMPL1 through SAMPL6), we allowed participants to submit multiple sets of predictions. In part, this was to maximize opportunities for learning from participation: participants could run several methods and compare and contrast their results.
However, for SAMPL7 onwards, we are modestly changing our policy on multiple submissions to distinguish two categories of blind submissions:
- Ranked submissions: Formal, judged entries in SAMPL (only one per research group or research organization)
- Verified submissions: Blind predictions (unlimited per research group or research organization) which are not formally judged.
Ranked submissions are intended to be the single entry each participant expects to perform best, and only these submissions will receive formal, overall rankings in SAMPL challenges. This restriction helps alleviate concerns that a participant might gain an unfair advantage from having “multiple shots on goal”, such as small variations in method or parameters with relatively little variation in the underlying science.
Verified submissions, in contrast, allow us to serve essentially as a custodian of predictions, verifying that they were made blind, in advance of the experimental data release. Verified submissions will not be formally judged or ranked, but we will still attempt to provide performance metrics for them.
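As an illustration of the kind of performance metrics often reported in SAMPL analyses (for example RMSE, MAE, and rank correlation), here is a minimal sketch using numpy and scipy; the function name `summarize` and the example values are hypothetical and are not part of any official SAMPL analysis pipeline.

```python
# Illustrative sketch (not an official SAMPL script): summary statistics of the
# sort commonly reported for blind predictions, computed with numpy and scipy.
import numpy as np
from scipy import stats

def summarize(predicted, experimental):
    """Return common error and correlation metrics for a set of predictions."""
    predicted = np.asarray(predicted, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    errors = predicted - experimental
    rmse = np.sqrt(np.mean(errors ** 2))                 # root-mean-square error
    mae = np.mean(np.abs(errors))                        # mean absolute error
    tau, _ = stats.kendalltau(predicted, experimental)   # rank correlation
    r, _ = stats.pearsonr(predicted, experimental)       # linear correlation
    return {"RMSE": rmse, "MAE": mae, "Kendall tau": tau, "Pearson R": r}

# Hypothetical predicted vs. experimental values (e.g., log P or pKa)
print(summarize([1.2, 0.4, -0.3, 2.1], [1.0, 0.8, -0.5, 1.9]))
```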
We believe that having these two categories will allow us to continue to play a valuable role in helping participants maximize what they learn by trying multiple methods when warranted, while also ensuring that participants applying multiple methods or method variations do not receive an unfair advantage in judging.
If you are involved in multiple projects or teams, should your submissions be ranked? In some cases, participants may be involved in multiple distinct teams participating in SAMPL, which raises the question of whether submissions from multiple such teams violate the restriction on ranked submissions. Several guidelines may help in deciding whether multiple submissions from distinct teams can all be ranked:
- Are the methods significantly different? If partially overlapping teams employ dramatically different approaches in a challenge, this does not violate the restriction on multiple ranked submissions. But if the teams are similar and the methods differ only slightly (perhaps what an expert in the specific area would see as minor variations on the same method), these would violate the restriction.
- Are the submissions being done to compare approaches? If the teams in question are deliberately comparing approaches, then only one submission should be ranked. But if the teams are each independently doing the best job possible and seeking to compete with one another, then multiple submissions might be ranked.
The spirit of this rule is to ensure that no participant gets an unfair advantage by having multiple shots on goal, so decisions as to what ought to be a “ranked” submission should be made in that light, especially when teams of participants are involved.
External evaluation
For SAMPL7 onwards, we seek to shift to using (or being aided by) external evaluators in judging SAMPL performance. If you are willing to assist with this, please contact David Mobley.
Anonymous submissions
Some historical SAMPL challenges allowed anonymous submissions. However, for SAMPL6 onwards, anonymous submissions are no longer allowed. Participants' submissions and method descriptions will be publicly disclosed via the relevant SAMPL GitHub repository. This helps ensure the community can learn as much as possible from these challenges, and also assists with fairness (e.g., no participant can choose to remain anonymous until their performance becomes clear).
Focused virtual workshops and follow-ups
To help ensure participants learn as much as possible, we now plan virtual workshops focused on each challenge component shortly after the release of results for that challenge. These workshops help participants exchange early ideas and make connections for potential follow-up work well in advance of publishing their results. For example, participants who used similar methods can learn of this at a virtual workshop, then compare and contrast their approaches and perhaps plan follow-up calculations to resolve any discrepancies. In the past, such follow-up work has often been where SAMPL yielded some of its most important lessons.