
Milestone 2: Pilot Study

Benjamin Xie & Gregory L. Nelson

Designing a data science study is an iterative process. When running an experiment for the first time, like testing software for the first time, you will identify a range of problems. These problems include improper assumptions, newly discovered complexity, technical issues, confusion about the question to answer, and messiness in the data.

An important step before running the main study is to perform a pilot study to help inform the design and implementation of your experiment. By running a small-scale analysis, you will be able to test assumptions, identify problems, and adjust your study accordingly. For this milestone, you will perform pilot studies on 2 questions, at least one of which you identified in Milestone 1.

Step 1: Identify uncertainties from 2 questions

Pick two questions, at least one of which comes from the 3 questions you identified in Milestone 1. For each question, identify 5 key uncertainties your team has or assumptions you are making. By "key" uncertainty or assumption, we mean an unknown that could have a drastic effect on your analysis moving forward.

Step 2: Develop a pilot study to clarify a key uncertainty from each question

For each question, choose one key uncertainty or assumption and identify a small-scale experiment that can give you some insight into it. You only have a few days to perform 2 pilot studies, so ensure each pilot study is reasonably scoped but also useful to your group!
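
As one illustration of what a small-scale experiment might look like, the sketch below draws a small, reproducible random sample from a dataset and checks a single assumption (here, that a field is mostly complete). Everything in it is hypothetical: the helper names `pilot_sample` and `missing_rate`, the `income` field, and the toy records are stand-ins, not part of the assignment. Your own pilot study may look quite different.

```python
import random


def pilot_sample(rows, k, seed=0):
    """Return a reproducible random sample of at most k rows."""
    rng = random.Random(seed)  # fixed seed so teammates can rerun the same pilot
    k = min(k, len(rows))
    return rng.sample(rows, k)


def missing_rate(rows, field):
    """Fraction of rows where `field` is absent, None, or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) in (None, ""))
    return missing / len(rows)


# Hypothetical toy data standing in for a real dataset.
records = [
    {"id": i, "income": None if i % 4 == 0 else 1000 + i}
    for i in range(200)
]

# Assumption under test (hypothetical): "income" is rarely missing.
sample = pilot_sample(records, k=20, seed=42)
rate = missing_rate(sample, "income")
print(f"Pilot sample size: {len(sample)}, missing income rate: {rate:.0%}")
```

Working on a small sample like this keeps the pilot quick to run, while still showing whether an assumption is badly off before you commit to the full analysis.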

Step 3: Execute your pilot study

Conduct your pilot studies and commit the code to GitHub (you may refer back to it during your main study).

Step 4: Identify your findings

A pilot study is supposed to identify problems so you can make adjustments for your main study. Interpret the findings from your pilot study and decide how you will use this new information to adjust your main study.

If your pilot study confirms what you previously believed, no major changes may be required! More than likely, you will end up adjusting the scope of your question so it is feasible to make progress towards answering it this quarter, adjusting the direction of your question to better inform the decision context you're interested in, tossing your question because of newly identified insurmountable issues, etc.

Grading Criteria

This assignment is out of 3 points, and you will submit it by pushing changes to your team GitHub repository.

Your shared GitHub space will be graded on the following scale:

1 point: For 2 data science questions, identify 5 specific uncertainties for each question and note them in your GitHub wiki.
2 points: Plan, conduct, and report the findings of a pilot study for one key uncertainty from each of the 2 questions.