You can usually identify the problem you are facing by listening carefully to the questions being asked. The following questions all point to distributions:
- What values does this variable take? Are there many values, or just a few typical values? How many times do these particular values appear?
- Are values distributed evenly throughout the data? Or do one or two appear with greater frequency than the others?
- What is the dominant (most frequent) value?
- What is the range (smallest to largest value) of this variable?
- Are there any untypical (unexpected) values? If so, are they mistakes or outliers (values that are valid but outside the normal range of data)?
If at least one of these questions is posed during the initial data analysis, focus on the distribution of variables.
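
As a rough illustration, the questions above can be answered for a single numeric variable with a few lines of Python. This is only a sketch under stated assumptions: the function name `summarize_distribution`, the sample `ages` data, and the use of Tukey's 1.5 × IQR rule to flag untypical values are all choices made for this example, not something the text prescribes.

```python
from collections import Counter
from statistics import quantiles


def summarize_distribution(values):
    """Answer the basic distribution questions for one numeric variable."""
    # Which values occur, and how many times each one appears.
    counts = Counter(values)

    # The dominant (most frequent) value and its frequency.
    dominant, dominant_count = counts.most_common(1)[0]

    # Assumption: flag values beyond 1.5 * IQR from the quartiles
    # (Tukey's rule). Other thresholds are equally defensible.
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread

    return {
        "distinct_values": len(counts),
        "counts": dict(counts),
        "dominant": dominant,
        "dominant_count": dominant_count,
        "range": (min(values), max(values)),
        "untypical": sorted(v for v in counts if v < low or v > high),
    }


# Hypothetical sample: ages recorded in a survey.
ages = [23, 25, 25, 27, 25, 31, 29, 25, 27, 210]
summary = summarize_distribution(ages)

for question, answer in summary.items():
    print(f"{question}: {answer}")
```

The code can only flag a value such as 210 as untypical. Deciding whether it is a mistake or a valid outlier still requires knowledge of how the data was collected.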