5. Interpreting time's dual nature

Instruction

Time has a dual nature. No, we haven't switched from data visualization to philosophy – this has to do with how we measure time: as a period or as a point.

If you have a limited data plan on your smartphone, you know what we're talking about. Suppose you have 3 GB a month included in your phone plan. If you exceed 3 GB of data in any month, you have to pay an extra charge for each additional gigabyte.

If you want to avoid these extra charges, you probably keep a close eye on your data usage. You can do this in two ways: by looking at your daily usage, which resets every 24 hours, or by looking at your cumulative usage, which resets every month. As a dataset, this would look like:

     date       daily_usage cum_sum
1 2012-02-01           5       5
2 2012-02-02        1242    1247
3 2012-02-03           0    1247
4 2012-02-04        2424    3671
5 2012-02-05       44000   47671
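
The relationship between the two numerical columns can be sketched in a few lines. This is only an illustration in Python (the lesson itself does not prescribe a language); the values are copied from the table above, and the list names simply mirror the column names.

```python
from itertools import accumulate

# Values copied from the table above (units as in the lesson: Kb).
dates = ["2012-02-01", "2012-02-02", "2012-02-03", "2012-02-04", "2012-02-05"]
daily_usage = [5, 1242, 0, 2424, 44000]

# Running total: each entry is the sum of all daily values up to that date.
cum_sum = list(accumulate(daily_usage))

# Print the three columns side by side.
for day, daily, total in zip(dates, daily_usage, cum_sum):
    print(f"{day} {daily:>11} {total:>7}")
```

Each daily_usage value stands alone, while each cum_sum value carries over everything counted before it.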

This dataset has two strictly numerical variables (representing daily usage and cumulative usage) and one time variable that indexes both. You can treat these as two different time series, because the date plays a different role in each one: it depends on whether you treat the date as a period or as a point in time.

  • If your time variable describes a period of time, you treat it as a categorical (specifically, an ordinal) variable. Here, the variable daily_usage describes how many Kb you've used in one day (from 12:00 a.m. to 11:59 p.m., or 00:00 to 23:59). Say you've used 5 Kb on February 1; at the end of the day, this value resets and you start counting again from zero. There's no continuity here; the value is always reset when the period ends.
  • If your time variable describes a point in time, you treat it as a numerical variable. Here, the variable cum_sum describes how many Kb you've used from February 1 until 11:59 p.m. on a particular day. In this case, 5 Kb is the usage as of 11:59 p.m. on February 1. When 12:00 a.m. on February 2 comes around, you don't reset this value; it just keeps counting your data usage. The change is continuous.
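
One possible way to make the two readings of the same date concrete is sketched below in Python. This is an assumption for illustration, not something the lesson prescribes: the period is modeled as a half-open interval covering the whole day, and the point as a single instant (11:59 p.m.) placed on a numeric axis measured in seconds since the start of the month. All variable names are hypothetical.

```python
from datetime import date, datetime, time, timedelta

day = date(2012, 2, 1)

# Period view: the date labels a whole day, the interval [start, end).
# A daily counter belongs to this interval and starts over in the next one.
period_start = datetime.combine(day, time.min)
period_end = period_start + timedelta(days=1)

# Point view: the date stands for one instant, here 11:59 p.m. that day.
# An instant has a position on a numeric axis, e.g. seconds into the month.
point = datetime.combine(day, time(23, 59))
month_start = datetime(2012, 2, 1)
seconds_into_month = (point - month_start).total_seconds()

print(period_start, "to", period_end)
print(point, "=", seconds_into_month, "seconds into the month")
```

In the period view, only the order of the days matters, which is why the date behaves like an ordinal category. In the point view, the distance between instants is meaningful, which is why the date behaves like a number.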

Being able to distinguish between these two types of time helps us choose the best chart for a time series. However, the two situations are not always easy to tell apart. If you're not sure how you're using your time variable, either a bar chart or a line chart is probably a safe choice.