Storing and processing imprecise data/Dates and ranges

From Wikiversity

Representation, storage and processing of dates and date ranges are a common topic for research and development in computer science.

Up to this point, though, I have found very little that relates to imprecisely defined dates.

References

  • GEDCOM: a genealogy data file format created by The Church of Jesus Christ of Latter-day Saints (LDS). Dates may be marked as "approximate" and may have attached sources that justify a particular date or explain its limitations or the approximation.
  • Developing Time-Oriented Database Applications in SQL by Richard T. Snodgrass.
  • Analysis Patterns: Reusable Object Models by Martin Fowler. Chapter 3 talks about observations and measurements.
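The following minimal Python sketch illustrates how GEDCOM marks approximate and bounded dates. It splits a GEDCOM date value into a qualifier and its date part or parts. The sketch assumes the date qualifier keywords of GEDCOM 5.5.1 (ABT, CAL, EST, BEF, AFT, BET ... AND ..., FROM ... TO ...) and handles only that subset. The names `GedcomDate` and `parse_gedcom_date` are hypothetical, and the date parts are kept as plain strings rather than converted to calendar values.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Subset of GEDCOM 5.5.1 qualifiers handled by this sketch (assumption:
# open-ended FROM/TO, INT and calendar escapes are not handled).
APPROXIMATE = {"ABT": "about", "CAL": "calculated", "EST": "estimated"}
ONE_SIDED = {"BEF": "before", "AFT": "after"}


@dataclass
class GedcomDate:
    qualifier: str                # e.g. "about", "between", "plain"
    first: str                    # date text as written, e.g. "1850"
    second: Optional[str] = None  # second bound for BET/AND and FROM/TO


def parse_gedcom_date(text: str) -> GedcomDate:
    """Split a GEDCOM date value into its qualifier and date part(s)."""
    text = text.strip().upper()
    match = re.fullmatch(r"BET (.+) AND (.+)", text)
    if match:
        return GedcomDate("between", match.group(1), match.group(2))
    match = re.fullmatch(r"FROM (.+) TO (.+)", text)
    if match:
        return GedcomDate("period", match.group(1), match.group(2))
    word, _, rest = text.partition(" ")
    if word in APPROXIMATE:
        return GedcomDate(APPROXIMATE[word], rest)
    if word in ONE_SIDED:
        return GedcomDate(ONE_SIDED[word], rest)
    # No recognised qualifier: the date is taken at its written precision.
    return GedcomDate("plain", text)
```

For example, `parse_gedcom_date("ABT 1850")` yields the qualifier `"about"` with the date part `"1850"`. The CAL and EST qualifiers correspond closely to the "calculated time" and "estimated time" defined below.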

Definitions

We are going to lay out some terms that we will be using. We don't want to redefine any terms that are in common use, but we do want to clarify some of their side effects and implications.

time
A discrete point at which some instantaneous or relatively short event occurred. The dictionary definition of time includes intervals, but for the purposes of this discussion we exclude that sense to avoid confusion. What counts as "short" is relative to the precision and margin of error: when measuring some phenomena in physics, an entire second may be an eternity, while in the carbon dating of artifacts a single year may be irrelevant.
interval
The period between two specific times during which a relatively long event occurs.
duration
The length of an interval.
calculated time
A value that is calculated from other measurements (including at least one time). It may be difficult or even impossible to calculate the margin of error or confidence value.
estimated time
A value that was derived from some understanding of the event. An estimated time must include a probability function so that users may understand how tight the estimate is believed to be.
guessed time
A value created by a person using their (possibly incomplete) knowledge and understanding of the event. Guessed times must include a probability function so that users may understand how tight the guess is believed to be. It is understood that the probability function is also a guess.
measured time
A value that was measured and recorded. If no error or confidence information is given, the value is assumed to be sufficiently correct for its original intended use and/or precision. For example, the birthday of a historical figure may be recorded as just the day, month and year. Any error is smaller than the recorded precision of one day, so for all intents and purposes the margin of error is zero and the confidence is 100%.
scheduled time
A value that is or was planned for a point in what is or was the future, such as the time and date for a meeting. While it is rare that events start at the precise instant called for in the schedule, the error is usually irrelevant.
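One way to carry these definitions into a data model is sketched below in Python. It is a minimal sketch under stated assumptions, not a prescribed design. The names `TimeKind`, `Time`, `Interval` and `ProbabilityFn` are hypothetical. The probability function is assumed to take a tolerance and return the probability that the true time lies within that tolerance of the recorded value. The sketch enforces the rule above that estimated and guessed times must carry a probability function.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Callable, Optional


class TimeKind(Enum):
    CALCULATED = "calculated"
    ESTIMATED = "estimated"
    GUESSED = "guessed"
    MEASURED = "measured"
    SCHEDULED = "scheduled"


# Hypothetical signature: given a tolerance, return the probability that the
# true time lies within +/- that tolerance of the recorded value.
ProbabilityFn = Callable[[timedelta], float]


@dataclass(frozen=True)
class Time:
    value: datetime
    kind: TimeKind
    precision: timedelta  # e.g. timedelta(days=1) for a date-only record
    probability: Optional[ProbabilityFn] = None

    def __post_init__(self) -> None:
        # Estimated and guessed times must say how tight they are believed to be.
        needs_fn = self.kind in (TimeKind.ESTIMATED, TimeKind.GUESSED)
        if needs_fn and self.probability is None:
            raise ValueError(f"{self.kind.value} time requires a probability function")


@dataclass(frozen=True)
class Interval:
    start: Time
    end: Time

    def duration(self) -> timedelta:
        # The duration is the length of the interval between the recorded values.
        return self.end.value - self.start.value
```

A guessed year of birth could then be recorded as `Time(datetime(1850, 1, 1), TimeKind.GUESSED, timedelta(days=365), probability=lambda d: min(1.0, d / timedelta(days=730)))`. The linear probability function here is itself only a guess, as the definition of guessed time anticipates.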

User stories

Here we will write some scenarios describing things that we want to be able to do.

  • We want to store a time with 1-second precision. The margin of error will be assumed to be 0 and the confidence will be assumed to be 100%. While these assumptions are, in fact, false, any actual errors will have no effect within the context of the application.
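The story above could be met by something as small as the following Python sketch. The `StoredTime` record and `store_to_second` helper are hypothetical names. The sketch assumes that sub-second digits are truncated rather than rounded, and it records the assumed margin of error and confidence explicitly so that later, less certain kinds of time can use the same fields.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StoredTime:
    value: datetime
    precision: timedelta = timedelta(seconds=1)
    margin_of_error: timedelta = timedelta(0)  # assumed, not measured
    confidence: float = 1.0                    # assumed to be 100%


def store_to_second(ts: datetime) -> StoredTime:
    """Truncate a timestamp to whole seconds and attach the story's defaults."""
    return StoredTime(ts.replace(microsecond=0))


if __name__ == "__main__":
    t = store_to_second(datetime(2024, 5, 1, 12, 30, 15, 987654, tzinfo=timezone.utc))
    print(t.value.isoformat())  # prints 2024-05-01T12:30:15+00:00
```

Truncation keeps every stored value within the second in which the event was observed. Rounding would be an equally valid choice but could move a value into the following second, or even the following day.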