Data
Data is a collection of properties that, explicitly or through interpretation of their format and context, fully specify the entity, quality, and value of each property.
The completeness of data in this sense is what distinguishes it from mere noise. Qualities without values can't be useful. A value attached to an entity without a quality, or a quality attached to a value without an entity, is dubious. Depending on context, such fragments might be interpreted into a useful set, but that is not guaranteed.
For example, consider the following data sample:
| Entity | Quality | Value |
|---|---|---|
| Guitar | | 100 |
| | Price | 100 |
| Guitar | Price | |
| Guitar | | |
In the first entry, we know which entity we are referring to, and we have the number 100, but no quality. With a large enough data set, perhaps we could determine that these are guitar prices, and it is precisely this potential that keeps the entry from being simply noise.
In the second entry, we have "Price" and "100". Perhaps we could infer the entity from the price and other context. If the quality were something less generic than price, we might infer the entity even more easily, given that we also have values. Notice, however, that everything hinges on this potential.
Next, we have only the entity and the quality descriptor, but no value. This is even less useful: no interpretation is needed, yet the actual value is missing.
Finally, we have only the entity, with neither quality nor value. Like any other single field on its own, this is most definitely noise.
In this work, a data set counts as data if it can be interpreted from context. Data that can't be made meaningful is simply noise, because it lacks the potential to be interpreted further.
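
The four sample entries can be sketched in code to make the distinction concrete. This is only an illustration under assumptions: the `Record` type, the `classify` function, and the label strings are hypothetical names introduced here, and counting the fields that are present is a simplification of the context-dependent interpretation described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One row of the sample table; None marks a missing field (hypothetical type)."""
    entity: Optional[str] = None
    quality: Optional[str] = None
    value: Optional[float] = None


def classify(record: Record) -> str:
    """Label a record by which of its three fields are present.

    The labels are illustrative assumptions, not terms from the text.
    """
    present = sum(
        f is not None for f in (record.entity, record.quality, record.value)
    )
    if present == 3:
        return "complete"
    if present == 2:
        # With a value, the missing field might be inferred from context.
        # Without a value, there is nothing left to interpret.
        if record.value is not None:
            return "needs interpretation"
        return "missing value"
    # A single field (or none) on its own carries no usable meaning.
    return "noise"


# The four entries of the sample table, in order.
samples = [
    Record(entity="Guitar", value=100),
    Record(quality="Price", value=100),
    Record(entity="Guitar", quality="Price"),
    Record(entity="Guitar"),
]
labels = [classify(r) for r in samples]
```

Here `labels` pairs each sample entry with a label. The sketch only checks which fields exist; whether an incomplete entry can actually be interpreted still depends on the surrounding data set.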