Data types for statistical purposes
Since the rise of the internet, the world has been able to capture more data than ever before. This allows us to run interesting analyses and find patterns that help us make decisions.
Much of this data does not make sense at first glance. It is difficult to understand and analyze unless we begin to categorize it according to its content and meaning.
For many organizations, data has become the raw material of the present and the future. Because it can now be analyzed in large quantities, companies are trying to understand it and find patterns that help them stay one step ahead of the competition and be more attractive to potential consumers.
This is not simple. It requires trained personnel, that is, professionals who are capable of performing the analyses correctly to obtain good results. A bad analysis can be counterproductive for the company, because poorly made decisions result in poor execution.
The first step is to understand the data you are working with and then categorize it according to its nature. This is important because it determines which types of analysis can be applied and which cannot.
Data sets can be divided into two large groups, numerical data and categorical data, and each of these is in turn divided into subgroups. Let’s look at what each of them is about and learn to identify them.
Numerical data.
As its name indicates, this type of data has values that are real numbers, and the range can go from minus infinity to infinity. Everything that is measured or counted as a number falls into this group, for example, quantities, percentages, frequencies, etc. Numbers that only act as labels, such as postal codes, do not count as numerical data in this sense.
Numerical data is in turn divided into two types:
Continuous numerical data.
This type of data consists of quantities that can take any value within a range, including fractional values, so any recorded value is only an approximation at some level of precision. Some examples are:
- Age: We can say that someone is 18 years old, but this is not 100% exact. The person may be 18 years, 3 months, 19 days, 34 minutes and 12 seconds old, and we could continue down to thousandths of a second and beyond. The point is that saying someone is 18 years old does not give the exact age, which would be practically impossible to obtain.
- Weight: As with age, we cannot guarantee that an object weighs exactly 870 grams. The measurement can always be refined with more decimal places when trying to find the exact value.
- Temperature: Another clear example where a measurement cannot be exact. 36.5 degrees Celsius can be taken as a measure of human body temperature, but it is not really the exact value.
Saying that exact measurements are impossible to obtain does not mean that this type of data is not useful for analysis. It simply means that every value is recorded to a limited precision.
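The age example can be sketched in a few lines of Python. This is only an illustration: the two dates are hypothetical, and the average year length of 365.25 days is an assumption made to convert seconds into years. It shows the same underlying quantity being reported at different precisions.

```python
from datetime import datetime

# Hypothetical birth moment and observation moment, chosen only for illustration.
born = datetime(2004, 3, 1, 8, 30, 0)
now = datetime(2022, 6, 20, 9, 4, 12)

# Age as a floating-point number of seconds.
age_seconds = (now - born).total_seconds()

# Assumption: an average year of 365.25 days, used only for the conversion.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
age_years = age_seconds / SECONDS_PER_YEAR

# The same continuous quantity, reported at increasing precision.
print(int(age_years))        # whole years: 18
print(round(age_years, 2))   # two decimal places
print(round(age_years, 6))   # six decimal places
```

Each printed value is a valid record of the same age. Which one to keep depends on the precision the analysis needs.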
Discrete numerical data.
Unlike the previous type, here we have an exact value. Numerical data is considered discrete when the value must be a whole number, with no decimals. Most often it is the result of counting. Some examples are:
- Number of children in a population: This must be an exact number. If there are 1,508 children, it is impossible to say that there are 1,508.65 children.
- Number of earthquakes that occurred during 2022: It may be 4, or maybe 5, but it is impossible to speak of 3.4 earthquakes.
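As a rough sketch of how the difference between discrete and continuous data might be checked in code, the hypothetical helper below (looks_discrete is not a library function) tests whether every value in a list is a whole number. It is only a heuristic: continuous measurements that happen to be recorded as rounded whole numbers would also pass, so the final judgment still depends on knowing what the variable represents.

```python
def looks_discrete(values):
    """Return True when every value is a whole number (heuristic only)."""
    return all(float(v).is_integer() for v in values)

# Illustrative sample data, not real measurements.
children_per_town = [1508, 950, 2210]   # counts
body_temps_c = [36.5, 37.1, 36.8]       # measurements

print(looks_discrete(children_per_town))  # True
print(looks_discrete(body_temps_c))       # False
```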
Categorical data.
The name of this type of data is self-explanatory: we are talking about variables whose values describe a characteristic or quality rather than a quantity. For example, we might categorize the weather as “cold”, “warm” or “hot”. We are not considering a number but a text label that describes the variable.
Categorical data is divided into two types:
Nominal categorical data:
These are data whose categories have no inherent order, so any order in which they are listed is arbitrary. For example:
- Gender: This can be “female”, “male” or “other”, in no specific order. That is, the order of the categories carries no meaning: “female”, “male”, “other” means the same as “male”, “other”, “female”.
- Marital status: This can be “married”, “single”, “widowed” or “divorced”, regardless of the order.
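Because nominal categories have no order, the natural summaries are frequencies and the most common category. A minimal sketch with Python's standard collections.Counter, using made-up sample data, might look like this:

```python
from collections import Counter

# Illustrative sample of a nominal variable (made-up data).
marital_status = [
    "married", "single", "single", "divorced",
    "married", "single", "widowed",
]

# Frequency of each category.
counts = Counter(marital_status)

# The most common category (the mode) and how often it occurs.
mode, mode_count = counts.most_common(1)[0]
print(mode, mode_count)  # single 3

# Reordering the observations does not change the summary.
same_counts = Counter(reversed(marital_status)) == counts
print(same_counts)  # True
```

Sorting these categories or taking an average of them would not be meaningful, since no category ranks above another.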
Ordinal categorical data:
These are data whose categories follow a specific order. For example:
- Stage of life: “child”, “adolescent”, “adult”, “elderly”. Here the categories follow an ascending order based on the age of the person.
- Difficulty level of a course: “basic”, “intermediate”, “advanced”. Here the categories follow an order given by the level of difficulty of the topics taught in the course.
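With ordinal data the order itself carries information, so sorting and picking a middle category become meaningful. The sketch below is one possible way to express this in plain Python. The names LEVELS and RANK and the sample list are assumptions made for illustration.

```python
# The declared order of the categories is the meaningful part.
LEVELS = ["basic", "intermediate", "advanced"]
RANK = {level: position for position, level in enumerate(LEVELS)}

# Illustrative sample of course difficulty levels (made-up data).
courses = ["advanced", "basic", "intermediate", "basic", "advanced"]

# Sort by position in LEVELS, not alphabetically.
ordered = sorted(courses, key=RANK.__getitem__)
print(ordered)

# The middle element of the sorted list is the median category.
median_level = ordered[len(ordered) // 2]
print(median_level)  # intermediate

# Categories can be compared through their rank.
print(RANK["basic"] < RANK["advanced"])  # True
```

Note that the ranks only express order. The distance between “basic” and “intermediate” is not necessarily the same as between “intermediate” and “advanced”, so averaging the ranks should be done with caution.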