Transformation is the act of taking a set of values from a dataset, processing them in some way (depending on the aims of the research) and arriving at a new set of values with the goal of revealing some aspect of the data from a new perspective.
(This article is the fourth part in the Deconstructing Analysis Techniques series.)
This technique is characterised by the fact that the values are changed; that someone looking at the new values will be unable to work backwards to the original values; and that for each original data point there is a single, new data point.
In mathematical parlance, (and you can skip this part if you like) the difference between a manipulation technique and a transformation technique is that manipulated data sets are congruent with the original, whereas transformed data only maintains cardinality (i.e. the same number of elements).
So, what does that all mean? We’re talking here about analysis methods like:
- Scaling – taking one set of data and massaging it to fit a distribution or ‘shape’ of values.
- Moving averages – averaging a number of consecutive values as a way of ‘smoothing’ the last value in the series.
- Weighted averages – calculating an average value where more importance – ‘weight’ – is given to some values.
- Weighted indexes – calculating an indexed score (against a baseline) where more importance – ‘weight’ – is given to some values.
- Seasonal adjustments – adjusting a data point to account for cyclical peaks and troughs, highlighting the ‘real’ shift.
- Differences – looking at the changes between one value and the next.
Now, initially, most of these methods may feel pretty technical, quantitative and removed from standard design research analysis. However, they form a powerful collection of analysis methods that will better equip you in undertaking design research. They also represent fairly low-level mathematical/quantitative methods and are available in a standard spreadsheet program. More importantly, used properly, these methods – and transformation techniques generally – open up new avenues for understanding the people who will use the services and products we design.
In “Deconstructing Analysis Techniques” we used the example of fitting test scores to a pre-determined probability distribution – scaling – as the example for Transformation techniques.
When we measure a population characteristic – such as height, or a test score – we create a sample set of data for that characteristic (unless we are measuring the entire population). There are times when the raw distribution (the frequency of occurrence for each value in our data) of results is not what we’re after. We may wish to compare the shape and attributes of two separate samples – two groups of test participants, for example – and so we transform the two sets of data so that they share a common mean (the average value for the data set).
Usually this is done to bring both sets of data to what is known as a ‘normalised’ distribution with a mean of 0. In our test/exam result example, though, we want to adjust the scores so that the class as a whole receives a pre-determined number of A, B, C, D and F grades. What we’re doing here is adjusting the overall shape of the data. (In these cases a plot of the raw data will look different to the scaled data.) When graphed, the scaled data will look roughly bell-shaped, with the middle – or ‘hump’ – representing average performance, and the two thin tails representing high performance (at the top end) and failure (at the bottom end).
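As a rough sketch of the first step – bringing two samples to a common mean so their shapes can be compared – here is how two sets of (invented) test scores might be standardised to mean 0 and standard deviation 1:

```python
# A minimal sketch of scaling: standardising two sets of test scores
# so each has a mean of 0 and a standard deviation of 1, making their
# shapes directly comparable. All scores are invented.
from statistics import mean, pstdev

def standardise(scores):
    """Transform raw scores into z-scores (mean 0, std dev 1)."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

group_a = [55, 62, 70, 74, 81]
group_b = [40, 48, 52, 60, 95]

scaled_a = standardise(group_a)
scaled_b = standardise(group_b)

# Both scaled sets now share a mean of 0, so differences in spread
# and shape stand out rather than differences in raw level. Note
# that each original score maps to exactly one new score - the
# cardinality is preserved, but the values themselves change.
```

Fitting the scores to a pre-determined grade curve would be a further step on top of this, but the standardisation above is the core transformation.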
A moving average is used to smooth out day-to-day fluctuations with time series data. It is, literally, the average of the previous x days’ worth of data. A good example would be the number of page views received by a site. Each day the data will jump up and down, creating a sense of “noise” that makes analysis difficult, and, when a small number of observations are looked at in isolation, can create a false impression. A moving average is useful in time-series or longitudinal studies where we measure the value of a characteristic for a single object (person, server, site etc) over time.
One rather well-publicised and important example of this is the series of global temperature readings that have been used by both sides of the climate change debate. Skeptics of global warming point to a recent period of observations (2002 – 2007) which show a decline in global average temperatures. When the same data is looked at using a moving average, smoothing out the peaks and troughs, a clear upward movement is seen.
The choice of time period to use when calculating a moving average is based on the specific circumstances of the data. However, common sense is usually all that’s required. For example, when looking at Web traffic, a moving average calculated over 7 days is sufficient to counter spikes that occur during a given week. You might also calculate a moving average over a month if fluctuations occur over a longer cycle.
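As a rough sketch (with invented page-view figures), a 7-day moving average might be calculated like this:

```python
# A minimal sketch of a 7-day moving average over daily page views.
# The figures are invented; the window length follows the 7-day
# suggestion above.

def moving_average(values, window):
    """Average of the previous `window` values, for each point
    where a full window of data is available."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

daily_views = [120, 95, 130, 110, 400, 90, 105,
               125, 100, 135, 115, 410, 95, 110]

smoothed = moving_average(daily_views, 7)
# The weekly spikes (400, 410) are absorbed into the average,
# revealing the underlying level of traffic.
```

Changing the `window` argument to 30 would give the month-long smoothing mentioned above.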
Weighted averages aim to address one of the criticisms of a moving average – and other types of averages – namely that all values in the average are treated equally. It is often the case that one observation is more significant or important than another.
Let’s say for example we’re measuring the time to complete a task in a user evaluation session. We have representatives from each of our personas (or other audience segments): 2 primary personas, 3 secondary personas, and one tertiary persona. In this case, the performance of the two primary persona representatives is far more significant than that of the tertiary participant.
When we calculate the mean time-to-complete value, we can weight the results so as to reflect the relative importance of each participant. We may assign (and the exact values will vary for you) a weighting as follows:
Primary: multiply by 9
Secondary: multiply by 3
Tertiary: no multiplier
What we’re essentially saying is that our secondary personas are three times more important than our tertiary persona; and that our primary personas are three times more important than our secondary ones. We could just as easily use a factor of 2 (instead of 3), leading to values of 4, 2 & 1 in the example above; what matters is that the weights adjust the dataset to reflect the relative importance of each observation, as judged by something outside the data itself.
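The calculation itself is straightforward. A rough sketch, using the 9 / 3 / 1 weighting above and invented time-to-complete figures (in seconds) for the six participants:

```python
# A minimal sketch of the weighted mean time-to-complete described
# above. Weights follow the 9 / 3 / 1 scheme; the times are invented.

def weighted_average(values, weights):
    """Mean where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Two primary, three secondary, one tertiary participant.
times   = [42, 48, 60, 55, 70, 95]
weights = [9,  9,  3,  3,  3,  1]

print(round(weighted_average(times, weights), 1))  # 52.1
# A plain (unweighted) mean of the same times is about 61.7 -
# the weighting pulls the result towards the primary personas'
# faster completion times.
```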
An indexed value is one measured in terms of some baseline figure. The aim is to convey movement around a starting point when there is no way to specify a zero.
An example of an index might be a satisfaction score. Since satisfaction is a largely subjective measure, there is no way to define a zero point. Instead we typically measure a ‘pre’ figure as a baseline and map changes against it over time. Common starting values for an index are zero and 100; the choice is arbitrary and is typically made for clarity in communication.
Indexes are often calculated as an aggregate of a number of measurements. But it is also the case that we sometimes need to treat the data we receive from one group as being more important than another. This is where a weighted index comes in handy. A weighted index – like our weighted average – treats different values as more or less important.
So, if it is common practice to design a product or service to better meet the needs of our primary audience segments, it also makes sense for our satisfaction index to put more stock in the satisfaction of those segments. We do this by applying a weighting (some multiplier) to each piece of data collected, based on its relative importance.
We could easily do the same with responses to a question like “Would you recommend this service to a friend?”
This technique provides us with a convenient way to build positive bias – towards the needs of our most important audience segments – directly into our research methods.
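Putting the pieces together, a weighted satisfaction index might be sketched like this. The baseline and current scores are invented, and the segment weights reuse the 9 / 3 / 1 scheme from the weighted average example:

```python
# A minimal sketch of a weighted satisfaction index. Baseline scores
# come from an earlier round of research; current scores from the
# same participants. All figures are invented.

def weighted_index(current, baseline, weights, base=100):
    """Weighted ratio of current to baseline scores, expressed
    against an arbitrary base value (100 here)."""
    weighted_now  = sum(c * w for c, w in zip(current, weights))
    weighted_then = sum(b * w for b, w in zip(baseline, weights))
    return base * weighted_now / weighted_then

baseline_scores = [6.0, 7.0, 5.5, 6.5, 7.5, 8.0]
current_scores  = [7.5, 7.0, 5.0, 6.0, 7.0, 6.5]
weights         = [9,   9,   3,   3,   3,   1]

print(round(weighted_index(current_scores, baseline_scores, weights), 1))
# 104.1 - above the baseline of 100, so satisfaction has risen
# overall, with the primary segments' gains counting most even
# though some secondary and tertiary scores actually fell.
```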
Some of the things we observe in design research are subject to cyclical variations. We may not, however, want to include a change in our data due to “seasonal” fluctuations, instead wanting to identify “real” changes (in frequency of use, for example).
In order to look at the real changes in our observed data we need to account for the seasonal variability first.
A familiar example might be to look at the number of page views or unique visits received by a site. We might see a big lift in traffic between Sunday & Monday; and a big drop between Friday & Saturday. In order to tell whether an observed drop in traffic on some Saturday is “normal”, we need to look at the regular pattern of changes and “adjust” the Saturday figure.
One way to do this is to calculate the average drop in traffic (between Friday & Saturday) over time and then apply it to the current observation for Friday. This acts as a predictor or estimator for the current Saturday, which we can then compare against the actual observed data. The average difference acts as our seasonal adjustment.
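As a rough sketch of that calculation, with invented traffic figures:

```python
# A minimal sketch of the Friday-to-Saturday seasonal adjustment
# described above. The historical drops and the current week's
# figures are all invented.
from statistics import mean

# Observed Friday-to-Saturday drops in page views over past weeks.
past_drops = [300, 280, 320, 310]
seasonal_adjustment = mean(past_drops)     # average drop: 302.5

this_friday   = 1500
this_saturday = 1050                       # actual observation

expected_saturday = this_friday - seasonal_adjustment  # 1197.5
real_change = this_saturday - expected_saturday        # -147.5

# Saturday came in roughly 147 views below what the usual weekend
# dip predicts, so something beyond the seasonal pattern is going on.
```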
There are times when what we’re interested in knowing is not the raw value of an observation but the change between one observation and the next.
Consider a test of a new design in which we first measure the time to complete a task with the current design, and then the same task with the new design. Across all participants in the test the raw observations (i.e. time to complete) are far less interesting than the change in that time as a result of the new design. (Note that we may wish to express that change as a percentage rather than a raw value.)
We can use the same technique to highlight the variability of some observation over time. For example, we may be tracking the number of connections or ‘friends’ a person has in some social network to understand the relationship between the current number of connections and the rate at which new connection requests come in. To identify the number of new connections we simply calculate the difference between successive observations.
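Differencing is the simplest of the transformations in this article. A rough sketch, using invented weekly connection counts for one person:

```python
# A minimal sketch of differencing: the change between successive
# observations, here weekly connection counts. Counts are invented.

def differences(values):
    """Change from each observation to the next."""
    return [b - a for a, b in zip(values, values[1:])]

weekly_connections = [120, 128, 131, 145, 146]

print(differences(weekly_connections))  # [8, 3, 14, 1]
# Each entry is the number of new connections gained that week -
# the rate of change, rather than the raw totals. Note the result
# has one fewer element than the input, since the first observation
# has no predecessor to compare against.
```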
Although primarily applied to quantitative data, transformation techniques are useful in a wide range of design research activities beyond the quantitative.
Transformation of our research data can act as a way of reducing noise and bringing into sharp relief characteristics of the underlying user behaviour. The act of transforming removes us from the raw, original data, but in doing so we can gain the opportunity to uncover meaningful insights hidden from us otherwise.