Curation is king – The 7 R’s Of Smart Data

At AQOIA, one of our key beliefs and key differentiators is that everything in Analytics, from Visualisation to Results, starts and ends with the data. Being able to curate that data through its lifecycle, and keep it current and fit for purpose, is a core competence and source of competitive advantage for us. To that end we have the Seven Rs of Big Data for effective Business Utility, aimed at Business Users. They cover the full lifecycle and embody the principle that this is a living, continuously adapting process (extending the 5 Vs of Big Data).


To perform Advanced Analytics that yields fast and effective business results we need to factor in the following:
R=Rich: The source data for analytics must be Rich. This cannot be overstated: without an appropriate level of richness, the “questions” and their subsequent “answers” cannot emerge.

It is very often a journey: during the discovery process a lack of Richness, or a need for more Richness, is uncovered, and if sufficiently valuable it will be procured over time. For more advanced Analytics scenarios, sufficiently prescribed Richness is required for Automation and Prediction.

For Discovery processes a broader richness is required to enable monitoring, detection or inquiry into potential questions, patterns and trends, and then scrutiny to understand more. Dimensional Richness, Attribute Richness and Context Richness around Key Items of Interest / Objects of Inquiry are critical for the speed and quality of pattern detection, and for problem/opportunity resolution. The key to exploiting the Richness attribute of Business Analytics data is many objects in high dimensionality with rich attributes, easily accessible on demand.

R=Relevant: The data must be relevant to the problem area at hand. This may seem obvious, but with many “big data” initiatives working from the bottom up, there can be a tendency to accumulate a lot of irrelevant data and activity that serves no useful purpose. On the other hand, Relevance is subjective, purpose-based and time-based, so the question goes beyond a simple statement of relevance. Data becomes relevant depending on the purpose and quest of the inquirer, so it must be available “on tap” to serve its purpose on demand. To fulfil its relevance calling, data must therefore be context-based and linked in context, so that it may become “relevant” on demand to pursue a line of inquiry or support a specific problem.

The key to exploiting the Relevance attribute is to have a broad and deep dataset around themes, linked to interrelated themes, to support the multi-faceted aspects of Relevance.


R=Reliable: Reliability is dependability and, ultimately, trust – from the perspective of the business consumer.

To remain rich and relevant, and to serve its purpose better and better, the Rich, Relevant information source must grow to be reliably trustworthy and stay there. This is a critical function directly linked to achievable results. If the dataset is not reliable in the eyes of the business user, it will not be used extensively, or at all. The utility of Business Analytics to yield results is directly correlated to the reliability of the dataset. The more powerful the propositions or insights generated, the more impact they will have and the more contention there will be; the more reliable they must therefore be, or they will be dismissed, or used to distort other behaviour patterns as their reliability is brought into question, and an undesired course of action can ensue.

Reliability is the one key attribute business users will home in on to either embrace or dismiss an insight. The key to exploiting the Reliability attribute is to engage the business upfront and easily show the inner workings, so that an “insight” can be deconstructed. The findings and successes will build upon themselves, and “Trust” is earned and reinforced over time.


R=Robust: Robustness is a quality essential to securing the ongoing quality and utility of the solution. The more dynamic an organisation becomes – as a direct consequence of its advanced analytics capabilities and competitive advantage – the more changes will ensue, in a faster and faster cycle. To have this high-quality impact and insight, the analytics set is necessarily granular and detailed, and will, at the fringes at least, continually be prone to need updating. The forces come from various sources: deeper insight required, new innovation, patterns, competition, market forces, internal improvements. Contrast this with a static set of high-level reports that do not track details, leading forces, or anything that really changes. How useful is that information? It may not change, and does not need to change, but it is inert and impotent.

Robustness is a critical function of any advanced dataset. The key to exploiting Robustness is to ensure a sufficient “operating framework” for Analytics that cuts across business users, business support, technology, governance and process. Not factoring these in, or over-emphasising the wrong pillars, will lead to non-robustness – fragility; the end-to-end solution must strive for “anti-fragility”.


R=Reconcilable: Reconcilable is the underpinning attribute that supports and enables speedy and timely action. The ability to reconcile drives trust, and with trust we get action. Reconciliation is the force that enables broader engagement, oversight, monitoring and true enterprise scalability. Implicit in reconciliation is context to something that is already known, understood and embraced.

It is therefore an essential ingredient in gaining broader commitment and securing the trust and confidence to act. It is also a potent weapon for cutting through organisational politics and headwinds, because the broader-context, familiarity and trust issues that must be overcome for big actions – and indeed for a series of small actions, “acting on insight” – are systemically addressed with reconciliation and validation procedures.

If they are to be effective, the reconciliation mechanisms must underpin the whole analytics strategy and tactics across the whole lifecycle. The key to addressing reconciliation is to work from the bottom up with the lowest common denominator possible. The lowest level of detail that can be depended upon, and from which all other numbers are derived, is “the Transaction”.
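The bottom-up principle above can be sketched in a few lines: recompute every derived figure from transaction-level records and flag any break against the reported numbers. This is a minimal illustrative sketch, not AQOIA's actual mechanism; the record layout and account names are assumptions for the example.

```python
from decimal import Decimal

# Hypothetical transaction-level records: the lowest common denominator.
transactions = [
    {"account": "A", "amount": Decimal("120.50")},
    {"account": "A", "amount": Decimal("-20.50")},
    {"account": "B", "amount": Decimal("300.00")},
]

# Derived, higher-level figures as they might appear in a report.
reported_totals = {"A": Decimal("100.00"), "B": Decimal("300.00")}

def reconcile(transactions, reported_totals):
    """Recompute each total from the transactions and flag any mismatch."""
    derived = {}
    for t in transactions:
        derived[t["account"]] = derived.get(t["account"], Decimal("0")) + t["amount"]
    # A "break" is any account where the reported figure does not tie back
    # to the sum of its underlying transactions.
    return {k: (reported_totals.get(k), v)
            for k, v in derived.items() if reported_totals.get(k) != v}

print(reconcile(transactions, reported_totals))  # {} means fully reconciled
```

Because every reported number ties back to the same transaction base, a business user can deconstruct any aggregate on demand, which is exactly what earns the trust described above.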

R=Ready: Ready is the enabling attribute that transforms data into insight and then action. With advances in technology, alongside increasingly dynamic forces and a broader, deeper context for data-driven insight, speed to action is fast becoming the competitive essence of differentiation. However, with the increasing proliferation of “RealTime” datasets, which truly do offer new streaming context for realtime decisions, this is in fact only half the story. Realtime datasets offer distinct value and are relatively easy, with today's technology, to govern, control and automate – making Fast and Ready a simple proposition.

The real challenge to the Ready attribute, and to making maximum value impact in the context of big data, comes from the new class of elapsed-time “TimeWarp” series datasets that must first be curated in a timely way. Speed, precision and effort are therefore significant contributory factors for timely “ThroughTime” enabled decisions. In an effort to wrangle more and more insight from a grain of valuable data we seek to build context. If this context is built on rich foundations we can significantly amplify the potency and value of the dataset to the information consumer, complementing the Realtime insight.

This quest can, however, come at the cost of extreme latency and effort, and risks jeopardising all the other Rs in the process. Take curation in Excel as the simplest case in point: Excel takes the very least “time and effort” to curate, but in the process violates all the prior R principles. The key to maximising the Ready attribute for maximum decision impact is therefore to have a very fast, reliable and robust Capture, Curation and Consumption process – in essence, to get the most timely context and potent data into the hands of the decision makers in a way that, without question, conforms to all the other 6 Rs.

R=Repeat: Repeat is by far the most powerful BigData enabling attribute of them all – once you have mastered the baseline for the prior 6 Rs. Repeat is the ability to reproduce results coherently, so as to capture the value of not needing to second-guess the curated results as they are continuously repeated on an event or period basis.

However, as with the other Rs, there is a critical complication that makes this Repeat process not just essential to capturing rapid gains on an ongoing basis, but crucially also constantly prone to the risk of failure. The dynamics and rapid change of business – particularly at the granular level we operate at to have impact – mean the Repeat process across all the other 6 Rs must readily enable adaptation, evolution and change, without breaking the essence of the 7 R cycle.

The key to remaining current to the business and driving new insight is to continually adapt to change, and this force naturally puts the need to change, and therefore the ability to Repeat, in question. The key to harnessing the power of the Repeat attribute, and to enabling the solution data to continually mirror the business as it shifts to drive new possibilities, is therefore to apply and embrace capabilities fashioned on “agile” principles to make the whole Data Capture, Curate, Consume 7 R process “dynamic”.
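One practical way to keep a Repeat cycle safe under constant change is to make each repeated curation run validate its inputs against an explicit contract, so that business change surfaces as a loud failure rather than silent drift. The sketch below is a hypothetical illustration; the column names and schema are invented for the example, not taken from any real pipeline.

```python
# Assumed contract for incoming transaction rows (illustrative only).
EXPECTED_COLUMNS = {"txn_id", "account", "amount", "booked_at"}

def curate(rows):
    """One repeatable Capture-Curate step; raises if the contract breaks.

    Each scheduled repeat (event- or period-based) re-runs this check, so a
    schema change upstream stops the cycle instead of corrupting results.
    """
    for i, row in enumerate(rows):
        missing = EXPECTED_COLUMNS - row.keys()
        if missing:
            raise ValueError(f"row {i}: schema drift, missing {sorted(missing)}")
    # ...curation proper (dedupe, enrich, link context) would follow here...
    return rows

good = [{"txn_id": 1, "account": "A", "amount": 10, "booked_at": "2024-01-01"}]
print(len(curate(good)))  # 1
```

When the business does change, the contract itself is updated deliberately – an "agile" adaptation step – and the Repeat cycle resumes with the other Rs intact.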