Transparency

Data Transformation

A tried and trusted data transformation methodology

Significant effort is required to improve the original raw financial data so that it is accessible, relevant and of value to the general public. Data transformation for spotlightonspend uses the same tried and trusted methodology that Spikes Cavell uses to convert raw financial data from a public body’s financial management systems into actionable intelligence that supports the body's efforts to transform the way it procures goods & services.

Cleanse

A validated data extract is processed using a specially developed engine to standardise the data, remove duplicates, identify and fix errors and prepare the file for subsequent processing. We routinely trap more than 200 common data errors in raw data extracts uploaded for processing. Left unresolved, these errors would result in the public being given data that is inaccurate and potentially misleading.
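As a rough illustration of the kind of work this step involves, the following is a minimal sketch and not the engine itself. The field names (supplier, amount, date, reference), the UK date format and the three checks shown are all assumptions made for the example.

```python
from datetime import datetime


def cleanse(rows):
    """Standardise, validate and de-duplicate raw payment rows (illustrative only)."""
    seen = set()
    clean, errors = [], []
    for row in rows:
        # Standardise: collapse repeated whitespace and upper-case the supplier name.
        supplier = " ".join(row["supplier"].split()).upper()
        try:
            # Trap two common errors: unparseable amounts and malformed dates.
            amount = round(float(str(row["amount"]).replace(",", "").replace("£", "")), 2)
            date = datetime.strptime(row["date"].strip(), "%d/%m/%Y").date()
        except ValueError:
            errors.append(row)  # set aside for correction rather than publishing
            continue
        # Remove duplicates: the same supplier, amount, date and reference seen twice.
        key = (supplier, amount, date, row.get("reference"))
        if key in seen:
            continue
        seen.add(key)
        clean.append({"supplier": supplier, "amount": amount,
                      "date": date, "reference": row.get("reference")})
    return clean, errors
```

Rows that fail a check are returned separately rather than dropped, so that they can be fixed and reprocessed.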

Redact

To minimise the risk of an inadvertent breach of Data Protection legislation, the cleansed and standardised raw data is further processed to identify payments made to individuals (for example, foster carers and vulnerable adults in local government) and payments whose disclosure might compromise national security, personal security or foreign relations.

Our redaction algorithms are sophisticated and leverage unique reference databases and rule-sets designed to ensure that bona fide payments are not inadvertently obscured. Once a redaction candidate has been identified and validated by a data analyst, any identifying information is overwritten to ensure that the recipient of the payment cannot be identified.
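The two-stage flow described above (a rule flags a candidate, an analyst validates it, and only then is the recipient overwritten) might be sketched as follows. This is a simplified illustration under stated assumptions: the expense types, the field names and the replacement text are hypothetical, and the real rule-sets and reference databases are far richer.

```python
REDACTED = "REDACTED PERSONAL DATA"  # hypothetical replacement text

# Hypothetical rule-set: expense types that usually indicate a payment to an individual.
SENSITIVE_EXPENSE_TYPES = {"foster care", "direct payments"}


def is_redaction_candidate(payment):
    """Stage 1: a rule flags the payment as a possible payment to an individual."""
    return payment.get("expense_type", "").strip().lower() in SENSITIVE_EXPENSE_TYPES


def redact(payments, approved_ids):
    """Stage 2: overwrite the recipient only for candidates an analyst has approved."""
    result = []
    for payment in payments:
        if is_redaction_candidate(payment) and payment["id"] in approved_ids:
            payment = {**payment, "supplier": REDACTED}  # copy; leave the input untouched
        result.append(payment)
    return result
```

Requiring both the rule and the analyst's approval is what keeps bona fide supplier payments from being obscured by a rule that fires too broadly.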

Classify

Public bodies’ financial management systems are broadly similar, but they differ significantly in how spending on goods & services is recorded. These differences mean that meaningful like-for-like comparisons (for example by department, cost centre or subjective) are not possible using the raw data alone.

To overcome the absence of uniform and reliable classification, a sophisticated matching & inference engine is used to match the supplier record or item description to our unique reference datasets and allocate the supplier to a primary category derived from our universal taxonomy (called ‘The V Code’).

The V Code is a hierarchical classification system, specific to the public sector, for classifying suppliers of goods & services. It enables spend data to be compared across the public sector so that, for example, a local authority can be meaningfully compared to a hospital trust.
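To make the idea of matching and inference against a hierarchical taxonomy concrete, here is a minimal sketch. It is not the V Code: the category labels, the reference entries and the keyword rules are hypothetical placeholders, and the slash-separated hierarchy is an assumption made for the example.

```python
# Hypothetical reference dataset: known suppliers mapped to a primary category.
REFERENCE_SUPPLIERS = {"ACME CATERING LTD": "FOOD/CATERING"}

# Hypothetical inference rules: keywords in the item description imply a category.
DESCRIPTION_KEYWORDS = {"stationery": "OFFICE/STATIONERY", "catering": "FOOD/CATERING"}


def classify(supplier, description=""):
    """Return (category, method): match the supplier first, then infer from the description."""
    name = " ".join(supplier.split()).upper()
    if name in REFERENCE_SUPPLIERS:
        return REFERENCE_SUPPLIERS[name], "matched"
    text = description.lower()
    for keyword, category in DESCRIPTION_KEYWORDS.items():
        if keyword in text:
            return category, "inferred"
    return None, "unclassified"


def top_level(category):
    """Roll a category up to the top of the hierarchy for like-for-like comparison."""
    return category.split("/")[0]
```

Because every body's suppliers are mapped to the same hierarchy, spend can be compared at whichever level is meaningful, from a detailed category up to its top-level parent.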

Classifications covering 97% of spend by value are validated by classification experts. It is this classification and validation effort that is used to provide analysis of 'Spend by Category' in spotlightonspend.
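One way to picture the 97% figure is as a coverage threshold: suppliers are reviewed in descending order of spend until the reviewed suppliers account for 97% of total value. The sketch below illustrates that reading; the largest-first ordering is an assumption for the example and the function name is hypothetical.

```python
def validation_queue(spend_by_supplier, coverage=0.97):
    """Return the suppliers to validate, largest spend first, until coverage is reached."""
    total = sum(spend_by_supplier.values())
    queue, running = [], 0.0
    ranked = sorted(spend_by_supplier.items(), key=lambda item: item[1], reverse=True)
    for supplier, spend in ranked:
        if running >= coverage * total:
            break  # the suppliers already queued cover the required share of value
        queue.append(supplier)
        running += spend
    return queue
```

For a body spending 900, 80, 15 and 5 with four suppliers, the first two suppliers already cover 98% of value, so only those two classifications would be queued for expert validation.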

spotlightonspend includes tools that allow you to easily reallocate suppliers to a different category where that category might better reflect what you purchase from them.

Enrich

Our matching & inference engine is also used to match the supplier record to our unique reference datasets and append a range of attributes to each record. Standard attributes include: the Number of Employees, Annual Revenue (Actual or Modelled), Date of Incorporation (Birth Year), Geographic Location and Risk Classification (Modelled). Several of these attributes are used to deliver 'Spend in Summary' in spotlightonspend.
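In essence, enrichment is a lookup that appends reference attributes to a matched supplier record. The following minimal sketch assumes an exact match on a normalised supplier name; the reference entry, its values and the attribute names are hypothetical and stand in for the real reference datasets.

```python
# Hypothetical reference dataset keyed by normalised supplier name (illustrative values).
REFERENCE_ATTRIBUTES = {
    "ACME LTD": {"employees": 42, "birth_year": 1998, "location": "Leeds"},
}


def enrich(record):
    """Append reference attributes to a supplier record, noting whether a match was found."""
    key = " ".join(record["supplier"].split()).upper()
    attributes = REFERENCE_ATTRIBUTES.get(key, {})
    # Unmatched records pass through unchanged apart from the 'enriched' flag.
    return {**record, **attributes, "enriched": bool(attributes)}
```

Keeping an explicit flag on each record makes it easy to report how much of the spend could be enriched and how much could not.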

"I'm really excited about the opportunities for transparency and it's something this government is utterly committed to. spotlightonspend demonstrates that, when innovative businesses work with far-sighted public bodies, we can inform the public, reduce costs and improve democracy both locally and nationally."

Eric Pickles, Secretary of State for Communities & Local Government.
