Common Data Model for Financial Auditors : Part 1

When I first approached the idea of a common data model (CDM), it was because our team and the teams in other audit agencies were all separately developing audit tools in R, and our respective directors were keen that we collaborate and achieve some efficiencies. Senior management love efficiencies. If we had a CDM, then we could swap chunks of R code with each other or easily work on a single chunk together (for example, a fixed asset register analyser).

I think this simplistic understanding of mine did a disservice to the value of devising a CDM. I should know this from my days as a philosophy student at QUB - philosophy of language (particularly Saussure, but also Quine) is one of the most interesting things anyone can spend time on. A CDM provides us with a common language that can be easily understood not only by humans but also by different chunks of the code. By creating an equivalency in definition, we create useful boundaries of meaning that allow the analysis to occur without too much additional deliberation and conjecture.

Another aspect of a CDM is that you can decide just how many descriptors you want to include. That old chestnut about Eskimos having thirty words for snow (a sentiment which is factually incorrect, deploys a rusty, uninformed stereotype and includes a pejorative term no longer in popular use - itself an important notion in the evolution of language, but getting off the point a bit) has a kernel of value at its core. A contemporary version would be that an Ulsterman has thirty different words to describe the rain. You need to recognise how many different variables and use cases are to be employed in the tool, then allocate them descriptors. This also means that you are explicitly excluding some concepts from your analysis.
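To make that concrete, here is a minimal sketch (in Python for brevity; the same idea translates directly to R). The CDM is treated as a small, explicit vocabulary, and anything a client system supplies that isn't in the vocabulary is deliberately dropped. Every variable and column name here is a hypothetical illustration, not Lupin's actual CDM.

```python
# A hypothetical, deliberately small CDM vocabulary for GL lines.
CDM_GL_VARIABLES = {
    "posting_date",   # date the transaction hit the ledger
    "account_code",   # nominal/GL account identifier
    "amount",         # signed transaction value
    "description",    # free-text narrative
    "document_ref",   # source document reference
}

# Mapping from one (made-up) client ERP's column names to CDM descriptors.
CLIENT_TO_CDM = {
    "PostDt": "posting_date",
    "AcctNo": "account_code",
    "NetAmt": "amount",
    "Narrative": "description",
    "DocNum": "document_ref",
}

def to_cdm(record: dict) -> dict:
    """Rename a client record's columns to CDM descriptors,
    silently excluding any concept the CDM does not cover."""
    return {CLIENT_TO_CDM[k]: v for k, v in record.items()
            if k in CLIENT_TO_CDM}
```

A column like a client-specific cost centre simply never makes it through `to_cdm`, which is the "explicitly excluding some concepts" decision made in code rather than in a design note.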

You'd probably want to do this because, if you try to make your financial audit tool into a God-like behemoth, you'll not live long enough to get the data wrangling part of the script done. Some ERPs churn out general ledgers with hundreds of columns of information. Fortunately, not all of it is relevant to our task at hand.

The good people at Engine B (https://engineb.com/) have been working with Microsoft and a number of audit stakeholders to devise a CDM for auditing. If I have read their notes correctly, the CDM (available on Github - https://github.com/Engine-B/EngineB-CDM) is open source, and the intention is that auditors and auditees employ the CDM to help pull together the diverse systems used at mid to large tier entities. If the whole world got to the point where all financial systems were using the same CDM, it would be a colossal advance. But perhaps that's why I'm a little sceptical about the likelihood of it happening. If it did, it would hugely contribute to financial transparency, and some managers won't like that. Others will insist that they are totally different from every other organisation in the world and need custom inputs and variables and descriptors.

This is why I think the responsibility will continue to lie with the auditor to understand the client and make the correct decisions on how to analyse the data based on the nature of the variable and the substance of the transaction.

There are not (as far as I can tell) many folk out there working on an audit or accounting CDM. I initially intended to use Engine B's CDM as a starting point for Lupin but, at almost two thousand unique variable types, it's simply far too comprehensive for what I'm trying to achieve here. It is my belief (stop me if I'm wrong) that most auditors are barely getting past calculating debtor days ratios or cleansing GL data to do some substantive fraud and error evaluation based on average costs per purchase order per supplier. I mean, that might not be a useful way to identify a fraud anyway, but that's for another day.

So I'm going to devise my own, based on my initial assessment of what auditors would want to analyse. Fortunately, the key documents (excluding the general ledger) tend to be fairly consistent in the information they contain; a fixed asset register typically includes a discrete set of relevant variables, as does an aged debt schedule or a sales day book. This will help shape the specific types of analysis we would want to do on these documents and also identify the keys which will tie the transactions back to the GL.
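As a sketch of what that consistency buys us, here is one way a fixed asset register's CDM fields might be pinned down, including a key back to the GL. The field names and the choice of `account_code` as the join key are illustrative assumptions on my part, not a settled standard.

```python
# Hypothetical CDM fields for a fixed asset register.
FAR_CDM_FIELDS = {
    "asset_id",
    "asset_description",
    "acquisition_date",
    "cost",
    "accumulated_depreciation",
    "net_book_value",
    "account_code",  # the key tying register entries back to the GL
}

def missing_far_fields(record: dict) -> set:
    """Return the CDM fields absent from a fixed asset register record,
    so a malformed extract fails loudly before any analysis runs."""
    return FAR_CDM_FIELDS - set(record)
```

Because the register's variables are predictable, a check like this can sit at the top of any fixed asset analyser and refuse incomplete extracts before the interesting work starts.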

However, the general ledger is going to be more complicated and so, rather than the document driving the analysis, we'll have to identify the most useful types of analysis first and then decide what variables from the GL we want to include in the CDM. I think this approach will mean that we can add additional analytics later once we have the script(s) working.
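Working analysis-first can itself be made mechanical: each planned test declares the CDM variables it needs, and the union of those declarations tells us exactly which GL columns the CDM must include. The analysis names and variables below (including `user_id` and `supplier_id`) are hypothetical examples, not a proposed test catalogue.

```python
# Hypothetical analyses mapped to the CDM variables each one requires.
ANALYSES = {
    "weekend_postings":   {"posting_date", "amount", "user_id"},
    "round_sum_amounts":  {"amount", "account_code"},
    "duplicate_invoices": {"document_ref", "amount", "supplier_id"},
}

def required_gl_variables(selected: list) -> set:
    """Union of CDM variables needed by the selected analyses."""
    return set().union(*(ANALYSES[name] for name in selected))
```

Adding a new analytic later is then just a new entry in the mapping; the CDM only grows when a test genuinely demands a variable it doesn't yet have.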

The next step is to devise the CDM and get it onto Github. Chances are that some of my CDM will overlap with what Engine B have come up with (there's a fairly limited number of ways to describe accounting things), so I will yet again point out that they are the ones to have done the heavy lifting and I am much obliged to them for that. At the very least, their work illustrates the sheer volume and variety of transactional data that is generated by modern digital accounting systems, and the need for a contemporary financial auditor to have the relevant data science skillset to perform an efficient and high quality audit.
