How Big Data Can Combat VAT Fraud

Like any tax, VAT is vulnerable to fraud. But with its unique credit and refund mechanism, it is especially susceptible to abuse. And abused it is.

In 2016, the EU VAT Gap – the difference between expected VAT revenues and VAT actually collected – amounted to €147.1 billion, a loss of 12.3% of expected VAT revenue across the EU. Considering that VAT is a major source of tax revenue in the EU, with standard rates currently averaging a staggering 21.5%, VAT fraud has become an urgent concern.
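To make the VAT Gap definition concrete, the short sketch below works backwards from the two figures quoted above. It assumes that the 12.3% is measured against expected VAT revenue; the variable names are illustrative only, and the derived totals are implied by the quoted figures rather than taken from any official report.

```python
# VAT Gap = expected VAT revenue - VAT actually collected.
# Inputs are the 2016 figures quoted in the text.
vat_gap_eur_bn = 147.1
gap_share = 0.123  # assumption: gap as a share of expected VAT revenue

# Rearranging gap_share = gap / expected gives the implied expected revenue.
expected_eur_bn = vat_gap_eur_bn / gap_share
collected_eur_bn = expected_eur_bn - vat_gap_eur_bn

print(f"Implied expected VAT revenue: EUR {expected_eur_bn:.1f} billion")
print(f"Implied VAT collected:        EUR {collected_eur_bn:.1f} billion")
```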

Traditional means of fraud detection

Traditional measures to combat fraud – including subjecting new VAT registrations, refund claims and a sample of intra-EU traffic flows to additional checks, analyzing filings and tracing money flows through subsidiaries – are extremely time-consuming and quickly sap enforcement resources. Since 2010, the EU has introduced more than 100 measures to tackle tax avoidance, evasion and non-compliance, such as greater information exchange between tax authorities and enhanced collaboration between Member States towards better direct tax and VAT collection.

Tax authorities are increasingly implementing electronic reporting systems, including real-time transaction reporting, in the hope that collecting and evaluating more data in real time will reduce VAT fraud while making VAT compliance and reclaim processes more efficient. The move to digital has delivered results: PwC's Paying Taxes 2017 reports that it takes 27% less time on average to comply with VAT obligations in countries where businesses pay and file VAT online. Yet the VAT Gap remains.

Challenges of prevention and detection

With an eye on further closing that gap, Member States’ tax administrations have placed high hopes in new technologies to aid the battle against VAT fraud – whether through transaction tracking or more rapid exchanges of information. However, the massive amount of structured and unstructured data available from multiple sources and channels is overwhelming, and often proves too complex to analyze properly. This data, referred to as Big Data, includes information generated continuously through a variety of digital channels, including financial transactions, invoices, customer data streamed from global transparency initiatives, and tax data from mass media, the Internet and third-party sources.

How Big Data techniques can help

Recently, tax authorities around the globe have turned to Big Data for its potential in combating fraud. Artificial intelligence (AI) and machine learning have emerged as powerful fraud-fighting technologies. Machine learning – technology that allows computers to uncover hidden patterns in existing data – serves as a practical tool for tax data analysts.

Machine learning applied to Big Data enables sophisticated data analytics that combine tax technical knowledge, large data sets, and advanced tools to generate alerts that prevent tax fraud and lower risk. Machine learning eliminates the use of pre-programmed rules in favor of a much more flexible and autonomous approach – one that combats fraud with improved precision.

Predictions – Supervised Learning
Supervised learning – also referred to as predictive modeling – is a type of machine learning that uses data mining to enable tax authorities to draw on past fraud and audit cases in order to determine which attributes of these cases are most highly correlated with a successful outcome. Known as classification, these algorithms operate by sorting the historical data into classes, or groups with similar characteristics. This type of supervised learning is used to help with audit selection. The case data used to teach the machine must also include failures, such as no-fault audits, so that it learns which attributes to disregard and avoids choosing the wrong target for audit.

Tax authorities can use this technique to learn from large datasets of tax events, and even from unstructured data such as tax footnotes, emails and other supporting tax documentation. For example, when flagging fraudulent meal and entertainment expenses based on past history, the algorithm will take into account the human feedback received, and reduce false positives as time goes on.
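As a rough illustration of classification for audit selection, the sketch below trains a small logistic regression model on past audit outcomes and uses it to rank new filings. It is not any tax authority's actual method: the two features (refund-to-turnover ratio and a new-registration flag), the case data, the filing IDs and all function names are invented for the example. Note that the training set includes no-fault audits (label 0) as well as confirmed fraud (label 1).

```python
import math

# Hypothetical past audits: (refund_to_turnover_ratio, is_new_registration, label)
# label 1 = audit confirmed fraud, label 0 = no-fault audit. Values are invented.
PAST_AUDITS = [
    (0.05, 0, 0), (0.10, 0, 0), (0.08, 1, 0), (0.15, 0, 0),
    (0.60, 1, 1), (0.75, 1, 1), (0.55, 0, 1), (0.80, 0, 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(cases, epochs=5000, learning_rate=0.5):
    """Fit logistic regression weights by batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(cases)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for ratio, is_new, label in cases:
            error = sigmoid(w[0] * ratio + w[1] * is_new + b) - label
            grad_w[0] += error * ratio
            grad_w[1] += error * is_new
            grad_b += error
        w[0] -= learning_rate * grad_w[0] / n
        w[1] -= learning_rate * grad_w[1] / n
        b -= learning_rate * grad_b / n
    return w, b

def fraud_score(model, ratio, is_new):
    """Estimated probability (0..1) that an audit would find fraud."""
    w, b = model
    return sigmoid(w[0] * ratio + w[1] * is_new + b)

model = train(PAST_AUDITS)

# Hypothetical new filings: (filing_id, refund_to_turnover_ratio, is_new_registration)
NEW_FILINGS = [("F-101", 0.07, 0), ("F-102", 0.70, 1), ("F-103", 0.12, 1)]

# Rank filings so auditors look at the highest-scoring ones first.
ranked = sorted(NEW_FILINGS, key=lambda f: fraud_score(model, f[1], f[2]), reverse=True)
for filing_id, ratio, is_new in ranked:
    print(filing_id, round(fraud_score(model, ratio, is_new), 3))
```

Human feedback could be folded in by appending each newly audited case, with its outcome, to the training data and calling `train` again. A production system would use far more features and cases than this toy example.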

Clustering – Unsupervised Learning
While supervised learning helps pinpoint which potential targets to audit, the technology can only base its findings on known cases. Unsupervised machine learning does not require pre-existing evidence. It is used when previous case data is not available and the tax authorities don’t have a pre-determined idea of what they are looking for; they are simply looking to flag anything out of the ordinary. With a technique called clustering, the machine groups similar types of VAT data together and then identifies anomalies in the clustered data, flagging them for additional investigation. Clustering algorithms can be used to analyze the expense-generating behaviors of employees or business units, or to group transactional data together in order to detect trends within VAT filing cycles or within the VAT data itself.

Instead of the data mining inherent in supervised learning, unsupervised learning focuses on rule mining, in which VAT data is compared in order to detect fraud.
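The clustering idea described in this section can be sketched with a minimal k-means routine followed by a distance check. Everything here is an assumption made for illustration: the filings and their values are invented, the starting centroids are picked by hand, and the "three times the median distance" cut-off is an arbitrary threshold rather than an established rule.

```python
import math
import statistics

# Hypothetical filings: (input VAT claimed, output VAT declared), in thousand EUR.
FILINGS = {
    "A": (10, 12), "B": (11, 13), "C": (9, 11), "D": (10, 11),
    "E": (50, 55), "F": (52, 58), "G": (49, 54), "H": (51, 56),
    "X": (30, 5),  # claims far more input VAT than it declares output VAT
}

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: distance(p, centroids[i]))
            groups[nearest].append(p)
        centroids = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def flag_anomalies(filings, centroids, factor=3.0):
    """Flag filings whose distance to the nearest centroid is unusually large."""
    nearest = {name: min(distance(p, c) for c in centroids) for name, p in filings.items()}
    cutoff = factor * statistics.median(nearest.values())
    return sorted(name for name, d in nearest.items() if d > cutoff)

# Starting centroids chosen by hand: one small trader, one large trader.
centroids = kmeans(list(FILINGS.values()), [(10, 12), (50, 55)])
flagged = flag_anomalies(FILINGS, centroids)
print(flagged)
```

No labels are used at any point: the filing marked "X" stands out only because it sits far from both groups. Real data would call for more features, feature scaling, and a more careful choice of the number of clusters and the threshold.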

Optical Character Recognition (OCR)
Just as bank ATMs are now capable of scanning checks using elements of AI-based computer vision and pattern recognition, tax authorities’ algorithms can learn to recognize text and numbers on a printed page. Of course, access to a large dataset is critical to training the system successfully. When this technique is used to digitize invoices, handwritten notes, images, contracts, and other paper-based material, it serves as another tool in the effort to fight VAT fraud. In addition, VAT-related data entry tasks that used to take hundreds of man-hours, and were often error-prone, can now be completed automatically and with high levels of accuracy.


Combating the wide-scale prevalence of VAT fraud requires resources and time. While authorities are investigating mega corporations – such as Google, Apple, Ikea, Amazon and Microsoft – in an effort to recover billions of dollars in owed taxes, they are also scrutinizing smaller businesses to recover any losses due to VAT fraud. With electronic data capture becoming more widespread, it is easier than ever for tax authorities to exploit the data afforded by these systems. It is for this reason that today’s companies are placing increased emphasis on being proactive in avoiding VAT fraud. Big Data is already revolutionizing business efficiency across multiple industries; now governments have begun using these technologies to shape the tax function of the future.

To ensure your business is in compliance with complex VAT regulations, contact VATBox for a consultation and demo. Based on its proprietary AI-based platform, VATBox is an automated, enterprise-wide, cloud-based VAT recovery solution that handles reports, data collection, aggregation, qualification, submissions and the complexities that accompany these processes using a fully automated approach. VATBox, a leader in the VAT recovery marketplace, has successfully streamlined the global VAT recovery process, providing businesses with unrivaled visibility, compliance, and data integrity, and ultimately boosting their bottom line.


Copyright © 2021 VATBox Ltd. All rights reserved. 
