AI tool helps journalists find passports hidden in massive offshore data leaks

In the rapidly evolving landscape of investigative journalism, where massive troves of leaked data hold the keys to exposing hidden corruption and financial secrecy, precision and efficiency have become critical. The Global Investigative Journalism Network (GIJN) recently published a revealing article entitled “Passports are the key to uncovering offshore secrets. We use machine learning to find them efficiently.” It highlights one of the most important instruments in such exposés: passports, seemingly banal travel documents that can prove strikingly revealing.

At the center of this investigative revolution is the International Consortium of Investigative Journalists (ICIJ), a global network behind some of the most important journalistic breakthroughs of the 21st century, including the Panama Papers and Pandora Papers investigations. These monumental efforts revealed the complex financial webs woven by elites, politicians and public officials worldwide to hide wealth and avoid scrutiny. What many may not realize is that passports serve as important identifiers in these investigations and help link shell companies and trusts to real people.

Passports offer decisive, hard-to-dispute data points: dates of birth, nationalities and unique passport numbers that enable journalists to cut through layers of offshore anonymity. In jurisdictions where company ownership is hidden behind a veil of shell companies, trusts, nominee directors and opaque structures, a passport scan is often the only way to tie these companies to actual persons.

However, finding a passport scan buried under millions of documents is a daunting challenge. In massive data leaks, information can be spread across millions of files, including PDFs, emails, images and scanned documents. Passports rarely have obvious file names, and optical character recognition (OCR) software struggles with the poor quality of many scans. Journalists previously relied on keyword searches in Datashare, ICIJ's open source search engine, filtering for terms such as “passport” or “visa” and for certain file types. However, this approach produced an overwhelming number of false positives and missed many passport images entirely.

To meet these challenges, ICIJ worked with the AI Journalism Resource Center at Oslo Metropolitan University (OsloMet) and the Norwegian national broadcaster NRK to develop an advanced machine learning (ML) tool for fast, accurate detection of passports within massive document sets.

The solution uses computer vision, a branch of artificial intelligence that allows machines to “see” and interpret visual data. At the core of the project is YOLO (“You Only Look Once”), an open source object detection algorithm originally developed for generic image recognition tasks. The team adapted YOLO to identify passport layouts, training the model on a diverse dataset of annotated passport images collected by ICIJ and its collaborators.

The process begins by converting each document to an image file, which the model then scans for potential passport pages. When the model detects a passport, it extracts information from the machine-readable zone (MRZ), the two lines of coded text at the bottom of the passport photo page. This extraction captures critical fields such as the passport holder's name, date of birth, nationality and passport number.
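To make the MRZ step more concrete, here is a minimal Python sketch of how the two MRZ lines of a passport booklet could be parsed once OCR has produced them as text. It assumes the standard "TD3" layout from ICAO Doc 9303 (two lines of 44 characters, with check digits computed using the repeating weights 7, 3, 1). The article does not describe ICIJ's actual extraction code, so the names `parse_td3`, `check_digit` and `PassportFields` are hypothetical, and the sample lines are the fictional ICAO specimen for "Utopia", not real data.

```python
from dataclasses import dataclass

WEIGHTS = (7, 3, 1)  # repeating check-digit weights from ICAO Doc 9303


def char_value(ch: str) -> int:
    """Numeric value of one MRZ character: digits as-is, A-Z = 10-35, '<' = 0."""
    if "0" <= ch <= "9":
        return int(ch)
    if ch == "<":
        return 0
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum of the field's character values, modulo 10."""
    return sum(char_value(c) * WEIGHTS[i % 3] for i, c in enumerate(field)) % 10


def digit_matches(field: str, printed: str) -> bool:
    """True if the printed check digit agrees with the computed one."""
    return "0" <= printed <= "9" and check_digit(field) == int(printed)


@dataclass
class PassportFields:  # hypothetical container, for illustration only
    surname: str
    given_names: str
    nationality: str
    passport_number: str
    birth_date: str  # YYMMDD, as printed in the MRZ
    number_valid: bool
    birth_date_valid: bool


def parse_td3(line1: str, line2: str) -> PassportFields:
    """Parse the two 44-character MRZ lines of a TD3 (booklet) passport."""
    if len(line1) != 44 or len(line2) != 44:
        raise ValueError("TD3 MRZ lines must be 44 characters each")
    # Line 1: document code (2), issuing state (3), then SURNAME<<GIVEN<NAMES
    surname, _, given = line1[5:].partition("<<")
    number, birth = line2[0:9], line2[13:19]
    return PassportFields(
        surname=surname.replace("<", " ").strip(),
        given_names=given.replace("<", " ").strip(),
        nationality=line2[10:13],
        passport_number=number.rstrip("<"),
        birth_date=birth,
        number_valid=digit_matches(number, line2[9]),
        birth_date_valid=digit_matches(birth, line2[19]),
    )


# Fictional ICAO specimen data; line 1 is padded with '<' to 44 characters.
line1 = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
line2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
fields = parse_td3(line1, line2)
print(fields)
```

The check digits matter in practice because OCR on poor scans often misreads characters; a record whose check digits fail can be routed to a human reviewer rather than trusted as-is.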

The results were excellent. The custom YOLO model reached a precision of 86%, meaning only 14% of the images flagged as passports were false positives, along with nearly perfect recall: it successfully identified almost 100% of the actual passports in the test dataset.
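For readers unfamiliar with these two metrics, the short Python sketch below shows how they are computed. The counts are hypothetical, chosen only so that they reproduce the reported rates; the article does not give the size of the test set.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of flagged images that really are passports."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of real passports that the model managed to flag."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts (assumption): 100 flagged images, 86 of them real
# passports, and one real passport that the model missed.
true_pos, false_pos, false_neg = 86, 14, 1

print(f"precision = {precision(true_pos, false_pos):.2f}")
print(f"recall    = {recall(true_pos, false_neg):.2f}")
```

High recall with moderate precision suits this workflow: a missed passport is lost to reporters, while a false positive only costs a reviewer a few seconds.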

ICIJ integrated the passport detection tool into its existing document processing workflow. The model runs as a service that can scan up to 500 document pages per minute on a machine equipped with a 16 GB GPU. After the model's automated detection, the data team reviews the results with Prophecies, an open source platform for fact-checking and verification.
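As a rough back-of-the-envelope illustration of that throughput, the Python sketch below estimates scan time at the quoted upper bound of 500 pages per minute. The page count is a made-up example, and real throughput would depend on page size and hardware.

```python
PAGES_PER_MINUTE = 500  # upper bound quoted in the article


def scan_minutes(pages: int, pages_per_minute: int = PAGES_PER_MINUTE) -> float:
    """Estimated minutes needed to scan the given number of pages."""
    return pages / pages_per_minute


pages = 100_000  # hypothetical batch size (assumption)
minutes = scan_minutes(pages)
print(f"{pages} pages: about {minutes:.0f} minutes ({minutes / 60:.1f} hours)")
```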

As soon as the passports are validated, they are tagged in Datashare, making them immediately searchable by journalists around the world. This integration significantly accelerates the investigative process and reduces the volume of irrelevant documents that journalists have to examine manually.

A case study from the Pandora Papers investigation shows the tool's impact: the team narrowed an initial pool of over 110,000 documents down to around 75,000 visual files. The machine learning model flagged about 1,000 potential passports. After several rounds of human validation, journalists confirmed around 500 unique passports with accurate country information. This reduced the manual review workload from 110,000 documents to only 3,000, a massive efficiency gain that saves weeks of work.
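The size of that gain can be worked out directly from the figures above. The following Python snippet is only an arithmetic illustration using the article's rounded numbers.

```python
initial_docs = 110_000   # documents in the initial pool (from the article)
reviewed_docs = 3_000    # documents left for manual review (from the article)

reduction = 1 - reviewed_docs / initial_docs   # share of work removed
factor = initial_docs / reviewed_docs          # how many times smaller

print(f"manual workload reduced by {reduction:.1%}")
print(f"roughly {factor:.0f} times fewer documents to review")
```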

While the machine learning tool automates a significant part of the detection process, human involvement remains indispensable. This cooperation between AI experts and investigative journalists illustrates the “human in the loop” AI model, in which machines handle large-scale, repetitive tasks and people provide critical judgment, review and editorial decisions.

Agustin Armendariz, a senior data reporter at ICIJ, explains: “Lists of passport holders and beneficial owners are often the best starting point for reporters who are new to a leak and looking for a story relevant to their audience.” He also emphasizes that the passport detection tool enables investigators to quickly find people of public interest in massive leaks and to concentrate their deeper analysis on the most promising leads.

Handling passport data raises serious ethical and legal concerns. Such personal data must be treated with strict confidentiality and security. The tool and its underlying data never leave ICIJ's secure infrastructure. Participants and staff are all bound by strict confidentiality agreements.

Importantly, ICIJ decided not to publicly release the model's weights in order to prevent potential reverse engineering attacks that could endanger source anonymity and data integrity. Although the model itself remains proprietary, ICIJ intends to share the methodology behind the tool so that other journalistic organizations can build their own detection models with their own data, which promotes a wider ecosystem of machine-assisted investigative journalism.

The development and deployment of this passport detection tool symbolizes a growing intersection of investigative journalism and modern technology. It shows how AI, in particular computer vision and machine learning, can change the way journalists sift through mountains of data, enabling them to spend less time on tedious searching and more time on analysis, storytelling and verification.

As data leaks grow larger and more complex, tools like this are becoming indispensable for newsrooms worldwide. They mark a critical evolution, from manual searches and endless scrolling to sophisticated AI-driven workflows that can uncover connections buried deep in unstructured data.

This technology also underlines the importance of innovation, cooperation and ethical considerations in journalism. The synergy between AI researchers and investigative journalists offers a blueprint for future projects that use artificial intelligence responsibly without compromising accuracy or confidentiality.

While the passport detection model currently focuses on one type of document, the principles behind it can be applied to identify other critical documents in leaks: contracts, emails, financial reports or identification cards.

At a time when transparency is often obscured by sophisticated financial and legal arrangements, machine learning can equip journalists to cut through the opacity. By combining technological skill with journalistic rigor, ICIJ and its partners set a precedent for how investigative journalism can adapt and thrive amid the growing flood of data.

The passport detection tool is more than a technical innovation: it is a powerful instrument in the global fight against corruption and financial secrecy. While the world's elites look for ever more complex ways to hide their wealth and influence, journalists armed with AI and machine learning are ready to uncover the truth.

Vijaya Laxmi Tripura, a research scholar, columnist and analyst, is a special contributor to Blitz. She lives in Cape Town, South Africa.
