Leaked files are creating ‘monumental’ fraud and cyber-attack risks for organisations, their employees and customers, with new analysis finding financial documents present in 93% of all data breaches.
After investigating more than 141 million leaked files from almost 1,300 breached datasets, data intelligence platform Lab 1 discovered that nearly all included financial, HR and customer data across emails, spreadsheets, code files, and unstructured files, like PDFs.
Using AI agents to scrape and analyse every file exposed, Lab 1’s first annual Anatomy of a Breach Report reveals that sensitive financial documents appeared in almost every incident and accounted for 41% of all files.

Bank statements, which could allow fraudsters to commit identity fraud, were present in 49% of incidents, while IBANs, which can be used for mandate scams and payment redirection, were included in 36% of breached datasets.
Equally concerning, customer and corporate personally identifiable information (PII) was also exposed in nearly all breaches examined, with human resources data, often containing employee PII like payroll and resume data, appearing in 82% of breaches, while two-thirds (67%) involved communications and records around customer service interactions.
Emails were the most prevalent type of exposed sensitive information, being leaked in 86% of all data breaches at a rate equivalent to 54 email addresses per incident, creating what Lab 1 described as a ‘significant risk of phishing and impersonation’, with adversaries able to train social engineering AI models or conduct highly targeted campaigns at scale.
Of particular concern to American citizens, US Social Security Numbers, which are commonly exploited in identity theft and benefits fraud and are highly regulated under US law, were identified in half of all incidents (51%).
The analysis also exposed new cyber-attack avenues. More than three-quarters (79%) of breached datasets included system logs, critical for understanding system behaviour, user activity, and environmental configurations, which can be used by attackers to map out systems and detect vulnerable endpoints.
Added to that, cryptographic keys (SSH and RSA keys), which enable attackers to bypass authentication and access secure systems, were present in 18% of all incidents, while cloud and infrastructure indicators, such as AWS S3 paths and virtual hosts, featured in two-fifths of breaches, facilitating data exfiltration or the discovery of unsecured cloud storage.
Code files, which were exposed in 87% of incidents and account for 17% of all exposed files, introduce vulnerabilities to the Software Bill of Materials by undermining the integrity and trustworthiness of the software supply chain.
Lab 1’s content-level analysis shows that the full blast radius of these incidents is expansive: the median number of distinct organisations exposed per breach stands at 482, and many of these firms have nth-party relations to the breached company and are unaware of their potential exposure.
While the blast radius of a breach varies significantly by organisation and sector, the incident with the biggest impact identified by Lab 1 had a blast radius of over 1.73 million affected organisations.
“With cyber-criminals now behaving like data scientists to unearth these valuable insights to fuel cyber-attacks and fraud, unstructured data cannot be ignored,” said Robin Brattel, co-founder and CEO of Lab 1.
“Ultimately, organisations must understand what information has been leaked, how it can be used, and who might be affected. And faster than it can be used against them.”