I often use a search engine to review my clients’ websites and check whether anything is untoward. The other week
I came across a report on one client’s site that was obviously intended
for internal consumption only. I immediately rang my client to warn
them. They explained that the report had been posted to the public
website instead of the internal intranet by mistake and they’d removed
it as soon as the error was discovered. Understandably, they were quite
alarmed that I could still access it more than a week later.
This is a great example of the all-consuming nature of Web searches, Google searches in particular. Google takes a snapshot of each page its
search crawlers examine and caches it as a backup. It’s also the
version used to judge whether a page is a good match for a query. My
client’s report was only on the Web for about three hours and yet a
copy of it ended up stored in Google’s cache and was still available
for anyone to read. Because crawled content can remain publicly
accessible long after the original page is taken down, data
classification and content change control processes are vital to
prevent this type of data leakage.
Unfortunately, private or sensitive business information makes its way onto the public Internet all too often. In this tip, we’ll discuss
the reasons this happens and some strategies to help enterprises keep
private or sensitive data off the Web.
Problems that can cause website information leaks
The incident described above gave me the opportunity to discuss with my client the specific information security problems that led to the
report being posted on the public website. The first problem was that the
organization didn’t properly classify its data and documents.
Implementing a system of data classification and clearly labelling
documents with that classification would make such an incident far less
likely.