Recently, I was asked in court to provide an unbiased opinion on a certain technical aspect of a legal case. The allegation, as far as I was informed, hinged on the evidence that a certain file was created by a certain someone on a certain date and was uploaded from a certain public IP address. I explained that in computer forensics, what can be regarded as iron-clad evidence cannot possibly hinge on something as elusive as that.
In investigating or tracing a security incident, we are sometimes asked to determine where an incident originated, in addition to who its perpetrator was. Usually, we perform IP address tracing to get some sort of evidence. Another possible scenario is when a file is thought to be the evidence and we are required to ascertain that. In this case, merely checking the file's metadata is sometimes considered one of the best ways of determining whether it is valid as a piece of evidence. Unfortunately, that is not the case; not even close. I am not saying that those two pieces of information are useless; however, if we are to engage professionally, the utmost care should be exercised when analyzing them. This short write-up will attempt to cover a little bit about IP addresses and metadata, seen from a digital forensics perspective.
Let's say we are required to trace an email. Sure, we could quickly analyze its header which could look like this:
For the record, the image above was found through a Google search and chosen out of thousands of sample email header images out there. Based on it, we could tell that the email was sent from a computer whose private IP address (its address within the local network) is 192.168.1.13. On the other hand, the mail server, which is visible on the public Internet, has 99.108.173.229 as its IP address. Based on this information, is it permissible for us to conclude which computer actually sent it? If we assume that the owner of malinikishore1@gmail.com used only his private computer when sending that email, it is probable that we could find out exactly which computer is responsible.
Surely it can't be this easy? Yep, not at all. There are a few additional things that we need to clarify if we are trying to find evidence, such as:
It should be obvious now that knowing an IP address can only take us so far.
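To make the header reading above a bit more concrete, here is a minimal Python sketch that pulls the IP addresses out of the `Received` lines and labels each one as private or public. The header text is a made-up reconstruction that only reuses the two IP addresses mentioned above; the host names, message IDs, and dates are placeholders, not the content of the original image.

```python
import ipaddress
import re
from email import message_from_string

# Hypothetical header: only the two IP addresses come from the example above.
SAMPLE_HEADER = """\
Received: from mail.example.org ([99.108.173.229])
        by mx.example.com with ESMTP id abc123;
        Mon, 12 Dec 2016 10:00:02 +0700
Received: from [192.168.1.13] (localhost)
        by mail.example.org with ESMTP;
        Mon, 12 Dec 2016 10:00:00 +0700
From: sender@example.org
To: recipient@example.com
Subject: test

body
"""

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def received_ips(raw_message):
    """Return (ip, scope) pairs found in Received headers, newest hop first."""
    msg = message_from_string(raw_message)
    found = []
    for hop in msg.get_all("Received", []):
        for candidate in IPV4.findall(hop):
            try:
                ip = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # looked like an address but is not a valid one
            scope = "private" if ip.is_private else "public"
            found.append((str(ip), scope))
    return found


if __name__ == "__main__":
    for ip, scope in received_ips(SAMPLE_HEADER):
        print(ip, scope)
```

Keep in mind that a private address such as 192.168.1.13 is reused in countless local networks, and that `Received` lines added before the message reached a server we trust can be written by the sender. The output is a lead to verify, not a conclusion.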
Let's take a brief look at another example: the use of a proxy server. As we all know, this is an old technology. One of its more famous uses is to circumvent network restrictions when connecting to the public Internet from a restricted office network. Network administrators might also use a proxy to regulate connections inside an office network, for example by deciding who can access the Internet and who should be denied.
Let's conduct a simple experiment. Before using a proxy server, my public IP address is shown below (I use http://whatismyipaddress.com/):
That IP address, if not static, can change any day. This is pretty common if we are using portable hotspots.
From the above result, we could tell which Internet service provider I am using. Besides that, my general location (that is, Jakarta) can be clearly seen.
Afterwards, let's say I connect my computer to one of the available free proxy servers in China (a sample list can be found at http://proxylist.hidemyass.com/2). What could happen is as follows:
Obviously I was not located in China. So without performing extra verification and without additional corroborating facts, an IP address alone is meaningless. If we are hosting an application that logs the IP address of every incoming request, and the visitor is behind a proxy server (especially a non-transparent or anonymous one), we should not expect to discern the visitor's actual IP address. Of course, it is a different story when law enforcement officers can use court orders or warrants when investigating and tracing an IP address. However, that kind of authority is not available to everyone in any country, much less across countries.
It is therefore important to notice that in IP address tracing, some basic questions are in order:
Once again, the main point here is that we have to be careful in determining or using an IP address as evidence when reconstructing the facts of a security incident. The IP address certainly still needs to be taken into account, but its context must be absolutely clear.
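As a rough illustration of what an application that logs visitor addresses can and cannot see, here is a small Python sketch. It assumes the application receives the address of the directly connected peer plus the request headers as a plain dictionary; the function name `describe_request` and the dictionary layout are my own invention for this example. `X-Forwarded-For` and `Via` are headers that some proxies add, but an anonymous proxy can omit them and a client can forge them.

```python
import ipaddress


def describe_request(peer_ip, headers):
    """Summarize what a server can observe about where a request came from.

    peer_ip is the address of whoever opened the connection (possibly a proxy).
    headers is assumed to be a plain dict with exact-case header names.
    """
    claimed = []
    raw = headers.get("X-Forwarded-For", "")
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue
        try:
            claimed.append(str(ipaddress.ip_address(part)))
        except ValueError:
            claimed.append("invalid:" + part)
    return {
        "peer_ip": peer_ip,            # observed, but may belong to a proxy
        "claimed_chain": claimed,      # self-reported, therefore unverified
        "proxy_hint": bool(claimed) or "Via" in headers,
    }


if __name__ == "__main__":
    # A transparent proxy that announces the client it forwards for.
    print(describe_request("198.51.100.23", {"X-Forwarded-For": "203.0.113.7"}))
    # An anonymous proxy: nothing in the request reveals that it is one.
    print(describe_request("198.51.100.23", {}))
```

In the second call the server has no indication that a proxy was involved at all, which is exactly the situation described above.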
Some IT people will find metadata a familiar term. Basically, metadata is data about data; that is, information that we could use to find more details about a file (whether a PDF document or a JPEG image). Before we continue, I need to mention that for the purpose of this write-up, and to avoid sounding like a complete pedant, I don't differentiate between data and information.
Okay, back to metadata. Let's take a PDF file as an example. We know that a PDF file has a collection of metadata that could help us to know, for example, when it was created, by whom, and so forth. The thing is, as professionals, we have to be very careful about depending totally on metadata; why that is should become clear in a bit.
More often than not, terms such as "Creation date" or "Created" that are populated with detailed timestamps can be confusing, if not misleading altogether. The reason is that "creation date" can mean different things to a human and to a computer. Most people would assume that "creation date" means the very first time a document was created, whenever that was. That is not necessarily the case with a computer, though: to a computer, "creation date" could be the timestamp when the file first existed in its local storage. So do not be alarmed if someone insists that she created a certain document on, say, the 10th of December 2016 at 17:01, whereas on our personal computer (after we downloaded the file) the creation date shows, say, the 11th of December 2016 at 18:33. This is normal.
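The effect is easy to reproduce. The Python sketch below backdates a small file, then copies it twice: once with `shutil.copy`, which does not carry timestamps over, and once with `shutil.copy2`, which tries to preserve them. I use the modification time as a portable stand-in, because how the creation time is exposed differs between operating systems; the file names are arbitrary.

```python
import os
import shutil
import tempfile
import time


def copy_and_compare(workdir):
    """Create a backdated file, copy it two ways, and return each file's mtime."""
    original = os.path.join(workdir, "original.txt")
    with open(original, "w") as fh:
        fh.write("evidence")

    # Pretend the original was last written one day ago.
    one_day_ago = time.time() - 86400
    os.utime(original, (one_day_ago, one_day_ago))

    plain = os.path.join(workdir, "plain_copy.txt")
    preserved = os.path.join(workdir, "preserved_copy.txt")
    shutil.copy(original, plain)       # content only: timestamps start fresh
    shutil.copy2(original, preserved)  # content plus timestamps where possible

    paths = {"original": original, "plain_copy": plain, "preserved_copy": preserved}
    return {name: os.stat(path).st_mtime for name, path in paths.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        for name, mtime in copy_and_compare(workdir).items():
            print(name, time.ctime(mtime))
```

The same content ends up with two different timestamps depending only on how it was copied, so a timestamp describes what happened to a particular copy on a particular system, not necessarily when the document was authored.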
The accessed date is even more confusing, if not useless, when there is no supporting or corroborating information. Why is that? Let's pretend we have anti-virus software that performs automated scans of all the files stored on our Windows machine. A file's accessed date could simply be the timestamp of when the anti-virus software touched it during a scan. It is no surprise, therefore, that R. Lee (the author of Advanced Digital Forensics and Incident Response and a SANS instructor) said that these kinds of timestamps are not trustworthy. Interesting, isn't it? To know more about this matter, please read this excellent paper from SANS.
The modification timestamp is also suspect. Have you ever opened a Word document (.doc or .docx), exported it to PDF, and then, just as you were about to close the document, been asked by Microsoft Word whether you would like to save due to recent "changes"? Insist that there were no changes all you want; Microsoft Office has its own opinion. This is one of the quirks that could confuse us when analyzing timestamps. Again, be careful.
If the timestamp issue isn't enough to give us a headache, don't worry: metadata reliability is another fun piece to think about. In short, metadata can be modified. Let's take a PDF file as an example. Let's say I have this PDF file below.
From its File Properties, we can tentatively conclude that the PDF file:
Unfortunately, this kind of information is not something that is immutable. Now let's play around with the PDF's metadata.
It's easy to guess that my intention was to modify the information contained within the PDF file. There is no specific reason why I chose those fields in this simple experiment. It must be noted that different tools might behave differently when processing a file's metadata, whether it's a PDF, a JPEG, or any other format. What we need to keep in mind here is that analyzing a file's metadata is not as simple or straightforward as we might think.
After I modified the metadata, if we look at the metadata of that same PDF file, we might see something like this.
Now it appears as if:
Hopefully, it's quite clear now that metadata by itself is never adequate as evidence when analyzing a security incident involving files.
In the matter of metadata analysis, some basic questions must first be answered if we are trying to be as valid and accurate as possible:
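To show how little effort such a modification takes, and one common way to corroborate a file, here is a minimal Python sketch. It does not edit a real PDF: `INFO_FRAGMENT` is a hypothetical byte string that merely resembles a PDF information dictionary, and the names Alice and Mallory are placeholders. The sketch rewrites the author field, restores the file's original filesystem timestamps, and compares SHA-256 digests taken before and after.

```python
import hashlib
import os
import tempfile

# Hypothetical stand-in for embedded metadata; not a complete, valid PDF.
INFO_FRAGMENT = b"<< /Author (Alice) /CreationDate (D:20161210170100) >>\n"


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's content."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def tamper(path):
    """Rewrite the author field, then put the old timestamps back."""
    before = os.stat(path)
    with open(path, "rb") as fh:
        data = fh.read()
    with open(path, "wb") as fh:
        fh.write(data.replace(b"(Alice)", b"(Mallory)"))
    # Restore access and modification times so the file looks untouched.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "sample.bin")
        with open(path, "wb") as fh:
            fh.write(INFO_FRAGMENT)
        digest_before = sha256_of(path)
        tamper(path)
        print("digest changed:", sha256_of(path) != digest_before)
```

The modification time gives no hint of the edit, while the digest does, but only if a digest was recorded before the file could have been altered. A hash corroborates integrity from that point onward; it says nothing about who wrote the file or when.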
In this write-up, I wanted to demonstrate that the analysis of a security incident from a digital forensics perspective must be performed with the utmost care, especially when drawing conclusions. The problem is exacerbated further if we only have limited information such as an IP address or metadata. Digital forensics analysis must be a combination and a series of activities to obtain a collection of relevant and chronological information that could aid us in making valid conclusions. In certain cases, where complete information is impossible to procure, we have the professional responsibility to admit and state that a definite conclusion cannot be given, and to offer instead several assumptions or possible scenarios of the security incident.