Don't Count on IP Address and Metadata Too Much





Introduction

Recently, I was asked in court to provide an unbiased opinion on a certain technical aspect of a legal case. The allegation, as far as I was informed, hinged on the evidence that a certain file was created by a certain someone on a certain date and was uploaded from a certain public IP address. I explained that in computer forensics, what can be regarded as iron-clad evidence cannot possibly hinge on something as elusive as that.

In investigating or tracing a security incident, we are sometimes asked to determine where an incident originated, in addition to who its perpetrator was. Usually, we perform IP address tracing to get some sort of evidence. Another possible scenario is when a file is thought to be evidence and we are required to verify that. In this case, merely checking its metadata is sometimes considered one of the best ways of determining whether it is valid as a piece of evidence. Unfortunately, that is not the case; not even close. I am not saying that those two pieces of information are useless; however, if we are to engage professionally, the utmost care should be exercised when analyzing them. This short write-up will attempt to cover a little bit about IP addresses and metadata, seen from a digital forensics perspective.

 

IP Address Tracing

Private vs Public

Let's say we are required to trace an email. Sure, we could quickly analyze its header, which could look like this:

IP Addresses in an Email Header 

To note, the above image was found through a Google search and chosen out of the thousands of sample email header images out there. Based on that image, we could tell that the email was sent from a computer whose private IP address (its address in the local network) is 192.168.1.13. On the other hand, the mail server, which is what the public Internet recognizes, has 99.108.173.229 as its IP address. Based on this information, can we conclude which computer actually sent it? If we assume that the owner of malinikishore1@gmail.com used only his personal computer when sending that email, it is probable that we could find out exactly which computer is responsible.

Surely it can't be this easy? Yep, not at all. There are a few additional things that we need to clarify if we are trying to find evidence, such as:

  1. Did the owner of that email address actually use his personal computer when sending it out?
  2. What additional information can be used to verify that no computer belonging to someone else was used to send that email?
  3. If that particular email was sent from a work or office environment, can we be sure that DHCP isn't used?
  4. As far as the human factor goes, what about policies relating to human behavior in that office? Is there a policy saying that all employees must plug in at the same network port each and every time they connect to the office network (which for some reason does not provide Wi-Fi)?
  5. And if Wi-Fi is actually provided in that office, how sure are we that the Wi-Fi router's configuration supports our initial conclusion?

It should be obvious now that knowing an IP address can only take us so far.
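To make the private-versus-public distinction concrete, here is a minimal Python sketch that pulls the bracketed IPv4 addresses out of an email's Received headers and labels each one. The header text is a hypothetical sample built around the two addresses discussed above (the host names and dates are made up), and the function name received_ips is my own. It only reports what the headers claim; it cannot answer any of the questions in the list above.

```python
import ipaddress
import re
from email import message_from_string

# Hypothetical header, modelled on the two addresses discussed above.
RAW_EMAIL = """\
Received: from mail.example.net (mail.example.net [99.108.173.229])
\tby mx.example.org with ESMTP; Mon, 12 Dec 2016 10:00:02 +0700
Received: from [192.168.1.13] (unknown [192.168.1.13])
\tby mail.example.net with ESMTP; Mon, 12 Dec 2016 10:00:00 +0700
From: sender@example.com
Subject: test

body
"""

# Matches dotted-quad text inside square brackets, e.g. [192.168.1.13].
IPV4_PATTERN = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_email: str) -> list[tuple[str, str]]:
    """Return (ip, scope) pairs found in Received headers, in header order."""
    message = message_from_string(raw_email)
    results = []
    for header in message.get_all("Received", []):
        for candidate in IPV4_PATTERN.findall(header):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # Looked like an address but is not a valid one.
            scope = "private" if address.is_private else "public"
            if (candidate, scope) not in results:
                results.append((candidate, scope))
    return results


if __name__ == "__main__":
    for ip, scope in received_ips(RAW_EMAIL):
        print(f"{ip}: {scope}")
```

A private address such as 192.168.1.13 is reused in countless home and office networks, so the label "private" is a reminder that the address identifies nothing outside the network it came from.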

Proxy Server

Let's take a brief look at another example: the use of a proxy server. As we all know, this is an old technology. One of its more famous uses is to circumvent network restrictions when connecting to the public Internet from within a restricted office network. Network administrators might also use it to regulate connections inside an office network, for example by deciding who can access the Internet and who should be denied.

Let's conduct a simple experiment. Before using a proxy server, my public IP address is shown below (I use http://whatismyipaddress.com/):

Non-Proxied IP Address 

That IP address, if not static, can change any day. This is pretty common if we are using portable hotspots.

From the above result, we can tell which Internet service provider I am using. Besides that, my general location (Jakarta) can be clearly seen.

Afterwards, let's say I connect my computer to one of the available free proxy servers in China (a sample list can be found at http://proxylist.hidemyass.com/2). What could happen is as follows:

 Proxied IP Address

 

Obviously I was not located in China. So without extra verification and additional corroborating facts, an IP address alone is meaningless. If we host an application that logs the IP address of every incoming request and a visitor is behind a proxy server (especially a non-transparent or anonymous one), we should not expect to discern the visitor's actual IP address. Of course, it's a different story when law enforcement officers can use court orders or warrants when investigating and tracing an IP address. However, not everyone has that kind of authority within a country, much less across countries.
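As a rough illustration of what a server-side application can and cannot know, the Python sketch below collects every address a request presents. The function name candidate_client_ips and the dictionary layout are assumptions of mine, not part of any framework. It relies on two real HTTP headers, X-Forwarded-For and Via, which a transparent proxy may add but which an anonymous proxy simply omits and which any client can forge.

```python
import ipaddress


def candidate_client_ips(remote_addr: str, headers: dict[str, str]) -> dict:
    """Collect every address a request claims, without trusting any of them.

    remote_addr is the peer address of the TCP connection; behind a proxy
    this is the proxy's address, not the visitor's. Header names are looked
    up with exact case here, which is a simplification.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    claimed = []
    for part in forwarded.split(","):
        part = part.strip()
        if not part:
            continue
        try:
            claimed.append(str(ipaddress.ip_address(part)))
        except ValueError:
            # Keep malformed values visible; they may indicate tampering.
            claimed.append(f"invalid:{part}")
    return {
        "socket_peer": remote_addr,
        "claimed_chain": claimed,  # client-supplied, therefore forgeable
        "proxy_indicated": bool(claimed) or "Via" in headers,
    }


if __name__ == "__main__":
    print(candidate_client_ips("198.51.100.20", {"X-Forwarded-For": "203.0.113.7"}))
    print(candidate_client_ips("198.51.100.20", {}))
```

In the second call nothing indicates a proxy, yet the peer address could still belong to an anonymous proxy. The absence of these headers proves nothing, which is exactly why the logged address needs corroboration.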

Additional Notes on IP Address

It is therefore important to note that in IP address tracing, some basic questions are in order:

  1. Can we ascertain whether the recorded IP address is a static or private one?
  2. If that IP address is private and static, can we be sure that only a single person (or a single computer) uses it?
  3. If tracing an IP address only leads to a general location such as the name of a city (whether Jakarta or Ürümqi), is there any other corroborating evidence that can ascertain or pinpoint that the IP address belongs to a specific entity or computer?
  4. Can we be sure that the IP address did not pass through a proxy server? If it did pass through a proxy server, what have we done to uncover the actual originating IP address (if it is even possible)?
  5. Have we considered the case of proxy chaining when determining the validity of any recorded IP address?

Once again, the main point here is that we have to be careful in using an IP address as evidence when reconstructing the facts of a security incident. An IP address certainly still needs to be taken into account, but its context must be absolutely clear.

 

How Much Can We Trust Metadata?

Some IT people will find metadata a familiar term. Basically, metadata is data about data; that is, information that we could use to find more details about a file (whether a PDF document or a JPEG image). Before we continue, I need to mention that for the purpose of this write-up, and to avoid sounding like a complete pedant, I don't differentiate between data and information.

Okay, back to metadata. I am taking a PDF file as an example. We know that a PDF file has a collection of metadata that could help us to know, for example, when it was created, by whom, and so forth. The thing is, as professionals, we have to be very careful about depending totally on metadata; why that is should become clear in a bit.

Document Creation Date

More often than not, terms such as “Creation date” or “Created” that are populated with detailed timestamps can be confusing, if not misleading altogether. The reason is that "creation date" can mean different things to a human and to a computer. Most people would assume that "creation date" means the very first time a document was created, whenever that was. That is not necessarily the case for a computer, though: to a computer, "creation date" could well be the timestamp of when the file first existed in its local storage. So do not be alarmed if someone insists that she created a certain document on, say, the 10th of December 2016 at 17:01, whereas on our personal computer (after we downloaded the file) the creation date shows, say, the 11th of December 2016 at 18:33. This is normal.
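The effect is easy to reproduce locally. Python does not expose a file's creation time in a portable way, so the sketch below uses the modification time as a stand-in to make the same point: a timestamp describes the local copy and depends on how that copy was made. The file names and the function name copy_and_compare are hypothetical, and the backdated time mirrors the example date above.

```python
import os
import shutil
import tempfile
from datetime import datetime


def copy_and_compare() -> dict[str, float]:
    """Copy one file two ways and report the modification time of each copy."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "original.txt")
        plain_copy = os.path.join(workdir, "plain_copy.txt")
        preserved_copy = os.path.join(workdir, "preserved_copy.txt")

        with open(original, "w", encoding="utf-8") as handle:
            handle.write("draft")

        # Backdate the original to 10 December 2016, 17:01 (local time).
        authored = datetime(2016, 12, 10, 17, 1).timestamp()
        os.utime(original, (authored, authored))

        shutil.copy(original, plain_copy)       # content and permissions only
        shutil.copy2(original, preserved_copy)  # also tries to keep timestamps

        return {
            "original": os.stat(original).st_mtime,
            "plain_copy": os.stat(plain_copy).st_mtime,
            "preserved_copy": os.stat(preserved_copy).st_mtime,
        }


if __name__ == "__main__":
    for name, stamp in copy_and_compare().items():
        print(f"{name:15} {datetime.fromtimestamp(stamp):%Y-%m-%d %H:%M}")
```

The plain copy carries the time of copying, while the second copy carries the time inherited from the original. Neither value says anything about when the author actually wrote the content, and the os.utime call shows that the "original" time itself was set by hand.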

Document Accessed Date

This one is even more confusing, if not useless, when there is no supporting or corroborating information. Why is that? Let's pretend we have anti-virus software that performs automated scans of all the files stored on our Windows machine. A file's accessed date could be the timestamp of when that anti-virus software accessed it during a scan. It is no surprise, therefore, that R. Lee (the author of Advanced Digital Forensics and Incident Response and a SANS instructor) said that these kinds of timestamps are not trustworthy. Interesting, isn't it? To know more about this matter, please read this excellent paper from SANS.

Document Modified Date

This aspect of metadata is also suspect. Have you ever opened a Word document (.doc or .docx), exported it to PDF, and then, just as you were about to close the Word document, had Microsoft Word ask whether you would like to save due to recent "changes"? Insist that there were no changes all you want; Microsoft Office has its own opinion. This is one of the quirks that could confuse us in analyzing timestamps. Again, be careful.

Modifying Metadata

If the timestamp issue isn't enough to give us a headache, don't worry: metadata reliability is another fun piece to think about. In short, metadata can be modified. Let's take a PDF file as an example. Let's say I have this PDF file below.

Initial Metadata

Initial Metadata on Author and Application Creator

From its File Properties, we can tentatively conclude that the PDF file:

  1. Came from a Word document originally named "Test-Document.docx".
  2. Was created "Today, 18:17" (it does not matter when "Today" is for this example).
  3. Was created by the "Word" application, which was also its "Content Creator" in this regard.

Unfortunately, this kind of information is not something that is immutable. Now let's play around with the PDF's metadata.

Modified Author and Application Creator

It's easy to guess that my intention was to modify the information contained within the PDF file. There is no specific reason why I chose those fields in this simple experiment. It must be noted that tools may behave differently when processing a file's metadata, whether it's a PDF, a JPEG, or any other format. What we need to keep in mind here is that analyzing a file's metadata is not as simple or straightforward as we might think.

After I modified the metadata, if we look at the metadata of that same PDF file we might see something like this.

Modified Metadata

Now it appears as if:

  1. The PDF file was created on January 1st, 2016 at 18:17:33 precisely.
  2. The application used to create the PDF file was “A Brand New Creator” and no longer Microsoft Word.
  3. The author of that PDF file is “A Brand New Author” now.

Hopefully, it's quite clear now that metadata by itself is never adequate as evidence when analyzing a security incident involving files.
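To show how little stands between a file and a different "author", here is a toy Python sketch. It works on a small, uncompressed fragment shaped like a PDF document information dictionary, not on a complete PDF file. The starting values (including "Original Author", the Title field, and the date) are hypothetical stand-ins for the properties shown above, and rewrite_info_field is a name of my own. Real files may compress or encrypt these objects and may carry a second copy of the metadata elsewhere, so actual editing tools do more than this.

```python
import re

# A toy, uncompressed Info dictionary; not a complete PDF file.
# All starting values are hypothetical.
INFO_FRAGMENT = (
    b"<< /Title (Test-Document.docx) /Author (Original Author) "
    b"/Creator (Word) /CreationDate (D:20161210181700+07'00') >>"
)


def rewrite_info_field(fragment: bytes, key: bytes, new_value: bytes) -> bytes:
    """Replace the literal-string value of one key in an Info dictionary.

    Only handles simple values without nested or escaped parentheses.
    """
    pattern = re.compile(rb"(/" + re.escape(key) + rb"\s*\()[^)]*(\))")
    return pattern.sub(
        lambda match: match.group(1) + new_value + match.group(2),
        fragment,
        count=1,
    )


def forge_example() -> bytes:
    """Apply the three changes described in the list above."""
    forged = rewrite_info_field(INFO_FRAGMENT, b"Author", b"A Brand New Author")
    forged = rewrite_info_field(forged, b"Creator", b"A Brand New Creator")
    forged = rewrite_info_field(forged, b"CreationDate", b"D:20160101181733+07'00'")
    return forged


if __name__ == "__main__":
    print(forge_example().decode("ascii"))
```

Nothing in the rewritten fragment records that it was ever different. The fields are ordinary text that the file's last editor chose to leave there.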

Additional Things on Metadata

In the matter of metadata analysis, some basic questions must first be answered if we are trying to be as valid and accurate as possible:

  1. Are there guarantees that the metadata we are seeing is not forged in any way?
  2. Knowing full well that timestamps are problematic, how sure are we that the timestamps in the metadata are actually trustworthy?
  3. Is there any supporting or corroborating information that could aid us in arriving at an unbiased conclusion?

 

End Notes

In this write-up, I wanted to demonstrate that an analysis of a security incident from a digital forensics perspective must be performed with the utmost care, especially when drawing conclusions. The problem is exacerbated if we only have limited information such as an IP address or metadata. Digital forensics analysis must be a combination and a series of activities to obtain a collection of relevant and chronological information that could aid us in making valid conclusions. In certain cases, where complete information is impossible to procure, we have a professional responsibility to admit and state that a definite conclusion cannot be given, and to offer only assumptions or possible scenarios for the security incident.