June is Safari month here in the Sausage Factory, and this post is the third in the series. Just imagine having an observation point in the house across the road from your suspect. When the suspect surfs the internet, the man in the OP (with the help of a good pair of binoculars) makes notes of what he reads on screen (OK, he may use a long lens instead of binoculars and take photos, but bear with me). This is essentially what Spotlight does when a user views web pages with the Safari web browser (versions 3, 4 and 5): it writes the URL, the web page title and all the text content of the page into a file.
- The filenames of these files are in the format URL.webhistory
- Their internal structure is that of a binary plist containing three strings for each record: Full Page Text, Name and URL
- They are stored at the path ~/Library/Caches/Metadata/Safari/History
- The file created date of these files represents the time that the URL was first visited (since History was last cleared)
- The file modified date represents the time that the URL was last visited
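If you want to triage a live system outside Encase, the points above can be sketched in Python. This is a minimal sketch, assuming each file is a binary plist whose top-level object is a dictionary keyed `URL`, `Name` and `Full Page Text` (the three strings described above); `summarise_webhistory` and `list_webhistory` are hypothetical helper names. The file created date is read from `st_birthtime`, which Python only exposes on some platforms (for example macOS), so the sketch reports `None` where it is unavailable.

```python
import plistlib
from datetime import datetime, timezone
from pathlib import Path

# Spotlight location described above (assumed default; point it at a mounted image instead).
HISTORY_DIR = Path("~/Library/Caches/Metadata/Safari/History").expanduser()


def _utc(ts):
    """Convert a POSIX timestamp to an aware UTC datetime (None passes through)."""
    return None if ts is None else datetime.fromtimestamp(ts, tz=timezone.utc)


def summarise_webhistory(path):
    """Return URL, title, text length and first/last visit times for one file."""
    path = Path(path)
    with path.open("rb") as fh:
        record = plistlib.load(fh, fmt=plistlib.FMT_BINARY)  # assumed: top level is a dict
    st = path.stat()
    return {
        "file": path.name,
        "url": record.get("URL"),
        "title": record.get("Name"),
        "text_chars": len(record.get("Full Page Text", "")),
        # Created date ~ first visit since History was last cleared (not on every OS).
        "first_visit": _utc(getattr(st, "st_birthtime", None)),
        # Modified date ~ last visit.
        "last_visit": _utc(st.st_mtime),
    }


def list_webhistory(folder=HISTORY_DIR):
    """Summarise every *.webhistory file in folder, ordered by last visit."""
    rows = [summarise_webhistory(p) for p in Path(folder).glob("*.webhistory")]
    return sorted(rows, key=lambda r: r["last_visit"])
```

Remember that the timestamps are only meaningful on the original volume; copying the files elsewhere will usually reset them.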
It can be seen that these files yield information that amounts to internet history, and therefore it may be appropriate to consider this data alongside records extracted from history.plist and cache.db files.
Recovery from Unallocated
These files are deleted when a user clears Safari history. However, it is possible to recover them from unallocated space. Using my file carver of choice, Digital Detective's Blade, I wrote an appropriate Data Recovery Profile (which I will happily share with you upon request).
Running this profile recovered over ten thousand files. I then added the recovered files into Encase as single files. I noticed that a small percentage of these files had the text content stored as ASCII rather than unicode text. I am not yet sure why. One possible explanation is that the binary plist format has separate ASCII and unicode string types, and a writer may choose the ASCII type when the page text contains no non-ASCII characters.
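The Blade profile itself is not reproduced here. As a rough illustration of the same idea, a binary plist can be carved from a raw buffer by finding the `bplist00` header and then looking for a 32-byte trailer consistent with it. In the binary plist format the file ends with that trailer, whose last 24 bytes are the object count, the top object index and the offset table offset as 64-bit big-endian integers, and the offset table sits immediately before it. This is a minimal and deliberately slow sketch (it tries every candidate end position up to a hypothetical `max_len`), not the author's profile; it uses Python's `plistlib` to confirm each candidate.

```python
import plistlib
import struct

HEADER = b"bplist00"
TRAILER_LEN = 32


def _try_end(buf, start, end):
    """Return the parsed plist if buf[start:end] ends in a consistent trailer, else None."""
    trailer = buf[end - TRAILER_LEN:end]
    # 6 unused/sort-version bytes, offset int size, object ref size, then three uint64s.
    offset_size, ref_size, num_objects, top_object, table_offset = struct.unpack(
        ">6xBBQQQ", trailer)
    if offset_size not in (1, 2, 4, 8) or ref_size not in (1, 2, 4, 8):
        return None
    # The offset table is expected to end exactly where the trailer begins.
    if table_offset + num_objects * offset_size != end - TRAILER_LEN - start:
        return None
    if top_object >= num_objects:
        return None
    try:
        return plistlib.loads(buf[start:end], fmt=plistlib.FMT_BINARY)
    except Exception:  # any parse failure just means "not a valid end here"
        return None


def carve_bplists(buf, max_len=4 * 1024 * 1024):
    """Yield (offset, parsed_object) for each binary plist found in buf."""
    pos = buf.find(HEADER)
    while pos != -1:
        found = None
        limit = min(len(buf), pos + max_len)
        for end in range(pos + len(HEADER) + TRAILER_LEN, limit + 1):
            obj = _try_end(buf, pos, end)
            if obj is not None:
                found = (end, obj)
                break
        if found:
            yield pos, found[1]
            pos = buf.find(HEADER, found[0])  # resume after the carved file
        else:
            pos = buf.find(HEADER, pos + 1)   # header with no valid trailer: skip it
```

Fragments whose trailer has been overwritten will not be recovered by this approach; a header-plus-fixed-length carve would catch more but needs manual validation.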
Investigation of Live and Recovered Spotlight Webhistory Files using Encase
If you review these files using Encase you will see the relevant data in the View (bottom) pane: the URL is at the start of the file, followed by the text in unicode and then the web page title near the end of the file. If the content is relevant, reporting on it is a pain: potentially three sweeping bookmarks are required, using two different text styles. The unicode text sweeping bookmark is also likely to be truncated due to its length. Reviewing any number of these files this way is therefore not a good plan.
The eagle-eyed amongst you will have observed that in my Blade Data Recovery Profile I gave the recovered files a plist file extension (as opposed to a webhistory file extension). This is because these files have a binary plist structure, and I use Simon Key's binary Plist Parser v3.5 enscript to parse them. This excellent enscript offers the option to create a logical evidence file containing a file for each plist name/value pair. I run the enscript with this option, add the logical evidence file back into my case, then review the contents with just a unicode text style selected and bookmark as appropriate. This method is much quicker and removes the need to mess about with unicode formatting. It also makes keyword searching easier. For example, to view all URLs, green plate (set include) your logical evidence file, apply a sort to the name column in the table pane, then scroll down so that each URL appears in turn in the view pane. Use a similar method for the Full Page Text and Name items.
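Outside Encase, a comparable one-row-per-record view can be sketched in Python: each recovered plist is flattened into URL, Name and Full Page Text columns so that the output can be sorted and keyword searched in any tool. This is not Simon Key's enscript. `load_records`, `records_to_csv` and `keyword_hits` are hypothetical helpers, and the sketch assumes each recovered file parses to a dictionary with those three keys.

```python
import csv
import plistlib
from pathlib import Path

FIELDS = ("URL", "Name", "Full Page Text")


def load_records(folder):
    """Parse every recovered *.plist in folder; skip files that fail to parse."""
    records = []
    for path in sorted(Path(folder).glob("*.plist")):
        try:
            with path.open("rb") as fh:
                obj = plistlib.load(fh)  # format (binary/XML) is auto-detected
        except Exception:
            continue  # e.g. a partially overwritten carve
        if isinstance(obj, dict):
            records.append({"file": path.name, **{f: obj.get(f, "") for f in FIELDS}})
    return records


def records_to_csv(records, out):
    """Write one row per record to a text stream, sorted by URL."""
    writer = csv.DictWriter(out, fieldnames=("file",) + FIELDS)
    writer.writeheader()
    for rec in sorted(records, key=lambda r: r["URL"]):
        writer.writerow(rec)


def keyword_hits(records, keyword):
    """Return (file, field) pairs whose text contains keyword, case-insensitively."""
    kw = keyword.casefold()
    return [(r["file"], f) for r in records for f in FIELDS
            if kw in str(r[f]).casefold()]
```

Because each field sits in its own column, the unicode/ASCII distinction disappears once the strings are decoded, which is the same benefit the logical evidence file gives inside Encase.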
Miscellaneous Information in relation to the webhistory file format
Before turning to the Plist Parser enscript to parse these files, I briefly looked at their format with a view to tempting some programming friends to write me a parser. I established that:
- The file is a binary plist. I do not want to go too far into the intricacies of how these plists are assembled; we are interested in the objects within the object table. Binary plists use marker bytes to indicate object type and size. The objects we are interested in are strings, either ASCII or unicode. Looking at Apple's release of the binary plist format (scroll about a fifth of the way down the page), it can be seen that the Object Format Marker byte for the ASCII strings found in this file is 01011111 in binary, followed by an integer marker byte that introduces the length count. In hex these marker bytes as seen in this file are 5Fh 10h. The Object Format Marker byte for the unicode strings found in this file is 01101111 in binary, again followed by an integer marker byte. In hex these marker bytes as seen in this file are 6Fh 11h.
- The byte immediately prior to the URL (which generally starts with http), after the marker 5Fh 10h, decoded as an 8-bit integer denotes the length of the URL. However, if the URL is longer than 255 bytes the marker will be 5Fh 11h, indicating that the following two bytes store the length, decoded as a 16-bit big-endian integer.
- Following the URL there is a marker 6Fh 11h. The next two bytes, decoded as a 16-bit big-endian integer, give the number of characters of text extracted from the web page; multiply by 2 to calculate the length in bytes of the unicode text element of the record.
- Following the unicode text element is a marker 5Fh 10h. The next byte, immediately prior to the web page title, decoded as an 8-bit integer denotes the length of the web page title.
- The last four bytes of the file, decoded as a 32-bit big-endian integer, give the record size (the number of bytes from the start of the URL to the end of the fifth byte from the end of the file).
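The layout in the list above can be turned into a small walker. This is a minimal sketch, not a full binary plist parser: it assumes, as observed above, that the URL, page text and title string objects sit back to back, and it locates the URL by scanning for a string object whose value starts with `http`. Marker bytes are decoded as in Apple's format notes: the high nibble gives the type (5 for ASCII, 6 for UTF-16 big-endian), a low nibble below Fh is the length itself, and Fh means the length follows as an integer object whose marker 1n gives a width of 2^n bytes (so 10h is one byte and 11h is two). Unicode lengths count characters, hence the doubling. `read_string` and `walk_record` are hypothetical names.

```python
import struct

INT_WIDTH = {0x10: 1, 0x11: 2, 0x12: 4, 0x13: 8}  # integer marker -> byte width
UINT_FMT = {1: ">B", 2: ">H", 4: ">I", 8: ">Q"}   # big-endian unsigned formats


def read_string(buf, off):
    """Decode an ASCII (5x) or UTF-16BE (6x) string object at off.

    Returns (text, offset_after_object); raises ValueError if it is not one.
    """
    marker = buf[off]
    kind, nibble = marker >> 4, marker & 0x0F
    if kind not in (0x5, 0x6):
        raise ValueError(f"not a string marker: {marker:#04x}")
    pos = off + 1
    if nibble == 0x0F:
        # Length held in a following integer object, e.g. 10h (1 byte) or 11h (2 bytes).
        width = INT_WIDTH.get(buf[pos])
        if width is None:
            raise ValueError(f"unexpected length marker: {buf[pos]:#04x}")
        (count,) = struct.unpack(UINT_FMT[width], buf[pos + 1:pos + 1 + width])
        pos += 1 + width
    else:
        count = nibble  # short strings keep their length in the low nibble
    nbytes = count * 2 if kind == 0x6 else count  # unicode count is in characters
    raw = buf[pos:pos + nbytes]
    if len(raw) != nbytes:
        raise ValueError("string runs past end of buffer")
    text = raw.decode("utf-16-be") if kind == 0x6 else raw.decode("ascii")
    return text, pos + nbytes


def walk_record(buf):
    """Return (url, page_text, title) following the layout described above."""
    for off in range(len(buf)):
        if buf[off] >> 4 != 0x5:
            continue
        try:
            url, after_url = read_string(buf, off)
        except (ValueError, IndexError, struct.error):
            continue
        if not url.startswith("http"):
            continue
        text, after_text = read_string(buf, after_url)  # expected 6Fh 11h ...
        title, _ = read_string(buf, after_text)         # expected 5Fh 10h ...
        return url, text, title
    raise ValueError("no URL string object found")
```

Because `read_string` dispatches on the marker rather than assuming 6Fh, the same walker also copes with the records whose page text was stored as ASCII.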