Finding Geo

October 4, 2015, 6:20 am

≫ Next: Windows Phone 8.10 MMS (for Lumia 530) ...

≪ Previous: "Awesome" Windows Phone 8 Stuff

Monkey, just keep swimming through the WinPhone data ... ya clown!

UPDATE 6OCT2015: Edited FindMyPhone and Multimedia sections + added suspected main Location setting Registry location.

A couple of recent cases had this monkey investigating how Windows Phone 8.10 stores geolocation data on a Nokia Lumia 530.
There does not appear much forensic documentation regarding this, so this post is going to be a pretty voluminous / potentially narcoleptic episode of squirrel chasing (without any neat scripts to run at the end either). Despite the length of this post, monkey gets the sneaking suspicion that there is more to be discovered. I guess we have to start somewhere ...

Carrier-locked versions of the 530 can be picked up for as little as $50 in Monkeytown so they could be more popular than you'd initially suspect. They also come bundled with Windows Phone 8.10 by default.
Downloading of the 4 GB capacity devices was done via eMMC read using the Z3X-Pro flasher box and took approximately 90 minutes per device. Note: The soldering points for these are not for the banana fisted. The points are so tiny that this monkey needed his big boy soldering pants AND special adult assistance (Thanks Boss Rob!).

For the 530, there are 3 potential storage areas for geolocation data – the Partition 26 (P26) "MainOS" NTFS partition, the Partition 27 (P27) "Data" NTFS partition and the removable SD Card.
The test data came from 2 well used 530 devices (Devices A and B), a factory fresh 530 (Device C) and a simulated test device (Device D). While all four were 8.10 (MajorVersion.MinorVersion) devices, their SYSTEM hive's \Versions\BuildNumber values were not all the same (probably due to being configured for different service providers).
An X-Ways Forensics (XWF) "simultaneous search" for likely geolocation keywords was initially performed (eg "latitude", "lat", "GPS", "GNSS", "degree"). Subsequently, a regex search was also performed using some likely latitude regular expressions (eg1 "-1[2-5]." for "-12." to "-15.". eg2 "-1[2-5]°" for "-12°" to "-15°").
Tip: To type the degree symbol, you hold down the Alt key as you type 248 on the numeric keypad (with numlock ON).
For accuracy purposes, it might also help to know that 1 nautical mile (approximately 1852 m) is 1 minute of a degree (there are 60 minutes to a degree).
Thanks to a Brian Moran (@brianjmoran) tip, we found some information on commonly used latitude/longitude formats here.
To simplify the number of searches, it was also assumed that any textual latitudes will have the corresponding longitude close by. Ideally, there will also be an associated timestamp so we can say that at a certain time, the device was at location X. The XWF regex search technique should find any plain text (UTF8, UTF16-LE) strings which contain relevant latitude information. However, it will not find any latitudes stored as binary data (ie floats/doubles).

The main Location setting (for this phone model) is suspected to be at:
P26:\Windows\system32\config\SOFTWARE\OEM\Nokia\GPS\LocationService which was set to 1 when the main Location setting was enabled and set to 0 when disabled. We're not sure if this is the direct cause or a secondary result of the Location setting. Presumably, this location will vary with non-Nokia devices.

The proliferation of web based location/map services and the availability/storage capacity of P27:/pagefile.sys (which contains the swapped out contents of RAM) has resulted in a large number of readable lat/long pairs. Connecting a lat/long pair to specific phone functionality and/or proving a user’s direct knowledge was/remains a challenge.

To assist with this we have organized the lat/long information in this post into the following categories:
- ObservationLogWP8 (crowd sourced location logs)
- FindMyPhone (location tracking of device)
- Default Internet Explorer Browser
- Cortana (personal assistant)
- Multimedia metadata (from device pictures/video)
- WP8 Application data
- Registry

ObservationLogWP8 (crowd sourced location logs)

This appears to be related to Microsoft’s crowd-sourcing efforts to survey/report WiFi and cell tower information for device location (see here and here). Various UTF8 and UTF16-LE encoded XML fields belonging to a parent "ObservationLogWP8" XML element were found in Device A’s P27:/pagefile.sys. These fields included a timestamp, location, WiFi and Cell Tower signal information. Not all instances of these had latitude/longitude information, some only had timestamped WiFi and cell information (they may have been related to requests for location).
The string "ObservationLogWP8" was also found in a LocationCrowdsource.dll. The LocationCrowdsource.dll also existed in a Nokia 520 running Windows Phone 8.0 suggesting this functionality is not new to Windows Phone 8.10.
From the dates found in test data, it is suspected the initial phone setup is one event that triggers this data being recorded. According to the FAQ mentioned previously, enabled apps can also request the device location which could result in ObservationLogWP8 data being generated.

One of the more complete "ObservationLogWP8" XML elements (from P27:\pagefile.sys) looked like:

<Env Version="1.0"><Body Type="ObservationLogWP8"><LocationData><RequestHeader><Timestamp>2015-12-31T01:23:45.678+12:34</Timestamp><Authorization /><TrackingId>378130cb-e97f-4558-a3ac-123456789ABC </TrackingId><ApplicationId>d002970e-345b-409f-9e22-b360eb83f641</ApplicationId><DeviceProfile ExtendedDeviceInfo="NOKIA/Lumia 530" OSVersion="8.10.14234.WPB_CXE_R1(wpbldlab).20150123-1722" LFVersion="2.0" Platform="" ClientGuid="00000000-0000-0000-0000-000000000000" DeviceType="WP8" DeviceId="d002970e-345b-409f-9e22-123456789ABC" /></RequestHeader><LocationStamps><LocationStamp ts="2015-12-31T01:23:45.678+12:34"><Loc la="-XX.12345" lo="YYY.12345" al="12.00000" spd="5.25000" hed="180.50000" hacc="3" hdop="0.80000" vdop="0.80000" herralong="2" haxis="2" /><CellTowers ts="2015-12-31T01:23:45.678+12:34"><Umts7 mcc="505" mnc="1" lac="12345" ucid="123456789" uarfcn="1234" psc="12" rscp="-100" ecno="-11" /></CellTowers><WifiPoints ts="2015-12-31T01:23:45.678+12:34"><Wifi7 bssid="AA:BB:CC:11:22:33" rssi="-95" /><Wifi7 bssid="AA:BB:CC:11:22:33" rssi="-86" /><Wifi7 bssid="AA:BB:CC:11:22:33" rssi="-95" /> /></WifiPoints></LocationStamp></LocationStamps></LocationData></Body></Env>

Note1: The mcc = country id and mnc = carrier id.
Note2: Note the ApplicationId GUID field in the data which could potentially connect an app to this location data.

For privacy reasons, the above example had the following information modified:
-    Timestamps
-    Various GUIDs (except ApplicationId and ClientGuid)
-    bssid (MAC address of WiFi access points)
-    CellTowers
-    Location/Movement information

If the main Location setting is OFF, ObservationLogWP8 data should not be written (in theory). This was possibly observed when we found that some devices contained hits for "ObservationLogWP8" in LocationCrowdsource.dll but no ObservationLogWP8 XML elements. Also, if an individual application does not have it's Location capability enabled, the ObservationLogWP8 data should not be present for that app.
From the Microsoft "Personal Wi-Fi Access Point Opt-Out" section here:

"If you have a Wi-Fi access point or router and you wish to exclude it from Microsoft’s location positioning database ... you can submit the MAC address to Microsoft’s block list"

This means by default (with a Location enabled device), the device gets to snoop around and map any/all WiFi access points it can. So you'd expect to see more of these ObservationLogWP8 XML instances, the longer a phone is used (assuming the phone moves around).

FindMyPhone (location tracking of device)

Windows Phone 8.10 has the capability to ring/lock/erase/locate a registered device from this website. This capability is not enabled by default and requires the user register their device and phone number with their Microsoft account. The FindMyPhone settings menu has a checkbox for "Save my phone’s location periodically and before the battery runs out to make it easier to find". This is OFF by default. According to the (March 2014) Windows Phone 8 Privacy Statement:

"the location of your phone will be sent periodically to your online account at the My Windows Phone page". It "only stores the last known location of your phone. When a new location is sent, it replaces the previously stored location".

There were various FindMyPhone P26:\Windows\system32\config\SOFTWARE registry entries, FindMyPhone DLL libraries and a FindMyPhone executable present in our devices. The FindMyPhoneRuntimeDll.dll contains what appears to be a print format string for a "MyPhoneOperation" data element. Searching for "MyPhoneOperation" returned hits in Device A at
P27: \Users\WPNETWORK\APPDATA\Local\dcpsvc\StagingFiles\CssV1_1\*FOLDER_GUID*\*FILENAME_GUID*.csd - which contained a timestamp, Latitude, Longitude and possibly Altitude information stored in a .csd file. We are unsure of the significance of the GUIDs in the directory and file names as they varied between phones. For Device A (with FindMyPhone enabled), there were multiple directories and .csd files with two .csd files containing geolocation information. For Device D (with FindMyPhone enabled), there were multiple directories and .csd files but only one .csd contained geolocation information.
Devices B and C did not store this information potentially indicating that the FindMyPhone functionality was not enabled on those devices. The found XML string looked like:

<MyPhoneOperation> <UpdateLocation> <Location>(2015-12-31 01:23:45Z) -XX.123456,YYY.1234,ZZ.000000</Location> </UpdateLocation> </MyPhoneOperation>

Where XX is latitude and YYY is longitude. ZZ is suspected to be altitude. For Device A, the lat/long listed was about 400m W of the suspected location so the third parameter is probably not accuracy. Note the differences in precision (decimal places) between latitude and longitude. So if your device does contain a FindMyPhone location, the accuracy of the position can vary.

Two configurable FindMyPhone Registry settings are located at:
P26:\Windows\system32\config\SOFTWARE\Microsoft\Settings\FindMyPhone\LocationSyncEnabled
(which was set to 1 when "Save my phone's location periodically and before the battery runs out to make it easier to find" is checked. 0 if unchecked (by default)) and
P26:\Windows\system32\config\SOFTWARE\Microsoft\Settings\FindMyPhone\MpnEnabled
(which was set to 1 when "Always use push notifications (not SMS) to send commands and apps to my phone" is checked. 0 if unchecked (by default)).
There were also multiple hits for "FindMyPhone" in various GUID named sub-keys under:
P26:\Windows\system32\config\SOFTWARE\Microsoft\WPTaskScheduler\{*GUID1*}
which suggests the regular running of a process/processes to report back the device’s current location. For example, the "Schedule" entry value under the GUID sub-key contains the strings "ProcessFindMyPhoneCommand", "c:\Programs\FindMyPhone\ShellCommandDispatcher.exe ProcessFindMyPhoneCommand". This was also present on a factory fresh install.
There was also a "Schedule" entry under P26:\Windows\system32\config\SOFTWARE\Microsoft\WPTaskScheduler\{*GUID2*} which contained the strings "FMPLastLocationSyncSchedule", "c:\Programs\FindMyPhone\ShellCommandDispatcher.exe SyncLocation". This one was NOT present on a factory fresh install with FindMyPhone disabled.

Default Internet Explorer Browser

The default Internet Explorer browser provides some potential geolocation data via cookies, webcache and the "GetLocationUsingFingerprintResponse" browser function. The P26:\Windows\system32\config\SOFTWARE\Microsoft\Internet Explorer\Version registry entry value was set to 11.0.0.0 for all test devices.
The browser also has an "Allow access to my location" setting which should affect how much location information is stored/accessed.
Cookies which contained (URL/percent encoded) textual latitude and longitude information were located in randomly named .txt files in P27: \Users\DefApps\APPDATA\INTERNETEXPLORER\INetCookies. The availability and format will vary depending on the website requesting the user’s location and the device's Location settings. Using ESEDatabaseview from NirSoft to view the P27:\Users\DefApps\APPDATA\Local\Microsoft\Windows\WebCache\WebCacheV01.dat tables (eg table "Container_21") can help link a URL to the randomly named cookie file.
URLs with latitude/longitude information were also found in Webcache log files such as P27: \Users\DefApps\APPDATA\Local\Microsoft\Windows\WebCache\V01tmp.log
For example, a (redacted) GoogleMaps directions URL:

http://maps.googleapis.com/maps/api/directions/json?origin=-XX.1234567891234,YYY.123456789123&waypoints=&destination=-XX.1234,YYY.123&mode=d&units=metric&language=en&sensor=true

Note1: XX = latitude and YYY = longitude.
Note2: Notice the precision (number of decimal places) of the latitude differs from the longitude. Similar entries were also observed in WebcacheV01.dat which makes sense if the .log files are being used as transaction logs for the WebcacheV01.dat ESE database.

The "GetLocationUsingFingerprintResponse" browser function appears to be used by Internet Explorer to submit various values (eg cell tower/WiFi signal strengths) to a web based service. The service then sends back the (estimated) device location. Various hits for "GetLocationUsingFingerprintResponse" with close by Latitude, Longitude, Altitude, ServerUtcTime and RadialUncertainty were found in P27:\pagefile.sys.
For example:

<GetLocationUsingFingerprintResponse xmlns="http://inference.location.live.com" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><GetLocationUsingFingerprintResult><ResponseStatus>Success</ResponseStatus><LocationResult><ResolverStatus Status="Success" Source="Internal"/><ResolvedPosition Latitude="-XX.123456" Longitude="YYY.123456" Altitude="0"/><RadialUncertainty>2246</RadialUncertainty><TileResult/><TrackingId>7e8381e9-106c-45f2-8af9-123456789ABC</TrackingId></LocationResult><ExtendedV21Result CrowdSourcingLevel="High" ServerUtcTime="2015-12-31T01:23:45.1234567Z" CollectionType="Wifi|Cell" InferenceType="Wifi|Cell"/></GetLocationUsingFingerprintResult></GetLocationUsingFingerprintResponse>

For further information on "GetLocationUsingFingerprintResponse", see the "Post Script" section in Rudy Huyn’s blog post (in French but Chrome translates it OK-ish).
Also see Chad Tilbury’s (@chadtilbury) series of posts on browser geolocation forensics here.

Cortana (personal assistant)

The Alpha version of Cortana comes installed with the 530. Search for "What does the fox say?" or "Who is Siri?" and prepare to be semi-amused. It appears Cortana requires an active Internet connection so it may not always be enabled by the user if they're on a cheap pre-paid plan. If not enabled, searches are supposed to use Bing instead. Cortana can be voice activated and location aware so you can set it to remind you "When I get home, remind me to spank the monkey". Such behaviour (the reminding, not the spanking) means there could be a register of locations which may help prove the user had prior knowledge of a location.

Most of the information in this section was sourced from Brent Muir's (@bsmuir) recent Windows 10 forensics post here. Not all of the functionality mentioned in that post was applicable to Windows Phone 8.10 but it was still a great resource to have (Thanks to Brent for sharing!).

P27:\Users\WPNETWORK\APPDATA\Local\Graph\WP8LoggedInUser\Me\00000000.ttl
contained various Latitude and Longitude strings (preceded by the string "place."). According to Brent, these types of file contain searched/favourite locations. Two other data sets had .ttl files but they did not contain any latitude/longitude data. It is likely that the recorded position data is dependent on the main Location setting being ON.
An example 00000000.ttl entry looked like:

p:me/place.-XX.123456_YYY.123456 <http://platform.bing.com/dateAccessed> "1601-01-01T00:00:00.000Z"^^s:Date ;
a s:Place ;
b:preferences/favorites.businessType "" ;
b:preferences/favorites.entityId "-XX.123456_YYY.123456" ;
b:preferences/favorites.entityType "http://schema.org/Place" ;
b:preferences/favorites.entityUrl "local_vdpid:\"-7995539123\"" ;
b:preferences/favorites.isBusiness false ;
b:preferences/favorites.originalName "Some Place Banana-ish" ;
s:ShowOnMap true ;
s:address p:me/postalAddress.-XX.123456_YYY.123456 ;
s:dateModified "2015-12-31T01:23:45.123Z"^^s:Date ;
s:displayCoordinates "Point(-XX.123456 YYY.123456)"^^geo:wktLiteral ;
s:geo "Point(0.000000 0.000000)"^^geo:wktLiteral ;
s:mapZoomLevel 18 ;
s:name "Some Place Banana-ish" ;
s:phone "" .

Note: The "dateAccessed" value above has NOT been modified from the original value.

There were other instances of latitude/longitude recorded in this .ttl file but they did not have a timestamp. For example:

p:me/list.recent. -XX.123456_YYY.123456 a rdf:List ;
rdf:first p:me/place.-XX.123456_YYY.123456 ;
s:displayOrder 25 ;
s:memberOf p:me/geoAnnotationCollection.recent .

which appears to be a recent search location.

Similarly, there were also latitude/longitude pairs observed in P27:/pagefile.sys preceded by "place.". For example:

<http://platform.bing.com/persons/me/place. -XX.123456 YYY.123456>

It is unknown if/how this example can be correlated to the previous Cortana .ttl examples but they do look similar.

Grammar files are used by speech recognition to define input speech words/phrases. They could also be used to indicate prior knowledge of a location. For more information, refer to the MSDN reference on "Grammars for Windows Phone 8"here.
Location data was found in Device A grammar files located at P27:\SharedData\Speech\Grammars\0809\PointsOfInterestGrammar.cfp.txt and P27:\SharedData\Speech\Grammars\0809\PointsOfInterest2Grammar.cfp.txt
These grammar files did not exist in Device C (factory fresh phone).
The grammar files contained entries like:

"Monkeytown, BananaState":, http://platform.bing.com/persons/me/place. -XX.123456 YYY.123456”
"home":, http://platform.bing.com/persons/me/place. -XX.123456 YYY.123456

which was followed by a timestamp in the format:
31/12/2015 1:05:45 PM
This timestamp matched the file system File Created date (in UTC) for the parent .txt file.
Unlike Windows 10 for PC, there was no IndexedDB.edb or CortanaCoreDb.dat (geofence) databases found in our test data. However, this may be due to our test data not being setup with any geofences.

Multimedia metadata (from device pictures/video)

Our device created multimedia's file location, naming convention and metadata content was consistent with Det. Cindy Murphy et al's SANS whitepaper on Windows Phone 8. That is, camera created media was found in the phone’s internal memory at P27:\Users\Public\Pictures\Camera Roll.
Additionally, under Device A’s SD Card’s \Pictures\Camera Roll\ directory there were various .mp4 videos whose filename started with WP and a datestamp (eg WP_20150101_001.mp4). Upon further inspection in XWF, there was "Xtra" metadata embedded in each video file which included an ISO timestamp (eg 2015-01-01T01:02:03Z) and location (eg -XX.3456+YYY.4567). This information was also visible using Phil Harvey’s ExifTool.
Also stored under the SD card's \Pictures\Camera Roll\ were various .jpg files whose filename started with WP and a datestamp (eg WP_20150101_002.jpg). Upon further inspection in XWF, there was "EXIF" metadata embedded in each file which included the Phone model (eg "Lumia 530"), Date Original & Date Digitized timestamps (eg 2015:01:01 01:02:03), Latitude (eg 12° 34’ 5.678” S) and Longitude (eg 123° 45’ 5.678” E).
Photos from a different 530 device (media stored in internal phone memory), had similar EXIF data but did not contain GPS Latitude/Longitude information. This is possibly because the user did not enable the main Location setting and/or they did not enable the "Use Location info" in the "Photos" settings.
The following SOFTWARE hive entry value is suspected of configuring camera pictures/video with embedded location data:
P26:\Windows\system32\config\SOFTWARE\Microsoft\Photos\Shared\CameraSettings\EmbedLocation
This entry value was set to 2 for device multimedia files with embedded GPS metadata and it was set to 1 when the "Photos" ... "Use location info" setting was unchecked (no embedded GPS metadata). Presumably, the GPS data in EXIF also requires the main Location setting being enabled.

WP8 Application data

According to Microsoft, there are two location API libraries that Windows Phone 8.1 developers can use – the .NET Location API (for Windows Phone 7.1 and 8) and the Windows Phone Runtime Location API (new to Windows Phone 8 and 8.1).
The .NET Location API uses the System.Device.dll so whenever you see "System.Device.Location" you know the app was using the .NET Location API. See here for further details.
Alternatively, the MSDN Windows Phone Runtime uses the "Windows.Devices.Geolocation" namespace so if you see that string near latitude/longitude information, the Windows Phone Location API is potentially being used.
Both of these libraries are designed so the app can call a "Get Location" function and let the library calculate a position from the available resources (eg Cell, WiFi, GPS). As the library and results of the call are loaded into RAM, pagefile.sys can also potentially contain these app location artifacts.

The Facebook application comes bundled with the 530. Device A was the only data set that had Facebook user data. Various Facebook related location data was found in
P27:\Users\DefApps\APPDATA\Local\Packages\Microsoft.MSFacebook_8wekyb3d8bbwe\LocalState\Log.txt

This file did not exist in Device C (factory fresh) which suggests that Facebook was not used on that device.
Anyway, here's what the Facebook location data looked like:

[Type: Miscellaneous]    [Severity: Unspecified] 2015-12-31T01:23:45: POST: places.setLastLocation?coords{"accuracy":0.0,"altitude":0.0,"altitudeAccuracy":0.0,"heading":0.0,"latitude":-XX.12345,"longitude":YYY.12345,"speed":0.0}
[Type: Navigation]       [Severity: Low]         2015-12-31T01:23:45: Navigated to: Facebook.Views.Places
[Type: Miscellaneous]    [Severity: Unspecified] 2015-12-31T01:23:48: MULTIQUERY GET: Places:SELECT page_id, name, description, latitude, longitude, checkin_count, display_subtext, pic_square, distance(latitude, longitude, "-XX.12345", "YYY.12345") FROM place WHERE distance(latitude, longitude, "-XX.12345", "YYY.12345") < 750 LIMIT 25|Pages:SELECT page_id, name, fan_count, were_here_count, location FROM page WHERE page_id IN (SELECT page_id from #Places)

A potential Facebook "Last login" location was also found at:
P27:\Users\DefApps\APPDATA\Local\Packages\Microsoft.MSFacebook_8wekyb3d8bbwe\Settings\settings.dat and P27:\Users\DefApps\APPDATA\Local\Packages\Microsoft.MSFacebook_8wekyb3d8bbwe\Settings\settings.dat.LOG1
Both setting files contained a string similar to:

"AllowLocationAccess":true,"CacheExpirationInMinutes":10,"ConfirmExit":true,"CurrentUserName":"Randy Monkey","CurrentUserProfilePicUri":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xfa1/v/t1.0-1/p200x200/12345678_123456789ABCDEF12_123456789ABCDEF1234_n.jpg?oh=9b1356725bdbc1f30cb193068343123&oe=564DE360&__gda__=1438678992_c2802dc1e45c7998ca39c6f5dcf67484","GetHereNowEnabled":true,"IsVerfified":false,"HasLoggedIn":true,"LastLatitude":-XX.123456789012345,"LastLocationDate":"\/Date(1430562061234)\/","LastLongitude":YYY.12345678901234,"LockScreenState":1,

For Device A, it also appears that the Windows Store app "GPS Voice Navigation" was installed, run and then uninstalled. There were location artifacts left in P27:\pagefile.sys such as:

app://8E116909-CF87-4176-894F-F54434EFB01E/_default#/Protocol?encodedLaunchUri=ms-drive-to%3A%3Fdestination.latitude%3D-XX.12345%26destination.longitude%3DYYY.123456%26destination.name%3DMonkey%2520Central%2520Headquarters

The GUID 8E116909-CF87-4176-894F-F54434EFB01E listed above is an alias for the GPS Voice Navigation app. This can be confirmed by visiting:
windowsphone.com/s?appId=8E116909-CF87-4176-894F-F54434EFB01E

Searching for "ms-drive-to" (as above) and "ms-walk-to" keywords can lead to further location data. According to this, these keywords can be used by an app to request walking/driving directions. So while it may not confirm a device's actual location, it can prove a user looked up directions from/to specific places.

For the "GPS Voice Navigation" app, various location data was found with references to the System.Device.Location namespace. As mentioned previously, this is a .NET Framework library for calculating location via GPS, WiFi, Cell Tower triangulation. Any location strings found near a "System.Device.Location" string could be potential device locations.
So in P27:\pagefile.sys, we found:

<Latitude xmlns="http://schemas.datacontract.org/2004/07/System.Device.Location">-XX.1234</Latitude><Longitude xmlns="http://schemas.datacontract.org/2004/07/System.Device.Location">YYY.123</Longitude> <Speed xmlns="http://schemas.datacontract.org/2004/07/System.Device.Location">NaN</Speed>

Also found in Device A P27:\pagefile.sys was this more comprehensive gem:

<KeyValueOfstringanyType><Key>CurrentPosition</Key><Value xmlns:d3p1="http://schemas.datacontract.org/2004/07/System.Device.Location" i:type="d3p1:GeoPositionOfGeoCoordinate8rbsckdZ"><d3p1:Location><d3p1:Altitude>45</d3p1:Altitude><d3p1:Course>188.3</d3p1:Course><d3p1:HorizontalAccuracy>3</d3p1:HorizontalAccuracy><d3p1:Latitude>-XX.123456789012345</d3p1:Latitude><d3p1:Longitude>YYY.1234567890123</d3p1:Longitude><d3p1:Speed>3.72</d3p1:Speed><d3p1:VerticalAccuracy>3</d3p1:VerticalAccuracy></d3p1:Location><d3p1:Timestamp xmlns:d4p1="http://schemas.datacontract.org/2004/07/System"><d4p1:DateTime>2015-12-31T01:23:45.1234567Z</d4p1:DateTime><d4p1:OffsetMinutes>600</d4p1:OffsetMinutes></d3p1:Timestamp></Value></KeyValueOfstringanyType>

As this only appeared in Device A, it is probably related to the "GPS Voice Navigation" app. This is potentially confirmed by nearby XML elements such as:

<Value xmlns:d3p1="http://schemas.datacontract.org/2004/07/GPS_VN_BL_WP8" i:type="d3p1:Locales">English</Value>

Also observed in Device A P27:/pagefile.sys, was an "EndLocation" keyword used by the "GPS Voice Navigation" app which showed the destination address details. So searching for that keyword could also reveal destinations entered into the app.

A basic app permission analysis was also performed on selected devices using the WP8_AppPerms.py script (see this previous post).
In Device A, There were 7 applications which required the ID_CAP_LOCATION capability (which provides access to device location):
WhatsApp
Facebook Messenger
*Facebook
*Skype
*XBox Video
*LumiaHelpTips_4?
*Nokia Music WP8 client

The apps that we preceded with an asterisk seem to be default apps that are pre-installed with the phone. Notice how even chat apps can require access to the device location. For example, Skype allows the user to "share location" from a chat. Similarly, WhatsApp can also send its location (as an attachment) in a chat.
Other permissions which a location aware app might need include:
ID_CAP_NETWORKING which would be required for accessing network services that can provide location information (eg “GetLocationUsingFingerprintResponse” data for Internet Explorer location queries).
ID_CAP_MAP which allows an app to display a map.
ID_CAP_CELL_API_LOCATION which seems to allow for/help location via Cellular position information.
ID_CAP_SENSORS which provides access to the phone’s accelerometer, compass, gyroscope.

Registry

According to XWF, Device A’s "Free space" (unallocated space) contained some UTF16-LE lat/long hits and a timestamp close to a "LastFoundAt" (ASCII) string.
For example, in close proximity to:

“lat=-XX.123456###long=YYY.123456###time=22/12/2015 01:23:45”

was an "hbin" string and a "LastFoundAt" string. The "hbin" signature suggests that the surrounding data belongs to a registry hive cell. Searching the P26:\Windows\system32\config\SOFTWARE hive resulted in a hit at \OEM\Nokia\NokiaAccessories\Devices\*FF:FF:FF:FF:FF:FF* (Hex string has been redacted) which contained an entry for "LastFoundAt" equal to the lat/long/time string observed.
However, this subkey did not exist in all test devices so this registry value may not always be available for providing a position.

P26:\Windows\system32\config\SOFTWARE\Microsoft\BingSuggests can have PreviousLatitude, PreviousLongitude entry values along with QueryAttemptTime, SuccessfulQueryTime and CacheUpdateTime entry values. It is unclear what the significance is of the Position information relative to Bing. Times appear to be LE 64 bit Number of 100 ns since 1 JAN 1601. The CacheUpdateTime was slightly later than the QueryAttemptTime (which equalled the SuccessfulQueryTime). The Registry Key's "LastWrittenTime" occured AFTER the various timestamp values. The Timestamp values were zeroes for a new phone.

Miscellaneous Squirrel Chasing

While squirrel chasing through the data, we noticed a few non-location related but interesting nuggets ... (as if this post wasn't already long enough!)

P27: \SharedData\Store\PurchaseHistory.xml
contains the purchased App GUID, the app name and the purchase date. If entering the app GUID to the "windowsphone.com/s?appId=" URL does not reveal the app name, try checking this file.

Calendar Reminders are stored in P26:\Windows\system32\config\SOFTWARE\Microsoft\WPTaskScheduler\{*GUID*} entry values. For example, SOFTWARE\Microsoft\WPTaskScheduler\{*GUID*}
could contain a "Schedule" entry with value string containing the strings: "Reminder", "Monkey's New Year" and "All day event 01/01/2015".

As mentioned by Brent Muir, notification messages are archived in a .dat file. For our test devices, this appears to be located at:
P27:\Users\DefApps\APPDATA\Local\Microsoft\Windows\Notifications\appdb.dat
This file was 24 MB in size but sparsely populated.
An example data string from the file looks something like:

<toast launch="app://5B04B775-356B-4AA0-AAF8-6491FFEA5610/Chat?EntryId=00000000213E0000060000000700000000000000&MessageId=000000004A4E0000020000000800000000000001"><visual><binding template="ToastText02"><text id="1">Voicemail Access</text><text id="2">Pls call 321, You have 5 New VoiceMail messages</text></binding></visual><audio src="ms-winsoundevent:legacy-notification.sms"/><backgroundColor>#0</backgroundColor></toast>

Unlike Windows 10 for PC, there was no "TimestampWhenSeen" stored in P26:\Windows\system32\config\SOFTWARE (ie when Notification Center was viewed).

Final Thoughts

Thanks to Boss Rob for patiently helping/waiting for this monkey to obtain/analyse the various phone dumps and allowing us to share our findings with the forensic community.

There is lot of potential geolocation data stored on a Windows Phone 8.10 device but it will depend on the main Location setting being ON and in certain situations, on the app’s required capabilities.
The P27:/pagefile.sys is probably the best place to look for textually encoded latitude/longitude data. However, Internet Explorer cache, app logs, device camera picture/video files and the Registry can also store location data.
It is suspected that other Windows Phone 8.1 makes/models will contain the same geolocation artifacts but this should be tested/verified by the analyst.
For future testing purposes, it is noteworthy that the 530 can be upgraded to Windows 10 for Mobile devices (whenever it officially comes out, under whatever name they decide on).

If you would like to share your thoughts/suggestions or any geolocation artifacts, please leave a comment.

↧

Windows Phone 8.10 MMS (for Lumia 530) ...

December 5, 2015, 5:12 pm

≫ Next: An Initial Peep at Windows 10 Mobile (Lumia 435)

≪ Previous: Finding Geo

Now with attachment info! Catch the excitement!

We recently noticed that while some commercial forensic tools show Windows Phone 8.10 MMS transaction information (eg Date, Phone number), they do not show or list the accompanying MMS file attachments. Welcome masochistic script monkey! May I take your "safe word"?
In Windows Phone 8.10, a user can send an MMS containing a picture (eg camera JPEG, PNG screen capture), a video (eg camera captured MP4), a Contact (via VCARD text) and/or a VoiceNote (via .AMR audio). These attachments are recopied/renamed/stored as weirdly named .dat files in various sub-directories (named "A" to "P") under Data:\SharedData\Comms\Unistore\data\7.
Received MMS are also stored as .dat files under the same location so it isn't immediately obvious what attachments go together and/or which were sent/received.

The main issue is finding a link between the MMS database transaction entries and the actual stored .dat file(s) which were sent/received.
There may be a more comprehensive link yet to be discovered, but the best link we've found so far is via the filesystem "Last Modified" timestamp of the .dat files and a timestamp found in the "Message" table of the store.vol database.
However, because file system modified times can vary by 1-2 seconds when compared to what is listed in the MMS database, we wrote two scripts - one to sort/print the filesystem's .dat attachment files in chronological order (wp8-1-mms-filesort.py) and a separate script (wp8-1-mms.py) to print the store.vol database MMS records in chronological order. The discerning simian analyst can then make the decision about which attachments go with which MMS based on the timestamp information, MMS total size, file types and individual file attachment sizes. Additionally, they can use the calculated SHA256 hash to locate any sent MMS files (received MMS files are initially stored only as .dat files).

The two scripts are available from my GitHub site and have been created/tested on Windows 7 PC running Python 2.7 using test data from a Nokia Lumia 530 running Windows Phone 8.10.

Background

The "store.vol" database is an ESE database located in Data:/USERS/WPCOMMSERVICES/APPDATA/Local/Unistore/ that contains 3 tables of interest for MMS:
- Message (contains metadata of sent/received MMS)
- Recipient (contains sent MMS phone numbers)
- Attachment (contains metadata about sent/received MMS attachments)

These tables have weird combination numeric/letter strings used for their column/field names (eg "0037001f").
Therefore, throughout this blog post we will use alternative monkey monikers (eg Size, Flag, Filename0) to keep things sane.

Here's an overview of how the various tables and .dat files fit together.
Note: Due space constraints, this relationship diagram does NOT include every field from each table.

So, are you seeing anyone right now? Oh ... it's complicated?

Every MMS will have a "Message" table entry. If it's a sent MMS, there will be a corresponding "Recipient" table entry (containing the Destination Phone number and Timestamp3).
Every sent/received MMS Message will have at least 2 "Attachment" table entries - one for a "smil" message layout XML file and one for each attached file (eg one for each picture).
Filename aliases (eg "FOT1234.jpg") for attached Images/Videos/Text/Voicenotes/VCARDs will appear in both the "smil" .dat file and in the corresponding "Attachment" table entry.
Each attachment's content will be stored in a separate .dat file with a similar filesystem "Last Modified" timestamp to the "smil" layout .dat file. This "Last Modified" timestamp will approximately equal the relevant MMS's Timestamp2 value in the "Message" table.

We recommend usingOSForensics"ESEDB Viewer" to view "store.vol" as it seems to have more reliable search functionality (and possibly shows more table columns) than NirSoft's ESEDatabaseView.
It's one thing to view "store.vol" using a dedicated viewer but to script a solution (independent of third party libraries) we have to look at how each field of interest is stored in the raw hex.

For the "Message" table there are 2 variations of MMS record - one for Sent MMS and one for Received MMS.
Each Received MMS "Message" record looks like:

Received MMS format

Sent MMS format

The important "Message" fields are colour highlighted in the diagrams:
- Msgid (Unique id number for each "Message" entry.)
- Flag (Sent (33 decimal) / Unread(0) / Read(1). Draft MMS are stored using a different Message record format which contains "IPM.MSG" (instead of "IPM.MMS") and their Flag is set to 41 decimal. They will not be discussed further in this post.)
- Size (Total MMS size in bytes. This will help when there are multiple file attachments per MMS.)
- Timestamp2 (File system Last Modified time (approximate). For sent MMS, this corresponds to the time of creation/last update and not the time actually sent.)
- Timestamp3 (Sent/Received time common to both "Message" / "Recipient" tables.)
- Phone0/Phone1/Phone2/Phone3 (Phone number for received MMS. These tend to be set to the same value when present. Sent message phone numbers have to be obtained from the "Recipient" table.)

Each "Recipient" table record looks like:

Recipient format

Where X represents a number of bytes that we don't really know/care about. All strings are null-terminated UTF-16-LE encoded. Timestamps are LE 8 byte integers representing the number of 100 ns intervals since 1 JAN 1601 (MS FILETIME). The string "@.SMS" is actually "x40x01x53x00x4dx00x53x00x00x00" in hex.
Each "Recipient" table record should correspond to a sent SMS/MMS. Our ass-umption here is that it's impossible to simultaneously send both SMS and MMS with the same Timestamp3 value. So if we find a "Recipient" table entry which has a Timestamp3 value that also occurs in a "Message" table "Sent Record", we can ass-ume that it is a sent MMS. Sent SMS have an "IPM.SMStext" value set instead of "IPM.MMS" in the "Recipient" table.

The important "Recipient" fields are:
- Msgid (Unique id number for each "Message" entry.)
- Timestamp3 (Sent/Received time common to both "Message" / "Recipient" tables.)
- DestPhone (Destination Phone number string.)

Each "Attachment" table record looks like:

Attachment format

Where X represents a number of bytes that we don't really know/care about. All strings are null-terminated UTF-16-LE encoded.
Filename0 has been observed to start with "<cidText" or "<cidSmil" or "<cidImage" or "<cidVideo" or "<cidAudio" or "<cidVCard". It has also been seen in the form "<123>" - where "<123>" represents a 3 digit number that increases with each "Rowid". MMS using the "<123>" alias format also had the fixed alias value of "<0000>" for the "smil" layout files.
Draft SMS/MMS do not use the "<>" method of enclosing aliases so cannot be found in the same manner.

The important "Attachment" fields are:
- Msgid (Unique id number for each Message entry.)
- Size (Attached file's size in bytes. There can be multiple file attachments per MMS. Adding all the "Attachment" sizes for a given "Msgid" should equal the "Message" table's size for that same "Msgid".)
- Filename0 (Alias enclosed by "<" and ">" characters. We can use it to identify MMS attachment entries (eg "<cidImage_FOT1234.JPG>").)
- Filename1 (Alias used in "smil" files. eg "FOT1234.JPG".)
- Filename2 (If it's a sent MMS, this will be the actual source Filename (eg WP_20151203_001.JPG). If it's a received MMS, this is typically set to the same value as Filename1.)
- Filetype (Description of the type of file.)

The Filetype values have been observed as:
- "application/smil" (for the MMS layout)
- "text/plain" (for the MMS text content)
- "image/jpeg" and "image/png" (for attached camera/screenshot images)
- "audio/amr" (for attached VoiceNotes)
- "video/mp4" (for attached camera videos)
- "text/x-vcard" (for attached Contacts)

It's been briefly mentioned above but the last piece of the banana is that MMS .dat files are stored under "Data:\SharedData\Comms\Unistore\data\7". One .dat file is required for each sent/received file. Additionally, every MMS has a "smil" XML layout file which lists an alias to the attached file(s) (eg <img src="FOT1234.JPG" region="Image"/>). We can also find that alias mentioned in a corresponding "Attachment" table entry for that MMS.

Here's an example "smil" .dat file:

<smil><head><layout><region id="Text" height="50%" width="100%" left="0" top="50%" fit="scroll"/><region id="Image" height="50%" width="100%" left="0" top="0" fit="meet"/></layout></head><body><par dur="5000ms"><img src="FOT1234.JPG" region="Image"/><text src="Text_0.txt" region="Text"/></par></body></smil>

The "text src" consistently seems to be set to "Text_0.txt". During testing we did not try sending multiple images in the same MMS with each image having it's own accompanying text. However, we suspect that each text entity would then get its own "text src" element and unique alias (eg "Text_0.txt", "Text_1.txt"). With our current understanding, determining the order of these multiple texts would be difficult but we can still retrieve the text content.

Note:"Data:\SharedData\Comms\Unistore\data\7" has been observed to also contain Draft MMS (text content) and Received Email attachment .dat files.
Attached files for Draft MMS (eg pics) do not appear to be stored in .dat files under that directory. Editing a Draft MMS should update the "Last Modified" filesystem time on the .dat files.

You might also have noticed that the constant 0x07000000 value appears in each of the above records. Coincidentally(?), 7 is also the rowid corresponding to the "SMS" row in the "Store" table. This kinda makes sense as both SMS and MMS are grouped together under the same Messaging menu on a Windows Phone 8 device.
It seems there is a "Store" table row for each potential store location on the phone (eg Outlook, SMS, ExternalStore, OneDrive etc).

Scripting and Testing

wp8-1-mms-filesort.py
Assuming the analyst has already exported the contents of "Data:\SharedData\Comms\Unistore\data\7" (eg using AccessData FTK Imager), we can write a Python script to create a clickable HTML table (sorted by "Last Modified" time) of .dat attachments. The script can also output the same data to a Tab Separated Variable (TSV) output file for importing into an analysis spreadsheet.

Here's the help for wp8-1-mms-filesort.py:

c:\Python27\python.exe wp8-1-mms-filesort.py
Running wp8-1-mms-filesort.py v2015-11-24

Usage: wp8-1-mms-filesort.py -i inputfiledir -t output.tsv (Optional) -o output.html (Optional)

Options:
-h, --help     show this help message and exit
-i DIRNAME     Input Directory To Be Processed
-t OUTPUTTSV   Output Tab Separated Variable (TSV) filename (Optional)
-o OUTPUTHTML Output HTML filename (Optional)

The script walks through each input directory and looks for filenames ending in "73701.dat". It then retrieves that file's "Last Modified" time and stores the filename and timestamp for later display.
For each of the stored filenames, the script reads the file contents and attempts to calculate what type of file it is, the file size, the SHA256 Hash of the file and if it's a "smil" file, it will list any file aliases used. It then prints the file information sorted chronologically by "Last Modified" time to the command line (SHA256 Hashes are NOT printed to command line) and/or HTML and/or TSV.
By grouping the .dat files by "Last Modified" time, it should make it easier to decide which .dat files belong together.

Here is some redacted example output:

c:\python27\python.exe wp8-1-mms-filesort.py -i 7 -t fsop.tsv -o fsop.html
Running wp8-1-mms-filesort.py v2015-11-24

Parsed 57 files

Mod. Timestamp Filename        Size(bytes)     Type    Comments
... [REDACTED]
2015-12-01T04:02:08     7\c\40000002000000073701.dat    7302    AMR
2015-12-01T04:02:08     7\d\40000003000000073701.dat    186     <smil> aud = P__4987.amr
2015-12-01T04:03:08     7\f\40000005000000073701.dat    893237 MP4
2015-12-01T04:03:08     7\g\40000006000000073701.dat    366     <smil> video =P__CD04.mp4
2015-12-01T04:05:04     7\h\40000007000000073701.dat    104     VCARD
2015-12-01T04:05:04     7\i\40000008000000073701.dat    386     <smil> VCARD present
2015-12-01T04:07:04     7\j\40000009000000073701.dat    366     <smil> video =P__BC3E.mp4
2015-12-01T04:07:04     7\k\4000000a000000073701.dat    899422 MP4
2015-12-01T04:08:20     7\l\4000000b000000073701.dat    186     <smil> aud = P__7C25.amr
2015-12-01T04:08:20     7\m\4000000c000000073701.dat    5702    AMR
2015-12-02T20:30:20     7\n\4000000d000000073701.dat    616     <smil> img = FOTDAD6.JPG, text = Text_0.txt
2015-12-02T20:30:20     7\o\4000000e000000073701.dat    624866 JPEG
2015-12-02T20:30:20     7\p\4000000f000000073701.dat    50      Unknown
2015-12-02T22:26:42     7\b\50000001000000073701.dat    52      Unknown
2015-12-02T22:26:42     7\c\50000002000000073701.dat    616     <smil> img = FOTEB94.jpg, text = Text_0.txt
2015-12-02T22:26:44     7\a\50000000000000073701.dat    571938 JPEG
2015-12-03T02:57:10     7\f\50000005000000073701.dat    38303   PNG
2015-12-03T02:57:10     7\g\50000006000000073701.dat    22      Unknown
2015-12-03T02:57:10     7\h\50000007000000073701.dat    616     <smil> img = FOT348F.png, text = Text_0.txt
2015-12-03T02:59:48     7\i\50000008000000073701.dat    616     <smil> img = FOT3070.png, text = Text_0.txt
2015-12-03T02:59:48     7\j\50000009000000073701.dat    48870   PNG
2015-12-03T02:59:48     7\k\5000000a000000073701.dat    18      Unknown

Note1: Type "Unknown" Types possibly indicate MMS message text files

Note2: Not all .dat files may belong to an MMS message (eg Received Email Attachments, Drafts)

Finished processing MMS .dat files ... Exiting ...

The corresponding output TSV looks like:

Mod. Timestamp    Filename    Size(bytes)    Type    SHA256 Hash    Comments
2015-12-01T04:02:08    7\c\40000002000000073701.dat    7302    AMR    CAE79982FCAD00B09707FB24FAB0D226E54E4B2DB85F926B34166B5DC7D3DBDD
2015-12-01T04:02:08    7\d\40000003000000073701.dat    186    smil    30614CF3267AD6C53B714E528F474B61C47CD10058CCB5838ACA55BA7E69C5C0    aud = P__4987.amr
2015-12-01T04:03:08    7\f\40000005000000073701.dat    893237    MP4    DC8200239F601FA8B14783BD97DDCE55E094CEA5E23D01DD3F4832A6F87A7CE2
2015-12-01T04:03:08    7\g\40000006000000073701.dat    366    smil    68A4069FB3AF78B3C045D830FCF03E964AE043E2276AD059C00A9BBFF30CEB76    video = P__CD04.mp4
2015-12-01T04:05:04    7\h\40000007000000073701.dat    104    VCARD    C1DF9E80CDCB9140D2FFC427B6896BB2E67EF8F989068E6F9CCAF367D947D29B
2015-12-01T04:05:04    7\i\40000008000000073701.dat    386    smil    C56C8A12E8E5E2536161B6AF52C3A20F244603F31CDF637B86B6F4AA74087A91    VCARD present
2015-12-01T04:07:04    7\j\40000009000000073701.dat    366    smil    77AEC99255C3234D2B0A1EAA5DD935D4C694255C6F4A4C821FCF0C62AC43DCE9    video = P__BC3E.mp4
2015-12-01T04:07:04    7\k\4000000a000000073701.dat    899422    MP4    D2DA77B80DC689460829C6B0F554BC0778348EA149DED1242E80787F005958D3
2015-12-01T04:08:20    7\l\4000000b000000073701.dat    186    smil    2EF5F8BEBC9C03665DB666539F1F9CB595CF543C541537E8EB104A81C60659AE    aud = P__7C25.amr
2015-12-01T04:08:20    7\m\4000000c000000073701.dat    5702    AMR    50AD1A3004E4C1541F507B38DB867A9186237612FF312FDF7E2BE4370617F3E6
2015-12-02T20:30:20    7\n\4000000d000000073701.dat    616    smil    AA2382E8BC8DEA7A7176F21C5FD58C4E52E54608404DE437B1C3299AF0206192    img = FOTDAD6.JPG, text = Text_0.txt
2015-12-02T20:30:20    7\o\4000000e000000073701.dat    624866    JPEG    4BA420B740F5C8DEF862690566195CE620C2495406CDD8383362713A2A38E562
2015-12-02T20:30:20    7\p\4000000f000000073701.dat    50    Unknown    E2E39EA85A25B1C02FB287049F1179B51DED480B99896B4E8E70F773BD2D7158
2015-12-02T22:26:42    7\b\50000001000000073701.dat    52    Unknown    7DE34A123DA5F220CD6346F8CE852036C255A4FE4C81C130EB465AC6378193C0
2015-12-02T22:26:42    7\c\50000002000000073701.dat    616    smil    144753D485039158AB10A06D86A6D421983EC12D708E992A8DA408B079917020    img = FOTEB94.jpg, text = Text_0.txt
2015-12-02T22:26:44    7\a\50000000000000073701.dat    571938    JPEG    139A1E26E6D317F1BBC1C36A7B98AFD3875351B6254B4AEB336F684841F23759
2015-12-03T02:57:10    7\f\50000005000000073701.dat    38303    PNG    D4DAA620FCF32FA0198CAE17AD55B3F847243617F034C4A0BB88356853C77662
2015-12-03T02:57:10    7\g\50000006000000073701.dat    22    Unknown    1BDA3B8EF697CC23E5299068695114F67F7B03C0DD7A4BFEB72F553088AB4009
2015-12-03T02:57:10    7\h\50000007000000073701.dat    616    smil    D28199571A7C838C52206E4DA69D4C2A02A44B851B57DF739DBE2E35F8B6C696    img = FOT348F.png, text = Text_0.txt
2015-12-03T02:59:48    7\i\50000008000000073701.dat    616    smil    4E2D452A0190125D1D44062DEB300B3E00FC2FD0F33E55392A12CCFC3925188E    img = FOT3070.png, text = Text_0.txt
2015-12-03T02:59:48    7\j\50000009000000073701.dat    48870    PNG    3360CD0D57820536D86841DCA94453049B6C4B3D95A3D3B25B91B4CD5ABD50E7
2015-12-03T02:59:48    7\k\5000000a000000073701.dat    18    Unknown    094D1EA6F14A289EC08AF1F85CE1B55A2D95937915A67061FFA85DFCD4284B37

Note1: "Unknown" Types possibly indicate MMS message text files
Note2: Not all .dat files may belong to an MMS message (eg Received Email Attachments, Drafts)

Apologies for the Blogger formatting - this is why we didn't print the SHA256 hash to the command line!
And here's what the corresponding HTML output looks like in a web browser:

HTML Output from "wp8-1-mms-filesort.py"

When we click on an HTML file link, we can view that attachment more readily. However, for the HTML links to work, the HTML file must be in the same directory as the extracted "7" directory.

In this example HTML output, we have the .dat files from a sent MMS containing two pictures (saved from the web) and a text set to "Funny? Don't remember that".

Sent MMS with text and 2 pictures

Note: The different timestamp values for .dat files from the same MMS.

When we click on the first row's file link (not circled) and open it via Notepad we see the text:
Funny? Don't remember that

Clicking on the "smil" file's link (circled in orange) and opening it via Notepad looks like:
<smil><head><layout><region id="Text" height="50%" width="100%" left="0" top="50%" fit="scroll"/><region id="Image" height="50%" width="100%" left="0" top="0" fit="meet"/></layout></head><body><par dur="5000ms"><img src="FOT66CB.jpg" region="Image"/></par><par dur="5000ms"><img src="FOT9241.jpg" region="Image"/><text src="Text_0.txt" region="Text"/></par></body></smil>

This tells us that there are 2 images (FOT66CB.jpg, FOT9241.jpg) and a text message associated with this MMS. To determine if this MMS is sent/received and/or the sent/received time and/or the phone number, we can process "store.vol" using the second script (wp8-1-mms.py).

Clicking on the JPEG link (circled in green) displays this picture in the browser:

Indeed!

Clicking on the JPEG link (circled in red) displays this picture in the browser:

Do you Mind?!

Ah, what's the point in blogging if we can't share pictures of toilet trained monkeys?!
Ahem, continuing on ...

wp8-1-mms.py
Assuming the analyst has already exported "Data:/USERS/WPCOMMSERVICES/APPDATA/Local/Unistore/store.vol", we can write another script (wp8-1-mms.py) to list MMS information for correlation with the output of wp8-1-mms-filesort.py. Essentially, wp8-1-mms.py prints out the information for every MMS attachment (sorted by Timestamp2).

Here's the help for wp8-1-mms.py:

c:\Python27\python.exe wp8-1-mms.py
Running wp8-1-mms.py v2015-11-14

Usage: wp8-1-mms.py -s store.vol -o output.tsv(Optional)

Options:
-h, --help show this help message and exit
-s STOREFILE store.vol file
-o OUTPUTFILENAME Output Tab Separated Variable filename (Optional)

Finding all MMS attachments requires searching "store.vol" for a slight variation of the Filename1 alias used in the "smil" file (eg "<cdImage_FOT1234.jpg>" instead of "FOT1234.jpg", "<123>" instead of "cid:123").
Note: It is thought that the "cid:123" style of alias operates in a similar manner to the "FOT1234.jpg" alias but we have not been able to generate the "cid:123" alias during testing.
Fortunately, we can generalise this to a search for a group of letters starting with "<cid" and/or a group of digits enclosed by "<" and ">". Once we have found a list of these hits, we can retrieve that "Attachment" row's values (Filename0, Filename1, Filename2, Type, Size) and store them in a dictionary (creatively called "attachments") keyed by "Msgid" for later use.

Similarly, we can find all "Recipient" table entries by searching for the "@.SMS" pattern mentioned earlier. We store the retrieved row data (Timestamp3, DestPhone) in another dictionary (called "recipients") keyed by "Msgid" for later use.

Next, we search for "Message" table entries by searching for the "IPM.MMS" pattern and store the retrieved data (Timestamp3, Timestamp2, Phone, Flag, Size) in a dictionary (called "mmsdict") keyed by "Msgid". If we don't find a Phone number in the "Message" record (ie Sent MMS), we use the retrieved "Msgid" to obtain the "DestPhone" number from the "recipients" dictionary populated earlier.

Finally, we can iterate through the "mmsdict" for each "Msgid" and retrieve the corresponding "attachments" info - there will usually be multiple attachments for each MMS dict entry.
The output will be printed to the command line (looks a bit cramped) and/or TSV file (optional).

Here's a redacted sample of the wp8-1-mms.py command line output. For brevity, we've edited the output to highlight the corporate fatcat/toilet monkey MMS message sent earlier. The command line output is pretty cramped so we recommend outputting to TSV.

c:\python27\python.exe wp8-1-mms.py -s store.vol -o store-op.tsv
Running wp8-1-mms.py v2015-11-14

Opening store.vol...

Processing Attachment table ...
57 Attachment "<cid" hits found in store.vol

0 Attachment "<d+>" hits found in store.vol

57 Total Attachment hits found in store.vol

Attachment ASIZE ERROR! at offset 0x12004c ... skipping this hit

Attachments sorted by msgid ...
===================================
[REDACTED]
No. Attachments = 4
msgid = 48 : (u'<cidText_0>', u'Text_0.txt', u'Text_0.txt', u'text/plain', 52)
msgid = 48 : (u'<cidSmil>', u'Smil.txt', u'Smil.txt', u'application/smil', 742)
msgid = 48 : (u'<cidImage_FOT66CB.jpg>', u'FOT66CB.jpg', u'Monkey throwing poo by SheriffBean on DeviantArt.jpg', u'image/jpeg', 37397)
msgid = 48 : (u'<cidImage_FOT9241.jpg>', u'FOT9241.jpg', u'Lolcats Funny Pictures Of Cats With Captions.jpg', u'image/jpeg', 34529)

[REDACTED]

Processed/Stored 56 out of 57 Attachment hits

Processing Recipient table ...
55 Recipient hits found in store.vol

[REDACTED]

Recipients sorted by msgid ...
===================================
[REDACTED]
msgid = 42 : ('2015-11-11T19:36:25', u'+12345678900')
msgid = 48 : ('2015-11-15T21:47:53', u'+12345678900')
msgid = 49 : ('2015-11-15T21:51:32', u'+12345678900')

[REDACTED]

Processed/Stored 54 out of 55 Recipient hits

Processing Message table ...
19 IPM.MMS Message hits found in store.vol

MMS sorted by msgid ...
===================================
[REDACTED]
msgid = 48 : ('2015-11-15T21:47:53', '2015-11-15T21:47:52', u'+12345678900', 33, 72720)

[REDACTED]

Processed/Stored 19 out of 19 Message hits

Printing finalized table sorted by Timestamp2 ...
===================================================
Timestamp2 Msgid   Timestamp3 Phone   Flag    TotalSize   Type    Filesize    Filename0   Filename1   Filename2
[REDACTED]
2015-11-15T21:47:52 48 2015-11-15T21:47:53 +12345678900    33 72720   text/plain 52 <cidText_0> Text_0.txt Text_0.txt

2015-11-15T21:47:52 48 2015-11-15T21:47:53 +12345678900    33 72720   application/smil    742 <cidSmil>   Smil.txt    Smil.txt

2015-11-15T21:47:52 48 2015-11-15T21:47:53 +12345678900    33 72720   image/jpeg 37397   <cidImage_FOT66CB.jpg> FOT66CB.jpg Monkey throwing poo by SheriffBean on DeviantArt.jpg

2015-11-15T21:47:52 48 2015-11-15T21:47:53 +12345678900    33 72720   image/jpeg 34529   <cidImage_FOT9241.jpg> FOT9241.jpg Lolcats Funny Pictures Of Cats With Captions.jpg

[REDACTED]
Finished processing store.vol ... Exiting ...

Looking at the "finalized table" results at the end, we can see that each attachment's Timestamp2 values are all equal to 2015-11-15T21:47:52. This does not correspond with our earlier results from wp8-1-mms-filesort.py (where the JPEG files had different "Last Modified" times to the "smil" and text files). Perhaps the Timestamp2 values in store.vol were written before the actual .dat files were written around the 52-53 second boundary?
Anyway, this shows that there can be some discrepancy between the .dat files "Last Modified" times and the Timestamp2 values recorded in "store.vol". Hence the need for a meat based decision maker!
We can also see that we can use the Msgid (eg 48) to group common MMS attachments together.
Timestamp3 is a common value to both "Message" and "Recipient" tables and seems to be the timestamp quoted for sent/received MMS by commercial forensic tools.
The Type value of 33 indicates that this is Sent MMS.
The TotalSize figure of 72720 equals the sum of each attachment's Filesize (52 + 742 + 37397 + 34529 = 72720).
Filename1 is the alias used for each attached file in the "smil" layout file (eg FOT66CB.jpg, FOT9241.jpg).
We can see that for saved pictures which were then sent via MMS, the original filename appears in Filename2 (eg "Monkey throwing poo by SheriffBean on DeviantArt.jpg").
By using the Filename1 alias, Timestamps and individual file sizes, we should be able to match these store.vol results with the previous .dat results.
That is, there are 4 files listed from store.vol (Text_0.txt, Smil.txt, FOT66CB.jpg, FOT9241.jpg) which have corresponding matches in the HTML output table produced by wp8-1-mms-filesort.py.

For a given attachment, you might not find a file with the Filename2 name on a Windows Phone 8.10 device/SD card as it could be a received file and the Filename2 string was actually sourced from the sender.
In this case, the file will only exist as a .dat file (assuming it was not re-saved locally after reception).
Alternatively, after sending an MMS attachment, the user may delete it from the phone or it might have been sourced from an SD card. For a file stored on an SD card/another device, we should be able use the SHA256 hash calculated from the relevant .dat file to help confirm the external file's identity.

And here is the more conveniently formatted TSV output file contents from wp8-1-mms.py ...

Timestamp2    Msgid    Timestamp3    Phone    Flag    TotalSize    Type    Filesize    Filename0    Filename1    Filename2
[REDACTED]
2015-11-15T21:47:52    48    2015-11-15T21:47:53    +12345678900    33    72720    text/plain    52    <cidText_0>    Text_0.txt    Text_0.txt
2015-11-15T21:47:52    48    2015-11-15T21:47:53    +12345678900    33    72720    application/smil    742    <cidSmil>    Smil.txt    Smil.txt
2015-11-15T21:47:52    48    2015-11-15T21:47:53    +12345678900    33    72720    image/jpeg    37397    <cidImage_FOT66CB.jpg>    FOT66CB.jpg    Monkey throwing poo by SheriffBean on DeviantArt.jpg
2015-11-15T21:47:52    48    2015-11-15T21:47:53    +12345678900    33    72720    image/jpeg    34529    <cidImage_FOT9241.jpg>    FOT9241.jpg    Lolcats Funny Pictures Of Cats With Captions.jpg

Blogger formatting strikes again! Basically the TSV output is the same as the command line output but its easier to import into a spreadsheet.

For shiggles, the wp8-1-mms.py script was run against a Lumia 520 / Windows Phone 8.0 store.vol but it seems that the store.vol offsets used are different and so the script did not work as intended.
It is ass-umed this script will work with other Windows Phone 8.10 devices however because Monkey has to believe he didn't waste his time with some one-off scripts ... grrr!

Also please note that on Windows Phone 8 devices, the displayed Messaging timestamps do not list the seconds (only hours/minutes). So while we can say that Timestamp3 is accurate to the minute, we cannot definitively claim its accuracy in seconds.

Some Final Thoughts

The wp8-1-mms.py store.vol script relies on searching the table records for unique markers (eg "IPM.MMS", "@.SMS", "<cid") and then reading/storing the surrounding field values. That's why we can extract data without knowing the entire structure of each database record.
We could have used a third party library to query the store.vol database directly but this means users would have to install that library on their analysis PCs (which are usually isolated from the Internet). For those interested, Jon Glass has blogged about Python coding using Joachim Metz's libesedb and Alberto Solino's Impacket ESE libraries here.

Be aware there may be more .dat files than MMS attachment entries in "Data:\SharedData\Comms\Unistore\data\7" (eg Received email attachments and/or Draft text). Not every .dat file may have a corresponding MMS.

GPS and other EXIF metadata can be present on sent/received images/videos (depending on the sending phone's settings and/or if the original file had embedded metadata). This can help an analyst decide if a picture was originally taken with the target device and/or the time/location of the picture.

This post only looked at allocated MMS - deleted MMS is an area for further research. Upon MMS deletion, "Message" / "Attachment" / "Recipient" entries should be deleted/overwritten from "store.vol". However, ESE .log transaction files and/or pagefile.sys may still contain enough information from deleted MMS to recreate the "store.vol" records. Recovering the .dat file content and linking it to a MMS transaction would be more complicated however.

Sending a Location (via Messaging) does not utilize the "Attachment" table so these cannot be retrieved by the scripts in this post.
For Location messaging, an "IPM.SMStext" record entry is made in the "Message" table and it has a Windows Phone URL string in the content column ("0037001f"). Analysts can browse to that URL and view a map centered on the sent Location. This map also has a timestamp and what appears to be an accuracy radius.
The URL format looks like:
http://www.windowsphone.com/l1/ZZZ

Where ZZZ = 13 character random code. Both of our test samples had 13 character codes starting with "CI" but your mileage may vary.
Theoretically, it should be possible to scan a store.vol (or pagefile.sys) for these types of URL and if found, they can tie a device to a specific location and time (assuming you can prove they were sent SMS).

And so finishes another glorious Windows Phone 8.10 post! For some reason, the phrases "One trick monkey" and "One off scripts" seem to be reverberating in monkey's caffeine addled brain ... Good! ;)

↧

An Initial Peep at Windows 10 Mobile (Lumia 435)

April 25, 2016, 12:10 am

≫ Next: The Chimp That Pimps And An Introduction to e.MMC Flash Memory Forensics

≪ Previous: Windows Phone 8.10 MMS (for Lumia 530) ...

Ooh! Yeah, show me where you keep your store.vol you dirty winphone you!

At first glance, the Windows 10 Mobile GUI looks a lot like Windows Phone 8. This post will focus on some key mobile communication artifacts (Calls, Contacts, SMS, MMS, pictures/video) and hopefully excrete a few noteworthy nuggets (of information!) along the way.

Special Thanks to @TheHexNinja and our resident "Robotic Organic Soldering System" for their assistance in obtaining the test phone and data.
Unfortunately, as it is from a work phone, we cannot share the data (so please don't ask). However, if you have artifacts that you are researching, we may be able to check our limited data set to help confirm your findings.

We started with a lower priced 8 GB Nokia Lumia 435 Dual Sim (RM-1068) which came with Windows Phone 8.1 installed. We then updated to the latest version of Windows Phone 8.1 before upgrading to Windows 10 Mobile. A few days after upgrading, Microsoft released another Windows 10 Mobile update so we updated again to version 10.0.10586.218. The initial Win 10 upgrade process took 2-3 hours and seems to have left the previous Windows Phone 8.1 directories intact but with zero sized files (at least for the files it no longer seems to use).

After populating the phone with a troglodytic hermit's amount of test data, a chip off was performed and the data read into a binary image file.
FTK Imager (free) and X-Ways Forensics (commercial) were then used to view/browse the data.
OSForensics ESEDB Viewer (trial) and Nirsoft's ESEDatabaseView (free) programs were used to view ESE databases. Both of these were recently updated for handling Windows 10 ESE databases.
MS Calculator and DCode (free) were used to translate MS FILETIME values.

Similarly to Windows Phone 8, there were 27 partitions. The last 2 partitions were the only ones larger than 32 MB:
- MainOS [1543 MB]
- Data [5783 MB]
These partitions were formatted to use NTFS.

The phone was not encrypted by default. There is a Device Encryption option in the Settings, System menu and Windows 10 Mobile devices can also be enrolled in a Mobile Device Management scheme (ie remotely enforce IT policies such as data encryption and/or content protection).
There is also a "Find My Phone" device setting to "Locate, ring, lock and erase your device from account.microsoft.com/devices".

By default, there is no PIN set but if you want to encrypt the device, a PIN must be set. A brief look at the SOFTWARE hive (the Registry is still stored under MainOS:\Windows\system32\config) shows that Win10 Mobile does not use the same PIN hashing mechanism as seen in Windows Phone 8 (see Francesco Picasso's post here ) ... D'Oh!

Here's a pic of our lovely Windows 10 Mobile ~~incubator~~ Assistant ... One of them looks a little green to me!

It seems Microsoft/Nokia went for the "You're NOT gonna lose THIS phone!" colouring scheme.

The Lumia 435has 1 GB RAM and 8 GB flash storage (all combined on the one chip). There were no In-Service-Programming ports found, so for a physical acquisition, it was chipoff or nothing.
We inserted our own FAT32 formatted 4 GB SD card along with a GSM SIM card. The default settings are to install to the internal memory but upon inserting the SD card (and restarting), the user is prompted to choose where to save various types of data (either to SD card or internal memory). If Apps are set to install to the SD card, their data is obfuscated/encrypted. Twitter was installed from the MS App Store to the SD card and while some filenames were visible, the actual file contents were not in plain text.
The device ~~is too cheap to~~ does not support Windows Hello biometric (face/fingerprint/iris) unlocking.

SMS and MMS

Microsoft have created a default Messaging app (one app per SIM card) which conveniently aggregates SMS, MMS and Skype chats. These records are stored in a store.vol ESE database under:
Data:\Users\DefApps\APPDATA\Local\comms\UnistoreDB\

Unfortunately, the Messaging app only displays times with a minute resolution. In store.vol however, records are timestamped using 8 byte MS FILETIME (the number of 100 nanosecond intervals since 1JAN1601).

SMS content is viewable from store.vol's "Message" table.
Because this table has tens of columns, it is not yet known how similar it is to Windows Phone 8.1.
Flag values seem consistent between Windows Phone 8 and 10 but if columns have been added/removed, this will affect the offsets between data fields (and hence any previous Windows Phone 8 scripts). Due to time constraints, we haven't been able to confirm if these offsets have changed so any diagrams in this post will purposely leave out offset information.

Here's some diagrams showing the key SMS fields ... For readability, several fields have been omitted (marked by yellow strips).

Received SMS "Message" table record

Sent SMS "Message" table record

Note1: The weird hexadecimal strings above each box is the actual column name as observed in the "Message" table. The names in the boxes are human readable strings made up by this monkey to keep track of things.
Note2: The PHONE fields are blank for Sent SMS

For Sent/Received SMS messages, the "Message" table's "001a001f" column is set to IPM.SMStext.
Other possible values are:
- IPM.SMSText.SIM (possibly SMS stored on SIM)

- IPM.MMS (for MMS)

- IPM.Note (for email)
- IPM.MSG (for chat and drafts)

Draft messages have a JSON encoded text string containing the body text in column "0037001f" (TEXT field).
Draft messages also have their "0e070013" column (FLAG field) set to 44 (decimal).

Potential values for the Message record FLAG field are:
- Sent = 33
- Received unread = 0
- Received read = 1
- Draft = 41

To find the destination phone number for sent SMS/MMS, we can use the Message record's MSGID field to find the corresponding Recipient record ("Message" table's "00010003" column = "Recipient" table's "20040013" column). The Recipient record contains the destination phone number (PHONE field) as seen below.

"Recipient" table record for SMS/MMS

Alternatively, we can match a Message record's timestamp "0e060040" column (FILETIME2) to a Recipient record's "0e060040" (FILETIME) column. This was the method we used in our previous Windows Phone 8 scripts.

We also found XML like strings depicting SMS content in:

Data:/ProgramData/MICROSOFT/SMSROUTER/MessageStore/SmsInterceptStore.db
This ESE database is new in Windows 10 and appears to be related to the SmsRouter service. My Google-Fu was not strong enough to find much on this service. If it does see/store all SMS, then it may provide access to SMS which have been otherwise deleted from store.vol.
OSForensics ESEDB Viewer had difficulty parsing the SmsInterceptStore.db "Messages" table (showed no entries). However, NirSoft ESE DatabaseView returned 1 entry.
Searching SmsInterceptStore.db with WinHex for the UTF16-LE string "MessageType>Text" found 3 instances of SMS versus the one instance returned by Nirsoft ESE DatabaseView. The extra records may have been deleted from the database but still resident in the file.
The SMS contents seems to be complete (ie they are not snippets).

MMS also store records in store.vol's "Message" and "Recipient" tables. Additionally, MMS store attachment information in the "Attachment" table and the attached files and MMS message text are stored under Data:\SharedData\Comms\Unistore\data\7\

We can use the filesystem "Date modified" time of the attachment files to sort and assign attachments to their corresponding MMS.

Here's a diagram showing the MMS relationship (not all fields shown):

The Windows 10 Mobile MMS relationship isn't exactly clear ... Paging Dr Phil!

Every MMS will have a "Message" table entry. If it's a sent MMS, there will be a corresponding "Recipient" table entry (containing the Destination Phone number and FILETIME3).
Every sent/received MMS Message will have at least 2 "Attachment" table entries - one for a "smil" message layout XML file and one for each attached file (eg one for each dick pic attached). If there's any text in the MMS, there will also be a "Text_0.txt" entry.
Each file attachment will be stored in its own .dat file which will have a similar filesystem "Last Modified" timestamp to the "smil" layout .dat file. This "Last Modified" timestamp will approximately equal the relevant MMS's FILETIME2 value (within 1-2 seconds).

So to locate all MMS, we search the "Message" table for any records with the column "001a001f" set to IPM.MMS.
When we find an MMS, we note the MSGID value (column "00010003" value) and the FILETIME2 value (column "0e060040" value).
We take that MSGID value and search for a match in the "Attachment" table's column "20040013".
There should be at least 2 entries - one for the attachment, one for the "Smil.txt" markup file. There may also be one for MMS text.
We can also use the FILETIME2 value to find .dat files with a similar "Last Modified" timestamp under:
Data:\SharedData\Comms\Unistore\data\7\
A bit messy but it seems to work for our limited test data ...
For further details on MMS hunting, check out our previous Windows Phone 8.1 MMS post here.

Contacts

Contact information can be accessed via the default People app. The contact info is stored in store.vol's "Contact" table. The first row entry corresponds to the device owner.
Here's a diagram showing some relevant fields. Its probably incomplete due to our basic test data but it can still give you an indication of the types of data stored. And the duplication ... *whispers* duplication!

"Contact" table record structure (incomplete)

Using a hex editor, we can see that each Contact record ends with the binary sequence 01040000008200E00074C5B7101A82E008. This is the same magic number observed for Windows Phone 8.1 contact records and a good way to way locate Contact records from raw hex (eg from ESE transaction log files or the pagefile at Data:\pagefile.sys).

We also observed 2 pipe separated plain text contact lists stored in:
\Users\DefApps\APPDATA\Local\Packages\Microsoft.People_8wekyb3d8bbwe\LocalState\contacts.txt
and
\Users\DefApps\APPDATA\Local\Packages\Microsoft.People_8wekyb3d8bbwe\LocalState\pcontacts.txt
These contained non-Skype contacts.

Call History

The call history can be accessed via the Phone app (for an overall summary) and/or the People app (per contact call history). There is one Phone app available per SIM.
The source of the call history appears to be the store.vol's "Callhistory"table.
Here's a diagram showing some relevant fields. Unlike other tables, "Callhistory" actually uses human readable column names!

"Callhistory" table record

Inbound Private numbers do not appear in the "RawNumber" or "RawNumberHash" or "ResolvedNumber" columns.
Missed Calls have the same "StartTime" and "EndTime" and their "Type" field set to 2.
The "Line" column contains a GUID (eg {B1776703-738E-437D-B891-44555CEB6669} ) for the phone line which made/received the call. On a device with multiple SIM cards, there may be than one GUID. From previous Windows Phone 8 observations, the example GUID seems constant across several devices and is a good way to locate Callhistory records in hex dumps (GUID string is encoded UTF16-LE).
Our test Skype calls had the "Line" value set to "Microsoft.Messaging_8wekyb3d8bbwe" instead of the GUID.

Pictures / Video

The Data:\Public folder appears to be the default location for saved pictures/video. The Data:\Public folder can contain documents, sound recordings, downloaded files, music, saved pictures/video and ringtones.

We found camera pictures/video under Data:\Users\Public\Pictures\Camera Roll\
eg1 WP_20160414_09_08_08_Pro.jpg
eg2 WP_20160414_09_46_55_Pro.mp4

We were able to save pictures from the MS Edge browser under
Data:\Users\Public\Downloads
and
Data:\Users\Public\Pictures\Saved Pictures

Other possible storage locations for pictures/video include:
- the user's OneDrive cloud storage.
- the user's SD card. From our testing, by default, media files saved to the SD card are not encrypted.
- the OneNote app.

Final Thoughts

Since we dumped our 435, Microsoft has issued another Win 10 Mobile update so the information in this post may have changed.

In addition to store.vol, remember to check any ESE transaction log files (eg USStmp.log) for data which maybe no longer be present in store.vol. This will probably involve using a hex editor to search for various unique strings or values.

Plugging the 435 into a PC via USB will let the PC user see the Data:\Public folder. This can contain documents, sound recordings, downloaded files, Music, saved pictures, saved video and ringtones. However, Windows 7 did not recognise the 435 as a physical drive when (for shiggles) we tried imaging it via FTK Imager over USB.

Also bundled with Windows 10 Mobile are the following apps - OneDrive, Skype, MS Edge Browser, Facebook, Excel, Word, Powerpoint, OneNote and Cortana.
Cortana was not activated for this testing as there were difficulties activating it for our region (Australia).
Skype chats were observed in:
- store.vol
- the Messaging SQLite database at Data:\Users\DefApps\APPDATA\Local\Packages\Microsoft.Messaging_8wekyb3d8bbwe\LocalState\XXX\main.db
- the Skype SQLite database at Data:\Users\DefApps\APPDATA\Local\Packages\Microsoft.SkypeApp_kzf8qxf38zg5c\LocalState\XXX\main.db

Note: XXX= Mixed string containing the User's Microsoft Account ID.

Users don't need the SIM inserted to view Messaging, Outlook, Skype and Facebook apps. They can also view those apps when in Airplane mode although Outlook will not synch so may not display any messages.

For more information on Windows 10 Mobile please read the following:
Windows 10 Mobile security guide
and
Secure boot and device encryption overview

Interestingly, the second reference mentions hardware providers blowing any JTAG fuses before the device leaves the factory. Ahem ... This suggests that retailed Windows 10 Mobile devices will not be readable via JTAG.
Additionally, given the lack of In Service Programming (ISP) ports on the budget level Lumia 435, it is likely that the more expensive Lumias will also not expose any ISP points.
This could mean that chipoff may be the only option for physical acquisition of Windows 10 Mobile Lumias.
But don't take this Sky-Is-Falling Monkey's word - have a go yourself and let the community know if you find an alternative to chipoff.

Overall, it's nice that most of the artifacts we looked at have not changed THAT much since Windows Phone 8. The store.vol file is still the key artifact especially since they amalgated the Windows Phone 8 "Phone" database into it for Windows 10 Mobile. I'm thinking it shouldn't be too difficult to update our Windows Phone 8 scripts to handle Windows 10. Ah yes, "shouldn't be" ... methinks getting comprehensive test data will be the main bottleneck - especially if chipoff is required.

↧

The Chimp That Pimps And An Introduction to e.MMC Flash Memory Forensics

May 14, 2016, 3:56 pm

≫ Next: Panel Beaten Monkey

≪ Previous: An Initial Peep at Windows 10 Mobile (Lumia 435)

Pimpin Ain't Easy?

SANS is offering the top 3 referrers to its DFIR Summit 2016 website, an Amazon Echo smart speaker.
As of 11 May 2016, this Chimpy McPimpy was number 5 on the list.
Chimpy would very much like to win an Echo (echo, echo) so he can take it apart and share what forensic artifacts are left on the device.

The Echo is a smart speaker that can listen out for voice commands, play music, search the Internet and control Internet Of ~~Shitty~~ Things. Apparently, more than 3 million have been sold in the US since 2014.

Here's a (pretty meh) Superbowl commercial demonstrating some of the Echo's capabilities:

And here's the Wikipedia entry for the Amazon Echo just so monkey doesn't have to regurgitate any further (I already have enough body image issues).

The folks at Champlain College have also recently blogged about their Amazon Echo forensic research (here, here and here).
They have a report due out this month (May 2016).
From what this monkey can ascertain, their research focuses on network captures and the Amazon Echo Android App side of things. They also mentioned looking into "chipping off" the device but I'm not sure if this was a core part of their research as it wasn't mentioned in later posts.

So Monkey is proposing this - (if you haven't already) please followthis link to the SANS DFIR Summit website and if monkey manages to win an Amazon Echo, he will blog about getting to that sweet, sweet, echoey data from the internal Flash memory. See here and here for some background on Flash memory.

How do we know it uses Flash memory?
The awesome folks at ifixit.com have already performed a teardown which you can see here.

From ifixit.com's picture of the logic board (below), we notice the Flash memory component bearing the text SanDisk SDIN7DP2-4G (highlighted in yellow).

Amazon Echo's Logic board

Searching for the Flash storage component(s) on most devices (eg phones, tablets, GPS, answering machines, voice recorders) starts with Googling the various integrated circuit (IC) chip identifiers. The Flash memory component is normally located adjacent to the CPU (minimizes interference/timing issues).
In this case, the ifixit.com peeps have helpfullyidentified/provided a link to the 4 GB SanDisk Flash memory chip.
But if we didn't have that link, we would try Googling for "SanDisk SDIN7DP2-4G" and/or "SanDisk SDIN7DP2-4G +datasheet" to find out what type of IC it was.
According to this link - for the 4th quarter of 2015, Samsung's NAND revenue (33.8%) led Toshiba (18.6%), SanDisk (15.8%), Micron (13.9%), SK Hynix (10.1%) and Intel (8%). Other (smaller) manufacturers such as Phison, Sony, Spansion were not mentioned. Not sure how accurate these figures are but if you see one of these manufacturers logos/name on a chip, you have probably found a NAND memory chip of some kind (eg Flash, RAM).

Anyhoo, from the link that ifixit.com provided we can see the following text:

SDIN7DP2-4G,153FBGA 11.5X13 e.MMC 4.51

Here's what it all means:
153 FBGA (Fine pitched, Ball Grid Array) means there are 153 pin pads arranged in a standard way.
The 11.5X13 refers to the chips dimensions in millimetres.
The e.MMC 4.51 tells us the chip adheres to the Embedded Multi-Media Card (e.MMC) standard (version 4.51) for NAND Flash chip interfacing. We will discuss the e.MMC standard a little further on.

To double check ifixit.com's data link, we did some Googling and found this link which seems to confirm from multiple sites that the SanDisk Flash chip is 153 FBGA and 11.5 x 13.
Ideally, we would have found the actual datasheet from SanDisk but sometimes you just gotta make do ...

It is also worth noting that not all Flash memory chips are e.MMC compatible. Some devices may use their own proprietary NAND interface. Some chips might be NOR Flash (eg Boot ROM) and thus not really relevant to our quest for user data.
Additionally, the latest Flash memory chips may follow a newer (faster, duplex) standard called Universal Flash Storage (UFS). See here for more details on UFS.
So while it appears the days of e.MMC chips are numbered, there's still a LOT of e.MMC storage devices out there that can be potentially read.

When reading Flash storage for forensics, some key considerations are:
- Does it follow the e.MMC standard?
- Chip pin arrangement (number of pins and spacing)
- Chip dimensions (typically in mm)

The e.MMC standard is used by Flash memory chip manufacturers to provide a common infrastructure / command set for communicating. This way a board manufacturer can (hopefully) substitute one brand of eMMC chip with another brand (probably cheaper) of the same capacity. The standard focuses on the external eMMC chip interfacing and not the internal NAND implementation (which would be manufacturer specific). Having a e.MMC Flash chip makes reading a whole lot easier.

But don't just listen to me, JEDEC - the folks responsible for the eMMC standard (and UFS), state :

"Designed for a wide range of applications in consumer electronics, mobile phones, handheld computers, navigational systems and other industrial uses, e.MMC is an embedded non-volatile memory system, comprised of both flash memory and a flash memory controller, which simplifies the application interface design and frees the host processor from low-level flash memory management. This benefits product developers by simplifying the non-volatile memory interface design and qualification process – resulting in a reduction in time-to-market as well as facilitating support for future flash device offerings. Small BGA package sizes and low power consumption make e.MMC a viable, low-cost memory solution for mobile and other space-constrained products."

To get a copy of the e.MMC standard (free registration required), check out this link.

The e.MMC standard document provides this helpful diagram:

JEDEC e.MMC Electrical Standard v5.1

From this we can see that a "Device controller" handles any interfacing with the actual NAND storage ("Memory Array"). This includes things like reading/writing to NAND, paging, TRIM, error correction, password protection.

There are 4 signals/pins required when reading an e.MMC memory:
- CLK = Synchronizes the signals between the e.MMC chip and the "Host Controller" (ie CPU of device)
- CMD = For issuing commands/receiving command replies from/to the "Host Controller"
- DATA0 = For receiving the data at the "Host Controller"
- VCC / VCCQ = Power for the NAND memory / Power to the Device Controller. In some cases, this can be the same voltage (1.8 V)
- GND / VSS = Ground

It is not a co-incidence that these connections are also required for In-System Programming (ISP) Forensics. But that is probably a topic more suitable for a Part 2 (hint, hint).

We can see these pins labelled in this ForensicsWiki diagram of a BGA 153 e.MMC chip

BGA-153 Layout

Note: ForensicsWiki have labelled it as BGA169 but it does not show the extra 16 (typically unused) pins. Count the number of pins (I dare you!) - there's only 153. At any rate, our target SanDisk chip should look like the BGA153 diagram above. Most of the pins are unused / irrelevant for our reading purposes.
The ever helpful GSMhosting site shows us what a full BGA 169 looks like:

BGA-169 Layout - the extra 16 pins comprise the 2 arcs above/below the concentric squares

Other pin arrangements we've seen include BGA162/186 and BGA/eMCP221. Some Flash chips are combined in the same package as the RAM. These are called eMCP (Multi-Chip Package).
Control-F Digital Forensics have blogged an example list which matches some common devices with their e.MMC pin arrangement/size. They also note that the pitch (spacing between pins) for the previously mentioned layouts is 0.5 mm.

So here's what BGA-162 looks like:

BGA-162 Layout (Source: http://forum.gsmhosting.com/vbb/11016505-post9.html)

Anda BGA/eMCP221 looks like:

BGA/e.MCP221 Layout (Source: http://forum.gsmhosting.com/vbb/11260019-post6.html)

Final Thoughts

Due to e.MMC standardisation, reading the data off an e.MMC Flash chip should be straight forward and repeatable - which is great for forensics. Interpreting the subsequent data dump artifacts is usually a more challenging task.
The e.MMC Flash memory content discussed in this post applies equally to Smartphones, Tablets etc.

UPDATE: For even more details on Flash Memory Forensics, check out the following papers:
Forensic Data Recovery from Flash Memory
By Marcel Breeuwsma, Martien de Jongh, Coert Klaver, Ronald van der Knijff and Mark Roeloffs
SMALL SCALE DIGITAL DEVICE FORENSICS JOURNAL, VOL. 1, NO. 1, JUNE 2007

and

Theory and practice of flash memory mobile forensics (2009)
By Salvatore Fiorillo
Edith Cowan University, Western Australia

The paper by Breeuwsma et al. is probably THE paper on Flash memory Forensics.

Please don't forget to click on this link so Monkey can get his ~~Precious~~ Amazon Echo. You might like to do it from a VM if you're worried about security.
If, for whatever reason, monkey doesn't get an Echo - it's no big deal. Just thought it would make for an interesting exercise as we head towards the Internet of Lazy Fatties ... At the very least, we have learnt more about performing e.MMC Flash memory forensics.

In other news, in June 2016, this monkey will be:
- Attending his first SANS DFIR Summit
- Speaking on a "Innovation in Mobile Forensics" panel with Cindy Murphy, Heather Mahalik , Andrew Hoog and Chris Crowley. Monkey is still pinching himself about joining the collective brain power of that panel *GULP*
- Facilitating/Rockin' the Red Apron for SANS FOR585 Advanced Smartphone Forensics with Cindy Murphy (just after the DFIR Summit)

So if you see me around (probably hiding behind/near Cindy or Mari DeGrazia), feel free to say hello and let us know if this blog site has helped you ... I promise I'll try not to fling too much shit (while you're facing me anyway. Hint: Keep eye contact at all times!).

As always, please feel free to leave feedback regarding this post in the comments section below.

↧

Panel Beaten Monkey

July 3, 2016, 2:51 pm

≫ Next: A Timestamp Seeking Monkey Dives Into Android Gallery Imgcache

≪ Previous: The Chimp That Pimps And An Introduction to e.MMC Flash Memory Forensics

FYI: A "Panel Beater" = Auto body mechanic in Monkeytown-ese

This Monkey was recently invited to ~~shit himself~~ sit on a SANS DFIR Summit panel discussing Innovation in Mobile Forensics with an All-Star cast of Andrew Hoog, Heather Mahalik, Cindy Murphy and Chris Crowley. While it rated well with the audience, personally (because its all about THIS monkey!) - it seemed that whenever I thought of something relevant, another panel member chirped up with a similar idea and/or the discussion moved on to the next question.
I felt it was kinda difficult to contribute something meaningful yet concise in a 30 second sound bite. Especially for my first open question speaking gig.
Monkey might need to decrease his deferential politeness and/or increase his use of assertive poo flinging in future panel discussions. Alternative suggestions are also welcome in the comments :)

Here's the synopsis of the panel from the DFIR Summit Program ...

Puzzle Solving and Science: The Secret Sauce of Innovation in Mobile Forensics
In today’s world, technology (especially mobile device technology) moves at a much faster pace than any of us can keep up with, and available training and research doesn’t always address the problems we encounter. As forensic examiners we face the daily challenges of new apps, new, updated and obscure operating systems, malware, secure apps, pass code and password protected phones, encoding and encryption problems, new artifacts, and broken hardware in order to obtain the evidence we need in a legally defensible and forensically sound manner. In this session, learn from consistent and experienced innovators in the mobile forensics field the tips, tricks, and mindset that they bring to bear on the toughest problems and how to move beyond cookie cutter forensics towards an approach that allows you to successfully solve and own problems others might consider too hard to even try.

Anyhoo, the initial concept was to have several one word themed slides and discuss how these traits can help with innovation in mobile forensics.
Due to a panel format change, the original slides didn't get much play time so monkey thought he'd run through them now and present his thoughts with a focus on advice for those newer to mobile forensics. Some of the points made here may have been mentioned during the panel by other speakers but at least here I have time to elaborate and present my point of view. Bonus huh?

Now let's meet the panel ... Can you tell that we went for a superhero introductory theme?

Heather Mahalik!

Cindy Murphy!

Chris Crowley!

Andrew Hoog!

Cheeky4n6monkey!

And now onto the rest of the slides ...

Curiosity

This is what attracts most of us to forensics. How does "Stuff" work and given a set of resultant data, how can we reconstruct what happened?
Documenting your curiosity (via blog post, white paper, journal article) is a great way of both sharing knowledge with the community and demonstrating your ability to research and think independently.
In mobile forensics, curiosity will usually lead to hex diving especially when hunting for new artifacts.
Curiosity naturally leads to "Squirrel chasing" where one interesting artifact can lead you to many others. So you might start out with one focus and end up discovering a bunch of cool artifacts.

Creativity

Our ability to create solutions depends on our paint set. The wider array of skills you have as a mobile forensic examiner, the more creative you can be - especially as mobile devices are a combination of both hardware and software.
For inspiration, background knowledge and anticipating future trends, read research papers, blogs, books, patents, mobile device service manuals/schematics and industry standards (eg eMMC JEDEC standard). Knowing the background details today will help you analyze tomorrow's device.
Start with a popular make/model and learn how a device works. Go to ifixit.com and the FCC website for pictures of device breakdowns. Read up on how eMMC Flash memory devices work. You don't have to be able to MacGyver a mobile device on a desert island but familiarize yourself with the fundamental concepts (eg eMMC memory has a NAND Controller acting as the interface to the actual NAND memory).
Look at how an SQLite database is structured. Most apps rely on these types of databases to store their data. The official website is a great place to start.
Develop/practise skills in soldering, chipoff, network forensics, malware reverse engineering, scripting for artifacts.
Know how to find/make/use automated tools. Tools can be used as intended/documented (eg NetworkMiner to read .pcaps) or in more novel ways (eg use an Android emulator to create app artifacts and save on rooting test devices/acquisition time).

Scientific Method

As mobile devices change (use of devices, underlying hardware, encryption, new apps/OS artifacts) we need to be able to record our observations in a structured, repeatable way and be able to communicate our findings to others.
The best way is to create your own data on a test device using a documented set of known actions. As Adam Savage from Mythbusters says: "Remember, kids, the only difference between screwing around and science is writing it down".
Also, as Mari Degrazia (and Meoware Kitty) showed us at the DFIR Summit, you should also "Trust But Verify" your tools.

Perseverance

Don't let failure discourage you if/when it comes.
You may need to use a different technique or change your assumptions. Or wait for new developments by someone else and revisit.
There may be more than one solution. Evaluate which is better or worse. The faster method is not always the most comprehensive.
You are not alone. Chances are someone else in the community may have the keys to your problem. Ask around Twitter, forensic forums and your professional network.

Teamwork/Collaboration

No one monkey knows ALL THE THINGS.
I find it helpful to email a trusted group of mobile forensic gurus and describe what I am seeing. Even if they are not able to help directly, it forces me to structure my thinking and help me question my approach.
Having a trusted group you can bounce ideas/findings off helps both yourself and potentially everyone in the group who may not have the time to otherwise investigate. The increased pool of experience and potential access to more varied test data are added bonuses as well. There is also an inherent double checking of your analysis.
Communicate your ideas often. Even if you start feeling like a spam monkey, realize that people can come up with amazing ideas/suggestions when prompted with the right stimulus.
Share your innovation with the community - they may be able to help you improve it and/or adapt it for another purpose that you never would have thought of.

Choose your team wisely though. There are some "One way transaction" types who you can help and then never hear from again. Be aware that it is a small community and word does get around about potential time wasters/bullshitters.
Alternatively, you might be contacted by some rude farker after some free advice/labour - eg "You seem like you know what you are doing. Here's my problem ..."
Realize that being polite/considerate goes a long way to building the required level of trust. Recognize that you are probably asking someone to give up their free time for your cause.
Give team mates a default "opt out" of receiving your spam. For example, "If you wish to keep receiving these types of emails, please let me know. Otherwise, Thankyou for your time." and if you don't hear back, stop sending shit. Most people in forensics will be keen to discover new artifacts/research but be sure to try to organize your thoughts before hitting send.

Manage people's expectations. If you don't know or are not sure - it is better to under promise and over deliver later. Don't feel bad about saying "I don't know" or "I'm currently working on other things and don't have the time right now".

Luck

I believe that you can make your own "Luck" through being prepared when the opportunity presents itself.
For example, I had difficulties landing a forensics job after finishing my graduate studies in Forensic Computing. The market here in Monkeytown was relatively small compared to the US.
Through personal research projects that I blogged about and multiple US internships, I was able to land a rare and Monkeytown based forensic research dream job for which I am still counting my blessings. Having a documented prior body of work helped make the recruitment process so much easier (it also helped that there were technical people in charge of the recruiting).
Pure forensic research jobs seem to be rare in this industry - most positions seems to require a significant element of case work/billable hours. So I really appreciate the ability to pick an area or device and "research the shit out out of it".

On the other hand, occasionally in a case, you can have some plain old good fortune such as when Cindy Murphy and I were looking at a Windows Phone 8 device and we found an SMS stating "Da Code is ..." (which ended up being the PIN code for the phone).

Questions?

I just included this slide because I think it was one of my better 'toons in the slide deck :)

Final Thoughts

Physical fitness and rest are also important factors in staying creative. In the past, I've had some difficulties sleeping which obviously had an adverse affect on my work. A light regimen of regular exercise (15 minutes x 3 times per week) on the stationary bike has worked wonders on my tiredness levels and aerobic fitness. The paunch still remains a work in progress however ;)
For those interested, check out Dr Michael Mosely and Peta Bee's excellent research book on High Intensity Training (HIT) called FastExercise. It shows how you don't have to spend a huge amount of time at the gym to start seeing some immediate health benefits.

So long as you remain committed to learning, the innovation will come. Don't sweat about the non creative periods.

Learning to script is a good way of forcing you to understand how data is stored at the binary level. Python is a popular choice in forensics for its readability, many existing code libraries and large user base.

A library of "most likely to be encountered" test devices can help you to create before/after reference data sets to validate your research. These may be sourced privately from online (eg eBay) or from previous cases.

When public speaking, I have to learn to project my voice more. Elgin from the SANS AV crew kindly took the time after the panel to advise me to speak more from the diaphragm in the future. Concrete feedback like this is the best way to improve my speaking ability. Having said that, maybe monkey also needs to dose up on the caffeine before the next panel so he can react quicker/with more urgency. I'm guessing experience is the best teacher though.

The 2016 SANS DFIR Summit Presentation Slides are now available from here. Get them while they're hot!

Special Thanks to Jennifer Santiago (Director of Content Development & SANS Summit Speaker Wrangler) for her patience in dealing with this first time speaker/panellist.
Special Thanks also to my fellow panellists Andrew, Chris, Cindy and Heather for welcoming this monkey as a peer rather than a curiosity.

Not to get all heavy and philosophical on you but I found this quote that pretty much sums up my thoughts on innovation. It is from Nguyen Quyen who apparently was a Vietnamese Anti-French Colonist from the early part of the 20th Century. Ain't Google great?

"Successful innovation is not a single breakthrough. It is not a sprint. It is not an event for the solo runner. Successful innovation is a team sport, it's a relay race."

Good luck quoting that on a panel and not sounding like a complete wanker though ;)

If anyone has some suggestions for how I can improve my panel talking skills or would like to share some tips on innovation in mobile forensics, please leave a comment. Thanks!

↧

A Timestamp Seeking Monkey Dives Into Android Gallery Imgcache

July 28, 2016, 11:54 pm

≫ Next: Google S2 Mapping Scripts

≪ Previous: Panel Beaten Monkey


Are you sure?! Those waters look pretty turdy ...

UPDATE 4AUG2016: Added video thumbnail imgcache findings and modified version of script for binary timestamps.

Did you know that an Android device can cache images previously viewed with the stock Gallery3D app?
These cached images can occur in multiple locations throughout the cache files. Their apparent purpose is to speed up Gallery loading times.
If a user views an image and then deletes the original picture, an analyst may still be able to recover a copy of the viewed image from the cache. Admittedly, the cached images will not be as high a quality as the original, but they can still be surprisingly detailed. And if the pictures no longer exist elsewhere on the filesystem - "That'll do monkey, that'll do ..."

The WeAre4n6 blog has already documented their observations about Android imgcache here.
So why are we re-visiting this?
We were asked to see if there was any additional timestamp or ordering information in the cached pictures. If a device camera picture only exists in the Gallery cache, it won't have the typical YYYYMMDD_HHMMSS.JPG filename. Instead, it will be embedded in a cache file with a proprietary structure and will need to be carved out. These embedded cached JPGs do not have any embedded metadata (eg EXIF).
An unnamed commercial phone forensics tool will carve the cached pictures out but it currently does not extract any timestamp information.

Smells like an opportunity for some monkey style R&D eh?
Or was that just Papa Monkey's flatulence striking again? An all banana diet can be so bittersweet :D

Special Thanks to:
- Terry Olson for posting this question about the Gallery3D imgcache on the Forensic Focus Forum and then kindly sharing a research document detailing some imgcache structures.
- Jason Eddy and Jeremy Dupuis who Terry acknowledged as the source of the research document.
- LSB and Rob (@TheHexNinja) for their help and advice in researching the imgcache.
- Cindy Murphy (@CindyMurph) for sharing her recollections of a case involving imgcache and listening to this monkey crap on.
- JoAnn Gibb for her suggestions and also listening to this monkey crap on.

Our main test devices were a Samsung Galaxy Note 4 (SM-910G) and a Galaxy Note 4 Edge (SM-915G) both running Android 5.1.1.

Our initial focus was the following cache file:
userdata:/media/0/Android/data/com.sec.android.gallery3d/cache/imgcache.0

After an image is viewed fullscreen in the Gallery app, imgcache.0 appears to be populated with the viewed picture plus six (sometimes less) other images. It is suspected the other cached pictures are chosen based on the display order of the parent gallery and will be taken from before/after the viewed image. If a picture is found in this cache file, it is likely that the user would have seen it (either from the parent gallery view or when they viewed it fullscreen).
From our testing, this file contains the largest sized cached images. From the filesystem last modified times and file sizes, it is suspected that when the imgcache.0 file reaches a certain size, it gets renamed to imgcache.1 and newly viewed images then start populating imagecache.0. Due to time constraints, we did not test for this rollover behaviour. By default, the initial imgcache.0 and imgcache.1 files appear to be 4 bytes long.

Also in the directory were mini.0 and micro.0 cache files which contained smaller cached images. Similarly to imgcache.0, these files also had .1 files.

mini.0 contains the smallest sized, square clipped, thumbnail versions of the cached images. They appear to be similar to the images displayed from the Gallery preview list that is shown when the user long presses on a fullscreen viewed Gallery image.

micro.0 contains non-clipped images which are smaller versions of the images in imgcache.0 but larger in size than the images in mini.0. These appear to be populated when the user views a gallery of pictures. Launching the Gallery app can be enough to populate this cache (likely depends on the default Gallery app view setting).

imgcache.0 has been observed to contain a different number of images to mini.0 or micro.0. It is suspected this is due to how the images were viewed/previewed from within the Gallery app.

Other files were observed in the cache directory but their purpose remains unknown. eg imgcache.idx, micro.idx, mini.idx were all compromised mainly of zeroed data.

UPDATE 4AUG2016:
A device video was also created/saved on the test device and displayed via the Gallery app. A corresponding video thumbnail was consequently cached in the imgcache.0, mini.0 and micro.0 files. These video cache records were written in a slightly different format to the picture cache records.

The imgcache structure

Based on the supplied research document and test device observations, here's the record structure we observed for each Galaxy Note 4 “imgcache.0” picture record:

Record Size (4 Byte LE Integer) = Size in bytes from start of this field until just after the end of the JPG
Item Path String (UTF16-LE String) = eg /local/image/item/
Index Number (UTF16-LE String) = eg 44
+ separator (UTF16-LE String) = eg +
Unix Timestamp Seconds (UTF16-LE String) = eg 1469075274
+ separator (UTF16-LE String) = eg +
Unknown Number String (UTF16-LE String) = eg 1
Cached JPG (Binary) = starts with 0xFFD8 ... ends with 0xFFD9

The cached JPG is a smaller version of the original picture.
The Unix Timestamp Seconds is referenced to UTC and should be adjusted for local time. We can use a program like DCode to translate it into a human readable format (eg 1469075274 = Thu, 21 July 2016 04:27:54. UTC).
The Index Number seems to increase for each new picture added to the cache and may help determine the order in which the picture was viewed.

There are typically 19 bytes between each imgcache.0 record. However, the first record in imgcache.0 usually has 20 bytes before the first record’s 4 byte Record Size.
The record structure shown above was also observed to be re-used in the “micro” and “mini” cache files.

UPDATE 4AUG2016:
Here's the record structure we observed for each Galaxy Note 4 “imgcache.0” video thumbnail record:

Record Size (4 Byte LE Integer) = Size in bytes from start of this field until just after the end of the JPG
Item Path String (UTF16-LE String) = eg /local/video/item/
Index Number (UTF16-LE String) = eg 44
+ separator (UTF16-LE String) = eg +
Unix Timestamp Milliseconds (UTF16-LE String) = eg 1469075274000
+ separator (UTF16-LE String) = eg +
Unknown Number String (UTF16-LE String) = eg 1
Cached JPG (Binary) = starts with 0xFFD8 ... ends with 0xFFD9

The Unix Timestamp Milliseconds is referenced to UTC and should be adjusted for local time. We can use a program like DCode to translate it into a human readable format (eg 1469075274000 = Thu, 21 July 2016 04:27:54. UTC).

The item path string format did not appear to vary for a picture/video saved to the SD card versus internal phone memory.

The Samsung Note 4 file format documented above was NOT identical with other sample test devices including a Moto G (XT1033), a Samsung Galaxy Core Prime (SM-G360G) and a Samsung J1 (SM-J100Y).
The Moto G’s Gallery app cache record size did not include itself (ie 4 bytes smaller) and the Galaxy Core Prime / J1’s Gallery app cache record did not utilize a UTF16LE timestamp string. Instead, it used a LE 8 byte integer representing the Unix timestamp in milliseconds (for BOTH picture and video imgcache records). This was written between the end of the path string and the start of the cached JPG’s 0xFFxD8.
These differences imply that a scripted solution will probably require modifications on a per device/per app basis.

UPDATE 4AUG2016:
As a result of this testing, a second script (imgcache-parse-mod.py) was written to parse Galaxy S4 (GT-i9505)/ Galaxy Core Prime / J1 imgcache files which appear to share the same imgcache record structures. Please refer to the initial comments section of the imgcache-parse-mod.py script for a full description of that imgcache structure. This modified script will take the same input arguments as the original imgcache-parse.py script described in the next section.

Scripting

A Python 2 script (imgcache-parse.py) was written to extract JPGs from “imgcache”, “micro” and “mini” cache files to the same directory as the script.

UPDATE 4AUG2016:
The script searches the given cache file (eg imgcache.0) for the UTF16LE encoded "/local/image/item/" and/or “/local/video/item/” strings, finds the record size and then extracts the record's embedded JPG to a separate file. The script also outputs an HTML table containing the extracted JPGs and various metadata.

An example HTML output table looks like:

Example HTML output table for picture imgcache records

Example HTML output table entry for a video imgcache record

The extracted JPG filename is constructed as follows:

[Source-Cache-Filename]_pic_[Hex-Offset-of-JPG]_[Unix-Timestamp-sec]_[Human-Timestamp].jpg
OR
[Source-Cache-Filename]_vid_[Hex-Offset-of-JPG]_[Unix-Timestamp-ms]_[Human-Timestamp].jpg

The script also calculates the MD5 hash for each JPG (allowing for easier detection of duplicate images) and prints the filesize and the complete item path string.
Each HTML table record entry is printed in the same order as it appears in the input cache file. That is, the top row represents the first cache record and the bottom row represents the last cache record.

The script was validated with Android 5.1.1 and the Gallery3d app v2.0.8131802.
You can download it from my Github site here.

Here is the help for the script:

C:\Python27\python.exe imgcache-parse.py
Running imgcache-parse.py v2016-08-02

Usage: imgcache-parse.py -f inputfile -o outputfile

Options:
-h, --help   show this help message and exit
-f FILENAME imgcache file to be searched
-o HTMLFILE HTML table File
-p           Parse cached picture only (do not use in conjunction with -v)
-v           Parse cached video thumbnails only (do not use in conjunction with -p)

Here is an example of how to run the script (from Windows command line with the Python 2.7 default install). This will process/extract BOTH pictures and video cache records (default):

C:\Python27\python.exe imgcache-parse.py -f imgcache.0 -o opimg0.html
Running imgcache-parse.py v2016-08-02

Paths found = 14

/local/image/item/44+1469075274+1 from offset = 0X18
imgcache.0_pic_0X5A_1469075274_2016-07-21T04-27-54.jpg
JPG output size(bytes) = 28968 from offset = 0X5A

/local/image/item/43+1469073536+1 from offset = 0X7199
imgcache.0_pic_0X71DB_1469073536_2016-07-21T03-58-56.jpg
JPG output size(bytes) = 75324 from offset = 0X71DB

/local/image/item/41+1469054648+1 from offset = 0X1982E
imgcache.0_pic_0X19870_1469054648_2016-07-20T22-44-08.jpg
JPG output size(bytes) = 33245 from offset = 0X19870

/local/image/item/40+1469051675+1 from offset = 0X21A64
imgcache.0_pic_0X21AA6_1469051675_2016-07-20T21-54-35.jpg
JPG output size(bytes) = 40744 from offset = 0X21AA6

/local/image/item/39+1469051662+1 from offset = 0X2B9E5
imgcache.0_pic_0X2BA27_1469051662_2016-07-20T21-54-22.jpg
JPG output size(bytes) = 30698 from offset = 0X2BA27

/local/video/item/38+1469051577796+1 from offset = 0X33228
imgcache.0_vid_0X33270_1469051577796_2016-07-20T21-52-57.jpg
JPG output size(bytes) = 34931 from offset = 0X33270

/local/image/item/37+1469051566+1 from offset = 0X3BAFA
imgcache.0_pic_0X3BB3C_1469051566_2016-07-20T21-52-46.jpg
JPG output size(bytes) = 28460 from offset = 0X3BB3C

/local/image/item/27+1390351440+1 from offset = 0X42A7F
imgcache.0_pic_0X42AC1_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 97542 from offset = 0X42AC1

/local/image/item/28+1390351440+1 from offset = 0X5A7DE
imgcache.0_pic_0X5A820_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 122922 from offset = 0X5A820

/local/image/item/29+1390351440+1 from offset = 0X78861
imgcache.0_pic_0X788A3_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 127713 from offset = 0X788A3

/local/image/item/30+1390351440+1 from offset = 0X97B9B
imgcache.0_pic_0X97BDD_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 97100 from offset = 0X97BDD

/local/image/item/31+1390351440+1 from offset = 0XAF740
imgcache.0_pic_0XAF782_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 66576 from offset = 0XAF782

/local/image/item/32+1390351440+1 from offset = 0XBFBA9
imgcache.0_pic_0XBFBEB_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 34746 from offset = 0XBFBEB

/local/image/item/33+1390351440+1 from offset = 0XC83BC
imgcache.0_pic_0XC83FE_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 26865 from offset = 0XC83FE

Processed 14 cached pictures. Exiting ...

The above example output also printed the HTML table we saw previously.
Some further command line examples:
C:\Python27\python.exe imgcache-parse.py -f imgcache.0 -o output.html -p
(will parse/output picture cache items ONLY)

C:\Python27\python.exe imgcache-parse.py -f imgcache.0 -o output.html -v
(will parse/output video thumbnail cache items ONLY)

Testing

During testing of the Gallery app - device camera pictures, a screenshot and a picture saved from an Internet browser were viewed. Cached copies of these pictures were subsequently observed in the “imgcache.0”, “mini.0” and “micro.0” cache files.
From our testing, the Unix timestamp represents when the picture was taken/saved rather than the time it was browsed in the Gallery app.
This was tested for by taking camera picture 1 on the device, waiting one minute, then taking picture 2. We then waited another minute before viewing picture 1 in the Gallery app, waiting one minute and then viewing picture 2.
Running the imgcache-parse.py script and viewing the resultant output HTML table confirmed that the timestamp strings reflect the original picture’s created time and not the Gallery viewed time. The HTML table also displayed the order of the imgcache.0 file - picture 1 was written first, then picture 2.
We then cleared the Gallery app cache and viewed picture 2 in the Gallery app followed by picture 1.
Running the imgcache-parse.py script again and viewing the resultant output HTML table displayed the order of the imgcache.0 file. Picture 2 was written first, then picture 1.

UPDATE 4AUG2016:
A device video was also created (20160802_155401.mp4), uploaded to Dropbox (via app v2.4.4.8) and then downloaded and viewed in the Gallery app. The imgcache.0 record timestamp for the created video (1470117241703 = 05:54:01) differed to the imgcache.0 timestamp for the downloaded video (1470117253000 = 05:54:13). This difference of approximately 12 seconds was slightly longer than the 11 second video duration.
It is suspected that the created video’s imgcache timestamp represents when the original video was first being written and the downloaded video’s imgcache timestamp represents when the original video was finalised to the filesystem.
The video thumbnails displayed in the Gallery app and imgcache for each video were also different. The downloaded video thumbnail appeared to be from approximately 1 second into the video. The created video thumbnail seemed to be the first frame of the video. The MD5 hashes of both video files were identical.

As per LSB's helpful suggestion, rather than take a full image of the test phone for each acquisition of cache files, we plugged our test device into a PC and used Windows Explorer to browse to the Phone\Android\data\com.sec.android.gallery3d\cache folder and copy the cache files to our PC. This saved a significant amount of imaging time. To minimize any synchronization issues, the phone should be unplugged/re-plugged between file copies.

Final Thoughts

Depending on the device, it may be possible to determine the created timestamp of a picture viewed and cached from the Android Gallery app. The Gallery cache may also include pictures which are no longer available elsewhere on the device.
A Python script (imgcache-parse.py) was created to extract various metadata and the cached images from a Samsung Note 4 Gallery app’s (imgcache, micro and mini) cache files.
UPDATE 4AUG2016:A modified version of this script (imgcache-parse-mod.py) was also created to handle binary timestamps as observed on Galaxy S4 / J1 / Core Prime sample devices.

It is STRONGLY recommended that analysts validate their own device/app version combinations before running these scripts. Your mileage will vary!
For example, take a picture using the device camera and validate its YYYYMMDD_HHMMSS.JPG filename/metadata against the timestamp in the item path (if its there).
For case data, look for device images with date/time information in them (eg pictures of newspapers, receipts etc. or device screenshots) to increase the confidence level in extracted timestamps.

The Gallery app was not present in various Android 6.0 test devices that we looked at. It may have been usurped by the Google Photos app. However, we have seen the Gallery app on Android 5 and Android 4 devices which would still make up the majority of devices currently out there.

Monkey doesn't have the time/inclination but further areas of research could be:
- Decompiling the Gallery .apk and inspecting the Java code.
- Rollover functionality of the cache files (eg confirm how imgcache.1 get populated).
- Why there can be multiple copies of the same image (with same MD5) appearing at multiple offsets within the same imgcache file.
- Determining how the cache record index number is being calculated.
- Determining the “imgcache.idx”, “micro.idx”, “mini.idx” files purpose.

Anyhoo, it would be great to hear from you in the comments section (or via email) if you end up using these scripts for an actual case. Or if you have any further observations to add (don't forget to state your Android version and device please).

Sorry, but for mental health reasons I will NOT recover your dick pics for you. ie Requests for personal image recovery will be ignored. If you Google for "JPG file carver", you should find some programs that can help you recover/re-live those glorious tumescent moments.

Can you tell how working in forensics has affected my world view? ;)

↧

Google S2 Mapping Scripts

August 12, 2016, 8:34 am

≫ Next: Monkey Plays (LAN) Turtle

≪ Previous: A Timestamp Seeking Monkey Dives Into Android Gallery Imgcache

Sorry Monkey - there is just no point to mapping jokes ...

Cindy Murphy's recent forensic forays into Pokemon Go (here and here)
have inspired further monkey research into the Google S2 Mapping library. The S2 library is also used by Uber, Foursquare and Google (presumably) for mapping location data. So it would probably be useful to recognise and/or translate any S2 encoded location artifacts we might come across in our forensic travels eh? *repeats in whispered voice*travels...
After a brief introduction to how S2 represents lat/long data, we will demonstrate a couple of multi-platform (Windows/Linux) Python conversion scripts for decoding/encoding S2 locations (using sidewalklabs' s2sphere library).

S2 Mapping (In Theory)

The main resources for this section were:

Christian S. Perone's blog post on S2.
Octavian Procopiuc's "Geometry on the Sphere: Google's S2 Library" GoogleDoc.

The TLDR - its possible to convert a latitude/longitude from a sphere into a 64 bit integer. The resultant 64 bit integer is known as a cellid.
But how do we calculate this 64 bit cellid? This is achieved by projecting our spherical point (lat, long) onto one of 6 faces of an enclosing cube and then using a Hilbert curve function to specify a grid location for a specified cell size. Points along the Hilbert curve that are close to each other in value, are also spatially close to one another. This diagram from Christian's blog post better illustrates the point:

Points close to each other on the Hilbert curve have similar values. Source: Christian Perone's Blog

In the above diagram, the Hilbert curve (darker gray line) goes from the bottom LHS to the bottom RHS (or vice versa). Each grid box contains one of those Y-shaped patterns. The scale at the bottom represents the curve if it was straightened out like a piece of string.
If you now imagine the grid becoming finer/smaller but still requiring one Y-shape per box, you can see how a smaller cell grid size requires a finer resolution for points on the curve. So the smaller the cell/grid size, the higher the number of bits required to store the position. For example, a level 2 cell size only needs 4 bits where as the maximum level 30 requires 60 bits. Level 30 cell sizes are approximately 0.48-0.93 square centimetres in size depending on the lat/long.

Fun fact: Uber apparently uses a level 12 cell size (approx. 3.3 to 6.4 square kilometres per cell).
Second fun fact: The Metric system has been around for over 100 years so stop whining about all the metric measurements already *looks over sternly at the last of the Imperial Guards in the United States, Myanmar and Liberia*.

Ahem ... so here's what a level 30 cellid looks like:

Level 30 cellid structure. Source: Octavian Procopiuc's GoogleDoc

The first 3 bits represent which face of the enclosing cube to use and the remaining 60 bits are used to store the position on the Hilbert curve. Note: The last bit is set to 1 to mark the end of the Hilbert positioning bits.

When cellids are converted into hexadecimal and have their least significant zeroes removed (if present), they are in their shortened "token" form.
eg1 cellid = 10743750136202470315 (decimal) has a token id = 0x951977D377E723AB
eg2 cellid = 9801614157982728192 (decimal) = 0x8806542540000000. The 16 hex digits can then be shortened to a token value of "880654254". To convert back to the original hex number, we keep adding least significant zeroes to "880654254" until its 16 digits long (ie 64 bits).
Analysts should anticipate seeing either cellids or token ids. These might be in plaintext (eg JSON) or may be in an SQLite database.

Note:Windows Calculator sucks at handling large unsigned 64 bit numbers. According to this, its limited between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807. So a number like 10,743,750,136,202,470,315 made it return an incorrect hex representation after conversion.
This monkey spun his paws for a while trying to figure out why the token conversions didn't seem to make sense. The FFs Solution - use the Ubuntu Calculator for hex conversions of 64 bit integers instead.

The Scripts

Two Python 2.7+ scripts were written to handle S2 conversions and are available from my GitHub here. They have been tested on both Windows 7 running Python 2.7.12 and Ubuntu x64 14.04 running Python 2.7.6.

s2-latlong2cellid.py converts lat, long and cellid level to a 64 bit Google S2 cellid.

s2-cellid2latlong.py converts a 64 bit Google S2 cellid to a lat, long and S2 cellid level.

IMPORTANT: These scripts rely on the third-party s2sphere Python library. Users can install it via:

pip install s2sphere

(on Windows) and:

sudo pip install s2sphere

(on Ubuntu)

Here's the help text for s2-latlong2cellid.py:

python s2-latlong2cellid.py -h
Running s2-latlong2cellid.py v2016-08-12

usage: s2-latlong2cellid.py [-h] llat llong level

Converts lat, long and cellid level to a 64 bit Google S2 cellid

positional arguments:
llat        Latitude in decimal degrees
llong       Latitude in decimal degrees
level       S2 cell level

optional arguments:
-h, --help show this help message and exit

Here's the help text for s2-cellid2latlong.py:

python s2-cellid2latlong.py -h
Running s2-cellid2latlong.py v2016-08-12

usage: s2-cellid2latlong.py [-h] cellid

Convert a 64 bit Google S2 cellid to a lat, long and S2 cellid level

positional arguments:
cellid Google S2 cellid

optional arguments:
-h, --help show this help message and exit

Testing

A handy online S2map visualizer was written by David Blackman - the Lead Geo Engineer at Foursquare (and formerly of Google ie he knows his maps). See also the Readme here. S2map also has several other mapping dropbox options besides the default "OSM Light". These include "Mapbox Satellite" which projects the cellid onto an aerial view.

We start our tests by going to GoogleMaps and noting the lat, long of an intersection in Las Vegas.

Pick a spot! Any spot!

We then specify that lat, long as input into the s2-latlong2cellid.py script (with level set to 24):

python s2-latlong2cellid.py 36.114574 -115.180628 24
Running s2-latlong2cellid.py v2016-08-12

S2 cellid = 9279882692622716928

We then put that cellid into s2map.com:

Level 24 test cellid plotted on s2map.com.

Note: The red arrow was added by monkey to better show the plotted cellid (its tiny).
So we can see that our s2-latlong2cellid.py gets us pretty close to where we originally specified on GoogleMaps.

What happens if we keep the same lat, long coordinates but decrease the level of the cellid from 24 to 12?

python s2-latlong2cellid.py 36.114574 -115.180628 12
Running s2-latlong2cellid.py v2016-08-12

S2 cellid = 9279882742634381312

Obviously this is a different cellid because its set at a different level, but just how far away is the plotted level 12 cellid now?

Level 12 test cellid plotted on s2map.com.

Whoa! The cell accuracy has just decreased a bunch. It appears the center of this cellid is completely different to the position we originally set in GoogleMaps. It is now centred on the Bellagio instead of the intersection. This is presumably because the cell size is now larger and the center point of the cell has moved accordingly.

To confirm these findings, we take our level 24 cellid 9279882692622716928 and use it with s2-cellid2latlong.py.

python s2-cellid2latlong.py 9279882692622716928
Running s2-cellid2latlong.py v2016-08-12

S2 Level = 24
36.114574473973924 , -115.18062802526205

We then plot those coordinates on GoogleMaps ...

Level 24 test cellid 9279882692622716928 plotted on GoogleMaps via s2-cellid2latlong.py

ie Our s2-cellid2latlong.py script seems to work OK for level 24.

Here's what it looks like when we use the level 12 cellid 9279882742634381312:

python s2-cellid2latlong.py 9279882742634381312
Running s2-cellid2latlong.py v2016-08-12

S2 Level = 12
36.11195989469266 , -115.17705862118852

Level 12 test cellid 9279882742634381312 plotted on GoogleMaps via s2-cellid2latlong.py

This seems to confirm the results from s2map.com. For the same lat, long, changing the cellid level can significantly affect the returned (centre) lat, long.

We also tested our scripts against s2map.com with a handful of other cellids and lat/long/levels and they seemed consistent. Obviously time contraints will not let us test every possible point.

Final Thoughts

Using the s2sphere library, we were able to create a Python script to convert a lat, long and level to an S2 cellid (s2-latlong2cellid.py). We also created another script to convert a S2 cellid to a lat, long and level (s2-cellid2latlong.py).
The higher the cellid level, the more accurate the location. You can find the cellid level by using the s2-cellid2latlong.py script.
Plotting a cellid with s2map.com is the easiest way of visualizing the cellid boundary on a map. Higher levels (>24) become effectively invisible however.

To locate potential S2 cellids we can use search terms like "cellid" or variations such as "cellid=". If its stored in plaintext (eg JSON), those search terms should find it. No such luck if its encrypted or stored as a binary integer though.

While there are other S2 Python libraries, this Monkey decided to use sidewalklabs s2sphere library based on its available documentation and pain-free cross platform support (pip install is supported).

Other Google S2 Python libraries include:
https://github.com/micolous/s2-geometry-library
(As demonstrated in Christian's blog and also used in the Gillware Pokemon script. This seems to be Linux only)
and
https://github.com/qedus/sphere
(Has the comment: "Needs to be packaged properly for use with PIP")

Some other interesting background stuff ...
Interview article with David Blackman (Foursquare)

Matt Ranney's (Chief Systems Architect at Uber) video presentation on "Scaling Uber's Real-time Market Platform" (see 18:15 mark for S2 content)

Uber uses drivers phone as a backup data source (claims its encrypted)

In the end, creating the Python conversion scripts was surprisingly straight forward and only required a few lines of code.
It will be interesting to see how many apps leave Google S2 cellid artifacts behind (hopefully ones with a high cellid level). Hopefully these scripts will prove useful when looking for location artifacts. eg when an analyst finds a cellid/token and wants to map it to a lat/long or when an analyst wants to calculate a cellid/token for a specific lat, long, level.

↧

Monkey Plays (LAN) Turtle

January 4, 2017, 4:53 am

≫ Next: Monkey Unpacks Some Python

≪ Previous: Google S2 Mapping Scripts

OMG! Sooo Turtle-y!

The Hak5LAN Turtle recently plodded across our desk so we decided to poke it with a stick and see how effective it is in capturing Windows (7) credentials.
From the LAN Turtle wiki:

The LAN Turtle is a covert Systems Administration and Penetration Testing tool providing stealth remote access, network intelligence gathering, and man-in-the-middle monitoring capabilities.
Housed within a generic "USB Ethernet Adapter" case, the LAN Turtle’s covert appearance allows it to blend into many IT environments.

It costs about U$50 and looks like this:

It consists of a System-On-Chip running an openwrt (Linux) based OS. Amongst other things, it can act like a network bridge/router between:
- a USB Ethernet interface which you plug into your target PC. This interface can also be ssh'd into via its static IP address 172.16.84.1 (for initial configuration and copying off creds).
- a 10/100 Mbps Ethernet port which you can use to connect the Turtle to the Internet (providing remote shell access and allowing the install of modules/updates from LANTurtle.com). It is not required to capture creds during normal operation.

It also has 16 MB on board Flash memory and can be configured to run a bunch of different modules via a Module Manager.

By using the Turtle's USB Ethernet interface to create a new network connection and then sending the appropriate responses, the Turtle is able to capture a logged in user's Windows credentials. Apparently Windows will send credentials over a network whether the screen is locked or not (a user must be logged in).

We will be using the QuickCreds module written by Darren Kitchen which was based on the research of Rob "Mubix" Fuller.
To send the appropriate network responses, QuickCreds calls Laurent Gaffie's Responder Python script and saves credentials (eg NTLMv2 for our Win 7 test case) to numbered directories in /root/loot. The amber Ethernet LED will blink rapidly while QuickCreds is running. When finished capturing (~30 secs to a few minutes), the amber LED is supposed to remain lit.

But wait - there's more! The turtle can also offer remote shell/netcat/meterpreter access, DNS spoofing, man-in-the-middle browser attacks, nmap scans and so much more via various downloadable modules. Alas, we only have enough time/sanity/Turtle food to look at the QuickCreds module.

Setup

We will be both configuring and testing the Turtle on a single laptop running Windows 7 Pro x64 with SP1. Realistically, you would configure it on one PC and then plug it into a separate target PC.

We begin setup by plugging the Turtle into the configuration PC and using PuTTY to ssh as root to 172.16.84.1. For proper menu display, be sure to adjust the PuTTY Configuration's Windows, Translation, Remote character set to "Latin-1, Western Eur".

The default root Turtle password is sh3llz. Upon first login, the user is then prompted to change the root password.
Ensure an Internet providing Ethernet cable is plugged in to the Turtle's Ethernet port to provide access to LANTurtle.com updates.

Note: The Turtle may also require Windows to install the "Realtek USB FE Family Controller" Network Adapter driver before you can communicate with it.

Upon entering/confirming the new root password, you should see something like:

LAN Turtle Main Menu via PuTTY session

Under Modules, Module Manager, go to Configure, then Directory to select the QuickCreds module for download. You can select/check a module for download via the arrow/spacebar keys.

Return back to Modules, select the QuickCreds module, then Configure (this will take a few minutes to download/install/configure the dependencies from the Internet). Remember to have an Internet providing Ethernet cable plugged into the Turtle.

Select the QuickCreds Enable option so QuickCreds is launched whenever the Turtle is plugged into a USB port.
(Optional) You can also select the Start option to start the QuickCreds module now and it should collect your current Windows login creds.

We are now ready to remove the Turtle from our config PC and place it into a target PC's USB port.

If you're having issues getting the Turtle working, try to manually reset the Turtle following the "Manually Upgrading" wiki procedure at the bottom of this page.

There's also a Hak5 Turtle/QuickCreds demo and explanation video by Darren Kitchen and Shannon Morse thats well worth a view.

Capturing Creds

Insert the Turtle into the (locked) target PC and wait for the creds to be captured. Our Turtle's amber Ethernet light followed this pattern on insertion:
- ON/OFF
- OFF (10 secs)
- Blinking at 1 Hz (15 secs)
- OFF (1-2 secs)
- Rapid Blinking > 1 Hz (indefinitely or until we launch PuTTY when it remains ON)

From testing, once we see the rapid blinking, the creds have been captured.

If you have an Internet cable plugged in to the Turtle when capturing creds, you can also remote SSH into the Turtle to retrieve the captured creds.This is not in the scope of this post however.

For our testing, we will keep it simple and use PuTTY's scp to retrieve the stored creds (eg capture creds, retrieve Turtle, take Turtle back to base for creds retrieval):
We remove the Turtle from the target PC and re-insert it into our config PC. For our testing on a single laptop this meant - we removed the Turtle, unlocked the laptop and then re-inserted the Turtle.
Note: Due to the auto enable, the Turtle will also capture the config PC's creds upon insertion.

Now PuTTY in to the Turtle, then choose Exit to get to the Turtle command prompt/shell (shell ... Get it? hyuk, hyuk).

To find the latest saved creds we can type something like:

ls -alt /root/loot

which shows us the latest creds (corresponding to our current config PC) is stored under /root/loot/12/

root@turtle:~# ls -alt /root/loot/
-rw-r--r--    1 root     root           319 Jan 2 11:14 responder.log
drwxr-xr-x    2 root     root             0 Jan 2 11:13 12
drwxr-xr-x   14 root     root             0 Jan 2 11:11 .
drwxr-xr-x    2 root     root             0 Jan 2 11:01 11
drwxr-xr-x    2 root     root             0 Jan 2 11:00 10
drwxr-xr-x    2 root     root             0 Jan 2 10:46 9
drwxr-xr-x    2 root     root             0 Jan 2 08:58 8
drwxr-xr-x    2 root     root             0 Jan 2 08:49 7
drwxr-xr-x    2 root     root             0 Jan 2 08:46 6
drwxr-xr-x    2 root     root             0 Jan 2 08:35 5
drwxr-xr-x    2 root     root             0 Jan 2 08:34 4
drwxr-xr-x    2 root     root             0 Jan 2 08:26 3
drwxr-xr-x    2 root     root             0 Jan 2 08:21 2
drwxr-xr-x    2 root     root             0 Jan 2 08:20 1
drwxr-xr-x    1 root     root             0 Jan 2 08:20 ..
root@turtle:~#

So looking further at /root/loot/11/ (ie the creds from when we plugged the Turtle into the locked laptop) shows us a few log files and a text file containing our captured creds (ie HTTP-NTLMv2-172.16.84.182.txt).

root@turtle:~# ls /root/loot/11/
Analyzer-Session.log Poisoners-Session.log
Config-Responder.log Responder-Session.log
HTTP-NTLMv2-172.16.84.182.txt
root@turtle:~#

Our creds should be stored in HTTP-NTLMv2-172.16.84.182.txt and we can use the following command to check that the file contents look OK:

more /root/loot/HTTP-NTLMv2-172.16.84.182.txt

which should return something like:

admin::N46iSNekpT:08ca45b7d7ea58ee:88dcbe4446168966a153a0064958dac6:5c7830315c7830310000000000000b45c67103d07d7b95acd12ffa11230e0000000052920b85f78d013c31cdb3b92f5d765c783030

Where admin is the login name and the second field (eg N46iSNekpT) corresponds to the domain.
Note: This is an NTLMv2 example sourced from hashcat.

Once we have found the appropriate file containing the creds we want, we can use PuTTY pscp.exe to copy the files from the Turtle to our config PC.
From our Windows config PC we can use something like:

pscp root@172.16.84.1:/root/loot/11/HTTP-NTLMv2-172.16.84.182.txt .

to copy out the creds file. Note the final . to copy the creds file into the current directory on the config PC.

We can then feed this (file or individual entries) into hashcat to crack the user password. This is an exercise left for the reader.

Turtle Artifacts?

Now that we have our creds, lets see if we can find any fresh Turtle ~~scat~~ er, artifacts.

Starting with the Turtle plugged in to an unlocked PC, we look under the Windows Device Manager and find the Network adapter driver for the Turtle - ie the "Realtek USB FE Family Controller"

Turtle Network Adapter Driver Properties

The Details Tab from the Properties screen yields a "Device Instance Path" of:
USB\VID_0BDA&PID_8152\00E04C36150A

Similarly, the "Hardware Ids" listed were "USB\VID_0BDA&PID_8152" and "USB\VID_0BDA&PID_8152&REV_2000".

The HardwareId string ("VID_0BDA&PID_8152") implies that the driver was communicating with a Realtek 8152 USB Ethernet controller. Note:0BDA is the vendor id for Realtek Semiconductor (see https://usb-ids.gowdy.us/read/UD/0bda) and the Turtle Wiki specs confirm the Turtle uses a "USB Ethernet Port - Realtek RTL8152".

We then used FTK Imager (v3.4.2.2) to grab the Registry hives so we can check them for artifacts.

Searching the SYSTEM hive for part of the "Device Instance Path" string (ie "VID_0BDA&PID_8152") yields an entry in SYSTEM\ControlSet001\Enum\USB\VID_0BDA&PID_8152

Potential First Turtle Insertion Time

The Last Written Time appears to match the first time the Turtle was inserted into the PC (21DEC2016 @ 21:15:54 UTC).

Another hit occurs in SYSTEM\ControlSet001\Enum\USB\VID_0BDA&PID_8152\00E04C36150A

Potential Most Recent Turtle Insertion Time

The Last Written Time appears to match the most recent time the Turtle was inserted (2JAN2017 @ 11:45:01 UTC).

The Turtle's 172.16.84.1 address appears in the Windows SYSTEM Registry hive as a "DhcpServer" value under SYSTEM\ControlSet001\services\Tcpip\Parameters\Interfaces\{59C1F0C4-66A7-42C8-B25E-6007F3C40925}.

Turtle's DHCP Address and Timestamp

Additionally under that same key, we can see a "LeaseObtainedTime" value which appears to be in seconds since Unix epoch (1JAN1970).
Using DCode to translate gives us:

Turtle DHCP LeaseObtainedTime Conversion

ie 2 JAN 2017 @ 11:24:37
This time occurs between the first time the Turtle was inserted (21DEC2016) and the most recent time the Turtle was inserted (2JAN2017 @ 11:45:01). This is plausible as the Turtle was plugged in multiple times during testing on the 2 JAN 2017. It is estimated that the Turtle was first plugged in on 2 JAN 2017 around the same time as the "LeaseObtainedTime".

These timestamps potentially enable us to give a timeframe for Turtle use. We say potentially because it is possible that another device using the "Realtek USB FE Family Controller" driver may have also been used. However, the specific IP address (172.16.84.1) can help us point the flipper at a rogue Turtle.

The "Realtek USB FE Family Controller" string also appears in the "Description" value under the SOFTWARE hive:
SOFTWARE\Microsoft\Windows NT\CurrentVersion\NetworkCards\17

NetworkCards entry potentially pointing to Turtle

Note: The NetworkCards number entry will vary (probably will not be 17 in all cases)

There are probably more artifacts to be found but these Registry entries were the ones that were the most obvious to find. The Windows Event logs did not seem to log anything Turtle-y definitive.

However, based on the artifacts above, we can only say that a Turtle was probably plugged in. We don't have enough (yet?) to state which modules (if any) were run.

Final Thoughts

Anecdotally from the Hak5 Turtle Forums, capturing Windows credentials with the LAN Turtle seems to be hit and miss.
From our testing, the Turtle QuickCreds module worked for a Win7 laptop but failed to capture creds for a Win10 VM running on the same laptop. Once the Turtle was plugged in to the laptop, it captured the creds for the host Win7 OS but upon connecting the Turtle to the Win10 VM via the "Removable Devices" VMware 12 Player menu, the amber LED remains solidly lit and the Win10 creds were not captured.
Interestingly, not all of the Win7 Registry artifacts listed previously were observed in the Win10 VM's Registry:
Both SYSTEM\ControlSet001\Enum\USB\VID_0BDA&PID_8152 and SYSTEM\ControlSet001\Enum\USB\VID_0BDA&PID_8152\00E04C36150A were present in the Win10 SYSTEM registry.
However, no hits were observed for "172.16.84.1" in SYSTEM.
There were various hits for "Realtek USB FE Family Controller" in SYSTEM.
The "Realtek USB FE Family Controller" string also appears in the "Description" value under the Win10 SOFTWARE hive:
SOFTWARE\Microsoft\Windows NT\CurrentVersion\NetworkCards\5
The lack of Win10 Registry DHCP artifacts probably indicates that while the Realtek USB Ethernet driver was installed, the Turtle was unable to assign the 172.16.84.1 IP address within the WIN10 VM (possibly because the Win7 still has it reserved?).

Fortunately, Jackk has recorded a helpful YouTube video demonstrating the LAN Turtle running QuickCreds successfully against a Win10 laptop (not VM). So it is possible on Win10 ... Jackk also shows how to use the Turtle's sshfs module to copy off the cred files via a FileZilla client (instead of using pscp).

Any comments/suggestions are turtle-y welcome in the comments section below.

↧

Monkey Unpacks Some Python

August 21, 2017, 1:55 am

≫ Next: Monkey takes a .heic

≪ Previous: Monkey Plays (LAN) Turtle

UNPACK-ing Python .. Now with added monkey!

Some forensic folks have suggested that a Python tutorial on how to read/print binary data types might be helpful to budding Python programmers in the community.
So in this post, we will simulate reverse engineering a fictional contact file format and then write a Python script to extract/print out the values.
For brevity, this post ass-umes the reader has a basic knowledge of Python (i.e. they can launch a script and know about functions/assigning variables etc.). There are plenty of introductory tutorials online - if you are a beginner, you might want to check out the Google Developer Python course before proceeding.

The script (unpack-tute.py) has been tested with both Python v2.7.12 and Python 3.4.1 on a Win7x64 PC.
Historically, Python 2 had more supported 3rd party libraries. Consequently, it was the first version of Python that this monkey learned and we are actually more familiar with Python 2. Python 2's End Of Life is currently scheduled for April 2020 so there's a few years left. However, as this script does not rely on 3rd party libraries, we have adapted it to run on both Python 2 and 3.
The main difference affecting this script was that Python 3 treats strings as Unicode by default so we had to add an encode('utf-8') call when searching through our data file.

There is more than one way to code a solution. We have tried to make this code easy to follow instead of making it "Pythonic" (whatever that even means) or by adding lots of error checking code (if you write a script, you should know how it to use it!).
The Python script (unpack-tute.py) and sample binary file (testctx.bin) will be posted to my brand-monkey-spanking-new GitHub Python Tutorials folder.

So, here's a screenshot of the "testctx.bin" file we want to read:

Screenshot of "testctx.bin" (brought to us courtesy of WinHex!)

Note: The first contact record is highlighted and offsets are listed in decimal. Curious George is ... curious?

Using some reverse engineering strategies that we previously wrote about here we can make a few observations regarding the structure of each Contact record ...

We can see there's a repeated "ctx!" string before each Contact record.
After each record's "ctx!" field, there is a Little Endian 2 byte field that seems to increase with each subsequent record (eg 0x0100 at decimal offset 68, 0x02FF at decimal offset 516, 0xFFFF at decimal offset 804). For initial classification, we will say its an index record number.
Each record has a UTF16LE (ie 2 bytes per character) string that contains a name (eg George).
Each record has a UTF8/ASCII (ie 1 byte per character) string that contains a phone number (eg 5551234).
Before each of the strings, there is a one byte integer corresponding to the string size in bytes.
The last field seems to be a 4 byte field. By observing which bytes vary and which bytes remain constant(ie the left most bytes change more rapidly than the rightmost bytes), we suspect the last field is a Little Endian timestamp field. Feeding in the first record's last 4 bytes (ie 0x26CDDB56) into DCode results in a valid date/time for a Unix 32 bit Little Endian timestamp

DCoding the Contact timestamp

So here's our contact record format:

Contact record data structure

And here's a summary of what we want the script to do:

1. Open "testctx.bin" file (read only)
2. Store file contents
3. Search file contents for ctx! markers
4. For each hit:
    4a. Print hit offset
    4b. Extract Index Number field and print
    4c. Extract Name Length field and print
    4d. Extract Name String (UTF16LE) field and print
    4e. Extract Phone Length field and print
    4f. Extract Phone String (UTF8) field and print
    4g. Extract Unix Timestamp field and print (in ISO format)

5. Close file

Simples!

The Script

OK now that we know what we want to do, here's how we implement each step in code ...

Steps 1 & 2 Open file and store file contents (See "unpack-tute.py" lines 25-33):
1. Open "testctx.bin" file (read only)
2. Store file contents

For step 1, we open the "textctx.bin" file in read-onlybinary mode (what the "rb" stands for):
    fb = open(filename, "rb")

We chose read-only mode because we don't want to change the file contents and we chose binary mode because we are interpreting the file as raw bytes (not text).
Then to read/store the file contents, we call:
    filecontent = fb.read()

So the "filecontent" variable will now contain every byte from the "testctx.bin" file and individual bytes can be accessed directly using the "slice" notation.
For example, filecontent[0:3] is 3 bytes long and includes the bytes at offsets 0, 1 and 2. It does NOT include the byte at offset 3.
If we replace the start/end locations of our slice example with a variable called startoffset, we get:
    filecontent[startoffset:(startoffset+3)]
This will include the 3 bytes at start, start+1, start+2 only.
The reader might want to remember that little notation nugget as monkey has the feeling it will be popping up again later ... (Hehe, Poo jokes are still floating around in 2017!)

Step 3: Search file contents for "ctx!" markers (See "unpack-tute.py" lines 35-49):
Knowing that "ctx!" encoded in ASCII/UTF8 is x63 x74 x78 x21, we can use a variable "searchstring" to represent our search term in hex:
    searchstring = "\x63\x74\x78\x21"

We now consider the "filecontent" variable as one big string of bytes ...
Python string types have a find() method which searches the parent string for a substring. The find() method returns -1 if the substring is not found otherwise, it returns the first offset where it found the substring. The find() method can also take an starting offset argument so we can use a while loop to repeatedly call find() with an incrementing starting offset until we get no more hits. Thus we can find an offset for each substring hit in the parent string, which we can then store in a Python list called "hitlist".
Here's the code:
    nexthit = filecontent.find(searchstring.encode('utf-8'), 0)
    hitlist = []
    while nexthit >= 0:
        hitlist.append(nexthit)
        nexthit = filecontent.find(searchstring.encode(), nexthit + 1)

We use searchstring.encode('utf-8') because of Python 3 compatibility issues. Python 3 treats all strings as Unicode by default, where as we need to search in UTF8 (ie byte by byte). So we have to encode the searchstring as UTF8 before running the search.
Default Python 2 strings are represented as sequences of raw bytes so calling searchstring.encode('utf-8') in Python 2 has no real effect - we could have used Python 2 lines such as:
    nexthit = filecontent.find(searchstring, 0)
and
    nexthit = filecontent.find(searchstring, nexthit + 1)
This was the only major script change required for Python2 and Python 3 compatibility.

Step 4: Looping through each hit (See "unpack-tute.py" lines 50-88):
Now we have our hitlist of offsets to "ctx!" markers and we know how each contact record is structured, so we can iterate through the filecontent variable using a for loop and extract/print the data we need using the slice notation we discussed previously.

4a. We print out each hit offset in both decimal and hexadecimal.
    print("\nHit found at offset: " + str(hit) + " decimal = " + hex(hit) + " hex")

We use the str() function to convert the "hit" offset variable into a decimal string for printing and the hex() function to convert the hit offset variable into a hexadecimal string.

4b. The first field ("Index Number") after the "ctx!" marker will start 4 bytes after the hit offset. To calculate the offset, we can use code like:
    indexnum_offset = hit + 4
As we have already read the entire file into filecontent, we can access the 2 byte "Index Number" field and interpret it as a Little Endian 2 byte integer as follows:
    indexnum = struct.unpack("<H", filecontent[indexnum_offset:(indexnum_offset+2)])[0]

We are using the struct module's "unpack" function on the given filecontent slice to interpret the slice as a LE 2 byte integer and store it in the "indexnum" variable.
The "<H" argument tells unpack how to interpret the raw bytes i.e. "<" for Little Endian, "H" for unsigned 2 byte integer.
The unpack function returns a tuple (kinda like a sequence of variables) so we specify the "[0]" at the end to retrieve the first converted value. It seems a bit weird until you find out that you can chain types together in the same unpack call. For example, "<HH" specifies 2 consecutive LE unsigned 2 byte integers. Unfortunately, we cannot use chaining here due to the variable length of Name/Phone strings in the contact record.
There's a bunch of other unpack types defined in the Python help documentation (search for "pack unpack").

We can now print out our interpreted "indexnum" value but we need to use the str() function to convert our Index Number integer into a printable string. We can use code such as:
    print("indexnum = " + str(indexnum))

We can re-use a similar pattern of code for the remaining fields in the record.
That is, we calculate the offset of field X, interpret those slice bytes and then print.
Because we know the record field sizes (or can read them e.g. via "Name Length" size byte), calculating the offsets becomes an exercise in adding field sizes to previous field offsets to get to the next offset address.

4c. So for the second field ("Name Length") we can use:
    namelength_offset = indexnum_offset + 2
    print("namelength_offset = " + str(namelength_offset))
    namelength = struct.unpack("B", filecontent[namelength_offset:(namelength_offset+1)])[0]
    print("namelength = " + str(namelength))

For the "Name Length" field (one byte long), we use a starting offset ("namelength_offset") which is 2 bytes past the "Index Number" offset ("Index Number" field is 2 bytes long).
We use unpack with the "B" argument as we are interpreting the 1 byte at filecontent[name_length_offset:(namelength_offset+1)] as an unsigned 1 byte integer and storing it in the "namelength" variable.

4d. For the third field ("Name String") we can use:
    namestring_offset = namelength_offset + 1
    namestring = filecontent[namestring_offset:(namestring_offset+namelength)].decode('utf-16-le')
    print("namestring = " + namestring)

After calculating the "Name String" field offset (should be one byte past the "Name Length" field), we can use the string.decode('utf-16-le') method to interpret the filecontent[namestring_offset:(namestring_offset+namelength)] slice as a UTF16LE string and store it in the "namestring" variable.

4e. For the fourth field ("Phone Length") we can use:
    phonelength_offset = namestring_offset + namelength
    phonelength = struct.unpack("B", filecontent[phonelength_offset:(phonelength_offset+1)])[0]
    print("phonelength = " + str(phonelength))

After calculating the "Phone Length" field offset (should be "Name Length" bytes past the "Name String" offset), we use unpack with the "B" argument as we are interpreting the 1 byte at filecontent[phonelength_offset:(phonelength_offset+1)] as an unsigned 1 byte integer and storing it in the "phonelength" variable.

4f. For the fifth field ("Phone String") we can use:
    phonestring_offset = phonelength_offset + 1
    print("phonestring_offset = " + str(phonestring_offset))
    phonestring = filecontent[phonestring_offset:(phonestring_offset+phonelength)].decode('utf-8')
    print("phonestring = " + phonestring)

After calculating the "Phone String" field offset (should be one byte past the "Phone Length" field), we can use the string.decode('utf-8') method to interpret the filecontent[phonestring_offset:(phonestring_offset+phonelength)] slice as a UTF8 string and store it in the "phonestring" variable.

4g. For the sixth and last field ("Unix Timestamp") we can use:
    timestamp_offset = phonestring_offset + phonelength
    print("timestamp_offset = " + str(timestamp_offset))
    timestamp = struct.unpack("<I", filecontent[timestamp_offset:(timestamp_offset+4)])[0]
    print("raw timestamp decimal value = " + str(timestamp))
    timestring = datetime.datetime.utcfromtimestamp(timestamp).strftime("%Y-%m-%dT%H:%M:%S")
    print("timestring = " + timestring)

We calculate the timestamp offset as being "Phone Length" bytes past the "Phone String" field and print the timestamp offset to help with debugging.
We use unpack with the "<I" argument to interpret the 4 byte filecontent[timestamp_offset:(timestamp_offset+4)] slice as a LE unsigned 4 byte integer and then store the integer value in the "timestamp" variable.
eg interprets 0x26CDDB56 LE as 0x56DBCD26 BE = 1457245478 decimal = number of seconds since 1JAN1970.
We then call the datetime.datetime.utcfromtimestamp() method to create a Python "datetime" object using the number of seconds since 1JAN1970. The returned datetime object has a "strftime" method we can call to obtain a human readable ISO format string. The "%Y-%m-%dT%H:%M:%S" argument to strftime() specifies that we want a datetime string formatted as Year-Month-DayTHour:Minute:Second.

Step 5: After we process all of the "ctx!" hits, we close the file (See "unpack-tute.py" line 89):
    fb.close()

For shiggles, we also print out the number of hits in the hitlist on line 91 before the script finishes.
    print("\nProcessed " + str(len(hitlist)) + " ctx! hits. Exiting ...\n")

Running the script

For Python v2.7.12:
In a Win7x64 command terminal window with "unpack-tute.py" and "testctx.bin" copied to "c:\":

C:\>c:\Python27\python.exe unpack-tute.py
Running unpack-tute.py v2017-08-19

Hit found at offset: 64 decimal = 0x40 hex
indexnum = 1
namelength_offset = 70
namelength = 12
namestring = George
phonelength = 7
phonestring_offset = 84
phonestring = 5551234
timestamp_offset = 91
raw timestamp decimal value = 1457245478
timestring = 2016-03-06T06:24:38

Hit found at offset: 512 decimal = 0x200 hex
indexnum = 65282
namelength_offset = 518
namelength = 18
namestring = King Kong
phonelength = 9
phonestring_offset = 538
phonestring = +15554321
timestamp_offset = 547
raw timestamp decimal value = 1457245695
timestring = 2016-03-06T06:28:15

Hit found at offset: 800 decimal = 0x320 hex
indexnum = 65535
namelength_offset = 806
namelength = 30
namestring = Magilla Gorilla
phonelength = 10
phonestring_offset = 838
phonestring = +445552468
timestamp_offset = 848
raw timestamp decimal value = 1457258495
timestring = 2016-03-06T10:01:35

Processed 3 ctx! hits. Exiting ...

C:\>

For Python 3.4.1:
In a Win7x64 command terminal window with "unpack-tute.py" and "testctx.bin" copied to "c:\":

C:\>c:\Python34\python.exe unpack-tute.py
Running unpack-tute.py v2017-08-19

Hit found at offset: 64 decimal = 0x40 hex
indexnum = 1
namelength_offset = 70
namelength = 12
namestring = George
phonelength = 7
phonestring_offset = 84
phonestring = 5551234
timestamp_offset = 91
raw timestamp decimal value = 1457245478
timestring = 2016-03-06T06:24:38

Hit found at offset: 512 decimal = 0x200 hex
indexnum = 65282
namelength_offset = 518
namelength = 18
namestring = King Kong
phonelength = 9
phonestring_offset = 538
phonestring = +15554321
timestamp_offset = 547
raw timestamp decimal value = 1457245695
timestring = 2016-03-06T06:28:15

Hit found at offset: 800 decimal = 0x320 hex
indexnum = 65535
namelength_offset = 806
namelength = 30
namestring = Magilla Gorilla
phonelength = 10
phonestring_offset = 838
phonestring = +445552468
timestamp_offset = 848
raw timestamp decimal value = 1457258495
timestring = 2016-03-06T10:01:35

Processed 3 ctx! hits. Exiting ...

C:\>

We can see that all of the name and phone strings are complete / as shown in the Hex view picture.
We also verified that each "timestring" value corresponded to it's raw LE hex value using Dcode.

Final Thoughts

After you know the basics of a language, programming is a skill best sharpened by working on actual projects (not reading books or blog posts).
Google and StackOverflow are your friends when researching how to code common tasks in Python.
Which makes print statements your No-BS-tell-it-like-it-is best friend when debugging (e.g. print offset addresses and/or values to debug). A well placed print statement can be the easiest way of finding out that your fifth cola/coffee didn't do you any favours.

The code in this script is intended for use with files that can fit into memory (ie 0 MB to *maybe* hundreds of MB).
Larger files may require breaking up the file into chunks before reading/processing.

In writing this script, we used Notepad++ (v6.7.9.2) with the Language set to Python to get the funky syntax highlighting (eg comments in green, auto-indenting). The TAB size was set to 4 spaces via the Settings, Preferences, Tab Settings menu. We disabled "Word Wrap" (under View menu) and enabled line numbers (under Settings, Preferences, Editing menu) so if/when you get a runtime error, you can find the relevant line more readily.

If you are in the forensic community and found this post helpful or you're in the forensic community and had some questions/thoughts about the code, please leave a comment or send me an email (No, I will not do your homework/assignment! But if its for a new artifact for a case, monkey might be convinced ;).

↧

Monkey takes a .heic

October 13, 2017, 9:30 pm

≫ Next: A Monkey Forays Into USB Flashdrives

≪ Previous: Monkey Unpacks Some Python

The hills are alive ... with the compression of H.265!

With iOS 11 and macOS High Sierra (10.13), Apple has introduced a file container format called High Efficiency Image File Format (aka HEIF - apparently its pronounced "heef"). Apple is using HEIF to store camera/video/Apple "Live Photos". HEIF is based on multiple standards such as:
-ISO Base Media File Format ISO (14496-12) for structuring data sections within the file container
- ISO/IEC 23008-12 MPEG-H Part 2 / ITU H.265 for compressing the actual still picture and video data. Also referred to as High Efficiency Video Coding (HEVC). Theoretically, HEIF could use other compression algorithms but Apple is using it exclusively with HEVC / H.265.

Some benefits of HEIF are:
- It approximately halves the file size for a given image/video quality.
- It allows for a single file to contain multiple media (eg multiple animated still pictures AND sound e.g. an Apple "Live Photo").

Apple HEIF images will have a .heic file extension. Apple HEVC encoded movies will have the familar Quicktime .mov extension but internally they will use HEVC / H.265 compression. The ISO Base Media File Format ISO (14496-12) is based upon the Quicktime file structure and so it will apply to both .heic images and HEVC .mov files.

Because it uses a more complex compression algorithm than previous standards (eg H.264 and JPEG), only recent model Apple devices have the required hardware to create HEVC content.
According to Apple's 2017 WWDC presentation 503 "Introducing HEIF and HEVC", to create HEVC pictures/video you need (at least) an iPhone 7 / iPad Pro (A10 Fusion chip) running iOS 11 or a 6th generation Intel Core processor running macOS 10.13 High Sierra.
Software decoding support is apparently available for all Apple devices (presumably running iOS 11 / High Sierra) but playback performance will probably suffer on older hardware.

For the rest of this post we will discuss:
- how to view .heic and HEVC .mov files
- the file format for .heic files
- the file format for HEVC .mov files

We won't be discussing how HEVC / H.265 compression works. For a quick overview on some basic concepts and the difference between H.264 and H.265, please watchthis video.

And before we dive any deeper ...
Special Thanks to Maggie Gaffney from Teeltech USA for providing us with iPhone 8 Plus test media files.
We also used sample .heic files from an Ars Technica review (iPad Pro) and sample files provided to the FFmpeg forum (iPhone 7 Plus).

Viewing & Compatibility Issues

Here is an article showing how to set up an iOS 11 device to save/transfer .heic files in their original format (Camera set to "High Efficiency" and "Photos - Transfer to Mac or PC" set to "Keep Originals"). Apple can auto-magically convert .heic files to .jpg files (and h.265 .mov to h.264 .mov) when transferring to non-compatible devices/destinations (eg PC or emails). So if you're not receiving .heic files, check those iOS settings.

Apart from viewing them natively on iOS or High Sierra (eg using Apple Photos or Preview), we found the easiest way to view .heic files was using this free Windows HEIF utility by @liuziangexit.
Note: there are two versions - Chinese and English. Being the uncultured lapdog monkey that we are, we downloaded the English version. Be sure to read the readme file included. Running it on Windows 10 (VM) also required installing the signed Microsoft C++ Redistributable package which was conveniently included in the download zip file.

There is also a website that converts .heic to .jpeg but this may not be appropriate for sensitive photos.

For playing HEVC/H.265 encoded .mov files, we found that IrfanView and VLC player worked OK (IrfanView seemed to have better performance than VLC when viewing high resolution videos).

FFmpeg (v3.3.3) can also be used to screen grab frames (1 per sec) from a H.265 .mov. The command is:

ffmpeg.exe -i sourcemovie.mov -vf fps=1 outputframe_%d.png

This will result in "outputframe_1.png", "outputframe_2.png" etc. being generated to the current directory.

For more compatible playback, we can convert an H.265 .mov into an H.264 .mov. The command is:

ffmpeg.exe -i source265movie.mov -map 0 -c copy -c:v libx264 outputmovie264.mov

This copies all other streams (eg audio, subtitles) to the new output file and re-encodes/outputs the source video stream to H.264. See here for details on using the FFmpeg map argument.

We found the easiest way to send a test .heic from an iPhone to a PC was to upload it to Dropbox which has been updated to support .heic and H.265 encoded .mov files. You can view both .heic and .mov files from the Dropbox.com website. Unfortunately, it appears that Dropbox might rename the files upon upload. We were expecting to see something like "IMG_4479.heic" but the filename on Dropbox was something like "Photo Oct 08, 10 20 05.heic". Consequently, a hash compare of the source/destination files may be required to verify exact copies.

Exiftool (v10.63) added improved support for HEIF and it will display the EXIF data from an Apple generated .heic or H.265 .mov file. It has not been confirmed if iOS created .heic / HEVC .mov files will retain ALL of their original EXIF metadata after being auto-magically converted to .jpg / H.264 files.

We have not been able to find a non-Apple viewer for HEVC encoded "Live Photos". Trying to transfer them via Dropbox resulted in a "Live Photo" .heic file containing a single image (no sound or other animation). Sorry, no "Live Photos" for you!

File Structures

Now that we know how we can view iOS created HEIF images and videos, lets take a closer look at the actual file formats.
This will be a (reasonably?) short overview - we aren't going to become "data masochists" and delve into every field or the compression side of things. Maybe in a future post (especially if you've been a bad, bad, dirty, dirty monkey and feel the need to be punished LOL) ...

Apple created .heic and .mov files are BIG Endian.

Both .heic and .mov files are based on the ISO Base Media File Format. This means a .heic or .mov file container is divided into dozens of functional "boxes" of data. The start of each box will be marked with a 4 byte box size (typically) and a four byte box type string (eg. 'ftyp', 'mdat', 'meta'). Within the box, there will be other data fields which may consist of other boxes and/or a structured pattern of bytes. So there is a complicated hierarchy of boxes within boxes thing happening which makes it difficult to quickly understand every detail. The majority of the bytes (ie compressed data) will be stored in an "mdat" box. Other boxes will be used to store meta data about how to access/treat the data in those "mdat" boxes.
For further details on how these boxes are structured, please refer to the ISO Base Media File Format standard. Both it and the Quicktime movie format document will be your best friends for this section. FYI the ISO Base Media File Format is also used for .mp4 and .3gp files - so learning about this format will aid in understanding multiple types of media files.

Other handy references include:
- Chapter 3 of Lasse Heikkila's HEIF implementation thesis
- the Nokiatech HEIF Github site
- the 2017 Apple WWDC HEIF presentations (follow the transcript and slide PDF links) for the HEIC File Format and the Intro to HEIF amd HEVC.

For an Apple iPhone 8 Plus .heic file (containing a single 4032 x 3024 image) the file structure can look like this:

ftyp (size=0x18, majorbrand = 'heic', minorversion = 0, compatiblebrands = mif1, heic)
meta (size = 0xF74)
    hdlr (size = 0x22, handler_type is "pict" i.e. file is an image)
    dinf (size = 0x24)
    pitm (size = 0xE, item_ID = 0x31 = primary item)
    iinf (size 0x43D, entry_count = 0x33 = number of items stored)
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x1, item_type = hvc1, item_name = "" [Tile 1]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x2, item_type = hvc1, item_name = "" [Tile 2]
        ...
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x30, item_type = hvc1, item_name = "" [Tile 30]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x31, item_type = grid, item_name = "" [derived image from all tiles]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x32, item_type = hvc1, item_name = "" [thumbnail]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x33, item_type = Exif, item_name = "" [EXIF]
    iref (size = 0x94, version = 0, contains array of SingleItemTypeReferenceBox)
        dimg (size = 0x6C, from_item_ID = 0x31, reference_count = 0x30, to_item_ID = 0x1, 0x2 ... 0x30) [derived image]
        thmb (size = 0xE, from_item_ID = 0x32, reference_count = 0x1, to_item_ID = 0x31) [thumbnail]
        cdsc (size = 0xE, from_item_ID = 0x33, reference_count = 0x1, to_item_ID = 0x31) [content description ref / exif]
    iprp (size = 0x6F3)
        ipco (size = 0x5AD) = ItemPropertyContainerBox = property data*
            colr (size = 0x230) = Colour Information 1
            hvcC (size = 0x70) = decoder configuration 1
            ispe (size = 0x14) = spatial extent 1-1
            ispe (size = 0x14) = spatial extent 1-2
            irot (size = 0x9) = Image rotation 1
            pixi (size = 0x10) = Pixel information 1
            colr (size = 0x230) = Colour Information 2
            hvcC (size = 0x70) = decoder configuration 2
            ispe (size = 0x14) = spatial extent 2-1
            pixi (size = 0x10) = Pixel information 2
        ipma (size = 0x13E) = Item Property Association = connects property data in ipco to item numbers*
            List of 0x32 items. Each item has the structure [item number(2 bytes), size (1 byte), data (size bytes)]
    idat (size = 0x10)
    iloc (size = 0x340, version = 1, offset_size = 4, length_size = 4, base_offset_size = 0, index_size = 0, item_count = 0x33 )
        [item_id = 0x1, file offsets used, base_offset = 0, extent_count = 0x1, extent_offset = X1, extent_length = Y1]
        [item_id = 0x2, file offsets used, base_offset = 0, extent_count = 0x1, extent_offset = X2, extent_length = Y2]
        ...
        [item_id = 0x33, file offsets used, base_offset = 0, extent_count = 0x1, extent_offset = X33, extent_length = Y33]
mdat (size = variable, contains data on EXIF / thumbnail / image data)

It looks a little daunting (and this doesn't even show all of the boxes/fields!) but once you figure out which fields are relevant, its not too bad. We've color coded some sections to make it more followable/wake up those weary eyes.

The ftyp section declares the 'majorbrand' (i.e. file type) as "heic".
The meta section declares how to interpret the raw data stored in the mdat section. Notable sub-boxes include:
    hdlr = The 'handler_type' is set to "pict" which means this is an image (as opposed to a video).
    pitm = Specifies the Primary Item number (eg item_ID 0x31)
    iinf = Contains a list of ItemInfoEntrys. The number of items and sizes will change with resolution/shape (eg camera specs, square photo). From the 2017 WWDC 513 presentation and actual iOS samples we've observed, images are divided/stored as tiles.
            For a 4032 x 3024 resolution image, there were 0x33 items declared in each .heic file. These consisted of:
            0x30 items with each item_type = 'hvc1'. Each item corresponds to a 512x512 tile.
            1 'grid' item represents the full derived image
            1 'hvc1' item is used for the 320x240 thumbnail
            1 'Exif' item is used for storing EXIF data
    iref = contains array of SingleItemTypeReferenceBox items. From this section we can see that item_ID = 31 is a derived image ('dimg') that refers to item_IDs 0x1 to 0x30 (tiles). There are also references to the thumbnail and exif items.
    iprp = connects item_IDs in the 'ipma' sub-section to properties in the 'ipco' sub-section. *We were unable to find much public documentation on how this is implemented (apart from the Nokia HEIF Github source code).
    iloc = contains file offsets for each item_ID section. e.g. For EXIF (item_ID = 0x33), the extent_offset = 0x000043DB, extent_length = 0x000007F8. So if we go to the file offset at 0x000043DB, we will see the EXIF item data.
The mdat section contains the raw image data, thumbnail and Exif information.

Due to the tiling, full file recovery could be ~~a bastard~~ a lot more complicated compared to recovering a jpeg (where you can carve everything between the 0xFFD8 and 0xFFD9 markers).
As iOS 11 also uses file based encryption, it *should* be impossible to carve & recover .heic files anyway.
However, if those .heic files were also copied to a separate non-encrypted device (eg PC) and then corrupted/deleted, it *may* be possible to repair or recover some/all of the tiles (theoretically!).

OK, suck it up buttercups ...because there's more!

Here's the file structure for a 6.73 second 7.3 MB Apple HEVC / H.265 encoded .mov taken with an iPhone 8 Plus:

ftyp (size = 0x14, majorbrand = 'qt ')
wide (size = 0x8)
mdat (size = 0x00746120, contains HEVC / H.265 video data)
moov (size = 0x0028FA)
    mvhd (size = 0x6C, version = 0, creation_time = 0xD5FFE81E (secs since 1JAN1904), modification_time = 0xD5FFE825,
timescale = 0x0000258 = 600 dec. units per sec, duration = 0x00000FC7 = 4039 dec units => 4039/600 = 6.73 secs,
                next_track_ID = 0x5)
    trak (size = 0x0FE6)
        tkhd (size = 0x5C, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, track_ID = 0x1,
                  duration = 0xFC7, width = 0x07800000 => 0x780 = 1920 decimal, height = 0x04380000 => 0x438 = 1080 decimal)
        tapt (size = 0x44)
        edts (size = 0x24)
        mdia (size = 0xF1A) = media box
            mdhd (size = 0x20, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, timescale = 0x258,
                        duration = 0xFC7)
            hdlr (size = 0x31, component type = mhlr = media handler, component subtype = vide, component manufacturer = appl,
                     component name = "Core Media Video"
            minf (size = 0xEC1) = contains file offsets to samples/chunks of samples
    trak (size = 0x07B4)
        tkhd (size = 0x5C, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, track_ID = 0x2,
                  duration = 0xFC7, width = 0, height = 0)
        edts (size = 0x24)
        mdia (size = 0x72C)
            mdhd (size = 0x20, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, timescale = 0x000AC44 =
                        44100 samples/sec, duration = 0x00049000 = 299008 samples = 6.78 sec)
            hdlr (size = 0x31, component type = mhlr = media handler, component subtype = soun, component manufacturer = appl,
   component name = "Core Media Audio"
            minf (size = 0x6D3) = contains file offsets to samples/chunks of samples
    trak (size = 0x042E)
        tkhd (size = 0x5C, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, track_ID = 0x3,
                  duration = 0xFC7, width = 0, height = 0)
        edts (size = 0x24)
        tref (size = 0x20)
        mdia (size = 0x386)
            mdhd (size = 0x20, version = 0, creation_time = 0xD5FFE81E, modification_time = D5FFE825, timescale = 0x258,
                        duration = 0xFC7)
            hdlr (size = 0x34, component type = mhlr = media handler, component subtype = meta, component manufacturer = appl,
   component name = "Core Media Metadata"
            minf (size = 0x32A) = contains file offsets to samples/chunks of samples
    trak (size = 0x0271)
        tkhd (size = 0x5C, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, track_ID = 0x4,
                  duration = 0xFC7, width = 0, height = 0)
        edts (size = 0x24)
        tref (size = 0x20)
        mdia (size = 0x1C9)
            mdhd (size = 0x20, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, timescale = 0x258,
                        duration = 0xFC7)
            hdlr (size = 0x34, component type = mhlr = media handler, component subtype = meta, component manufacturer = appl,
   component name = "Core Media Metadata"
            minf (size = 0x16D) => contains file offsets to samples/chunks of samples
udta (size= 0x08)
free (size = 0x400)
meta (size = 0x5BD)
    hdlr (size = 0x22, component subtype = mdta)
    keys (size = 0xC9) => contains various metadata field names
    ilst (size = 0xCA) => contains various metadata field values
    free (size = 0x400)
free (size = 0x88)

We can see some familiar 4 letter strings (reckon you might be spouting some others of your own by now ...) and the offset information is now contained in the 'moov' section (recall that offset info is stored in the 'meta' / 'iloc' section for a .heic).
Also, instead of utilising "items" like .heic, the movie is organised into traks (eg video trak, sound trak). These 'trak' boxes include file offsets to the 'mdat'section (via 'trak' / 'mdia' / 'minf').
The 'moov' box has a movie header atom labelled 'mvhd'. This shows the length of the movie and creation/modified dates (amongst other things).
There were 4 traks recorded - one for video (track_ID=1), one for sound (track_ID=2) and two for meta data (track_ID=3 and 4). The second (smaller) metadata trak (track_ID=4) may be extending the first (track_ID=3) metadata trak (due to space limitations?) as the metadata strings are different but seem related.
'free' marks boxes that can be ignored/skipped
'meta' marks a box containing metadata however, the 'meta' structure from a .mov *will not* match the 'meta' data structure from a .heic image. Presumably Exiftool will grab metadata from both 'trak' and 'meta' boxes.

In other observed H.265 .mov files (both smaller and larger), multiple 'mdat' sections were observed. This may be related to the existence of a 'hoov' box which we couldn't find any documentation on. The 'hoov' box appeared at an lower (earlier) file offset than the 'moov' and also contained 'mvhd' and 'trak' boxes etc.
Perhaps the 'hoov' box was a previous 'moov' box that had its name modified so its data can be overwritten? eg as file grows in size, data gets re-written? #SpeculatorMonkey

Final Thoughts

Oh, my aching lederhosen!
And this post has only just scratched the ~~arse~~ surface of the HEIF-y beast.
There are a lot more possibilities with HEIF than what Apple has currently implemented. The Nokiatech Github site demonstrates a bunch of different image file possibilities (eg single images, sequences of images, HD movies, combined images/video).

Although we weren't able to capture a native Apple "Live Photo" for examination, we *suspect* it will use a sequences .heics file and have a 'moov' box in addition to 'ftyp', 'mdat' and 'meta' boxes. This was kinda shown on slide 60 of the 2017 Apple WWDC slides for High Efficiency Image File Format.

This post by macrumors.com states that Apple "Live Photos" initially consisted of a 12 MP jpeg with 45 frames of H.264 .mov at 15 fps (i.e. 3 secs video = 1.5 secs before/after button press).
This anandtech forum article states that "Live Photos" are:

"1440x1080 HEVC on certain devices, albeit paired with HEIC images now instead of JPEG. There is also the option of leaving it has 1440x1080 h.264 with JPEG though."

Anyhoo, if you are able to catch a "Live Photos" unicorn file, we'd be very interested to hear about its file structure (leave a comment?).

UPDATE 15OCT2017:
For "Live Photos" we tried directly connecting an iPhone 8 running iOS 11.0.3 to a Windows 7 PC and was able to see the DCIM folders. The iPhone 8 was set to the default "High Efficiency" Photos with Mac/PC transfer set to "Keep Originals". However, after copying the files over, when we looked at the transferred .heic file structures on the PC they were single images.
There were no 'moov' or 'trak' items. So it looks like iOS is not openly exposing their "Live Photo" file structure to non-Apple devices. Boo! :'(

Finally, if you know of any other easy to install/use .heic viewers or have any thoughts/suggestions, please leave a note in the comments.

↧

A Monkey Forays Into USB Flashdrives

March 6, 2020, 5:43 am

≫ Next: Recovering and Replaying Garmin Voice Instructions

≪ Previous: Monkey takes a .heic

What a Feeling Indeed!

Recently monkey was tasked with extracting data from a broken USB flash drive that had previously been "repaired" by another party. It still did not work however.
The following post details the journey to getting the device working again.
It also shows the power of reaching out to more experienced experts. You never know where/when you might find that missing piece of the puzzle!

Special Thankyous to the following monkey-enablers for their assistance/advice/enduring my endless emails:
- Sasha Sheremetov from Rusolut
- Jeremy Brock from RecoverMyFlashDrive
- Maggie Gaffney from Teel Technologies USA
- Cory Stenzel (Twitter - Cory also encouraged me to write this post)
- Ryan Olson

We started with a non-functioning 16 GB USB flash drive with no case or obvious branding. The original USB connector had broken off and was replaced with a different one provided by the repairer.
The drive looked similar to this example posted by atomcrusher on Reddit:

Example of a repaired connector for a USB Flash Drive (Note: this is NOT our repair device)

Initial Observations

- Plugging the device in to a USB power source via a USB Volt/Current meter showed it was not consuming much/if any current. We did not plug it into a PC initially because we did not know what data was stored on the device (e.g. USB Rubber Ducky ). We later used an older sacrificial standalone PC during subsequent testing.

- The USB drive only used 4 pads for the connector (GND, 5V, D+, D-) so it was probably USB2. USB3 uses 9 pins (USB3 uses more data channels). The electroschematics.com website has a good introduction to USB devices here.

- There appeared to be a missing component near the activity LED.

- There was a Phison PS2251-68-5 NAND controller on one side of the circuit board and a Toshiba 16 GB Embedded MultiMediaCard (e.MMC) chip on the other side.
For older USB flash drives, the NAND controller is usually a squared shaped chip (e.g. LQFP48 = Low Profile Quad Flat Pack with 48 pins) similar to the one shown in the Reddit example pic.
The controller chip is responsible for translating the host device's (ie PC's) USB read/write instructions into commands that the memory chip can understand. The controller also looks after wear levelling of memory and determines how each write is stored (physical location/any error correction/data deletion).
The presence of an e.MMC memory chip was somewhat unexpected. Older USB flash drives typically use a single NAND controller chip and separate NAND memory chips (usually TSOP48 chips where TSOP48 = Thin Small Outline Package with 48 pins.). Here's an example diagram of a TSOP48 package from the Elnec chip reader website.
An e.MMC chip is different because it combines its own onboard memory controller with some NAND memory. The e.MMC chip package is usually BGA (BGA = Ball Grid Array). ie the signals travel via solder balls on underside of the chip. There are no other external pins like LQFP48. See Wikipedia for futher details .
A quick Google of the Toshiba part number written on the BGA confirmed that it was a 16 GB e.MMC chip.
Hmmm ... e.MMC chips are usually more expensive than regular NAND memory. Why would a USB flash drive manufacturer use e.MMC chips when there's already a dedicated Phison NAND controller chip on the board?
Sasha from Rusolut mentioned this possibility during his very informative Visual NAND Reconstructor course. Typically, e.MMC chips used in this type of arrangement are discounted factory seconds - they have a faulty/disabled internal controller but the NAND memory is OK so they're sold as cheaper NAND chips.
Interestingly, Rusolut also have a solution to read e.MMC chips via NAND interface points (ie it accesses the NAND memory directly and avoids talking to the internal controller). Unfortunately, I don't have access to that wonderful tool and NAND reconstruction into a filesystem can be very complicated and heavily device dependent. For example, two flash drives which use the same controller but run different firmwares can organize the NAND in different ways.
Normal functioning e.MMC chips can also be removed and read via adapters (e.g. USB-eMMC adapters) that talk to the internal e.MMC controller. However, because our device's e.MMC probably has a faulty/disabled on-board controller, such a read would not be accurate/reliable.

So our initial strategy was to try to repair the flash drive, get it recognized by a PC and then image it via FTK Imager.

The Journey begins!

We started by gathering as much information about the repair drive as possible.
We Googled for any datasheets for:
- the Phison controller model
- the Toshiba memory chip model
We didn't have a lot of success finding datasheets however, we also noticed there was a serial number (POVK568FS1400311) printed on the PCB.
Googling for the serial number led us to this post by Jeremy from RecoverMyFlashDrive.
From the post's pictures, we could see the same circuit design layout, the same component labels and the same controller chip as the repair job but a different e.MMC chip (still a Toshiba e.MMC though).
To better conceptualize the layout, I recommend you check out the board pictures via the RecoverMyFlashDrive link above and have them open in a separate tab to follow along.
After finding Jeremy's post, I wrote him an email (not really expecting a reply) but Jeremy ended up becoming an incredible help.

Maggie Gaffney from Teel Technologies is a great friend always willing to help and she also teaches a course in "Board Level Repair for Digital Forensic Examiners". So naturally, she was one of the first people I reached out to for advice.
Maggie suggested soldering to the exposed TSOP48 pads and then trying to get a read from the e.MMC chip. Unfortunately, this would also require some NAND reconstruction which I was trying to avoid.
Maggie also provided some tips about using hot air to reflow the e.MMC BGA chip. Reflowing / heating up a BGA chip with a hot air rework station can help re-connect any solder balls which have become disconnected. However, if there's any epoxy/filler between the chip and the PCB, the reflow may not work. Because there appeared to be some sort of translucent silicon filler used, I held off doing any reflow on the e.MMC chip.

I recently saw a Rusolut training testimonial Youtube video by Cory Stenzel and recognized his name from his posts on various digital forensics Google groups. Hoping that he might also be able to help, I sent him an email and was pleasantly surprised when he replied that he was familar with this monkey from attending both SANS FOR585 and Maggie's board repair course.
Cory also had some great ideas/research to share. For starters, Cory sent me this link about "Test mode" for USB Controllers.
Using the power of GoogleTranslate baked into Chrome, it seemed like a promising lead however, Cory noted that he hadn't been able to use it on an exhibit yet.
So I checked with Sasha from Rusolut who added that test mode might help troubleshoot if the (Phison) controller is working but it will not allow for data transfer from the NAND. D'OH!
However, Cory's link also had a diagram listing the pinouts for the Phison USB Controller PS2251-67. This was helpful because I was unable to find a pinout diagram for the PS2251-68-5 controller on our repair device. Ass-uming the pinouts would not have changed much/at all for such a revision, this diagram could help us determine if the controller was being supplied with the correct voltage.
Here's the pinout pic with the relevent pinout on the right hand side:

PS2251-67 Controller pinout from https://www.usbdev.ru/articles/testmod/ (use Right Hand Side)

Cory had some other good suggestions such as:
- Apply a little bit of pressure to each of the (Phison) controller pins to check that they are soldered correctly to the board. I took Cory's advice a fraction further and used a soldering iron and some flux gel to ensure each Phison pin was electrically connected to its corresponding pad. I then used a multimeter to verify the soldering reflow did not introduce any shorts between adjacent pins.
- Using a good thermal imaging camera to look for hotspots (ie potential short circuits) when the device was plugged in. Unfortunately, I don't have a thermal camera but it does seems like a worthwhile future investment.

I also asked for advice from the Teel Technologies "Physical Mobile Forensics" Google Group and Ryan Olson replied that most of the success stories he's seen involved transplanting the memory chip onto an identical donor. Unfortunately, we probably would not be able to confidently source an identical donor and transplanting a BGA chip is not as easy as a TSOP48 (at least not for this monkey). Additionally, Rusolut support mentioned that if the controller hardware and firmware is not an exact match, the controller on the new board may erase/format data.

So to summarize our progress so far - we have a USB2 flash drive that does not draw current and is not recognized by an isolated Windows PC when its plugged in.
Plugging in a known working USB2 flash drive into the same port using the same cabling works OK before/after the damaged device. So there's something wrong with our repair device and not our test PC.

Looking at the pictures from Jeremy's post we can see that besides the controller and the e.MMC components, most of the other components are either resistors (small black rectangular components) or capacitors (varying size brown rectangular components).

After unsuccessfully plugging it into a PC, we noticed there was a missing component (labelled "BC9") next to the LED (labelled "D1"). Sourcing a replacement component was somewhat tricky because we did not know the exact make/model of device. However, by opening up a bunch of test USB2 flash drives with the same approximate age/capacity, we eventually found a similar looking LED arrangement on a Verbatim Store N Go flash drive. This reference drive used a different Phison controller but had a similar LED layout configuration. The missing component "BC9" was a capacitor connected to the LED. This was also confirmed later when we saw Jeremy's post pictures.
So using a pair of soldering tweezers and some flux, we transplanted the LED and capacitor from our test Verbatim Store N Go drive to our repair drive.
We tried our repair device again but it still could not connect. Perhaps this was all a "LED herring" after all? *crickets*

After double checking the repaired USB connector against the following pic from electroschematics.com

USB A Connector Pinouts

it appeared the previous repairer had soldered on the connector in REVERSE (ie 5 V was connected to Ground and vice versa). Some choice adult phrases may have been uttered. To quote Blackadder: "I think the phrase rhymes with clucking bell".
Silly monkey was also angry that he did not check the connector orientation first.

Hint: The outer metal of the USB connector is typically connected to Ground. By using a multimeter (on the continuity setting) with one probe on the USB ground pin and the other probe on the USB connector should indicate a connection.
Another helpful hint is to cut off the PC end (male) plug of a USB extension cable and then connect the other end of the cable (female) to your test device. This will make it easier to probe the various USB signals later. Rather than trying to fit a multimeter probe into the flash drive's USB connector (or on the pads only located on one side of the PCB), you can now connect a probe to the exposed wires of the extension cable and probe easier. Here's pic of a similarly modified cable from a stackexchange.com post here:

Use a modified USB extension cable as a tester cable

So we now know the USB Ground and 5V pins were reversed - what about the other USB pins, D- and D+? Were they reversed as well?
Fortunately, we had another test device which used the same Phison controller so we traced the USB D- / D+ pins to the test Phison controller and connected our repair device accordingly.

We plugged in our newly corrected device and still nothing ... Mother of a baboon!
Perhaps some damage was done by the reversed voltages?
Using the multimeter, we checked all capacitors and resistors for continuity and there were no shorts detected on any capacitors. Some resistors were reading 0 Ohms but apparently this is not uncommon when a manufacturer wishes to either future proof a design or use the 0 Ohm resistor as a kind of overcurrent component that blows before whatever else is downstream.
While the device was plugged into the PC, I started measuring some voltages across some capacitors and was surprised when Windows played the USB connection sound and briefly allowed File Explorer to view some directory contents. This only lasted about 30-60 seconds (long enough to grab a screenshot of a directory) but it was a good sign - the drive wasn't complete cactus. Interestingly, the volume name displayed was "Verbatim16". Unfortunately, I couldn't reconnect any further despite many subsequent attempts.

I shared this development with Jeremy from RecoverMyFlashDrive and he helpfully found a similar drive and shared some capacitor voltages that he saw on his device. One voltage in particular was different to our repair device. Jeremy observed 5 V across a capacitor "C7" connected to pin 47 of the Phison controller. On our repair device, the voltage to pin 47 was 3.7 V.
Looking at the usbdev.ru pinout showed pin 47 was the controller's 5 V supply pin. So if we could provide it with 5 V, it *might* just work.

For completeness, the controller was also getting its expected 3.3V on various controller pins - it just seemed to be the 5 V pin that was undervolted. Interestingly, the TSOP48 pins for Vcc were also getting 3.3 V so a read via the TSOP pins and reconstruction via Rusolut VNR probably would have worked as well but it would have taken me a lot more time.

My first thought was to solder a copper jumper wire from the 5 V USB pin direct to pin 47 but after cross checking with Jeremy, it was decided not to do this in case it bypassed some internal controller safety mechanisms.
After unplugging the repair device from the PC, I measured the resistance between the USB 5 V pin and pin 47. On the repaired device, the resistance was 170 Ohms. On a reference device with the same controller (but different PCB layout), the resistance between USB 5V and pin 47 was ~2 Ohms. Bit of a difference!
So on the repair device, I traced the path between the USB 5V and pin 47 and found most of the resistance seemed to be coming from "R1".
If "R1" had failed and had increased its nominal resistance, then there would be less available current to pin 47. Remember: Voltage (V) = Current (I) x Resistance (Ohms).
So I decided to replace "R1" with a 1 Ohm resistor instead. This would make the USB 5 V to pin 47 resistance on the repair device comparable in value to the reference device.

I plugged our newly modified test device into the test PC and ... BINGO!

The drive connected / was recognized and stayed connected. I was able to grab some screenshots of each directory and then finished imaging it via FTK Imager and a software write blocker program.

Bananas all round!

Further Thoughts

Reversing the voltage on a USB flash drive isn't necessarily a permanent drive killer.

When performing data recovery, don't reach for the nuclear option first (eg chipoff / NAND reconstruction)- it might just be one or two components that require replacement.

Don't be afraid to reach out to others for advice.

With the increasing levels of on device encryption, there will be a corresponding demand for repairing damaged devices instead of removing the memory and reading off-device.
Consequently, having basic hardware troubleshooting skills will be increasingly useful.

If anyone is interested in repair courses, Maggie teaches a "Board Level Repair for Digital Forensic Examiners" course. I've been wanting to attend for a while and if the travel gods and the current global COVID-19 outbreak permits, hopefully I will be attending the April 2020 course.

For further repair tips/techniques, check out these Youtubers:

HDD Recovery Services
https://www.youtube.com/user/hddrecoveryservices

Justin from The Art Of Repair
https://www.youtube.com/channel/UCG8Y3ARZq5s-FyasBOGNrnQ

Jessa Jones ipadrehab
https://www.youtube.com/channel/UCPjp41qeXe1o_lp1US9TpWA

Louis Rossman
https://www.youtube.com/user/rossmanngroup

Finally, if you have any comments, suggestions or resources that can help others troubleshoot USB Flash devices, please leave a comment below.

PS Please don't ask me to recover your personal Flash Drives. Ask Google instead :)

↧

Recovering and Replaying Garmin Voice Instructions

May 24, 2020, 7:26 am

≫ Next: iOS14 Maps History BLOB Script

≪ Previous: A Monkey Forays Into USB Flashdrives

Wait a minute monkey, did you say Carmen or Garmin?

We had a damaged Garmin nuvi 56LM GPS unit from which we recovered a text file containing a voice log.
It was a bit of an unusual process so we thought it might be interesting to share the story.
As a result of this effort, monkey wrote a Python3 script (parse_garmin56LM.py) that uses the free espeak-ng library to convert the Garmin 56LM voice log text to WAV files for better place name recognition. The script is available from GitHub.

Special Thanks to the following people for their assistance:
Sasha Sheremetov from Rusolut for his advice with the data recovery
Ken Case from Berla for his advice regarding GPS data logs
Benjamin "BJ" Duncan for sharing his findings regarding the voice log
Katie Russ for creating her helpful website

Our story begins with a damaged Garmin nuvi 56LM which "provides easy-to-follow, spoken turn-by-turn directions with street names". Well, it used to!
Wikipedia states it was released in 2014.

Google found an interesting paper from around that time -"Garmin satnavs forensic methods and artefacts: An exploratory study" by Alexandre Arbelet (August 2014).
And while it did not cover our model, it mentioned GPX files as a potential source of GPS tracklogs.
Sounds promising eh?

Unfortunately, our device was damaged beyond repair so chipoff was our only option.
The chip was a SanDisk 8 GB eMMC chip. Multiple reads of the chip produced the same hash so the chip seemed pretty stable.
X-Ways, Autopsy, Oxygen Forensic Detective and FTK Imager did not recognize partitions from the dump.
Cellebrite Physical Analyzer's Garmin Legacy chain also did not not extract any information from the dump.
ASCII plaintext was visible in dump though so not all hope was lost.

Ken from Berla advised that Garmin usually use a FAT32 partition which contains the GPX tracklogs.
Unfortunately, the first 512 byte sector of the dump did not end with the usual 55AA for an MBR so we needed to find a way to extract the filesystems.

Sasha from Rusolut suggested using R-studio to recover/extract the filesystems from the dump.
Great Success!
There were 2 extracted partitions - FAT16 (128 MB) and FAT32 (3.3 GB - a little smaller than expected. Maybe Garmin used 8 GB chips for commonality/ease of upgrade reasons?)

Some noteworthy files were found while looking for timestamped latitude/longitude coordinates ...
Note: This was based on the contents of ONE device, other devices/models probably store their data differently.

FAT16/.System/SQLite/RecentStops.db
Contained a "history" table with scaled latitude/longitude numbers. By multiplying these raw numbers by 180/2^31, we were able to obtain plottable latitude/longitude coordinates.

FAT16/.System/SQLite/pre.db
The "route_segment" table contained timestamp and latitude/longitude route info (unsure if these were travelled).
The "history" table had start latitude/longitude, end latitude/longitude and start times.
Note: Garmin timestamps measure seconds since 31 DEC 1989 (Garmin launch date) - see here for further details.
By adding 631065600 seconds to the Garmin numeric timestamp, you have the number of seconds since the 1970 Unix epoch which then makes it easier to find the human readable time (there are more tools supporting Unix epoch time than for Garmin time).

FAT16/.System/Diag/EventLogs/*.TXT
Contained possible latitude/longitude coordinates with timestamps. However, the file also contained non-ASCII/binary bytes.

FAT16/.System/SQLite/quick_search_list.db
FAT16/.System/SQLite/recent_searches.db
FAT16/.System/Logs/searches.txt
Contained potential timestamped search information.

FAT16/.System/GPS/ARC.bin
Contained ASCII latitude/longitude strings (search for "GPS main")

FAT16/.System/Trips/Current.trip
Unknown file format which possibly contains the current trip log?

FAT32/Garmin/GarminDevice.xml
Contained various settings, version and model info.
It also mentioned a GPX directory with references to .gpx files (which did not exist in our device).
Sasha from Rusolut confirmed that his test unit had these files.
I'm not sure why there was a discrepancy - perhaps it was due to regional differences or the user settings?
Looking up the file format for GPX logs showed that they are XML text files which use certain keywords/field names to record the latitude and longitude.
Searching the entire dump for the "trkpt" and "trkType" keywords did not find any GPX formatted data though.

All semi-interesting info so far ...
However, BJ also brought our attention to an interesting artifact that spawned this post ...

FAT32/Voice/logs/vpm_log_all.log
This text file appears to chronologically log GPS spoken instructions. Due to our lack of test devices, we can't guarantee the vehicle was at the exact location spoken but it might be used to show the user was in or aware of the area.

As an example, the voice log contained lines like:

D[2019/06/12 07:55:03] {22ce6e88} [vpm_tts_parse.c:vpm_tts_parse:3770] Navigation phrase selected: Keep right $USR_TO_NEXT_ROAD. (229)
D[2019/06/12 07:55:03] {22ce6e88} [vpm_tts_log.c:vpm_tts_log_phonetics:277] Map Phonetics: "nju "IN|gl@nd *"haI|%we (MDB Lang: 23)

The spoken voice section is the string occurring between "Map Phonetics: " and "(MDB Lang: 23)".
e.g. "nju "IN|gl@nd *"haI|%we
Another example could be:
"wE|st@n *"mo|t@|%we

The line format can be generalized as:
D[YYYY/MM/DD HH:MM:SS] {4byte hex id? process/thread id?} [vpm_tts_log.c:vpm_tts_log_phonetics:277] Map Phonetics: VOICE_STRING (MDB Lang: 23)

Where VOICE_STRING looks like a localized pronunication guide for the system.

Trivia note:
There were also various lines with the string "Voice Language: Australian English-Karen (TTS)" which seems to indicate which voice the user heard.
A bit of Googling foundthe voice of Karen who has also done other recognisable voice over work (e.g. Garmin GPS – Australian Karen, Navman GPS – Australian Karen, Apple iPhone 4s & 5 – Australian Voice of Siri).

"OK, Karen", there must be a system involved with the pronunciation of the string but at the time, monkey thought it was probably proprietary to Garmin.
Fast forward to a couple of weeks ago and monkey had a mini-breakthrough.
There's a system for pronunciation called the International Phonetic Alphabet (IPA).
If you've read Wikipedia or a dictionary you've probably seen these weird pronunciation symbols and sarcastically thought "Yeah that helps".
For example, the Wikipedia page for "Cooking banana" uses IPA to descibe how to pronounce "plantain".
See the highlighted text containing the weird symbols in the picture below:

Example of IPA pronunciation (highlighted text)

However, if we are limited to the 95 character printable English ASCII set, we need an additional method of encoding those weird IPA symbols.
There are a several methods available but for our purposes, we will limit the discussion to the Kirshenbaumand X-SAMPA systems.

Looking again at the voice strings from our log - it appears they are using the X_SAMPA system. e.g. double quotes for emphasis, use of { symbols etc.

Conveniently, Katie Russ has created a website using Amazon's Polly Speech API that can take (X-SAMPA) IPA strings and convert them into sound. And with configurable voices/accents too!
You can find her website here:
http://ipa-reader.xyz/

If you're so inclined, try copying and pasting the following text:
"nju "IN|gl@nd *"haI|%we
into the website and you should hear the equivalent "New England Highway" pronounced. Very cool!

IPA-Reader Website Example

This got monkey thinking - a website is good for one or two strings but copying and pasting hundreds of entries from a device log is not practical.
Amazon Polly isn't free either so that prompted a search for a free alternative.

We found an open source C library called espeak-ng.
While you can compile/build it from scratch, Ubuntu also has it as an installable debian package. Much easier!
Note: We had issues trying to install it on Ubuntu 16.04 (probably because its no longer supported) but had no issues with Ubuntu 20.04.

Here is the espeak-ng help page for Ubuntu 20.04.
It allows users to input a (Kirshenbaum) IPA string and then hear/record the corresponding audio to a WAV file.

To install it on Ubuntu 18+, type:

sudo apt-get install espeak-ng

Then you can use it like:

espeak-ng "[[Hello w3:ld]]"

Note: string is enclosed in double quotes

Its not as polished as Amazon Polly and it can get a bit confused by some strings but it works reasonably well (if you don't mind some Steeeephen Haaaawking like ASMR)
Also note it uses Kirshenbaum strings as input and not X-SAMPA strings (like in the voice log) so some conversion is necessary.
Note: We found that the comparison/conversion chart listed in Wikipedia did not quite translate for all of our data so our script had to do some customized conversions. These conversions may not sound correct to other users depending on the language.

Here's a summary of the conversion process we figured out:
- Enclose the input string between '[[' and ']]' chracters to have the symbols interpreted rather than spelled out
- Replace '{' characters with 'a'
- Replace double quotes " with single quotes ' for primary stress indication
- Replace '%' with ',' for secondary stress indication
- Replace 'A' with 'a'

For more details on the Kirshenbaum system see here.

For example, the voice log string is:
"h{|m@nd *"{|v@n|ju
which converts to:
[['ha|m@nd *'a|v@n|ju]]

And to hear it spoken as "Hammond Avenue" type:

espeak-ng "[['ha|m@nd *'a|v@n|ju]]"

Note: enclosing double quotes when entering via command line.

To save it as a WAV file you can use:

espeak-ng "[['ha|m@nd *'a|v@n|ju]]" -w output.WAV

Some converted strings may not sound right / recognizable so for those (hopefully rare) occasions, you can enter the voice log string into the ipa-reader.xyz website to hear the spoken phrase. It may also help to adjust the espeak-ng playback speed using the -s argument.

The Script

Now that we know how to get an audio file for one voice string, lets try automating the string extractions from the entire voice log.

Input file:vpm_log_all.txt
Output files: An HTML report table containing the original line text and line number, the converted text file for input into espeak-ng and a link to the output WAV file.

Here's the generalized script logic:
Open/Read vpm_log_all.txt
For each line:
    Extract the voice string from the line text
    Convert and write the voice string to LINENUMBER.txt
    Call "espeak-ng -s 100 -w LINENUMBER.WAV -f LINENUMBER.txt" to generate the WAV file
    Store (LINENUMBER, line text, converted voice text, WAV filename) in list

Read list
Print HTML table from list ("Log Line No.", "Log Line Text", "Processed espeak-ng string" (linked to text file), "Audio File" link)

And here is how to run it - this outputs .WAV, .txt and Report.html files to the given "op" output directory:

python3 parse_garmin56LM.py -f vpm_log_all.log -o op
parse_garmin56LM.py 2020-05-17 Initial
Directory op Created
185.WAV = [[*'maks|,wEl 'strit]]
...
2225.WAV = [['ha|m@nd *'a|v@n|ju]]

Processed 875 voice entries. Exiting ...

Here's the contents of the "op" output directory:

And here's what the "Report.html" looks like (with redacted timestamps):

From the above screenshot, you can see the table is pretty simple - click on the links to view either the input text file or open the WAV audio file.

Note: We tried calling "espeak-ng" from the script using the converted voice string (instead of a text file containing the string) but the generated WAV file kept getting truncated for an unknown reason i.e. words were missing. Using the text file input seemed to avoid this issue.

The script was written/tested with Python3 on Ubuntu 20.04 LTS but we only had one set of test data so it probably needs some tweaking.

Final Thoughts

We have succesfully written a script to extract and convert IPA strings from a Garmin nuvi 56LM (2014) voice log into WAV files.

The script's code (see "process_voicestring" function) for converting the voice log string to the espeak-ng input string may require some adjustment depending on the user data/language settings.

The script may also work for other models of Garmin GPS but this has not been tested.

If you see/have seen similar voice logs in other devices, it would be great to hear from you in the comments section.

↧

iOS14 Maps History BLOB Script

November 27, 2020, 6:34 am

≫ Next: Monkey Test Drives a Honda Accord

≪ Previous: Recovering and Replaying Garmin Voice Instructions

Another BLOBBY SQL (Sequel)!

A quick post to introduce a new iOS 14 Apple Maps History helper script ...
Thanks to Heather Mahalik for sharing her research and for both her and her associate Sahil's testing.
You can read about Heather's iOS14 research at her blog here.

The script (ios14_maps_history.py) focuses on the Apple Maps app's MapsSync_0.0.1 SQLite database which can contain the last 3-5 directions/searches.
There are 32 tables in the database but as we see in Heather's query below - most of the history info is stored in a table called ZHISTORYITEM and in the ZMIXINMAPITEM table. Both tables can contain protobuf BLOBs which are extracted by the script for further processing by the user along with an HTML summary report.

SELECT
ZHISTORYITEM.z_pk AS 'Item Number',
CASE
when ZHISTORYITEM.z_ent = 14 then 'coordinates of search'
when ZHISTORYITEM.z_ent = 16 then 'location search'
when ZHISTORYITEM.z_ent = 12 then 'navigation journey'
end AS 'Type',
datetime(ZHISTORYITEM.ZCREATETIME+978307200,'UNIXEPOCH','localtime') AS 'Time Created',
datetime(ZHISTORYITEM.ZMODIFICATIONTIME+978307200,'UNIXEPOCH','localtime') AS 'Time Modified',
ZHISTORYITEM.ZQUERY AS 'Location Search',
ZHISTORYITEM.ZLOCATIONDISPLAY AS 'Location City',
ZHISTORYITEM.ZLATITUDE AS 'Latitude',
ZHISTORYITEM.ZLONGITUDE AS 'Longitude',
ZHISTORYITEM.ZROUTEREQUESTSTORAGE AS 'Journey BLOB',
ZMIXINMAPITEM.ZMAPITEMSTORAGE as 'Map Item Storage BLOB'
from ZHISTORYITEM
left join ZMIXINMAPITEM on ZMIXINMAPITEM.Z_PK=ZHISTORYITEM.ZMAPITEM;

From the query we can see there 3 types of entry:
- "Location search"
- "Coordinates of search" (usually has a "Map Item Storage" BLOB in ZMAPITEMSTORAGE column)
- "Navigation journey" (usually has a "Journey" BLOB in ZROUTEREQUESTSTORAGE column)

Please note as per Heather's blog - "Time Created" (and presumably "Time Modified") are NOT accurate records of when the search was executed.

"Location search" entries (i.e. search location text) seem to be followed by "Coordinates of search" entries (containing the latitude/longitude of the search location).

When directions are requested, a "Navigation journey" entry is created with a "Journey BLOB" which contains the start/end locations.
However, "Navigation journey" entries also seem to be generated even if the user does not explicitly ask for a journey to be calculated. There were 2 such entries in Sahil's data despite him not navigating with the device.

Further research is required into these BLOBs - hence the script :)

The ios14_maps_history.py runs Heather's query and creates an HTML report (called iOS14-MapsReport.html) with links to the extracted BLOB files.
Each BLOB is extracted from the database and stored in the user's nominated output directory.
The script has been tested with Python3 on Ubuntu 20 and Win10x64.

Usage example for Ubuntu 20.04 LTS with Python 3.8.2:
python3 ios14_maps_history.py -d MapsSync_0.0.1 -o optest
This will output Heather's query to an HTML table with hyperlinks to each extracted BLOB file.
All files will be created in the user nominated "optest" directory.

The corresponding command line output looks like:

Running ios14_maps_history.py v2020-09-19

Processed 15 entries

Please refer to iOS14-MapsReport.html in "optest" directory
Exiting ...

Usage example for Win10 with Python 3.6:

c:\Python36\python.exe ios14_maps_history.py -d MapsSync_0.0.1 -o opdir
Here's what the output HTML table looks like:

Note: lat/longs were redacted to protect the not-so-innocent ;)

The outputted BLOBs can be researched further with tools like protobuf_inspector.
This is a very funky tool which pretty prints a protobuf and also interprets 64bit fields in multiple ways (useful for finding potential lat/longs).
Its also "pipping" easy to install ...

pip install protobuf-inspector

Monkey has used protobuf_inspector on extracted Apple Maps protobufs to find some destination Yelp reviews and what appears to be epoch millisecond timestamps (ref. 1JAN1970) which occur just after a GUID.
However, because its not my test data and its a small sample size, I can't confirm if its the time of search ...

Anyway, this is where this post ends ... Good luck to those about to dive further into the protobuf swamp!
If you happen to use this script to find something interesting, please leave a comment and share the knowledge :)

↧

Monkey Test Drives a Honda Accord

March 26, 2021, 5:26 am

≫ Next: Mike & the Monkey Dumpster Dive Into Samsung Gallery3d App Trash

≪ Previous: iOS14 Maps History BLOB Script

"The red ones go faster!" - original picture sourced from caranddriver.com

Monkey recently "test drove" ("test-parsed"?) a data dump from a 2016 Honda Accord (USA).
This post will describe that wonderful journey.

Special Thanks to Manny Fuentes who generously shared his Honda data. Without this data, this post and associated scripts would not exist.

The scripts are available from GitHub.

Parsing the dump with X-Ways Forensics showed 7 partitions, 6 of which contained the EXT4 filesystem. The first partition was not recognized by X-Ways.

Here is the breakdown according to X-Ways:

Partition1 = 251 MB Unknown
Partition2 = 879 MB EXT4 (System part1, ~8332 files)
Partition3 = 251 MB EXT4 (System part2, ~1275 files)
Partition4 = 1.7 GB EXT4 (User data, ~15115 files)
Partition5 = 251 MB EXT4 (21 files, contained timestamped logs)
Partition6 = 1.1 GB EXT4 (49 files, appears to contain Speech related data)
Partition7 = 125 MB EXT4 (966 files, mostly stored in "data_org.tar.gz")

Note: Two partitions (Partition2 and Partition3) contained /system directories.

Based on strings found in Partition2:\system\build.prop
The system was running Android 4.2.2 (ro.build.version.release=4.2.2) and it seems to be made by Clarion (ro.product.manufacturer=Clarion, ro.board.platform=r8a7791
ro.build.description=T2X-user 4.2.2 9TXX9211 211 release-keys).
The build date was 6AUG2015 16:57:16 UTC (ro.build.date.utc=1438880236).
Partition2:\system\app also contained various .apk and .odex files.

Partition3:\system\build.prop similarly confirmed the previous Android properties
ro.build.version.release=4.2.2
ro.build.date=2015
ro.build.date.utc=1431482880
ro.product.model=MY15ADA
ro.product.brand=Honda
...
ro.product.manufacturer=Clarion
ro.board.platform=r8a7791

Partition4 contained Android /user data (mainly under com.android, com.clarion, com.honda directories)
Also found on Partition4:

\property\persist.sys.timezone [contained ASCII text set to "US/Central"]

\system\alps\evolution\paired_device_list.txt [contained ASCII text listing various BT addresses and their device names. Was consistent with data found in bluetoothsettings.db]

\system\usagestats\usage-history.xml [contained a log of various timestamped Android Activitys. Not validated]

\data\com.android.settings\shared_prefs\bluetooth_settings.xml [contained a timestamp string value for "last_discovering_time". Appears to be millisecs since 1JAN1970. Not validated]

\data\com.clarion.displayaudio.apps.generalsettings\shared_prefs\com.clarion.displayaudio.apps.generalsettings_preferences.xml
[contained string values for "currentTimeZoneIdNotDaylight" (e.g. "US/Central") and "currentTimeZoneName" (e.g. "CST UTC-6")]

\data\com.clarion.displayaudio.apps.telephonyapp\shared_prefs\activity.PhoneTopActivity.xml
[contained the string "DEVICENAME" and what appears to be a MAC address - possibly the most used device?]

\data\com.honda.displayaudio.navi\Garmin\sqlite\quick_search_list.db [contained a "quick_search_list" table which was empty]

Partition5 contained various timestamped logs (e.g. ErrorLevelPower.log, ErrorLevelSoft.log, ErrorLevelHard.log)

Partition6 appears to contain various Speech related files.

Partition7 contained most of its files in "data_org.tar.gz".

The most interesting user related information was found in various SQLite databases under Partition4:/user.
For our data dump, there were 4 x SQLite databases which were of interest:

\data\com.honda.displayaudio.navi\Garmin\sqlite\RecentStops.db
\data\com.honda.telematics.core\databases\crm.db
\data\com.clarion.bluetooth\databases\phonedb.db
\data\com.clarion.bluetooth\databases\bluetoothsettings.db

Consequently, four Python3 parsing scripts were written/tested on Win10x64 running Python 3.9.
The four scripts work a similar manner - they take an input argument to the respective SQLite database and an output argument for the output TSV filename. They then run SQLite queries for the relevant data and output selected fields to the TSV file.

For our data, in addition to phonedb.db, there was a Write-Ahead-Log (phonedb.db-wal) in the same directory.
It is recommended to run the accord_2016_phonedb.py script twice in this type of scenario and compare the two outputs:
1. Run the script WITHOUT the phonedb.db-wal file present in the same directory as the specified phonedb.db
2. Run the script WITH the phonedb.db-wal file present in the same directory as the specified phonedb.db

Users do not have to specify the -wal file at the command line as SQLite will auto-magically incorporate the -wal file if present.
For our data, including the phonedb.db-wal file resulted in an extra 10 calls being found/output.

On to the scripts ...

accord_2016_recentstops.py reads RecentStops.db "history" table and outputs details to TSV file.
This table appears to document timestamped lat/long coordinates. We're not sure what triggers an entry.

RecentStops.db can be found at: \data\com.honda.displayaudio.navi\Garmin\sqlite\RecentStops.db

SQLite query used:"SELECT time, lat, lon, name FROM history ORDER BY time ASC;"

Usage example:

c:\Python39\python.exe accord_2016_recentstops.py -d RecentStops.db -o rsop.txt
Running accord_2016_recentstops.py v2021-03-22

Processed/Wrote 26 entries to: rsop.txt

Exiting ...

Output TSV format:

time lat lon name

Notes:
time is displayed as ISO formatted string interpreted as Garmin secs since 31DEC1989 (UTC)
lat & lon has the scaling factor applied = 180 / 2^31 to convert to degrees
name can include cross-street or location strings which can help confirm the calculated lat/long

accord_2016_crm_eco_logs.py reads crm.db "eco_logs" table and outputs details to TSV file.
This table appears to log various timestamped journey legs (timestamped odometer / trip range measurements).

crm.db can be found at: \data\com.honda.telematics.core\databases\crm.db

SQLite query used:"SELECT _id, trip_date, trip_id, mileage, start_pos_time, start_pos_odo, finish_pos_time, finish_pos_odo, fuel_used, driving_range FROM eco_logs ORDER BY _id ASC;"

Usage example:

c:\Python39\python.exe accord_2016_crm_eco_logs.py -d crm.db -o crmop.txt
Running accord_2016_crm_eco_logs.py v2021-03-22

Processed/Wrote 300 entries to: crmop.txt

Exiting ...

Output TSV format:

_id trip_date trip_id mileage start_pos_time start_pos_odo finish_pos_time finish_pos_odo fuel_used driving_range

Note:
trip_date, start_pos_time, finish_pos_time are displayed as ISO formatted strings interpreted as millisecs since 1JAN1970 (UTC)

accord_2016_phonedb.py reads phonedb.db "callhistory", "contact", "contactnumber" tables and outputs details to TSV file.
This database appears to log call history and contacts information.

phonedb.db can be found at: \data\com.clarion.bluetooth\databases\phonedb.db

Call History SQLite query:"SELECT _id, address, phonenum, calldate, calltype FROM call_history ORDER BY calldate ASC;"

Contacts SQLite query:"SELECT contact._id, contact.address, contact.firstName, contact.lastName, contact.phonename, contactnumber.number, contactnumber.numbertype FROM contact JOIN contactnumber ON contactnumber.contact_id = contact._id ORDER BY contact._id ASC;"

Usage example:

c:\Python39\python.exe accord_2016_phonedb.py -d phonedb.db -o op
Running accord_2016_phonedb.py v2021-03-22

Processed/Wrote 98 CALL entries to: op_CALLS.txt

Processed/Wrote 41 CONTACT entries to: op_CONTACTS.txt

Exiting ...

CALLS output TSV format:

_id address phonenum calldate calltype

Notes:
address appears to be a MAC address
calldate is displayed as ISO formatted string interpreted as millisecs since 1JAN1970 (UTC)

CONTACTS output TSV format:

_id address firstname lastname phonename contactnum contacttype

Note:
address appears to be a MAC address

accord_2016_bluetoothsettings.py reads bluetoothsettings.db "bluetooth_device" table and outputs details to TSV file.
This table appears to log Bluetooth device names and MAC addresses.
bluetoothsettings.db can be found at: \data\com.clarion.bluetooth\databases\bluetoothsettings.db

Note: There was also a "speed_dial" table but it was empty so we're not sure about how this table is populated

SQLite query used:"SELECT device_bank, device_addr, device_name FROM bluetooth_device ORDER BY device_bank ASC;"

Usage example:

c:\Python39\python.exe accord_2016_bluetoothsettings.py -d bluetoothsettings.db -o btop.txt
Running accord_2016_bluetoothsettings.py v2021-03-23

Processed/Wrote 6 entries to: btop.txt

Exiting ...

Output TSV format:

device_bank device_addr device_name

Note:
device_addr appears to be a MAC address

Final Thoughts

If you have a Honda dump of similar vintage, we'd appreciate if you could run the scripts and let us know how it goes.
Obviously, as the scripts were written using one set of data, there may be bugs / mis-ass-umptions.

Or if you can shed any more light on a Honda Android dump, we'd appreciate hearing about your findings.

Finally, if you can share a dump for any another vehicle and would like us to write some parsing scripts, please let us know.

Comments and Suggestions are also welcome in the comments section below ...

↧

Mike & the Monkey Dumpster Dive Into Samsung Gallery3d App Trash

January 7, 2022, 12:26 am

≫ Next: Monkey Attempts To Digest Some Google Takeout (DetectedActivitys)

≪ Previous: Monkey Test Drives a Honda Accord

Monkey assists Mike with another dive into the Samsung Gallery3d App

It all started with a post by Michael Lacombe(iacismikel at gmail.com) on the Physical and RAW Mobile Forensics Google Group in early November 2021.

The post involved a case where a Samsung mobile phone owner claimed that specific images were received but they were immediately deleted after being accessed. Mike was asked if it was possible to determine this. Not knowing the immediate answer to that question, he began to analyze the Samsung Android 9 device.

Mike found some Samsung Gallery3d app deletion artifacts visible in the local.db SQLite database. Specifically, he saw various timestamped log entries in the "log" table which were associated with encoded strings.After a helpful hint about the strings being base64 encoded from forum member "Tony", Mike set off further down the rabbit hole.

Along the way, he found this previous Cheeky4n6monkey post from 2016, Comparing that information to his current case data, he saw that things had changed considerably over the years but it was enough of a nudge to dig a little deeper. Mike asked if this monkey wanted to tag along and so the adventure began...

Here are some things we have learned on our journey... (mostly Mike, I was just the script monkey)

There are always new things to research

The Samsung Gallery3d app has been around for years and according to GooglePlay, it was last updated in 2019 with version 5.4.11.0
Opening the AndroidManifest.xml file from a test device's Gallery3d Android Package (APK) in Android Studio shows:

android:versionCode="1020000021"
android:versionName="10.2.00.21"

According to the Android Developer documentation, versionName is displayed to the user where as versionCode is a positive integer which increases with each release and can be used to prevent downgrades to earlier versions.

This app is updated frequently. When searching for test data, we found that nearly every device we looked at contained a different version of the app, which in turn, contained different information stored within the application folder and the database itself.

Digging into app artifacts can lead to additional information that is not currently being parsed

As far as we could ascertain, there were no commercial or non-commercial forensic tools which process the Samsung Gallery3d app database for deletion artifacts.

Some open source tools that we used to analyze the data and the APK include:
For Data Analysis:
-DB Browser for SQLite for viewing/exporting SQLite databases
-Cyberchef to base64 decode strings
-Base64 Decode and Encode website to base64 decode strings
-Epochconverter to confirm timestamp types
-Android Studio
For APK reversing:
-dex2jar to convert an APK's classes.dex to Java .jar
-JD-GUI to view source code from a .jar file
-JADX to view source code directly from APK file
We also wrote our own Python3 scripts to assist with batch conversion of base64 encoded strings and output to Tab Separated Variable (TSV) format.
These scripts are available here

Some observations for the Samsung Gallery3d app

This is a stock app installed on Samsung devices. It has library dependencies that are part of the Samsung Android framework. Consequently, there doesn’t appear to be an easy way (if at all) to install the application on a non-Samsung device.

The Samsung Gallery3d app is located on the user data partition at:

/data/com.sec.android.gallery3d
Files that are sent to the trash from within the app are located at
/media/0/Android/data/com.sec.android.gallery3d
Due to differences in each version of the application and that the research was driven by Mike’s case, we decided to focus this blog on that application version (10.2.00.21).
Within the /data/com.sec.android.gallery3d directory, there was a cache directory and a databases directory.

Cache Directory

There are multiple Cache sub-directories contained within data/com.sec.android.gallery3d/cache/

In this instance, the /0 folder contained larger thumbnail images, ranging in widths of 225-512 pixels and heights of 256-656 pixels while the /1 folder had smaller thumbnails ranging in widths of 51-175 pixels and heights of 63-177 pixels. There were also /2, /3 and /4 folders. /2 and /3 were empty and /4 had a single thumbnail that was 320x320 in size.

There doesn’t seem to be anything useful here beyond the thumbnails themselves. The names of the thumbnails seem to be generated using a hash algorithm.

Databases Directory

Contained within/data/com.sec.android.gallery3d/cache/databases/ is the local.db SQLite database.

This database contains various information including:

- Albums in the gallery ("album" table)

- A log that records various actions associated with the app ("log" table). eg move to trash, empty trash.

- Items that are currently in the Trash bin ("trash" table)

In later versions, we noticed another table called "filesystem_monitor". This contained timestamp, app package names (e.g. com.sec.android.gallery3d) and base64 encoded file paths. However, as this table was not present in Mike's case data and we are not sure what triggers these records, it requires further research and will not be discussed further.

Table Observations

"album" Table

Here is the "album" table schema:

CREATE TABLE album (
_id INTEGER PRIMARY KEY AUTOINCREMENT,
__bucketID INTEGER UNIQUE NOT NULL,
__absPath TEXT,
__Title TEXT,
folder_id INTEGER,
folder_name TEXT,
default_cover_path TEXT,
cover_path TEXT,
cover_rect TEXT,
album_order INTEGER,
album_count INTEGER,
__ishide INTEGER,
__sefFileType INTEGER DEFAULT 0,
__isDrm INTEGER DEFAULT 0,
__dateModified INTEGER DEFAULT 0
)

Here are some screenshots of an example "album" table:

"album" Table Screenshot 1

"album" Table Screenshot 2

Here are some selected "album" table fields of interest:

Field Name	Description
_bucketID	This is generated via callingthe Java hashcode algorithm on the full path of the album ("_abspath"). See java-hashcode.py script for a proof of concept script. Example value: -1313584517
_abspath	The path of the album. Example: /storage/emulated/0/DCIM/Screenshots
default_cover_path	The image associated with the corresponding album. Example: /storage/emulated/0/DCIM/Screenshots/Screenshot_20200530-054103_One UI Home.jpg
album_count	The current number of files stored within the album. Example: 14

Due to the file paths staying the same, _bucketID values have been found to be consistent across devices. This can help to show whether there are/were custom albums that were created, as well as application specific albums such as Facebook, Snapchat, etc. Recovering deleted records here can show deleted albums and names of deleted images that were once used as album covers. Cover path information can show potential files names of interest with many of them normally containing timestamp information in the file name. This can potentially assist with tying usage of a particular app at a specific time.

No extraction script was written for the "album" table as DB Browser for SQLite can be used directly to copy/paste the album data.

"log" Table

Here is the "log" table schema:

CREATE TABLE log (
_id INTEGER PRIMARY KEY AUTOINCREMENT,
__category INTEGER NOT NULL,
__timestamp TEXT,
__log TEXT
)

Here is a screenshot of an example "log" table:

"log" Table Screenshot

Here are some selected "log" table fields of interest:

Field Name

Description

_timestamp

Timestamp text string (formatted YYYY-MM-DD HH:MM:SS in Local Time) when a particular log entry occurred.

Example: 2020-01-09 16:17:14

_log

Proprietary formatted text string which lists the "action" performed (see next table) and the base64 encoded paths of relevant files.

Example:

[MOVE_TO_TRASH_SINGLE][1][0][location://timeline?position=6&mediaItem=data%3A%2F%2FmediaItem%2F-1566891466&from_expand=false][oKHi/x4pePL+KXj3N0b3Lil49h4pePZ2XimIUvZW3imIV14pePbOKYhWHil4904pePZeKYhWTimIUv4pePMC9E4piFQ0nimIVNL+KYhUZh4pePY+KYhWXil49ib2/imIVr4pePL+KXj0ZCX+KYhUnil49N4pePR1/imIUx4piFNeKYhTfimIU4NOKYhTnimIUw4piFNzTimIU1N+KXjzPimIUy4piFLuKXj2rimIVwZw==ST1puy1]

Some observed log "actions" include:

Log Action	Description
MOUNTED	Unknown when this is triggered. It tells how many files are currently in the trash. Example: [MOUNTED][10][0][0][/storage/emulated/0]
MOVE_TO_TRASH_SINGLE	This occurs when the user moves a single file to the trash from the timeline or gallery view. Example: [MOVE_TO_TRASH_SINGLE][1][0][location://timeline?position=0&mediaItem=data%3A%2F%2FmediaItem%2F2000598506&from_expand=false][2sx0p44piFL3N04pePb+KXj3Lil49h4pePZ+KYhWUv4pePZeKXj23imIV14piFbOKXj2Hil4904pePZWTimIUv4pePMOKYhS/imIVE4piFQ0lNL+KXj1Nj4pePcmXil49l4pePbnPil49ob+KYhXTimIVzL+KXj1Pil49j4pePcuKXj2Xil49lbnPil49o4pePb+KYhXRf4piFMjDil48y4piFMDDil482MeKYhTgt4piFMOKYhTjil48xMOKXjzPil4854piFX+KXj09uZeKYhSBV4pePSeKYhSDimIVIb23il49lLuKYhWril49w4pePZw==7nsjIZn]
MOVE_TO_TRASH_MULTIPLE	This occurs when the user moves more than one file to the trash from the timeline or gallery view. Each trash entry is enclosed in brackets. Example: [MOVE_TO_TRASH_MULTIPLE][2][0][location://timeline][QxMwh4pePL+KXj3N04pePb+KXj3Lil49hZ+KYhWXimIUv4piFZW3il491bOKXj2HimIV04piFZeKXj2Qv4piFMOKXjy/il49EQ0nimIVNL+KYhVPil49j4piFcuKYhWXimIVl4piFbuKYhXPimIVo4piFb3Til49z4piFL1Nj4piFcmVlbnPimIVo4pePb+KYhXTil49fMjAy4pePMDDimIUxMTPil48tMeKYhTbimIUxOTDimIUz4pePX03imIVF4piFR+KYhUEu4pePauKYhXBnmcIJudJ][rYxa4pePL3Pil4904pePb+KYhXLimIVhZ+KXj2XimIUvZeKYhW114piFbOKYhWHil4904pePZeKYhWTimIUv4pePMOKYhS/il49EQ+KXj0nimIVN4pePL+KXj1Pil49j4pePcuKYhWXimIVlbuKXj3Pil49ob+KXj3Rz4pePL+KYhVNj4piFcuKYhWXil49lbuKYhXPimIVo4pePb+KXj3TimIVf4pePMuKXjzDil48y4pePMOKYhTDimIUxMTMt4piFMTYx4pePOOKYhTXimIU3X01F4pePR0Eu4piFanBnrO4E+di]
EMPTY_SINGLE	This occurs when the trash is manually emptied and a single file is in the trash at that time. Example: [EMPTY_SINGLE][1][0][location://trash][MywhK4pePL+KXj3N04pePb3LimIVh4piFZ+KXj2Xil48v4piFZeKXj23imIV1bOKYhWF04pePZeKYhWTil48v4piFMOKXjy/imIVBbuKYhWTil49y4pePb2lk4piFL+KXj2TimIVhdOKYhWHil48vY2/imIVt4piFLuKXj3Nl4piFY+KYhS7il49hbuKYhWTil49yb+KYhWnimIVk4piFLmdh4pePbOKXj2zimIVl4piFcuKXj3kzZC9m4piFaWzimIVl4piFc+KYhS/il48u4pePVOKYhXLimIVhc+KYhWjimIUv4piFLeKXjzfimIU0NTPil48w4piFNTXimIUzNOKYhTYzMOKYhTDil48x4piFNzfil480OeKYhTg=Oac5vx1]
EMPTY_MULTIPLE	This occurs when the trash is manually emptied and contains more than one file. Each empty entry is enclosed in brackets. Example: [EMPTY_MULTIPLE][5][0][location://trash][zFqL+KXj3PimIV04pePb3LimIVhZ2Xil48v4pePZeKXj23imIV14piFbGHimIV04piFZWTil48v4piFMOKYhS/imIVB4piFbuKXj2Til49y4pePb2nil49kL+KXj2Til49hdOKXj2Hil48vY2/imIVt4pePLuKXj3Nl4pePYy7il49h4piFbuKXj2Ry4piFb2nimIVk4piFLmfil49h4piFbOKXj2zimIVlcuKYhXnimIUz4piFZC/imIVm4pePaeKYhWxlc+KXjy8uVOKYhXLimIVh4piFc+KYhWgvLeKYhTXimIU14pePNeKXjzA0MuKYhTDil482MuKXjzjimIUyNuKYhTg34piFMOKXjzPil48z4piFMTk=SHqNRK9][8ez12A4piFL+KXj3Pil4904pePb+KYhXJhZ+KXj2XimIUv4piFZW114pePbGHil4904piFZWTil48v4piFMOKXjy/imIVB4piFbuKXj2Ry4piFb2lk4piFL+KYhWTil49h4piFdGHil48vY+KXj2/il49t4piFLuKYhXNl4pePYy7il49h4piFbmTimIVyb+KXj2lkLmdh4piFbGxl4pePcuKYhXnil48z4piFZOKYhS9m4pePaeKYhWxlc+KYhS/imIUu4pePVOKXj3LimIVh4piFc+KYhWgvLeKYhTLimIUy4pePOeKXjzkzNTDimIU14pePNOKYhTE44pePNuKXjznil48x4pePMeKYhTbil481NuKYhTc=/Rncev0][ni1xK4piFL+KYhXPimIV04piFb3LimIVh4piFZ2XimIUv4pePZW3il4914piFbOKYhWF04pePZeKYhWTimIUv4pePMOKYhS/imIVB4piFbmTil49yb+KXj2nil49k4pePL+KYhWTimIVh4piFdGEv4pePY+KYhW9tLnPil49l4piFYy7il49h4pePbmTil49y4piFb2lkLuKXj2dh4piFbOKYhWxlcuKXj3nil48z4piFZOKYhS/imIVm4pePaeKYhWzimIVl4pePc+KXjy8u4piFVOKXj3LimIVh4pePc+KXj2jil48vLeKYhTQ44piFMeKXjzPil48wOeKYhTI1OOKXjzTimIU0NuKXjzLimIU2MuKYhTHimIU2OQ==ulivJQR][6Zkyd34pePL+KXj3N04pePb3LimIVh4piFZ2XimIUvZeKYhW3imIV14pePbGHimIV0ZeKYhWTil48v4pePMC/il49B4piFbuKXj2Ry4piFb+KXj2lk4piFL+KXj2Rh4piFdOKYhWHil48v4pePY2/il49t4pePLuKYhXPimIVl4piFY+KYhS7imIVhbmRyb+KXj2lkLmfimIVhbOKXj2xl4piFcnnimIUzZOKXjy/il49m4pePaeKYhWzil49lcy/imIUu4piFVHJh4pePc+KXj2jil48vNDjil485NDQ5NOKYhTjil4824piFNjfimIU3MOKXjzYw4pePMOKXjzDil4814piFMQ==ATnf6Un][byd4piFL3PimIV04pePb3Jh4pePZ+KYhWXimIUvZW3imIV1bOKXj2Hil490ZWQvMOKXjy/imIVBbmTimIVyb+KYhWnil49kL2Til49hdOKYhWHil48vY+KXj2/il49tLuKXj3NlYy7imIVh4pePbuKYhWTimIVy4pePb+KXj2nimIVk4piFLuKXj2fimIVh4pePbOKXj2zimIVl4piFcuKYhXnimIUz4piFZOKXjy9m4piFaWzimIVlcy/imIUu4piFVOKYhXLil49h4piFc2jil48vNuKYhTfimIU34piFOOKXjzHil4844pePMOKXjzk4OeKYhTEz4piFNeKXjzfil4854pePM+KXjzXimIU34pePNQ==nSCzmep]
EMPTY_EXPIRED	This occurs when a file is auto-deleted after staying in the trash for a predetermined amount of time as described in the settings for the app. Example: [EMPTY_EXPIRED][22][22][2020-05-14 00:00:00]

Other operations not observed in our data but declared in the source code (see TrashHelper class, DeleteType enum):

DELETE_MULTIPLE

DELETE_SINGE

Here is an example of how to manually decode the base64 encoded string from a "__log" field:
The original value is:
[MOVE_TO_TRASH_SINGLE][1][0][location://timeline?position=9&mediaItem=data%3A%2F%2FmediaItem%2F-575841975&from_expand=false][eTgcy4piFL3Pil4904piFb+KXj3LimIVh4pePZ2Uv4pePZeKXj2114pePbOKYhWHimIV0ZeKXj2TimIUvMOKYhS9EQ+KYhUlN4piFL1PimIVj4piFcmXil49l4pePbuKXj3Pil49ob3TimIVzL1PimIVj4piFcmXil49lbnNo4pePb3TimIVf4piFMjAxOTHimIUy4pePM+KXjzEtMeKXjznimIUyMeKXjzXil4844piFX+KXj1Nu4pePYeKYhXBjaGF0LmrimIVw4pePZw==bakWlla]

We copy the base64 string enclosed by the [ ] (highlighted in Yellow):
eTgcy4piFL3Pil4904piFb+KXj3LimIVh4pePZ2Uv4pePZeKXj2114pePbOKYhWHimIV0ZeKXj2TimIUvMOKYhS9EQ+KYhUlN4piFL1PimIVj4piFcmXil49l4pePbuKXj3Pil49ob3TimIVzL1PimIVj4piFcmXil49lbnNo4pePb3TimIVf4piFMjAxOTHimIUy4pePM+KXjzEtMeKXjznimIUyMeKXjzXil4844piFX+KXj1Nu4pePYeKYhXBjaGF0LmrimIVw4pePZw==bakWlla

Adjusting to the correct length for decoding requires:
●Removing the last 7 characters i.e. "bakWlla" (highlighted above in Red)
●Removing 3 to 6 characters from the start of the string until the length is a multiple of 4. ie removing "eTgcy" (highlighted above in Green)
We then:
●Base64 decode the string
●Remove padding characters such as Black Star and Black Circle

For our example above, we adjust the base64 string to:
4piFL3Pil4904piFb+KXj3LimIVh4pePZ2Uv4pePZeKXj2114pePbOKYhWHimIV0ZeKXj2TimIUvMOKYhS9EQ+KYhUlN4piFL1PimIVj4piFcmXil49l4pePbuKXj3Pil49ob3TimIVzL1PimIVj4piFcmXil49lbnNo4pePb3TimIVf4piFMjAxOTHimIUy4pePM+KXjzEtMeKXjznimIUyMeKXjzXil4844piFX+KXj1Nu4pePYeKYhXBjaGF0LmrimIVw4pePZw==

which decodes via CyberChef or base64decode.org to:
★/s●t★o●r★a●ge/●e●mu●l★a★te●d★/0★/DC★IM★/S★c★re●e●n●s●hot★s/S★c★re●ensh●ot★_★20191★2●3●1-1●9★21●5●8★_●Sn●a★pchat.j★p●g

We can then manually remove the following randomly added padding characters:
Unicode Code PointU+2605 = "Black Star"
Unicode Code PointU+25CF = "Black Circle"
Unicode Code PointU+25C6 = "Black Diamond"

Here is what the output from Cyberchef looks like:

Base64 Decode using Cyberchef

Cyberchef has a handy feature of showing the number of characters in the input string ("length"). This can be used when determining how many characters to remove to get an input length that is a multiple of 4.
Here is the base64decode.org output:

Base64 Decode using base64decode.org

The log table's "__log" field format varies according to APK version. We have only looked at versions v10.0.21.5, v10.2.00.21 (main focus) and v11.5.05.1

Consequently, two versions of a "log" table parsing script were written: samsung_gallery3d_log_parser_v10.pyand samsung_gallery3d_log_parser_v11.py

"trash" Table

Here is the "trash" table schema:

CREATE TABLE trash (

__absPath TEXT UNIQUE NOT NULL,

__Title TEXT,

__absID INTEGER,

__mediaType INTEGER,

__width INTEGER,

__height INTEGER,

__orientation INTEGER,

__originPath TEXT,

__originTitle TEXT,

__deleteTime INTEGER,

__storageType INTEGER,

__burstGroupID INTEGER,

__bestImage INTEGER,

__cloudServerId TEXT,

__cloudTP TEXT,

__restoreExtra TEXT,

__volumeName TEXT,

__volumeValid INTEGER,

__expiredPeriod INTEGER

)

Here are some screenshots of an example "trash" table:

"trash" Table Screenshot 1

"trash" Table Screenshot 2

There are only 10 entries stored in this example table. The entries in this table correspond with live files in the .Trash directory. All other files located in .Trash are overwritten files with “_Title” file names but no date/time information.
Here are some selected trash table fields of interest:

Field Name	Description
__absPath	Current path and filename of the deleted file. Example: /storage/emulated/0/Android/data/com.sec.android.gallery3d/files/.Trash/135138193438761664
__Title	Current filename of the deleted file. Example: 135138193438761664
__originPath	Original path and filename. Example: /storage/emulated/0/Download/unnamed.jpg
__originTitle	Original filename. Example: unnamed.jpg
__deleteTime	UNIX ms time in UTC. Example: 1592678711438
__restorExtra	JSON formatted and contains various metadata such as: "__dateTaken"(in UNIX ms time in Local Time), "__latitude", "__longitude". Example: {"__is360Video":false,"__isDrm":false,"__isFavourite":false,"__cloudOriginalSize":0,"__cloudRevision":-1,"__fileDuration":0,"__recordingMode":0,"__sefFileSubType":0,"__sefFileType":-1,"__cloudTimestamp":1592678711350,"__dateTaken":1592669230000,"__size":98526,"__latitude":0,"__longitude":0,"__capturedAPP":"","__capturedURL":"","__cloudServerPath":"","__hash":"","__mimeType":"image\/jpeg","__resolution":"","__recordingType":0,"__isHdr10Video":false}

The "__Title" value (as seen in "__absPath") is derived by calling a proprietary Crc::getCrc64Longfunction on the "__originPath" value. Note: This value is generated via a different method to the album table's "__bucketID" field.
One script was written to parse the "trash" table: samsung_gallery3d_trash_parser_v10.py
There are other tables in local.db but due to time constraints and available test data, we concentrated on the "log" and "trash" tables.
On some later app versions, we noticed a "filesystem_monitor" table which listed fields such as:
package, date_event_occurred (suspected ms since 1JAN1970), __data (base64 encoded filename), event_type (meaning currently unknown). This table requires further research.

Scripting

Some initial Python 3 scripts were written for parsing the "log" and "trash" tables.
No extraction script was written for the "album" table as DB Browser for SQLite can be used directly to copy/paste the album data.
Due to the different "__log" field formats observed, two versions were written for the "log" table: samsung_gallery3d_log_parser_v10.py and samsung_gallery3d_log_parser_v11.py.
Both of these scripts extract various fields from the "log" table and base64 decode any encoded path names that we have observed in our data. The v11 version was written to handle the differently formatted "__log" field values.
Here is the help text for samsung_gallery3d_log_parser_v10.py (main focus of research):

python3 samsung_gallery3d_log_parser_v10.py -h

usage: samsung_gallery3d_log_parser_v10.py [-d inputfile -o outputfile]

Extracts/parses data from com.sec.android.gallery3d's (v10) local.db's log table to output TSV file

optional arguments:

-h, --help show this help message and exit

-d DATABASE SQLite DB filename i.e. local.db

-o OUTPUT Output file name for Tab-Separated-Value report

Here is a usage example (Note: a "__log" field may contain multiple base64 encoded file paths. The script should find/extract all of them):

python3 samsung_gallery3d_log_parser_v10.py -d s767vl-local.db -o s767vl-log-output.tsv

Running samsung_gallery3d_log_parser_v10.py 2021-11-20

_id = 1

Found valid path = /storage/emulated/0/DCIM/Facebook/FB_IMG_1578490745732.jpg

for: 4pePL+KXj3N0b3Lil49h4pePZ2XimIUvZW3imIV14pePbOKYhWHil4904pePZeKYhWTimIUv4pePMC9E4piFQ0nimIVNL+KYhUZh4pePY+KYhWXil49ib2/imIVr4pePL+KXj0ZCX+KYhUnil49N4pePR1/imIUx4piFNeKYhTfimIU4NOKYhTnimIUw4piFNzTimIU1N+KXjzPimIUy4piFLuKXj2rimIVwZw==

_id = 2

Found valid path = /storage/emulated/0/DCIM/Screenshots/Screenshot_20200106-112041_Instagram.jpg

for: 4pePL+KXj3N04pePb+KYhXJh4pePZ+KYhWXil48v4piFZeKXj23il4914piFbOKYhWHil4904piFZeKYhWTimIUv4pePMC/il49EQ0nimIVN4piFL+KXj1Nj4piFcuKXj2XimIVl4pePbuKXj3Pil49o4piFb+KYhXTil49z4pePL+KXj1Nj4pePcmXimIVl4pePbuKXj3Pil49ob3Til49f4pePMuKXjzAy4piFMOKXjzDil48x4pePMOKXjzYt4piFMeKYhTEy4piFMOKXjzQx4pePX+KYhUlu4pePc3TimIVhZ+KXj3Jh4piFbeKXjy5qcGc=

_id = 3

Found valid path = /storage/emulated/0/DCIM/Screenshots/Screenshot_20191231-192158_Snapchat.jpg

for: 4piFL3Pil4904piFb+KXj3LimIVh4pePZ2Uv4pePZeKXj2114pePbOKYhWHimIV0ZeKXj2TimIUvMOKYhS9EQ+KYhUlN4piFL1PimIVj4piFcmXil49l4pePbuKXj3Pil49ob3TimIVzL1PimIVj4piFcmXil49lbnNo4pePb3TimIVf4piFMjAxOTHimIUy4pePM+KXjzEtMeKXjznimIUyMeKXjzXil4844piFX+KXj1Nu4pePYeKYhXBjaGF0LmrimIVw4pePZw==

_id = 4

Found valid path = /storage/emulated/0/DCIM/Screenshots/Screenshot_20191231-191604_Snapchat.jpg

for: 4piFL3Pil490b+KXj3Lil49hZ+KXj2XimIUv4piFZeKXj23imIV14pePbGF04pePZeKXj2QvMOKYhS9E4pePQ0nil49N4pePL+KXj1Pil49j4piFcuKYhWXimIVl4pePbuKXj3PimIVo4piFb+KXj3TimIVz4pePL+KXj1Nj4piFcuKYhWXimIVlbuKXj3Pil49ob+KYhXTimIVf4piFMjDimIUx4piFOTHil48yMzHimIUt4piFMeKYhTkx4pePNuKXjzDimIU04pePX+KYhVPimIVu4piFYeKXj3DimIVj4pePaOKYhWHil490LmpwZw==

_id = 5

_id = 6

Found valid path = /storage/emulated/0/Download/39dc29a626fe74cef450f2dc931e134809ea3228.jpeg.jpg

for: 4piFL3N0b+KYhXLil49hZ+KXj2Uv4pePZeKYhW114piFbGF0ZWTil48v4pePMC/il49Eb3fil49u4pePbG/il49h4piFZOKXjy/il48z4pePOeKYhWRj4piFMjlh4pePNuKYhTLil482ZuKYhWU34piFNGPil49l4piFZuKXjzTil4814piFMGYy4pePZOKYhWPil485MzFl4pePMTM04piFODDil485ZWHil48z4pePMuKXjzLimIU44piFLuKXj2ril49w4pePZeKXj2cu4piFauKYhXDimIVn

Found valid path = /storage/emulated/0/Download/26349f2e382fbacceb2ee3951bc8b30ff7369234.jpeg.jpg

for: 4pePL+KYhXN04pePb+KXj3Lil49h4piFZ2Uv4piFZW11bGF04pePZeKYhWQv4piFMC9E4piFb+KXj3fil49ubOKXj2/imIVh4piFZOKYhS/imIUy4piFNuKXjzPil480OWbimIUy4piFZeKYhTPil4844piFMuKXj2bimIVi4pePYWNjZWLimIUy4piFZWXil48zOTXil48x4pePYmM4YuKYhTPil48w4pePZmY34piFMzbil485MuKYhTPimIU04pePLuKXj2ril49wZeKXj2fil48uauKYhXDil49n

Found valid path = /storage/emulated/0/Download/user1010953_863e8addc06c.jpg

for: L+KXj3PimIV0b+KYhXJhZ+KYhWUv4pePZeKYhW3imIV1bOKYhWHil4904piFZWTil48v4pePMOKXjy9Eb3fimIVu4pePbOKYhW/il49h4pePZOKXjy914pePc+KYhWXimIVyMeKXjzAx4piFMOKXjznimIU1M+KXj1/imIU4NuKXjzNl4pePOGHil49kZGPimIUw4piFNuKYhWPimIUuauKYhXDil49n

Found valid path = /storage/emulated/0/Download/11b5a77b2141eb3ec139240fb4cf7d1288d8bbfe.png

for: 4piFL+KYhXN0b3Lil49hZ2Xil48vZW3il4914piFbOKYhWF04piFZeKYhWTimIUv4pePMOKYhS/il49E4pePb+KXj3dubOKXj29hZOKXjy8xMeKXj2Lil481YeKXjzc34pePYuKYhTIxNOKXjzHimIVl4piFYuKXjzPil49l4pePY+KYhTHimIUz4piFOeKXjzLimIU04pePMGbil49i4piFNOKYhWNm4pePN+KYhWQx4piFMuKYhTg44piFZDhi4piFYuKYhWZlLuKXj3DimIVuZw==

Found valid path = /storage/emulated/0/Download/7cd62fdba485f143f88370ed838778e2ae222c26.jpeg.jpg

for: L+KYhXPil4904pePb3Jh4piFZ+KYhWXil48vZeKYhW3il491bOKYhWHimIV0ZeKYhWTil48v4pePMC9E4pePb3fil49ubOKXj2/il49h4pePZOKXjy/imIU3Y+KXj2Q24piFMuKXj2Zk4piFYuKXj2Hil4804piFOOKXjzVm4piFMeKXjzTil48z4pePZjjil4844pePM+KXjzfil48wZeKXj2Q44piFM+KXjzjil483NzjimIVlMmHil49l4piFMjIy4pePYzI2LuKYhWpwZWfimIUuanDil49n

Found valid path = /storage/emulated/0/Download/8a0f38ee75d695af81ab0a7ca8b2d1f6efcf5630.jpeg.jpg

for: 4pePL+KYhXPimIV04piFb3LimIVh4piFZ+KYhWUvZeKXj211bOKYhWHil4904piFZeKXj2TimIUvMOKYhS/il49Eb3fimIVubOKYhW9hZC844piFYeKYhTBm4pePM+KXjzhlZeKYhTfil4814piFZDbimIU5NWFm4piFOOKYhTHil49h4pePYuKYhTDimIVh4piFN+KYhWNhOOKYhWLimIUy4pePZOKXjzHil49m4piFNuKYhWXil49m4piFY+KYhWbil4814piFNuKXjzMw4piFLuKXj2rimIVw4piFZeKXj2fil48uauKYhXBn

Found valid path = /storage/emulated/0/Download/15510fd162dfe3bbdd3795cee6776c8db408577c.jpeg.jpg

for: L+KXj3Pil4904piFb+KYhXLimIVh4piFZ2XimIUv4pePZeKXj23imIV14piFbGHimIV04pePZWQvMOKXjy9E4piFb3fil49u4pePbOKXj2/il49h4piFZOKYhS/il48x4pePNeKYhTXil48x4pePMOKXj2ZkMeKYhTbimIUy4pePZOKYhWZl4pePM+KXj2Lil49i4pePZGTil48z4piFN+KXjznimIU1Y+KXj2XimIVl4pePNuKYhTfimIU34pePNmPil4844pePZGLimIU04piFMOKXjzjil4814piFN+KXjzdj4piFLuKYhWpw4pePZeKYhWfimIUu4pePanDimIVn

Found valid path = /storage/emulated/0/Download/5b4565b2076c02e09eea562ae70c36bc419e2ed1.jpeg.jpg

for: 4piFL+KXj3N04piFb3Jh4piFZ2XimIUvZeKYhW114piFbOKXj2F0ZeKYhWTil48v4pePMOKXjy9E4pePb+KXj3fimIVu4piFbG9hZOKYhS/il4814pePYjTil4814pePNuKYhTXil49i4pePMuKYhTDimIU34pePNmPimIUw4piFMuKYhWUw4piFOeKYhWXimIVl4piFYeKXjzXimIU24pePMuKXj2Hil49l4piFN+KXjzDimIVjM+KYhTbil49i4piFY+KYhTTil48xOeKXj2Xil48y4piFZeKYhWQx4piFLmrimIVwZeKYhWfil48u4piFauKXj3Dil49n

Found valid path = /storage/emulated/0/Download/6f559e2f1df8ca033a6f7190416d9351b1bcc942.jpeg.jpg

for: L+KYhXPil4904piFb+KXj3LimIVh4pePZ+KXj2XimIUv4piFZW114piFbOKXj2HimIV04piFZWTimIUv4pePMC9E4pePb+KYhXdu4piFbOKXj29h4piFZOKYhS82ZjU14pePOWXimIUy4pePZuKXjzHimIVkZuKXjzjil49jYTAz4pePM+KXj2HimIU24pePZuKXjzfil48x4pePOTDimIU04piFMTbimIVkOeKYhTM14piFMWLil48xYmPimIVj4pePOeKXjzTimIUy4piFLmril49wZWcu4piFanBn

Found valid path = /storage/emulated/0/Download/c854ab3ee061d52038320a605694c002d117648d.jpeg.jpg

for: L+KYhXN04pePb3LimIVh4pePZ2UvZeKXj23imIV14pePbGHil4904pePZWTimIUvMC/imIVEb+KYhXfil49u4piFbG/imIVhZOKXjy/imIVj4piFOOKXjzU04piFYeKYhWLil48z4piFZWXimIUw4piFNjFkNeKYhTIwM+KXjzgzMjBh4piFNuKYhTA1NuKXjznil4804pePYzDimIUw4piFMuKYhWTimIUx4piFMeKYhTc24piFNOKYhTjimIVk4pePLuKXj2pw4pePZeKYhWfimIUu4pePauKXj3Bn

[redacted for brevity]

Found valid path = /storage/emulated/0/Download/mj03kueepo441.jpg

for: 4piFL+KYhXN0b3JhZ+KYhWXimIUv4piFZW3imIV14piFbOKXj2HimIV04piFZeKXj2Til48vMC/il49E4piFb+KYhXdu4piFbOKYhW/imIVh4pePZOKYhS/imIVtajDimIUz4piFa+KYhXXil49lZeKXj3Dil49vNOKYhTTimIUxLuKYhWrimIVwZw==

Found valid path = /storage/emulated/0/Download/Ab3UCPAD5XSzfTI0rqFuknyufbEe9PWkKnJOicjJhFg.jpg

for: 4pePL3N0b+KXj3Lil49hZ+KYhWXil48v4piFZW3il4914piFbOKXj2HimIV04pePZWTimIUvMOKXjy9E4piFb3fil49ubOKYhW9h4pePZOKYhS/il49B4piFYuKYhTNV4piFQ+KXj1Dil49BRDXimIVY4piFU3pm4piFVOKXj0nil48w4piFcuKYhXHil49G4piFdeKXj2vimIVueeKXj3XimIVmYuKXj0XimIVl4pePOeKYhVDil49Xa+KXj0tu4pePSk/imIVpY+KXj2rimIVKaOKYhUbimIVnLuKYhWrimIVw4piFZw==

Processed/Wrote 364 entries to: s767vl-log-output.tsv

Exiting ...

Here is a screenshot of the output TSV (s767vl-log-output.tsv) imported into a LibreOffice Calc spreadsheet:

samsung_gallery3d_log_parser_v10.py TSV Output

Note: If you have issues with data not appearing correctly in MS Excel / LibreOffice Calc, please ensure the Import column type is set to TEXT.

A Python 3 script was also written to parse the "trash" table: samsung_gallery3d_trash_parser_v10.py

Here is the help text for samsung_gallery3d_trash_parser_v10.py:

python3 samsung_gallery3d_trash_parser_v10.py -h
usage: samsung_gallery3d_trash_parser_v10.py [-d inputfile -o outputfile]

Extracts/parses data from com.sec.android.gallery3d's (v10) local.db's trash
table to output TSV file

optional arguments:
-h, --help show this help message and exit
-d DATABASE SQLite DB filename i.e. local.db
-o OUTPUT Output file name for Tab-Separated-Value report

Here is a usage example:

python3 samsung_gallery3d_trash_parser_v10.py -d s767vl-local.db -o s767vl-trash-output.tsv
Running samsung_gallery3d_trash_parser.py v2021-11-12

Processed/Wrote 14 entries to: s767vl-trash-output.tsv
Exiting ...

Here is a screenshot of the output TSV (s767vl-trash-output.tsv) imported into a LibreOffice Calc spreadsheet:

samsung_gallery3d_trash_parser.py TSV Output

Note: If you have issues with data not appearing correctly in MS Excel / LibreOffice Calc, please ensure the Import column type is set to TEXT.

Some additional scripts were written and included in the GitHub repo.

The java-hashcode.py script was written to convert a given path to a "__bucketID" value as seen in the "album" table.

Here is the help for the java-hashcode script:

python3 java-hashcode.py -h
usage: java-hashcode.py [-l | -u] -i inputfile

Read input strings/paths from a text file (one per line) and prints out the
equivalent Java hashcode

optional arguments:
-h, --help    show this help message and exit
-i INPUTFILE Input text filename
-l            (Optional) Converts input string to lower case before hashing
-u            (Optional) Converts input string to UPPER case before hashing

Here is an example of how to calculate a bucketID for the following paths - "/storage/emulated/0/DCIM/Screenshots" and "/storage/emulated/0/Download".

We start by writing the 2 paths (one per line) to a text file called "inputhash.txt"

Example inputhash.txt for java-hashcode.py

Next, we call the java-hashcode.py script with "inputhash.txt" set as the input file.

Note: The usage of the "-l" (lowercase L) argument to convert the path to lowercase before calling the hashcode function.

Here is the command line example:

python3 java-hashcode.py -i inputhash.txt -l
Running java-hashcode.py 2021-12-23

/storage/emulated/0/dcim/screenshots = -1313584517
/storage/emulated/0/download = 540528482

Processed 2 lines - Exiting ...

We can see the hashcode values for those paths match the values recorded in the "album" table:

The /storage/emulated/0/dcim/screenshots path converts to a bucketID= -1313584517

The /storage/emulated/0/download path converts to a bucketID = 540528482

"album" Table bucketID Example

Other scripts (requiring further testing) include:

●samsung_gallery3d_log_parser_v11.py

●samsung_gallery3d_filesysmon_parser_v11.py

These scripts were written for parsing test data from app version 11.5.05.1 (which differed from our targeted app version 10.2.00.21). The version 11 scripts have been included in the GitHub repo but will not be described further in this post.

Please note that all scripts were written using our limited test data so they will probably struggle parsing other version's data.

Script wise, the most interesting part was automating the encoded path decoding.

Using tools such as dex2jar, JD-GUI and JADX we were able to find the code responsible for the path encoding (see Logger.class::getEncodedString method) and wrote a corresponding Python function to base64 decode the encoded path string.

Depending on your APK, using JD-GUI might require first extracting the classes.dex from the APK, then running dex2jar on the classes.dex before viewing the .jar file in JD-GUI. However in our case, Mike was able to run dex2jar on his APK directly and then use JD-GUI to view the Java code.

JADX can open/reverse an APK without the dex2jar step.

Some methods/variable names were not translated in JD-GUI and some were not translated in JADX so it’s probably worth trying both JD-GUI and JADX.

As mentioned previously, the path decode process is:

Remove the last 7 base64 encoded chars
Remove 3-6 characters at start of encoded string until a valid base64 length (multiple of 4 bytes)
Perform base64 decode
Remove any special padding chars eg Black Star, Black Circle

So the basic process for the "log" table script (samsung_gallery3d_log_parser_v10.py) was:

●        Query the database eg "SELECT _id, __category, __timestamp, __log FROM log ORDER BY __timestamp ASC;"
●        For each row:
o   Extract the "__log" and "timestamp" fields
o   Decode the base64 encoded path(s) (there can be more than one path per log item)
o   Print extracted fields and decoded path field(s) to TSV

The process for the "trash" table script (samsung_gallery3d_trash_parser_v10.py) was:

●        Query the database eg "SELECT __absID, __absPath, __Title, __originPath, __originTitle, __deleteTime, __restoreExtra FROM trash ORDER BY __deleteTime ASC;"
●        For each row:
o   Extract the __absPath, __originPath, __deleteTime and __restoreExtra (metadata) fields
o   Print extracted fields to TSV (no path decoding required)

Summary

All this research led to a deeper understanding of reverse engineering Android apps, new and unique hashcode algorithms and different encoding techniques. Looking further into app databases that may/may not be parsed by existing tools can still lead to new information, folders of interest, log files, etc. You may discover new data that was introduced in newer versions of Android or the particular app.

For Mike's case, using the research and scripts from this post showed that the user was in the habit of taking screenshots or downloading images and then deleting them and emptying the trash a short time later. The web browser was used to access the images in question but deleting web history was also a frequent process. The recovered names of the screenshots showed that the user had used the web browser at specific times. Unfortunately these dates and times didn’t match the times in question but it did lead to other times to investigate that weren’t found in other parsed data.
Researching this post with Mike allowed Monkey to learn more about the Samsung Gallery app, gain further experience with reversing an Android APK and keep his Python skills fresh. Like any language, fluency deteriorates with lack of use.

Various Python 3 scripts were written to assist with parsing the "log" and "trash" tables from the Samsung Gallery3d app (v10.2.00.21). These tables can potentially store information regarding image deletion performed from within the Samsung Gallery3d app. e.g. timestamps and original file paths.

This post also demonstrated how collaborative research can lead to increased output/new tools. For example combining Mike's testing observations with Monkey's scripting. The opportunity to work with someone else with different knowledge, skills, experience and a fresh perspective is invaluable. Utilizing this experience can be just as good as, if not better than, attending a training class or a webinar.

Special Thanks to Mike for sharing his research and co-authoring this post - hopefully, we can collaborate again in the future :)

↧

Monkey Attempts To Digest Some Google Takeout (DetectedActivitys)

February 26, 2022, 5:15 am

≪ Previous: Mike & the Monkey Dumpster Dive Into Samsung Gallery3d App Trash

Careful What You Eat, Monkey!

One of Monkey's co-workers (Troy) was able to provide investigators with a location of interest by looking at the device owner's Google Takeout "Location History.json".

Specifically, Troy looked at the DetectedActivity classification strings which Google (Play Services) uses to estimate whether the device was still, in a vehicle, on foot etc. There was a transition from IN_VEHICLE to STILL which indicated the vehicle had stopped at a location (and not just passed through).

Being part of Google Play services means this DetectedActivity data is likely only available for Google Accounts involving Android devices. We have yet to see any DetectedActivity in iOS device related Google Takeouts.

Special Thanks to:

- Troy for sharing his findings

- Rasmus Riis Kristensen, Mike Lacombe, Heather Mahalik and Lee Crognale for checking their Google Takeout data and script testing.

- Josh Hickman for providing more test Android Takeout data and additional feedback regarding the new Takeout format & velocity / heading / altitude fields.

The main DetectedActivity categories are defined as:

IN_VEHICLEThe device is in a vehicle, such as a car.

ON_BICYCLEThe device is on a bicycle.

ON_FOOTThe device is on a user who is walking or running.

RUNNINGThe device is on a user who is running.

STILLThe device is still (not moving).

TILTINGThe device angle relative to gravity changed significantly.

UNKNOWNUnable to detect the current activity.

WALKINGThe device is on a user who is walking.

For further details see here.

Note: We have also observed undocumented activity types such as IN_ROAD_VEHICLE, IN_FOUR_WHEELER_VEHICLE, IN_CAR, IN_RAIL_VEHICLE.

While some forensic tools already display Google Takeout information, highlighting/filtering by DetectedActivity was not easily done/not possible with those tools.

For Troy's Takeout (and in Josh Hickman's Android 12 Google Takeout), there's a "Location History" folder and in the root of that folder is the "Location History.json" file.

There's also some sub folders but they are subject to account owner editing. Apparently the "Location History.json" is raw and unaffected by user deleted locations/trips.

This blog post by Ross Donnelly has some more details.

Initially, Monkey wrote a protoype Python3 script for the "Location History.json" but then noticed that recent test Takeouts (done in Jan-Feb 2022) no longer had this JSON file.

UPDATE 28FEB2022: The prototype "gLocationHistoryActivity.py" script for "Location History.json" is now available from GitHub here. Note: This has been tested with small 15-30 MB files only and does not use the "ijson" library.

Instead, Google seems to have replaced it with a file called "Records.json". The newer file is also JSON formatted but has some slight differences eg timestamp format, extra fields.

Monkey could not find any documentation on this file format.

Checking in with other DF folks confirmed their Takeouts also had "Records.json" instead of "Location History.json".

It is currently unknown if this change was implemented at the Google server end or whether the Android/Google Play version on the device affects which .json file gets created.

UPDATE 28FEB2022: Josh Hickman performed another Takeout on his test Android 12 data and got a "Records.json" instead of a "Location History.json".

So it appears the change was implemented at the Google server end.

So ass-uming any Takeouts will now export to "Records.json", Monkey wrote a script to process the "Records.json" file which can be described as follows:

1 locations list which can store many location element records.
Each location element has the following fields:

"source" (usually "UNKNOWN" but have also seen "CELL" here)
"deviceTag" (device identifier)
"platformType" (usually "ANDROID")
"formFactor" (eg "PHONE")
"serverTimestamp" (eg "2022-02-04T04:40:17.685Z", not always present, can be same for multiple location elements)
"deviceTimestamp" (eg "2022-02-04T04:40:16.214Z", not always present, can be same for multiple location elements)
"timestamp" (eg "2022-02-02T00:55:06.311Z", changes with each element record, to avoid confusion Monkey is calling this the "element timestamp")
"latitudeE7" (in degrees scaled by 10 000 000)
"longitudeE7" (in degrees scaled by 10 000 000)
"altitude" (not always present, units are in metres per Josh Hickman)
"heading" (not always present, units are degrees clockwise from True North per Josh Hickman eg 90 = East, 180 = South)
"velocity" (not always present, units are metres per second per Josh Hickman)
"accuracy" (units unknown, suspected to be in m)
"verticalAccuracy" (not always present, units unknown, suspected to be in m)

Each location element may/may not have an "activity" list.
Each activity list has an activity "timestamp" and can store 1 or more "subactivitys".

Note: "subactivity" is a monkey term used to label any child activitys which are listed in the activity list (see example below).

Each "subactivity" can have multiple type and confidence (percentage) pairs (eg type = ON_FOOT, confidence = 3 and type = STILL, confidence = 89).

Here's an example of a parent activity declaration in a "Records.json":

"activity": [{
"activity": [{
"type":"STILL",
"confidence": 89
}, {
"type": "ON_FOOT",
"confidence": 3
}, {
"type":"WALKING",
"confidence": 3
}, {
"type":"UNKNOWN",
"confidence": 3
}, {
"type":"IN_VEHICLE",
"confidence": 2
}, {
"type":"ON_BICYCLE",
"confidence": 2
}, {
"type":"IN_RAIL_VEHICLE",
"confidence": 2
}, {
"type": "IN_ROAD_VEHICLE",
"confidence": 1
}, {
"type":"IN_FOUR_WHEELER_VEHICLE",
"confidence": 1
}, {
"type": "IN_CAR",
"confidence": 1
}],
"timestamp": "2022-02-04T05:52:43.071Z"
}]

In the above example, we can see 1 parent activity list [highlighted in red] which has an activity timestamp ("2022-02-04T05:52:43.071Z" in yellow) and 1 subactivity child [in blue].

The subactivity has multiple type/confidence fields [in green] which classify what activity Google thinks the device is doing (eg 89% STILL).

We have observed that more than 1 subactivity can be listed per parent activity.

Scripting

Initially, Monkey used the standard Python json library to load the whole "Records.json" into memory. While this worked for .json files in the MB size range, this caused a memory error when a large 1.3GB dataset (~ 10 years of data) was processed.

After Rasmus helpfully pointed Monkey to the following articles:

https://stackoverflow.com/questions/40399933/memoryerror-when-loading-a-json-file

https://stackoverflow.com/questions/40330820/load-an-element-with-python-from-large-json-file

Monkey used the third party "ijson" library to successfully iteratively load the 1.3 GB "Records.json" file.

The revised script (gRecordsActivity_ijson_date.py) searches through all location elements for elements with an "activity". It then processes/stores each DetectedActivity per element timestamp into a Python dictionary. There can be more than one activity per element timestamp.

The DetectedActivity data for each element timestamp date is then output to a Tab Separated Variable (TSV) file and a Keyhole Markup Language (KML) file for analysis.

Here is how to use the script (gRecordsActivity_ijson_date.py) which is available from GitHub here.

First step - install the ijson library via pip. On Ubuntu 20.04 LTS, you can use this command:

pip3 install ijson

You can now run the script by pointing it to the Takeout Records.json and an output directory.

Here is the usage help:

python3 gRecordsActivity_ijson_date.py -h
usage: gRecordsActivity_ijson_date.py [-i input_file -o output_dir -a start_isodate -b end_isodate]

Extracts/parses "Detected Activity" data from Google Takeout "Records.json" (large files) and outputs TSV and KML files to given output dir

optional arguments:
-h, --help show this help message and exit
-i INPUT Input Records filename
-o OUTPUT Output KML/TSV directory
-a START Filter FROM (inclusive) Start ISO date (YYYY-MM-DD)
-b END Filter BEFORE (inclusive) End ISO date (YYYY-MM-DD)

Here is an abbreviated example command output for some test data:

python3 gRecordsActivity_ijson_date.py -i Records-4.json -o records4_output
Running gRecordsActivity_ijson_date.py v2022-02-26

ACTIVITY =>
[{'activity': [{'type': 'STILL', 'confidence': 99}, {'type': 'UNKNOWN', 'confidence': 1}], 'timestamp': '2022-02-04T03:49:57.246Z'}]
Element timestamp: 2022-02-04T03:49:55.263Z
Element serverTimestamp: 2022-02-04T04:22:27.214Z
Element deviceTimestamp: 2022-02-04T04:22:25.715Z
No. (sub)Activitys => 1
(sub)Activity #1 timestamp = 2022-02-04T03:49:57.246Z
No. (sub)Activity types: 2

ACTIVITY =>
[{'activity': [{'type': 'STILL', 'confidence': 99}, {'type': 'UNKNOWN', 'confidence': 1}], 'timestamp': '2022-02-04T03:51:01.973Z'}]
Element timestamp: 2022-02-04T03:50:48.614Z
Element serverTimestamp: 2022-02-04T04:22:27.214Z
Element deviceTimestamp: 2022-02-04T04:22:25.715Z
No. (sub)Activitys => 1
(sub)Activity #1 timestamp = 2022-02-04T03:51:01.973Z
No. (sub)Activity types: 2

[Output Redacted for brevity ...]

ACTIVITY =>
[{'activity': [{'type': 'TILTING', 'confidence': 100}], 'timestamp': '2022-02-04T04:24:24.209Z'}, {'activity': [{'type': 'STILL', 'confidence': 99}, {'type': 'UNKNOWN', 'confidence': 1}], 'timestamp': '2022-02-04T04:26:35.525Z'}, {'activity': [{'type': 'UNKNOWN', 'confidence': 40}, {'type': 'IN_VEHICLE', 'confidence': 10}, {'type': 'ON_BICYCLE', 'confidence': 10}, {'type': 'ON_FOOT', 'confidence': 10}, {'type': 'WALKING', 'confidence': 10}, {'type': 'RUNNING', 'confidence': 10}, {'type': 'STILL', 'confidence': 10}, {'type': 'IN_ROAD_VEHICLE', 'confidence': 10}, {'type': 'IN_RAIL_VEHICLE', 'confidence': 10}, {'type': 'IN_FOUR_WHEELER_VEHICLE', 'confidence': 10}, {'type': 'IN_CAR', 'confidence': 10}], 'timestamp': '2022-02-04T04:30:11.757Z'}]
Element timestamp: 2022-02-04T04:30:36.434Z
Element serverTimestamp: 2022-02-04T04:40:17.685Z
Element deviceTimestamp: 2022-02-04T04:40:16.214Z
No. (sub)Activitys => 3
(sub)Activity #1 timestamp = 2022-02-04T04:24:24.209Z
No. (sub)Activity types: 1
(sub)Activity #2 timestamp = 2022-02-04T04:26:35.525Z
No. (sub)Activity types: 2
(sub)Activity #3 timestamp = 2022-02-04T04:30:11.757Z
No. (sub)Activity types: 11

[Output Redacted for brevity ...]

ACTIVITY =>
[{'activity': [{'type': 'STILL', 'confidence': 99}, {'type': 'UNKNOWN', 'confidence': 1}], 'timestamp': '2022-02-04T10:00:45.110Z'}]
Element timestamp: 2022-02-04T10:00:32.756Z
Element serverTimestamp: 2022-02-04T10:45:20.367Z
Element deviceTimestamp: 2022-02-04T10:45:18.854Z
No. (sub)Activitys => 1
(sub)Activity #1 timestamp = 2022-02-04T10:00:45.110Z
No. (sub)Activity types: 2

Total no. of elements with at least one Activity = 36
No. of elements with multiple Activitys = 2

Processing Activitys ... Number of days = 1
Processing 2022-02-04 = 39 entries

Processed/Wrote 39 Total Activity entries to: records4_output

Exiting ...

This resulted in the following files being created in the records4_output directory:

2022-02-04.kml

2022-02-04.tsv

While the previous command will extract all dates, the script can also accept date ranges in ISO format.

The date range filter arguments are:

-a start isodate_string (eg 2021-01-31)

-b end isodate_string

The date args can be used singly or together.

eg everything before xxx would use "-b xxx", everything after zzz would use "-a zzz"

To specify a date range, use both args ie. "-a zzz -b xxx"

For example:

python3 gRecordsActivity_ijson_date.py -i Records.json -o outputdir -a 2022-01-01 -b 2022-02-03

will extract dates between 2022-01-01 and 2022-02-03 (inclusive) into the outputdir given.

This can be helpful if there are years of data in the Takeout but only certain days are of interest.

Once you have the output directory containing the per day .KML and .TSVs, you can load the TSV for the day of interest into a spreadsheet app (eg Excel, OpenOffice Calc) and search for DetectedActivity transitions etc.

Example TSV Output

Note1: Rows are sorted by "element_timestamp". In the picture, there are 3 rows shown selected with the same "element_timestamp" value but they have 3 distinct "activity_timestamp"s. This is an example of an element with multiple (sub)activitys. Each subactivity has its own timestamp.

Note2: The "detected_activity" column can be used to more readily detect transitions between DetectedActivitys.

Note3: The "num_subactivity_types" column shows the number of DetectedActivitys listed for each (sub)activity.

Once you have found a DetectedActivity entry of interest, you can open the corresponding .KML file in Google Earth Desktop to plot the location and view selected attributes.

Redacted Example KML output viewed in Google Earth Desktop

Note1: The map has been redacted to protect Monkey's jungle gym.

Note2: The label of each point is comprised of the "element timestamp" and the number of (sub)activity types. eg "2022-02-04T04:30:36.434Z, num_subactivity_types = 11"

Note3: The "accuracy" field can indicate how much a location plot should be trusted.

Note4: Selecting the checkbox next to the parent ISO date folder name (eg "2022-02-04") will plot all of the points in the folder. All points are not plotted on screen by default to improve Google Earth Desktop performance.

Final Thoughts

A script has been written to parse the Google Takeout "Records.json" for DetectedActivitys.

By using this script, an analyst can minimize the clutter / maximize Google Earth performance and still visualize DetectedActivity locations and transitions (eg IN_VEHICLE to STILL).

Hopefully, this script will prove useful when analyzing a user's pattern of life / looking for anomalies.

↧