Code
from __future__ import division
from utils import * The goal of this post is to visualize flights taken from Google location data using Python * We will create a .gif progressing through individual flights and a .png of all flights * This post utilizes code from Tyler Hartley’s visualizing location history blog post
Google Takeout is a Google service that allows users to export any personal Google data. We’ll use Takeout to download our raw location history as a one-time snapshot. Since Latitude was retired, no API exists to access location history in real-time.
Download location data: * Go to takeout. Uncheck all services except “Location History” * The data will be in a json format, which works great for us. Download it in your favorite compression type. * When Google has finished creating your archive, you’ll get an email notification and a link to download. * Download and unzip the file, and you should be looking at a LocationHistory.json file. Working with location data in Pandas. Pandas is an incredibly powerful tool that simplifies working with complex datatypes and performing statistical analysis in the style of R. Chris Albon has great primers on using Pandas here under the “Data Wrangling” section.
After completing the setup, we’ll read in the LocationHistory.json file from Google Takeout and create a DataFrame.
with open('data/LocationHistory/2018/LocationHistory.json', 'r') as location_file:
raw = json.loads(location_file.read())
# use location_data as an abbreviation for location data
location_data = pd.DataFrame(raw['locations'])
del raw #free up some memory
# convert to typical units
location_data['latitudeE7'] = location_data['latitudeE7']/float(1e7)
location_data['longitudeE7'] = location_data['longitudeE7']/float(1e7)
# convert timestampMs to seconds
location_data['timestampMs'] = location_data['timestampMs'].map(lambda x: float(x)/1000)
location_data['datetime'] = location_data.timestampMs.map(datetime.datetime.fromtimestamp)
# Rename fields based on the conversions
location_data.rename(columns={'latitudeE7':'latitude',
'longitudeE7':'longitude',
'timestampMs':'timestamp'}, inplace=True)
# Ignore locations with accuracy estimates over 1000m
location_data = location_data[location_data.accuracy < 1000]
location_data.reset_index(drop=True, inplace=True)accuracy int64
activity object
altitude float64
heading float64
latitude float64
longitude float64
timestamp float64
velocity float64
verticalAccuracy float64
datetime datetime64[ns]
dtype: object
| accuracy | altitude | heading | latitude | longitude | timestamp | velocity | verticalAccuracy | |
|---|---|---|---|---|---|---|---|---|
| count | 745660.000000 | 101260.000000 | 44100.000000 | 745660.000000 | 745660.000000 | 7.456600e+05 | 58874.000000 | 4921.000000 |
| mean | 58.997173 | 67.057525 | 186.597551 | 37.748367 | -102.506537 | 1.417774e+09 | 7.769678 | 23.099776 |
| std | 125.358984 | 242.209547 | 101.643968 | 9.004123 | 23.609836 | 3.356510e+07 | 11.790783 | 45.139324 |
| min | 1.000000 | -715.000000 | 0.000000 | 13.689757 | -123.260751 | 1.376790e+09 | 0.000000 | 2.000000 |
| 25% | 22.000000 | -18.000000 | 98.000000 | 29.817569 | -122.306596 | 1.391259e+09 | 0.000000 | 2.000000 |
| 50% | 31.000000 | 2.000000 | 181.000000 | 29.986634 | -95.246060 | 1.413249e+09 | 1.000000 | 2.000000 |
| 75% | 50.000000 | 60.000000 | 270.000000 | 47.664284 | -94.995603 | 1.428049e+09 | 13.000000 | 30.000000 |
| max | 999.000000 | 6738.000000 | 359.000000 | 50.105984 | 23.782015 | 1.519330e+09 | 208.000000 | 473.000000 |
print("earliest observed date: {}".format(min(location_data["datetime"]).strftime('%m-%d-%Y')))
print("latest observed date: {}".format(max(location_data["datetime"]).strftime('%m-%d-%Y')))
earliest_obs = min(location_data["datetime"]).strftime('%m-%d-%Y')
latest_obs = max(location_data["datetime"]).strftime('%m-%d-%Y')earliest observed date: 08-17-2013
latest observed date: 02-22-2018
Consult this post for more info about degrees and radians in distance calculation.
degrees_to_radians = np.pi/180.0
location_data['phi'] = (90.0 - location_data.latitude) * degrees_to_radians
location_data['theta'] = location_data.longitude * degrees_to_radians
# Compute distance between two GPS points on a unit sphere
location_data['distance'] = np.arccos(np.sin(location_data.phi)*np.sin(location_data.phi.shift(-1)) *
np.cos(location_data.theta - location_data.theta.shift(-1)) +
np.cos(location_data.phi)*np.cos(location_data.phi.shift(-1))) * 6378.100
# 6378.100 is the radius of earth in kmflight_data = pd.DataFrame(data=
{'end_lat':location_data.latitude,
'end_lon':location_data.longitude,
'end_datetime':location_data.datetime,
'distance':location_data.distance,
'speed':location_data.speed,
'start_lat':location_data.shift(-1).latitude,
'start_lon':location_data.shift(-1).longitude,
'start_datetime':location_data.shift(-1).datetime,
}).reset_index(drop=True)flights = flight_data[(flight_data.speed > 40) & (flight_data.distance > 80)].reset_index()
# Combine instances of flight that are directly adjacent
# Find the indices of flights that are directly adjacent
_f = flights[flights['index'].diff() == 1]
adjacent_flight_groups = np.split(_f, (_f['index'].diff() > 1).nonzero()[0])
# Now iterate through the groups of adjacent flights and merge their data into
# one flight entry
for flight_group in adjacent_flight_groups:
idx = flight_group.index[0] - 1 #the index of flight termination
flights.loc[idx, ['start_lat', 'start_lon', 'start_datetime']] = [flight_group.iloc[-1].start_lat,
flight_group.iloc[-1].start_lon,
flight_group.iloc[-1].start_datetime]
# Recompute total distance of flight
flights.loc[idx, 'distance'] = distance_on_unit_sphere(flights.loc[idx].start_lat,
flights.loc[idx].start_lon,
flights.loc[idx].end_lat,
flights.loc[idx].end_lon)*6378.1
# Now remove the "flight" entries we don't need anymore.
flights = flights.drop(_f.index).reset_index(drop=True)
# Finally, we can be confident that we've removed instances of flights broken up by
# GPS data points during flight. We can now be more liberal in our constraints for what
# constitutes flight. Let's remove any instances below 200km as a final measure.
flights = flights[flights.distance > 200].reset_index(drop=True)This algorithm worked nearly 100% of the time for me with less than 5 false positives or negatives; however, the adjacency-criteria of the algorithm is fairly brittle. The core of it centers around the assumption that inter-flight GPS data will be directly adjacent to one another. That’s why the initial screening on line 1 of the previous cell had to be so liberal.
Now, the flights DataFrame contains only instances of true flights which facilitates plotting with Matplotlib’s Basemap. If we plot on a flat projection like tmerc, the drawgreatcircle function will produce a true path arc just like we see in the in-flight magazines.
flights2018 within the output directory to save all .pngsfig = plt.figure(figsize=(18,12))
current_date = time.strftime("printed: %a, %d %b %Y", time.localtime())
# Plotting across the international dateline is tough.
# One option is to break up flights by hemisphere.
# Otherwise, you'd need to plot using a different projection like 'robin' and
# potentially center on the Int'l Dateline (lon_0=-180)
# Western Hemisphere Flights
# flights = flights[(flights.start_lon < 0) & (flights.end_lon < 0)]
# Eastern Hemisphere Flights
# flights = flights[(flights.start_lon > 0) & (flights.end_lon > 0)]
xbuf = 0.2
ybuf = 0.35
min_lat = np.min([flights.end_lat.min(), flights.start_lat.min()])
min_lon = np.min([flights.end_lon.min(), flights.start_lon.min()])
max_lat = np.max([flights.end_lat.max(), flights.start_lat.max()])
max_lon = np.max([flights.end_lon.max(), flights.start_lon.max()])
width = max_lon - min_lon
height = max_lat - min_lat
m = Basemap(llcrnrlon=min_lon - width* xbuf,
llcrnrlat=min_lat - height*ybuf,
urcrnrlon=max_lon + width* xbuf,
urcrnrlat=max_lat + height*ybuf,
projection='merc',
resolution='l',
lat_0=min_lat + height/2,
lon_0=min_lon + width/2,)
m.drawmapboundary(fill_color='#EBF4FA')
m.drawcoastlines()
m.drawstates()
m.drawcountries()
m.fillcontinents()
for idx, f in flights.iterrows():
m.drawgreatcircle(f.start_lon, f.start_lat, f.end_lon,
f.end_lat, linewidth=3, alpha=1, color='#ffd700' )
m.plot(*m(f.start_lon, f.start_lat), color='g', alpha=0.8, marker='o')
m.plot(*m(f.end_lon, f.end_lat), color='r', alpha=0.5, marker='o' )
fig.text(0.125, .24, "kivanpolimis.com", color='#555555', fontsize=15, ha='left')
fig.text(0.125,.2,
"Plotted using Python, Basemap \n Collected from {0} to {1} on Android \n {2}".
format(earliest_obs, latest_obs, current_date),
ha='left', color='#555555', style='italic')
plt.savefig('output/flights2018/flights_{}.png'.format(idx),
dpi=150, frameon=False, transparent=False,
bbox_inches='tight', pad_inches=0.2)
m.drawgreatcircle(f.start_lon, f.start_lat, f.end_lon, f.end_lat,
linewidth=3, alpha=0.5, color='#800080' )
m.drawgreatcircle(f.start_lon, f.start_lat, f.end_lon, f.end_lat,
linewidth=3, alpha=0.5, color='b' )
m.plot(*m(f.start_lon, f.start_lat), color='k', alpha=0.8, marker='o')
m.plot(*m(f.end_lon, f.end_lat), color='k', alpha=0.5, marker='o' )
flights2018 directory with the glob# code to create .gif from:
# http://superfluoussextant.com/making-gifs-with-python.html
gif_name = 'flights2018'
# Get all the .pngs in the `flights2018` directory
file_list = glob.glob('output/flights2018/*.png')
# Sort the images by number
list.sort(file_list, key=lambda x: int(x.split('_')[1].split('.png')[0]))
with open('image_list.txt', 'w') as file:
for item in file_list:
file.write("%s\n" % item)
# On Windows convert is 'magick'
os.system('magick -loop 0 -delay "10" @image_list.txt output/{}.gif'.format(gif_name))
# On Unix/Mac use convert
#os.system('convert -loop 0 -delay "10" @image_list.txt output/{}.gif'.format(gif_name))0
fig = plt.figure(figsize=(18,12))
current_date = time.strftime("printed: %a, %d %b %Y", time.localtime())
png_name = 'flights2018'
xbuf = 0.2
ybuf = 0.35
min_lat = np.min([flights.end_lat.min(), flights.start_lat.min()])
min_lon = np.min([flights.end_lon.min(), flights.start_lon.min()])
max_lat = np.max([flights.end_lat.max(), flights.start_lat.max()])
max_lon = np.max([flights.end_lon.max(), flights.start_lon.max()])
width = max_lon - min_lon
height = max_lat - min_lat
m = Basemap(llcrnrlon=min_lon - width* xbuf,
llcrnrlat=min_lat - height*ybuf,
urcrnrlon=max_lon + width* xbuf,
urcrnrlat=max_lat + height*ybuf,
projection='merc',
resolution='l',
lat_0=min_lat + height/2,
lon_0=min_lon + width/2,)
m.drawmapboundary(fill_color='#EBF4FA')
m.drawcoastlines()
m.drawstates()
m.drawcountries()
m.fillcontinents()
for idx, f in flights.iterrows():
m.drawgreatcircle(f.start_lon, f.start_lat, f.end_lon, f.end_lat,
linewidth=3, alpha=0.4, color='b')
m.plot(*m(f.start_lon, f.start_lat), color='g', alpha=0.8, marker='o')
m.plot(*m(f.end_lon, f.end_lat), color='r', alpha=0.5, marker='o' )
fig.text(0.125, .24, "kivanpolimis.com", color='#555555', fontsize=15, ha='left')
fig.text(0.125,.2,
"Plotted using Python, Basemap \n Collected from {0} to {1} on Android \n {2}".
format(earliest_obs, latest_obs, current_date),
ha='left', color='#555555', style='italic')
plt.savefig('output/{}.png'.format(png_name),
dpi=150, frameon=False, transparent=False, bbox_inches='tight', pad_inches=0.2)You can leverage this notebook, scripts, and cited sources to reproduce these maps.
I’m working on creating functions to automate these visualizations
Download this notebook, or see a static view here
System and module version information:
Python version:
sys.version_info(major=2, minor=7, micro=14, releaselevel='final', serial=0)
last updated: Thu, 15 Mar 2018 04:28