{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Flights Analysis\n", "The goal of this post is to visualize flights taken from Google location data using Python.\n", "* We will create a .gif progressing through individual flights and a .png of all flights\n", "* This post uses code from Tyler Hartley's [visualizing location history blog post](http://beneathdata.com/how-to/visualizing-my-location-history/)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Overview\n", "\n", "1. Setup\n", " * download data\n", " * install modules\n", "2. Data Wrangling\n", " * data extraction\n", " * data exploration\n", " * data manipulation\n", "3. Flight Algorithm\n", "4. Visualize Flights\n", " * create individual .png of each flight to combine into .gif\n", " * create .png of all flights plotted at once\n", "5. Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setup\n", "1. Use Google Takeout to download your Google location history\n", "* If you've previously enabled Google location reporting on your smartphone, your GPS data will be periodically uploaded to [Google's servers](https://support.google.com/accounts/answer/3118687?hl=en&visit_id=1-636109809748631344-4285616029&rd=1). Use Google Takeout to download your location history.\n", " * The end user has no visibility into when or how this data is uploaded, but as you'll see below, Android appears to upload a GPS location every 60 seconds. That's plenty of data to work with.\n", "2. After downloading your data, install the required modules\n", "\n", "## Google Takeout\n", "Google [Takeout](https://takeout.google.com/settings/takeout) is a Google service that allows users to export their personal Google data. We'll use Takeout to download our raw location history as a one-time snapshot. Since Latitude was retired, no API exists to access location history in real time.
\n", "\n", "Download location data:\n", "* Go to [Takeout](https://www.google.com/settings/takeout). Uncheck all services except \"Location History\".\n", "* The data will be in JSON format, which works great for us. Download it in your favorite compression type.\n", "* When Google has finished creating your archive, you'll get an email notification and a link to download.\n", "* Download and unzip the file, and you should be looking at a `LocationHistory.json` file. We'll work with the location data in [Pandas](http://pandas.pydata.org/), an incredibly powerful tool that simplifies working with complex datatypes and performing statistical analysis in the style of R. Chris Albon has great primers on using Pandas [here](http://chrisalbon.com/#Python) under the \"Data Wrangling\" section.\n", "\n", "## Install modules\n", "* If you use Anaconda to manage your Python packages, I recommend creating a virtual environment with Anaconda to install the dependencies. Copy the commands under each step below into the terminal to create the environment, the requirements.txt file, and so on.\n", " * `conda create -n test-env python=3.5 anaconda`\n", " * `source activate test-env`\n", "* Make a requirements.txt file for the dependencies\n", " * `(echo descartes; echo IPython; echo shapely; echo fiona; echo Basemap) >> requirements.txt`\n", "* Install the packages listed in requirements.txt\n", " * `conda install --yes --file requirements.txt`\n", "* Windows users:\n", " * Create a Python 2.7 environment to install the relevant modules\n", " * `conda create -n py27 python=2.7 anaconda`\n", " * `source activate py27`\n", " * Download and install the [Microsoft Visual C++ Compiler for Python 2.7](https://www.microsoft.com/EN-US/DOWNLOAD/DETAILS.ASPX?ID=44266)\n", " * Install fiona, Shapely, GDAL, descartes, and basemap from https://www.lfd.uci.edu/~gohlke/pythonlibs/\n", "\n", "After completing the setup, we'll read in the `LocationHistory.json` file from Google Takeout and create a DataFrame."
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from __future__ import division\n", "\n", "# explicit imports for the names used below (utils also re-exports them)\n", "import datetime\n", "import json\n", "\n", "import pandas as pd\n", "\n", "from utils import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Wrangling\n", "\n", "## Data Extraction" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "with open('data/LocationHistory/2018/LocationHistory.json', 'r') as location_file:\n", "    raw = json.load(location_file)\n", "\n", "# load the raw location records into a DataFrame\n", "location_data = pd.DataFrame(raw['locations'])\n", "del raw  # free up some memory\n", "\n", "# convert E7 integer coordinates to decimal degrees\n", "location_data['latitudeE7'] = location_data['latitudeE7'] / 1e7\n", "location_data['longitudeE7'] = location_data['longitudeE7'] / 1e7\n", "\n", "# convert timestampMs (milliseconds since the epoch) to seconds\n", "location_data['timestampMs'] = location_data['timestampMs'].astype(float) / 1000\n", "location_data['datetime'] = location_data.timestampMs.map(datetime.datetime.fromtimestamp)\n", "\n", "# rename fields to reflect the unit conversions\n", "location_data.rename(columns={'latitudeE7': 'latitude',\n", "                              'longitudeE7': 'longitude',\n", "                              'timestampMs': 'timestamp'}, inplace=True)\n", "\n", "# drop locations whose accuracy estimate is 1000 m or worse\n", "location_data = location_data[location_data.accuracy < 1000]\n", "location_data.reset_index(drop=True, inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explore Data\n", "* view data and datatypes" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "accuracy int64\n", "activity object\n", "altitude float64\n", "heading float64\n", "latitude float64\n", "longitude float64\n", "timestamp float64\n", "velocity float64\n", "verticalAccuracy float64\n", "datetime datetime64[ns]\n", "dtype: object\n" ] }, { "data": { 
"text/html": [ "
<table>\n", "  <thead>\n",
"    <tr><th></th><th>accuracy</th><th>altitude</th><th>heading</th><th>latitude</th><th>longitude</th><th>timestamp</th><th>velocity</th><th>verticalAccuracy</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>count</th><td>745660.000000</td><td>101260.000000</td><td>44100.000000</td><td>745660.000000</td><td>745660.000000</td><td>7.456600e+05</td><td>58874.000000</td><td>4921.000000</td></tr>\n",
"    <tr><th>mean</th><td>58.997173</td><td>67.057525</td><td>186.597551</td><td>37.748367</td><td>-102.506537</td><td>1.417774e+09</td><td>7.769678</td><td>23.099776</td></tr>\n",
"    <tr><th>std</th><td>125.358984</td><td>242.209547</td><td>101.643968</td><td>9.004123</td><td>23.609836</td><td>3.356510e+07</td><td>11.790783</td><td>45.139324</td></tr>\n",
"    <tr><th>min</th><td>1.000000</td><td>-715.000000</td><td>0.000000</td><td>13.689757</td><td>-123.260751</td><td>1.376790e+09</td><td>0.000000</td><td>2.000000</td></tr>\n",
"    <tr><th>25%</th><td>22.000000</td><td>-18.000000</td><td>98.000000</td><td>29.817569</td><td>-122.306596</td><td>1.391259e+09</td><td>0.000000</td><td>2.000000</td></tr>\n",
"    <tr><th>50%</th><td>31.000000</td><td>2.000000</td><td>181.000000</td><td>29.986634</td><td>-95.246060</td><td>1.413249e+09</td><td>1.000000</td><td>2.000000</td></tr>\n",
"    <tr><th>75%</th><td>50.000000</td><td>60.000000</td><td>270.000000</td><td>47.664284</td><td>-94.995603</td><td>1.428049e+09</td><td>13.000000</td><td>30.000000</td></tr>\n",
"    <tr><th>max</th><td>999.000000</td><td>6738.000000</td><td>359.000000</td><td>50.105984</td><td>23.782015</td><td>1.519330e+09</td><td>208.000000</td><td>473.000000</td></tr>\n",
"  </tbody>\n", "</table>\n", "