Updated Script to Download Google History

A little over a week ago, I completed the first version of a script (in the form of a JavaScript bookmarklet) that allows you to download your Google Web History. Shortly afterwards, I posted comments on a few other pages with similar scripts. Someone saw one of those comments and replied on my article; they posted as metal123, but normally go by the handle “Naka”.

Naka told me about three RSS feed parameters for Google Web History that I hadn't known about: yr, month, and day. I may actually have seen them earlier, but their significance, and how I could use them, did not dawn on me until I read Naka’s comment thoroughly. This mattered because the first revision of my script had a major problem: it could only retrieve about 4,000 records of history.

No matter what I did with the parameters I was using (namely num and start), the script would only download a partial history. After some modifications to the code, however, I was able to take advantage of the date parameters and update the script to download an entire Google Web History.


Installation and Usage

Usage is the same as before, apart from a few new features that are fairly easy to use (you will need Flash for the script to work). To use it, drag and drop this bookmarklet to your bookmarks bar:

Download Google Web History

Then visit https://www.google.com/history, log into your Google Web History, and click the bookmarklet. Accept any warnings about insecure content being loaded (the bookmarklet loads some JavaScript and a Flash movie from my server), then click the bookmarklet again to begin the download. For more information about the security dialogs you may encounter, refer to the usage section of the original post.

Please note that none of your private information is ever sent to or received from any server other than Google’s. Read the privacy info on my Google History Download script to learn more. The new download dialog box looks like this:

Be prepared to wait quite a while if you want to download your full history. To give you an idea of what to expect: I have over 40k searches, the script downloaded over 135k records, and the resulting CSV was about 28MB. If you get tired of waiting, click the Cancel button to get what has been downloaded so far.

Notice that a cancel button has been added, as well as a note at the bottom of the window indicating the oldest record that has been downloaded. When you cancel the download, you can download a CSV with the records completed thus far. You’ll also be given the option to resume your download:
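Whether the download finishes or is cancelled, the collected records are saved as a CSV file. The post does not describe that file's columns. The sketch below therefore assumes a hypothetical record shape, `{ timestamp, title, url }`. It shows only the RFC 4180 quoting rules, which stop commas, quotes, and line breaks inside search terms from corrupting the file. It is an illustration, not the script's actual export code.

```javascript
// Quote a field per RFC 4180: wrap it in double quotes when it contains a comma,
// double quote, CR or LF, and double any embedded double quotes.
function csvField(value) {
  const s = String(value ?? "");
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Hypothetical record shape: { timestamp: ms since epoch, title: string, url: string }.
function toCsv(records) {
  const rows = [["date", "title", "url"]];
  for (const r of records) {
    rows.push([new Date(r.timestamp).toISOString(), r.title, r.url]);
  }
  // RFC 4180 uses CRLF between records.
  return rows.map((row) => row.map(csvField).join(",")).join("\r\n");
}
```

Because the rows are built from whatever records exist at the moment, the same function works for a partial (cancelled) download and for a complete one.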

If Google ever prompts for a password on the RSS feed behind the scenes, a dialog will appear with instructions for logging back into your Google Web History and resuming the download:

You may see this dialog if you have your history open and are idle for a while. It also sometimes appears when a lot of requests are made to the RSS feed, which the script does make.


New features

I’ve included a number of improvements in the new version:

  • Now obtains the entire Google Web History
  • Given that it may take a long time to download one’s entire history, I’ve added a cancel button
  • There is also a button to resume, if the download is cancelled
  • Login timeouts on Google Web History are handled more gracefully: the download resumes where it left off after the user logs back in using another browser window
  • As records are downloaded, the oldest date loaded so far is displayed at the bottom of the window, which shows how much history has been gathered as the download progresses


Google Web History RSS Parameters

Here’s the full list of RSS feed parameters of which I’m now aware, including additional ones Naka mentioned:

  • num: Number of records to output (1000 is the most you can get at a time)
  • start: The record number starting point (starting from 1). Note that using num=1000 and incrementing start by 1000 will only get you so far (about 4k records)
  • month: 2-digit month
  • day: 2-digit day of the month
  • yr: 4-digit year
  • max: Appears to be some kind of modified UNIX timestamp; not particularly useful without fully understanding what it is, and it seems to have some of the same limitations as start
  • st=web: Limit to web search
  • st=img: Limit to image search
  • st=frg: Limit to product (formerly known as Froogle) search
  • st=ad: Limit to sponsored ad links
  • st=vid: Limit to video search
  • st=maps: Limit to map search
  • st=blogs: Limit to blog search
  • st=books: Limit to book search
  • st=news: Limit to news search
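To show how these parameters combine, here is a minimal sketch of a URL builder. Several details are assumptions, not taken from the post:

- the base URL `https://www.google.com/history/lookup`
- the `output=rss` flag
- the helper name `buildFeedUrl`

Only num, yr, month, day, and st come from the list above.

```javascript
// Assumed endpoint; the post does not state the feed's exact URL.
const FEED_BASE = "https://www.google.com/history/lookup";

const pad2 = (n) => String(n).padStart(2, "0");

// Hypothetical helper: build a history feed URL from the parameters listed above.
// yr/month/day are numbers (month is 1-12); st is an optional filter such as "web".
function buildFeedUrl({ num = 1000, yr, month, day, st } = {}) {
  const params = new URLSearchParams();
  params.set("output", "rss");                    // assumption: selects RSS output
  params.set("num", String(Math.min(num, 1000))); // 1000 is the per-request maximum
  if (yr !== undefined) {
    params.set("yr", String(yr).padStart(4, "0")); // 4-digit year
    params.set("month", pad2(month));              // 2-digit month
    params.set("day", pad2(day));                  // 2-digit day of the month
  }
  if (st) params.set("st", st); // web, img, frg, ad, vid, maps, blogs, books, news
  return `${FEED_BASE}?${params.toString()}`;
}
```

For example, `buildFeedUrl({ yr: 2009, month: 6, day: 30 })` yields a URL ending in `?output=rss&num=1000&yr=2009&month=06&day=30`.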


Additional Technical Information

The new version of my script starts off by downloading the first 1000 records as it did before. From that point on, however, it loads the next 1000 records by setting the yr, month, and day parameters to the date of the 1000th entry. I did run into some odd problems on consecutive days with a lot of history.
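The paging step just described can be sketched as follows. The function derives the next request's yr, month, and day from the 1000th record of a full page. The following are all assumptions, not details of the actual script:

- each parsed record carries a numeric `timestamp` in milliseconds
- dates are interpreted in UTC
- a page with fewer than 1000 records marks the end of the history
- the function name `nextDateParams` is hypothetical

```javascript
const PAGE_SIZE = 1000; // per-request maximum from the post

const pad2 = (n) => String(n).padStart(2, "0");

// Given one page of records (newest first), return the yr/month/day parameters
// for the next request, or null when the page is short (assumed end of history).
function nextDateParams(page) {
  if (page.length < PAGE_SIZE) return null;
  const last = new Date(page[PAGE_SIZE - 1].timestamp); // the 1000th record
  return {
    yr: String(last.getUTCFullYear()),
    month: pad2(last.getUTCMonth() + 1), // getUTCMonth() is 0-based
    day: pad2(last.getUTCDate()),
  };
}
```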

In a few instances on days with a lot of history, the script would take the date of the 1000th record to load the next 1000. However, instead of starting on that day, the feed would start on the following (more recent) day. The 1000th record would then fall on the original day again, and the script would end up in an infinite loop.

For example, when the starting date for the feed was set to July 1, 2009, the date of the 1000th record for that request was June 30, 2009. However, when the RSS feed parameters yr=2009&month=06&day=30 were used to load records starting from June 30, the RSS feed once again started with July 1. This resulted in an infinite loop, because the date of the 1000th record was June 30, 2009 and the feed parameters were being set to yr=2009&month=06&day=30 over and over again.

To avoid this infinite loop, I added a check against the date of the previous request: if it is the same as the date of the 1000th entry, the script decrements the date by a day. Unfortunately, this could cause some records to be missing from the download, but that is far better than an infinite loop. Fortunately, the problem appears to occur only rarely. The code also checks for repeated history entries (by comparing their date/time) to avoid duplicates in the results.
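The loop guard and duplicate check described above can be sketched as a paging loop. It is a sketch of the described behaviour, not the script's actual code. `fetchPage` is a hypothetical stand-in for the feed request. It is assumed to resolve to up to 1000 parsed records, newest first, starting on the given day, or at the most recent record when passed `null`. As in the other sketches, the following are also assumptions:

- records are assumed to carry a numeric `timestamp`
- days are computed in UTC
- a short page is taken to mean the end of the history

The `maxPages` cap is an extra safety net that the post does not mention.

```javascript
const PAGE_SIZE = 1000;             // per-request maximum from the post
const DAY_MS = 24 * 60 * 60 * 1000;

// Midnight (UTC) of the day containing the timestamp ts.
const startOfDay = (ts) => Math.floor(ts / DAY_MS) * DAY_MS;

// Download every record, paging backwards by date.
// fetchPage(dayMs) is hypothetical: it resolves to up to PAGE_SIZE records,
// newest first, beginning on the given day (or at the newest record for null).
async function downloadAll(fetchPage, maxPages = 10000) {
  const seen = new Set(); // timestamps already collected
  const records = [];
  let day = null;         // day used for the most recent request

  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(day);
    for (const rec of page) {
      // Skip repeated entries, identified by their date/time.
      if (!seen.has(rec.timestamp)) {
        seen.add(rec.timestamp);
        records.push(rec);
      }
    }
    if (page.length < PAGE_SIZE) break; // assumed: short page = end of history

    let next = startOfDay(page[PAGE_SIZE - 1].timestamp);
    // Loop guard: if the 1000th record falls on the day just requested, step
    // back one day. This can skip some records, but avoids an infinite loop.
    if (next === day) next -= DAY_MS;
    day = next;
  }
  return records;
}
```

In the July 1 / June 30 scenario, the second request for June 30 returns the same window again. The guard then moves the request to June 29 instead of repeating June 30 forever.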

About GeekLad

GeekLad is a technology enthusiast and programming hobbyist. Occasionally he will put together useful little bits of code (be it JavaScript or PHP) and share them with the world. He also enjoys creating and sharing how-tos, describing how to do the things people want to do with their computers.
  • Martin Dluhos

    Hi, I am having the same problem as David and Denver. The problem is that the URL has changed from google.com/history to history.google.com/history and the script does not parse the updated URL properly anymore. Could you please change the script to recognize the new URL? This should require just a quick fix in the regex. Thanks!

  • David Alexandre

    You have to update the script.
    Change “https://www.google.com/history” to “https://history.google.com/history/”

    Please…

  • Richard

    Unfortunately, like David Alexandre, I am late to the party; please can you update your script to allow for its continued use?

  • theanphibian

    On using this with Firefox:
    1. the security exception can’t be done without extra work. It pops up and allows you to make an exception, but it won’t apply it the next time you try the bookmarklet. You need an add-on to do this: https://addons.mozilla.org/en-us/firefox/addon/toggle-mixed-active-content/
    2. the domain still needs to be changed. AFAIK, the script has not been updated to resolve this yet.

  • desperado

    Geeklad, please update the script! Please… I am desperate.

  • noob

    Is GeekLad still alive?

  • noob

    Please update the script. I am considering divorcing Google, but I want to keep all my data first.

  • Soham Thaker

    Gives an error message always:
    Instructions
    Please visit https://www.google.com/history/ and log into your Google Account. Then click the bookmarklet again.
    I use Chrome with Windows 8.
    Do you have any solution?