Naka informed me of additional parameters for the Google Web History RSS feed of which I was unaware: yr, month, and day. I may have actually seen them earlier, but their significance and how I could use them did not dawn on me until I read Naka’s comment thoroughly. There was a major problem with the first revision of my script: it would only successfully obtain about 4k records of history.
No matter what I did with the parameters I was using (namely num and start), the script would only download a partial history. However, with some modifications to the code, I was able to take advantage of the date parameters and update the script to download a Google Web History in its entirety.
Installation and Usage
The usage is the same as before, apart from some new features, which are fairly easy to use (you will need Flash for it to work). To use it, drag and drop this bookmarklet to your bookmarks bar:
Please note that none of your private information is ever sent to or received from any server other than Google’s. Read the privacy info on my Google History Download script to learn more. The new download dialog box looks like this:
Be prepared to wait quite a while for your full history to download, if you want to get it all. Just to give you an idea of what to expect: I have over 40k searches and it downloaded over 135k records and the CSV was about 28MB in size. If you get sick of waiting, click the Cancel button and get what has been downloaded so far.
Notice that a cancel button has been added, as well as a note at the bottom of the window indicating the oldest record that has been downloaded. When you cancel the download, you can download a CSV with the records completed thus far. You’ll also be given the option to resume your download:
If Google ever prompts for a password on the RSS feed behind the scenes, a dialog will appear with instructions for logging back into your Google Web History and resuming the download:
You may see this dialog if you have your history open and are idle for a while. It also sometimes appears when a lot of requests are made to the RSS feed (which the script does indeed do).
I’ve included a number of improvements in the new version:
- Now obtains the entire Google Web History
- Given that it may take a long time to download one’s entire history, I’ve added a cancel button
- There is also a button to resume, if the download is cancelled
- History login timeout is more gracefully handled, resuming where it left off after the user logs back into history using another browser window
- As records are being downloaded, the oldest date loaded is displayed at the bottom of the window — this helps to know how much history has been gathered as it progresses
Google Web History RSS Parameters
Here’s the full list of RSS feed parameters of which I’m now aware, including additional ones Naka mentioned:
- num: Number of records to output (1000 is the most you can get at a time)
- start: The record number to start from (counting from 1). Note that using num=1000 and incrementing the start parameter by 1000 will only get you so far (to about 4k records)
- month: 2-digit month
- day: 2-digit day of the month
- yr: 4-digit year
- max: Appears to be some kind of modified UNIX timestamp. It is not particularly useful without fully understanding what it is, and it appears to have some of the same limitations as start
- st=web: Limit to web search
- st=img: Limit to image search
- st=frg: Limit to product (formerly known as Froogle) search
- st=ad: Limit to sponsored ad links
- st=vid: Limit to video search
- st=maps: Limit to map search
- st=blogs: Limit to blog search
- st=books: Limit to book search
- st=news: Limit to news search
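To show how these parameters might be combined into a single request, here is a minimal JavaScript sketch. It is only an illustration: FEED_BASE is a placeholder and not the real feed address, and buildFeedUrl and pad2 are hypothetical helper names, not code from my script.

```javascript
// Placeholder only: substitute the actual Web History RSS feed address.
const FEED_BASE = "https://example.invalid/history-feed";

// Left-pad a number to two digits, e.g. 6 -> "06".
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Hypothetical helper: build a feed URL from the parameters listed above.
// opts.date is a plain { yr, month, day } object; opts.st is an optional
// search type such as "web" or "img".
function buildFeedUrl(opts) {
  const parts = ["num=" + (opts.num || 1000)];
  if (opts.start) parts.push("start=" + opts.start);
  if (opts.date) {
    parts.push("yr=" + opts.date.yr);
    parts.push("month=" + pad2(opts.date.month));
    parts.push("day=" + pad2(opts.date.day));
  }
  if (opts.st) parts.push("st=" + encodeURIComponent(opts.st));
  const sep = FEED_BASE.includes("?") ? "&" : "?";
  return FEED_BASE + sep + parts.join("&");
}

// Web searches only, starting from June 30, 2009.
console.log(buildFeedUrl({ date: { yr: 2009, month: 6, day: 30 }, st: "web" }));
```

Whether the feed accepts every combination of these parameters in one request is something I have not verified, so treat the combinations as an assumption.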
Additional Technical Information
The new version of my script starts off by downloading the first 1000 records as it did before. However, from that point on, it loads the next 1000 records by setting the yr, month, and day parameters to the date of the 1000th entry. I did run into some odd problems with consecutive days with a lot of history.
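The step of turning the last entry of one batch into the date parameters for the next request can be sketched roughly as follows. This is a simplified illustration and not the script's actual code: the record shape ({ time: Date }), the function name nextDateParams, and the use of UTC for the calendar date are all assumptions made for the example.

```javascript
// Left-pad a number to two digits, e.g. 6 -> "06".
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Hypothetical helper: given one batch of records (newest first), return the
// yr/month/day values to use for the next request, taken from the date of
// the last (oldest) record in the batch. Returns null for an empty batch.
function nextDateParams(page) {
  if (page.length === 0) return null;
  const last = page[page.length - 1].time;
  return {
    yr: last.getUTCFullYear(),
    month: pad2(last.getUTCMonth() + 1), // getUTCMonth() is zero-based
    day: pad2(last.getUTCDate()),
  };
}

// A tiny two-record "batch"; a real batch would hold up to 1000 records.
const batch = [
  { time: new Date(Date.UTC(2009, 6, 1, 9, 30)) },  // July 1, 2009
  { time: new Date(Date.UTC(2009, 5, 30, 23, 5)) }, // June 30, 2009
];
console.log(nextDateParams(batch)); // yr 2009, month "06", day "30"
```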
In a few instances on days with a lot of history, the script would obtain the date of the 1000th record to load the next 1000. However, instead of starting on that day, it would start on the prior day. Then the original day would be the 1000th record again, and the script would end up in an infinite loop.
For example, when the starting date for the feed was set to July 1, 2009, the date of the 1000th record for that request was June 30, 2009. However, when the RSS feed parameters yr=2009&month=06&day=30 were used to load records starting from June 30, the RSS feed once again started with July 1. This resulted in an infinite loop, because the date of the 1000th record was June 30, 2009 and the feed parameters were being set to yr=2009&month=06&day=30 over and over again.
To avoid this infinite loop, the script now remembers the date used for the prior request. If the date of the 1000th entry is the same as that prior date, it decrements the date by a day. Unfortunately, this could result in some records being lost from the download. However, this is a lot better than an infinite loop, and fortunately the problem appears to occur only rarely. The code also checks for repeated history entries (by comparing the date/time), to avoid duplicates in the results.
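The two safeguards described above can be sketched like this. It is a minimal illustration under stated assumptions rather than the script's real code: chooseNextDate, appendUnique, and dayKey are hypothetical names, records are assumed to carry a Date in a time field, and days are compared in UTC.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// "YYYY-MM-DD" key in UTC for comparing calendar days.
function dayKey(d) {
  return d.toISOString().slice(0, 10);
}

// Pick the date for the next request. Normally this is the day of the last
// record in the batch; if that equals the date used for the prior request,
// step back one day so the same request is not repeated forever.
function chooseNextDate(priorDate, lastRecordTime) {
  const candidate = new Date(Date.UTC(
    lastRecordTime.getUTCFullYear(),
    lastRecordTime.getUTCMonth(),
    lastRecordTime.getUTCDate()
  ));
  if (priorDate !== null && dayKey(candidate) === dayKey(priorDate)) {
    return new Date(candidate.getTime() - DAY_MS);
  }
  return candidate;
}

// Append only records whose timestamp has not been seen yet.
// Returns the number of records actually added.
function appendUnique(collected, seenTimes, page) {
  let added = 0;
  for (const rec of page) {
    const key = rec.time.getTime();
    if (!seenTimes.has(key)) {
      seenTimes.add(key);
      collected.push(rec);
      added++;
    }
  }
  return added;
}

// The June 30 case: the prior request used June 30, and the last record of
// the batch is again dated June 30, so the next request moves to June 29.
const prior = new Date(Date.UTC(2009, 5, 30));
const last = new Date(Date.UTC(2009, 5, 30, 8, 15));
console.log(dayKey(chooseNextDate(prior, last))); // 2009-06-29
```

Keying duplicates on the timestamp alone would also drop two distinct entries that happen to share the exact same date/time, which is a trade-off of this simple approach.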