Exporting Google Code issues to JIRA

Schema exploration fun!

Posted by Ian Maxon on August 22, 2015

Google Code’s shutdown is well underway now, with the site going read-only 3 days from the writing of this post. Getting the code itself out of a project is pretty easy, but unless you feel like exporting to GitHub, the bugs are pretty hard to export. In my particular case, I wanted to export them to JIRA.

I’ll walk through how I did this for the project I was trying to export, so some of it will be specific to that (statically set names, etc.). If you want to do the same thing, those sorts of things should obviously be changed.

The export format from Google Takeout for Google Code projects is a large JSON file. The format is roughly like this (only the JIRA-relevant portions are included):

        { "kind": "projecthosting#user",
            ...
          "projects" : [ {
            "kind" : "projecthosting#project",
            ...
            "issues" : {
                ...
                "items":[ {
                    "kind" : "projecthosting#issue",
                    "id": 1,
                    "title" : ...
                    "summary" : ...
                    ...
                    "status" : ...
                    "state" : ...
                    "labels" : [ ...
                    "author" : {
                            "kind" : "projecthosting#issuePerson",
                            "name" ...
                    }
                    "owner" : {
                            "kind" : "projecthosting#issuePerson",
                            ...
                    ...
                    "projectId": ...
                    "comments" : {
                        "kind" : "projecthosting#issueCommentList",
                        "items" : [ {
                            ...
                            "content": ...
                        }
                    }
                  }
             }
         }

There’s quite a lot of stuff there, but only a portion of it is interesting for JIRA. It’s also worth noting that all the projects you are an admin for on Google Code will appear in the Takeout file. JIRA’s JSON import format is documented, but there are some things it doesn’t say explicitly. For one, a field that is present must not be null; if there’s no value, the key should not appear at all. The same goes for empty lists and so on.
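
If you end up post-processing the export yourself, the "no nulls, no empty lists" rule can be enforced with a small recursive filter. This is only a sketch: prune_empty is a name I made up, and dropping empty objects as well is my own assumption about what "and so on" covers.

```python
def prune_empty(value):
    """Recursively drop dict keys whose value is None, an empty list, or an empty dict.

    Dropping empty dicts is an assumption; the rule as stated only names
    nulls and empty lists.
    """
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            item = prune_empty(item)
            if item is None or item == [] or item == {}:
                continue  # leave the key out entirely
            cleaned[key] = item
        return cleaned
    if isinstance(value, list):
        return [prune_empty(item) for item in value if item is not None]
    return value


# Hypothetical issue with one null field and one empty list.
issue = {
    "key": "ASTERIXDB-1",
    "summary": "Need a Hyracks Concepts Guide",
    "updated": None,
    "comments": [],
}
print(prune_empty(issue))
```

Only the key and summary fields survive here, since the other two have nothing in them.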

Another hiccup is that JIRA requires all users mentioned or used in the import to be registered within JIRA. There may be a bulk way to do this, but I wasn’t aware of one at the time (if you know of one, please comment!). I chose to just use a dummy importer user.
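
If you would rather register the real users, it helps to know who they are first. Here is a rough sketch that walks the Takeout structure shown earlier and collects the distinct names. The function name and the sample data are made up, and it assumes an owner has a name field just like an author does.

```python
def collect_user_names(takeout, project_name):
    """Return the sorted, distinct author/owner names for one project."""
    names = set()
    for project in takeout.get("projects", []):
        if project.get("name") != project_name:
            continue
        for issue in project.get("issues", {}).get("items", []):
            for role in ("author", "owner"):
                # Assumption: both roles are issuePerson objects with a "name".
                person = issue.get(role) or {}
                if person.get("name"):
                    names.add(person["name"])
    return sorted(names)


# Tiny made-up stand-in for a parsed Takeout file (json.load on the real one).
sample = {
    "projects": [
        {
            "name": "hyracks",
            "issues": {
                "items": [
                    {"id": 1, "author": {"name": "alice"}, "owner": {"name": "bob"}},
                    {"id": 2, "author": {"name": "alice"}},
                ]
            },
        }
    ]
}
print(collect_user_names(sample, "hyracks"))
```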

That being said, we need a tool to transform between these two formats. Oddly enough, the project I was exporting (AsterixDB) turned out to be well suited to the task. AsterixDB has a lot of features, but here we just want to use it like XQuery, but for JSON. In short, we treat the Google Takeout export as an external dataset, and then query it to produce the transformed data.

To install AsterixDB for this purpose, using the normal installer for a cluster (managix) is kind of overkill. If you have Docker, you could go ahead and pull parshimers/asterix-playground and run it with the proper ports forwarded and the folder containing your Takeout JSON mounted, like docker run -p 19001:19001 -p 19002:19002 -v ~/Work/:/files -t parshimers/asterixdb-playground . This image just starts up a single-node AsterixDB instance. This Vagrant setup will also get AsterixDB started up with a minimal amount of configuration (just place the Takeout file where you clone it, in this case). If neither of those appeals to you, feel free to follow the single-node instructions, but be sure to use the latest snapshot instead of the old stable release.

First, let’s roughly define the fields we want in the returned data. This isn’t strictly necessary (AsterixDB doesn’t require a schema), but I found it useful.

        drop dataverse issues if exists;
        create dataverse issues if not exists;
        use dataverse issues;

        create type emptyType as open { }

        create type jiraComment as open{
           body: string,
           author: string,
           created: string?
        }

        create type jiraIssue as open{
           "key": string,
           description: string,
           status: string,
           reporter: string,
           created: string?,
           updated: string?,
           summary: string,
           comments: [jiraComment]
        }


        create type jiraProject as open {
            name: string,
            "key": string,
            description: string?,
            components: [string],
            issues: [jiraIssue]
        }

Next, we’ll create the datasets and load them. How much loading you need depends on how many of the projects in your Takeout file you want to export. I was interested in two projects (AsterixDB and Hyracks), so the code below has one insert statement for each, both loading into the same jiraIssues dataset from the external dataset that is backed by the Google Takeout export. Note the file path in the create external dataset statement; you will probably have to change it to wherever your Takeout file is located from AsterixDB’s perspective.

        use dataverse issues;

        drop  dataset BigIssuesExternal if exists;
        create external dataset BigIssuesExternal(emptyType) 
        using localfs
            (("path"="127.0.0.1:///files/GoogleCodeProjectHosting.json"),
            ("format"="adm"));

        drop dataset jiraIssues if exists;
        create dataset jiraIssues(jiraIssue) primary key "key";

        insert into dataset jiraIssues(
        for $x in dataset BigIssuesExternal
        for $proj in $x.projects where $proj.name = "hyracks"
        for $issue in $proj.issues.items
        return {
           "key" :  string-concat(["ASTERIXDB-",string($issue.id)]),
           "status": $issue.status,
           "created": $issue.published,
           "updated": $issue.updated,
           "reporter": "asterixdb-importer",
           "summary": $issue.summary,
           "description": $issue.title,
           "components": ["Hyracks"],
           "comments":
               for $c in $issue.comments.items
               return{
                   "body": $c.content,
                   "author": "asterixdb-importer",
                   "created": $c.published
               }

        }
        );

        insert into dataset jiraIssues(
        for $x in dataset BigIssuesExternal
        for $proj in $x.projects where $proj.name = "asterixdb"
        for $issue in $proj.issues.items
        return {
           "key" :  string-concat(["ASTERIXDB-",string(
             $issue.id
             + count(
                   for $a in dataset jiraIssues return $a
                )+1
           )]),
           "status": $issue.status,
           "created": $issue.published,
           "updated": $issue.updated,
           "reporter": "asterixdb-importer",
           "summary": $issue.summary,
           "description": $issue.title,
           "components": ["AsterixDB"],
           "comments":
               for $c in $issue.comments.items
               return{
                   "body": $c.content,
                   "author": "asterixdb-importer",
                   "created": $c.published
               }

        }
        );
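
The key arithmetic in the second insert is easier to see with small numbers. The sketch below mimics it in Python, under two assumptions of mine: that the count is taken over the dataset as it stood before the second insert started, and that the Hyracks ids are contiguous starting at 1. The function name and the id lists are made up.

```python
def jira_keys(hyracks_ids, asterixdb_ids, prefix="ASTERIXDB-"):
    """Mimic the key assignment done by the two insert statements."""
    # First insert: Hyracks issues keep their Google Code ids.
    keys = [prefix + str(issue_id) for issue_id in hyracks_ids]
    # Second insert: shift by the number of rows already loaded, plus one.
    offset = len(keys) + 1
    keys += [prefix + str(issue_id + offset) for issue_id in asterixdb_ids]
    return keys


# Three hypothetical Hyracks issues followed by two AsterixDB issues.
print(jira_keys([1, 2, 3], [1, 2]))
```

With these numbers the Hyracks issues take keys 1 through 3 and the AsterixDB issues start at 5, so one key is skipped. If the Hyracks ids had gaps, the count would be smaller than the largest id and the shifted keys could collide, so that is worth checking against your own data.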

As is hopefully evident, I’m just picking out what I want from the Takeout file and loading it into a dataset so it’s formatted nicely. Now that I have the data how I like it, I just need to get it out of AsterixDB, wrapped so that JIRA will take it. The HTTP API is the best way to do this, and I chose to use curl to access it:

        curl --data 'use dataverse issues;

        {
        "projects" : [ {
            "name" : "asterixdb",
            "key": "ASTERIXDB",
            "description": "AsterixDB is a semi-structured, parallel data management platform.",
            "components": ["AsterixDB","Hyracks"],
            "issues": for $y in dataset jiraIssues return $y
        } ]
        }' -H "Accept: application/x-adm" http://localhost:19002/aql > /files/jira-issues.json
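
If you want to script this step instead of running curl by hand, the same request can probably be built with Python’s standard library. This is a sketch, not something I ran against a live instance: build_aql_request is a made-up helper, the query is a cut-down stand-in for the real one, and the block only constructs the request without sending it.

```python
import urllib.request

# Endpoint taken from the curl command above.
AQL_URL = "http://localhost:19002/aql"


def build_aql_request(query, url=AQL_URL):
    """Build (but do not send) a POST request carrying an AQL query."""
    return urllib.request.Request(
        url,
        data=query.encode("utf-8"),  # supplying a body makes this a POST
        headers={"Accept": "application/x-adm"},
    )


request = build_aql_request(
    "use dataverse issues; for $y in dataset jiraIssues return $y"
)
print(request.get_method(), request.full_url)

# To actually run it against a running AsterixDB instance:
# with urllib.request.urlopen(request) as response:
#     result = response.read().decode("utf-8")
```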

The last step is to remove the wrapper object that AsterixDB returns results in; unfortunately, JIRA won’t accept it. I actually just used vi to take it out, but sed -i '1s/^.//' jira-issues.json to remove the beginning of the list and sed -i '$ s/.$//' jira-issues.json to remove the end should do the same thing.
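
For anyone without sed handy, here is roughly the same trimming in Python. It is a sketch with a made-up function name and sample string, and it assumes the wrapper’s brackets are the very first character and the very last non-newline character of the output.

```python
import json


def strip_wrapper(text):
    """Drop the first character and the last non-newline character."""
    body = text.rstrip("\n")
    trailing = text[len(body):]  # keep any final newline(s) as they were
    if len(body) < 2:
        return text  # nothing sensible to strip
    return body[1:-1] + trailing


# Made-up, miniature version of the wrapped output.
wrapped = '[ {"projects": []}\n ]\n'
unwrapped = strip_wrapper(wrapped)
print(json.loads(unwrapped))
```

Parsing the result with json.loads is a cheap way to confirm that what is left is a single JSON object.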

After a pass through jsonlint ( ☺️ ), our final export looks something like this:

        {
          "projects": [
            {   
              "name": "asterixdb",
              "key": "ASTERIXDB",
              "description": "AsterixDB is a semi-structured, parallel data management platform.",
              "components": [
                "AsterixDB",
                "Hyracks"
              ],  
              "issues": [
                {   
                  "key": "ASTERIXDB-1",
                  "description": "Need a Hyracks Concepts Guide",
                  "status": "New",
                  "reporter": "asterixdb-importer",
                  "created": "2010-10-16T18:37:10.000Z",
                  "updated": "2010-10-16T18:37:10.000Z",
                  "summary": "Need a Hyracks Concepts Guide",
                  "comments": [
            ...