As you may have seen in parts I and II, I've been thinking about writing some Python code to grab, parse, modify, and visualise KGML files.
tl;dr - I wrote it, and it's up here (https://github.com/widdowquinn/KGML) in rough form. Have fun, and let me know if you've got any problems or suggestions.
Structure
The module has four main files: KGML_parser.py, KGML_pathway.py, KGML_scrape.py and KGML_vis.py. There's also a unit test file, test_KGML.py, with example files in the KEGG subdirectory.
KGML_pathway.py contains classes that collectively represent a KGML pathway map. The model follows the KGML specification quite closely, having a 'root' Pathway object that contains Entry, Reaction and Relation objects. These are organised just like KGML's hierarchy, which makes it nice and easy to use ElementTree to recombine elements (possibly after modification or trimming) into valid KGML for output. There's a certain amount of cross-referencing between reactions, relations and other entries to maintain self-consistency, and quite a few property decorations so that we can handle 'bounding boxes' for graphics elements, composite features, and have a more sensible internal representation for element property values, but it's all fairly straightforward.
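To make that layout concrete, here is a minimal sketch of the idea, not the actual KGML_pathway.py code. The class and attribute names (Graphics, Entry, bounds, element) are hypothetical stand-ins chosen for illustration; only the Pathway name and the get_kgml() method name come from the module described in this post. It assumes, as in the KGML specification, that a graphics element's x and y give its centre.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Graphics:
    """Hypothetical stand-in for a KGML <graphics> element (centre x, y plus size)."""
    x: float
    y: float
    width: float
    height: float

    @property
    def bounds(self):
        """Bounding box as ((xmin, ymin), (xmax, ymax)), derived from centre and size."""
        half_w, half_h = self.width / 2, self.height / 2
        return ((self.x - half_w, self.y - half_h),
                (self.x + half_w, self.y + half_h))


@dataclass
class Entry:
    """Hypothetical stand-in for a KGML <entry> element."""
    id: int
    name: str
    type: str
    graphics: list = field(default_factory=list)

    @property
    def element(self):
        """Rebuild this entry, with its graphics children, as an ElementTree element."""
        entry = ET.Element("entry", {"id": str(self.id), "name": self.name,
                                     "type": self.type})
        for g in self.graphics:
            ET.SubElement(entry, "graphics", {"x": str(g.x), "y": str(g.y),
                                              "width": str(g.width),
                                              "height": str(g.height)})
        return entry


@dataclass
class Pathway:
    """Root object holding entries keyed by ID, mirroring KGML's hierarchy."""
    name: str
    entries: dict = field(default_factory=dict)

    def add_entry(self, entry):
        self.entries[entry.id] = entry

    def get_kgml(self):
        """Recombine the (possibly modified) entries into a KGML-like string."""
        root = ET.Element("pathway", {"name": self.name})
        for entry_id in sorted(self.entries):
            root.append(self.entries[entry_id].element)
        return ET.tostring(root, encoding="unicode")
```

Because every object knows how to rebuild its own element, trimming or editing entries before output is just a matter of changing the Python objects and calling get_kgml() again. Reactions and relations are left out here to keep the sketch short.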
KGML_parser.py provides a parser that returns a Pathway object. We only expect one pathway map per KGML file, so the read() function throws an error if it finds more. ElementTree is used to parse the KGML itself.
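The one-pathway-per-file check can be sketched with ElementTree alone. This is an illustration under my own assumptions rather than the parser's real code: the function name read_single_pathway is hypothetical, it returns the raw element instead of a Pathway object, and it treats any document with zero or several pathway elements as an error.

```python
import io
import xml.etree.ElementTree as ET


def read_single_pathway(handle):
    """Return the only <pathway> element in a KGML handle, or raise ValueError."""
    root = ET.parse(handle).getroot()
    # iter() also yields the root itself when its tag matches
    pathways = list(root.iter("pathway"))
    if len(pathways) != 1:
        raise ValueError(
            "expected exactly one pathway, found {}".format(len(pathways)))
    return pathways[0]
```

Passing an open file or an io.StringIO wrapping a KGML string both work, since ElementTree accepts any file-like object.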
KGML_vis.py mainly provides a KGMLCanvas object that is a Reportlab Canvas-based representation of the pathway map. The idea is to be as simple as possible for basic use, so that you instantiate a KGMLCanvas with a Pathway, provide some formatting options, and call the draw() method. Since we may want to write out KGML that maintains modifications we make to the pathway and its representation, all changes to the representation of the pathway are made through the Pathway object, directly. Those changes can then be saved by writing the KGML returned by the Pathway.get_kgml() method to a file.
KGML_scrape.py provides helper functions to grab KGML from the KEGG site in raw form, as a stream/handle, as a Pathway object, or to write it to a file. There are also a couple of handy lists of the metabolic and non-metabolic pathway IDs (as at January 2013).
Examples
The simplest useful operation is probably just downloading a given KEGG pathway map to a local KGML file. For this, you can use one of the utility functions for grabbing data from KEGG, found in KGML_scrape.py:
This two-liner grabs the ddc00190 pathway map, and writes it to ddc00190.kgml. From there we can treat it like any other KGML file in any pipeline we like.
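For comparison, the same download can be roughly approximated with only the standard library. This is a sketch and not the KGML_scrape function: kgml_url and save_kgml are hypothetical names, and the URL pattern is my assumption that KEGG's REST service serves KGML at /get/<pathway id>/kgml.

```python
import urllib.request

# Assumed URL layout for KEGG's REST service; not taken from KGML_scrape.py
KEGG_KGML_URL = "https://rest.kegg.jp/get/{pathway_id}/kgml"


def kgml_url(pathway_id):
    """Build the download URL for one pathway map, e.g. 'ddc00190'."""
    return KEGG_KGML_URL.format(pathway_id=pathway_id)


def save_kgml(pathway_id, filename=None):
    """Fetch the KGML for pathway_id and write it to filename (needs network access)."""
    filename = filename or "{}.kgml".format(pathway_id)
    with urllib.request.urlopen(kgml_url(pathway_id)) as response:
        data = response.read()
    with open(filename, "wb") as out:
        out.write(data)
    return filename
```

Calling save_kgml("ddc00190") would then write ddc00190.kgml to the current directory, provided the assumed endpoint is reachable.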
Alternatively, if we want to deal directly with KGML in our code, and don't want to write an intermediate file, we can use KGML_scrape's functions to obtain the KGML as a handle, a string, or a KGMLPathway object, as we can see from the IPython session:
which is convenient for interactive use.
To see the different forms of representation for one of the 'large' (ko01100, ko01110 and ko01120) maps, we can use this example code:
Here, the (near-)default rendering option is to show only the KGML entries with graphics elements. This renders at full-size, and mutes the colouring of any compounds that don't take part in any reaction for which there is a connecting ortholog.
|KGML element-only rendering of ko01100|
We can also render only the KEGG-drawn .png map, which I prefer for the formatting of the map elements that indicate where the other more specific KEGG pathway maps connect to this large metabolic overview.
|KEGG-drawn .png-only rendering of ko01100|
Finally, we render a hybrid, which retains the KEGG-drawn .png, but overlays the KGML information (which we can also modify).
|Hybrid KEGG-drawn .png with KGML element overlay for ko01100|
For the next example we look at a similar rendering for a non-metabolic pathway, for which we need the KEGG-drawn .png to make sense of the KGML. I'm going for some blatant self-promotion and using Biopython's ColorSpiral utility (more on that, here).
Just rendering the KGML elements shows exactly what is present, and can be modified:
|KGML-only rendering of ko03070|
|KEGG image map .png rendering of ko03070|
And overlaying our data takes advantage of this image for context, but lets us add our own information:
|Hybrid rendering of ko03070|
Now for something a little more complicated. Let's try to enhance the visibility of a set of pathways. The ko01100 pathway map should contain glycolysis and the TCA cycle, so we'll try to show the routes through these processes as thicker lines than usual:
which renders like this:
|ko01100 with selected elements thickened|
|ko01100 with unselected elements thinned|
|ko01100 with unselected elements rendered to grey|
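The general idea behind these three renderings, picking out a subset of entries and changing their graphics attributes, can be sketched directly on the KGML with ElementTree. This is an illustration under stated assumptions, not the code that produced the images above: scale_line_widths is a hypothetical helper, the set of selected IDs has to come from somewhere else (for example the entries of the glycolysis and TCA cycle maps), and it assumes the renderer takes line thickness from the width attribute of graphics elements with type="line".

```python
import xml.etree.ElementTree as ET


def scale_line_widths(pathway, selected_names, factor=5.0):
    """Scale the width of line graphics for entries naming any selected ID.

    pathway is a parsed <pathway> element and selected_names a set of IDs
    such as {"ko:K00844"}. Returns the number of graphics elements changed.
    """
    changed = 0
    for entry in pathway.iter("entry"):
        # An entry's name attribute is a space-separated list of IDs
        names = set(entry.get("name", "").split())
        if not names & selected_names:
            continue
        for graphics in entry.findall("graphics"):
            if graphics.get("type") == "line":
                width = float(graphics.get("width", "1"))
                graphics.set("width", str(width * factor))
                changed += 1
    return changed
```

Thinning the unselected elements instead is the same loop with the membership test inverted and a factor below 1, and greying them out would set the fgcolor attribute rather than width.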