Google Summer of Code 2009

"Google Summer of Code (GSoC) is a global program that offers student developers stipends to write code for various open source projects. Google will be working with several open source, free software, and technology-related groups to identify and fund several projects over a three month period. Historically, the program has brought together nearly 2,500 students with over 180 open source projects to create millions of lines of code." GSoC has several goals:

  • get more open source code created and released for the benefit of all
  • inspire young developers to begin participating in open source development
  • help open source projects identify and bring in new developers and committers
  • provide students the opportunity to do work related to their academic pursuits during the summer
  • give students more exposure to real-world software development scenarios

Find out more

Accepted Projects

GenMAPP GSoC 2009

We are once again pooling the efforts of our colleagues and collaborators for Google Summer of Code 2009. The GenMAPP organization will represent projects from Cytoscape, WikiPathways and PathVisio (see below). You'll notice that some of the projects are aimed at increasing cross-talk across these related projects. We like to get the most out of open source software development! This is a great opportunity to work at the intersection of biology and computing.

GenMAPP

GenMAPP is a pathway visualization and analysis tool for biological data. GenMAPP illustrates the relationships between various genes and proteins to help researchers understand their data in terms of connected biological pathways. Over 21,000 people from more than 70 countries have registered to download the GenMAPP program. The GenMAPP group is coordinated by the Conklin Lab at the Gladstone Institutes (University of California, San Francisco). There are 430 publications that reference GenMAPP or use GenMAPP to display data in the context of biological pathways. GenMAPP is 100% open source and all new development is in Java, MySQL, Derby, XML, and Web technologies such as wikis, in collaboration with the UCSF library, BiGCaT Bioinformatics, and the Cytoscape Consortium. Our development team is composed of individuals who are both biologists and programmers, providing a unique perspective on building and using open source tools. Links: Website Wiki

Cytoscape

Cytoscape is a general network visualization tool that integrates network topology with data about the network into the visualization. Cytoscape was developed in, and finds most use in, the Systems Biology community. With over 2,500 downloads per month, Cytoscape is rapidly becoming a standard within the community. Cytoscape consists of a core application and a plugin framework which users exploit to extend the functionality of the application in new ways. Our team consists of programmers and biologists from both academia and industry, including UC San Diego, UC San Francisco, U of Toronto, Agilent, Institute for Systems Biology, Unilever, Sloan-Kettering, Institut Pasteur, UT Health Science Center and others. Links: Website Wiki Javadoc WebStart

WikiPathways

WikiPathways is a wiki for biological pathways: it does for pathway archives what Wikipedia does for the encyclopedia. The wiki approach allows biologists with specific domain knowledge to easily create or update pathways. Pathways can be directly modified from a web browser using an embedded applet where you can draw genes, proteins and their interactions as in any popular drawing tool. The pathways can be used as images for publication and in data analysis tools such as GenMAPP, PathVisio and Cytoscape. There are currently about 600 pathways available across 9 different species, and 650 registered users. WikiPathways itself is completely open source and is built on top of MediaWiki, using PathVisio as the pathway editor. WikiPathways is developed and maintained by BiGCaT Bioinformatics (University of Maastricht) and the Conklin Lab at the Gladstone Institutes (University of California, San Francisco). Links: Website Source code

PathVisio

PathVisio is another pathway visualization tool. Like GenMAPP, it can display relations between genes, proteins and metabolites. PathVisio is focused more on pathway creation than on (microarray) data analysis. PathVisio started out as a test case for the development of GPML, an XML-based format for storing pathways. Currently, PathVisio plays an important role as the editor applet of WikiPathways. Links: Website API docs Webstart


How to apply

Application Due: April 3rd

We would like to know who you are and how you think. Incorporate the following into your application:

  • Your information
    • Name, email, and website (optional)
    • Brief background: education and relevant work experience
  • Your programming interests and strengths
    • What are your languages of choice?
    • Any prior experience with open source development?
    • What do you want to learn this summer?
  • Your interest and background in biology or bioinformatics
    • Any prior exposure to biology or bioinformatics?
    • Any interest in learning a bit of biology this summer?
  • Your ideas for a project (an original idea or one expanded from our Ideas Page)
    • Provide as much detail as possible
    • Strong applicants include an implementation plan and timeline (hint!)
    • Refer to and link to other projects or products that illustrate your ideas
    • Identify possible hurdles and questions that will require more research/planning
  • What can you bring to the team?
    • Are you enthusiastic?

If you are selected

  • You will be working with a small, active group of programmers who also speak biology
  • You will be gaining experience in a rapidly evolving field that interfaces computer and biological sciences
  • You might make more than you would mowing lawns!

Resources

Communication

  • Email: apico[AT]gladstone.ucsf.edu - contact me to find out more about a project or your potential mentor(s).
  • Discussion mailing lists: cytoscape wikipathways - ask about our projects; join the community!
  • Start your own blog!
  • GSoC Planet Blogs

For Students

For Mentors

Pages from prior years


Overview of Ideas

As we are prototyping new features and functions for GenMAPP, Cytoscape, WikiPathways and PathVisio we are exploring a number of areas ideal for Google Summer of Code students. These projects include a broad set of skills, technologies and domains, such as Java GUIs, database integration, algorithms and wikis. Of course, you are also encouraged to propose your own ideas related to our projects. If you have solid CS skills and have an interest in the biological domain (do you think genes are cool?), then you should apply!

IDEA 1: Original Idea

Feel free to propose your own idea. As long as it relates to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers, but make sure your proposal is also relevant.

IDEA 2: Defining cellular location in a pathway

Goal: Implement a feature in PathVisio that allows users to specify cellular location of a pathway entity.

In current WikiPathways pathways, a cellular location is usually illustrated as a rectangle or ellipse that defines the boundaries of the location, combined with a label that names it (see 'mitochondrion' in Apoptosis Pathway for an example). Visually, it is perfectly clear that the genes within that boundary are located in the corresponding cellular location. Computationally, however, this is hard to derive unless the location is stored for each gene within the boundaries. This information can be stored in GPML, but there is no user interface for doing so. A user interface would allow users to set the cellular location for each pathway object. It would also be cool to have some kind of cellular location drawing tool. One way this could work for the user: you draw a rectangle by dragging your mouse, all genes within that rectangle are highlighted, you release the mouse button, and a dialog pops up where you choose the cellular location. The end result would look the same as the current shape/label approach, but now the cellular location is automatically stored as a GPML attribute for all enclosed genes. An extra could be choosing the cellular location from an existing ontology, such as Gene Ontology, and easily changing the location's boundaries to include or exclude genes.
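
The drag-to-select step described above can be sketched in a few lines. This is only an illustration in Python, not PathVisio code: the `DataNode` class, the `CellularLocation` attribute key, and the sample gene positions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical stand-in for a pathway object (e.g. a gene) with a center point."""
    label: str
    x: float
    y: float
    attributes: dict = field(default_factory=dict)


def nodes_in_rectangle(nodes, x1, y1, x2, y2):
    """Return nodes whose center lies inside the dragged rectangle (any drag direction)."""
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    return [n for n in nodes if left <= n.x <= right and top <= n.y <= bottom]


def assign_location(nodes, rect, location):
    """Store the chosen cellular location on every enclosed node."""
    selected = nodes_in_rectangle(nodes, *rect)
    for n in selected:
        n.attributes["CellularLocation"] = location  # attribute key is an assumption
    return selected


# Illustrative positions only; the rectangle is dragged from bottom-right to top-left.
genes = [DataNode("BAX", 120, 80), DataNode("CASP9", 300, 200), DataNode("CYCS", 150, 110)]
tagged = assign_location(genes, (200, 150, 100, 50), "mitochondrion")
print([n.label for n in tagged])
```

A real implementation would probably test the full bounding box of each object rather than its center, and would need to handle ellipses as well as rectangles.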

Language and Skills: Java, web services

Idea by: Thomas Kelder

Potential Mentors: Thomas Kelder, Martijn Van Iersel, Alexander Pico, sign up here

IDEA 3: Ontologies for Wiki Pathways

Goal: Enable annotation of WikiPathways pathways with ontology terms.

Organizing and categorizing pathways helps users find relevant knowledge on a given topic. It can also help identify related pathways or duplicate information that can be merged. Especially on WikiPathways, where users are free to create context-specific pathways (e.g. focused on a single tissue, cell line or experiment), it is important to provide information about this context. Several biological ontologies exist that define a vocabulary of biological context that can be used for pathways. The goal of this project is to implement code that allows WikiPathways users to annotate a pathway with one or more ontology terms. To do this, you will need to use web services from ontology providers such as BioPortal to get the ontology information and present it to the user in a nice web interface. More information on this topic can be found here.

Language and Skills: PHP, JavaScript (Java should also be possible by using Google Web Toolkit), web services

Idea by: Alexander Pico

Potential Mentors: Thomas Kelder, Martijn Van Iersel, Alexander Pico, sign up here

IDEA 4: GPML-SBML Converter

Goal: Create a converter that can convert pathways between SBML and GPML.

SBML is a format for representing models of biochemical reaction networks. It is used by computational biologists to represent models for running numerical simulations on biological pathways. GPML is the format used by PathVisio and WikiPathways to store biological pathway diagrams.

To allow computational biologists to create a model based directly on a pathway from WikiPathways, it would be useful to be able to export to SBML. There is also a large number of models available in SBML that could be an excellent start for a WikiPathways pathway that can be used in genomics analyses. To make this possible, we need a tool that converts between SBML and GPML pathways.
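
As a rough illustration of one direction of such a conversion, the sketch below maps each node of a pathway diagram to a species in a model. It is written in Python for brevity and makes strong assumptions: the input is a simplified, namespace-free GPML-like fragment, the output is only an SBML-like skeleton that has not been validated against the SBML schema, and the single `default` compartment is invented for the example. Interactions, graphics, and cross-references are ignored.

```python
import xml.etree.ElementTree as ET

# Simplified GPML-like input (assumption: real GPML files carry a namespace,
# graphics, and cross-references that are left out here).
GPML_EXAMPLE = """
<Pathway Name="Example">
  <DataNode TextLabel="BAX" GraphId="n1" Type="GeneProduct"/>
  <DataNode TextLabel="ATP" GraphId="n2" Type="Metabolite"/>
</Pathway>
"""


def gpml_to_sbml(gpml_text):
    """Turn every DataNode into a species inside one placeholder compartment."""
    pathway = ET.fromstring(gpml_text.strip())
    sbml = ET.Element("sbml", {"level": "2", "version": "4"})
    model = ET.SubElement(sbml, "model", {"id": pathway.get("Name", "pathway")})
    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", {"id": "default"})
    species_list = ET.SubElement(model, "listOfSpecies")
    for node in pathway.iter("DataNode"):
        ET.SubElement(species_list, "species", {
            "id": node.get("GraphId"),
            "name": node.get("TextLabel"),
            "compartment": "default",
        })
    return sbml


sbml_root = gpml_to_sbml(GPML_EXAMPLE)
print(ET.tostring(sbml_root, encoding="unicode"))
```

The harder part of the project is likely the information that has no direct counterpart on the other side, such as layout coordinates in GPML or kinetic laws in SBML.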

Resources:

Language and Skills: Java, XML

Idea by: Thomas Kelder

Potential Mentors: Thomas Kelder, Martijn Van Iersel, Alexander Pico, sign up here

IDEA 5: Superpathways in Cytoscape

Goal: Create a Cytoscape plugin that merges multiple pathways into a single Cytoscape network

Biological pathways are highly interconnected. To find out how different processes influence each other, it can be useful to visualize the connectivity between pathways by merging them into a single superpathway. Cytoscape would be an ideal tool for such a visualization, since it has interactive layouts, making it possible to work with a large number of nodes. The goal of this project is to create a Cytoscape plugin that allows you to select multiple pathways from WikiPathways and load them into a single network. The plugin needs to merge nodes that represent the same biological entity (e.g. gene, protein or molecule). Existing code already provides some of the functionality needed to make this work:

By reusing as much existing code as possible, it should be possible to create a user friendly tool that allows researchers to combine different pathways on the fly. Features you could think of to make this work are a dialog to search and select pathways, a custom layout that groups nodes from the same pathway and a visualization that indicates the origin pathway of each node (e.g. by color-coding it).
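
The merging step can be sketched as follows, in Python rather than as a Cytoscape plugin. The data layout (a dictionary per pathway with local node ids mapped to a shared identifier) and the identifiers `G1` to `G3` are assumptions made for the example; deciding when two nodes really are the same biological entity is the hard part and is not addressed here.

```python
def merge_pathways(pathways):
    """Merge pathways into one network, keyed by shared entity identifier.

    pathways: dict of name -> {"nodes": {local_id: entity_id},
                               "edges": [(local_id, local_id), ...]}
    Returns (nodes, edges) where nodes maps entity_id -> set of origin pathways.
    """
    nodes = {}
    edges = set()
    for name, pathway in pathways.items():
        for entity_id in pathway["nodes"].values():
            nodes.setdefault(entity_id, set()).add(name)
        for src, dst in pathway["edges"]:
            # Rewrite local ids to shared ids so identical entities collapse.
            edges.add((pathway["nodes"][src], pathway["nodes"][dst]))
    return nodes, edges


pathways = {
    "A": {"nodes": {"a1": "G1", "a2": "G2"}, "edges": [("a1", "a2")]},
    "B": {"nodes": {"b1": "G2", "b2": "G3"}, "edges": [("b1", "b2")]},
}
nodes, edges = merge_pathways(pathways)
print(nodes["G2"])  # origin pathways of the shared node
```

The set of origin pathways kept per node is what a color-coding visualization could draw on.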

Language and Skills: Java

Idea by: Thomas Kelder

Potential Mentors: Thomas Kelder, Martijn Van Iersel, Alexander Pico, sign up here

IDEA 6: Links between pathways

Goal: Implement a mechanism to link between different pathways.

The boundaries of a biological pathway are loosely defined, often because of historical reasons or for viewing convenience. In reality, pathways are highly interconnected and form larger networks (for example, see the Boehringer map). In order to keep pathway diagrams human readable, they are often small (<100 nodes) and focused. To capture the connections between pathways, it would be cool to be able to hyperlink to another pathway, just like you can add hyperlinks between websites. This would allow a biologist to browse through different biological processes, just by clicking links on a pathway diagram. The challenge to this project is to create a user-friendly way to define and use these hyperlinks in both PathVisio and WikiPathways.

Language and Skills: Java

Idea by: Martijn Van Iersel

Potential Mentors: Thomas Kelder, Martijn Van Iersel, Alexander Pico, sign up here

IDEA 7: Scientific Karma site

Goal: Make it easier to reward scientists for contributions to wikis

The problem that scientists face when contributing to wikis such as WikiPathways, Wikigenes, OpenWetWare, etc. (and to a lesser extent Wikipedia itself) is that their contributions are not visible scientific output and are not taken into account when their job performance is reviewed.

A first step towards a solution would be a common website where contributions to different online communities can be scored. This is a proposal for a scientific "karma" site. It would do nothing more than the following:

  • Provide a site where scientists can register an OpenID account
  • For each scientist, display a list of wikis/communities that they have contributed to and a contribution score for each site
  • Because contribution scores are not meaningful by themselves, calculate the percentile as well

This proposal calls for the creation of a web service standard for automated querying of contribution scores from different websites. OpenID is chosen because it is a standard distributed user-authentication system. This makes it possible to tie user accounts from multiple websites together.
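
To make the percentile idea concrete, here is a minimal sketch in Python. It uses one common definition of percentile rank (the share of contributors with a score at or below the given one); the proposal does not say which definition is intended, and the scores are invented.

```python
def percentile_rank(score, all_scores):
    """Percentage of contributors whose score is less than or equal to `score`."""
    if not all_scores:
        raise ValueError("no scores to compare against")
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100.0 * at_or_below / len(all_scores)


# Invented contribution scores for one site.
site_scores = [3, 10, 10, 25, 40, 120, 500, 8, 1, 60]
print(percentile_rank(40, site_scores))  # 7 of 10 scores are <= 40
```

Because a percentile is relative to one community, it lets a score of 40 on a small wiki be compared with a score of 4,000 on a large one.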

Language and Skills: PHP, WSDL/SOAP

Idea by: Martijn Van Iersel

Potential Mentors: Thomas Kelder, Martijn Van Iersel, Alexander Pico, sign up here

IDEA 8: Phylogenetic Tree plugin

A phylogenetic tree can be modeled as a directed network with some constraints. Edge length should reflect the evolutionary distance between entities such as genes or species. Cytoscape's enhanced graphical abilities can be used to lay out a phylogenetic tree, zoom in on a region, assign colors to groups of nodes and edges, generate publication-quality images, and perform further studies on nodes of interest. Create a plugin that handles all aspects of phylogenetic tree manipulation within Cytoscape:

  • Creation of dendrograms
  • Reading dendrogram files
  • Rendering a network and displaying a relevant distance bar
  • Applying custom phylogenetic tree layouts
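
The idea of treating a tree as a directed network with edge lengths can be sketched as below. This is Python for illustration only: the nested-dictionary tree format, the node names, and the branch lengths are assumptions, and no real dendrogram file format is parsed.

```python
def tree_to_network(node, parent=None, edges=None):
    """Flatten a nested tree into directed (parent, child, length) edges, in pre-order."""
    if edges is None:
        edges = []
    if parent is not None:
        edges.append((parent["name"], node["name"], node["length"]))
    for child in node.get("children", []):
        tree_to_network(child, node, edges)
    return edges


def root_distances(edges, root):
    """Cumulative branch length from the root to every node.

    Relies on the pre-order edge list: a parent's distance is always known
    before its children are visited.
    """
    dist = {root: 0.0}
    for parent, child, length in edges:
        dist[child] = dist[parent] + length
    return dist


# Invented tree with invented branch lengths.
tree = {"name": "root", "children": [
    {"name": "anc1", "length": 0.5, "children": [
        {"name": "human", "length": 0.25},
        {"name": "chimp", "length": 0.25},
    ]},
    {"name": "mouse", "length": 1.0},
]}
edges = tree_to_network(tree)
dist = root_distances(edges, "root")
print(dist["human"], dist["mouse"])
```

A layout could place each node at a horizontal position proportional to its root distance, which is also what a distance bar would be scaled against.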

Language and Skills: Java

Idea by: Maital Ashkenazi

Potential Mentors: Maital Ashkenazi, Peng Liang Wang, sign up here

IDEA 9: From Enrichment Maps to Gene Networks

Goal: interface enrichment maps to gene networks.

Description: Gene-set enrichment is commonly used to summarize the results of genomic experiments. We have developed a Cytoscape plug-in, Enrichment Maps, to organize gene-set enrichment results into networks (click here for a sample); inter-dependent or redundant gene-sets are grouped together, dramatically improving their visualization and exploration. It would be really helpful to map selected gene-sets to interaction networks (e.g. physical interactions, co-expression, pathways) to help better interpret the results; this can be done by interfacing Enrichment Maps to existing databases, like Pathway Commons. Another interesting opportunity is to recode the enrichment map as a gene network, while preserving the original topology.

Language and Skills: Java (Cytoscape plug-in development); matches interests in information visualization, web services, database queries.

Idea by: Daniele Merico

Potential Mentors: Ruth Isserlin, Daniele Merico, Oliver Stueker, sign up here

IDEA 10: Cellular Overview of Pathways

I am thinking of creating a cellular overview of the pathways listed in WikiPathways for a given species. It could be used for querying and/or overlaying expression data. Similar views are available with BioCyc (Omics Viewers) and Reactome (SkyPainter). It could also be used for cross-species comparisons.

Language and Skills: PHP, XML

Idea by: Pankaj Jaiswal

Potential Mentors: Pankaj Jaiswal, Alexander Pico, sign up here

IDEA 11: Interactive Pathway Images

It would be nice to have interactive images on the pathway pages of WikiPathways. Currently, the images are static and you must enter 'edit' mode before gaining access to the linkouts and properties of the represented GPML. The pathway image itself should support mouseover and context menu content, including linkouts from the objects (e.g., genes, proteins, metabolites) to primary resources, literature references, etc. By using existing technologies, such as OpenLayers and Google Web Toolkit, it should be possible to create a dynamic, web-based pathway viewer in which the user can zoom in and out and click on proteins or metabolites to view additional annotations.

Language and Skills: PHP, XML, SVG

Idea by: Pankaj Jaiswal

Potential Mentors: Pankaj Jaiswal, sign up here

IDEA 12: GOLayout: Network partitioning and layout driven by GO ontology

We've already made a first pass at developing the GOLayout plugin for Cytoscape. Its basic function is to partition a large network hairball into several small subnetworks, each containing genes/proteins associated with a particular biological process. Within each subnetwork, genes/proteins are laid out by cellular compartment and color-coded by molecular function. The final layout includes graphical annotations for cellular compartments defined by a template file (GPML file). There is still a lot of work to be done to make this useful and really cool. Some ideas include:

  • edge routing: minimizing crossings and going around nodes, etc.
  • node placement: layouts for cellular compartment shapes (e.g., nuclear membranes = ovals)
  • expanding the GPML template library and developing a PathVisio plugin for annotating cellular compartment diagrams (see IDEA 2)

Language and Skills: Java

Idea by: Alexander Pico, Allan Kuchinsky

Potential Mentors: Alexander Pico, Allan Kuchinsky, Kristina Hanspers, sign up here

IDEA 13: Cytoscape 3.0 Editor

Cytoscape has an editor. We need to upgrade it for Cytoscape 3.0 and enhance it with new features:

  • user interface for configuring editor palettes

Language and Skills: Java, Swing

Idea by: Allan Kuchinsky

Potential Mentors: Allan Kuchinsky, Mike Smoot, sign up here

IDEA 14: Good looking nodes in Cytoscape

Apply specular highlight effects to Cytoscape nodes to help them pop out of the page:

  • lighting
  • depth
  • texture
  • shadow

Language and Skills: Java

Idea by: Allan Kuchinsky

Potential Mentors: sign up here

IDEA 15: Advanced Network Overviews and Navigation for Cytoscape

  • Venn diagram
  • Tag cloud

These views can be used to search/browse/navigate large networks. They could also be used as figures, providing an alternative representation of your data.

Language and Skills: Java

Idea by: Allan Kuchinsky

Potential Mentors: sign up here

IDEA 16: Virtual Cell with Cytoscape

There is a great deal of interest in exploring how Cytoscape and Virtual Cell could work together more closely. One possibility is to write a plugin for Cytoscape that would allow the user to augment the network with enough information to run a Virtual Cell simulation. A plugin could also use Cytoscape to load BioPAX data and convert that data to Virtual Cell format.

Language and Skills: Java, XML

Idea by: Scooter Morris

Potential Mentors: Scooter Morris, Peng Liang Wang, sign up here

IDEA 17: Visualize Node properties with NodeRenderers

Cytoscape 3.0's new pluggable rendering framework will allow plugin developers to easily write new NodeRenderers that visualize node properties in interesting ways. Your task would be to write a reusable NodeRenderer plugin that other plugins can re-use.

Some ideas:

  • render a small chart: pie-chart, sparkline, bar chart, histogram, radar chart
  • render a bitmap or an SVG image

(This could possibly be more than one, independent project)

Language and Skills: Java

Idea by: Daniel Abel

Potential Mentors: Daniel Abel, sign up here

IDEA 18: Other presentations

Cytoscape 3.0, with its modular rendering engine, will support pluggable presentations, i.e. replacing the graph widget used for drawing the network.

Some ideas:

  • animated widget that can show data according to some parameter, like time or an experimental condition (either with only the node and edge graphics changing or, for the more adventurous, with the network itself changing, i.e. animating the layout too)
  • non-graph layout, for example scatterplots (Here the existing widget could be used, with a new layout-calculating code and background graphics.)

Language and Skills: Java

Idea by: Daniel Abel

Potential Mentors: Daniel Abel, sign up here

IDEA 19: Utilize GPU computing in Cytoscape

Modern GPUs can be orders of magnitude faster than modern CPUs when running massively parallel programs. Recently, APIs for general-purpose computation on GPUs have become available (CUDA and OpenCL, for example). In Cytoscape, there are two possible applications of this technology:

  • certain graph layout algorithms, which use a physical, 'springs and weights' model
  • certain graph analysis algorithms (algorithms that compute numerical properties of the network)

The project would entail implementing the numerical computation code in CUDA (there are no implementations available yet for OpenCL) and writing a Cytoscape plugin that uses it. If needed, ELTE will be able to provide access to a GeForce 8600 GT GPU for testing (this means ssh access, i.e. you can log in and compile/run programs; we won't be able to lend you the card).
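
To show why a spring-based layout is a natural fit for massively parallel hardware, here is one iteration of a simple force-directed layout written as plain CPU Python. It is a generic textbook-style sketch, not Cytoscape's layout code and not CUDA; the force constants and function names are assumptions. The point is that every pairwise repulsion and every edge spring can be computed independently of the others, which is the part a GPU kernel would take over.

```python
import math


def spring_step(pos, edges, rest_length=1.0, k_spring=0.1, k_repel=0.05, step=1.0):
    """One iteration: pairwise repulsion plus Hooke's-law springs along edges."""
    forces = {n: [0.0, 0.0] for n in pos}
    names = list(pos)
    # Repulsion between every pair of nodes: O(n^2) independent computations.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            f = k_repel / (d * d)
            forces[a][0] += f * dx / d
            forces[a][1] += f * dy / d
            forces[b][0] -= f * dx / d
            forces[b][1] -= f * dy / d
    # Springs pull connected nodes toward the rest length (or push them apart).
    for a, b in edges:
        dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
        d = math.hypot(dx, dy) or 1e-9
        f = k_spring * (d - rest_length)
        forces[a][0] += f * dx / d
        forces[a][1] += f * dy / d
        forces[b][0] -= f * dx / d
        forces[b][1] -= f * dy / d
    return {n: (pos[n][0] + step * forces[n][0], pos[n][1] + step * forces[n][1])
            for n in pos}


start = {"a": (0.0, 0.0), "b": (3.0, 0.0)}
after = spring_step(start, [("a", "b")])
print(after)
```

With two connected nodes placed farther apart than the rest length, one step moves them closer together; a full layout repeats the step until the positions stop changing much.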

Language and Skills: CUDA

Idea by: Daniel Abel

Potential Mentors: Daniel Abel, sign up here

IDEA 20: Improve automated test coverage of Cytoscape

Automated test coverage (unit tests, regression tests, etc.) of Cytoscape is pretty far from complete. Currently, even a point release (2.6.2) requires extensive manual testing. Ideally, tests would be fully automatic, i.e. if a build passes all tests, it is released automatically.

Some ideas:

  • Check test coverage using http://c2.com/cgi/wiki?MutationTesting
  • write automated GUI unittest testcases
  • write benchmarks for testing speed, memory usage
  • write unittests for 3.0 bundles (I'm sure coverage will be sketchy at first)

Note: in all cases, the idea is not to develop a testing tool, but to use an existing one and write testcases (and patches that fix the bugs that are uncovered).

Language and Skills: Java

Idea by: Daniel Abel

Potential Mentors: Mike Smoot, Peng Liang Wang, sign up here

IDEA 21: Processing-Based Visualizer for Cytoscape 3

http://img.skitch.com/20090323-k8hftu52xstaiemcwqcpchbhq2.preview.jpg http://img.skitch.com/20090323-k4w1twdj3g356ngg2q67n67g4f.preview.jpg

Some data sets are hard to visualize in the current version of Cytoscape. For example:

  • Time series data
  • Multi-layer networks

These can be represented easily if we use 3D space and animation. Processing (http://processing.org/) is an easy-to-use programming environment for visualizing data. It can be used as a library for Cytoscape because it is a Java application. The goal of this project is to build a rendering engine for Cytoscape 3 based on Processing. Implementing a full-spec rendering engine is the ultimate goal, but we can divide the problem into smaller pieces:

  • Convert Processing to an OSGi bundle
  • Implement basic UI for 3D space navigation (pan/zoom/rotate)
  • Port 3D graph layout algorithms
  • Merge it into the Cytoscape 3 View framework
  • Time-series data animation
  • Implement more interactive UI (optional)

Example images are available here (these are based on Cytoscape 2.6.x + Processing 1.0): http://d.hatena.ne.jp/keiono/20081122/1227423548

Language and Skills: Java, Processing, OSGi, Spring, OpenGL

Idea by: Kei Ono

Potential Mentors: Kei Ono, sign up here

IDEA 22: Cytoscape - Taverna 2 Integration

Taverna 2 (http://www.mygrid.org.uk/tools/taverna/taverna-2-0/) is a workflow engine for bioinformatics. It would be useful to be able to use Cytoscape 3 as part of a Taverna workflow. In addition, if we can integrate Cytoscape into a Taverna workflow, we can use Taverna's functionality to communicate with other applications, including R (http://www.r-project.org/). The goal of this project is to develop a framework for using Cytoscape 3 in Taverna and to build some example workflows.

Language and Skills: Java, OSGi, Spring, Web Service basics (JAX-WS)

Idea by: Kei Ono

Potential Mentors: Kei Ono, sign up here

IDEA 23: Automatic (Smart) Node and Edge Label Layout

Cytoscape currently has a large number of layout algorithms that place nodes according to various criteria. However, a common problem with all of these layout algorithms is that node and edge labels are not accounted for in the aesthetic criteria of the algorithm. This means that labels frequently end up in awkward locations that are hard to read, overlap or obscure other labels, and otherwise don't look quite right. As a consequence, users are frequently forced to adjust the position of labels by hand, which is a time-consuming and tedious process. To fix this, we propose developing a layout algorithm for labels. Perhaps this algorithm could be integrated with a normal layout algorithm, or perhaps it could be a subsequent step that lays the labels out in an intelligent fashion once the nodes have been placed. Some work has been done on this, but more layouts are needed.
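
One simple way to approach the "subsequent step" variant is a greedy pass that tries a few candidate positions around each node and keeps the first one that does not overlap a label already placed. The Python sketch below illustrates that idea only; it is not the existing work mentioned above, the function names and the four-position scheme are assumptions, and it ignores edge labels and collisions with nodes.

```python
def overlaps(a, b):
    """Axis-aligned boxes as (x, y, w, h), with (x, y) the top-left corner."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def candidate_boxes(x, y, w, h, gap=5):
    """Label boxes to the right of, left of, above, and below the node at (x, y)."""
    return [
        (x + gap, y - h / 2, w, h),      # right
        (x - gap - w, y - h / 2, w, h),  # left
        (x - w / 2, y - gap - h, w, h),  # above
        (x - w / 2, y + gap, w, h),      # below
    ]


def place_labels(nodes):
    """nodes: list of (name, x, y, label_w, label_h). Returns name -> label box."""
    placed = {}
    for name, x, y, w, h in nodes:
        options = candidate_boxes(x, y, w, h)
        free = [box for box in options
                if not any(overlaps(box, other) for other in placed.values())]
        # Fall back to the default (right) position if every candidate collides.
        placed[name] = free[0] if free else options[0]
    return placed


placed = place_labels([("A", 0, 0, 20, 8), ("B", 12, 0, 20, 8)])
print(placed)
```

In this example the second label cannot go to the right or left of its node without colliding with the first, so it ends up above. A greedy pass depends on the order in which nodes are visited, which is one reason more sophisticated approaches may be worth exploring.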

Language and Skills: Java, graph layout algorithms

Idea by: Mike Smoot

Potential Mentors: sign up here

IDEA 24: Pythonic API for scripting and interactive use

Cytoscape supports several scripting languages using general 'access Java from Python/Ruby/etc.' methods. This allows one to access Cytoscape functionality from the given scripting language; however, it has a big drawback: since the API is not modified, one has to, for example, use a Java API from Python. This is unfortunate since it doesn't allow users to take full advantage of the given scripting language.

Your task would be to write a Pythonic wrapper API on top of the auto-translated 'Java API in Python', to wrap Cytoscape 3.0's API. This Pythonic API should be easy to use

  • interactively, from the Python console
  • in a batched manner (i.e. for writing simple plugins in Python)

Note that the problem of 'Java API in Foo' is general. I (Daniel Abel) might be able to mentor a Python project; others might mentor similar projects for other languages (Ruby, etc.)

Language and Skills: Python (maybe a bit of Java)
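
The difference between a 'Java API in Python' and a Pythonic API can be illustrated with a small facade. Everything below is hypothetical: `JavaStyleNetwork` merely imitates what an auto-translated Java object might look like, and none of its method names are taken from the Cytoscape API.

```python
class JavaStyleNetwork:
    """Stand-in for an auto-translated Java object; method names are hypothetical."""

    def __init__(self):
        self._nodes = []

    def addNode(self, name):
        self._nodes.append(name)

    def getNodeCount(self):
        return len(self._nodes)

    def getNodeList(self):
        return list(self._nodes)

    def containsNode(self, name):
        return name in self._nodes


class Network:
    """Pythonic facade: supports len(), iteration, `in`, and chained add()."""

    def __init__(self, java_network):
        self._jn = java_network

    def __len__(self):
        return self._jn.getNodeCount()

    def __iter__(self):
        return iter(self._jn.getNodeList())

    def __contains__(self, name):
        return self._jn.containsNode(name)

    def add(self, *names):
        for name in names:
            self._jn.addNode(name)
        return self


net = Network(JavaStyleNetwork()).add("a", "b")
print(len(net), "a" in net, list(net))
```

At the console, `len(net)` and `"a" in net` read more naturally than `net.getNodeCount()` and `net.containsNode("a")`, while the underlying object is left untouched.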

Idea by: Daniel Abel

Potential Mentors: Daniel Abel, sign up here

IDEA 25: Animated Networks

Add the ability to animate networks to Cytoscape. This animation could be limited to the visual properties, but (if possible) the spatial and content properties of the network should be considered (e.g. animate between two layouts, animate a pathway between two species to highlight new components or connections).
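
Animating between two layouts can, in its simplest form, be reduced to interpolating node positions. The Python sketch below uses plain linear interpolation; this is an assumption for illustration, the function names are invented, and nodes that appear or disappear between the two layouts are not handled.

```python
def interpolate_layout(start, end, t):
    """Positions at fraction t (0.0 = start layout, 1.0 = end layout)."""
    return {
        node: (start[node][0] + t * (end[node][0] - start[node][0]),
               start[node][1] + t * (end[node][1] - start[node][1]))
        for node in start
    }


def animation_frames(start, end, steps):
    """List of steps + 1 layouts, from start to end inclusive (steps must be >= 1)."""
    return [interpolate_layout(start, end, i / steps) for i in range(steps + 1)]


frames = animation_frames({"a": (0, 0)}, {"a": (10, 20)}, 4)
print(frames[2]["a"])  # halfway between the two layouts
```

Visual properties such as color or size could be interpolated in the same way; animating changes in network content would need a rule for fading nodes and edges in and out.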

Language and Skills: Java

Idea by: Scooter Morris

Potential Mentors: Scooter Morris, sign up here

IDEA 26: Integrated data mining in Cytoscape 3.0

As the size and complexity of available biological networks rapidly grow, their navigation and interrogation become more challenging. Cytoscape offers several data mining tools for locating nodes and edges of interest. Create a single Data Mining bundle that integrates these tools. Search can be based on the Apache Lucene search library. The Data Mining bundle will offer:

  • A graphical interface for searching on single or multiple attribute fields using Boolean logic
  • Quick queries for advanced users using a query language
  • Filtering out (hiding) nodes and edges that match a search
  • Search by topology, e.g. find all nodes connected by two edges or more.
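
The kinds of query listed above can be sketched with plain Python data structures. This does not use Lucene or any Cytoscape API; the node attributes, names, and thresholds are invented, and a Boolean query is represented simply as a Python predicate.

```python
def degree(edges):
    """Number of edges touching each node."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


def search(nodes, predicate):
    """Names of nodes whose attribute dict satisfies the predicate, sorted."""
    return sorted(name for name, attrs in nodes.items() if predicate(attrs))


nodes = {
    "A": {"type": "kinase", "expr": 3.1},
    "B": {"type": "kinase", "expr": 0.4},
    "C": {"type": "receptor", "expr": 2.8},
    "D": {"type": "kinase", "expr": 5.0},
}
edges = [("A", "B"), ("A", "C"), ("C", "D")]

# Boolean logic over multiple attribute fields.
by_attributes = search(nodes, lambda n: n["type"] == "kinase" and n["expr"] > 2.0)

# Search by topology: nodes connected by two edges or more.
deg = degree(edges)
by_topology = sorted(n for n in nodes if deg.get(n, 0) >= 2)

# Filtering: hide everything that matches both searches.
visible = sorted(set(nodes) - (set(by_attributes) & set(by_topology)))
print(by_attributes, by_topology, visible)
```

An index such as Lucene would replace the linear scan in `search` for large networks, but topology queries would still need access to the graph structure.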

Language and Skills: Java, Lucene, Swing

Idea by: Maital Ashkenazi

Potential Mentors: Maital Ashkenazi, Peng Liang Wang, sign up here

IDEA 27: Database-Backend for Cytoscape 3

Although memory modules are getting cheaper and cheaper, large-scale network data with tons of annotations is still intractable in memory. To handle such omics-level datasets, Cytoscape needs a mechanism for making a round trip between in-memory data and a database backend. This includes:

  • Searching and extraction of huge data sets in the database
  • Seamless synchronization of in-memory data and its database backend. This means that once a user edits some data in Cytoscape, the change should be propagated to the backend automatically.
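
One common pattern for this kind of synchronization is a write-through cache: every edit updates the in-memory copy and the database in the same call, and reads fall back to the database when the value is not in memory. The Python sketch below illustrates that pattern with the standard `sqlite3` module; the class name, table layout, and choice of SQLite are assumptions, not part of the proposal.

```python
import sqlite3


class NodeAttributeStore:
    """Write-through cache of node attributes backed by a SQLite table."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS attrs ("
            "node TEXT, key TEXT, value TEXT, PRIMARY KEY (node, key))"
        )
        self.cache = {}

    def set(self, node, key, value):
        """Update memory and propagate the edit to the backend immediately."""
        self.cache[(node, key)] = value
        self.db.execute("INSERT OR REPLACE INTO attrs VALUES (?, ?, ?)",
                        (node, key, value))
        self.db.commit()

    def get(self, node, key):
        """Serve from memory if possible, otherwise load from the backend."""
        if (node, key) not in self.cache:
            row = self.db.execute(
                "SELECT value FROM attrs WHERE node = ? AND key = ?",
                (node, key)).fetchone()
            if row is None:
                raise KeyError((node, key))
            self.cache[(node, key)] = row[0]
        return self.cache[(node, key)]

    def evict(self):
        """Drop the in-memory copy, e.g. when memory runs low."""
        self.cache.clear()


store = NodeAttributeStore()
store.set("n1", "label", "BAX")
store.evict()                    # data now lives only in the database
print(store.get("n1", "label"))  # transparently reloaded
```

Committing on every edit keeps the sketch simple; a real backend would likely batch writes and evict selectively rather than clearing the whole cache.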

Language and Skills: Java

Idea by: Kei Ono

Potential Mentors: Kei Ono, sign up here

Last modified on May 22, 2009 12:07:21 PM