Hatena::Groupbioruby

"aac".translate #=> "N" このページをアンテナに追加 RSSフィード

2007-04-11

GO Database on ActiveRecord plugin released.

|  GO Database on ActiveRecord plugin released. - "aac".translate #=> "N" を含むブックマーク はてなブックマーク -  GO Database on ActiveRecord plugin released. - "aac".translate #=> "N"  GO Database on ActiveRecord plugin released. - "aac".translate #=> "N" のブックマークコメント

I released GO Database on ActiveRecord, an active record models for the GO Database at the BioRuby Rails Plugins. The GO Database is the backend RDB for Amigo, the gene ontology broswer. The plugin works as an alternative SQL query interface for the GO Database.

screenshot screenshot screenshot screenshot

To install the go_database Rails plugin, on a rails application directory,

% script/plugin source svn://rubyforge.org/var/svn/bioruby-annex/rails/plugins/
% script/plugin list
% script/plugin install go_database

.

RDoc manual is available.

% cd vendor/plugins/go_database
% rake rdoc

After that, open rdoc/index.html.

トラックバック - http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/20070411

2006-06-21GO Database on ActiveRecord

New Term Model

|  New Term Model - "aac".translate #=> "N" を含むブックマーク はてなブックマーク -  New Term Model - "aac".translate #=> "N"  New Term Model - "aac".translate #=> "N" のブックマークコメント

screenshot


term = Term.find_by_name('peroxisome')

term.parents.first.class #=> Term
term.parents.map {|t| t.name } #=> ["microbody"]

term.children.first.class #=> Term
term.children.map {|t| t.name } 
#=> ["peroxisomal membrane", "peroxisome targeting signal ...

term.ancestors.first.class #=> Term
term.ancestors.map {|t| t.name }
#=> ["peroxisome", "microbody", "cytoplasm", ...

term.descendants.first.class #=> Term
term.descendants.map {|t| t.name }
#=> ["peroxisome", "peroxisomal membrane", "integral ...

screenshot

require 'rubygems'
require 'active_record'

ActiveRecord::Base.pluralize_table_names = false

class Dbxref < ActiveRecord::Base
  has_many :term, :through => :term_dbxref
  has_many :term_dbxref
end

class GraphPath < ActiveRecord::Base
  belongs_to :ancestor,  :foreign_key => 'term1_id', :class_name => 'Term'
  belongs_to :descendant,:foreign_key => 'term2_id', :class_name => 'Term'
  has_many :graph_path2term
end

class GraphPath2term < ActiveRecord::Base
  set_table_name 'graph_path2term'
  belongs_to :term
  belongs_to :graph_path
end

class Term < ActiveRecord::Base
  has_many :dbxref, :through => :term_dbxref
  has_many :term_dbxref

  has_many :term2term_as_parent, :foreign_key => 'term1_id', :class_name => 'Term2term'
  has_many :term2term_as_child,  :foreign_key => 'term2_id', :class_name => 'Term2term'
  has_many :parents,    :through => :term2term_as_child
  has_many :children,   :through => :term2term_as_parent

  has_many :graph_path2term

  has_many :graph_path_as_ancestor,   :foreign_key => 'term1_id', :class_name => 'GraphPath'
  has_many :graph_path_as_descendant, :foreign_key => 'term2_id', :class_name => 'GraphPath'
  has_many :ancestors,   :through => :graph_path_as_descendant
  has_many :descendants, :through => :graph_path_as_ancestor
end

class TermDbxref < ActiveRecord::Base
  belongs_to :term
  belongs_to :dbxref
end

class Term2term < ActiveRecord::Base
  set_table_name 'term2term'
  belongs_to :term,   :foreign_key => 'term1_id'
  belongs_to :term,   :foreign_key => 'term2_id'
  belongs_to :parent, :foreign_key => 'term1_id', :class_name => 'Term'
  belongs_to :child,  :foreign_key => 'term2_id', :class_name => 'Term'
end

Unit Test Code for Term Model: test_term.rb

|  Unit Test Code for Term Model: test_term.rb - "aac".translate #=> "N" を含むブックマーク はてなブックマーク -  Unit Test Code for Term Model: test_term.rb - "aac".translate #=> "N"  Unit Test Code for Term Model: test_term.rb - "aac".translate #=> "N" のブックマークコメント

screenshot


require 'logger'
ActiveRecord::Base.logger = Logger.new('debug.log')
ActiveRecord::Base.establish_connection(:adapter => 'mysql', 
                                        :host => 'localhost', 
                                        :username => 'root', 
                                        :database => 'go')
if __FILE__ == $0
require 'test/unit'

class TestTerm2term < Test::Unit::TestCase
  def test_term2term_find_1
    tt = Term2term.find(1)
    assert_equal(1, tt.id)
    assert_equal(2, tt.relationship_type_id)
    assert_equal(0, tt.complete)
    assert_equal(10, tt.term2_id)
    assert_equal(11, tt.term1_id)
  end
end

class TestTerm < Test::Unit::TestCase
  def test_term_find_1
    term = Term.find(1)
    assert_equal('all', term.name)
  end
  
  def test_term_find_by_name_peroxisome
    term = Term.find_by_name("peroxisome")
    assert_equal("peroxisome", term.name)
    assert_equal("GO:0005777", term.acc)
    assert_equal("cellular_component", term.term_type)
    assert_equal(0, term.is_obsolete)
    assert_equal(2973,  term.id)
    assert_equal(0, term.is_root)
  end

  def test_term_parents
    term = Term.find_by_name('peroxisome')
    assert_equal(["microbody"], term.parents.map {|x| x.name })
  end

  def test_term_children
    term = Term.find_by_name("peroxisome")
    ary = ["peroxisomal membrane", 
           "peroxisome targeting signal receptor complex", 
           "peroxisomal matrix", 
           "P1 peroxisome", 
           "P2 peroxisome", 
           "P3 peroxisome", 
           "P4 peroxisome", 
           "P5 peroxisome", 
           "P6 peroxisome", 
           "peroxisomal lumen", 
           "mannosome"]
    assert_equal(ary, term.children.map {|x| x.name })
  end
end

class TestTermFindGraphPath < Test::Unit::TestCase
  def setup
    @term = Term.find_by_name("peroxisome")
  end

  def test_find_ancestors
    a = @term.ancestors.map {|x| x.name }
    ary = ["peroxisome", 
           "microbody", 
           "cytoplasm", 
           "intracellular", 
           "cell", 
           "cellular_component", 
           "all", 
           "intracellular membrane-bound organelle", 
           "intracellular organelle",
           "intracellular", 
           "cell", 
           "cellular_component", 
           "all", "organelle", 
           "cellular_component", 
           "all", "membrane-bound organelle", 
           "organelle", 
           "cellular_component", 
           "all"]
    assert_equal(ary, a)
  end

  def test_find_every_descendants
    d = @term.descendants.map {|x| x.name }
    ary = ["peroxisome", 
           "peroxisomal membrane", 
           "integral to peroxisomal membrane", 
           "extrinsic to intraperoxisomal membrane", 
           "peroxisome targeting signal receptor complex",
           "peroxisomal matrix", "P1 peroxisome", 
           "P2 peroxisome", 
           "P3 peroxisome", 
           "P4 peroxisome", 
           "P5 peroxisome", 
           "P6 peroxisome",
           "intrinsic to peroxisomal membrane",
           "peroxisomal lumen", 
           "mannosome"]
    assert_equal(ary, d)
  end

  def test_true_path
    term2 = Term.find_by_name('P4 peroxisome')
    assert((@term.descendants.include?(term2) or term2.descendants.include?(@term)))
  end

  def test_find_shared_parent_without_distance
    go8045 = Term.find_by_acc('GO:0008045')
    go7474 = Term.find_by_acc('GO:0007474')

    a = go8045.ancestors
    b = go7474.ancestors
    cap = a & b
    ary = ["biological_process", "all", "development", "morphogenesis"]
    assert_equal(ary, cap.map {|x| x.name  })
  end
end

% ruby test_term.rb
Loaded suite test_term
Started
.........
Finished in 0.051068 seconds.

9 tests, 18 assertions, 0 failures, 0 errors

% rlwrap ruby -r test_term.rb
p Term.find(1)
#<Term:0x6b5514 @attributes={"name"=>"all", "acc"=>"all", 
"term_type"=>"universal", "is_obsolete"=>"0", "id"=>"1", "is_root"=>"1"}>
トラックバック - http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/20060621

2006-06-20GO Database on ActiveRecord

New GeneProduct Model

|  New GeneProduct Model - "aac".translate #=> "N" を含むブックマーク はてなブックマーク -  New GeneProduct Model - "aac".translate #=> "N"  New GeneProduct Model - "aac".translate #=> "N" のブックマークコメント

screenshotscreenshot

require 'rubygems'
require 'active_record'

ActiveRecord::Base.pluralize_table_names = false

class Association < ActiveRecord::Base
  belongs_to :term
  belongs_to :gene_product
  has_many :evidence
end

class Dbxref < ActiveRecord::Base
  has_many :seq, :through => :seq_dbxref
  has_many :seq_dbxref
  has_many :term, :through => :term_dbxref
  has_many :term_dbxref
  has_many :evidence, :through => :evidence_dbxref
  has_many :evidence_dbxref
end

class Evidence < ActiveRecord::Base
  belongs_to :association
  belongs_to :dbxref
end

class EvidenceDbxref < ActiveRecord::Base
  belongs_to :evidence
  belongs_to :dbxref
end

class GeneProduct < ActiveRecord::Base
  belongs_to :dbxref
  belongs_to :species
  has_many :seq, :through => :gene_product_seq
  has_many :gene_product_seq
  has_many :term, :through => :association
  has_many :association
end

class GeneProductSeq < ActiveRecord::Base
  belongs_to :seq
  belongs_to :gene_product
end

class Seq < ActiveRecord::Base
  has_many :gene_product, :through => :gene_product_seq
  has_many :gene_product_seq
  has_many :dbxref, :through => :seq_dbxref
  has_many :seq_dbxref  
end

class Species < ActiveRecord::Base
  has_one :gene_product
end

class Term < ActiveRecord::Base
  has_many :gene_product, :through => :association
  has_many :association
  has_many :dbxref, :through => :term_dbxref
  has_many :term_dbxref
end

class TermDbxref < ActiveRecord::Base
  belongs_to :term
  belongs_to :dbxref
end

Example codes for the new GeneProduct model

|  Example codes for the new GeneProduct model - "aac".translate #=> "N" を含むブックマーク はてなブックマーク -  Example codes for the new GeneProduct model - "aac".translate #=> "N"  Example codes for the new GeneProduct model - "aac".translate #=> "N" のブックマークコメント

screenshot

require 'pp'
require 'logger'
ActiveRecord::Base.logger = Logger.new("debug.log")

ActiveRecord::Base.establish_connection(:adapter => 'mysql', 
                                        :host => 'localhost', 
                                        :username => 'root', 
                                        :database => 'go')

pp GeneProduct.find(:first, :limit => 1)

# Don't exec GeneProduct.find(:first, :include => :seq)
pp GeneProduct.find(:first, :limit => 1, :include => :seq)

pp GeneProduct.find(:first, :limit => 1, :include => :term)

pp GeneProduct.find(:first, :limit => 1,
                    :include => :term, 
                    :conditions => ['term.name = ?', 'nucleus'])

pp GeneProduct.find(:first, :limit => 1,
                    :include => :dbxref,
                    :conditions => ['dbxref.xref_dbname = ?', 'MGI'])

pp GeneProduct.find(:first, :limit => 1,
                    :include => {:association => :evidence})

pp GeneProduct.find(:first, :limit => 1,
                    :include => {:association => :evidence},
                    :conditions => ['evidence.code = ?', 'TAS'])

pp GeneProduct.find(:first, :limit => 1,
                    :include => [{:association => :evidence}, :seq, :dbxref, :term],
                    :conditions => ['evidence.code = ? AND dbxref.xref_dbname = ? AND term.name = ?', 
                                    'TAS', 'MGI', 'nucleus' ])

pp GeneProduct.find(:all
                    :include => [{:association => :evidence}, :seq, :dbxref, :term],
                    :conditions => ['evidence.code != ? AND evidence.code != ? AND dbxref.xref_dbname = ? AND term.name = ?', 
                                    'IEA', 'ISS', 'FB', 'transmembrane receptor activity' ])

トラックバック - http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/20060620

2006-06-16GO Database on ActiveRecord

GO Database Example Queries #1

|  GO Database Example Queries #1 - "aac".translate #=> "N" を含むブックマーク はてなブックマーク -  GO Database Example Queries #1 - "aac".translate #=> "N"  GO Database Example Queries #1 - "aac".translate #=> "N" のブックマークコメント

screenshotscreenshot

require 'rubygems'
require 'active_record'
require 'active_support'

module Bio

class GODatabase

  # Bio::GODatabase.connection(:adapter => 'mysql', :host => 'localhost', 
  #                            :username => 'root', :password => '', :database => 'go')
  def self.connection(args)
    ActiveRecord::Base.establish_connection(args)
  end

  def self.con
    self.connection(:adapter => 'mysql', :host => 'localhost', 
                    :username => 'root', :password => '', :database => 'go')
  end


  # condition generator
  # set_conditoin("term", {:name => 'cell'})
  # set_conditoin("term", "GO:0005777")
  # set_conditoin("term", aBioGODatabaseTerm)
  def self.set_condition(left, right)
    case right
    when /^GO:/
      condition1 = "#{left}.acc = '#{right}'"
    when Hash
      condition1 = "#{left}.name = '#{right[:name]}'"
    when Bio::GODatabase::Term
      condition1 = "#{left}.acc = '#{right[:acc]}'"
    else
      raise ArgumentError
    end
  end


  # queries
  # Finding all the children of a term
  # 
  # Find the child terms of blood coagulation:
  #
  # SELECT child.*
  # FROM term AS parent,
  #      term2term,
  #      term AS child
  # WHERE
  #      parent.name = 'blood coagulation' AND
  #      parent.id = term2term.term1_id AND
  #      child.id  = term2term.term2_id
  #      
  #  Bio::GODatabase.find_all_children('GO:0005777')
  #  Bio::GODatabase.find_all_children(:name => 'peroxisome')
  #  Bio::GODatabase.find_all_children(Term.find(1))
  def self.find_all_children(term)
    sql = ['SELECT', 'child.*', 
           'FROM', ['term AS parent', 
                    'term2term',
                    'term AS child'].join(', '),
           'WHERE', [self.set_condition("parent", term), 
                     "parent.id = term2term.term1_id",
                     "child.id  = term2term.term2_id"].join(' AND '),
           ';'].join(' ')
    Term.find_by_sql(sql)
  end


  # Fetching every descendent of blood coagulation
  # 
  # The GO Graph is stored using the "term" (=node) and "term2term" (=arc) tables. 
  # Queries such as the above present a problem in that they are *recursive* - it 
  # is impossible to do them in a single query using the two tables above. 
  # Fortunately GO also includes a layer of *denormalization* which includes a 
  # table called "graph_path" (see GO Schema Documentation ), which stores all 
  # paths between a term and all its ancestors.
  #
  # SELECT
  #   rchild.* 
  # FROM 
  #   term AS rchild, term AS ancestor, graph_path
  # WHERE
  #   graph_path.term2_id = rchild.id and 
  #   graph_path.term1_id = ancestor.id and 
  #   ancestor.name = 'blood coagulation'
  #      
  # find_every_descendent('GO:0008045')
  # find_every_descendent(:name => 'blood cagulation')
  # find_every_descendent(Term.find(2345))
  def self.find_every_descendent(term)
    sql = ["SELECT", "rchild.* ",
           "FROM", ["term AS rchild",
                    "term AS ancestor",
                    "graph_path"].join(', '),
           "WHERE", ["graph_path.term2_id = rchild.id",
                     "graph_path.term1_id = ancestor.id",
                     self.set_condition("ancestor", term)].join(' AND '),
           ';'  ].join(' ')
    Term.find_by_sql(sql)
  end

  # Finding the ancestors of 'DNA Helicase activity'
  #
  # fetching ancestors is similar to finding descendants. try this for the 
  # ancestors of DNA Helicase:
  #
  # SELECT p.*
  # FROM
  #  graph_path 
  #   INNER JOIN
  #  term AS t ON (t.id = graph_path.term2_id)
  #   INNER JOIN
  #  term AS p ON (p.id = graph_path.term1_id)
  # WHERE t.name = 'DNA Helicase activity';
  # this is the reflexive closure so it includes DNA Helicase itself - to get 
  # the non-reflexive closure (ie to filter out the central term) add the 
  # constraint graph_path.distance > 0
  #
  # self.find_ancestors("GO:0008045")
  # self.find_ancestors(:name => 'DNA Helicase activity')
  # self.find_ancestors(Term.find(8045))
  def self.find_ancestors(term)
    sql = ["SELECT", "p.*",
           "FROM", "graph_path",
           "INNER JOIN", "term AS t ON (t.id = graph_path.term2_id)",
           "INNER JOIN", "term AS p ON (p.id = graph_path.term1_id)",
           "WHERE", self.set_condition("t", term), 
           ';'].join(' ')
    Term.find_by_sql(sql)
  end



  # Finding ancestor terms and their relationships
  #
  # The previous query showed how to find all the ancestor terms for a term; 
  # what if you want to find the ancestor terms, and the relationships for 
  # those terms? This is useful for getting a sub-graph from the whole DAG.
  #
  # SELECT p.*,r.*
  # FROM
  #  graph_path 
  #   INNER JOIN
  #  term AS t ON (t.id = graph_path.term2_id)
  #   INNER JOIN
  #  term AS p ON (p.id = graph_path.term1_id)
  #   INNER JOIN
  #  term2term AS r ON (r.term2_id = p.id)
  # WHERE t.name = 'DNA Helicase activity';
  #
  # This fetches all ancestor terms, and the relationship which have each 
  # ancestor term as a child
  #
  # Note that this will not fetch the root term, because the root term is not 
  # the child of any other term. It will fetch relationships which have the 
  # root as parent. If you also want to fetch the root term, change the last 
  # join to a LEFT OUTER JOIN
  def self.find_ancestor_terms_with_relationships(term)
    sql = ['SELECT', 'p.*',
           'FROM', 'graph_path',
           'INNER JOIN', 'term AS t ON (t.id = graph_path.term2_id)',
           'INNER JOIN', 'term AS p ON (p.id = graph_path.term1_id)',
           "WHERE",  self.set_condition("t", term), 
           ";"].join(' ')
    Term.find_by_sql(sql)
  end

  # Gene products annotated to two terms
  #
  # See also: TRANSITIVE CORRELATIONS
  #
  # Say we want to find the total number pf gene products that are BOTH GTP 
  # binding (GO:0005525) and immune response (GO:0006955)
  #
  # SELECT 
  #   count(DISTINCT a1.gene_product_id)
  # FROM 
  #   term AS t1
  #   INNER JOIN graph_path AS p1 ON (t1.id=p1.term1_id)
  #   INNER JOIN association AS a1 ON (a1.term_id=p1.term2_id)
  #   INNER JOIN term AS t2       ON (t2.id=p2.term1_id)
  #   INNER JOIN graph_path AS p2 ON (a2.term_id=p2.term2_id)
  #   INNER JOIN association AS a2 ON (a2.gene_product_id=a1.gene_product_id)
  # WHERE  t1.acc = 'GO:0005525'  AND  t2.acc = 'GO:0006955'
  #      
  # This query is not generalisable to n terms; fortunately there is another way 
  # of doing things (this version doesn't use the graph):
  #
  # SELECT gene_product.id, count(distinct term_id) AS c
  #   FROM       gene_product
  #   INNER JOIN association ON association.gene_product_id=gene_product.id
  #   INNER JOIN term        ON association.term_id=term.id
  #   WHERE term.name='immune response'
  #   OR    term.name='interleukin-4 receptor binding'
  #   GROUP BY gene_product.id
  #   HAVING c=2;
  #      
  def self.sql_count_gene_products_annotated_to_two_terms(term0, term1, counts = 2)
    ["SELECT", ["gene_product.*", 
                "count(distinct term_id) AS c"].join(', '),
     "FROM", "gene_product",
     "INNER JOIN association ON association.gene_product_id=gene_product.id",
     "INNER JOIN term        ON association.term_id=term.id",
     "WHERE", [self.set_condition("term", term0), # term.name = 'immune response',
               self.set_condition("term", term1), # term.name = 'interleukin-4 receptor binding',
      ].join(' OR '),
     "GROUP BY gene_product.id",
     "HAVING c=#{counts}", 
     ";"].join(' ')
  end

  def self.count_gene_products_annotated_to_two_terms(term0, term1, counts = 2)
    GeneProduct.find_by_sql(self.sql_count_gene_products_annotated_to_two_terms(term0, term1, counts = 2))
  end

  # Finding the shared parent of two nodes
  #
  # Imagine we want to find the minimal common parent of GO:0008045 (motor axon 
  # guidance) and GO:0007474 (wing vein specification).
  #
  # SELECT tp.*,
  #   p1.distance+p2.distance AS total_distance,
  #   p1.distance AS d1,
  #   p2.distance AS d2
  # FROM term AS t1
  #   INNER JOIN graph_path AS p1 ON (t1.id=p1.term2_id)
  #   INNER JOIN term AS tp       ON (p1.term1_id=tp.id)
  #   INNER JOIN graph_path AS p2 ON (tp.id=p2.term1_id)
  #   INNER JOIN term AS t2       ON (t2.id=p2.term2_id)
  # WHERE  t1.acc = 'GO:0008045'  AND  t2.acc = 'GO:0007474'
  # ORDER BY total_distance
  #      
  # The total distance is defined as the sum of the distance between the first 
  # term and the common parent and the second term and the common parent. Note 
  # that there may be multiple paths between a child and a parent.
  #
  def self.sql_find_shared_parent(term1, term2)
    ["SELECT DISTINCT", ["tp.*",
                         "p1.distance+p2.distance AS total_distance",
                         "p1.distance AS d1",
                         "p2.distance AS d2"].join(', '),
     "FROM", "term AS t1",
     "INNER JOIN graph_path AS p1 ON (t1.id=p1.term2_id)",
     "INNER JOIN term AS tp       ON (p1.term1_id=tp.id)",
     "INNER JOIN graph_path AS p2 ON (tp.id=p2.term1_id)",
     "INNER JOIN term AS t2       ON (t2.id=p2.term2_id)",
     "WHERE", [self.set_condition("t1", term1), # t1.acc = '#{term0}'  ,
               self.set_condition("t2", term2), #t2.acc = '#{term1}'",
      ].join(' AND '),
     "ORDER BY total_distance", 
     ";"].join(' ')
  end

  def self.find_shared_parent(term1, term2)
    sql = self.sql_find_shared_parent(term1, term2)
    Term.find_by_sql(sql)
  end




  # Finding the distance between two nodes in the graph
  #
  # Imagine we want to find the distance between GO:0006629 (lipid metabolism) 
  # and GO:0007165 (signal transduction)
  #
  # The graph_path table only stores the distance between nodes that have 
  # recursive parent/child relationships, not siblings (or cousins!). But we 
  # can still do this with a little cunning:
  #
  # SELECT 
  #   min(graph_path1.distance + graph_path2.distance) AS dist
  # FROM 
  #   graph_path AS graph_path1, graph_path AS graph_path2, 
  #   term AS t1, term AS t2
  # WHERE
  #   t1.acc = 'GO:0007165' and t2.acc = 'GO:0006629' and graph_path1.term2_id = t1.id
  #   and graph_path2.term2_id = t2.id and graph_path1.term1_id = graph_path2.term1_id
  #      
  def self.sql_find_distance(term1, term2)
    ["SELECT",  "min(graph_path1.distance + graph_path2.distance) AS dist",
     "FROM", ["graph_path AS graph_path1", 
              "graph_path AS graph_path2",
              "term AS t1", 
              "term AS t2"].join(', '),
     "WHERE", [self.set_condition('t1', term1), # "t1.acc = '#{term0}'",  
               self.set_condition('t2', term2), # "t2.acc = '#{term1}'",
               "graph_path1.term2_id = t1.id",
               # "t1.acc = 'GO:0007165' and t2.acc = 'GO:0006629' and graph_path1.term2_id = t1.id",`
               "graph_path2.term2_id = t2.id",
               "graph_path1.term1_id = graph_path2.term1_id"].join(' AND '),
     ';'].join(' ')
  end

  def self.find_distance(term1, term2)
    Term.find_by_sql(self.sql_find_distance(term1, term2))
  end


      
  # Finding every fly gene product that has been annotated as some kind of 
  # transmembrane receptor by non-sequence based methods
  #
  # SELECT DISTINCT 
  #   gene_product.*, 
  #   dbxref.xref_key AS acc, 
  #   dbxref.xref_dbname AS speciesdb 
  # FROM 
  #   gene_product, dbxref, association, graph_path, evidence, term
  # WHERE
  #   evidence.association_id = association.id and 
  #   code!='IEA' and code!='ISS' and 
  #   gene_product.dbxref_id = dbxref.id and 
  #   association.gene_product_id = gene_product.id 
  #   and graph_path.term2_id = association.term_id 
  #   and graph_path.term1_id = term.id and
  #   dbxref.xref_dbname='FB' and
  #   term.name = 'transmembrane receptor activity'
  #      
  # See also the ::GO::Model::GeneProduct class which represents instances of GeneProducts
  #
  # Bio::GODatabase.find_products(:acc => "GO:0005777", :evcodes => ['!IEA', '!ISS'])
  # Bio::GODatabase.find_products(:acc => "GO:0005777", :dbname => 'FB')
  # Bio::GODatabase.find_products(:acc => "GO:0005777", :dbname => ['FB', 'SGD'])
  def self.find_products(args)
    sql = self.sql_find_products(args)
    GeneProduct.find_by_sql(sql)
  end

  # sql = Bio::GODatabase.sql_find_products(:acc => "GO:0005777", :evcodes => ['!IEA', '!ISS'])
  # sql = Bio::GODatabase.sql_find_products(:acc => "GO:0005777", :dbname => 'FB')
  # sql = Bio::GODatabase.sql_find_products(:acc => "GO:0005777", :dbname => ['FB', 'SGD'])
  def self.sql_find_products(hash)
    excodes_condition = nil
    dbxref_condition = nil
    species_condition = nil
    term_condition = nil

    # evcodes => ["!IEA", "!ISS"] #=> "code != 'IEA' AND code != 'ISS'"
    # evcodes => ["TAS", "IDE"]   #=> "code IN ('TAS', 'IDE')"
    # evcodes => ["!IEA", "ISS"]  #=> ArgumentError
    if evcodes = hash[:evcodes]
      if evcodes.select {|x| x =~ /^\!/ }.size == evcodes.size
        evcodes_condition = evcodes.map {|code| 
          "code != '#{code.sub(/\!/, '')}'" }.join(" AND ")
      elsif evcodes.select {|x| x =~ /^\!/ }.size == 0
        evcodes_condition = [
          "code", "IN", "(",
          evcodes.map {|code| "'#{code}'" }.join(", "),
          ")"].join(' ')
      else
        raise ArgumentError
      end
    end

    # :taxid => 7227         #=> "species.ncbi_taxa_id IN ('7227')"
    # :taxid => [7227, 7228] #=> "species.ncbi_taxa_id IN ('7227', '7227')"
    if taxid = hash[:taxid] # ncbi_taxa_id, taxid => 7227
      taxids = [taxid].flatten
      species_condition = [
        "species.ncbi_taxa_id", "IN", "(",
        taxids.map {|taxid| "'#{taxid}'" }.join(", "),
        ")"].join(' ')
    end

    # :dbname => 'FB'          #=> "dbxref.xref_dbname IN ('FB')"
    # :dbname => ['FB', 'SGD'] #=> "dbxref.xref_dbname IN ('FB', 'SGD')"
    if dbname = hash[:dbname] # product_dbname,  speciesdb => 'FB'
      dbnames = [dbname].flatten
      dbxref_condition = [
        "dbxref.xref_dbname", "IN", "(",
        dbnames.map {|dbname| "'#{dbname}'" }.join(', '), 
        ")" ].join(' ')
    end

    # term_name || term_acc # term => "transmembrane recepor"
    if hash[:acc] || hash[:name]
      term =  hash[:acc] || {:name => hash[:name]}
      term_condition = self.set_condition("term", term)
    end

    ["SELECT DISTINCT", ["gene_product.*", 
                         "dbxref.xref_key AS acc",
                         "dbxref.xref_dbname AS speciesdb"].join(', '),
     "FROM", ["gene_product", 
              "dbxref", 
              "association", 
              "graph_path", 
              "evidence", "term"].join(', '),
     "WHERE", ["evidence.association_id = association.id",
               evcodes_condition,    # code!='IEA' and code!='ISS' 
               "gene_product.dbxref_id = dbxref.id",
               "association.gene_product_id = gene_product.id",
               "graph_path.term2_id = association.term_id",
               "graph_path.term1_id = term.id",
               dbxref_condition,     # dbxref.xref_dbname='FB' 
               species_condition,
               term_condition        #   term.name = 'transmembrane receptor activity'
      ].compact.join(' AND '),
     ';'
    ].join(' ')
  end



  # Fetching sequences for a GO term
  #
  # This finds sequences for gene products associated to inner membrane (GO:0019866) 
  # or any of its children:
  #
  # SELECT *
  # From gene_product
  #     Inner JOIN dbxref ON (gene_product.dbxref_id = dbxref.id)
  #     Inner JOIN association ON (gene_product.id = association.gene_product_id)
  #     Inner JOIN graph_path ON (association.term_id = graph_path.term2_id)
  #     Inner JOIN term ON (graph_path.term1_id = term.id)
  #     INNER JOIN term AS associated_term ON (graph_path.term2_id = associated_term.id)
  #     Inner JOIN gene_product_seq ON (gene_product.id = gene_product_seq.gene_product_id)
  #     Inner JOIN seq ON (seq.id = gene_product_seq.seq_id)
  # Where term.acc = 'GO:0019866'
  #      
  def self.sql_find_sequences_by_term(term)
    ["SELECT", "*",
     "From",  "gene_product",
     "Inner JOIN dbxref ON (gene_product.dbxref_id = dbxref.id)",
     "Inner JOIN association ON (gene_product.id = association.gene_product_id)",
     "Inner JOIN graph_path ON (association.term_id = graph_path.term2_id)",
     "Inner JOIN term ON (graph_path.term1_id = term.id)",
     "INNER JOIN term AS associated_term ON (graph_path.term2_id = associated_term.id)",
     "Inner JOIN gene_product_seq ON (gene_product.id = gene_product_seq.gene_product_id)",
     "Inner JOIN seq ON (seq.id = gene_product_seq.seq_id)",
     "WHERE",
     self.set_condition("term", term), # term.acc = 'GO:0019866'
     ';'
    ].join(" ")
  end

  # find_sequences_by_term("GO:0019866")
  # find_sequences_by_term(:name => "inner membrane")
  def self.find_sequences_by_term(term)
    sql = self.sql_find_sequences_by_term(term)
    GeneProduct.find_by_sql(sql)
  end

      
  # Finding correlations between terms
  #
  # In GO, gene products can be associated with multiple terms in multiple 
  # categories (funtion, process, cellular component). One would expect "protein 
  # biosynthesis" to be correlated with "ribosome" based on gene products in common.
  #
  # We can explore these correlations with SQL queries like the following. 
  # ("protein biosynthesis" is GO:0006412)
  #
  # SELECT
  #  t2.name, t2.acc, count(distinct a2.gene_product_id) AS gpc
  # FROM
  #  term t1,
  #  term t2,
  #  association a1,
  #  association a2
  # WHERE
  #  a1.term_id = t1.id AND
  #  a2.term_id = t2.id AND
  #  a1.gene_product_id = a2.gene_product_id AND
  #  t1.acc = 'GO:0006412' AND
  #  t1.id != t2.id
  # GROUP BY t2.name, t2.acc
  # ORDER BY gpc DESC
  # LIMIT 10
  # This gives the following:
  # +-----------------------------------------------------+------------+------+
  # | name                                                | acc        | gpc  |
  # +-----------------------------------------------------+------------+------+
  # | structural constituent of ribosome                  | GO:0003735 | 1925 |
  # | ribosome                                            | GO:0005840 |  506 |
  # | cytosolic large ribosomal subunit (sensu Eukaryota) | GO:0005842 |  312 |
  # | cytosolic small ribosomal subunit (sensu Eukaryota) | GO:0005843 |  296 |
  # | cytosolic large ribosomal subunit (sensu Bacteria)  | GO:0009282 |  262 |
  # | triplet codon-amino acid adaptor activity           | GO:0030533 |  217 |
  # | RNA binding                                         | GO:0003723 |  214 |
  # | cytosol                                             | GO:0005829 |  205 |
  # | mitochondrial large ribosomal subunit               | GO:0005762 |  180 |
  # | cytosolic small ribosomal subunit (sensu Bacteria)  | GO:0009283 |  158 |
  # +-----------------------------------------------------+------------+------+
  # 10 rows in set (3.77 sec)                                                             
  # (The final row shows the number of gene products that the two terms have in common)
  #
  # We have "discovered" that translation is associated with the ribosome. Woohoo! 
  # Not a startling breakthrough in itself, but the ability to do this query may 
  # prove useful in a variety of contexts eg verifying changes to the ontology or 
  # verifying automatic assignments.
  #
  def self.sql_find_correlated_terms(term)
    ["SELECT", ["t2.name", 
                "t2.acc", 
                "count(distinct a2.gene_product_id) AS gene_product_count"].join(', '),
     "FROM", ["term t1",
              "term t2",
              "association a1",
              "association a2"].join(', '),
     "WHERE", ["a1.term_id = t1.id",
               "a2.term_id = t2.id",
               "a1.gene_product_id = a2.gene_product_id",
               self.set_condition("t1", term), # "t1.acc = 'GO:0006412'" AND
               "t1.id != t2.id"].join(' AND '),
     "GROUP BY t2.name, t2.acc",
     "ORDER BY gene_product_count DESC", ";"].join(' ')
  end

  # Bio::GODatabase.find_correlated_terms("GO:0005777")
  # Bio::GODatabase.find_correlated_terms(:name => 'peroxisome')
  def self.find_correlated_terms(term)
    Bio::GODatabase::Term.find_by_sql(sql_find_correlated_terms(term))
  end

      
  # Transitive Correlations
  #
  # Above we saw direct correlations between two terms. remember GO is a set of 
  # terms in a subsumption architecture, eg if you use the above query to ask 
  # for correlations between "nucleic acid binding" and other terms, you will 
  # miss out on correlations involving "DNA binding". Thankfully we can use the 
  # graph_path table to help us.
  #
  # This example is for finding correlations between transcription factor 
  # (GO:0003700) and other terms:
  #
  # SELECT
  #  t2.name, t2.acc, count(distinct a2.gene_product_id) AS gpc
  # FROM
  #  term t1,
  #  term t2,
  #  graph_path p1,
  #  graph_path p2,
  #  association a1,
  #  association a2,
  #  evidence e1,
  #  evidence e2
  # WHERE
  #  t1.id = p1.term1_id AND
  #  t2.id = p2.term1_id AND
  #  a1.term_id = p1.term2_id AND
  #  a2.term_id = p2.term2_id AND
  #  e1.association_id = a1.id AND
  #  e2.association_id = a2.id AND
  #  a1.is_not = 0 AND
  #  a2.is_not = 0 AND
  #  e1.code != 'IEA' AND
  #  e2.code != 'IEA' AND
  #  a1.gene_product_id = a2.gene_product_id AND
  #  t1.acc = 'GO:0003700'  AND
  #  a1.term_id != a2.term_id
  # GROUP BY t2.name, t2.acc
  # ORDER BY gpc DESC
  # LIMIT 30 
  # (This filters out IEAs which could skew the results)
  #
  # sql_find_transitive_correlations(:acc => 'GO:0006412')
  # sql_find_transitive_correlations(:name => 'DNA helicase activity')
  # sql_find_transitive_correlations(:acc => 'GO:0006412', :term_type => 'cellular_component') 
  def self.sql_find_transitive_correlations(hash = nil) 
    if hash[:acc]
      term = {:acc => hash[:acc]}
    elsif hash[:name]
      term = {:name => hash[:name]}
    end

    if hash[:term_type]
      term_type_condition = "t2.term_type = '#{hash[:term_type]}'"
    else
      term_type_condition = nil
    end

    ["SELECT",
     "t2.*, count(distinct a2.gene_product_id) AS gene_product_count",
     "FROM", ["term t1",
              "term t2",
              "graph_path p1",
              "graph_path p2",
              "association a1",
              "association a2",
              "evidence e1",
              "evidence e2"].join(', '),
     "WHERE", ["t1.id = p1.term1_id",
               "t2.id = p2.term1_id", 
               term_type_condition,
               "a1.term_id = p1.term2_id",
               "a2.term_id = p2.term2_id",
               "e1.association_id = a1.id",
               "e2.association_id = a2.id",
               "a1.is_not = 0",
               "a2.is_not = 0",
               "e1.code != 'IEA'",
               "e2.code != 'IEA'",
               "a1.gene_product_id = a2.gene_product_id", 
               self.set_condition("t1", term), # "t1.acc = 'GO:0003700'",
               "a1.term_id != a2.term_id"].compact.join(' AND '),
     "GROUP", "BY", "t2.name, t2.acc",
     "ORDER", "BY", "gene_product_count", "DESC", ';'
      ].join(' ')
  end


  # Bio::GODatabase.find_transitive_correlations(:acc => 'GO:0006412')
  # Bio::GODatabase.find_transitive_correlations(:name => 'DNA helicase activity')
  # Bio::GODatabase.find_transitive_correlations(:acc => 'GO:0006412', 
  #                                              :term_type => 'cellular_component') 
  def self.find_transitive_correlations(term) 
    Term.find_by_sql(sql_find_transitive_correlations(term))
  end

     
  # Fetching most common interpro domains for a GO term
  #
  # Let's say we want to look at correlations between interpro domains and the 
  # GO term "transmembrane receptor". NOTE: This is independent of the 
  # interpro2go mapping (available from the term_dbxref table). For this you will 
  # need a go database with seqs, as we are looking for interpro accessions 
  # attached to sequence records correlated with terms. (these interpro accessions 
  # become associated with the sequence records via various protein motif searching 
  # programs run at the EBI).
  #
  # SELECT 
  #  term.name, term.acc, dbxref.xref_key, count(dbxref.xref_key) AS c
  # FROM
  #  term,
  #  association,
  #  graph_path,
  #  seq_dbxref,
  #  gene_product_seq AS gps,
  #  dbxref 
  # WHERE
  #  term.name = 'transmembrane receptor' AND
  #  dbxref.xref_dbname = 'interpro' AND
  #  dbxref.id = seq_dbxref.dbxref_id AND
  #  gps.seq_id = seq_dbxref.seq_id AND
  #  gps.gene_product_id = association.gene_product_id AND
  #  association.term_id = graph_path.term2_id AND
  #  association.is_not = 0 AND
  #  graph_path.term1_id = term.id
  # GROUP BY term.name, term.acc, dbxref.xref_key HAVING c > 1 ORDER BY c DESC
  #      
  #
  # sql_find_peptide_motif_by_term(:acc => 'GO:0006412', :motif_dbname => 'interpro')
  def self.sql_find_peptide_motif_by_term(hash)
    motif_condition = nil

    if term = hash[:acc] 
    elsif hash[:name]
      term = {:name => hash[:name]}
    else
      raise ArgumentError
    end

    if dbname = hash[:motif_dbname]
      motif_condition = "dbxref.xref_dbname = '#{dbname}'"
    else
      raise ArgumentError
    end

    ["SELECT", ["term.*",
                "dbxref.id AS dbxref_id",
                "dbxref.*", 
                "count(dbxref.xref_key) AS xref_count"].join(', '),
     "FROM", ["term",
              "association",
              "graph_path",
              "seq_dbxref",
              "gene_product_seq AS gps",
              "dbxref"].join(', '),
     "WHERE", [self.set_condition("term", term), #term.name = 'transmembrane receptor'
               motif_condition, #dbxref.xref_dbname = 'interpro'
               "dbxref.id = seq_dbxref.dbxref_id",
               "gps.seq_id = seq_dbxref.seq_id",
               "gps.gene_product_id = association.gene_product_id",
               "association.term_id = graph_path.term2_id",
               "association.is_not = 0",
               "graph_path.term1_id = term.id"].join(' AND '),
     "GROUP BY term.name, term.acc, dbxref.xref_key",
     "HAVING xref_count > 1", 
     "ORDER BY xref_count DESC", ";"].join(" ")
  end


  # Bio::GODatabase.find_peptide_motif_by_term(:acc => "GO:0005777", :motif_dbanem => 'interpro')
  def self.find_peptide_motif_by_term(hash)
    Term.find_by_sql(sql_find_peptide_motif_by_term(hash))
  end


      
  # Finding pairs of interpro domains associated with terms
  #
  # Let's say we want to know the most common pairing of domains associated 
  # with GPCRs (GO:0004390); we can modify the above query like this:
  #
  # SELECT 
  #  term.name, term.acc, x1.xref_key, x2.xref_key, count(x1.xref_key) AS c
  # FROM
  #  term,
  #  association,
  #  graph_path,
  #  seq_dbxref sx1,
  #  seq_dbxref sx2,
  #  gene_product_seq AS gps,
  #  dbxref x1,
  #  dbxref x2
  # WHERE
  #  term.acc = 'GO:0004930' AND
  #  x1.id < x2.id AND
  #  x1.xref_dbname = 'interpro' AND
  #  x2.xref_dbname = 'interpro' AND
  #  x1.id = sx1.dbxref_id AND
  #  x2.id = sx2.dbxref_id AND
  #  gps.seq_id = sx1.seq_id AND
  #  gps.seq_id = sx2.seq_id AND
  #  gps.gene_product_id = association.gene_product_id AND
  #  association.term_id = graph_path.term2_id AND
  #  association.is_not = 0 AND
  #  graph_path.term1_id = term.id
  # GROUP BY term.name, term.acc, x1.xref_key, x2.xref_key HAVING c > 1 ORDER BY c DESC
  # ;
  #      
  # This gives something like:
  #
  # +----------------------------+--------------+-----------+-----------+----+
  # | name                       | acc          | xref_key  | xref_key  | c  |
  # +----------------------------+--------------+-----------+-----------+----+
  # | G-protein coupled receptor | 'GO:0004930' | IPR003006 | IPR003597 | 34 |
  # | G-protein coupled receptor | 'GO:0004930' | IPR000832 | IPR001879 | 32 |
  # | G-protein coupled receptor | 'GO:0004930' | IPR001828 | IPR000337 | 19 |
  # | G-protein coupled receptor | 'GO:0004930' | IPR000276 | IPR001760 | 17 |
  # | G-protein coupled receptor | 'GO:0004930' | IPR003006 | IPR000353 | 14 |
  # | G-protein coupled receptor | 'GO:0004930' | IPR003597 | IPR000353 | 14 |
  # | G-protein coupled receptor | 'GO:0004930' | IPR003006 | IPR001039 | 13 |
  # | G-protein coupled receptor | 'GO:0004930' | IPR003597 | IPR001039 | 13 |
  # | G-protein coupled receptor | 'GO:0004930' | IPR002106 | IPR000276 | 12 |
  # | G-protein coupled receptor | 'GO:0004930' | IPR001611 | IPR000276 | 11 |
  #
  def self.sql_find_associated_motif_pairs_with_terms(hash)
    if term = hash[:acc] 
    elsif hash[:name] 
      term = {:name => hash[:name]}
    elsif hash[:term] === Bio::GODatabase::Term
      term = hash[:term]
    else 
      raise ArgumentError
    end
    if dbname = hash[:motif_dbname]
      dbname_conditions = ["x1", "x2"].map {|x| "#{x}.xref_dbname = '#{dbname}'" }.join(" AND ")
    else
      raise ArgumentError
    end
    term_condition = set_condition("term", term)

    ["SELECT", ["term.*",
                "x1.xref_key AS xref_key_1",
                "x2.xref_key AS xref_key_2",
                "count(x1.xref_key) AS xref_count"].join(', '),
     "FROM", ["term",
              "association",
              "graph_path",
              "seq_dbxref sx1",
              "seq_dbxref sx2",
              "gene_product_seq AS gps",
              "dbxref x1",
              "dbxref x2"].join(', '),
     "WHERE", [term_condition, #term.acc = 'GO:0004930'
               "x1.id < x2.id",
               dbname_conditions, #  x1.xref_dbname = 'interpro' AND
                                  #  x2.xref_dbname = 'interpro'
               "x1.id = sx1.dbxref_id",
               "x2.id = sx2.dbxref_id",
               "gps.seq_id = sx1.seq_id",
               "gps.seq_id = sx2.seq_id",
               "gps.gene_product_id = association.gene_product_id",
               "association.term_id = graph_path.term2_id",
               "association.is_not = 0",
               "graph_path.term1_id = term.id"].join(" AND "),
     "GROUP BY term.name, term.acc, x1.xref_key, x2.xref_key",
     "HAVING xref_count > 1 ORDER BY xref_count DESC",  
     ';'].join(' ')
  end

  # find_associated_motif_pairs_with_terms(:acc => 'GO:0004930')
  # find_associated_motif_pairs_with_terms(:name => 'DNA helicase activity')
  def self.find_associated_motif_pairs_with_terms(hash)
    Term.find_by_sql(self.sql_find_associated_motif_pairs_with_terms(hash))
  end


  # Batch Queries
  #
  # If you have a list of gene product identifiers that have been annotated with 
  # GO terms in the GO database, you can do a batch query for the GO terms 
  # associated with these:
  # 
  # SELECT term.*
  # FROM 
  #  dbxref
  #   INNER JOIN
  #  gene_product ON (dbxref.id = gene_product.dbxref_id)
  #   INNER JOIN
  #  association ON (association.gene_product_id=gene_product.id)
  #   INNER JOIN
  #  term ON (association.term_id = term.id)
  # WHERE      
  #  dbxref.xref_key IN 
  #  ('Q8BRY9', 'Q8BRX4', 'Q8BRZ4', ...)
  #      
  # If you would like to fetch the gene product dbxref with every term, change the 
  # SELECT clause to term.*,gene_product.*,dbxref.*
  #
end # class GODatabase
end # module Bio            

Modeling GO Database #1

|  Modeling GO Database #1 - "aac".translate #=> "N" を含むブックマーク はてなブックマーク -  Modeling GO Database #1 - "aac".translate #=> "N"  Modeling GO Database #1 - "aac".translate #=> "N" のブックマークコメント

screenshotscreenshotscreenshotscreenshot

module Bio
class GODatabase

require 'rubygems'
require 'active_record'
require 'active_support'

ActiveRecord::Base.pluralize_table_names = false

# +-----------------------+
# | Tables_in_go          |
# +-----------------------+
# | assoc_rel             |
# +----------------------+---------+------+-----+---------+----------------+
# | Field                | Type    | Null | Key | Default | Extra          |
# +----------------------+---------+------+-----+---------+----------------+
# | id                   | int(11) | NO   | PRI |         | auto_increment |
# | from_id              | int(11) | NO   | MUL | 0       |                |
# | to_id                | int(11) | NO   | MUL | 0       |                |
# | relationship_type_id | int(11) | NO   | MUL | 0       |                |
# +----------------------+---------+------+-----+---------+----------------+
class AssocRel < ActiveRecord::Base
end


# | association           |
# +-----------------+---------+------+-----+---------+----------------+
# | Field           | Type    | Null | Key | Default | Extra          |
# +-----------------+---------+------+-----+---------+----------------+
# | id              | int(11) | NO   | PRI |         | auto_increment |
# | term_id         | int(11) | NO   | MUL | 0       |                |
# | gene_product_id | int(11) | NO   | MUL | 0       |                |
# | is_not          | int(11) | YES  |     |         |                |
# | role_group      | int(11) | YES  |     |         |                |
# | assocdate       | int(11) | YES  |     |         |                |
# | source_db_id    | int(11) | YES  | MUL |         |                |
# +-----------------+---------+------+-----+---------+----------------+
class Association < ActiveRecord::Base
  belongs_to :term
  belongs_to :gene_product
  has_many :db, :through => :source_db_id
  has_many :evidence, :foreign_key => :association_id
end


# | association_qualifier |
# +----------------+--------------+------+-----+---------+----------------+
# | Field          | Type         | Null | Key | Default | Extra          |
# +----------------+--------------+------+-----+---------+----------------+
# | id             | int(11)      | NO   | PRI |         | auto_increment |
# | association_id | int(11)      | NO   | MUL | 0       |                |
# | term_id        | int(11)      | NO   | MUL | 0       |                |
# | value          | varchar(255) | YES  |     |         |                |
# +----------------+--------------+------+-----+---------+----------------+
class AssociationQualifier < ActiveRecord::Base
  belongs_to :association
  belongs_to :term
end



# | db                    |
# +-------------+--------------+------+-----+---------+----------------+
# | Field       | Type         | Null | Key | Default | Extra          |
# +-------------+--------------+------+-----+---------+----------------+
# | id          | int(11)      | NO   | PRI |         | auto_increment |
# | name        | varchar(55)  | YES  | UNI |         |                |
# | fullname    | varchar(255) | YES  | MUL |         |                |
# | datatype    | varchar(255) | YES  | MUL |         |                |
# | generic_url | varchar(255) | YES  |     |         |                |
# | url_syntax  | varchar(255) | YES  |     |         |                |
# +-------------+--------------+------+-----+---------+----------------+
class Db < ActiveRecord::Base
  belongs_to :association
end

# | dbxref                |
# +--------------+--------------+------+-----+---------+----------------+
# | Field        | Type         | Null | Key | Default | Extra          |
# +--------------+--------------+------+-----+---------+----------------+
# | id           | int(11)      | NO   | PRI |         | auto_increment |
# | xref_key     | varchar(255) | NO   | MUL |         |                |
# | xref_keytype | varchar(32)  | YES  |     |         |                |
# | xref_dbname  | varchar(55)  | NO   | MUL |         |                |
# | xref_desc    | varchar(255) | YES  |     |         |                |
# +--------------+--------------+------+-----+---------+----------------+
class Dbxref < ActiveRecord::Base
  has_and_belongs_to_many :seq, :join_table => :seq_dbxref
  has_and_belongs_to_many :term, :join_table => :term_dbxref
  has_and_belongs_to_many :evidence, :join_table => :evidence_dbxref
end


# | evidence              |
# +----------------+--------------+------+-----+---------+----------------+
# | Field          | Type         | Null | Key | Default | Extra          |
# +----------------+--------------+------+-----+---------+----------------+
# | id             | int(11)      | NO   | PRI |         | auto_increment |
# | code           | varchar(8)   | NO   | MUL |         |                |
# | association_id | int(11)      | NO   | MUL | 0       |                |
# | dbxref_id      | int(11)      | NO   | MUL | 0       |                |
# | seq_acc        | varchar(255) | YES  |     |         |                |
# +----------------+--------------+------+-----+---------+----------------+
class Evidence < ActiveRecord::Base
  belongs_to :association, :foreign_key => :association_id
  has_and_belongs_to_many :dbxref, :join_table => :evidence_dbxref
end


# | evidence_dbxref       |
# +-------------+---------+------+-----+---------+-------+
# | Field       | Type    | Null | Key | Default | Extra |
# +-------------+---------+------+-----+---------+-------+
# | evidence_id | int(11) | NO   | MUL | 0       |       |
# | dbxref_id   | int(11) | NO   | MUL | 0       |       |
# +-------------+---------+------+-----+---------+-------+
class EvidenceDbxref < ActiveRecord::Base
  belongs_to :evidence
  belongs_to :dbxref
end


# | gene_product          |
# +----------------------+--------------+------+-----+---------+----------------+
# | Field                | Type         | Null | Key | Default | Extra          |
# +----------------------+--------------+------+-----+---------+----------------+
# | id                   | int(11)      | NO   | PRI |         | auto_increment |
# | symbol               | varchar(128) | NO   | MUL |         |                |
# | dbxref_id            | int(11)      | NO   | UNI | 0       |                |
# | species_id           | int(11)      | YES  | MUL |         |                |
# | secondary_species_id | int(11)      | YES  |     |         |                |
# | type_id              | int(11)      | YES  | MUL |         |                |
# | full_name            | text         | YES  |     |         |                |
# +----------------------+--------------+------+-----+---------+----------------+
class GeneProduct < ActiveRecord::Base
#  has_one :gene_product_count, :dependent => true, :foreign_key => 
  has_many :gene_product_property
  has_many :gene_product_seq
  has_many :gene_product_synonym
  has_many :dbxref, :foreign_key => :id
  has_one :species, :foreign_key => :id
  has_many :association
  has_and_belongs_to_many :term, :join_table => :association
end


# | gene_product_count    |
# +---------------+-------------+------+-----+---------+-------+
# | Field         | Type        | Null | Key | Default | Extra |
# +---------------+-------------+------+-----+---------+-------+
# | term_id       | int(11)     | NO   | MUL | 0       |       |
# | code          | varchar(8)  | YES  | MUL |         |       |
# | speciesdbname | varchar(55) | NO   | MUL |         |       |
# | product_count | int(11)     | NO   |     | 0       |       |
# +---------------+-------------+------+-----+---------+-------+
class GeneProductCount < ActiveRecord::Base
  set_primary_key :term_id
  belongs_to :gene_product
end


# | gene_product_property |
# +-----------------+--------------+------+-----+---------+-------+
# | Field           | Type         | Null | Key | Default | Extra |
# +-----------------+--------------+------+-----+---------+-------+
# | gene_product_id | int(11)      | NO   | MUL | 0       |       |
# | property_key    | varchar(64)  | NO   | MUL |         |       |
# | property_val    | varchar(255) | YES  | MUL |         |       |
# +-----------------+--------------+------+-----+---------+-------+
class GeneProductProperty < ActiveRecord::Base
  set_primary_key :gene_product_id
  belongs_to :gene_product
end


# | gene_product_seq      |
# +-----------------+---------+------+-----+---------+-------+
# | Field           | Type    | Null | Key | Default | Extra |
# +-----------------+---------+------+-----+---------+-------+
# | gene_product_id | int(11) | NO   | MUL | 0       |       |
# | seq_id          | int(11) | NO   | MUL | 0       |       |
# | is_primary_seq  | int(11) | YES  |     |         |       |
# +-----------------+---------+------+-----+---------+-------+
class GeneProductSeq < ActiveRecord::Base
  set_primary_key :gene_product_id
  belongs_to :gene_product
  has_one :seq, :foreign_key => :id
end


# | gene_product_synonym  |
# +-----------------+--------------+------+-----+---------+-------+
# | Field           | Type         | Null | Key | Default | Extra |
# +-----------------+--------------+------+-----+---------+-------+
# | gene_product_id | int(11)      | NO   | PRI | 0       |       |
# | product_synonym | varchar(255) | NO   | PRI |         |       |
# +-----------------+--------------+------+-----+---------+-------+
class GeneProductSynonym< ActiveRecord::Base
  set_primary_key :gene_product_id
  belongs_to :gene_product
end


# | graph_path            |
# +----------+---------+------+-----+---------+----------------+
# | Field    | Type    | Null | Key | Default | Extra          |
# +----------+---------+------+-----+---------+----------------+
# | id       | int(11) | NO   | PRI |         | auto_increment |
# | term1_id | int(11) | NO   | MUL | 0       |                |
# | term2_id | int(11) | NO   | MUL | 0       |                |
# | distance | int(11) | NO   |     | 0       |                |
# +----------+---------+------+-----+---------+----------------+
class GraphPath < ActiveRecord::Base
end


# | graph_path2term       |
# +---------------+---------+------+-----+---------+-------+
# | Field         | Type    | Null | Key | Default | Extra |
# +---------------+---------+------+-----+---------+-------+
# | graph_path_id | int(11) | NO   | MUL | 0       |       |
# | term_id       | int(11) | NO   | MUL | 0       |       |
# | rank          | int(11) | NO   |     | 0       |       |
# +---------------+---------+------+-----+---------+-------+
class GraphPath2term < ActiveRecord::Base
#  belongs_to :term
end


# | instance_data         |
# +---------------+--------------+------+-----+---------+-------+
# | Field         | Type         | Null | Key | Default | Extra |
# +---------------+--------------+------+-----+---------+-------+
# | release_name  | varchar(255) | YES  | UNI |         |       |
# | release_type  | varchar(255) | YES  |     |         |       |
# | release_notes | text         | YES  |     |         |       |
# +---------------+--------------+------+-----+---------+-------+
class InstanceData < ActiveRecord::Base
end


# | seq                   |
# +-------------+--------------+------+-----+---------+----------------+
# | Field       | Type         | Null | Key | Default | Extra          |
# +-------------+--------------+------+-----+---------+----------------+
# | id          | int(11)      | NO   | PRI |         | auto_increment |
# | display_id  | varchar(64)  | YES  | MUL |         |                |
# | description | varchar(255) | YES  |     |         |                |
# | seq         | mediumtext   | YES  |     |         |                |
# | seq_len     | int(11)      | YES  |     |         |                |
# | md5checksum | varchar(32)  | YES  | MUL |         |                |
# | moltype     | varchar(25)  | YES  |     |         |                |
# | timestamp   | int(11)      | YES  |     |         |                |
# +-------------+--------------+------+-----+---------+----------------+
class Seq < ActiveRecord::Base
  has_many :seq_dbxref, :foreign_key => :seq_id
  has_many :seq_property
  belongs_to :gene_product_seq
  has_and_belongs_to_many :dbxref, :join_table => :seq_dbxref
end



# | seq_dbxref            |

# +-----------+---------+------+-----+---------+-------+
# | Field     | Type    | Null | Key | Default | Extra |
# +-----------+---------+------+-----+---------+-------+
# | seq_id    | int(11) | NO   | PRI | 0       |       |
# | dbxref_id | int(11) | NO   | PRI | 0       |       |
# +-----------+---------+------+-----+---------+-------+
class SeqDbxref < ActiveRecord::Base
  set_primary_key :seq_id
  belongs_to :seq
end



# | seq_property          |
# +--------------+--------------+------+-----+---------+----------------+
# | Field        | Type         | Null | Key | Default | Extra          |
# +--------------+--------------+------+-----+---------+----------------+
# | id           | int(11)      | NO   | PRI |         | auto_increment |
# | seq_id       | int(11)      | NO   | MUL | 0       |                |
# | property_key | varchar(64)  | NO   | MUL |         |                |
# | property_val | varchar(255) | NO   | MUL |         |                |
# +--------------+--------------+------+-----+---------+----------------+
class SeqProperty < ActiveRecord::Base
  set_primary_key 'id'
  belongs_to :seq
end

# | source_audit          |
# +--------------+--------------+------+-----+---------+-------+
# | Field        | Type         | Null | Key | Default | Extra |
# +--------------+--------------+------+-----+---------+-------+
# | source_path  | varchar(255) | YES  | MUL |         |       |
# | source_type  | varchar(255) | YES  |     |         |       |
# | source_md5   | varchar(32)  | YES  |     |         |       |
# | source_mtime | int(11)      | YES  |     |         |       |
# +--------------+--------------+------+-----+---------+-------+
class SeqProperty < ActiveRecord::Base
  set_primary_key 'id'
end


# | species               |
# +----------------+--------------+------+-----+---------+----------------+
# | Field          | Type         | Null | Key | Default | Extra          |
# +----------------+--------------+------+-----+---------+----------------+
# | id             | int(11)      | NO   | PRI |         | auto_increment |
# | ncbi_taxa_id   | int(11)      | YES  | UNI |         |                |
# | common_name    | varchar(255) | YES  | MUL |         |                |
# | lineage_string | text         | YES  |     |         |                |
# | genus          | varchar(55)  | YES  | MUL |         |                |
# | species        | varchar(255) | YES  | MUL |         |                |
# +----------------+--------------+------+-----+---------+----------------+
class Species < ActiveRecord::Base
  set_primary_key 'id'
  belongs_to :gene_product
end


# | term                  |
# +-------------+--------------+------+-----+---------+----------------+
# | Field       | Type         | Null | Key | Default | Extra          |
# +-------------+--------------+------+-----+---------+----------------+
# | id          | int(11)      | NO   | PRI |         | auto_increment |
# | name        | varchar(255) | NO   | MUL |         |                |
# | term_type   | varchar(55)  | NO   | MUL |         |                |
# | acc         | varchar(255) | NO   | UNI |         |                |
# | is_obsolete | int(11)      | NO   |     | 0       |                |
# | is_root     | int(11)      | NO   |     | 0       |                |
# +-------------+--------------+------+-----+---------+----------------+
class Term < ActiveRecord::Base
  set_primary_key 'id'
  has_and_belongs_to_many :gene_product, :join_table => :association
  has_and_belongs_to_many :dbxref, :join_table => :term_dbxref
  has_many :term_audit
  has_many :association
end


# | term2term             |
# +----------------------+---------+------+-----+---------+----------------+
# | Field                | Type    | Null | Key | Default | Extra          |
# +----------------------+---------+------+-----+---------+----------------+
# | id                   | int(11) | NO   | PRI |         | auto_increment |
# | relationship_type_id | int(11) | NO   | MUL | 0       |                |
# | term1_id             | int(11) | NO   | MUL | 0       |                |
# | term2_id             | int(11) | NO   | MUL | 0       |                |
# | complete             | int(11) | NO   |     | 0       |                |
# +----------------------+---------+------+-----+---------+----------------+
class Term2term < ActiveRecord::Base
  set_primary_key 'id'
end


# | term_audit            |
# +---------------+---------+------+-----+---------+-------+
# | Field         | Type    | Null | Key | Default | Extra |
# +---------------+---------+------+-----+---------+-------+
# | term_id       | int(11) | NO   | PRI | 0       |       |
# | term_loadtime | int(11) | YES  |     |         |       |
# +---------------+---------+------+-----+---------+-------+
class TermAudit < ActiveRecord::Base
  belongs_to :term
end


# | term_dbxref           |
# +-------------------+---------+------+-----+---------+-------+
# | Field             | Type    | Null | Key | Default | Extra |
# +-------------------+---------+------+-----+---------+-------+
# | term_id           | int(11) | NO   | PRI | 0       |       |
# | dbxref_id         | int(11) | NO   | PRI | 0       |       |
# | is_for_definition | int(11) | NO   | PRI | 0       |       |
# +-------------------+---------+------+-----+---------+-------+
class TermDbxref < ActiveRecord::Base
end



# | term_definition       |
# +-----------------+--------------+------+-----+---------+-------+
# | Field           | Type         | Null | Key | Default | Extra |
# +-----------------+--------------+------+-----+---------+-------+
# | term_id         | int(11)      | NO   | PRI | 0       |       |
# | term_definition | text         | NO   |     |         |       |
# | dbxref_id       | int(11)      | YES  | MUL |         |       |
# | term_comment    | mediumtext   | YES  |     |         |       |
# | reference       | varchar(255) | YES  |     |         |       |
# +-----------------+--------------+------+-----+---------+-------+
class TermDefinition < ActiveRecord::Base
end


# | term_synonym          |
# +-----------------+--------------+------+-----+---------+-------+
# | Field           | Type         | Null | Key | Default | Extra |
# +-----------------+--------------+------+-----+---------+-------+
# | term_id         | int(11)      | NO   | MUL | 0       |       |
# | term_synonym    | varchar(255) | YES  | MUL |         |       |
# | acc_synonym     | varchar(255) | YES  |     |         |       |
# | synonym_type_id | int(11)      | NO   | MUL | 0       |       |
# +-----------------+--------------+------+-----+---------+-------+
class TermSynonym < ActiveRecord::Base
end

end # class GODatabase
end # module Bio

Example codes for Bio::GODatabase.

Bio::GODatabase.connection(:adapter => 'mysql', :host => 'localhost', 
                           :username => 'root', :password => '', :database => 'go')

p Bio::GODatabase::Term.find(5777)
p Bio::GODatabase::Seq.find(1)

p Bio::GODatabase.find_all_children(:term => 'blood coagulation')
p Bio::GODatabase.find_all_children('GO:0007555')

p Bio::GODatabase.find_every_descendent(:term => 'blood coagulation')
p Bio::GODatabase.find_every_descendent('GO:0007555')

p Bio::GODatabase.find_ancestors(:term => 'DNA Helicase Activity')
p Bio::GODatabase.find_ancestors('GO:0007555')

p Bio::GODatabase.find_ancestor_terms_with_relationships(:term => 'DNA Helicase Activity')
p Bio::GODatabase.find_ancestor_terms_with_relationships('GO:0007555')

p Bio::GODatabase.find_ancestor_terms_with_relationships(7555)

p Bio::GODatabase.count_gene_products_annotated_to_two_terms('immune response', 'interleukin-4 receptor binding')

p Bio::GODatabase.find_shared_parent('GO:0008045', 'GO:0007474')

p Bio::GODatabase.find_distance('GO:0007165', 'GO:0006629')

p Bio::GODatabase.find_products(:term=>"transmembrane receptor", 
                                :speciesdb=>"FB", :evcodes=>["!IEA", "!ISS"])

# OR query using the NCBI taxon ID - this will give different results; the 
# former is FlyBase associations only; the latter is FlyBase Drosophila 
# Melanogaster associations PLUS (possibly redundant) associations for fly 
# proteins coming from other sources (eg GOA).
#
p Bio::GODatabase.find_products(:term=>"transmembrane receptor", 
                                :taxid=>7227, :evcodes=>["!IEA", "!ISS"]);
      
# See also the ::GO::Model::GeneProduct class which represents instances of GeneProducts
#
# You can use the stag-template go-geneproduct-by-term-and-evidence; here are some example below
#
# selectall_xml.pl -d go /go-geneproduct-by-term-and-evidence GO:0004888
p Bio::GODatabase.find_products('GO:0004888')

# selectall_xml.pl -d go /go-geneproduct-by-term-and-evidence term_name='transmembrane receptor activity'
p Bio::GODatabase.find_products(:term => 'transmembrane receptor activity')

# selectall_xml.pl -d go /go-geneproduct-by-term-and-evidence term_acc=GO:0004888 product_dbname=FB
p Bio::GODatabase.find_products('GO:0004888', :dbname => 'FB')

# selectall_xml.pl -d go /go-geneproduct-by-term-and-evidence term_acc=GO:0004888 ncbi_taxa_id=7227
p Bio::GODatabase.find_products('GO:0004888', :ncbi_taxa_id => 7227)

# selectall_xml.pl -d go /go-geneproduct-by-term-and-evidence -where\
#    "term.acc='GO:0004888' AND species.ncbi_taxoa_id=7227 AND evidence.code not in ('IEA','ISS')"
p Bio::GODatabase.find_products(:where => "term.acc='GO:0004888' AND species.ncbi_taxoa_id=7227 AND evidence.code not in ('IEA','ISS')")

Makefile for GO Database installation.

|  Makefile for GO Database installation. - "aac".translate #=> "N" を含むブックマーク はてなブックマーク -  Makefile for GO Database installation. - "aac".translate #=> "N"  Makefile for GO Database installation. - "aac".translate #=> "N" のブックマークコメント

  1. Mac OS X 1.4.6
  2. mysql5 (darwinport)
LATEST  = http://archive.godatabase.org/latest-full
REL     = 200605
ZCAT    = zcat
MYSQL   = mysql5 -u root -p
MYGODB  = go

download        : 
        curl -O $(LATEST)/go_$(REL)-seqdb-data.gz; 

install : install-data

install-data    : 
        echo "create database $(MYGODB);" | $(MYSQL)
        $(ZCAT) go_$(REL)-seqdb-data.gz | $(MYSQL) $(MYGODB);
 % make download
 % make install
トラックバック - http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/20060616