Hatena::Group bioruby

"aac".translate #=> "N"

2007-04-11

GO Database on ActiveRecord plugin released.

I released GO Database on ActiveRecord, a set of ActiveRecord models for the GO Database, at BioRuby Rails Plugins. The GO Database is the backend RDB for AmiGO, the Gene Ontology browser. The plugin works as an alternative to raw SQL for querying the GO Database.

To install the go_database Rails plugin, run the following in a Rails application directory:

% script/plugin source svn://rubyforge.org/var/svn/bioruby-annex/rails/plugins/
% script/plugin list
% script/plugin install go_database

An RDoc manual is available.

% cd vendor/plugins/go_database
% rake rdoc

After that, open rdoc/index.html.

Trackback - http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/20070411

2007-04-10

UniProt on ActiveRecord plugin released

I released UniProt on ActiveRecord, a Rails plugin. The plugin is distributed via the subversion repository at BioRuby Rails Plugins (bioruby-annex).

Add the bioruby-annex subversion repository to the plugin sources.

% script/plugin source svn://rubyforge.org/var/svn/bioruby-annex/rails/plugins/
% script/plugin list | egrep uniprot
uniprot           svn://rubyforge.org/var/svn/bioruby-annex/rails/plugins/uniprot/

Install the uniprot plugin from the bioruby-annex subversion repository.

% script/plugin install uniprot

Generate uniprot files.

% script/generate uniprot 
% script/generate uniprot add_uniprot_models

Build RDoc.

% cd vendor/plugins/uniprot
% rake rdoc

See the generated RDoc for more details on the models and schema.

Trackback - http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/20070410

2007-04-08

Update UniProt release 10 (6 March 2007) on ActiveRecord

UniProt release 10 was released on 6 March 2007. To bring UniProt on ActiveRecord up to date, some code needs to be updated.

UniProt changed the ID line format in the previous release (UniProtKB release 9.0 of 31-Oct-2006). The CVS HEAD version of bioruby supports the latest UniProt format.

Import the latest UniProt/SwissProt into my MySQL database via ActiveRecord.

Download the latest version of uniprot_sprot.dat.

% curl -O ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz

Run the import rake task.

% rake uniprot:import
...
rake uniprot:import  34054.50s user 1845.74s system 55% cpu 18:07:24.06 total

That took far too long. :(

I believe this wasted time could be reduced by using ActiveRecord's transaction method.
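A minimal sketch of that idea, assuming the import saves one row at a time and the storage engine supports transactions: group the rows into batches and commit once per batch. The names `import_in_batches` and `FakeStore` are hypothetical and only there so the sketch runs without Rails; in the real rake task the callable could be something like `lambda { |&blk| Entry.transaction(&blk) }`.

```ruby
# Run the given block for every record, opening one transaction per batch
# instead of one per record. `transaction` is any callable that executes
# its block inside a single database transaction (an assumption here).
def import_in_batches(records, batch_size, transaction)
  batches = 0
  records.each_slice(batch_size) do |batch|
    transaction.call do
      batch.each { |record| yield record }
    end
    batches += 1
  end
  batches
end

# Hypothetical stand-in for a database: it only counts rows and commits.
class FakeStore
  attr_reader :rows, :commits

  def initialize
    @rows = []
    @commits = 0
  end

  # Mimics the shape of a transaction method: run the block, then commit.
  def transaction
    yield
    @commits += 1
  end

  def insert(row)
    @rows << row
  end
end

store = FakeStore.new
batches = import_in_batches((1..2500).to_a, 1000, store.method(:transaction)) do |row|
  store.insert(row)
end
puts "#{store.rows.size} rows in #{store.commits} commits"
```

The batch size of 1000 is arbitrary. How much time this saves, if any, would have to be measured against the real import.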

Using script/console in the uniprot Rails directory:

>> Entry.count
=> 263525
>> Os.find_by_name("Homo sapiens").entries.size
=> 16053

It seems to work fine.
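For a rough sense of the import speed, the entry count can be combined with the timing line printed by rake uniprot:import. This is only a back-of-the-envelope calculation, and it assumes that all 263525 entries were loaded during that single run.

```ruby
# Figures copied from the rake timing line and the console session.
hours, minutes, seconds = 18, 7, 24.06
user_cpu = 34054.50
system_cpu = 1845.74
entries = 263_525 # Entry.count; assumed to be the result of this one run

elapsed = hours * 3600 + minutes * 60 + seconds   # wall-clock seconds
per_entry = elapsed / entries                      # seconds per entry
cpu_share = (user_cpu + system_cpu) / elapsed      # fraction of time on CPU

puts format("elapsed:   %.0f s", elapsed)
puts format("per entry: %.2f s", per_entry)
puts format("cpu share: %.0f%%", cpu_share * 100)
```

That works out to roughly a quarter of a second per entry, with the CPU busy a little over half of the time.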

Dump the UniProt data from the MySQL database.

% rake uniprot:dump
mysqldump -uroot uniprot_development > db/uniprot_development.dump
% du -sh db/uniprot_development.dump
904M    uniprot_development.dump
% gzip -9 db/uniprot_development.dump
% du -sh db/uniprot_development.dump.gz
179M    uniprot_development.dump.gz

bioruby-annex launched.

I launched bioruby-annex, an open repository at RubyForge.org for biological applications and Rails plugins built with bioruby. The repository uses subversion to manage source code so that users can use the "script/plugin install a-svn-url" mechanism.

The intended install command is:

% script/plugin install svn://rubyforge.org/var/svn/bioruby-annex/rails/plugins/${plugin_name}/trunk

Trackback - http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/20070408

2007-04-06 BioSQL on ActiveRecord

Bio::BioSQL access module by ActiveRecord

On the BioRuby mailing list, RJP released a new implementation of the BioSQL class powered by ActiveRecord.

Preparation of the BioSQL database

Download the BioSQL schema.

Install the BioSQL schema.
% mysqladmin -u root -p create biosql_tx_test
% mysqladmin -u root -p create biosql_sp_test
% mysql -u root -p biosql_tx_test < sql/biosqldb-mysql.sql
% mysql -u root -p biosql_sp_test < sql/biosqldb-mysql.sql

I prepared two MySQL databases: biosql_tx_test for NCBI Taxonomy data and biosql_sp_test for SwissProt data.


Install NCBI Taxonomy data

In the biosql-schema directory,

% mkdir taxdata
% cd taxdata
% lftp ftp.ncbi.nlm.nih.gov/pub/taxonomy
cd ok, cwd=/pub/taxonomy                                                
lftp ftp.ncbi.nlm.nih.gov:/pub/taxonomy> get taxdump.tar.gz
lftp ftp.ncbi.nlm.nih.gov:/pub/taxonomy> exit
% tar zxvf taxdump.tar.gz
% ls
citations.dmp  delnodes.dmp  division.dmp  gc.prt  gencode.dmp  
merged.dmp  names.dmp  nodes.dmp  readme.txt  taxdump.tar.gz

Then invoke the load_ncbi_taxonomy.pl script.

% cd ..
% perl scripts/load_ncbi_taxonomy.pl --dbname biosql_tx_test --dbuser root

Accessing NCBI Taxonomy data using ARBioSQL

Connecting to the database.

% irb -r bio -r arbiosql.rb
irb(main):001:0> con = Bio::BioSQL.new('mysql', 'biosql_tx_test', 'root')
irb(main):002:0> Bio::BioSQL::Taxon.find(1)
=> #<Bio::BioSQL::Taxon:0x2933648 @attributes={"genetic_code"=>"1", "node_rank"=>"no rank", "right_value"=>"712768", "left_value"=>"1", "taxon_id"=>"1", "ncbi_taxon_id"=>"1", "mito_genetic_code"=>"0", "parent_taxon_id"=>"1"}>

Bio::BioSQL provides a method for establishing a connection to a BioSQL database (here with the 'mysql' adapter) and ActiveRecord model classes (e.g. Bio::BioSQL::Taxon).

How many Taxon entries?

irb(main):003:0> Bio::BioSQL::Taxon.count
=> 356384

What is the human Taxonomy ID?

irb(main):019:0> Bio::BioSQL::TaxonName.find_by_name("human")
=> #<Bio::BioSQL::TaxonName:0x29449c0 @attributes={"name"=>"human", "name_class"=>"genbank common name", "taxon_id"=>"9606"}>
irb(main):020:0> Bio::BioSQL::TaxonName.find_by_name("human").taxon.id
=> 9606
irb(main):021:0> Bio::BioSQL::TaxonName.find_by_name("human").taxon_id
=> 9606

TaxonName belongs_to Taxon.

Although Bio::BioSQL::Taxon has a "parent_taxon_id" field, the Bio::BioSQL::Taxon#parent_taxon method (#=> Bio::BioSQL::Taxon) is not implemented.

The definition of the Bio::BioSQL::Taxon class:

class Taxon < ActiveRecord::Base
  set_table_name "taxon"
  set_primary_key "taxon_id"
  set_sequence_name "taxon_pk_seq"
  has_many :taxon_name #probably has_one
  has_one :bioentry
end

Either of the following snippets lets Taxon#parent retrieve the parent Taxon:

  belongs_to :parent, :foreign_key => 'parent_taxon_id', :class_name => 'Taxon'

or

  def parent
    self.class.find(self.parent_taxon_id)
  end

Both versions give the same result here.

irb(main):002:0> tax = Bio::BioSQL::Taxon.find(10)
=> #<Bio::BioSQL::Taxon:0x2949d30 @attributes={"genetic_code"=>"11", "node_rank"=>"genus", "right_value"=>"66309", "left_value"=>"66272", "taxon_id"=>"10", "ncbi_taxon_id"=>"10", "mito_genetic_code"=>"0", "parent_taxon_id"=>"135621"}>
irb(main):003:0> tax.parent
=> #<Bio::BioSQL::Taxon:0x2947cb0 @attributes={"genetic_code"=>"11", "node_rank"=>"family", "right_value"=>"73406", "left_value"=>"66271", "taxon_id"=>"135621", "ncbi_taxon_id"=>"135621", "mito_genetic_code"=>"0", "parent_taxon_id"=>"72274"}>
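Once a parent method exists, the full lineage of a taxon can be collected by following parent until the root is reached. The sketch below is not part of ARBioSQL: it uses a plain Ruby stand-in class so that it runs without a database, and the `lineage` method is a hypothetical addition. It relies on the root row being its own parent, as the Taxon.find(1) output suggests (taxon_id and parent_taxon_id are both 1). The parent of taxon 72274 is not shown in the session, so the value used here is made up.

```ruby
# Plain Ruby stand-in for Bio::BioSQL::Taxon; only the two id fields
# needed for walking the tree are modelled.
class Taxon
  attr_reader :taxon_id, :parent_taxon_id

  def initialize(taxon_id, parent_taxon_id, table)
    @taxon_id = taxon_id
    @parent_taxon_id = parent_taxon_id
    @table = table
    table[taxon_id] = self
  end

  # Same idea as the hand-written parent method: look up the parent by id.
  def parent
    @table.fetch(parent_taxon_id)
  end

  # Hypothetical helper: collect this taxon and its ancestors, stopping at
  # the node that is its own parent (the root).
  def lineage
    path = [self]
    path << path.last.parent until path.last.taxon_id == path.last.parent_taxon_id
    path
  end
end

table = {}
Taxon.new(1, 1, table)           # root, as in Taxon.find(1)
Taxon.new(72274, 1, table)       # made-up parent id: not shown in the session
Taxon.new(135621, 72274, table)  # family, parent id taken from the session
Taxon.new(10, 135621, table)     # genus, parent id taken from the session

ids = table[10].lineage.map(&:taxon_id)
puts ids.inspect
```

With the real model, the same loop could sit next to parent inside the Taxon class, at the cost of one query per level of the tree.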

Install SwissProt data.

Use bioperl-db/scripts/biosql/load_seqdatabase.pl.

% perl load_seqdatabase.pl -dbuser root -dbname biosql_sp_test -namespace swissprot -format swiss sprot40.dat

Accessing UniProt/SwissProt data using ARBioSQL


Trackback - http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/20070406