Hatena::Group bioruby

"aac".translate #=> "N"

2007-04-10

UniProt on ActiveRecord plugin released

I released UniProt on ActiveRecord, a Rails plugin. The plugin is distributed via the Subversion repository at BioRuby Rails Plugins (bioruby-annex).

Add the bioruby-annex subversion repository to the plugin sources.

% script/plugin source svn://rubyforge.org/var/svn/bioruby-annex/rails/plugins/
% script/plugin list | egrep uniprot
uniprot           svn://rubyforge.org/var/svn/bioruby-annex/rails/plugins/uniprot/

Install the uniprot plugin from the bioruby-annex subversion repository.

% script/plugin install uniprot

Generate uniprot files.

% script/generate uniprot 
% script/generate uniprot add_uniprot_models

Build RDoc.

% cd vendor/plugins/uniprot
% rake rdoc

See the generated RDoc for more details on the models and schema.

Trackback - http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/20070410

2007-04-08

Update UniProt release 10 (6 March 2007) on ActiveRecord

UniProt release 10 was released on 6 March 2007. To bring UniProt on ActiveRecord up to date, some of the code needs updating.


UniProt changed the ID line format in the previous release (UniProtKB release 9.0 of 31-Oct-2006). The CVS HEAD version of BioRuby supports the latest UniProt format.
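
To illustrate what the ID line change means for a parser, here is a minimal plain-Ruby sketch that accepts both layouts. The two sample lines and both regular expressions are assumptions based on the published UniProtKB formats (the old line carries a data class and a molecule type, the new one a review status). `parse_id_line` is a hypothetical helper, not the parser BioRuby uses.

```ruby
# Assumed layouts:
#   old: ID   ATAD1_HUMAN    STANDARD;      PRT;   361 AA.
#   new: ID   ATAD1_HUMAN             Reviewed;         361 AA.
OLD_ID_LINE = /\AID\s+(\S+)\s+(\w+);\s+(\w+);\s+(\d+)\s+AA\.\z/
NEW_ID_LINE = /\AID\s+(\S+)\s+(\w+);\s+(\d+)\s+AA\.\z/

# Hypothetical helper: returns a Hash of the fields found on the ID line.
def parse_id_line(line)
  line = line.strip
  if (m = OLD_ID_LINE.match(line))
    { :name => m[1], :data_class => m[2], :molecular_type => m[3],
      :sequence_length => m[4].to_i }
  elsif (m = NEW_ID_LINE.match(line))
    { :name => m[1], :status => m[2], :sequence_length => m[3].to_i }
  else
    raise ArgumentError, "unrecognized ID line: #{line.inspect}"
  end
end

p parse_id_line("ID   ATAD1_HUMAN    STANDARD;      PRT;   361 AA.")
p parse_id_line("ID   ATAD1_HUMAN             Reviewed;         361 AA.")
```

With the new layout there is no data class or molecule type to read, so columns such as data_class and molecular_type would presumably stay empty for freshly imported entries.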

Import the latest UniProt/Swiss-Prot release into my MySQL database via ActiveRecord.

Download the latest version of uniprot_sprot.dat.

% curl -O ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz

Run the import rake task.

% rake uniprot:import
...
rake uniprot:import  34054.50s user 1845.74s system 55% cpu 18:07:24.06 total

That is far too much time. :(

I believe much of this time could be saved by wrapping the inserts in ActiveRecord::Base.transaction.
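
A rough sketch of the batching idea, in plain Ruby so it runs without a database. `each_batch` and `FakeStore` are hypothetical names introduced here; `FakeStore#transaction` only counts commits and stands in for a real transaction. The batch size of 4 is arbitrary.

```ruby
# Hand records to the block in fixed-size batches, so the caller can
# wrap each batch in a single transaction. Returns the number of batches.
def each_batch(records, batch_size)
  raise ArgumentError, "batch_size must be positive" unless batch_size > 0
  batches = 0
  records.each_slice(batch_size) do |batch|
    yield batch
    batches += 1
  end
  batches
end

# Hypothetical stand-in for a database; it only counts commits.
class FakeStore
  attr_reader :commits, :rows

  def initialize
    @commits = 0
    @rows = []
  end

  def transaction
    pending = []
    yield pending
    @rows.concat(pending) # "commit" the whole batch at once
    @commits += 1
  end
end

store = FakeStore.new
names = (1..10).map { |i| "ENTRY_#{i}" } # made-up entry names

each_batch(names, 4) do |batch|
  store.transaction { |tx| batch.each { |name| tx << name } }
end

puts "#{store.rows.size} rows in #{store.commits} commits"
```

In the rake task, the stand-in would presumably be replaced by a block passed to ActiveRecord::Base.transaction that calls e.save for every entry in the batch. Whether this actually shortens the import has not been measured here.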

Using script/console in the uniprot Rails directory:

>> Entry.count
=> 263525
>> Os.find_by_name("Homo sapiens").entries.size
=> 16053

It seems to work fine.

Dump the imported UniProt data from the MySQL database.

% rake uniprot:dump
mysqldump -uroot uniprot_development > db/uniprot_development.dump
% du -sh db/uniprot_development.dump
904M    uniprot_development.dump
% gzip -9 db/uniprot_development.dump
% du -sh db/uniprot_development.dump.gz
179M    uniprot_development.dump.gz

Trackback - http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/20070408

2006-07-11 UniProt on ActiveRecord

Modeling for UniProt/Knowledgebase Entry by ActiveRecord


Examples

Load and find an Entry by name (entry_id).
$ script/console
Loading development environment.
>> Entry.find_by_name("ATAD1_HUMAN")
=> #<Entry:0x241cf5c @attributes={"name"=>"ATAD1_HUMAN", "entry_type"=>nil, "dt_create"=>"13-SEP-2005, integrated into UniProtKB/Swiss-Prot.", "dt_annotation"=>"18-APR-2006, entry version 26.", "sequence"=>"MVHAEAFSRPLSRNEVVGLIFRLTIFGAVTYFTIKWMVDAIDPTRKQKVEAQKQAEKLMKQIGVKNVKLSEYEMSIAAHLVDPLNMHVTWSDIAGLDDVITDLKDTVILPIKKKHLFENSRLLQPPKGVLLYGPPGCGKTLIAKATAKEAGCRFINLQPSTLTDKWYGESQKLAAAVFSLAIKLQPSIIFIDEIDSFLRNRSSSDHEATAMMKAQFMSLWDGLDTDHSCQVIVMGATNRPQDLDSAIMRRMPTRFHINQPALKQREAILKLILKNENVDRHVDLLEVAQETDGFSGSDLKEMCRDAALLCVREYVNSTSEESHDEDEIRPVQQQDLHRAIEKMKKSKDAAFQNVLTHVCLD", "molecular_type"=>"PRT", "sequence_length"=>"361", "id"=>"11877", "data_class"=>"STANDARD", "crc64"=>"2FAE88BA7E7140BC", "definition"=>"ATPase family AAA domain-containing protein 1.", "dt_sequence"=>"01-OCT-2002, sequence version 1.", "mw"=>"40744"}>
Accessions (AC line)
>> Entry.find_by_name("ATAD1_HUMAN").acs.map {|ac| ac.name }
=> ["Q8NBU5", "Q6P4B9", "Q8N3G1", "Q8WYR9", "Q969Y3"]
Keywords (KW line)
>> Entry.find_by_name("ATAD1_HUMAN").kws.map {|keyword| keyword.name }
=> ["ATP-binding", "Nucleotide-binding"]
Database-cross references (DR line)
>> Entry.find_by_name("ATAD1_HUMAN").drs.map {|x| x.db_name }
=> ["EMBL", "EMBL", "EMBL", "EMBL", "EMBL", "EMBL", "EMBL", "EMBL", "LinkHub", "Pfam", "InterPro", "InterPro", "InterPro", "Ensembl", "HSSP", "PROSITE", "SMART", "HGNC", "UniGene"]
Sequence
>> Entry.find_by_name("ATAD1_HUMAN").sequence
=> "MVHAEAFSRPLSRNEVVGLIFRLTIFGAVTYFTIKWMVDAIDPTRKQKVEAQKQAEKLMKQIGVKNVKLSEYEMSIAAHLVDPLNMHVTWSDIAGLDDVITDLKDTVILPIKKKHLFENSRLLQPPKGVLLYGPPGCGKTLIAKATAKEAGCRFINLQPSTLTDKWYGESQKLAAAVFSLAIKLQPSIIFIDEIDSFLRNRSSSDHEATAMMKAQFMSLWDGLDTDHSCQVIVMGATNRPQDLDSAIMRRMPTRFHINQPALKQREAILKLILKNENVDRHVDLLEVAQETDGFSGSDLKEMCRDAALLCVREYVNSTSEESHDEDEIRPVQQQDLHRAIEKMKKSKDAAFQNVLTHVCLD"
References
>> Entry.find_by_name("ATAD1_HUMAN").refs_count
=> 5
>> Entry.find_by_name("ATAD1_HUMAN").refs[0]   
=> #<Ref:0x27944a0 @rcs=[#<Rc:0x27932a8 @attributes={"text"=>"Pituitary", "token"=>"TISSUE", "id"=>"16335", "ref_id"=>"26323"}>], @attributes={"entry_id"=>"11877", "title"=>"A novel gene expressed in fetal normal pituitary.", "auther"=>"Liu F., Xu X.R., Qian B.Z., Xiao H., Chen Z., Han Z.", "id"=>"26323", "location"=>"Submitted (MAR-2001) to the EMBL/GenBank/DDBJ databases."}, @rps=[#<Rp:0x2793578 @attributes={"id"=>"31223", "ref_id"=>"26323", "comment"=>"NUCLEOTIDE SEQUENCE [MRNA]"}>], @rgs=[], @rxs=[]>
Comments (CC line)
>> Entry.find_by_name("ATAD1_HUMAN").ccs       
=> [#<Cc:0x26cd594 @attributes={"entry_id"=>"11877", "topic"=>"SIMILARITY", "id"=>"69666", "contents"=>"Belongs to the AAA ATPase family."}>]
Count Homo sapiens entries
>> Os.find_by_name("Homo sapiens")
=> #<Os:0x24940fc @attributes={"name"=>"Homo sapiens", "common_name"=>"(Human)", "id"=>"31"}>
>> Os.find_by_name("Homo sapiens").entries_count
=> 1701

uniprot_sprot.dat.gz and Rails

$ curl -O ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
$ rails uniprot -d mysql
$ cd uniprot

Then save uniprot/Rakefile, uniprot/config/database.yml, uniprot/db/migrate/001_create_entries.rb and uniprot/app/models/entry.rb.


Importing UniProt data into database

$ rake create
$ rake db:migrate
$ rake import

After 20 hours,

$ script/console

Have fun!




uniprot/Rakefile

# Add your own tasks in files placed in lib/tasks ending in .rake,
# for example lib/tasks/capistrano.rake, and they will automatically be available to Rake.

require(File.join(File.dirname(__FILE__), 'config', 'boot'))

require 'rake'
require 'rake/testtask'
require 'rake/rdoctask'

require 'tasks/rails'


desc "import data"
task :import do
  require "#{RAILS_ROOT}/config/environment"
  require 'bio'
  require 'zlib'
  io = Zlib::GzipReader.open("../uniprot_sprot.dat.gz")

  Bio::FlatFile.open(io).each do |entry|
    print entry.entry_id
    if Entry.find_by_name(entry.entry_id)
      puts "\tskipped" 
      next
    else
      $stdout.sync = true
      print "\t." 
    end

    e = Entry.new(:name            => entry.entry_id,
                  :data_class      => entry.id_line['DATA_CLASS'].to_s,
                  :molecular_type  => entry.id_line['MOLECULE_TYPE'].to_s,
                  :sequence_length => entry.id_line['SEQUENCE_LENGTH'].to_i,
                  :dt_create       => entry.dt['created'].to_s,
                  :dt_sequence     => entry.dt['sequence'].to_s,
                  :dt_annotation   => entry.dt['annotation'].to_s,
                  :definition      => entry.de)
    e.sequence = entry.seq.to_s
    e.crc64    = entry.sq['CRC64'].to_s
    e.mw       = entry.sq['MW'].to_i    

    print "."    

    entry.accessions.each do |ac|
      e.acs << Ac.new(:name => ac)
    end

    print "."

    entry.os.each do |os|
      # Reuse an existing organism record, or build a new one.
      o = Os.find_by_name(os['os']) ||
          Os.new(:name => os['os'], :common_name => os['name'].to_s)
      e.oss << o
    end

    print "."

    entry.oc.each_with_index do |key, level|
      # Reuse an existing taxonomic class record, or build a new one.
      o = Oc.find_by_name(key) || Oc.new(:name => key, :level => level)
      e.ocs << o
    end

    print "."

    entry.ox.each do |db_name, accs|
      accs.each do |acc|
        # Reuse an existing taxonomy cross-reference, or build a new one.
        o = Ox.find(:first, :conditions => ["db_name = ? AND accession = ?",
                                            db_name, acc]) ||
            Ox.new(:db_name => db_name, :accession => acc)
        e.oxs << o
      end
    end

    e.gn = Gn.new
    entry.gn.each do |g|
      unless g.class == Hash
        # Parse a raw GN string ("Name=...; ORFNames=...;") into a Hash.
        canonical_key = {'ORFNames' => :orfs, 'Name' => :name, 
                         'OrderedLocusNames' => :loci}
        g2 = {}
        g.to_s.split(';').map {|x| x.strip }.each do |ge|
          key, value = ge.split('=')
          g2[canonical_key[key]] = value
        end
        g = g2
      end
      g[:synonyms] = [] unless g[:synonyms]
      g[:name] = '' unless g[:name]
      g[:loci] = [] unless g[:loci]
      g[:orfs] = [] unless g[:orfs]
      e.gn.name = g[:name]

      g[:synonyms].each do |synonym|
        e.gn.synonyms << GnSynonym.new(:synonym => synonym)
      end
      g[:loci].each do |locus|
        e.gn.loci << GnLocus.new(:locus => locus)
      end
      g[:orfs].each do |orf|
        e.gn.orf_names << GnOrfName.new(:name => orf)
      end
    end

    print "."

    entry.ref.each do |ref|
      r = Ref.new(:title => ref['RT'],
                  :auther => ref['RA'],
                  :location => ref['RL'])
      print "."
      ref['RG'].each do |rg|
        r.rgs << Rg.new(:name => rg)
      end
      print "."
      ref['RX'].each do |key, value|
        next if value == nil
        r.rxs << Rx.new(:name => key, :identifier => value)
      end
      print "."
      ref['RP'].each do |rp|
        r.rps << Rp.new(:comment => rp)
      end
      print "."
      ref['RC'].each do |rc|
        r.rcs << Rc.new(:token => rc['Token'], :text => rc['Text'])
      end
      e.refs << r
    end

    print "."

    entry.cc.each do |k, v|
      [entry.cc(k)].flatten.each do |value|
        e.ccs << Cc.new(:topic => k, 
                        :contents => value) 
      end
    end

    print "."

    entry.dr.each do |db_name, vs|
      vs.each do |v|
        e.drs << Dr.new(:db_name   => db_name, 
                        :entry_name => v[0], 
                        :content1   => v[1].to_s, 
                        :content2   => v[2].to_s, 
                        :content3   => v[3].to_s)
      end
    end

    print "."

    entry.kw.each do |key|
      # Reuse an existing keyword record, or build a new one.
      kw = Kw.find_by_name(key) || Kw.new(:name => key)
      e.kws << kw
    end

    entry.ft.each do |name, fts|
      fts.each do |ft|
        e.fts << Ft.new(:name => name,
                        :from => ft['From'],
                        :to => ft['To'],
                        :description => ft['Description'],
                        :ftid => ft['FTId'])
      end
    end

    print "."

    e.save

    puts 'done'
  end
end

desc "mysql create database"
task :create do
  sh "mysqladmin5 -uroot drop uniprot_development"
  sh "mysqladmin5 -uroot create uniprot_development"
end
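
The gene-name branch of the import task splits a raw GN string on ';' and '='. Below is a standalone sketch of that step, under some assumptions: `parse_gn` and `GN_KEYS` are hypothetical names, the 'Synonyms' key is an addition (the map in the task omits it), list-valued tokens are assumed to be comma-separated, and the sample gene names are made up.

```ruby
# Map UniProt GN tokens to the hash keys used by the import task.
# 'Synonyms' is an assumption added for this sketch.
GN_KEYS = {
  'Name'              => :name,
  'Synonyms'          => :synonyms,
  'OrderedLocusNames' => :loci,
  'ORFNames'          => :orfs
}

# Hypothetical helper: turn "Name=...; Synonyms=a, b; ..." into a Hash
# whose list-valued keys always hold Arrays.
def parse_gn(text)
  gene = { :name => '', :synonyms => [], :loci => [], :orfs => [] }
  text.split(';').map { |field| field.strip }.each do |field|
    token, value = field.split('=', 2)
    key = GN_KEYS[token]
    next if key.nil? || value.nil? # skip unknown or malformed fields
    if key == :name
      gene[:name] = value.strip
    else
      gene[key] = value.split(',').map { |v| v.strip }
    end
  end
  gene
end

p parse_gn("Name=abcA; Synonyms=abcB, abcC; OrderedLocusNames=b0001; ORFNames=ORF1, ORF2;")
```

Returning Arrays for synonyms, loci and ORF names means the later loops over g[:synonyms], g[:loci] and g[:orfs] would see one element per name rather than a single string.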


uniprot/config/database.yml

development:
   adapter: mysql
   database: uniprot_development
   username: root
   password:
   host: localhost 

uniprot/db/migrate/001_create_entries.rb

class CreateEntries < ActiveRecord::Migration
  def self.up
    create_table(:entries) do |t|
      t.column(:name, :string)
      t.column(:data_class, :string)
      t.column(:molecular_type, :string)
      t.column(:sequence_length, :integer)
      t.column(:entry_type, :string)
      t.column(:dt_create, :string)
      t.column(:dt_sequence, :string)
      t.column(:dt_annotation, :string)
      t.column(:definition, :string)
      t.column(:sequence, :text)
      t.column(:mw, :integer)
      t.column(:crc64, :string)
    end
    add_index(:entries, :name)

    create_table(:acs) do |t|
      t.column(:entry_id, :integer)
      t.column(:name, :string)
    end
    add_index(:acs, :name)
    add_index(:acs, :entry_id)

    create_table(:gns) do |t|
      t.column(:name, :string)
      t.column(:entry_id, :integer)
    end
    add_index(:gns, :name)
    add_index(:gns, :entry_id)

    create_table(:gn_synonyms) do |t|
      t.column(:gn_id, :integer)
      t.column(:synonym, :string)
    end
    add_index(:gn_synonyms, :synonym)
    add_index(:gn_synonyms, :gn_id)

    create_table(:gn_loci) do |t|
      t.column(:gn_id, :integer)
      t.column(:locus, :string)
    end
    add_index(:gn_loci, :locus)
    add_index(:gn_loci, :gn_id)

    create_table(:gn_orf_names) do |t|
      t.column(:gn_id, :integer)
      t.column(:name, :string)
    end
    add_index(:gn_orf_names, :name)
    add_index(:gn_orf_names, :gn_id)

    create_table(:entries_oss, :id => false) do |t|
      t.column(:entry_id, :integer)
      t.column(:os_id, :integer)
    end
    add_index(:entries_oss, :entry_id)
    add_index(:entries_oss, :os_id)

    create_table(:oss) do |t|
      t.column(:name, :string)
      t.column(:common_name, :string)
    end
    add_index(:oss, :name)

    create_table(:entries_ocs, :id => false) do |t|
      t.column(:entry_id, :integer)
      t.column(:oc_id, :integer)
    end
    add_index(:entries_ocs, :entry_id)
    add_index(:entries_ocs, :oc_id)

    create_table(:ocs) do |t|
      t.column(:level, :integer)
      t.column(:name, :string)
    end
    add_index(:ocs, :name)

    create_table(:entries_oxs, :id => false) do |t|
      t.column(:entry_id, :integer)
      t.column(:ox_id, :integer)
    end
    add_index(:entries_oxs, :entry_id)
    add_index(:entries_oxs, :ox_id)

    create_table(:oxs) do |t|
      t.column(:db_name, :string)
      t.column(:accession, :string)
    end
    add_index(:oxs, :db_name)
    add_index(:oxs, :accession)

    # references
    create_table(:refs) do |t|
      t.column(:entry_id, :integer)
      t.column(:title, :string)
      t.column(:auther, :string)
      t.column(:location, :string)
    end
    add_index(:refs, :location)
    add_index(:refs, :entry_id)
    create_table(:rxs) do |t|
      t.column(:name, :string)
      t.column(:identifier, :string)
      t.column(:ref_id, :integer)
    end
    add_index(:rxs, :name)
    create_table(:rgs) do |t|
      t.column(:name, :string)
      t.column(:ref_id, :integer)
    end
    add_index(:rgs, :name)
    create_table(:rps) do |t|
      t.column(:comment, :string)
      t.column(:ref_id, :integer)
    end
    add_index(:rps, :comment)
    create_table(:rcs) do |t|
      t.column(:token, :string)
      t.column(:text, :string)
      t.column(:ref_id, :integer)
    end
    add_index(:rcs, :token)

    create_table(:ccs) do |t|
      t.column(:topic, :string)
      t.column(:contents, :text)
      t.column(:entry_id, :integer)
    end
    add_index(:ccs, :entry_id)
    add_index(:ccs, :topic)
    
    create_table(:drs) do |t|
      t.column(:entry_id, :integer)
      t.column(:db_name, :string)
      t.column(:entry_name, :string)
      t.column(:content1, :string)
      t.column(:content2, :string)
      t.column(:content3, :string)
    end
    add_index(:drs, :entry_id)
    add_index(:drs, :db_name)
    add_index(:drs, :entry_name)

    create_table(:entries_kws, :id => false) do |t|
      t.column(:entry_id, :integer)
      t.column(:kw_id, :integer)
    end
    add_index(:entries_kws, :entry_id)
    add_index(:entries_kws, :kw_id)

    create_table(:kws) do |t|
      t.column(:name, :string)
    end
    add_index(:kws, :name)

    create_table(:fts) do |t|
      t.column(:entry_id, :integer)
      t.column(:name, :string)
      t.column(:from, :string)
      t.column(:to, :string)
      t.column(:description, :string)
      t.column(:ftid, :string)
    end
    add_index(:fts, :entry_id)
    add_index(:fts, :name)
    add_index(:fts, :ftid)
  end

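  def self.down
    # Drop every table created in self.up, including the join tables.
    drop_table :entries
    drop_table :acs
    drop_table :gns
    drop_table :gn_synonyms
    drop_table :gn_loci
    drop_table :gn_orf_names
    drop_table :entries_oss
    drop_table :oss
    drop_table :entries_ocs
    drop_table :ocs
    drop_table :entries_oxs
    drop_table :oxs
    drop_table :refs
    drop_table :rxs
    drop_table :rgs
    drop_table :rps
    drop_table :rcs
    drop_table :ccs
    drop_table :drs
    drop_table :entries_kws
    drop_table :kws
    drop_table :fts
  end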
end


uniprot/app/models/entry.rb

class Entry < ActiveRecord::Base
  has_many :acs
  has_one :gn, :include => [:loci, :synonyms, :orf_names]
  has_and_belongs_to_many :oss,  :join_table => :entries_oss
  has_and_belongs_to_many :ocs
  has_and_belongs_to_many :oxs, :join_table => :entries_oxs
  has_many :refs, :include => [:rxs, :rgs, :rps, :rcs]
  has_many :ccs
  has_many :drs
  has_and_belongs_to_many :kws
  has_many :fts
end

class Ac < ActiveRecord::Base
  belongs_to :entry
end

class De < ActiveRecord::Base
  belongs_to :entry
end

class Gn < ActiveRecord::Base
  belongs_to :entry
  has_many :synonyms,  :table_name => 'GnSynonym', :class_name => 'GnSynonym'
  has_many :loci,      :table_name => 'GnLocus',   :class_name => 'GnLocus'
  has_many :orf_names, :table_name => 'GnOrfName', :class_name => 'GnOrfName'
end

class GnSynonym < ActiveRecord::Base
  belongs_to :gn
end

class GnLocus < ActiveRecord::Base
  set_table_name "gn_loci"
  belongs_to :gn
end

class GnOrfName < ActiveRecord::Base
  belongs_to :gn
end

class Os < ActiveRecord::Base
  set_table_name "oss"
  has_and_belongs_to_many :entries, :join_table => :entries_oss
end

class Oc < ActiveRecord::Base
  has_and_belongs_to_many :entries
end

class Ox < ActiveRecord::Base
  set_table_name "oxs"
  has_and_belongs_to_many :entries, :join_table => :entries_oxs
end

class Ref < ActiveRecord::Base
  belongs_to :entry
  has_many :rxs
  has_many :rps
  has_many :rcs
  has_many :rgs
end

class Rx < ActiveRecord::Base
  set_table_name 'rxs'
  belongs_to :ref
end

class Rg < ActiveRecord::Base
  belongs_to :ref
end

class Rp < ActiveRecord::Base
  belongs_to :ref
end

class Rc < ActiveRecord::Base
  belongs_to :ref
end

class Cc < ActiveRecord::Base
  belongs_to :entry
end

class Dr < ActiveRecord::Base
  belongs_to :entry
end

class Kw < ActiveRecord::Base
  has_and_belongs_to_many :entries
end

class Ft < ActiveRecord::Base
  belongs_to :entry
end