Hatena::Groupbioruby

"aac".translate #=> "N"

2006-07-11 UniProt on ActiveRecord

Modeling for UniProt/Knowledgebase Entry by ActiveRecord


Examples

Load the console and find an Entry by name (entry_id).
$ script/console
Loading development environment.
>> Entry.find_by_name("ATAD1_HUMAN")
=> #<Entry:0x241cf5c @attributes={"name"=>"ATAD1_HUMAN", "entry_type"=>nil, "dt_create"=>"13-SEP-2005, integrated into UniProtKB/Swiss-Prot.", "dt_annotation"=>"18-APR-2006, entry version 26.", "sequence"=>"MVHAEAFSRPLSRNEVVGLIFRLTIFGAVTYFTIKWMVDAIDPTRKQKVEAQKQAEKLMKQIGVKNVKLSEYEMSIAAHLVDPLNMHVTWSDIAGLDDVITDLKDTVILPIKKKHLFENSRLLQPPKGVLLYGPPGCGKTLIAKATAKEAGCRFINLQPSTLTDKWYGESQKLAAAVFSLAIKLQPSIIFIDEIDSFLRNRSSSDHEATAMMKAQFMSLWDGLDTDHSCQVIVMGATNRPQDLDSAIMRRMPTRFHINQPALKQREAILKLILKNENVDRHVDLLEVAQETDGFSGSDLKEMCRDAALLCVREYVNSTSEESHDEDEIRPVQQQDLHRAIEKMKKSKDAAFQNVLTHVCLD", "molecular_type"=>"PRT", "sequence_length"=>"361", "id"=>"11877", "data_class"=>"STANDARD", "crc64"=>"2FAE88BA7E7140BC", "definition"=>"ATPase family AAA domain-containing protein 1.", "dt_sequence"=>"01-OCT-2002, sequence version 1.", "mw"=>"40744"}>
Accessions (AC line)
>> Entry.find_by_name("ATAD1_HUMAN").acs.map {|ac| ac.name }
=> ["Q8NBU5", "Q6P4B9", "Q8N3G1", "Q8WYR9", "Q969Y3"]
Keywords (KW line)
>> Entry.find_by_name("ATAD1_HUMAN").kws.map {|keyword| keyword.name }
=> ["ATP-binding", "Nucleotide-binding"]
Database cross-references (DR line)
>> Entry.find_by_name("ATAD1_HUMAN").drs.map {|x| x.db_name }
=> ["EMBL", "EMBL", "EMBL", "EMBL", "EMBL", "EMBL", "EMBL", "EMBL", "LinkHub", "Pfam", "InterPro", "InterPro", "InterPro", "Ensembl", "HSSP", "PROSITE", "SMART", "HGNC", "UniGene"]
Sequence
>> Entry.find_by_name("ATAD1_HUMAN").sequence
=> "MVHAEAFSRPLSRNEVVGLIFRLTIFGAVTYFTIKWMVDAIDPTRKQKVEAQKQAEKLMKQIGVKNVKLSEYEMSIAAHLVDPLNMHVTWSDIAGLDDVITDLKDTVILPIKKKHLFENSRLLQPPKGVLLYGPPGCGKTLIAKATAKEAGCRFINLQPSTLTDKWYGESQKLAAAVFSLAIKLQPSIIFIDEIDSFLRNRSSSDHEATAMMKAQFMSLWDGLDTDHSCQVIVMGATNRPQDLDSAIMRRMPTRFHINQPALKQREAILKLILKNENVDRHVDLLEVAQETDGFSGSDLKEMCRDAALLCVREYVNSTSEESHDEDEIRPVQQQDLHRAIEKMKKSKDAAFQNVLTHVCLD"
References
>> Entry.find_by_name("ATAD1_HUMAN").refs_count
=> 5
>> Entry.find_by_name("ATAD1_HUMAN").refs[0]
=> #<Ref:0x27944a0 @rcs=[#<Rc:0x27932a8 @attributes={"text"=>"Pituitary", "token"=>"TISSUE", "id"=>"16335", "ref_id"=>"26323"}>], @attributes={"entry_id"=>"11877", "title"=>"A novel gene expressed in fetal normal pituitary.", "auther"=>"Liu F., Xu X.R., Qian B.Z., Xiao H., Chen Z., Han Z.", "id"=>"26323", "location"=>"Submitted (MAR-2001) to the EMBL/GenBank/DDBJ databases."}, @rps=[#<Rp:0x2793578 @attributes={"id"=>"31223", "ref_id"=>"26323", "comment"=>"NUCLEOTIDE SEQUENCE [MRNA]"}>], @rgs=[], @rxs=[]>
Comments (CC line)
>> Entry.find_by_name("ATAD1_HUMAN").ccs
=> [#<Cc:0x26cd594 @attributes={"entry_id"=>"11877", "topic"=>"SIMILARITY", "id"=>"69666", "contents"=>"Belongs to the AAA ATPase family."}>]
Count Homo sapiens entries
>> Os.find_by_name("Homo sapiens")
=> #<Os:0x24940fc @attributes={"name"=>"Homo sapiens", "common_name"=>"(Human)", "id"=>"31"}>
>> Os.find_by_name("Homo sapiens").entries_count
=> 1701
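
The AC, KW, DR and CC labels above are the two-letter line codes of the UniProtKB flat-file format, and each association seems to hold the parsed content of one kind of line. The following is a minimal plain-Ruby sketch of that idea, not the actual import code of this post: `group_by_line_code` and `split_list` are hypothetical helpers, and the sample record is reconstructed by hand from the console output above rather than copied from the real file.

```ruby
# Hypothetical helper: group the lines of one flat-file entry by their
# two-letter line code (ID, AC, KW, ...). The content starts at column 6.
def group_by_line_code(entry_text)
  groups = Hash.new { |hash, code| hash[code] = [] }
  entry_text.each_line do |line|
    line = line.chomp
    next if line.empty? || line.start_with?("//") # "//" terminates an entry
    groups[line[0, 2]] << line[5..-1].to_s
  end
  groups
end

# Hypothetical helper: split a ";"-separated list (AC or KW content),
# dropping the trailing ";" or "." first.
def split_list(contents)
  contents.join(" ").sub(/[.;]\s*\z/, "").split(/;\s*/)
end

# Hand-made sample, reconstructed from the console output (an assumption,
# not the real record).
SAMPLE_ENTRY = <<~ENTRY
  ID   ATAD1_HUMAN    STANDARD;      PRT;   361 AA.
  AC   Q8NBU5; Q6P4B9; Q8N3G1; Q8WYR9; Q969Y3;
  DE   ATPase family AAA domain-containing protein 1.
  KW   ATP-binding; Nucleotide-binding.
  //
ENTRY

groups = group_by_line_code(SAMPLE_ENTRY)
p split_list(groups["AC"]) # the five accessions listed above
p split_list(groups["KW"]) # the two keywords listed above
```

In a database-backed version, each element of these lists would presumably become one row in the table behind `acs` or `kws`.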

uniprot_sprot.dat.gz and Rails

$ curl -O ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
$ rails uniprot -d mysql
$ cd uniprot

Then save the following files into the new application: uniprot/Rakefile, uniprot/config/database.yml, uniprot/db/migrate/001_create_entries.rb, and uniprot/app/models/entry.rb.


Importing UniProt data into database

$ rake generate
$ rake db:migrate
$ rake import
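
An import like this has to walk through every entry in uniprot_sprot.dat.gz. The following is a minimal sketch of such a loop, assuming only that each entry ends with a `//` line, as the flat-file format specifies. `each_flat_file_entry` is a hypothetical helper, not the rake task used here. It reads the gzip stream directly, so the file does not have to be unpacked on disk.

```ruby
require "zlib"

# Hypothetical helper: yield each entry of a gzipped flat file as one string,
# including its terminating "//" line.
def each_flat_file_entry(path)
  return enum_for(:each_flat_file_entry, path) unless block_given?

  Zlib::GzipReader.open(path) do |gz|
    buffer = String.new
    gz.each_line do |line|
      buffer << line
      next unless line.start_with?("//") # entry terminator reached
      yield buffer
      buffer = String.new
    end
  end
end

# Usage (the path assumes the curl step was run in the current directory):
#
#   each_flat_file_entry("uniprot_sprot.dat.gz") do |entry|
#     puts entry[/\AID\s+(\S+)/, 1] # entry name from the ID line
#   end
```

Handing over one entry at a time keeps memory use flat regardless of file size, which matters for an import that runs for many hours.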

After 20 hours, once the import has finished:

$ script/console

Have fun!