SAMtool FAQ

Contents

1 About the SAM/BAM Format
  1.1 How can I get alignments in the SAM/BAM format?
  1.2 How are unaligned reads stored in SAM?
  1.3 CIGAR is `50M', but I see mismatches in the alignment.
2 About SAMtools
  2.1 How do I convert SAM to BAM?
  2.2 I want to call SNPs and short indels.
  2.3 I want to call SNPs from one chromosome only.
  2.4 The integer FLAG field is not friendly to the eyes.
  2.5 I do not understand the columns in the pileup output.
  2.6 I see `*' in the pileup sequence column. What are they?
  2.7 I only want to use a subset of alignments in pileup.
  2.8 Does samtools generate the consensus sequence like Maq?
  2.9 I want to get `unique' alignments from SAM/BAM.
  2.10 In repetitive regions, SAMtools calls all bases as `A' although there are no `A' bases in reads.
  2.11 How are SNPs and indels called and filtered by SAMtools?
  2.12 The Windows version of SAMtools does not work sometimes.
3 For Developers
  3.1 How do I make my aligner work best with samtools?
  3.2 Why mapping quality?

1 About the SAM/BAM Format

1.1 How can I get alignments in the SAM/BAM format?

Many alignment programs generate SAM/BAM natively or output a format that can be converted to SAM/BAM. Please check out this page for the complete list. If your preferred software is not in this list, you may contact its developers or write your own converter, and then please let us know.

1.2 How are unaligned reads stored in SAM?

An unaligned read must be flagged with 0x4. It may have no coordinate (i.e. a coordinate `*:0'), or it may have an ordinary coordinate with the CIGAR field set to `*'. If one read in a read pair is aligned but its mate is not, we strongly recommend setting the coordinate of the unmapped read to that of the mapped one, so that in a position-sorted SAM/BAM file the unmapped read is adjacent to its mapped mate. This convention greatly helps local assembly when we want to collect all related reads in a small region.

1.3 CIGAR is `50M', but I see mismatches in the alignment.

The CIGAR operation `M' means `alignment match' (i.e. not a gap). It may be either a `sequence match' or a `sequence mismatch'. Mismatch information is stored in the `MD' tag, which is optional but can be generated with the samtools `calmd' command. We are proposing the new CIGAR operations `=' for a sequence match and `X' for a sequence mismatch, but they are not well supported by samtools.

2 About SAMtools

2.1 How do I convert SAM to BAM?

If your SAM file has header @SQ lines, you may get BAM by:

    samtools view -bS aln.sam > aln.bam

If not, you need your reference file ref.fa, and then you can do this:

    samtools faidx ref.fa
    samtools view -bt ref.fa.fai aln.sam > aln.bam

The second method also works if your SAM file has @SQ lines. After conversion, you will probably want to sort and index the alignments to enable fast random access:

    samtools sort aln.bam aln-sorted
    samtools index aln-sorted.bam

2.2 I want to call SNPs and short indels.

For a short answer, do this:

    samtools pileup -vcf ref.fa aln.bam | tee raw.txt | samtools.pl varFilter -D100 > flt.txt
    awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' flt.txt > final.txt

The awk step keeps indel lines (reference column `*') with a SNP quality of at least 50 and SNP lines with a SNP quality of at least 20. For a long answer, see this protocol. Always remember to set the maximum depth (-D) in filtering.

2.3 I want to call SNPs from one chromosome only.

Index your alignment with the `index' command and then run:

    samtools view -u aln.bam chr10 | samtools pileup -vcf ref.fa - > chr10.raw.txt

Please read this page for more information.

2.4 The integer FLAG field is not friendly to the eyes.

You may get string FLAGs by:

    samtools view -X aln.bam | less -S

For more information, please check out:

    samtools view -?

2.5 I do not understand the columns in the pileup output.

This is explained in the manual page. Briefly (when you invoke pileup with the -c option):

 1. reference sequence name
 2. reference coordinate
 3. reference base, or `*' for an indel line
 4. genotype, where heterozygotes are encoded in the IUB code: M=A/C, R=A/G, W=A/T, S=C/G, Y=C/T and K=G/T; indels are indicated by, for example, */+A, -A/* or +CC/-C. There is no difference between */+A and +A/*.
 5. Phred-scaled likelihood that the genotype is wrong, which is also called `consensus quality'.
 6. Phred-scaled likelihood that the genotype is identical to the reference, which is also called `SNP quality'. Suppose the reference base is A and in the alignment we see 17 G and 3 A. We will get a low consensus quality, because it is difficult to distinguish an A/G heterozygote from a G/G homozygote. We will get a high SNP quality, though, because the evidence of a SNP is very strong.
 7. root mean square (RMS) mapping quality
 8. number of reads covering the position
 9. read bases at a SNP line (check the manual page for more information); the first indel allele otherwise
 10. base qualities at a SNP line; the second indel allele otherwise
 11. indel line only: number of reads directly supporting the first indel allele
 12. indel line only: number of reads directly supporting the second indel allele
 13. indel line only: number of reads supporting a third indel allele

If pileup is invoked without `-c', indel lines and columns 4 to 7 inclusive are not output.

2.6 I see `*' in the pileup sequence column. What are they?

A star in the sequence column represents a deletion. It is a placeholder that makes the number of bases in that column match the read depth column. Simply ignore `*' if you do not use this information.

2.7 I only want to use a subset of alignments in pileup.

If you want to filter on mapping quality, flags, one read group or one library, you may just use the view command. If you want to apply more complex filters, you may write a Perl or awk one-liner that works on the SAM text. For example, to use only alignments with two or fewer differences (mismatches + gaps):

    samtools view -h aln.bam | perl -ne 'print if (/^@/||(/NM:i:(\d+)/&&$1<=2))' | samtools pileup -S - > out.txt

or to exclude all gapped alignments:

    samtools view -h aln.bam | awk '$6!~/[ID]/' | samtools pileup -S -

2.8 Does samtools generate the consensus sequence like Maq?

Yes. Try this:

    samtools pileup -cf ref.fa aln.bam | samtools.pl pileup2fq -D100 > cns.fastq

Again, remember to set -D according to your read depth. Note that pileup2fq applies fewer filters than varFilter, so you may see small inconsistencies between the two outputs.

2.9 I want to get `unique' alignments from SAM/BAM.

We prefer to call an alignment `reliable' rather than `unique', as `uniqueness' is not well defined in general cases. You can get reliable alignments by setting a threshold on mapping quality:

    samtools view -bq 1 aln.bam > aln-reliable.bam

You may want to set a more stringent threshold to get more reliable alignments.

2.10 In repetitive regions, SAMtools calls all bases as `A' although there are no `A' bases in reads.

This is due to a floating-point underflow in the MAQ SNP calling model, which is used by default, and it only happens in repetitive regions. These calls are always filtered out. However, if you are uncomfortable with this, you may use the simplified SOAPsnp model with:

    samtools pileup -avcf ref.fa aln.bam > raw.txt

The MAQ model and the SOAPsnp model usually deliver very similar SNP calls.

2.11 How are SNPs and indels called and filtered by SAMtools?

By default, SNPs are called with a Bayesian model identical to the one used in MAQ. A simplified SOAPsnp model is implemented, too. Indels are called with a simple Bayesian model. The caller does local realignment to recover indels that occur at the end of a read but appear as contiguous mismatches. For an example, see this picture.

varFilter filters SNPs/indels in the following order:

 ▪ d: low depth
 ▪ D: high depth
 ▪ W: too many SNPs in a window (SNP only)
 ▪ G: close to a high-quality indel (SNP only)
 ▪ Q: low root-mean-square (RMS) mapping quality (SNP only)
 ▪ g: close to another indel with more evidence (indel only)

When you invoke varFilter with the `-p' option, the first letter indicates why a SNP/indel was filtered. A SNP/indel filtered by a rule higher in the list is not tested against the other rules.

2.12 The Windows version of SAMtools does not work sometimes.

We are sorry; this is due to bugs in the Windows port. The Windows version is mainly meant to be a cross-platform viewer, and most of the samtools functionality is untested. For heavy use of samtools, please run it on Linux machines instead.

3 For Developers

3.1 How do I make my aligner work best with samtools?

To get the best performance in SNP calling, we recommend the following rules.

 ▪ Try to generate all the mandatory fields, in particular the mate coordinates and insert size. Samtools' rmdup relies on the ISIZE field, and Picard MarkDuplicates requires the mate coordinates. One may use the fixmate command afterwards, but that is very inefficient.
 ▪ Choose a random position for a repetitive read. If an aligner discards repetitive reads, the read depth will be inaccurate, which may cause problems in filtering SNPs.
 ▪ Write mapping quality. It is recommended to compute mapping quality, as the samtools SNP caller may take advantage of this information. However, computing mapping quality requires an aligner to look into suboptimal hits and thus slows down alignment. If your aligner cannot do this, write 0 for repetitive reads and 60 for `unique' reads.

3.2 Why mapping quality?

The plot below shows alignment accuracy for 108bp simulated reads under different configurations of BWA. If we only retain `unique' alignments, we get a single spot, ungap-se-unique, which corresponds to ~2300 wrong alignments out of 1.68 million mapped reads. If we look at the mapping quality generated by BWA and set a stringent threshold on it, it is possible to get 400 wrong alignments out of 1.67M mapped reads (the ungap-se line). That is, we get >80% fewer false alignments at the cost of about 1% loss in sensitivity. Setting a higher threshold further reduces false alignments and helps to reduce noise when identifying structural variations that bridge unique regions. The plot may vary with the aligner in use, but it is generally true that an algorithm that sees more suboptimal alignments is more accurate.
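The entry on how unaligned reads are stored in SAM recommends giving an unmapped read the coordinate of its mapped mate. Below is a minimal Python sketch of that convention, assuming a simplified record type. `SamRecord` and `place_unmapped_mate` are hypothetical names, and only the fields the convention touches are modeled. The sketch also sets RNEXT to `=` and PNEXT to the shared position, which goes slightly beyond what the FAQ states.

```python
from dataclasses import dataclass

FLAG_UNMAPPED = 0x4       # this read is unmapped
FLAG_MATE_UNMAPPED = 0x8  # the mate is unmapped


@dataclass
class SamRecord:
    """Hypothetical subset of a SAM line: only the fields used below."""
    qname: str
    flag: int
    rname: str = "*"
    pos: int = 0       # 1-based; 0 together with RNAME `*` means no coordinate
    cigar: str = "*"
    rnext: str = "*"
    pnext: int = 0


def place_unmapped_mate(mapped: SamRecord, unmapped: SamRecord) -> None:
    """Give an unmapped read the coordinate of its mapped mate (in place)."""
    unmapped.flag |= FLAG_UNMAPPED
    unmapped.rname, unmapped.pos = mapped.rname, mapped.pos
    unmapped.cigar = "*"                  # a placement only, not an alignment
    unmapped.rnext, unmapped.pnext = "=", mapped.pos
    mapped.flag |= FLAG_MATE_UNMAPPED
    mapped.rnext, mapped.pnext = "=", mapped.pos


r1 = SamRecord("pair1", 0x1 | 0x40, "chr10", 12345, "50M")  # paired, first in pair
r2 = SamRecord("pair1", 0x1 | 0x80)                         # paired, second in pair
place_unmapped_mate(r1, r2)
print(r2.rname, r2.pos, r2.cigar, hex(r2.flag))  # chr10 12345 * 0x85
```

Both records now share chr10:12345, so a position sort places them next to each other.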
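The entry on CIGAR `50M' with mismatches notes that mismatches live in the MD tag rather than in CIGAR. The sketch below shows how an all-`M' alignment can still carry mismatches. It assumes an MD string without deletions (no `^`), and the helper name `md_mismatches` is hypothetical.

```python
import re


def md_mismatches(md: str) -> list[tuple[int, str]]:
    """Return (offset, reference_base) for each mismatch in an MD tag.

    Offsets are 0-based along the aligned (M) segment. Only the match and
    mismatch subset of MD is handled; deletions (`^`) raise ValueError.
    """
    if "^" in md:
        raise ValueError("deletions in MD are not handled by this sketch")
    mismatches = []
    pos = 0
    # MD alternates run lengths of matching bases with the mismatched reference base.
    for run, base in re.findall(r"(\d+)([A-Z]?)", md):
        pos += int(run)
        if base:
            mismatches.append((pos, base))
            pos += 1
    return mismatches


# CIGAR says 50M, yet MD 20A9C19 (20 + 1 + 9 + 1 + 19 = 50) reports two mismatches.
print(md_mismatches("20A9C19"))  # [(20, 'A'), (30, 'C')]
```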
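The awk step in the entry on calling SNPs and short indels applies two different SNP-quality cutoffs depending on column 3. This Python sketch mirrors that condition so the logic is explicit. `passes_final_filter` is a hypothetical name, and the sample lines are made-up values laid out in the `-c' column order listed in the pileup-columns entry.

```python
def passes_final_filter(line: str) -> bool:
    """Mirror of: awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)'."""
    fields = line.split()          # awk's default whitespace splitting
    ref_base = fields[2]           # column 3: reference base, or `*` for an indel line
    snp_quality = float(fields[5]) # column 6: SNP quality
    min_quality = 50 if ref_base == "*" else 20
    return snp_quality >= min_quality


# Made-up example lines (values are illustrative only).
snp_line = "chr10\t1000\tA\tR\t15\t35\t60\t20"
indel_line = "chr10\t2000\t*\t*/+A\t40\t45\t60\t18"
print(passes_final_filter(snp_line), passes_final_filter(indel_line))  # True False
```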
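The entry on the integer FLAG field suggests `samtools view -X` for readable flags. The same decoding is simply a bitwise test against each defined bit. In this sketch the bit values follow the SAM specification. The names are modeled on those printed by the newer `samtools flags` command (not the single letters of `view -X`), and `decode_flag` is a hypothetical helper.

```python
FLAG_NAMES = [
    (0x1, "PAIRED"), (0x2, "PROPER_PAIR"), (0x4, "UNMAP"), (0x8, "MUNMAP"),
    (0x10, "REVERSE"), (0x20, "MREVERSE"), (0x40, "READ1"), (0x80, "READ2"),
    (0x100, "SECONDARY"), (0x200, "QCFAIL"), (0x400, "DUP"),
    (0x800, "SUPPLEMENTARY"),
]


def decode_flag(flag: int) -> list[str]:
    """Return the names of all bits set in an integer SAM FLAG."""
    return [name for bit, name in FLAG_NAMES if flag & bit]


print(decode_flag(99))  # ['PAIRED', 'PROPER_PAIR', 'MREVERSE', 'READ1']
```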
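The pileup-columns entry contrasts consensus quality (column 5) with SNP quality (column 6) using 17 G and 3 A over a reference A. The toy calculation below reproduces that contrast. It is not the MAQ model samtools uses. It assumes a uniform prior over the genotypes AA, AG and GG, independent reads, and a 1% per-base error spread evenly over the three other bases. The helper names are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def base_prob(read_base: str, allele: str, err: float) -> float:
    """P(observed base | true allele) under a uniform error model."""
    return 1 - err if read_base == allele else err / 3


def genotype_posteriors(counts: dict[str, int], err: float = 0.01) -> dict[str, float]:
    """Posterior of each diploid genotype over the observed alleles (uniform prior)."""
    alleles = sorted(counts)
    likelihood = {}
    for a1, a2 in combinations_with_replacement(alleles, 2):
        lik = 1.0
        for base, n in counts.items():
            # Each read comes from either chromosome with probability 1/2.
            p = 0.5 * base_prob(base, a1, err) + 0.5 * base_prob(base, a2, err)
            lik *= p ** n
        likelihood[a1 + a2] = lik
    total = sum(likelihood.values())
    return {g: lik / total for g, lik in likelihood.items()}


def phred(p: float) -> float:
    return -10 * math.log10(p)


post = genotype_posteriors({"A": 3, "G": 17})
best = max(post, key=post.get)
consensus_q = phred(1 - post[best])  # P(called genotype is wrong)
snp_q = phred(post["AA"])            # P(genotype equals the reference A/A)
print(best, round(consensus_q), round(snp_q))
```

Under these assumptions the call is A/G with a consensus quality of only about 14, because G/G keeps a few percent of the posterior. The SNP quality is in the hundreds, because A/A is essentially impossible with 17 G reads.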
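The entry on `*' in the pileup sequence column says the star keeps the base count equal to the read depth. The sketch below tallies a pileup read-base column and shows that. It assumes the usual symbols of the pileup base column: `.`/`,` for a reference match, letters for mismatches, `*` for a deletion, `^` followed by one mapping-quality character at a read start, `$` at a read end, and `+N...`/`-N...` for indels that follow a position. Any other symbol is counted literally. `count_pileup_bases` is a hypothetical name.

```python
def count_pileup_bases(column: str) -> dict[str, int]:
    """Tally one pileup read-base column; every counted symbol is one read."""
    counts: dict[str, int] = {}
    i = 0
    while i < len(column):
        c = column[i]
        if c == "^":        # read start; the next char is its mapping quality
            i += 2
            continue
        if c == "$":        # read end marker, not a base
            i += 1
            continue
        if c in "+-":       # indel after this position, e.g. +2AG or -1T
            j = i + 1
            while j < len(column) and column[j].isdigit():
                j += 1
            i = j + int(column[i + 1:j])  # skip the length digits and indel bases
            continue
        key = "ref" if c in ".," else c.upper()  # `*` (deletion) is kept as is
        counts[key] = counts.get(key, 0) + 1
        i += 1
    return counts


counts = count_pileup_bases(".,*,G+2AG^!.$g")
print(counts, sum(counts.values()))  # {'ref': 4, '*': 1, 'G': 2} 7
```

The deleted read contributes a `*`, so the total of 7 matches the read depth column.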
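The entries on using a subset of alignments in pileup and on `unique' alignments filter SAM text by NM tag, CIGAR and MAPQ. The Python predicates below roughly mirror the Perl one-liner, the awk one-liner and `samtools view -q 1`, applied to `samtools view -h` output lines. They are a sketch with hypothetical names. Unlike the Perl version, the NM pattern requires a preceding tab, so it only matches a whole optional field. Header lines pass unconditionally, as in the originals.

```python
import re

NM_TAG = re.compile(r"\tNM:i:(\d+)")


def is_header(line: str) -> bool:
    return line.startswith("@")


def keep_few_differences(line: str, max_nm: int = 2) -> bool:
    """Like the Perl filter: keep alignments with NM <= max_nm."""
    if is_header(line):
        return True
    m = NM_TAG.search(line)
    return m is not None and int(m.group(1)) <= max_nm


def keep_ungapped(line: str) -> bool:
    """Like the awk filter: drop alignments whose CIGAR (column 6) has I or D."""
    if is_header(line):
        return True
    cigar = line.split("\t")[5]
    return "I" not in cigar and "D" not in cigar


def keep_reliable(line: str, min_mapq: int = 1) -> bool:
    """Like `samtools view -q 1`: keep alignments with MAPQ (column 5) >= min_mapq."""
    if is_header(line):
        return True
    return int(line.split("\t")[4]) >= min_mapq


header = "@SQ\tSN:chr1\tLN:1000"
r1 = "\t".join(["r1", "0", "chr1", "100", "60", "50M", "*", "0", "0", "*", "*", "NM:i:1"])
r2 = "\t".join(["r2", "0", "chr1", "200", "0", "20M2I28M", "*", "0", "0", "*", "*", "NM:i:3"])
kept = [line for line in [header, r1, r2] if keep_few_differences(line)]
print(len(kept))  # 2
```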
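The entry on how SNPs and indels are filtered says varFilter tests its rules in a fixed order and reports only the first one that fires. The sketch below models that first-match-wins behavior. The `Site` fields are precomputed, hypothetical inputs. The thresholds are also hypothetical, apart from the maximum depth of 100 taken from the `-D100` used in this FAQ. This is not varFilter's actual code or defaults.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Site:
    is_indel: bool
    depth: int
    rms_mapq: int
    snps_in_window: int = 0          # SNPs in the window around this site
    near_hq_indel: bool = False      # close to a high-quality indel
    near_better_indel: bool = False  # another nearby indel has more evidence


# Hypothetical thresholds for illustration only.
MIN_DEPTH, MAX_DEPTH, MAX_WINDOW_SNPS, MIN_RMS_MAPQ = 3, 100, 2, 25

RULES: list[tuple[str, Callable[[Site], bool]]] = [
    ("d", lambda s: s.depth < MIN_DEPTH),
    ("D", lambda s: s.depth > MAX_DEPTH),
    ("W", lambda s: not s.is_indel and s.snps_in_window > MAX_WINDOW_SNPS),
    ("G", lambda s: not s.is_indel and s.near_hq_indel),
    ("Q", lambda s: not s.is_indel and s.rms_mapq < MIN_RMS_MAPQ),
    ("g", lambda s: s.is_indel and s.near_better_indel),
]


def filter_reason(site: Site) -> Optional[str]:
    """Return the letter of the first rule that rejects the site, or None."""
    for letter, rejects in RULES:
        if rejects(site):
            return letter  # later rules are never tested
    return None


# Too deep and poorly mapped: reported as `D`, because D comes before Q.
print(filter_reason(Site(is_indel=False, depth=150, rms_mapq=10)))  # D
```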
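The developer entry on making an aligner work best with samtools asks aligners to place a repetitive read at a random position and, lacking a real mapping-quality model, to write 0 for repetitive reads and 60 for `unique' ones. The sketch below follows that fallback. It assumes the aligner already has the list of hits tied for the best score. `place_read` is a hypothetical name, and a seeded `random.Random` is used only to make the example reproducible.

```python
import random
from typing import Sequence


def place_read(best_hits: Sequence[tuple[str, int]],
               rng: random.Random) -> tuple[str, int, int]:
    """Pick (chrom, pos) for a read and a fallback MAPQ.

    best_hits holds every (chrom, pos) tied for the best alignment score.
    """
    if not best_hits:
        raise ValueError("no hit: report the read as unmapped (flag 0x4)")
    # A random pick keeps repetitive reads in the depth count instead of dropping them.
    chrom, pos = rng.choice(best_hits)
    mapq = 60 if len(best_hits) == 1 else 0  # `unique` -> 60, repetitive -> 0
    return chrom, pos, mapq


rng = random.Random(1)
print(place_read([("chr1", 100)], rng))  # ('chr1', 100, 60)
```

Reporting one random copy keeps the read depth accurate. The MAPQ of 0 tells downstream tools not to trust that placement.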
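The figures quoted in the entry on why mapping quality matters can be checked directly. The block below uses the rounded counts given there: about 2300 versus 400 wrong alignments, and 1.68M versus 1.67M mapped reads.

```latex
\frac{2300 - 400}{2300} \approx 0.83
\qquad
\frac{1.68\,\mathrm{M} - 1.67\,\mathrm{M}}{1.68\,\mathrm{M}} \approx 0.006
```

This gives about 83% fewer false alignments (the ">80%" in the text) for a sensitivity loss of roughly 0.6% with these rounded counts, in line with the "about 1%" stated.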