首页 oracle数据仓库最佳实践白皮书新

oracle数据仓库最佳实践白皮书新

举报
开通vip

oracle数据仓库最佳实践白皮书新BestpracticesforaDataWarehouseonOracleDatabase11gAnOracleWhitePaperSeptember2008BestPracticesforaDataWarehouseonOracleDatabase11gPage2NOTE:Thefollowingisintendedtooutlineourgeneralproductdirection.Itisintendedforinformationpurposesonly,andmaynotbeincorporatedi...

oracle数据仓库最佳实践白皮书新
BestpracticesforaDataWarehouseonOracleDatabase11gAnOracleWhitePaperSeptember2008BestPracticesforaDataWarehouseonOracleDatabase11gPage2NOTE:Thefollowingisintendedtooutlineourgeneralproductdirection.Itisintendedforinformationpurposesonly,andmaynotbeincorporatedintoanycontract.Itisnotacommitmenttodeliveranymaterial,code,orfunctionality,andshouldnotberelieduponinmakingpurchasingdecisions.Thedevelopment,release,andtimingofanyfeaturesorfunctionalitydescribedforOracle’sproductsremainsatthesolediscretionofOracle.BestPracticesforaDataWarehouseonOracleDatabase11gPage3BestPracticesforaDataWarehouseonOracleDatabase11gNote:....................................................................................................................2ExecutiveSummary..........................................................................................4Introduction.......................................................................................................4BalancedConfiguration....................................................................................5Interconnect..................................................................................................6DiskLayout...................................................................................................7LogicalModel....................................................................................................9PhysicalModel.................................................................................................10Staginglayer.................................................................................................10EfficientDataLoading..........................................................................11Foundationlayer-ThirdNormalForm.................................................14Optimizing3NF.....................................................................................15Accesslayer-StarSchema........................................................................19OptimizingStarQueries........................................................................20SystemManagement.......................................................................................22WorkloadManagement..............................................................................22WorkloadMonitoring................................................................................26ResourceManager......................................................................................31OptimizerStatisticsManagement............................................................32InitializationParameter..............................................................................34Memoryallocation.................................................................................34ControllingParallelExecution.............................................................36EnablingefficientIOthroughput........................................................36StarQuery...............................................................................................37Conclusion........................................................................................................37BestPracticesforaDataWarehouseonOracleDatabase11gPage4BestpracticesforaDataWarehouseonOracleDatabase11gEXECUTIVESUMMARYIncreasinglycompaniesarerecognizingthevalueofanenterprisedatawarehouse(EDW).AtrueEDWprovidesasingle360-degreeviewofthebusinessandapowerfulplatformforawidespectrumofbusinessintelligencetasksrangingfrompredictiveanalysistonearreal-timestrategicandtacticaldecisionsupportthroughouttheorganization.InordertoensuringtheEDWwillgettheoptimalperformanceandwillscaleasyourdatasetgrowsyouneedtogetthreefundamentalthingscorrect,thehardwareconfiguration,thedatamodelandthedataloadingprocess.BydesigningthesethreecornerstonescorrectlyyoucanseamlesslyscaleoutyourEDWwithouthavingtoconstantlytuneortweakthesystem.INTRODUCTIONToday’sinformationarchitectureismuchmoredynamicthanitwasjustafewyearsago.BusinessesnowdemandmoreinformationsoonerandtheyaredeliveringanalyticsfromtheirEDWtoanevery-wideningsetofusersandapplicationsthaneverbefore.InordertokeepupwiththisincreaseindemandtheEDWmustnowbenearreal-timeandbehighlyavailable.Howdoyouknowifyourdatawarehouseisgettingthebestpossibleperformance?Orwhetheryou'vemadetherightdecisionstokeepyourmulti-TBsystemhighlyavailable?Basedonoveradecadeofsuccessfulcustomerdatawarehouseimplementationsthiswhitepaperprovidesasetofbestpracticesand“how-to”examplesfordeployingadatawarehouseonOracleDatabase11gandleveragingit’sbest-of-breedfunctionality.Thepaperisdividedintofoursections:Thefirstsectiondealswiththekeyaspectsofconfiguringyourhardwareplatformofchoicetoensureoptimalperformance.Thesecondbrieflydescribesthetwofundamentallogicalmodelsusedfordatabasewarehouses.ThethirdoutlineshowtoimplementthephysicalmodelfortheselogicalmodelsinthemostoptimalmannerinanOracledatabase.Finallythefourthsectioncoverssystemmanagementtechniquesincludingworkloadmanagementanddatabaseconfiguration.BestPracticesforaDataWarehouseonOracleDatabase11gPage5ThispaperisbynomeansacompleteguideforDataWarehousingwithOracle.YoushouldrefertotheOracleDatabase’sdocumentation,especiallytheOracleDataWarehouseGuideandtheVLDBandPartitioningGuide,forcompletedetailsonallofOracle’swarehousingfeatures.BALANCEDCONFIGURATIONRegardlessofthedesignorimplementationofadatawarehousetheinitialkeytogoodperformanceliesinthehardwareconfigurationused.ThishasneverbeenmoreevidentthanwiththerecentincreaseinthenumberofDataWarehouseappliancesinthemarket.ManydatawarehouseoperationsarebaseduponlargetablesscansandotherIO-intensiveoperations,whichperformvastquantitiesofrandomIOs.Inordertoachieveoptimalperformancethehardwareconfigurationmustbesizedendtoendtosustainthislevelofthroughput.Thistypeofhardwareconfigurationiscalledabalancedsystem.Inabalancedsystemallcomponents-fromtheCPUtothedisks-areorchestratedtoworktogethertoguaranteethemaximumpossibleIOthroughput.Buthowdoyougoaboutsizingsuchasystem?YoumustfirstunderstandhowmuchthroughputcapacityisrequiredforyoursystemandhowmuchthroughputeachindividualCPUorcoreinyourconfigurationcandrive.Bothpiecesofinformationcanbedeterminedfromanexistingsystem.However,ifnoenvironmentspecificvaluesareavailable,avalueofapproximately200MB/secIOthroughputpercoreisagoodplanningnumberfordesigningabalancedsystem.AllsubsequentcriticalcomponentsontheIOpath-theHostBusAdapters,fiberchannelconnections,theswitch,thecontroller,andthedisks–havetobesizedappropriately.DiskArray1DiskArray2DiskArray3DiskArray4DiskArray5DiskArray6DiskArray7DiskArray8FC-Switch1FC-Switch2HBA1HBA2HBA1HBA2HBA1HBA2HBA1HBA2Figure1Abalancesystem-4-nodeRACenvironmentFigure1showsaconceptualdiagramofa4-nodeRACsystem.Fourservers(eachwithonedualcoreCPU)areequippedwithtwohostbusadapters(HBAs).TheBestPracticesforaDataWarehouseonOracleDatabase11gPage6serversareconnectedto8diskarraysusing28-portswitches.NotethatthesystemneedstobeconfiguredsuchthateachnodeintheRACconfigurationcanaccessalldiskarrays.Usingourheuristiccalibrationnumberof200MB/secpercore,eachnodecandrive400MB/secofIO.Consequently,eachnodeneedstwo2GbitHBAs:thetwoHBAspernodecansustainathroughputofabout400MB/s(200MB/seach).Inordertosustainafulltablescanonall4nodes,thethroughputrequirementforeachofthefiberchannelswitchisabout4x200MB/s=800MB/s.ItisimportanttokeepinmindthatthereisonlyoneHBAfromeachnodeconnectedtoeachswitch.Ontheothersideeachswitchserves4diskarrays.Eachfiberchannelconnectiontothediskarrayiscapableofdelivering200MB/s,thereforethemaximumthroughputfromthediskarrayintotheswitchisalso800MB/sperswitch.Whensizingthediskarrayonemustensurethedisksandcontrollerscansustainatotalthroughputof200MB/s.Thisisveryimportantastoday’sdiskdrivescontinuetoincreaseinstoragecapacitywithouttherequisiteincreaseinspeed.MakesureyourstoragevendorsizesyourdiskarrayforthroughputandnotjustIOoperationpersecond(IOPS)orcapacity(TB).InterconnectWhenrunningadatawarehouseonaRealApplicationCluster(RAC)itisjustasimportanttosizetheclusterinterconnectwiththesamecareandcautionyouwouldusefortheIOsubsystemthroughput.Theruleofthumbforsizingtheinterconnectisithastosustain200MB/spercore.Whysomuch?Thereasonforthisissimple;oncedataisreadoffofthedisksforagivenqueryitwillresideinprocessmemoryononeofthenodesinthecluster,shouldanotherprocessonadifferentnoderequiresomeorallofthatdatatocompletethisquery,itwillrequestthedatatobepassedovertheinterconnectratherthanbeingreadagainfromdisk(MoredetailsonthisintheParallelQuerysection).IftheinterconnectbandwidthisnotequaltothediskIObandwidthitwillbecomeamajorbottleneckforthequery.Dependingonthenumberofcoresyouhavepernode,andthetotalnumberofnodes,youmayneedtouseInfiniBandratherthanmultipleGibabitEthernetcardsfortheinterconnecttoensureitcansustainthedesiredthroughput.InfiniBandprovideabettersolutionforlargerscalesystemsasitconsumelessCPUpermessagesent/received.Evenforthesmall4nodesystemshowinFigure1theinterconnectneedstobeabletosustain3.2GB/sec(approximately800MB/spernode)toscalelinearlyforoperationsinvolvinginter-nodeparallelexecution.AsingleInfiniBandcardwouldbeabletosustainthisthroughput.However,iftheinterconnectwasbuiltusing4GbEthernetcardsyouwouldneedatleastfourcardspernodetosustainthethroughput.BestPracticesforaDataWarehouseonOracleDatabase11gPage7DiskLayoutOnceyouhaveconfirmedthehardwareconfigurationhasbeensetupasabalancedsystemthatcansustainyourrequiredthroughputyouneedtofocusonyourdisklayout.Oneofthekeyproblemsweseewithexistingdatawarehouseimplementationstodayispoordiskdesign.OftentimeswewillseealargeEnterpriseDataWarehouse(EDW)residingonthesamediskarrayasoneormoreotherapplications.ThisisoftendonebecausetheEDWdoesnotgeneratethenumberofIOPStosaturatethediskarray.WhatisnottakenintoconsiderationinthesesituationsisthefactthattheEDWwilldofewer,largerIOs,whichwilleasilyexceedthediskarraysthroughputcapabilitiesintermsofgigabytespersecond.Ideallyyouwantyourdatawarehousetoresideonitsownstoragearray(s).Whenconfiguringthestoragesubsystemforadatawarehouseitshouldbesimple,efficient,highlyavailableandveryscalable.Itshouldnotbecomplicatedorhardtoscaleout.OneoftheeasiestwaystoachievethisistoapplytheS.A.M.E.methodology(StripeandMirrorEverything).S.A.M.E.canbeimplementedatthehardwarelevelorbyusingASM(AutomaticStorageManagement–acapability1stintroducedinOracleDatabase10g)orbyusingacombinationofboth.ThispaperwillonlydealwithimplementingS.A.M.EusingacombinationofhardwareandASMfeatures.Fromthehardwareperspective,buildredundancyinbyimplementingmirroringatthestoragesubsystemlevelusingRAID1.OncetheRAIDgroupshavebeencreateyoucanturnthemovertoASM.ASMprovidesfilesystemandvolumemanagercapabilities,builtintotheOracleDatabasekernel.TouseASMfordatabasestorage,youmustfirstcreateanASMinstance.OncetheASMinstanceisstarted,youcancreateyourASMdiskgroups.Adiskgroupisasetofdiskdevices,whichASMmanagesasasingleunit.AdiskgroupiscomparabletoaLVM’s(LogicalVolumeManager)volumegrouporastoragegroup.Eachdiskinthediskgroupshouldhavethesamephysicalcharacteristicsincludingsizeandspeed,asASMspreadsdataevenlyacrossallofthedevicesinthediskgrouptooptimizeperformanceandutilization.Ifthedevicesofasinglediskgrouphavedifferentphysicalcharacteristicitispossibletocreateartificialhotspotsorbottlenecks,soitisimportanttoalwaysusesimilardevicesinadiskgroup.ASMusesa1MBstripesizebydefault.ASMincreasesdatabaseavailabilitybyprovidingtheabilitytoaddorremovediskdevicesfromdiskgroupswithoutshuttingdowntheASMinstanceorthedatabasethatusesit.ASMautomaticallyrebalancesthedata(files)acrossthediskgroupafterdiskshavebeenaddedorremoved.Thiscapabilityallowsyoutoseamlesslyscaleoutyourdatawarehouse.ForadatawarehouseenvironmentaminimumoftwoASMdiskgroupsarerecommend,onefordata(DATA)andonefortheFlashRecoverArea(FRA).ForthesakeofsimplicitythefollowingexamplewillonlyfocusoncreatingtheDATAdiskgroup.InthesimpleRACenvironmentinFigure1eachdiskarrayhas16TipsforabalancedsystemTotalthroughput=#coresX200MBUse1HBAportperCPUUse1diskcontrollerperHBAPortMaxof10physicaldiskspercontrollerUsemoresmallerdrives(146GBor300GB)Useminimumof4GBofMemorypercoreUseRAID1withASMInterconnectbandwidth=IOsubsystembandwidthBestPracticesforaDataWarehouseonOracleDatabase11gPage8physicaldisks.Fromthese16disks,eightRAID1groupswillbecreated(seeFigure2).LUNA1LUNA2LUNA3LUNA4LUNA5LUNA6LUNA7LUNA8Disk12Disk8Disk16Disk4Disk11Disk7Disk15Disk3Disk10Disk6Disk14Disk2Disk9Disk5Disk13Disk1Figure2EightRaid1groupsarecreatedfromthe16rawdisksOncetheRAIDgroupshavebeenestablishedtheASMdiskgroupDATAcanbecreated.TheASMdiskgroupwillcontaintheeightRAID1pairs.WhenatablespaceiscreatedASMwillallocatespaceinthediskgroupin1MBchunksorallocationunitsinaroundrobinfashion.ASMstartseachspaceallocationwithadifferentdiskandusesadifferentrandomordereachtimetoensuredataisevenlydistributedacrossalldisks(seeFigure3).LUNA1LUNA2LUNA2LUNA3LUNA3LUNA4LUNA4LUNA5LUNA5LUNA6LUNA6LUNA7LUNA7LUNA8LUNA8Disk12Disk12Disk8Disk8Disk16Disk16Disk4Disk4Disk11Disk11Disk7Disk7Disk15Disk15Disk3Disk3Disk10Disk10Disk6Disk6Disk14Disk14Disk2Disk2Disk9Disk9Disk5Disk5Disk13Disk13Disk1Disk1ASMstripe112345678812345677ASMDiskgroupLUNA1LUNA1LUNA2LUNA2LUNA3LUNA3LUNA4LUNA4LUNA5LUNA5LUNA6LUNA6LUNA7LUNA7LUNA8LUNA8Figure3ASMdiskgroupcontains8RAID1groups&spaceisallocatedin1MBchunksBestPracticesforaDataWarehouseonOracleDatabase11gPage9LOGICALMODELThedistinctionbetweenalogicalmodelandaphysicalmodelissometimesconfusing.Inthispaperalogicalmodelforadatawarehousewillbetreatedmoreasaconceptualorabstractmodel,amoreideologicalviewofwhatthedatawarehouseshouldbe.ThephysicalmodelwilldescribehowthedatawarehouseisactuallybuiltinanOracledatabase.Alogicalmodelisanessentialpartofthedevelopmentprocessforadatawarehouse.Itallowsyoutodefinethetypesofinformationneededinthedatawarehousetoanswerthebusinessquestionsandthelogicalrelationshipsbetweendifferentpartsoftheinformation.Itshouldbesimple,easilyunderstoodandhavenoregardforthephysicaldatabase,thehardwarethatwillbeusedtorunthesystemorthetoolsthatenduserswillusetoaccessit.Therearetwoclassicmodelsusedfordatawarehouse,ThirdNormalFormanddimensionalorStarSchema.ThirdNormalForm(3NF)isaclassicalrelational-databasemodellingtechniquethatminimizesdataredundancythroughnormalization.A3NFschemaisaneutralschemadesignindependentofanyapplication,andtypicallyhasalargenumberoftables.Itpreservesadetailedrecordofeachtransactionwithoutanydataredundancyandallowsforrichencodingofattributesandallrelationshipsbetweendataelements.Userstypicallyrequireasolidunderstandingofthedatainordertonavigatethemoreelaboratestructurereliably.TheStarSchemaissocalledbecausethediagramresemblesastar,withpointsradiatingfromacenter.Thecenterofthestarconsistsofoneormorefacttablesandthepointsofthestararethedimensiontables.Figure4Star-oneormorefacttablessurroundedbymultipledimensiontablesFacttablesarethelargetablesthatstorebusinessmeasurementsandtypicallyhaveforeignkeystothedimensiontables.Dimensiontables,alsoknownaslookuporreferencetables,containtherelativelystaticordescriptivedatainthedatawarehouse.TheStarSchemabordersonaphysicalmodel,asdrillpaths,hierarchyandqueryprofileareembeddedinthedatamodelitselfratherthanthedata.ThisinBestPracticesforaDataWarehouseonOracleDatabase11gPage10partatleast,thisiswhatmakesnavigationofthemodelsostraightforwardforendusers.Thereisoftenmuchdiscussionregardingthe‘best’modelingapproachtotakeforanygivenDataWarehousewitheachstyle,classic3NFordimensionalhavingtheirownstrengthsandweaknesses.ItislikelythatNextGenerationDataWarehouseswillneedtodomoretoembracethebenefitsofeachmodeltyperatherthanrelyonjustone-thisistheapproachthatOracleadoptinourDataWarehouseReferenceArchitecture.Thisisalsotrueofthemajorityofourcustomerswhouseamixtureofbothmodelforms.Mostimportantisforyoutodesignyourmodelaccordingtoyourspecificbusinessneeds.PHYSICALMODELThestartingpointforthephysicalmodelisthelogicalmodel.Thephysicalmodelshouldmirrorthelogicalmodelasmuchaspossible,althoughsomechangesinthestructureofthetablesand/orcolumnsmaybenecessary.Inadditionthephysicalmodelwillincludestagingormaintancetablesthatareusuallynotincludedinthelogicalmodel.Figure5belowshowsablueprintofthephysicallayerswedefineinourDWReferenceArchitectureandseeinmanydatawarehouseenvironments.Althoughyourenvironmentmaynothavesuchclearlydefinedlayersyoushouldhavesomeaspectsofeachlayerinyourdatabasetoensureitwillcontinuetoscaleasitincreasesinsizeandcomplexity.Figure5PhysicallayersofaDataWarehouseStaginglayerThestaginglayerenablesthespeedyextraction,transformationandloading(ETL)ofdatafromyouroperationalsystemsintothedatawarehousewithoutdistributinganyofthebusinessusers.Itisinthislayerthemuchofthecomplexdatatransformationanddata-qualityprocessingwilloccur.Thetablesinthestaginglayerarenormallysegregatedfromthe“live”datawarehouse.Themostbasicapproachforthestaginglayeristohaveitbeanidenticalschematotheonethatexistsinthesourceoperationalsystem(s)butwithsomestructuralchangestothetables,suchasrangepartitioning.Itisalsopossiblethatinsomeimplementationsthislayerisnotnecessary,asalldatatransformationprocessingwillbedone“onthefly”asdataisextractedfromthesourcesystembeforeitisinserteddirectlyintoBestPracticesforaDataWarehouseonOracleDatabase11gPage11theFoundationLayer.Eitherwayyouwillstillhavetoloaddataintothewarehouse.EfficientDataLoadingWhetheryouareloadingintoastagelayerordirectlyintothefoundationlayerthegoalshouldbethesame,getthedataintothewarehouseinthemostefficientandexpedientmanner.Oracleoffersseveraldataloadingoptions•ExternaltableorSQL*Loader•OracleDataPump(import&export)•ChangeDataCaptureorOraclestreamsfortricklefeeds•OracleTransparentGatewaysWhichapproachshouldyoutake?Obviouslythiswilldependonthesourceandformatofthedatayoureceive.Inthispaperwewilldealwiththeloadingofdatafromflatfiles.IfyouareloadingfromflatfilesintoOracleyouhavetwooptions,SQL*Loaderorexternaltables.WestronglyrecommendthatyouloadusingexternaltablesratherthanSQL*Loader.WhenSQL*Loaderisusedtoloaddatainparallel,thedataisloadedintotemporaryextents,onlywhenthetransactioniscommittedarethetemporaryextentsmergedintotheactualtable.Anyexistingspaceinpartiallyfullextentsinthetablewillbeskipped.Forhighlypartitionedtablesthiscouldpotentiallyleadtoalotofwastedspace.ExternalTablesOracle’smostsophisticatedapproachtoloadingflatfilesisthroughtheuseofexternaltables.Anexternaltableallowsyoutoaccessdatainexternalsources(flatfile)asifitwereinatableinthedatabase.ThismeansthatexternalfilescanbequerieddirectlyandinparallelusingthefullpowerofSQL,PL/SQL,andJava.Anexternaltableiscreatedusingthestandardcreatetablesyntaxexceptitrequiresanadditionalclause.ThefollowingSQLcommandcreatesanexternaltablefortheflatfile‘sales_data_for_january.dat’.CREATETABLEext_tab_for_sales_data(PriceNUMBER(6),QuantityNUMBER(6),Time_idDATE,Cust_idNUMBER(12),Prod_idNUMBER(12))ORGANIZATIONEXTERNAL(TYPEoracle_loaderDEFAULTDIRECTORYadminACCESSPARAMETERSBestPracticesforaDataWarehouseonOracleDatabase11gPage12(RECORDSDELIMITEDBYnewlineBADFILE'ulcase1.bad'LOGFILE'ulcase1.log'FIELDSTERMINATEDBY","(PriceINTEGEREXTERNAL(6),QunantityINTEGEREXTERNAL(6),Time_idDATE)LOCATION(sales_data_for_january.dat))REJECTLIMITUNLIMITED;ThemostcommonapproachwhenloadingdatafromanexternaltableistodoaCreateTableAsSelect(CTAS)statementoranInsertAsSelect(IAS)statementintoanexistingtable.ForexamplethesimpleSQLstatementbelowwillinsertalloftherowsinaflatfileintopartitionp2oftheSalesfacttable.InsertintoSalespartition(p2)Select*Fromext_tab_for_sales_data;DirectPathLoadThekeytogoodloadperformanceistousedirectpathloadwhereverpossible.Adirectpathloadparsestheinputdataaccordingtothedescriptiongivenintheexternaltabledefinition,convertsthedataforeachinputfieldtoitscorrespondingOraclecolumndatatype,andbuildsacolumnarraystructure.ThesecolumnarraystructuresarethenusedtoformatOracledatablocksandbuildindexkeys.Thenewlyformatteddatabaseblocksarethenwrittendirectlytothedatabase(multipleblocksperI/OrequestusingasynchronouswritesifthehostplatformsupportsasynchronousI/O)bypassingthedatabasebuffercache.ACTASwillalw
本文档为【oracle数据仓库最佳实践白皮书新】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑, 图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。
下载需要: 免费 已有0 人下载
最新资料
资料动态
专题动态
个人认证用户
诺诺
暂无简介~
格式:pdf
大小:3MB
软件:PDF阅读器
页数:0
分类:房地产
上传时间:2020-02-19
浏览量:0