Archive for November, 2009

This is my SQLPASS recap:

Sessions

Day 0 – Pre-Conference

  • Building a Microsoft Data Warehousing Platform by Brian Knight – excellent introduction to data cleansing, warehousing, and even data mining!

Day 1

  • Keynote
  • Spotlight: SSRS in the Enterprise – How to Survive with a Smile (Wayne Snyder)
  • To Partition or Not To Partition – The Questions … and the answers (Don Vilen)
  • SQL Server Execution Plans from Compilation to Caching to Reuse (Maciej Pilecki)
  • Designing and Building Private SQL Server Clouds

Day 2

  • Quest Software Breakfast Presentation about DMVs (Kevin Kline – @kekline, Brent Ozar – @brento, Tim Ford @sqlagentman, Buck Woody @buckwoody, Louis Davidson @drsql) – Wish there was another Quest breakfast for tomorrow
  • Keynote
  • Applied SSIS Design Patterns (Andy Leonard @AndyLeonard)
  • DRP101: Learn the Difference Between Your Log and Your Cluster (Brent Ozar @brento)
  • Women in Technology Luncheon sponsored by CA
  • Adding SSRS Report Bells and Whistles (Jessica Moss @jessicamoss)
  • Security and Deployment with Reporting Services 2008 R2 (Lukasz Pawlowski)

Day 3

  • Keynote
  • and highlight of the week: got invited to the bloggers’ table! Thanks Grant (Grant Fritchey – @GFritchey) and Brent (Brent Ozar – @brento)
  • said hi to Peter de Betta (@debettap)
  • Advanced Techniques for SSIS Package Development (David Noor)
  • Super Reports: Patterns and Recipes (Paul Turley) — and at the same time watching hilarious tweets from the @buckwoody comedy hour .. err session
  • SQLCAT: SharePoint on SQL Server – Implementation, Configuration and Tuning (Burzin Patel) — and still following the hilarious back-and-forth tweets from the @buckwoody comedy hour .. err session
  • Storage for the DBA (Denny Cherry @mrdenny)

Last Day – Post-Conference

  • with Itzik Ben-Gan on Query Tuning and Indexing for SQL Server 2005 and 2008 (Grant did warn that “bleeding in the ears” is normal, and I have to say, he wasn’t kidding!)

There were many sessions I would have wanted to attend, but they all happened at the same time – which is why I’m getting the DVD :) And all of the presentations have been great!

Inspiration

Additional highlights of my first SQLPASS:

  • I mentioned it already, but again worth mentioning – highlight of the week: got invited to the bloggers’ table! Thanks Grant (Grant Fritchey – @GFritchey) and Brent (Brent Ozar – @brento)
  • Kevin Kline (@kekline) remembering my name! :) (his first recollection was “ninja.. you’re with the ninjas…”)
  • Quest Software Breakfast Presentation was great
  • Women in Technology Luncheon was (is) very inspiring

SQLPASS Twitter Connection

And even though this is the first SQLPASS I have attended, I think it has been one of the coolest so far (based on remarks I heard). Bloggers and “tweeps” have made their mark on this conference. And here are some of the “tweeps” I have met so far… (the pleasure and privilege is mine!) ~ I’m @sqlbelle

  • @brento
  • @kekline
  • @GFritchey
  • @brianknight
  • @AndyLeonard
  • @wendy_dance
  • @sqlfool
  • @pinaldave
  • @joewebb
  • @bknight
  • @peshkaj
  • @stuarta
  • @lotsahelp
  • @sqldba
  • @paschott
  • @mike_walsh
  • @DamonRipper
  • @SQLDBA_Dude
  • @SQLSocialite
  • @GlennAlanBerry
  • @llangit
  • @SQLServerMag

It is very inspiring to meet so many of these people. If you meet them, you’ll know right away how passionate they are about their craft, about learning, and about sharing their knowledge and helping others. It is very humbling.

Swags

And .. thanks to the vendors for the swags :)

  • Quest – for the free signed Database Benchmarking book (thanks Kevin Kline!) – plus the twitter shirt!
  • Red Gate for the free books on Exceptional DBA and DBA Tacklebox
  • Idera – I got to spin twice .. got $1 and a cup holder. But it was good to see our buddy David Terry!

Lessons Learned

Aaron Bertrand mentioned in his post Another PASS has come and gone – “If you didn’t make it to PASS this year, I feel bad for you.” ~ in some ways I do too.

We have our own reasons for not being able to attend conferences such as SQLPASS, but after attending my first PASS, I realize now how much I have been missing. I feel I’ve gotten SO MUCH out of this year’s SQLPASS conference (networking, learning, just being able to feel “at home” in the company of people who love the same technologies and work you love), and I will try every way I can to ensure I attend again next year – SQLPASS2010.

One (1) day Post-PASS

  • missing the #sqlpass tweets
  • volunteered for VANPASS
  • Started reading Lynn Langit’s Smart Business Intelligence Solutions and SQL Server MVP Deep Dives
  • Itzik Ben-Gan, I just ordered your books! That’s how awesome I thought the sessions were

PS – I will be sharing some tips/tricks/enlightenments I’ve learned in the next few blog posts


I just finished the SQLPASS pre-conference from Brian Knight – Building a Microsoft Data Warehousing Platform, and I’m so pumped to do more BI! Reporting, Data Warehousing, Data Mining – all fun stuff.

In a nutshell, this is what we learned today (and it’s a lot of acronyms):
– SSIS, ETL
– SSAS
– Cubes, Data Warehouse
– Data Mining
– MDX and DMX (on a very very high level)

It was a lot of details and demos jammed into one day, but I wouldn’t have it any other way – overall the session was just great! Loved it!

Here are some bits and pieces of information / tips / useful sites / resources (probably mostly for my benefit, so I remember. But if you find it useful, then all the better):

  • 65% of data warehousing projects fail, and one of the reasons for this is lack of communication. Business users want something, but devs deliver something else.
  • Waterfall model for DW projects – hard to react to change. Agile is better – Brian recommends one-month iterations (i.e., a complete deliverable cycle per month)
  • Why a data warehouse? Consolidated data, aggregates, reporting, analytics, archived data (e.g., if you need to keep data for years and years)
  • Business Key = Alternate Key = PK from the source
  • Surrogate Keys – insulate the data warehouse from source changes (a small T-SQL sketch of both key types, plus the unknown member, is at the end of this list)
  • In one of Pragmatic Works’ projects, they lowered SSIS loading time from 4.5 hours to 3 minutes – it’s all about choosing the “better” way. In SSIS, most components are in memory and are synchronous.
    • Most SSIS components are synchronous – good for performance
    • Some are asynchronous and partially blocking, not so good – for example UNION ALL and MERGE – slight overhead
    • Some are asynchronous and fully blocking – not good at all – for example sort or aggregate transforms – but sometimes there are reasons to use these
  • In BIDS, devenv.exe -NOSPLASH to prevent splash screen :)
  • In SSIS – Protection Levels for packages: EncryptSensitiveWithUserKey is generally a bad idea because it ties the package to your user profile; if you transfer the package, it may report as corrupt. EncryptAllWithPassword is best; ServerStorage relies on msdb
  • Different types of dimension attribute changes (SCD types)
    • Type 0 – Fixed Attribute – cannot change under any circumstance; changes are ignored
    • Type 1 – Changing Attribute – updates in place; no history kept
    • Type 2 – Historical Attribute – tracks changes by creating a new record per update; biggest disadvantage is the additional storage
  • Good practice – smaller packages; easier to change and maintain
  • Slowly Changing Dimension (SCD) Wizard – data types need to line up and you need to be very careful; it’s easy to break. Also, when you need to customize later on and relaunch the wizard, previous customizations will be dropped, so keep documentation on how to recreate them
  • Another problem with the SCD Wizard – similar to a cursor, it goes through rows one by one; an alternative is to create a staging table first. Use T-SQL for what it’s good at and SSIS for what it’s good at – if you can do sorts in T-SQL, do the sorts in T-SQL (a staging-table MERGE sketch is at the end of this list)
  • Inferred members – dimension members that are not yet in the source but already show up in the fact data; for example, if it takes 2 weeks to add a salesperson in HR, but the salesperson already has sales from Day 1
  • Better load – “Fast Load”; also configure “Maximum Insert Commit Size”
  • Every dimension should have an unknown member
  • ALWAYS alias your columns
  • Alternatives for detecting updates – instead of comparing all columns in one big OR expression, use CHECKSUM or HASHBYTES. The problem with CHECKSUM is that it is not guaranteed to be unique. Ex.
    HASHBYTES('sha', ISNULL(col1, '') + ISNULL(col2, ''))

    So your SSIS expression column can just have something similar to: (DT_WSTR, 64)yourhashcol_src != (DT_WSTR,64)yourhashcol_dw (a fuller, pure T-SQL version is at the end of this list)

  • SQL Cache Transform – creates cache file, can be reused multiple times; 80% faster in SQL Server 2008; SQL Server 2005 was bad with threading
  • Terminology – data warehouse (SSAS 2008)
    • Measure Group = Fact Table
    • Attributes = Columns in a Dimension
    • Member = Row in an Attribute
  • Hierarchies are shortcuts for end users; you can ultimately hide the actual attributes (AttributeHierarchyVisible = False) and just expose the hierarchies
  • Always reconnect to cube after dimension or attribute changes so changes are apparent
  • Set up attribute relationships ex:
    Date SK -- Date -- Month -- Quarter -- Year 

    Note that the uniqueness of Quarter really comes from Year + Quarter (a date-dimension sketch is at the end of this list).

  • Set up constraints/uniqueness through the “Key Column”; set what’s displayed to the user using the “Name Column”. You will probably need to explore your dimensions to see which attributes need this.
  • For any time dimensions, add “Time Intelligence”. As a side effect, this automatically adds additional filters like “Yesterday”, “This Week”, “Last Week” etc in Excel
  • “Mushroom Cloud” – queries that run but return misleading results. For example, if you drag Product and then EmployeeCount, you get the same EmployeeCount value repeated for every Product.
  • In the Cube Editor, on the Measure Group, setting IgnoreUnrelatedDimensions = False helps prevent “mushroom clouds”; the result will display blank instead of giving you an invalid value
  • Discretization Method – allows you to create dynamic buckets of data
    • Automatic
    • EqualAreas – you just specify the number of groups
    • Clusters – groups based on your data
  • HideMemberIf (trying to remember why this option was highlighted in the session)
  • MDX is the language for querying the cube/SSAS; it is used for creating KPIs and calculations
  • A tuple in MDX points to a cell; you can work with six (6) different axes. A good book is MS Press MDX Step by Step
  • When you want to show data, put it in the SELECT; if you don’t want to show it but want to filter by it, put it in the WHERE
  • Parentheses in MDX represent a tuple
  • SSRS and Excel can generate some MDX
  • Calculations > Calculated Member – does not store anything in the cube, just the metadata; the actual calculations are done on the fly
    • PRO: only the formula is stored in the cube; no extra storage needed
    • CON: only the formula is stored in the cube; performance will be worse
  • Usage Based Optimization: in SSMS, connect to SSAS and set CreateQueryLogTable = true; you can also choose to “Review Aggregation Usage”
  • How to lock down data in your cube: go to your dimension, then to Roles, and add roles/users
  • Option: Enable Visual Totals – like an implicit WHERE; off by default, which means totals still reflect all data even if the detail is locked down. For example, if a user can only see US data, his/her totals should be just for US, but s/he will see the US totals and then a different Grand Total. Enable this to restrict the totals to only the data they can see
  • Deployment wizard in SSAS – creates .asdatabase file that contains metadata required to deploy the cube
  • You don’t always need to process the whole cube; you can do incremental processing
  • Partitioning in SSAS – in about 50K records, you will start to feel the slowness
  • SQL Server Standard allows 3 partitions; SQL Server Enterprise allows any number
  • New feature in SQL Server 2008 Enterprise – Proactive Caching
  • In SQL Server 2008, you can use profiler on SSAS to deconstruct MDX queries. In Excel, you can also right click on some columns then check MDX
  • There is an option to do offline OLAP: Excel > Options > OLAP Tools > Offline OLAP. You can get to your data, but actions and drillthrough are off
  • Double-hop issue in SSRS and SSAS when you build reports on top of the cube: check Kerberos, SPNs
  • You don’t need a cube to do data mining! You can also allow your users to do predictions based on flat tables or views.
  • Sample data mining scenario: used to auto approve insurance claims; can be used for fraud detection; can be integrated with SSIS in conditional tasks
  • For data mining structures, Decision Trees is the best way to start; as far as algorithms go, you can even create your own, or purchase 3rd party algorithms
  • Users can get/do data mining through Excel
  • Users can do drillthroughs with data mining
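
To make the business key vs. surrogate key point (and the “unknown member” practice) concrete, here is a minimal T-SQL sketch. The table and column names (DimProduct, ProductSK, ProductBK) are hypothetical placeholders, not anything from the session.

    -- Hypothetical product dimension: ProductSK is the surrogate key owned by the
    -- warehouse; ProductBK is the business (alternate) key, i.e. the PK from the source.
    CREATE TABLE dbo.DimProduct
    (
        ProductSK   INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key
        ProductBK   INT               NOT NULL,             -- business key (source PK)
        ProductName NVARCHAR(100)     NOT NULL,
        Category    NVARCHAR(50)      NOT NULL
    );

    -- "Unknown" member, so fact rows that cannot be matched still point to a dimension row.
    SET IDENTITY_INSERT dbo.DimProduct ON;
    INSERT INTO dbo.DimProduct (ProductSK, ProductBK, ProductName, Category)
    VALUES (-1, -1, N'Unknown', N'Unknown');
    SET IDENTITY_INSERT dbo.DimProduct OFF;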
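
For the staging-table alternative to the row-by-row SCD Wizard, here is a sketch of the Type 1 (overwrite) case only, assuming a hypothetical dbo.StgProduct that an SSIS data flow has already loaded; MERGE is available starting with SQL Server 2008.

    -- Set-based Type 1 load from a staging table instead of the row-by-row SCD Wizard.
    MERGE dbo.DimProduct AS dim
    USING dbo.StgProduct AS stg
        ON dim.ProductBK = stg.ProductBK
    WHEN MATCHED AND (dim.ProductName <> stg.ProductName
                   OR dim.Category    <> stg.Category) THEN
        UPDATE SET dim.ProductName = stg.ProductName,
                   dim.Category    = stg.Category
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ProductBK, ProductName, Category)
        VALUES (stg.ProductBK, stg.ProductName, stg.Category);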
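
And here is the HASHBYTES change-detection idea expressed entirely in T-SQL (the same comparison works as an SSIS expression, as noted above). The N'|' separator guards against false matches such as 'ab' + 'c' vs. 'a' + 'bc'; the table and column names are again hypothetical.

    -- Rows whose "interesting" columns changed, detected by comparing one hash per side
    -- instead of ORing every column. In practice you would usually persist the hash
    -- in a column during the ETL rather than compute it on the fly like this.
    SELECT stg.ProductBK
    FROM dbo.StgProduct AS stg
    JOIN dbo.DimProduct AS dim
        ON dim.ProductBK = stg.ProductBK
    WHERE HASHBYTES('SHA1', ISNULL(stg.ProductName, N'') + N'|' + ISNULL(stg.Category, N''))
       <> HASHBYTES('SHA1', ISNULL(dim.ProductName, N'') + N'|' + ISNULL(dim.Category, N''));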
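
Finally, the “uniqueness of Quarter is really Year + Quarter” note can be handled in the relational layer so the Quarter attribute gets a truly unique key column. A hypothetical date-dimension view (dbo.DimDate and its columns are assumptions):

    -- Quarter alone (1-4) repeats every year, so build a composite key such as
    -- 20094 for Q4 2009 and use it as the Quarter attribute's Key Column.
    CREATE VIEW dbo.vDimDate
    AS
    SELECT  DateSK,
            [Date],
            [Year],
            [Quarter],
            ([Year] * 10) + [Quarter] AS QuarterKey,  -- unique per year + quarter
            'Q' + CAST([Quarter] AS VARCHAR(1)) + ' ' + CAST([Year] AS VARCHAR(4)) AS QuarterName
    FROM dbo.DimDate;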

Useful Resources / Sites

  • pragmaticworks.com
  • sqlshare.com
  • ssas-info.com
  • dtsxchange.com
  • sqlserverdatamining.com
  • Marco Russo’s Expert Cube Development with Microsoft SQL Server 2008 Analysis Services
  • Brian Knight’s 24-Hour Trainer: Microsoft SQL Server 2008 Integration Services (Wrox Programmer to Programmer)
  • MS Press Microsoft SQL Server 2008 MDX Step by Step (Step By Step (Microsoft))