Code never lies, comments sometimes do [Ron Jeffries]

Follow on Twitter

@ZammaCode

Thursday, December 21, 2023

Wednesday, December 20, 2023

What is Snowflake?





  • Snowflake is a self-managed data platform or data cloud
  • Snowflake makes storing, processing, and analyzing data faster and more flexible compared to traditional options
  • Snowflake uses a brand-new SQL query engine designed specifically for the cloud
  • Self-managed means:
    • You don't need to install, configure, or manage any physical or virtual hardware or software
    • Snowflake takes care of ongoing maintenance, upgrades, and tuning for you


Thursday, November 30, 2023

Wednesday, August 2, 2023

Wednesday, July 12, 2023

What is Data Science


  • Data science is about making data useful for decision making.
  • Data science encompasses three disciplines:
    • Statistics: decision making under uncertainty, based on data. The excellence of statistics is rigor.
    • Machine learning: automating decision making under uncertainty, based on data. The excellence of machine learning is performance.
    • Analytics: discovering the unknown, when you don't know in advance how many decisions you want to make. The excellence of an analyst is speed.

Thursday, June 22, 2023

Joins in DAX


  • DAX relationships have some limitations:
    • The matching criterion always uses the = operator; other operators such as <>, >=, <, and <= are not allowed.
    • Only one-to-many and one-to-one relationships are supported.
  • To overcome these limitations, DAX provides join functions.
  • CROSSJOIN(tab1, tab2, ...)
    • CROSSJOIN returns a table that is the Cartesian product of all rows from all the specified tables.
    • A join that returns a Cartesian product is known as a cross join (it is not the same as a full outer join).
  • GENERATE(tab1, tab2)
    • GENERATE takes exactly two tables and returns the Cartesian product between each row of tab1 and the table that results from evaluating tab2 in the context of that row.
    • When tab2 does not depend on the current row of tab1, the result is the same cross join that CROSSJOIN produces.
    • By using the FILTER function with GENERATE:
      • We can use more than one matching condition, including operators other than =.
      • We can create an inner join, i.e. keep only the matching rows from tab1 and tab2.
    • The FILTER function is the DAX equivalent of the WHERE clause in a T-SQL statement.
    • When GENERATE is used to create a calculated table, the two tables must not share any column names.
      • Columns can be renamed with SELECTCOLUMNS(tab, col_new_name, exp_ret_col, [col_new_name, exp_ret_col], …).

  • NATURALINNERJOIN(left_tab, right_tab)
    • NATURALINNERJOIN performs an inner join on the columns the two tables have in common (same name), i.e. only matching rows from the left and right tables are returned.
    • The join columns must come from the same physical source table, i.e. they must share the same data lineage.
  • NATURALLEFTOUTERJOIN(left_tab, right_tab)
    • NATURALLEFTOUTERJOIN performs a left outer join, i.e. all rows from the left table plus the matching rows from the right table are returned.
  • EXCEPT(left_tab, right_tab)
    • EXCEPT performs a left anti join, i.e. it returns the rows of the left table that do not appear in the right table.
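The join functions above can be approximated outside Power BI. The following is a minimal Python sketch, not DAX. Tables are lists of dictionaries and None stands in for BLANK. The sample tables and the helper names (`crossjoin`, `generate_filter`, `natural_inner_join`, `natural_left_outer_join`, `except_rows`) are hypothetical. Unlike the real DAX functions, the natural joins here match columns by name only and ignore data lineage.

```python
from itertools import product

# Hypothetical sample tables: each table is a list of rows, each row a dict.
sales = [
    {"ProductKey": 1, "Qty": 5},
    {"ProductKey": 2, "Qty": 12},
    {"ProductKey": 3, "Qty": 30},
]
products = [
    {"ProductKey": 1, "Name": "Pen"},
    {"ProductKey": 2, "Name": "Book"},
]
bands = [  # hypothetical quantity bands, used for a non-equi join
    {"Band": "Small", "MinQty": 0, "MaxQty": 10},
    {"Band": "Large", "MinQty": 10, "MaxQty": 100},
]


def crossjoin(*tables):
    """Cartesian product of all tables, like CROSSJOIN."""
    result = []
    for combo in product(*tables):
        row = {}
        for part in combo:
            if set(row) & set(part):
                raise ValueError("column names must be unique across tables")
            row.update(part)
        result.append(row)
    return result


def generate_filter(left, right, condition):
    """Like FILTER(GENERATE(left, right), condition): keep only pairs that match."""
    return [{**l, **r} for l in left for r in right if condition(l, r)]


def _common_columns(left, right):
    return [c for c in left[0] if c in right[0]]


def natural_inner_join(left, right):
    """Inner join on the columns both tables share by name."""
    common = _common_columns(left, right)
    return [{**l, **r} for l in left for r in right
            if all(l[c] == r[c] for c in common)]


def natural_left_outer_join(left, right):
    """Keep every left row; right-only columns become None (BLANK) when unmatched."""
    common = _common_columns(left, right)
    right_only = [c for c in right[0] if c not in common]
    result = []
    for l in left:
        matches = [r for r in right if all(l[c] == r[c] for c in common)]
        if matches:
            result.extend({**l, **r} for r in matches)
        else:
            result.append({**l, **{c: None for c in right_only}})
    return result


def except_rows(left, right):
    """Left anti join, like EXCEPT: rows of left that do not appear in right."""
    return [row for row in left if row not in right]


# Non-equi join: each sale gets the band whose range contains its quantity.
banded = generate_filter(sales, bands,
                         lambda s, b: b["MinQty"] <= s["Qty"] < b["MaxQty"])
inner = natural_inner_join(sales, products)            # ProductKey 3 is dropped
left_outer = natural_left_outer_join(sales, products)  # ProductKey 3 kept, Name is None
unknown_keys = except_rows([{"ProductKey": r["ProductKey"]} for r in sales],
                           [{"ProductKey": r["ProductKey"]} for r in products])
```

The condition passed to `generate_filter` is an arbitrary predicate. It can therefore use <= and <, as FILTER can inside GENERATE, which an ordinary = relationship cannot.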


Tuesday, June 20, 2023

Logical Functions in DAX


  • Logical functions test conditions and either return TRUE or FALSE or return a value selected by the condition.
  • The following are the DAX logical functions.
  • IF(logical_test, result_true, result_false) 
    • Evaluates a condition and returns one value if it is TRUE, and another value if it is FALSE.

    • The IF logical function can also be nested. 
  • IF.EAGER(logical_test, result_true, result_false)
    • Like IF, IF.EAGER evaluates a condition and returns one value if it is TRUE, and another value if it is FALSE.
    • IF.EAGER has the same functional behavior as IF, but performance may differ because it uses an eager execution plan.
    • IF uses strict evaluation (only the branch selected by the condition is computed), while IF.EAGER uses eager evaluation (both branches are computed).
      • For example, in IF(2<1, 1+2, 4+1), Power BI computes only the false branch (4+1) because the condition 2<1 is FALSE; the true branch (1+2) is not computed.
      • In IF.EAGER(2<1, 1+2, 4+1), both the true and the false branches are computed.
      • When both branches end up being computed anyway, IF.EAGER can perform better than IF.
  • SWITCH(expression, option1, result1, option2, result2, ..., elseResult)
    • SWITCH evaluates an expression against a list of values and returns the result that corresponds to the first matching value (or elseResult if nothing matches).
    • By default, SWITCH checks for equality. To use other operators such as less than or greater than, replace expression with the TRUE() function and write each option as a condition.
    • SWITCH then returns the result of the first option/condition that evaluates to TRUE.

  • AND(logical_val_1, logical_val_2)
    • AND returns TRUE if both arguments are TRUE; otherwise it returns FALSE.
    • The AND function works like the && logical operator.
  • OR(logical_val_1, logical_val_2)
    • OR returns TRUE if at least one of the arguments is TRUE; otherwise it returns FALSE.
    • The OR function works like the || logical operator.
  • NOT(logical_val)
    • NOT changes FALSE to TRUE and TRUE to FALSE.
  • COALESCE(val1, val2, val3, ...)
    • COALESCE accepts multiple arguments and returns the first non-blank argument. If the first argument is blank, it returns the value of the second argument, and so on. If all arguments are blank, it returns BLANK.
  • IFERROR(expression, val_if_error)
    • IFERROR returns val_if_error if an error occurs while evaluating expression; otherwise it returns the value of expression.
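A small Python sketch can mimic several of these behaviors:

- the difference between strict and eager evaluation;
- SWITCH(TRUE(), ...);
- COALESCE;
- IFERROR.

This illustrates the semantics under stated assumptions; it is not how the DAX engine is implemented. BLANK is modelled as None, and each branch is a zero-argument function that logs when it runs. All helper names are hypothetical.

```python
# Sketch only: BLANK is modelled as None, and branches are zero-argument
# functions so we can observe which ones are actually evaluated.
calls = []


def branch(label, value):
    """Hypothetical helper: returns a branch that logs its label when evaluated."""
    def evaluate():
        calls.append(label)
        return value
    return evaluate


def dax_if(condition, if_true, if_false):
    """Strict evaluation (IF): only the branch picked by the condition runs."""
    return if_true() if condition else if_false()


def dax_if_eager(condition, if_true, if_false):
    """Eager evaluation (IF.EAGER): both branches run, then one result is chosen."""
    true_value = if_true()
    false_value = if_false()
    return true_value if condition else false_value


def switch_true(*pairs, else_result=None):
    """SWITCH(TRUE(), cond1, res1, cond2, res2, ..., else): first TRUE condition wins."""
    for condition, result in pairs:
        if condition:
            return result
    return else_result


def coalesce(*values):
    """First argument that is not BLANK (None); BLANK if every argument is BLANK."""
    for value in values:
        if value is not None:
            return value
    return None


def iferror(expression, value_if_error):
    """Return value_if_error when evaluating expression raises an error."""
    try:
        return expression()
    except Exception:
        return value_if_error


lazy = dax_if(2 < 1, branch("true", 1 + 2), branch("false", 4 + 1))
lazy_calls = list(calls)    # only the false branch ran
calls.clear()
eager = dax_if_eager(2 < 1, branch("true", 1 + 2), branch("false", 4 + 1))
eager_calls = list(calls)   # both branches ran

qty = 25
# DAX analogue: SWITCH(TRUE(), qty < 10, "Small", qty < 50, "Medium", "Large")
size = switch_true((qty < 10, "Small"), (qty < 50, "Medium"), else_result="Large")
first_non_blank = coalesce(None, None, 7)
# int("abc") fails much like VALUE("abc") does in DAX.
safe = iferror(lambda: int("abc"), -1)
```

Both IF variants return the same value. Only the set of evaluated branches differs, which matches the statement that IF.EAGER changes performance, not results.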


Relationships in Power BI


  • When working with multiple tables, it is highly likely that you will need to perform analysis using data from all of them.
  • Relationships are rules that define how two tables can be associated.
  • A relationship connects a column in one table to a column in another table.
  • Power BI Desktop can automatically detect relationships between tables by matching columns with the same name and data type.
  • Relationships can be defined as:
    • one-to-many (1:*)
      • In one-to-many relationship, a single record in one table can be linked to one or more records in another table.
      • For a given key value, the Lookup table contains only a single record, while the Data table can contain one or more records.
      • The Lookup table column involved in the relationship must contain unique values.
    • one-to-one (1:1)
      • In one-to-one relationship, a single record in one table can be linked to only one record in another table.
      • Both columns in a one-to-one relationship have unique values.
  • Relationships between tables let you:
    • Apply a star schema
    • Control the cross-filter direction
      • By default, in a one-to-many relationship, filters flow downwards in a single direction, from the "one" side to the "many" side. This means that only columns from the Lookup table can filter data in the Data table.
      • This default single-direction behavior can be changed to both (bi-directional) filtering, so that columns from both the Lookup and Data tables can filter the data.
      • The drawback of enabling bi-directional filtering is that it can slow down reports.
      • In a one-to-one relationship, filters always flow in both directions.
    • Take advantage of DAX functions such as
      • RELATED(columnName) fetches the value of the specified column from the "one" side of the relationship.
      • RELATEDTABLE(tableName) returns, as a table, the related rows of the specified table on the "many" side of the relationship.
      • USERELATIONSHIP(column1, column2)
        • Activates the relationship between the two specified columns (primary key and foreign key) only for the duration of the calculation; it is used inside CALCULATE or another function that accepts filter arguments.
        • Note: a pair of tables can have multiple relationships defined, but only one of them can be active at a given time.
      • CROSSFILTER(leftColumn, rightColumn, filterDirection) specifies the cross-filtering direction to use when evaluating an expression.
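The following minimal Python sketch shows what RELATED and RELATEDTABLE do across a one-to-many relationship. It is a model, not the Power BI engine. The Products and Sales tables, their columns and the helper functions are hypothetical. None stands in for BLANK when no matching "one"-side row exists.

```python
# Hypothetical Lookup ("one" side) and Data ("many" side) tables.
products = [
    {"ProductKey": 1, "Category": "Office Supplies"},
    {"ProductKey": 2, "Category": "Furniture"},
]
sales = [
    {"OrderID": 101, "ProductKey": 1, "Amount": 20.0},
    {"OrderID": 102, "ProductKey": 1, "Amount": 35.0},
    {"OrderID": 103, "ProductKey": 2, "Amount": 150.0},
]


def index_one_side(lookup, key):
    """Check that the 'one' side key is unique, then index rows by it."""
    keys = [row[key] for row in lookup]
    if len(keys) != len(set(keys)):
        raise ValueError(f"{key} must be unique on the 'one' side of the relationship")
    return {row[key]: row for row in lookup}


product_by_key = index_one_side(products, "ProductKey")


def related(sales_row, column):
    """Like RELATED: from a 'many'-side row, fetch a column of the matching 'one'-side row."""
    match = product_by_key.get(sales_row["ProductKey"])
    return None if match is None else match[column]  # None stands in for BLANK


def relatedtable(product_row):
    """Like RELATEDTABLE: from a 'one'-side row, return the matching 'many'-side rows."""
    return [row for row in sales if row["ProductKey"] == product_row["ProductKey"]]


# Calculated-column style: one value per row of Sales, then one per row of Products.
sales_with_category = [{**row, "Category": related(row, "Category")} for row in sales]
orders_per_product = {p["ProductKey"]: len(relatedtable(p)) for p in products}
```

The uniqueness check in `index_one_side` mirrors the rule that the Lookup table column in a one-to-many relationship must contain unique values.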

Monday, June 19, 2023

Operators in DAX


  • Operators are used to create expressions.
  • There are four different types of operators in DAX.
    • Arithmetic Operators perform basic arithmetic calculations.
      • + (Addition)
      • - (Subtraction)
      • * (Multiplication)
      • / (Division)
      • ^ (Exponentiation)
    • Comparison Operators return true or false when used to compare two values.
      • = (Equal to)
      • == (Strict equal to)
      • > (Greater than)
      • < (Less than)
      • >= (Greater than or equal to)
      • <= (Less than or equal to)
      • <> (Not equal to)
      • Except for ==, the comparison operators treat BLANK as equal to 0, an empty string "", or FALSE, so BLANK = 0, BLANK = "" and BLANK = FALSE all return TRUE.
      • The == operator returns TRUE only when the two arguments have the same value or are both BLANK, so BLANK == 0 returns FALSE.
    • Logical operators combine two or more expressions into a single result of either TRUE or FALSE.
      • && (AND): returns TRUE if all expressions combined with && are TRUE; otherwise it returns FALSE.
      • || (OR): returns TRUE if at least one of the expressions combined with || is TRUE; otherwise it returns FALSE.
      • IN: returns TRUE if a row of values exists in (is contained in) a table; otherwise it returns FALSE.
    • The text concatenation operator (&) joins two text strings into one.
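The BLANK rules for = and ==, and the IN operator, can be modelled in Python as below. This is a simplified sketch that assumes BLANK is represented by None. It only covers comparisons between BLANK and numbers, strings and Booleans. The helper names are hypothetical.

```python
# Sketch only: BLANK is modelled as Python None.
BLANK = None


def blank_stand_in(other):
    """What BLANK is treated as when compared with `other` by any operator except ==."""
    if isinstance(other, bool):  # check bool first: in Python, bool is a subclass of int
        return False
    if isinstance(other, str):
        return ""
    return 0


def dax_eq(a, b):
    """The = operator: BLANK behaves like 0, "" or FALSE depending on the other operand."""
    if a is BLANK and b is BLANK:
        return True
    if a is BLANK:
        a = blank_stand_in(b)
    elif b is BLANK:
        b = blank_stand_in(a)
    return a == b


def dax_strict_eq(a, b):
    """The == operator: TRUE only for the same value, or when both sides are BLANK."""
    if a is BLANK or b is BLANK:
        return a is BLANK and b is BLANK
    return a == b


def dax_in(row, table):
    """IN: TRUE if the row of values (a tuple) is contained in the table."""
    return row in table


results = {
    "BLANK = 0": dax_eq(BLANK, 0),
    'BLANK = ""': dax_eq(BLANK, ""),
    "BLANK = FALSE": dax_eq(BLANK, False),
    "BLANK == 0": dax_strict_eq(BLANK, 0),
    "BLANK == BLANK": dax_strict_eq(BLANK, BLANK),
}
# DAX analogue: 'Product'[Color] IN { "Red", "Blue" }, modelled with one-column tuples
is_listed = dax_in(("Red",), {("Red",), ("Blue",)})
```

In practice, this is why a filter such as [Column] = 0 also matches BLANK rows, while [Column] == 0 does not.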

Data Types & Variables in DAX


Data Types
  • DAX is a strongly typed language.
  • DAX data types can be categorized into three groups:
    • Numeric
      • Numeric data types include
        • Decimal
        • Fixed decimal/currency
        • Integer/whole numbers
        • Percentage (same as the Decimal data type, but formatted as a percentage)
        • Date/Time (underneath, a Date/Time value is stored as a Decimal Number)
        • Date (same as a Date/Time value with zero decimal places)
        • Time (same as a Date/Time value with no digits to the left of the decimal point)
        • True/False
    • Non-numeric
      • Non-numeric types include
        • Text
        • Binary
    • Variant
      • Variant data type is used for expressions that might return different data types, depending on the conditions.
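A short Python sketch illustrates the point that Date/Time is stored as a decimal number. It assumes, as Excel does, that serial number 0 corresponds to 30 December 1899. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: day 0 of the serial-number scale is 30 December 1899 (as in Excel).
EPOCH = datetime(1899, 12, 30)


def to_serial(value: datetime) -> float:
    """Whole days since EPOCH, plus the elapsed fraction of the day."""
    return (value - EPOCH) / timedelta(days=1)


def from_serial(serial: float) -> datetime:
    """Inverse of to_serial."""
    return EPOCH + timedelta(days=serial)


date_time = to_serial(datetime(2023, 6, 19, 12, 0))  # Date/Time: integer and fractional part
date_only = to_serial(datetime(2023, 6, 19))         # Date: fractional part is zero
time_only = date_time - int(date_time)               # Time: only the fractional part
```

The integer part counts days and the fractional part is the time of day, so noon is 0.5. This matches the Date and Time bullets in the list above.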
Variables
  • Variables are defined with the VAR keyword, and the RETURN keyword introduces the expression whose value is returned, usually computed from the variables.
  • A variable name must be a single word; variable names can't contain spaces.
  • A variable's scope begins with its VAR statement and ends with the matching RETURN expression.
  • Once the value of a variable is evaluated, it remains constant and does not change. This behavior is similar to constants in traditional programming languages.
  • A variable is evaluated in the filter and row contexts in which it is defined (the initial contexts), not in the contexts where it is later used.
  • Using variables helps simplify DAX expressions.
  • DAX performs automatic conversion between strings and numbers whenever necessary.
  • The following example uses the Sample-Superstore dataset to demonstrate the use of variables in the measure Test Variables.

Snowflake Schema


  • The Snowflake schema is an extension of the Star Schema.
  • In this schema, dimension tables are broken down into additional sub-dimensions. For example, the Product dimension table could be further normalized into related tables, such as a SubCategory table.
  • Snowflake schemas are more storage-efficient because they follow normalization rules, but query performance is generally not as good as with a more denormalized data model such as the Star Schema.
  • A Snowflake schema can be converted into a Star schema by merging related tables in Power Query (Merge Queries), which reduces the number of tables.
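The flattening step can be pictured with a Python sketch that behaves roughly like a Power Query Left Outer merge followed by expanding the sub-dimension columns. The table contents and the `flatten` helper are hypothetical. None stands in for a missing match.

```python
# Hypothetical snowflaked Product dimension: Product -> SubCategory.
products = [
    {"ProductKey": 1, "ProductName": "Stapler", "SubCategoryKey": 10},
    {"ProductKey": 2, "ProductName": "Desk", "SubCategoryKey": 20},
    {"ProductKey": 3, "ProductName": "Mystery item", "SubCategoryKey": 99},
]
subcategories = [
    {"SubCategoryKey": 10, "SubCategory": "Fasteners", "Category": "Office Supplies"},
    {"SubCategoryKey": 20, "SubCategory": "Tables", "Category": "Furniture"},
]


def flatten(dimension, sub_dimension, key):
    """Left-outer merge of sub_dimension into dimension, then drop the join key.

    Unmatched rows get None (BLANK) in the sub-dimension columns."""
    index = {row[key]: row for row in sub_dimension}
    sub_columns = [c for c in sub_dimension[0] if c != key]
    flat = []
    for row in dimension:
        match = index.get(row[key])
        merged = {c: v for c, v in row.items() if c != key}
        for c in sub_columns:
            merged[c] = match[c] if match else None
        flat.append(merged)
    return flat


# One wider Product table replaces Product + SubCategory.
product_star = flatten(products, subcategories, "SubCategoryKey")
```

The result trades some repeated SubCategory and Category values for one fewer table and relationship, the trade-off described above.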



Tuesday, June 13, 2023

Star Schema


  • Star Schema is a data modeling approach to classify tables as either lookup tables (dimensions or master data tables) or data tables (fact or transaction tables).
  • Lookup Tables 
    • Lookup tables describe business entities, e.g. products, time, etc.
    • Lookup tables typically have fewer rows compared to data tables and often have a larger number of columns.
    • Lookup tables support filtering and grouping.
  • Data Tables 
    • Data tables store transactional data, so they contain a very large number of rows and continue to grow over time.
    • A data table contains key columns that relate it to the lookup (dimension) tables.
    • Data tables support summarization.
  • The relationships between tables in the model identify each table as either a Lookup or a Data table. The most common cardinality between a Lookup and a Data table is one-to-many, or its inverse, many-to-one.
  • The "one" side is always a dimension-type table, while the "many" side is always a fact-type table. The Collie layout methodology recommends placing lookup tables at the top and data tables at the bottom.
  • To optimize your data tables, consider extracting repetitive attribute columns into separate lookup tables.
  • For a lookup table with two columns (Key and Description), excluding the Key column and incorporating the Description directly into the data table can be more efficient.
  • Consider flattening multiple joined lookup tables into a single wider lookup table.
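Extracting repetitive attribute columns into a lookup table can be sketched in Python as follows. The sample rows, the RegionKey surrogate key and the `extract_lookup` helper are hypothetical. In Power BI this step would normally be done in Power Query rather than in code.

```python
# Hypothetical flat data table with repeated Region attributes.
flat_sales = [
    {"OrderID": 1, "Region": "West", "RegionManager": "Anna", "Amount": 100},
    {"OrderID": 2, "Region": "East", "RegionManager": "Ben", "Amount": 80},
    {"OrderID": 3, "Region": "West", "RegionManager": "Anna", "Amount": 40},
]


def extract_lookup(rows, attributes, key_name):
    """Move repeated attribute columns into a lookup table keyed by a surrogate key."""
    lookup, keys, data = [], {}, []
    for row in rows:
        attrs = tuple(row[a] for a in attributes)
        if attrs not in keys:
            keys[attrs] = len(keys) + 1  # next surrogate key
            lookup.append({key_name: keys[attrs], **dict(zip(attributes, attrs))})
        data_row = {c: v for c, v in row.items() if c not in attributes}
        data_row[key_name] = keys[attrs]  # the data table keeps only the key
        data.append(data_row)
    return lookup, data


region_lookup, sales_data = extract_lookup(flat_sales, ["Region", "RegionManager"], "RegionKey")
```

The lookup table gets one row per distinct attribute combination (the "one" side). The data table keeps a single key column, which becomes the "many" side of the new relationship.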

Context in DAX


  • Context describes the environment in which a DAX formula is evaluated. 
  • There are two types of context.
  • Filter Context
    • Filter context is the context that is applied to a whole table or column (a set of rows).
    • The filter context is the set of filters applied to the data model before the evaluation of a DAX expression starts.
    • Filter context is usually created by a visual, a slicer, or page-level or report-level filters.
    • For instance, the formula AverageSales = AVERAGE(Sales[Total Sales]) uses the filter context to calculate the average of the Total Sales column over the rows that are currently visible.
    • The CALCULATE function can be used to modify the filter context.
    • The initial filter context coming from the visual is applied to the underlying table(s) in the data model and automatically propagates from the "one" side of the relationship to the "many" side, i.e. from the lookup table to the data table.
    • In the following example, the filter applied on the Products table can propagate downhill to the Sales table but cannot flow back uphill to the Customers table.

  • Row Context
    • Row context is the context that is applied to each individual row of a table when a formula is evaluated.
    • A row context is created automatically in calculated columns and by iterators (the X functions, such as SUMX) and the FILTER() function. Measures have no row context of their own, so you must create one, for example with an iterator.
    • Row context does not filter the table. To turn a row context into a filter context (context transition), use the CALCULATE() function.
    • A row context does not follow relationships on its own; the RELATED() and RELATEDTABLE() functions allow a row context to leverage an existing relationship.
    • For instance, in a calculated column such as TotalSales = Sales[Quantity] * Sales[Price], the expression is evaluated for each row, producing a TotalSales value for each individual row.
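The two kinds of context can be modelled in a few lines of Python. This is a simplified sketch, not how the DAX engine works:

- Filters on the Products lookup table propagate one way to Sales, never back.
- A calculated column is computed row by row.
- The AverageSales measure is modelled by `average_sales`, which returns None (BLANK) when no rows are visible.

Table contents and helper names are hypothetical.

```python
# Hypothetical lookup ("one" side) and data ("many" side) tables.
products = [
    {"ProductKey": 1, "Category": "Office"},
    {"ProductKey": 2, "Category": "Furniture"},
]
sales = [
    {"ProductKey": 1, "Quantity": 2, "Price": 5.0},
    {"ProductKey": 1, "Quantity": 1, "Price": 7.0},
    {"ProductKey": 2, "Quantity": 1, "Price": 200.0},
]

# Row context: a calculated column is evaluated once per row.
for row in sales:
    row["TotalSales"] = row["Quantity"] * row["Price"]


def visible_sales(filter_context):
    """Filter the lookup table, then propagate downhill to the 'many' side only."""
    visible_products = [p for p in products
                        if all(p[col] == val for col, val in filter_context.items())]
    keys = {p["ProductKey"] for p in visible_products}
    return [s for s in sales if s["ProductKey"] in keys]


def average_sales(filter_context):
    """Measure: average of TotalSales over the rows left by the filter context."""
    rows = visible_sales(filter_context)
    if not rows:
        return None  # BLANK when no rows are visible
    return sum(r["TotalSales"] for r in rows) / len(rows)


office_avg = average_sales({"Category": "Office"})  # average over the two Office rows
overall_avg = average_sales({})                     # no filters: all Sales rows
```

The same measure returns different values for different filter contexts. The calculated column, by contrast, was fixed row by row before any filter was applied.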