Microsoft SQL Server Business Intelligence Development Beginner's Guide | Packt

As the definition states, ETL is not just a data integration phase. Let's explore it with an example: in an operational sales database, you may have dozens of tables that provide sales transactional data. When you design that sales data into your data warehouse, you denormalize it and build one or two tables for it. So, the ETL process should extract data from the sales database and transform it (combine, match, and so on) to fit it into the model of the data warehouse tables.
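As a rough illustration of that flow, here is a minimal Python sketch of the extract, transform (denormalize), and load steps. The table and column names are hypothetical, and in SSIS this logic would live in a Data Flow task with Lookup or Merge Join transformations:

```python
# A minimal ETL sketch over hypothetical, in-memory "source tables".

def extract():
    """Extract: pull rows from the (hypothetical) operational sales database."""
    orders = [
        {"order_id": 1, "product_id": 10, "customer_id": 100, "amount": 25.0},
        {"order_id": 2, "product_id": 11, "customer_id": 100, "amount": 40.0},
    ]
    products = {10: "Bike Helmet", 11: "Water Bottle"}
    customers = {100: "Mark"}
    return orders, products, customers

def transform(orders, products, customers):
    """Transform: combine/match the normalized rows into one flat record."""
    return [
        {
            "order_id": o["order_id"],
            "product": products[o["product_id"]],
            "customer": customers[o["customer_id"]],
            "amount": o["amount"],
        }
        for o in orders
    ]

def load(rows, target):
    """Load: append the denormalized rows into the warehouse table."""
    target.extend(rows)

warehouse_sales = []
load(transform(*extract()), warehouse_sales)
print(warehouse_sales[0]["product"])  # Bike Helmet
```

The point is only the shape of the pipeline: several normalized source tables go in, one flat, query-friendly table comes out.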

There are several ETL tools on the market that perform the extract, transform, and load operations; SSIS also provides many built-in transformations to transform the data as required. A data warehouse is designed to be the source of analysis and reports, so it works much faster than operational systems for producing reports.

However, a DW is not always fast enough to cover all requirements, because it is still a relational database, and databases have many constraints that reduce the response time of a query.

The requirement for faster processing and lower response times on the one hand, and for aggregated information on the other, led to the creation of another layer in BI systems. This layer, which we call the data model, contains a file-based or memory-based model of the data for producing very quick responses to reports. Microsoft's solution for the data model is split into two technologies: the OLAP cube and the In-memory tabular model.

The OLAP cube is file-based data storage that loads data from a data warehouse into a cube model. The cube contains descriptive information as dimensions (for example, customer and product) and cells (for example, facts and measures, such as sales and discount). The following diagram shows a sample OLAP cube:

In the preceding diagram, the illustrated cube has three dimensions: Product, Customer, and Time. Each cell in the cube shows a junction of these three dimensions. Aggregated data can also be fetched easily within the cube structure. For example, the orange set of cells shows how much Mark paid on June 1 for all products.

As you can see, the cube structure makes it easier and faster to access the required information. Multidimensional modeling is based on the OLAP cube and is fitted with measures and dimensions, as you can see in the preceding diagram. The tabular model, in contrast, is based on a new In-memory engine for tables. The In-memory engine loads all data rows from the tables into memory and responds to queries directly from memory.
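The cube-style access pattern above can be illustrated with a toy aggregation. The dimension members (Mark, Anna, the dates, the products) are hypothetical, and a real SSAS cube precomputes and stores such aggregates rather than scanning cells:

```python
# A toy cube: each cell is keyed by one member from each dimension.
cube = {
    # (product, customer, date) -> sales amount
    ("Bike Helmet", "Mark", "2014-06-01"): 25.0,
    ("Water Bottle", "Mark", "2014-06-01"): 8.0,
    ("Bike Helmet", "Anna", "2014-06-01"): 25.0,
}

def total(product=None, customer=None, date=None):
    """Sum the measure over all cells matching the fixed dimension members."""
    return sum(
        amount
        for (p, c, d), amount in cube.items()
        if (product is None or p == product)
        and (customer is None or c == customer)
        and (date is None or d == date)
    )

# How much Mark paid on June 1 for all products (the "orange cells"):
print(total(customer="Mark", date="2014-06-01"))  # 33.0
```

Fixing some dimensions and summing over the rest is exactly what a slice of the cube gives you.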

This makes it very fast in terms of response time. The frontend of a BI system is data visualization; in other words, data visualization is the part of the BI system that users can see. There are different methods of visualizing information, such as strategic and tactical dashboards, Key Performance Indicators (KPIs), and detailed or consolidated reports.

As you probably know, there are many reporting and visualization tools on the market. Microsoft provides a set of visualization tools to cover the dashboards, KPIs, scorecards, and reports required in a BI application. SSRS is a mature technology in this area, which will be covered in Chapter 9, Reporting Services. Excel is also a great slicing and dicing tool, especially for power users.

There are also components in Excel, such as Power View, which are designed to build performance dashboards. You will learn more about Power View in Chapter 9, Reporting Services. Sometimes, you will need to embed reports and dashboards in your custom-written application; Chapter 12, Integrating Reports in Applications, explains that in detail.

Master Data Management

Every organization has a part of its business that is common between different systems, and that part of the data can be managed and maintained as master data. For example, an organization may receive customer information from an online web application form, from a retail store's spreadsheets, or from a web service provided by another vendor.

Master Data Management (MDM) is the process of maintaining a single version of truth for master data entities across multiple systems. Microsoft's solution for MDM is Master Data Services (MDS). Master data can be stored in MDS entities and maintained there. Even if one or more systems are able to change the master data, they can write their changes back into MDS through the staging architecture.

The quality of data differs between operational systems, especially when we deal with legacy systems or systems with a high dependence on user input.

As the BI system is based on data, the better the quality of the data, the better the output of the BI solution. Because of this, working on data quality is one of the components of BI systems. As an example, Auckland might be written as "Auck land" in some Excel files or be typed as "Aukland" by a user in an input form. As a solution to improve the quality of data, Microsoft provides Data Quality Services (DQS). DQS works based on Knowledge Base domains; a Knowledge Base can be created for different domains, and the Knowledge Base will be maintained and improved by a data steward as time passes.

There are also matching policies that can be used to apply standardization to the data.

A data warehouse is a database built for analysis and reporting.

In other words, a data warehouse is a database in which the only data entry point is through ETL, and its primary purpose is to cover reporting and data analysis requirements. This definition clarifies that a data warehouse is not like other transactional databases that operational systems write data into. When there is no operational system that works directly with a data warehouse, and when the main purpose of this database is for reporting, then the design of the data warehouse will be different from that of transactional databases.

If you recall the database normalization concepts, the main purpose of normalization is to reduce redundancy and dependency. The following table shows customers' data with their geographical information:

Let's elaborate on this example. As you can see from the preceding table, the geographical information in the records is redundant. This redundancy makes it difficult to apply changes. For example, in this structure, if Remuera, for any reason, is no longer part of Auckland city, then the change has to be applied to every record that has Remuera as its suburb.

So, a normalized approach is to take the geographical information out of the customer table and put it into another table. The following screenshot shows the tables of geographical information:


Then, only a key to that table is referenced from the customer table. This way, every time the value Remuera changes, only one record in the geographical table changes, and the key remains unchanged. So, you can see that normalization is highly efficient in transactional systems.
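The normalized design just described can be sketched as follows; the names and values are illustrative:

```python
# The customer rows store only a key into the geography table.
geography = {
    1: {"suburb": "Remuera", "city": "Auckland", "country": "New Zealand"},
}
customers = [
    {"name": "Devin", "geo_key": 1},
    {"name": "Mark", "geo_key": 1},
]

# If Remuera's city assignment changes, only ONE record is updated...
geography[1]["city"] = "Newmarket"

# ...and every customer picks up the change through the key.
for c in customers:
    assert geography[c["geo_key"]]["city"] == "Newmarket"
```

The update touches one row regardless of how many customers live in Remuera, which is exactly why normalization suits transactional workloads.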

This normalization approach is not as effective on analytical databases. If you consider a sales database with many tables related to each other and normalized at least up to the third normal form (3NF), then analytical queries on such a database may require more than 10 join conditions, which slows down the query response. In other words, from the reporting point of view, it is better to denormalize data and flatten it to make querying as easy as possible.

This means the first design in the preceding table might be better for reporting. However, the query and reporting requirements are not that simple, and the business domains in the database are not as small as two or three tables. So real-world problems can be solved with a special design method for the data warehouse called dimensional modeling.

There are two well-known methods for designing the data warehouse: the Kimball and Inmon methodologies, named after their authors. Both of these methods are in use today; the main difference between them is that Inmon's approach is top-down, while Kimball's is bottom-up. In this chapter, we will explain the Kimball method. Each author has described his methodology in a book; both books are must-reads for BI and DW professionals and are reference works recommended for the bookshelf of every BI team.

This chapter draws on The Data Warehouse Toolkit, so for a detailed discussion, read that book. To understand data warehouse design and dimensional modeling, it is best to first learn the components and terminology of a DW. A DW consists of Fact tables and dimensions. The relationship between a Fact table and its dimensions is based on foreign keys and primary keys (the primary key of the dimension table appears in the Fact table as a foreign key).

Facts are numeric and additive values in the business process. For example, in the sales business, a fact can be a sales amount, discount amount, or quantity of items sold. All of these measures or facts are numeric values, and they are additive. Additive means that you can add the values of several records together and the result is meaningful.
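Additivity can be seen in a tiny sketch with hypothetical fact rows:

```python
# Summing an additive measure over any subset of fact rows
# yields a meaningful total.
fact_rows = [
    {"SalesAmount": 25.0, "Quantity": 1},
    {"SalesAmount": 40.0, "Quantity": 2},
    {"SalesAmount": 8.0,  "Quantity": 1},
]
grand_total = sum(r["SalesAmount"] for r in fact_rows)
print(grand_total)  # 73.0
```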

For example, adding the sales amount for all records is the grand total of sales. Dimension tables are tables that contain descriptive information. Descriptive information, for example, can be a customer's name, job title, company, and even geographical information of where the customer lives. Each dimension table contains a list of columns, and the columns of the dimension table are called attributes.

Each attribute contains some descriptive information, and attributes that are related to each other will be placed in a dimension.

For example, the customer dimension would contain the attributes listed earlier. Each dimension has a primary key, which is called the surrogate key. The surrogate key is usually an auto-increment integer value. The primary key of the source system is stored in the dimension table as the business key. The Fact table is a table that contains a list of related facts and measures, with foreign keys pointing to the surrogate keys of the dimension tables.
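Here is a minimal sketch of how a surrogate key and a business key relate, using hypothetical customer data; in a real warehouse, the surrogate key would be an identity column populated during the ETL load:

```python
dim_customer = []   # the dimension table
_next_key = 1       # auto-increment surrogate key counter

def add_customer(business_key, name):
    """Insert a row into the customer dimension with a new surrogate key."""
    global _next_key
    row = {
        "CustomerKey": _next_key,    # surrogate key (dimension PK)
        "CustomerBK": business_key,  # business key from the source system
        "Name": name,
    }
    dim_customer.append(row)
    _next_key += 1
    return row["CustomerKey"]

mark_key = add_customer("CUST-007", "Mark")

# A fact row points at the dimension only through the surrogate key:
fact_sales = [{"CustomerKey": mark_key, "SalesAmount": 25.0}]
```

The source system's key ("CUST-007" here) never leaves the dimension table; facts reference the warehouse-owned surrogate key.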

Fact tables usually store a large number of records, and most of the data warehouse space (around 80 percent) is filled by them. Grain is one of the most important terms used in designing a data warehouse. Grain defines the level of detail stored in the Fact table. For example, you could build a sales data warehouse in which the Grain is the most detailed level of transactions in the retail shop, that is, one record per transaction at a specific date and time for each customer and salesperson.

Understanding Grain is important because it defines which dimensions are required. There are two well-known schemas for creating relationships between facts and dimensions: the star schema and the snowflake schema.

In the star schema, the Fact table is at the center as a hub, and dimensions are connected to the fact through single-level relationships; ideally, no dimension relates to the fact through another dimension. The following diagram shows the different schemas:

The snowflake schema, as you can see in the preceding diagram, relates some dimensions to the Fact table through intermediate dimensions.

If you look more carefully at the snowflake schema, you may find it similar to the normalized form, and in fact a fully snowflaked design of the fact and dimensions will be in 3NF. The snowflake schema requires more joins to answer an analytical query, so it responds more slowly.

Hence, the star schema is the preferred design for the data warehouse. In practice, you cannot always build a complete star schema, and sometimes you will be required to do a level of snowflaking. However, the best practice is to avoid snowflaking as much as possible.

After a quick definition of the most common terminologies in dimensional modeling, it's now time to start designing a small data warehouse. One of the best ways of learning a concept and method is to see how it will be applied to a sample question. Assume that you want to build a data warehouse for the sales part of a business that contains a chain of supermarkets; each supermarket sells a list of products to customers, and the transactional data is stored in an operational system.

Our mission is to build a data warehouse that can analyze the sales information. Before thinking about the design of the data warehouse, the very first questions are: what is the goal of designing the data warehouse, and what kind of analytical reports will be required as the result of the BI system? The answers to these questions are the first and most important step. This step not only clarifies the scope of the work but also gives you a clue about the Grain.

Defining the goal can also be called requirement analysis; your job as a data warehouse designer is to analyze the required reports, KPIs, and dashboards. After requirement analysis, the dimensional modeling phase starts. Based on Kimball's best practices, dimensional modeling can be done in the following four steps: select the business process, declare the Grain, identify the dimensions, and identify the facts. In our example, there is only one business process: sales. Grain, as we've described earlier, is the level of detail that will be stored in the Fact table.

Based on the requirement, the Grain is one record per sales transaction and date, per customer, per product, and per store. Once the Grain is defined, it is easy to identify the dimensions; based on this Grain, the dimensions are date, store, customer, and product.
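The Grain and dimensions just identified can be sketched as a star schema. The table and column names below follow a common Dim/Fact naming convention and are illustrative:

```python
# One record per sales transaction: date, customer, product, store.
fact_sales_columns = [
    "DateKey",         # FK -> DimDate
    "CustomerKey",     # FK -> DimCustomer
    "ProductKey",      # FK -> DimProduct
    "StoreKey",        # FK -> DimStore
    "SalesAmount",     # fact (additive)
    "DiscountAmount",  # fact (additive)
    "Quantity",        # fact (additive)
]

# Every key column points at exactly one dimension table (the star's points):
dimensions = {c: "Dim" + c[:-3] for c in fact_sales_columns if c.endswith("Key")}
print(dimensions["CustomerKey"])  # DimCustomer
```

Four single-level relationships from one central Fact table is precisely the star shape described earlier.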

It is useful to name dimension tables with a Dim prefix so that they are easily identified in the list of tables. The next step is to identify the Fact table; in this example, it is a single Fact table named FactSales.

In this chapter, readers will learn when to use dashboards, how to visualize data with dashboards, and how to use PerformancePoint and Power View to create dashboards. Chapter 11, Power BI, explains how predesigned reports and dashboards are good for business users, but power users require more flexibility.

Power BI is a new self-service BI tool. Chapter 12, Integrating Reports in Applications, begins with the premise that reports and dashboards are always required in custom applications, and shows how to use reports in .NET web or Metro applications to provide reports on the application side for the users. However, you can also download and install MS SQL Server Evaluation Edition, which has the same functionalities but is free for an evaluation period, from the following link:

There are many examples in this book, and all of them use the following databases as a source. After downloading the database files, open SQL Server Management Studio and run the following scripts to create the databases from their data files. This book is very useful for BI professionals (consultants, architects, and developers) who want to become familiar with Microsoft BI tools.

It will also be handy for BI program managers and directors who want to analyze and evaluate Microsoft tools for BI system implementation. Instructions often need some extra explanation to make sense, so they are followed by a What just happened? section, which explains the working of the tasks or instructions you have just completed. You will also find a number of text styles that distinguish between different kinds of information.

Here are some examples of these styles and an explanation of their meaning. Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Expand the Chapter 02 SSAS Multidimensional database and then expand the dimensions."

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes, for example, appear in the text like this: "On the Select Destination Location screen, click on Next to accept the default destination."

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or may have disliked. Reader feedback is important for us.

Publisher: Packt Publishing. Released: May 26. Written in an easy-to-follow, example-driven format, with plenty of step-by-step instructions to help you get started!

The book has a friendly approach, with the opportunity to learn by experimenting. It gives you a good overview of each component and scenarios featuring the use of that component in data warehousing and Business Intelligence systems.


In all instances, include the appropriate fields in the statements where needed; for example:

    SELECT emp.FirstName,
           psn.LastName,
           emp. ...
    FROM ... emp
        INNER JOIN Person AS psn ON emp. ...

Formatting rules for this structure are as follows:

1. Individual elements are placed on separate lines.
2. Elements are indented one tab stop.
3. Parentheses are used to distinguish logical blocks and are indented at the same level as any other parameter.

No exceptions are permitted. It is best to apply this style even when you do not exceed the character limit, though this rule is not strictly enforced. The guidelines presented here by example apply to all Transact-SQL statements; use of white space is highly recommended.

To ensure consistency, parameters are descriptive and defined correctly according to their usage. Stored procedures are named according to the business function they perform or the business rule they enforce. The name should include a prefix of BS (for BroadStone) and a business description of the action performed. The name is appropriately abbreviated using items found in the standard abbreviation list.

This means one insert, one update, or one delete statement per stored procedure. Do not write stored procedures that exceed four pages in length (excluding comments, headers, and the maintenance log) without written justification to the database administrators; exceptions to the rule are procedures used in conversion or loads of data. Analyze code exceeding this limit and split it into smaller stored procedures or modularize it. The comment block is simple and contains a brief audit listing of changes that go to production, not a running summary of changes during development.

Error trapping is highly recommended with the Enterprise libraries because, when an error occurs, the first thing anyone asks is what the error log says. Please also list the fields.

White space for readability is also highly desirable; the above is supported by industry standards.

Triggers

Triggers are to be avoided, as they have the potential to execute multiple SQL statements without other developers' knowledge.

Constraints

Planning and creating tables requires identifying how to enforce the integrity of the data stored within the columns of the tables. MS SQL Server has the following constraints to enforce integrity: Primary Key constraints are a column or set of columns that uniquely identifies a row within a table. All tables should have a primary key, and composite primary keys should be avoided. For most of your OLTP tables, an identity column is the primary key.

Foreign Key constraints are a column or set of columns used to enforce a relationship between information in two tables. A link is defined between the two tables when a primary key is referenced by a column or columns in another table; the reference becomes a foreign key in the second table.

Indexes

An index is an on-disk structure associated with a table or a view that speeds retrieval of rows from the table or the view. An index contains keys built from one or more columns in the table or the view.
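As an illustration of primary and foreign key constraints in action, here is a sketch using SQLite, which ships with Python's standard library; the syntax differs slightly from SQL Server, and the table names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite disables FK checks by default
con.execute("""
    CREATE TABLE Person (
        PersonID INTEGER PRIMARY KEY,   -- identity-style primary key
        Name     TEXT NOT NULL
    )""")
con.execute("""
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,
        PersonID   INTEGER NOT NULL REFERENCES Person(PersonID)
    )""")

con.execute("INSERT INTO Person VALUES (1, 'Mark')")
con.execute("INSERT INTO Employee VALUES (10, 1)")  # OK: parent row exists

try:
    con.execute("INSERT INTO Employee VALUES (11, 99)")  # no such Person
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the FK constraint blocks the orphan row
```

The engine, not application code, guarantees that every Employee row points at an existing Person row.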

   

 
