Configuring your release pipelines for safe deployments

For large, high-scale applications, the promise of “enterprise grade” availability and high reliability is key to customer confidence in the application. Continuous delivery pipelines for such scaled-out applications typically consist of multiple environments.
DevOps enables faster, automated delivery of changes, helping customers get the most advanced set of features sooner. However, any change to a production system carries risk. Safe deployment guidelines help manage this risk for large, scaled-out applications, thereby fulfilling the promise to customers.
In this blog post, we share the safe deployment guidelines that we follow at Microsoft, and how we configure the pipelines, or release definitions, in Visual Studio Team Services to enforce them.
Gradual rollout to multiple environments
For the applications under discussion, the production environment comprises multiple scale units (one per region).
You may want to deploy the changes to a test or staging environment before deploying to any of the production environments (as a final quality validation), and to a canary environment that interacts with the production instances of dependent services while processing some synthetic load.
It is also recommended not to deploy to all production environments in one go, exposing all customers to the changes at once. A gradual rollout exposes the changes to customers over a period of time, implicitly validating the changes in production with a smaller set of customers at a time.
In effect, the deployment pipeline would look like the following:

As an example, for an application deployed in 12 regions, with the US regions (4) having a high load, the European regions (4) a medium load, and the Asian regions (4) a relatively lighter load, the order of rollout would be as follows.

Regions | Description
Test | Run final functional validation on the application
Canary | Process synthetic load on the application, interacting with production instances of dependent services
Pilot customers | Pilot customers (internal and early adopter customers) are onboarded to a separate scale unit. Deploy after deployment to Canary succeeds.
Asian regions 1, 2, 3 & 4 | Asian regions have a lighter load. Deploy to all four regions in parallel after deployment to Pilot succeeds.
European regions 1, 2, 3 & 4 | European regions have a medium load. Deploy to all four regions in parallel after deployment to all Asian regions succeeds.
US regions 1, 2, 3 & 4 | US regions have a high load. Deploy to all four regions in parallel after deployment to all European regions succeeds.

In a release definition, we use environment triggers to configure the environment deployment conditions as follows.

If required, you can configure the four scale units in each region to be deployed to sequentially, for an additional level of control.
Uniform deployment workflow for all environments
As discussed above, we are deploying the application to each scale unit independently, and a deployment and validation process is defined for each of the scale units. As a best practice, you should follow the same procedure to deploy the bits in every environment. The only things that should change from one environment to the next are the configuration you want to apply and the validation steps to execute.
To enforce the same deployment procedure across environments, we define a task group for the deployment procedure and include it in each of the environments. The differing configurations are parameterized, and the values are managed using environment variables in the release definition.
The deployment workflow for each of the environments in our release definitions looks like the following.

Manual approval for rollouts
There are various reasons why you may not want the application to be updated at certain points in time. It could be an upcoming major event for which you want to avoid all risk, or a known issue with a dependency that requires changes to your application to be deferred.
Configuring a manual approval before the pipeline begins ensures that two pairs of eyes verify that the application is not going through any such special circumstances and can be updated.
Moreover, there might be urgent hotfixes or special changes that do not apply to all the scale units. For such changes, we need to bypass the pipeline and deploy directly to specific environments only. We would still like to get approvals in these scenarios. However, in the normal pipeline flow we do not want to collect multiple approvals, one for each environment.
So, in a nutshell, we are looking for one approval at the start of the deployment sequence, and the sequence may or may not start from the Test environment.
We configure approvals for each of the environments in our release definitions to fulfil these requirements. The approvals for the production environments are configured like the following.

Segregation of roles
As we discussed above, we would like to have two people analyze every deployment and ensure that all’s well for deploying to the environment.
It is possible that the approver specified in the release definition is the same person who requested the deployment. In that case, the requirement of segregating the roles of deployment submitter and approver between two different users is not fulfilled, risking a bad deployment due to manual oversight.
Release management provides an option to deny the submitter from approving a deployment, thereby helping enforce segregation of roles.

Health check during roll out
With the above environment and approval settings, the validation phase for each environment plays a key role in ensuring the environments are healthy after deployment. It may not always be possible to fully automate the validation and health monitoring of environments.
In such circumstances, adopting a “wait and auto-promote” criterion for the production environments is recommended. The pipeline is paused for a certain duration, during which team members monitor various health indicators of the service and can abort the pipeline if it is not appropriate to continue. Users can manually restart the pipeline from the next environment once the issue is analyzed.
Including a manual intervention task in the validation workflow for production environments lets us configure the release definitions in “wait and auto-promote” mode with a 24-hour wait between environments.

Branch filters for deployments
With extensive use of Git as the version control system for development, developers commit changes in various branches, and all the changes come together in master or release branches. To ensure completeness of the features being deployed, it is recommended to restrict deployments to the production environments to artifacts generated from these branches.
Secure the pipelines
We have now configured pipelines that ensure we safely deploy changes to all the environments. Team members can now create releases and start deployments. To avoid any issues, we need to ensure that the configurations are not disturbed and that the checks put in place are not bypassed by users.
We configure Release permissions on the release definitions to avoid any unwanted changes. Specifically, we control the users who are allowed “Administer release permissions”, “Delete release definition”, “Delete release environment”, “Delete releases”, “Edit release definition”, “Edit release environment”, “Manage release approvers” and “Manage releases” permissions.

https://blogs.msdn.microsoft.com/visualstudioalm/2017/04/24/configuring-your-release-pipelines-for-safe-deployments/


How to find query plan choice regressions with SQL Server 2017 CTP2

SQL Server 2017 CTP 2.0 automatically identifies potential performance regressions in SQL queries, enables you to easily find potential issues, and gives you information about what you should do to fix them.

CTP 2.0 adds a new system view, sys.dm_db_tuning_recommendations, that returns recommendations you can apply to fix potential problems in your database. This view contains all identified potential performance issues in SQL queries that are caused by SQL plan changes, together with the correction scripts that you can apply. Every row in this view contains one recommendation that you can apply to fix the issue. Some of the information shown in this view includes:

The id of the query, the plan that caused the regression, and the plan that might be used instead of it.
The reason describing what kind of regression was detected (e.g. CPU time for the query changed from 17ms to 189ms).
A T-SQL script that can be used to force the recommended plan.
Information about the current plan and the previous plan that had better performance.

Since some of this information is formatted as JSON documents, you can use the following query to parse the details in the recommendations and return information about the query, the identified problem, the T-SQL script that you can apply, and so on.

SELECT planForceDetails.query_id, reason, score,
       JSON_VALUE(details, '$.implementationDetails.script') script,
       planForceDetails.[new plan_id], planForceDetails.[recommended plan_id]
FROM sys.dm_db_tuning_recommendations
CROSS APPLY OPENJSON (details, '$.planForceDetails')
    WITH ( [query_id] int '$.queryId',
           [new plan_id] int '$.regressedPlanId',
           [recommended plan_id] int '$.forcedPlanId'
    ) AS planForceDetails;

This query returns something like the following results:

query_id | reason | score | script | …
17 | Average query CPU time changed from 3.55ms to 127.78ms | 80 | exec sp_query_store_force_plan @query_id = 17, @plan_id = 41 | …
1804 | Average query CPU time changed from 13.5ms to 27.78ms | 23 | exec sp_query_store_force_plan @query_id = 1804, @plan_id = 17 | …
1304 | Average query CPU time changed from 8.4ms to 207.8ms | 41 | exec sp_query_store_force_plan @query_id = 1304, @plan_id = 41 | …

Looking at these results, you can find the queries that became slower, see the difference in CPU time, and execute the script in the [script] column to fix the problem. The score column is an internal score that estimates the importance of the recommendation, and you can use it to sort recommendations when you review them.
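
As an example, applying and later reverting a recommendation can be done by executing the script from the [script] column. A minimal sketch, using the query_id and plan_id values from the sample output above:

-- Force the recommended plan for the regressed query
exec sp_query_store_force_plan @query_id = 1304, @plan_id = 41;

-- If the forced plan later turns out to be undesirable, unforce it
exec sp_query_store_unforce_plan @query_id = 1304, @plan_id = 41;
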
Find more info about the automatic tuning features in SQL Server 2017 on the Microsoft Docs site.

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/04/24/how-to-find-query-plan-choice-regressions-with-sql-server-2017-ctp2/

Build a recommendation system with the support for graph data in SQL Server 2017 and Azure SQL DB

Authored by Arvind Shyamsundar and Shreya Verma
Reviewed by Dimitri Furman, Joe Sack, Sanjay Mishra, Denzil Ribeiro, Mike Weiner, Rajesh Setlem
Graphs are a very common way to represent networks and relationships between objects. Historically, it has not been easy to represent such data structures in relational databases like SQL Server and Azure SQL DB. To address this requirement, in November 2016 (through a private preview program for a set of early adopter customers) we introduced extensions which allow us to natively store and query graphs inside the database on Azure SQL DB.
We recently made these features publicly available as part of the SQL Server 2017 CTP 2.0 release (note that the feature is still in private preview for Azure SQL DB at this time). Please review this related blog post for an overview of the feature. In this blog post we look at a typical use case for graph data in SQL Server and Azure SQL DB.
Scenario
A common scenario we’ve seen with our early adopter customers is their interest to use graph technology to implement ‘recommendation systems’. For this walkthrough, imagine we have to implement a recommendation system for songs. Specifically, let’s imagine a scenario where there’s a user who likes Lady Gaga’s song ‘Just Dance‘. Now, our objective is to implement a recommendation system which will suggest songs which are similar to ‘Just Dance’. So, how do we get started? First, we need data!
What data can we use?
Many approaches to implementing recommendation systems involve using two distinct sets of data: one which contains users, and the other which contains details of the entities that those users are related to.

In retail scenarios, these would be the products purchased by the user.
In our current scenario, these are the songs which those users listened to.

It so happens that there is an amazing source of such data for songs and ‘user tastes’ (which songs did each user listen to) available online. This dataset is called the Million Song Dataset (MSD), and while it has a lot of other information, the specific subset of data that is of immediate interest to us is summarized below:

The list of all the songs is contained in a delimited file available  here. There are a million songs in this dataset.
On the MSD website there is a link to another dataset called the  ‘User Taste Profile’ data which contains (anonymized) user listening profiles and that is available here. There are a million unique users, and a total of 48 million ‘relationships’ (each corresponding to a row in this file) in this dataset.

What algorithm?
Now that we know what data is available to us, let’s think about the algorithm to be used. A standard approach called collaborative filtering can be used in conjunction with our graph data. Presented below is a simplified graphical representation of the algorithm that we will use.
As you can see from the animation, the algorithm is quite simple:

First, we identify the user and ‘current’ song to start with (red line)
Next, we identify the other users who have also listened to this song (green line)
Then we find the other songs which those other users have also listened to (blue, dotted line)
Finally, we direct the current user to the top songs from those other songs, prioritized by the number of times they were listened to (this is represented by the thick violet line.)

The algorithm above is quite simple, but as you will see it is quite effective in meeting our requirement. Now, let’s see how to actually implement this in SQL Server 2017.
Implementation
To begin, we recommend that you quickly review this feature overview video as well as the official documentation links for more details on the new functionality:

An overview of Graph data in SQL Server
Architecture details for Graph data in SQL Server
The official sample for Graph data in SQL Server

Once you have the background, it’s easy to understand how to represent the scenario as ‘graph tables’ in SQL Server 2017. We will create two ‘node’ tables – one for the users and one for the songs. We will then ‘connect’ these two node tables with an ‘edge’ table. Here’s a quick visual summary of what we will be doing:
Importing the data
Now, let’s get to the nuts and bolts! The first step is to declare tables into which we will insert the source data. These ‘staging’ tables are ‘regular’ tables and have no ‘graph’ attributes. Here are the scripts for this:
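
The original post embeds these scripts as images and gists, which are not reproduced here. A minimal sketch of what the staging tables might look like (the table names, column names, and data types below are assumptions based on the delimited layout of the MSD and Taste Profile files, not the exact script from the post):

-- Staging table for the song metadata file (track id, song id, artist, title)
CREATE TABLE dbo.StagingSong
(
    TrackId    VARCHAR(50),
    SongId     VARCHAR(50),
    ArtistName NVARCHAR(500),
    Title      NVARCHAR(500)
);

-- Staging table for the user taste profile file (user id, song id, play count)
CREATE TABLE dbo.StagingUserTaste
(
    UserId    VARCHAR(50),
    SongId    VARCHAR(50),
    PlayCount INT
);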

The next step is to use the OPENROWSET BULK functionality to rapidly ingest the text files into their staging tables in SQL Server. Here are the scripts for this:
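
Again, the exact scripts are in the linked sample; a hedged sketch of the OPENROWSET BULK approach, assuming hypothetical file paths and format files that describe the delimiters and name the columns as above, could be:

-- Ingest the song metadata file (file path and format file are placeholders)
INSERT INTO dbo.StagingSong (TrackId, SongId, ArtistName, Title)
SELECT TrackId, SongId, ArtistName, Title
FROM OPENROWSET(BULK N'C:\MSD\unique_tracks.txt',
                FORMATFILE = N'C:\MSD\unique_tracks.fmt') AS src;

-- Ingest the user taste profile file
INSERT INTO dbo.StagingUserTaste (UserId, SongId, PlayCount)
SELECT UserId, SongId, PlayCount
FROM OPENROWSET(BULK N'C:\MSD\train_triplets.txt',
                FORMATFILE = N'C:\MSD\train_triplets.fmt') AS src;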

Constructing the graph
Once we have the raw data in staging tables, we can then ‘convert’ them into their Graph equivalents. Here are the table definitions; note the usage of AS NODE and AS EDGE to define the tables involved in the graph:
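
The exact definitions are in the official sample; a minimal sketch of node and edge tables for this scenario (apart from the Likes table referenced later in the post, the names and columns are assumptions) might be:

-- Node table for songs (primary key declared NONCLUSTERED so a clustered
-- columnstore index can be added later)
CREATE TABLE dbo.Song
(
    SongId  INT IDENTITY PRIMARY KEY NONCLUSTERED,
    SongKey VARCHAR(50),
    Title   NVARCHAR(500)
) AS NODE;

-- Node table for users
CREATE TABLE dbo.[User]
(
    UserId  INT IDENTITY PRIMARY KEY NONCLUSTERED,
    UserKey VARCHAR(50)
) AS NODE;

-- Edge table connecting users to the songs they listened to
CREATE TABLE dbo.Likes
(
    PlayCount INT
) AS EDGE;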

To actually ‘convert’ the data, we use INSERT…SELECT statements as shown below:
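
A sketch of this conversion, assuming the staging and graph table names from the previous snippets:

-- Populate the node tables from the staging tables
INSERT INTO dbo.Song (SongKey, Title)
SELECT DISTINCT SongId, Title
FROM dbo.StagingSong;

INSERT INTO dbo.[User] (UserKey)
SELECT DISTINCT UserId
FROM dbo.StagingUserTaste;

-- Populate the edge table; $from_id and $to_id take the $node_id
-- values of the node rows being connected
INSERT INTO dbo.Likes ($from_id, $to_id, PlayCount)
SELECT u.$node_id, s.$node_id, t.PlayCount
FROM dbo.StagingUserTaste AS t
JOIN dbo.[User] AS u ON u.UserKey = t.UserId
JOIN dbo.Song   AS s ON s.SongKey = t.SongId;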

Querying the graph

Now that we have all the data in the ‘graph’ form, we can proceed to use the new MATCH function to express our query over the set of nodes and edges. The query below finds songs that are similar to Lady Gaga’s song called ‘Just Dance’!
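
The query in the post is shown as an image; a hedged approximation using the new MATCH syntax and the table names assumed above (the real sample query may differ in its details) is:

-- Songs listened to by other users who also listened to 'Just Dance',
-- ranked by how often those songs were listened to
SELECT TOP (10) OtherSong.Title, COUNT(*) AS ListenCount
FROM dbo.Song  AS CurrentSong,
     dbo.Likes AS L1,
     dbo.[User] AS OtherUser,
     dbo.Likes AS L2,
     dbo.Song  AS OtherSong
WHERE MATCH(CurrentSong<-(L1)-OtherUser-(L2)->OtherSong)
  AND CurrentSong.Title = N'Just Dance'
  AND OtherSong.Title <> CurrentSong.Title
GROUP BY OtherSong.Title
ORDER BY ListenCount DESC;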

Optimizing performance

The above query performs relatively quickly (around 3 seconds on a laptop with an i7 processor). Consider that this query has to deal with a million users, a million songs, and 48 million relationships between those entities. Most of the cost comes from scanning the tables one row at a time and then matching them using hash joins, as you can see by looking at the execution plan:

While 3 seconds is not bad, can we make this even faster? The good news is that in SQL Server 2017 CTP 2.0, graph tables support clustered columnstore indexes. While the compression offered is definitely going to help reduce I/O, the bigger benefit is that queries on these tables leverage the ‘batch-mode’ execution which allows much faster execution of queries. This is really useful for us given that the above query is doing large aggregations (GROUP BY). Let’s proceed to create these clustered columnstore indexes:
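
A sketch of the index creation, assuming the table names used above:

-- The Likes edge table is the largest (48 million rows), so it benefits the most
CREATE CLUSTERED COLUMNSTORE INDEX CCSI_Likes ON dbo.Likes;

-- The node tables can be converted as well (their primary keys were declared
-- NONCLUSTERED above so that a clustered columnstore index can be added)
CREATE CLUSTERED COLUMNSTORE INDEX CCSI_Song ON dbo.Song;
CREATE CLUSTERED COLUMNSTORE INDEX CCSI_User ON dbo.[User];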

Once we create these indexes, the performance actually improves substantially and reduces the query execution time to half a second, which is 6x faster than before. That’s really impressive considering the sheer amount of data that the query needs to look at to arrive at the result!
Let’s take a minute to look at the new execution plan. Observe the ‘Batch mode’ execution highlighted below:

The other interesting thing to note is the new adaptive join type highlighted above. This is great to see – queries on graph data benefit from these new query processing improvements inside SQL Server 2017!
Let’s summarize the ‘before’ and ‘after’ states:

 | Query execution time (seconds) | Logical reads (for the Likes table) | Space occupied by the Likes table on disk
Heap tables | 3.6 | 588388 | 3.4GB
Clustered columnstore | 0.6 | 174852 | 1.7GB

In summary, having graph data inside SQL Server allows database administrators and developers to leverage the familiar, mature and robust query processing capabilities within SQL Server. This is crucial to reducing the learning curve and likely complexity associated with using other technologies to store and query graph data.
Visualizing graphs

While we can use external applications and tools like PowerBI to visualize graphs, the ‘icing on the cake’ is the fact that we can use R Services in SQL Server to visualize graph data. With an open-source R package called ‘igraph’ we can visualize graphs relatively easily and render them to standard image formats like PNG. Here is a code snippet showing you how that can be done:
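
The snippet in the post is embedded as a gist; a rough sketch of the idea, run from T-SQL through SQL Server R Services (the edge-list query, file path, and igraph plotting parameters here are illustrative assumptions), could look like this:

-- Requires the 'external scripts enabled' server configuration option
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        library(igraph);
        # Build a graph from the user/song edge list passed in from T-SQL
        g <- graph_from_data_frame(InputDataSet, directed = FALSE);
        # Render the graph to a PNG file (path is a placeholder)
        png(filename = "C:\\temp\\song_graph.png", width = 1200, height = 1200);
        plot(g, vertex.size = 3, vertex.label = NA, edge.arrow.size = 0.2);
        dev.off();
    ',
    @input_data_1 = N'
        SELECT TOP (1000) u.UserKey AS source, s.Title AS target
        FROM dbo.Likes AS l, dbo.[User] AS u, dbo.Song AS s
        WHERE MATCH(u-(l)->s)';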

Here’s a section of the visualization generated (refer to the comments in the above script to understand what the visualization represents). While it is quite basic, as you can see it is functionally very useful:

Conclusion
The support for graph data in SQL Server 2017 is an exciting new development and opens up doors to a new category of workloads which can leverage this functionality. It is one more step in bringing algorithms and intelligence closer to where the data resides.
Recommendation systems (such as the simple example presented here), fraud detection systems, content and asset management, and many other scenarios can also benefit from the integration that graph data in SQL Server 2017 offers. The support for graph data in the database will also be publicly available for Azure SQL DB in due course.
The complete code for this walkthrough is available here. Please use the Comments section below to ask questions and provide your feedback. We are eager to hear from you!
Citations
Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.
The Echo Nest Taste profile subset, the official user data collection for the Million Song Dataset, available at: http://labrosa.ee.columbia.edu/millionsong/tasteprofile

https://blogs.msdn.microsoft.com/sqlcat/2017/04/21/build-a-recommendation-system-with-the-support-for-graph-data-in-sql-server-2017-and-azure-sql-db/

The new preview release of SQL Server 2017 is here!

This week Microsoft announced a new preview release of SQL Server! Community Technology Preview (CTP) 2.0 is the first production-quality preview of SQL Server 2017, available for both Windows and Linux. In this preview release we have delivered capabilities for a broad range of users, including the ability to build advanced analytics models using Python and use them together with the data located…

https://blogs.technet.microsoft.com/rutechnews/2017/04/21/sql-server-2017-community-technology-preview-2-0-now-available/


Microsoft R Server 9.1 now available

During yesterday’s Data Amp event, Microsoft announced the new Microsoft R Server 9.1, which is available now. In addition to several advanced […]
The post Microsoft R Server 9.1 now available appeared first on Microsoft + Open Source.

http://open.microsoft.com/2017/04/20/microsoft-r-server-9-1-now-available/

Combating a spate of Java malware with machine learning in real-time

In recent weeks, we have seen a surge in emails carrying fresh malicious Java (.jar) malware that use new techniques to evade antivirus protection. But with our research team’s automated expert systems and machine learning models, Windows 10 PCs get real-time protection against these latest threats.
Attackers are constantly changing their methods and tools. We know from many years of research into malware and cybercriminal operations that cybercriminals have go-to programming languages for their malicious activities, but they switch from time to time to slip past security solutions. For instance, we recently tracked how cybercriminals have changed how they use NSIS installers in order to evade AV and deliver ransomware.
To help deliver real-time protection, our researchers use the Microsoft intelligent security graph, a robust automated system that monitors threat intelligence from a wide network of sensors. This system includes machine learning models, which drive proactive and predictive protection against fresh threats.
Tracking malicious email campaigns
Our sensors first picked up signs of the Java spam campaigns at the start of the year. Our automated tools, which can sort and classify massive volumes of malicious emails, showed us actionable intelligence about the surge of Java malware-bearing emails.
These emails use various social engineering techniques to lure recipients to open malicious attachments. Many of the emails are in Portuguese, but we’re also seeing cases in English. They pretend to be notifications for billing, payment, pension, or other financial alerts.
Here are the most popular subject line and attachment file name combinations used in the email campaigns:

Subject | Attachment file name
Segue em anexo Oficio Numero: | Decisão-Judicial.zip
Serviços de Cobranças Imperio adverte, Boleto N | 2Via_Boleto_N.zip
“Cobrança Extrajudicial” Imperio Serviços de Cobranças | 2Via_Boleto_N.zip
Payment Advice | Payment Advice.rar
Curriculum Vitae | Curriculum_.zip
FGTS Inativo – – Disponivel para saque em | SALDO_FGTS_MP_.zip
FGTS Inativo – – Disponivel para saque em | FGTS_-_MP_.zip
Extrato_FGTS_disponivel_em_sua_conta_inativa_de_N | FGTS_Disponivel_N.zip
NEW PURCHASE ORDER (TOP URGENT) | BLUERHINETECHNOLOGY_EXPORT_PURCHASE_ORDER.zip
NF-e . Emitente – GLOBECALL DO BRASIL LTDA. | NF-e-.zip

Figure 1. Most popular subject line and attachment file name combinations in email campaigns
The attachments are usually .zip or .rar archive files that contain the malicious .jar files. The choice of .jar as attachment file type is an attempt by cybercriminals to stay away from the more recognizable malicious file types: MIME, PDF, text, HTML, or document files.

Figure 2. Sample malicious email carrying Java malware in a .zip file
Tracking updates in malicious code
In addition to information about the email campaigns, our monitoring tools also showed another interesting trend: throughout the run of the campaigns, an average of 900 unique Java malware files were used in these campaigns every day. At one point, there were 1,200 unique malicious Java files in a single day.

Figure 3. Volume of unique Java malware used in email campaigns
These Java malware files are variants of old malware with updated code that attempt to evade detection by security products.
The most notable change we saw in these new variants of Java malware is in the way they obfuscate malicious code. For instance, we saw the following obfuscation techniques:

Using a series of append operators and a string decryption function

Figure 4. Sample obfuscated Java malware code
Using overly long variable names, making them effectively unreadable

Figure 5. Sample obfuscated Java malware code
Using excessive code, making code tracing more difficult

Figure 6. Sample obfuscated Java malware code

Obfuscated code can make analysis tedious. We use automated systems that detonate the attachments, effectively bypassing obfuscation. When the malware is detonated, we see the malicious intent and gain intelligence that we can use to prevent attacks.
Our tools log malicious behaviors observed during detonation and use these to detect new and unknown attachments. These malicious behaviors include:

Figure 7. Sample Java malware trace logs
From threat intelligence to real-time protection
Through automated analysis, machine learning, and predictive modeling, we’re better able to deliver protection against the latest, never-before-seen malware. These expert systems give us visibility and context into attacks as they happen, allowing us to deliver real-time protection against the full range of threats.
Context-aware detonation systems analyze millions of potential malware samples and gather huge amounts of threat intelligence. This threat intelligence enriches our cloud protection engine, allowing us to block threats in real-time. In addition to the Java malware, we also detect the payloads, which are usually online banking Trojans like Banker and Banload, or Java remote access Trojans (RATs) like Jrat and Qrat.

Figure 8. Automated systems feed threat intelligence to cloud engines and machine learning models, which result in real-time protection against threats
Threat intelligence from the detonation system constantly enhances our machine learning models. New malicious file identifiers from the analysis of the latest threats are added to machine learning classifiers, which power predictive protection.
This is how we use automation, machine learning, and the cloud to deliver protection technologies that are smarter and stronger against new and unknown threats. We automatically protect Windows PCs against more than 97% of Java malware in the wild.

Figure 9. Breakdown of Java malware detection methods
Conclusion: Real-time protection against relentless threats
The email campaigns distributing Java malware account for a small portion of cybercriminal operations that deliver new malware and other threats. Cybercriminals are continuously improving their tools and modus operandi to evade system protections.
Our research team is evolving how we combat cybercrime by augmenting human capacity with a combination of sensors, automated processes, machine learning, and cloud protection technologies. Through these, we are better able to monitor and create solutions against these threats.
These protections are available in the security technologies that are built into Windows 10. And with the  Creators Update, up-to-date computers get the latest security features and proactive mitigation.
Windows Defender Antivirus provides real-time protection against threats like Java malware and their payloads by using automation, machine learning, and heuristics.
In enterprise environments, Office 365 Advanced Threat Protection blocks malicious emails from spam campaigns, such as those that distribute Java malware, using machine learning capabilities and threat intelligence from the automated processes discussed in this blog.
Device Guard locks down devices and provides kernel-level virtualization-based security, allowing only trusted applications to run.
Windows Defender Advanced Threat Protection alerts security operations teams about suspicious activities on devices in their networks.
It is also important to note that Oracle has been enforcing stronger security checks against legitimate applications using Java. For instance, starting with Java 7 Update 51, Java does not allow Java applications that are not signed, are self-signed, or are missing permission attributes. Oracle will also start blocking .jar files signed with MD5, requiring instead signing with SHA1 or stronger.
However, the Java malware discussed in this blog is equivalent to an executable file (as opposed to a Java applet). Here are some additional tips to defend against Java malware in enterprise environments:

Remove the JAR file type association in the operating system so that .jar files don’t run when double-clicked; .jar files must then be executed manually from the command line
Restrict Java to execute only signed .jar files
Manually verify signed .jar files
Apply email gateway policy to block .jar as attachments

 
Duc Nguyen, Jeong Mun, Alden Pornasdoro
Microsoft Malware Protection Center

https://blogs.technet.microsoft.com/mmpc/2017/04/20/combating-a-wave-of-java-malware-with-machine-learning-in-real-time/

Making it easier to revert

Sometimes when things go wrong in my environment, I don’t want to have to clean it all up — I just want to go back in time to when everything was working. But remembering to maintain good recovery points isn’t easy.
Now we’re making it so that you can always roll back your virtual machine to a recent good state if you need to. Starting in the latest Windows Insider build, you can now always revert a virtual machine back to the state it started in.
In Virtual Machine Connection, just click the Revert button to undo any changes made inside the virtual machine since it last started.

Under the hood, we’re using checkpoints; when you start a virtual machine that doesn’t have any checkpoints, we create one for you so that you can easily roll back to it if something goes wrong, then we clean it up once the virtual machine shuts down cleanly.
New virtual machines will be created with “Use automatic checkpoints” enabled by default, but you will have to enable it yourself to use it for existing VMs.  This option can be found in Settings -> Checkpoints -> “Use automatic checkpoints”

Note: the checkpoint will only be taken automatically when the VM starts if it doesn’t have other existing checkpoints.
Hopefully this will come in handy next time you need to undo something in your VM. If you are in the Windows Insider Program, please give it a try and let us know what you think.
Cheers,
Andy

https://blogs.technet.microsoft.com/virtualization/2017/04/20/making-it-easier-to-revert/

Introducing Interleaved Execution for Multi-Statement Table Valued Functions

For SQL Server vNext and Azure SQL Database, the Microsoft Query Processing team is introducing a new set of adaptive query processing improvements to help fix performance issues that are due to poor cardinality estimates. Improvements in the adaptive query processing space include batch mode memory grant feedback, batch mode adaptive joins, and interleaved execution.  In this post, we’ll introduce interleaved execution.

SQL Server has historically used a unidirectional “pipeline” for optimizing and executing queries.  During optimization, the cardinality estimation process is responsible for providing row count estimates for operators in order to derive estimated costs.  The estimated costs help determine which plan gets selected for use in execution.  If cardinality estimates are incorrect, we will still end up using the original plan despite the poor original assumptions.
Interleaved execution changes the unidirectional boundary between the optimization and execution phases for a single-query execution and enables plans to adapt based on the revised estimates. During optimization if we encounter a candidate for interleaved execution, which for this first version will be multi-statement table valued functions (MSTVFs), we will pause optimization, execute the applicable subtree, capture accurate cardinality estimates and then resume optimization for downstream operations.

While many DBAs are aware of the negative effects of MSTVFs, we know that their usage is still widespread.  MSTVFs have a fixed cardinality guess of “100” in SQL Server 2014 and SQL Server 2016, and “1” for earlier versions. Interleaved execution will help workload performance issues that are due to these fixed cardinality estimates associated with multi-statement table valued functions.
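
As a reminder of the construct in question, here is a minimal, illustrative multi-statement table valued function (the table and column names are hypothetical). It is the RETURNS @table TABLE (...) body with its own statements that distinguishes an MSTVF from an inline TVF and that historically forced the optimizer to use the fixed cardinality guess described above:

CREATE FUNCTION dbo.ufn_GetCustomerOrders (@CustomerId INT)
RETURNS @Orders TABLE (OrderId INT, OrderDate DATETIME, Amount MONEY)
AS
BEGIN
    -- Any number of statements can populate the table variable here,
    -- which is why the optimizer could not estimate its cardinality up front.
    INSERT INTO @Orders (OrderId, OrderDate, Amount)
    SELECT OrderId, OrderDate, Amount
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;

    RETURN;
END;
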
The following is a subset of an overall execution plan that shows the impact of fixed cardinality estimates from MSTVFs (below shows Live Query Statistics output, so you can see the actual row flow vs. estimated rows):

Three noteworthy areas in the plan are numbered 1 through 3:

We have an MSTVF Table Scan that has a fixed estimate of 100 rows. But in this example there are 527,592 rows flowing through this MSTVF Table Scan, as seen in Live Query Statistics via the “527597 of 100” actual-versus-estimated display – so our fixed estimate is significantly skewed.
For the Nested Loops operation, we’re still assuming only 100 rows are flowing through the outer reference. Given the high number of rows actually being returned by the MSTVF, we’re likely better off with a different join algorithm altogether.
For the Hash Match operation, notice the small warning symbol, which in this case is indicating a spill to disk.

Now contrast the prior plan with the actual plan generated with interleaved execution enabled:

Three noteworthy areas in the plan are numbered 1 through 3:

Notice that the MSTVF table scan now reflects an accurate cardinality estimate. Also notice the re-ordering of this table scan and the other operations.
And regarding join algorithms, we have switched from a Nested Loop operation to a Hash Match operation instead, which is more optimal given the large number of rows involved.
Also notice that we no longer have spill-warnings, as we’re granting more memory based on the true row count flowing from the MSTVF table scan.

What makes a query eligible for interleaved execution?
For the first version of interleaved execution, MSTVF referencing statements must be read-only and not part of a data modification operation. Also, the MSTVFs will not be eligible for interleaved execution if they are used on the inside of a CROSS APPLY.
How do I enable interleaved execution?
To make your workloads automatically eligible for this improvement, enable compatibility level 140 for the database in SQL Server 2017 CTP 2.0 or greater, or in Azure SQL Database.
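
For example (WideWorldImporters is just a placeholder; substitute your own database name):

ALTER DATABASE WideWorldImporters
SET COMPATIBILITY_LEVEL = 140;

-- Verify the current compatibility level
SELECT name, compatibility_level
FROM sys.databases
WHERE name = 'WideWorldImporters';
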
What performance improvements can we expect to see?
This depends on your workload characteristics – however we have seen the greatest improvements for scenarios where MSTVFs output many rows that then flow to other operations (for example, joins to other tables or sort operations).
In one example, we worked with a financial services company that used two MSTVF-referencing queries and they saw the following improvements:

 
For MSTVF “A”, the original query ran in 135 seconds and the plan with interleaved execution enabled ran in 50 seconds. For MSTVF “B”, the original query ran in 21 seconds and the plan with interleaved execution enabled ran in 1 second.
A special thanks to Arun Sirpal, the Senior Database Administrator who conducted this testing and worked with our team during private preview!
In general, the higher the skew between the estimated vs. actual number of rows, coupled with the number of downstream plan operations, the greater the performance impact.
What is the overhead?
The overhead should be minimal to none. MSTVFs were already being materialized prior to the introduction of interleaved execution; the difference is that now we’re allowing deferred optimization and are then leveraging the cardinality estimate of the materialized row set.
What could go wrong?
As with any plan affecting changes, it is possible that some plans could change such that with better cardinality we get a worse plan. Mitigation can include reverting the compatibility level or using Query Store to force the non-regressed version of the plan.
How does interleaved execution work for consecutive executions?
Once an interleaved execution plan is cached, the plan with the revised estimates on the first execution is used for consecutive executions without re-instantiating interleaved execution.
How can I track when interleaved execution is used?
You can see usage attributes in the actual query execution plan:

Plan attribute | Description
ContainsInterleavedExecutionCandidates | Applies to the QueryPlan node. When “true”, the plan contains interleaved execution candidates.
IsInterleavedExecuted | This attribute is inside the RuntimeInformation element under the RelOp for the TVF node. When “true”, the operation was materialized as part of an interleaved execution operation.

You can also track interleaved execution occurrences via the following XEvents:

XEvent | Description
interleaved_exec_status | This event fires when interleaved execution is occurring.
interleaved_exec_stats_update | This event describes the cardinality estimates updated by interleaved execution.
Interleaved_exec_disabled_reason | This event fires when a query with a possible candidate for interleaved execution does not actually get interleaved execution.
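
As a sketch, an Extended Events session covering these events might be created like this (this assumes the events live in the sqlserver package and uses an event_file target; adjust the target and session name as needed):

CREATE EVENT SESSION [TrackInterleavedExecution] ON SERVER
    ADD EVENT sqlserver.interleaved_exec_status,
    ADD EVENT sqlserver.interleaved_exec_stats_update,
    ADD EVENT sqlserver.interleaved_exec_disabled_reason
    ADD TARGET package0.event_file (SET filename = N'InterleavedExec.xel');

ALTER EVENT SESSION [TrackInterleavedExecution] ON SERVER STATE = START;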

What does the estimated plan show?
A query must be executed in order to allow interleaved execution to revise MSTVF cardinality estimates.  However, the estimated execution plan will still show when there are interleaved execution candidates via the ContainsInterleavedExecutionCandidates attribute.
What if the plan is manually cleared or automatically evicted from cache?
Upon query execution, there will be a fresh compilation that uses interleaved execution.
Will this improvement work if I use OPTION (RECOMPILE)?
Yes.  A statement using OPTION(RECOMPILE) will create a new plan using interleaved execution and not cache it.
Will Query Store capture and be able to force an interleaved execution plan?
Yes.  The plan will be the version that has corrected cardinality estimates based on initial execution.
Will you be expanding the scope of interleaved execution in a future version beyond MSTVFs?
Yes. We are looking at expanding to additional problematic estimation areas.
Thanks for reading, and stay tuned for more blog posts regarding the adaptive query processing feature family!

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/04/19/introducing-interleaved-execution-for-multi-statement-table-valued-functions/