FOR JSON performance – simple experiment

How fast is FOR JSON compared to generating JSON in application layer?
In this post I will compare performance of FOR JSON and traditional approach for generating JSON in application layer.
This is not a general performance comparison. I will focus on one simple and common use case – returning one row from a table with a set of related rows from a child table. In this experiment I will use SalesOrderHeader -> SalesOrderDetails structure from AdventureWorks database where we have one to many relationship between SalesOrderHeader and SalesOrderDetails.
Problem
I have an application that reads SalesOrderHeader and SalesOrderDetails rows and returns them as JSON shown in the following code:
{“SalesOrderID”:55859,”RevisionNumber”:8,”OrderDate”:”2013-09-07T00:00:00″, “DueDate”:”2013-09-19T00:00:00″,”ShipDate”:”2013-09-14T00:00:00″,”Status”:5, “Sales.SalesOrderDetail”:[ {“SalesOrderID”:55859,OrderQty”:1,”ProductID”:779,”SpecialOfferID”:1,”UnitPrice”:2319.9900}, {“SalesOrderID”:55859,”SalesOrderDetailID”:57519,”OrderQty”:1,”ProductID”:871″}, {“SalesOrderID”:55859,”SalesOrderDetailID”:57520,”UnitPrice”:4.9900,”UnitPriceDiscount”:0.0000}, {“SalesOrderID”:55859,”SalesOrderDetailID”:57521,”OrderQty”:1,UnitPrice”:34.9900,”UnitPriceDiscount”:0.0000} ]}
 
In the following examples, I will explain how you can generate this JSON.
Option 1 – Formatting JSON in application
If you don’t want to format JSON in database, you would need to read one record from the parent table and then read all related records from the child table.
declare @id int = (ROUND(rand() * 31464, 0) + 43660) SELECT * FROM Sales.SalesOrderHeader WHERE SalesOrderID = @id; SELECT * FROM Sales.SalesOrderDetail WHERE SalesOrderID = @id
Then you can use some formatter to generate JSON text in application layer (e.g. Json.Net). Problem in this method is the fact that you will need to generate two queries to get parent row and related child rows.
Option 2 – JOIN parent and child table
 Second option would be to join these two tables in database layer and return flat result set that joins them, e.g.:
declare @id int = (ROUND(rand() * 31464, 0) + 43660) SELECT * From Sales.SalesOrderHeader LEFT JOIN Sales.SalesOrderDetail ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID WHERE Sales.SalesOrderHeader.SalesOrderID = @id
Here you will get repeated header values for every details row so you would need to remove duplicated information in application layer when you generate JSON.
Option 3 – Format related rows as JSON
Third option would be to return all information from SalesOrderHeader table and format related child rows as JSON in column expression:
declare @id int = (ROUND(rand() * 31464, 0) + 43660) SELECT *, (SELECT * From Sales.SalesOrderDetail WHERE SalesOrderID = @id FOR JSON PATH) AS Details From Sales.SalesOrderHeader WHERE SalesOrderID = @id
 
Here, you are getting single row as a result so you don’t need to remove duplicates, but you would need to format information from header as JSON key:value pairs and include related child information that are already formatted as JSON.
Option 4 – Format JSON in database layer using FOR JSON AUTO
Last option would be to join tables in SQL Server and format them using FOR JSON AUTO. This is similar to option 2, but we are delegating formatting to database layer:
declare @id int = (ROUND(rand() * 31464, 0) + 43660) SELECT * From Sales.SalesOrderHeader LEFT JOIN Sales.SalesOrderDetail ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID WHERE Sales.SalesOrderHeader.SalesOrderID = @id FOR JSON AUTO
This query will prepare results you need in database layer and return it as text. Problem with this approach is that response is bigger so make sure that you have good connection between app layer and database server. In this case you can just stream results to client without any additional formatting.
Performance results
I’m using Adam Machanic’s SQLQueryStress tool to generate workloads and test workloads. I’m executing 50 iterations of 20 concurrent batches for all three options. Results are shown in the following figure:
 

 
As you can see, we don’t have big difference in query performance in all queries results vary between 7.8 to 8.3 seconds. Note that here I’m not counting additional time that would be required to format JSON in application layer. Therefore, conclusions in this case are:

Response time of queries that use FOR JSON is not significantly increased comparing to equivalent queries that you would need to run to format JSON in application layer.
You don’t need extra processing time in application layer and additional logic that will join parent (SalesOrderHeader) and child (SalesOrderDetails) tables/entites.

Note that this might not be applicable to any configuration and use case. Results might vary depending on you connection bandwidth between application layer and database server, number of concurrent threads, etc. However in mu case I’m getting similar results in all cases.
Also, this experiment do not covers other cases where you return multiple parent rows, with different numbers of fields returned from parent and child tables, or many child table collections. Feel free to repeat this experiment, try other use cases and report performance results you are getting.
 
 

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/11/10/for-json-performance-simple-experiment.aspx