Our management and reporting server was to be replaced. It was running on Windows Server 2008 R2 with SQL Server 2012 and collecting data of all our SQL Server VMs with PowerShell scripts. The data collected is serverstate, disk usage, number and details of installed instances, existing databases and many things more.
Not fully working…
Our new monitoring and reporting system should be running on a Windows Server 2012 R2 with SQL Server 2016, with the database contianing the collected data in an AllwaysOn Availability Group. The database is created in an instant and the scripts and task scheduler entries copied from the old server. But many of the scripts fail. Mostly they report, that they can’t reach some servers, or that the WMI provider “is missing” on some servers. But not all fail on the same spot nor with the same message.
Debugging from scratch
Due to the varying error point and error message I rework all scripts, reformat for readability and add comments where I can figure out the idea behind the old code.
In the same step I unify codeparts, reading out some information from system variables instead of hardcoding them in every script.
For a unified logging and to report the job-outcome back into the database I write two small PowerShell modules: One to initialize the common variables to log the state and outcome of the script (number of tried and failed servers, instances databases) and the needed standard functions (AddToLog, UpdateJobLog). And another to read out the system variables, telling the scripts the location of the reporting database.
The job outcome is being written into the reporting database with an UPDATE statement. To be able to read out previous outcomes and to get used to the new SQL Server 2016 functionalities, I implement the JobLog table with a “temporal table”. Works great! 😉
The error gets visible – SQL Server WMI
Now that all scripts log the same way it gets clear where the errors come from. A Google research confirms: The SQL Server 2016 WMI objects cannot communicate with SQL Server 2012. Since the used modules SQLPS and SQLPSX load the WMI objects of the installed SQL Server version , the scripts fail only on the new server with SQL Server 2016. The whole scenario is the following:
- WMI objects of SQL Server 2012 can communicate with all previous versions of SQL Server and SQL Server 2012.
- WMI objects of SQL Server 2014 can communicate with all previous versions of SQL Server up to 2008 R2 and with SQL Server 2014.
- WMI objects of SQL Server 2016 can communicate with all previous versions of SQL Server up to 2008 R2 and with SQL Server 2016.
The most visual source is https://www.codykonior.com/2015/05/30/whats-wrong-with-sqlps/. It is telling the following:
The developers of SQL Server 2012 WMI objects have built into their dll “Microsoft.SqlServer.SqlWmiManagement.dll” a three-case statement to build the ManagementPath. The method “TryConnect” tests one connection method after the other. If the first fails, it tries the next, if this also fails again the next. The first variant is the one for SQL Server 2012, the next the one for “older” SQL Server, as to be seen by the names: The first variant is called “GetManagementPath”, the two for the “older” variants “GetManagementPathKatmai” (SQL Server 2008 is called Katmai) and GetManagementPathPreKatmai”. In the first variant the SQL Server version is handed over by a fix “11” in the code.
For the developers of the SQL Server 2014 WMI objects the task was therefore to adopt the connection-setup in a way so she can communicate also with 2014. But what did they do? They did not adapt “TryConnect” by adding a fourth step to try with a “12” in the path to connect to a SQL Server 2014. No, they adapt “GetManagementPath” by replacing the hardcoded “11” with a “12”. So SQL Server 2014 WMI tests first a connection to SQL Server 2014, in case of failure to Katmai (2008) and last to “older than Katmai”. 2012 just gets “missed out”.
The error was reported by the community. But the developers of the SQL Server 2016 WMI in the CTPs but also in the final version again adapted “GetManagementPath” instead of “TryConnect”.
First fix fails
So I thought first, well, so this is easy. I collect in the reporting script the servers to which I was not able to connect and at the end start a new PowerShell session with the same script, but force the load of an older SQLPS version and therefore older WMI version. But the way SQLPS and SQLPSX load their dependencies, all WMI and SMO of SQL Server get loaded / stay in GAC. So the connections to the old servers also fail in the second run. My effort has not been completely in vain, since until the point of mastering to start a new, “clean” PowerShell session, I learned much about PowerShell coding…
The solution too came to me from the before-mentioned source: https://www.codykonior.com/2015/05/30/whats-wrong-with-sqlps/
Cody Konior describes an easy to understand way to fix the WMI objects with the executables ildasm.exe and ilasm.exe – which are included with (mostly, not in Visual Studio Code) every version of Visual Studio – . One searches and copies from the GAC the highest Microsoft.SqlServer.SqlWmiManagement.dll in a temporary folder, decompiles it with ildasm.exe (ildasm /all C:TempMicrosoft.SqlServer.SqlWmiManagement_Old.dll /out=C:TempMicrosoft.SqlServer.SqlWmiManagement_Old.il) to intermediate language and copies it again to Microsoft.SqlServer.SqlWmiManagement_patched.il to simplify a later comparison. Then this file gets edited by copying the function GetManagementPath, once for every version to be connected to. By doing this, one gets new functions GetManagementPath2012 and GetManagementPath2014. In these, the version gets modified from “11” to “12” and “13”. Then one adds more tries in the function TryConnect: After a failed call toGetManagementPath, GetManagementPath2014 gets tried, then GetManagementPath2012, followed by the already existing Katmai and PreKatmai. Like this, all versions get tried one after the other. Afterwards the file is to be recompiled (ilasm /dll C:TempMicrosoft.SqlServer.SqlWmiManagement_Patched.il /output=C:TempMicrosoft.SqlServer.SqlWmiManagement_Patched.dll) and copied into the GAC.
As last step, the native images (MSIL) have to be removed so they are no longer used preferredly. The complete run-through, all steps in PowerShell code are to be found on the source mentioned above (https://www.codykonior.com/2015/05/30/whats-wrong-with-sqlps/), as well as a unified patch for understanding. A modified dll is not there, most possibly due to intellectual property or other rights.
Like this, I modified the WMI dll on the new server and hey! The scripts run finally completely to the end, reporting and collecting data of all servers. The conquest is at an end! And I have:
- a working reporting and monitoring system
- learned how to write PowerShell modules
- Two new PowerShell modules
- Gathered much experience in coding PowerShell
- edited IL for the first time in my life 😉