An initiative of Mobiliar IT

Make AON and Linked Servers on SQL Server 2017 on Linux RC2 work again when having IPv6 disabled.

Are you testing SQL Server 2017 on Linux? It is a great idea from Microsoft to bring SQL Server to Linux and they did a great job so far! So it was some kind of surprise as my AON availability group stopped working when I installed RC2 on my test farm:

The problem

As soon as I updated a replica, it lost connection and was not able to re-establish it. It is also not some sort of version conflict: Even after upgrading every replica of the group, they were not able to communicate with each other. Neither a reboot nor a “remove and add” to the availability group helped. I even stopped and removed the whole group, but to no success. Every step succeeded up to creating the group on the primary. But when I called the “join availability group” command on any secondary, it waited for 5 minutes and then failed with a “connection timeout”. An example of a failing command:

-- on primary:
CREATE AVAILABILITY GROUP [LinuxTestAG1]
 WITH (CLUSTER_TYPE = EXTERNAL)
 FOR REPLICA ON
 N'myRepl1' WITH (
 ENDPOINT_URL = N'tcp://myRepl1.myDomain.test:5022',
 AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
 FAILOVER_MODE = EXTERNAL,
 SEEDING_MODE = AUTOMATIC,
 SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
 ),
 N'myRepl2' WITH ( 
 ENDPOINT_URL = N'tcp://myRepl2.myDomain.test:5022', 
 AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
 FAILOVER_MODE = EXTERNAL,
 SEEDING_MODE = AUTOMATIC,
 SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
 )

-- on secondary:
ALTER AVAILABILITY GROUP [LinuxTestAG1] JOIN WITH (CLUSTER_TYPE = EXTERNAL)

Surprisingly, I was able to ping every replica from each other replica, also logging in with sqlcmd was possible from any replica to another. The SQL Server logs did not show much valuable information. I just found a line in the log of the primary replica that when running the “create availability group” command it created a connection to the secondary and then got a timeout.

The quest

Thanks to the EAP I could investigate the problem together with Microsoft when I did not find a solution after putting in some effort. We tried to set up linked servers from one replica to another, but this also failed, resulting in a connection error. The error occurred wether we used only hostnames or FQDNs of the replicas.

Then we caught the first glimpse of hope: Using IP addresses instead of hostnames works! As soon as I defined a linked server using the IP of the target server, communication worked. So I dropped the availability group and created it new using IPs in the addresses (keeping the hostnames as replica names). An example of a working command:

-- on primary:
CREATE AVAILABILITY GROUP [LinuxTestAG1]
   WITH (CLUSTER_TYPE = EXTERNAL)
   FOR REPLICA ON
     N'myRepl1' WITH (
         ENDPOINT_URL = N'tcp://192.168.31.15:5022',
         AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
         FAILOVER_MODE = EXTERNAL,
         SEEDING_MODE = AUTOMATIC,
         SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
         ),
     N'myRepl2' WITH ( 
         ENDPOINT_URL = N'tcp://192.168.31.16:5022', 
         AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
         FAILOVER_MODE = EXTERNAL,
         SEEDING_MODE = AUTOMATIC,
         SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
         )

-- on secondary:
ALTER AVAILABILITY GROUP [LinuxTestAG1] JOIN WITH (CLUSTER_TYPE = EXTERNAL)

Still we wondered why it works like that. It could not be a problem of name resolving, since ping worked and also printed in its output the IP of the replica.

The solution

So we extended our search: Into the surroundings of SQL Server, the Linux system on which we were running the SQL Server. And there it was: IPv6 is disabled on our systems. But not by using systemctl, but instead in the bootmanager (boot cmdline contains “ipv6.disable=1”).

In this case, SQL Server seems not to detect that IPv6 is disabled and tries to establish a connection to the other replica using both IPv4 and IPv6. Since it does not “know” that IPv6 is disabled and the connection fails, it gives up on it, ignoring the fact that the IPv4 connection was successful. It’s because of this that SQL Server errorlog also lists a successful connection only to fail some seconds later.

A big “Thank You” to Microsoft for the time and sweat spent together in figuring this out. Thanks to you guys it didn’t take as long as I feared in the beginning.

The workaround

Talking to Microsoft I understand this is something they will probably address in a later release. Until then, if you want to keep IPv6 disabled in boot cmdline and want to use linked servers or AON availability groups on “SQL Server 2017 on Linux RC2”, you should define the servers / replica by giving the IP (v4) instead of hostname to prevent a connection attempt on IPv6.

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Basic HTML is allowed. Your email address will not be published.

Subscribe to this comment feed via RSS

%d bloggers like this: