Network Errors

ERROR 1158 (08S01): Leaf Error (): Error reading packet ### from the connection socket (): Connection timed out

Issue

When the data contains an extremely large number of duplicates, LOAD DATA IGNORE can cause the leaves to wait so long that the connection times out.

Solution

If you see this error when running LOAD DATA IGNORE, verify that the data does not have a lot of duplicates.
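
As a rough check, you can load the file into a staging table and count how many key values appear more than once; the table and column names below are placeholders for your own schema.

-- Placeholder names; substitute your own staging table and key column.
SELECT key_column, COUNT(*) AS occurrences
FROM staging_table
GROUP BY key_column
HAVING COUNT(*) > 1
ORDER BY occurrences DESC
LIMIT 10;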

ERROR 1735: Unable to connect … Timed out reading from socket

Issue

A MemSQL node is unable to connect to another MemSQL node. This may be because there is no network connectivity (such as a network problem or a firewall blocking connectivity), or because a node is overloaded with connection requests.

Solution

Here are some possible ways to solve this problem:

  • Ensure that all nodes are able to connect to all other nodes on the configured port (the default is 3306). Update any firewall rules that block connectivity between the nodes.

    One way to verify connectivity is to run the command FILL CONNECTION POOLS on all MemSQL nodes. If this fails with the same error, then a node is unable to connect to another node.

    Info

    Some queries require different amounts of connectivity. For example, some queries only require aggregator-leaf connections while others require aggregator-leaf as well as leaf-leaf connections. As a result, it is possible for some queries to succeed while others fail with this error.

  • If all nodes are able to connect to all other nodes, the error is likely because your query or queries require opening too many connections at once. Run FILL CONNECTION POOLS on all MemSQL nodes to pre-fill connection pools. If the connection pool size is too small for your workload, adjust the max_pooled_connections configuration variable, which controls the number of pooled connections between each pair of nodes.
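
    As a sketch, assuming max_pooled_connections can be changed at runtime on your MemSQL version (on some versions it may need to be set in memsql.cnf instead), the following statements pre-fill the pools and raise the pool size; 1024 is only an illustrative value.

    -- Pre-fill connection pools from this node to the other nodes.
    FILL CONNECTION POOLS;

    -- Illustrative value only; size this for your own workload.
    SET GLOBAL max_pooled_connections = 1024;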

ERROR 1970 (HY000): Subprocess /var/lib/memsql/master-3306/extractors/kafka-extract --get-offsets --kafka-version=0.8.2.2 timed out

Issue

This error occurs when there are connectivity issues between a MemSQL node and the data source (e.g. Kafka cluster or S3). This error is particularly common when using S3 pipelines because of throttling and other S3 behavior.

Solution

To solve this issue, increase the value of pipelines_extractor_get_offsets_timeout_ms; the default is 10000 milliseconds. See Pipeline System Variables for more information on this timeout variable, and see How to Set Pipelines System Variables to change its value.
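
For example, assuming your MemSQL version allows setting this variable at runtime (otherwise follow the procedure in How to Set Pipelines System Variables), a statement like the following raises the timeout to 60 seconds; 60000 is only an illustrative value.

-- Illustrative value only; pick a timeout suited to your data source.
SET GLOBAL pipelines_extractor_get_offsets_timeout_ms = 60000;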

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock'

Issue

When the MySQL client connects to localhost, it attempts to use a socket file instead of TCP/IP. The socket file used is specified in /etc/mysql/my.cnf when the MySQL client is installed on the system. This is a MySQL socket file, which MemSQL does not use by default. Therefore, connecting with localhost attempts to connect to MySQL and not MemSQL.

Solutions

There are two ways to solve this problem:

  1. Specify 127.0.0.1 as the host instead of localhost. That is, mysql -h 127.0.0.1 -u root instead of mysql -h localhost -u root.

    Info

    If you omit the host (mysql -u root), the MySQL client will implicitly use localhost.

  2. In /etc/mysql/my.cnf, change the socket value to the location of your MemSQL socket file as shown in the example below:

[client]
port          = 3306
socket        = /var/lib/memsql/data/memsql.sock
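
If you are not sure where your MemSQL socket file is located, and your MemSQL version exposes the socket variable, you can query it over a TCP connection (for example, one opened using solution 1):

SHOW VARIABLES LIKE 'socket';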

ERROR 2026 (HY000): SSL connection error: SSL_CTX_set_default_verify_paths failed

Issue

This error occurs when an incorrect path is provided for the ca-cert.pem file when using the --ssl_ca flag in the connection string to the MemSQL node.

Solution

The solution is to verify you are using the correct path to the ca-cert.pem file.

ERROR 2026 (HY000): SSL connection error: SSL is required but the server doesn’t support it

Issue

This error occurs when you attempt to connect to the affected MemSQL node and either the required SSL configuration was not added to the memsql.cnf file, or it was added but the target MemSQL node was not restarted afterward.

Solutions

  • Check that the correct SSL configuration has been written to the memsql.cnf file of the target MemSQL node.
  • Check that the target MemSQL node has been restarted since its memsql.cnf file was updated.
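
As a minimal sketch, assuming the standard MemSQL SSL variable names and placeholder certificate paths, the relevant section of memsql.cnf might look like the following; your file names and locations will differ.

ssl_cert = /var/lib/memsql/certs/server-cert.pem
ssl_key  = /var/lib/memsql/certs/server-key.pem
ssl_ca   = /var/lib/memsql/certs/ca-cert.pem

After adding these lines, restart the target MemSQL node so the new configuration takes effect.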

ERROR: Distributed Join Error. Leaf X cannot connect to Leaf Y.

Issue

When a distributed join occurs, the leaves within the cluster must reshuffle data amongst themselves, which requires the leaves to connect to one another. If the leaves are not able to communicate with one another, and a distributed join touches those leaves, the distributed query will not run successfully. The inter-leaf communication needed for distributed join queries relies on the DNS cache on each leaf. If this cache is out of sync with the current state of the leaves, the distributed join will fail.

Solution

Use the following steps to troubleshoot this scenario:

  1. Confirm that you are able to connect to MemSQL from one leaf to another in the cluster. This rules out network connectivity issues.

    Info

    A manual connection from one leaf to another can succeed even when distributed joins fail, because manual connections do not use the DNS cache on the leaf.

  2. Run SHOW LEAVES on an affected leaf (e.g. leaf X) in the cluster. The Opened_Connections column shows which leaves the affected leaf has open connections with. Verify that leaf Y is not in this list (see the example after these steps).

  3. When leaves connect to each other, they cache connection information (leaf-1 is at IP 192.0.2.1, leaf-2 is at IP 192.0.2.2, etc.). If the IPs of these leaves ever change, the cache does not update automatically, so later connection attempts fail because the other leaves in the cluster are using stale IP address information. The solution is to flush the DNS cache and connection pools on all affected nodes. You can do so by running the following:

    FLUSH HOSTS;
    FLUSH CONNECTION POOLS;
    

    FLUSH HOSTS clears the DNS cache on the node. This must be performed on all affected nodes in the cluster. FLUSH CONNECTION POOLS shuts down all existing connections and closes idle pooled connections.
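
For example, the check in step 2 can be run as follows on the affected leaf.

-- Run on the affected leaf (leaf X). In the output, inspect the
-- Opened_Connections column: no row for leaf Y, or a count of 0,
-- is consistent with the stale-cache problem described in step 3.
SHOW LEAVES;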