消息关闭
    暂无新消息!
server:Intel E5 48 Cores,256GB memory(system used 180GB before crashed), 8T disk
version:mysql  Ver 14.14 Distrib 5.7.16, for Linux (x86_64)
OS:redhat7.2 

机器启用一个多月没有崩溃过,开始只有100GB数据,现在400个schema占500GB,配了主从复制,这是主库,软硬件相同的从库数据有不一致,跑大的查询也会crash。
从监控来看,每当内存占用超过160G(二楼贴图,绿色是剩余空间),就crash并自动重启,重启要花1小时,正常重启几分钟。
附日志和配置文件

2017-05-24T20:58:06.959318Z 3991095 [Note] Aborted connection 3991095 to db: 'ws2015121700001' user: 'root' host: '10.XXX' (Got timeout writing communication packets)

2017-05-25T05:59:45.648070Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4307ms. The settings might not be optimal. (flushed=690 and evicted=0, during the time.)

以上两种错误在过去几天频发

宕机时有以下日志,两次都是这样

05:59:58 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=209715200
read_buffer_size=131072
max_used_connections=224
max_threads=1000
thread_count=94
connection_count=94
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 205146143 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xef182b]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x7ad8c1]
/lib64/libpthread.so.0(+0xf100)[0x7f80ffa60100]
/usr/sbin/mysqld(_Z22row_merge_drop_indexesP5trx_tP12dict_table_tm+0xa6)[0x100a0d6]
/usr/sbin/mysqld(_Z32dict_table_remove_from_cache_lowP12dict_table_tm+0x4af)[0x114e34f]
/usr/sbin/mysqld(_Z23dict_make_room_in_cachemm+0x17f)[0x114e62f]
/usr/sbin/mysqld[0x105e945]
/usr/sbin/mysqld(srv_master_thread+0xa1f)[0x106374f]
/lib64/libpthread.so.0(+0x7dc5)[0x7f80ffa58dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f80fe5141cd]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2017-05-25T06:00:21.917264Z 0 [Warning] Insecure configuration for --secure-file-priv: Location is accessible to all OS users. Consider choosing a different directory.
2017-05-25T06:00:21.917499Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.16-log) starting as process 40549 ...
2017-05-25T06:00:21.953825Z 0 [Note] InnoDB: PUNCH HOLE support available
2017-05-25T06:00:21.953880Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-05-25T06:00:21.953900Z 0 [Note] InnoDB: Uses event mutexes
2017-05-25T06:00:21.953907Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2017-05-25T06:00:21.953911Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-05-25T06:00:21.953914Z 0 [Note] InnoDB: Using Linux native AIO
2017-05-25T06:00:21.959640Z 0 [Note] InnoDB: Number of pools: 1
2017-05-25T06:00:21.959783Z 0 [Note] InnoDB: Using CPU crc32 instructions
2017-05-25T06:00:21.961859Z 0 [Note] InnoDB: Initializing buffer pool, total size = 100G, instances = 1, chunk size = 128M
2017-05-25T06:00:28.901192Z 0 [Note] InnoDB: Completed initialization of buffer pool
2017-05-25T06:00:30.068043Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2017-05-25T06:00:30.135765Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2017-05-25T06:00:30.912936Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 18005318784106


---------------------------

[mysqld]
port = 3306
datadir=/data/mysqldata/3306/data
tmpdir=/data/mysqltmp
socket=/data/mysqldata/3306/mysql/mysql_3306.sock
pid-file=/data/mysqldata/3306/mysql/mysqld_3306.pid
slow_query_log_file = /data/mysqldata/3306/log/mysql-slow.log
log-error=/data/mysqldata/3306/log/mysqld.err
user=mysql
sql_mode=NO_ENGINE_SUBSTITUTION,NO_AUTO_CREATE_USER
character_set_server=utf8
collation-server=utf8_general_ci

server-id=2
read-only=0
log_slave_updates=1

explicit_defaults_for_timestamp=true
skip-ssl
skip-name-resolve


#slowquerylog
slow_query_log=1
long_query_time=15
#log-queries-not-using-indexes


secure-file-priv=/data/mysqldata/3306/mysql

innodb_data_home_dir=/data/mysqldata/3306/data
innodb_data_file_path=ibdata1:2898264064;ibdata2:2898264064:autoextend
innodb_lock_wait_timeout=50
innodb_file_per_table=1
innodb_log_file_size=1024M
innodb_log_files_in_group = 3
innodb_open_files=1024000
innodb_buffer_pool_size=102400M
innodb_buffer_pool_instances=1  改过4,依旧
#innodb_additional_mem_pool_size=20M
innodb_log_buffer_size=8M
join_buffer_size=80M
sort_buffer_size=200M 降为80M后继续崩溃
innodb_checksum_algorithm=NONE
innodb_doublewrite=0
innodb_support_xa=1
innodb_thread_concurrency=0
innodb_flush_log_at_trx_commit=2
innodb_max_dirty_pages_pct=50
innodb_use_native_aio=1
innodb_stats_persistent = 1
innodb_spin_wait_delay= 6
innodb_adaptive_flushing = 1
innodb_flush_neighbors = 0
innodb_read_io_threads = 4
innodb_write_io_threads = 4
innodb_io_capacity = 4000
innodb_purge_threads=1
innodb_adaptive_hash_index=0
#innodb_force_recovery=1

lock_wait_timeout=60

max_connections=1000

log_bin=mysql-bin
binlog_format=mixed
max_binlog_size=1073741824
log_bin_trust_function_creators=1
expire_logs_days=15 是不是过大了? 占了几百G


low_priority_updates=1  
key_buffer_size=200M
table_open_cache = 8000
back_log=1500
query_cache_type=0
thread_cache_size = 300
table_open_cache_instances=16
lower_case_table_names=1

symbolic-links=0

max_allowed_packet=512M

[mysql]
socket=/data/mysqldata/3306/mysql/mysql_3306.sock

[client]
socket=/data/mysqldata/3306/mysql/mysql_3306.sock

3个回答

︿ 1
奔溃之后重启消耗1小时,可能好是在奔溃前对数据做了大量的修改,在重启进行实例恢复时候,要扫描大量的日志进行重做,所以消耗了大量的时间。