Summary
花季少女3.08.30
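This post records in detail the problems I ran into while using Apache Doris, including errors when creating tables, log permissions after the service user changed, disks running out of space, enabling materialized views, importing Hive data, and failed LOAD jobs, together with the fixes, such as adjusting memory, setting parameters, and repairing permissions.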
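1. Running a CREATE TABLE statement failed with: [Err] 1064 - errCode = 2, detailMessage = Failed to find enough host in all backends. need: 3
Cause: the statement specified PROPERTIES("replication_num" = "3"), but only 2 BEs were available. Check the log on the affected BE node: ./be.WARNING.log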
W1026 18:13:3x.139992 19091 utils.cpp:101] fail to get master client from cache. host=192.168.x.143, port=9020, code=7
W1026 18:13:3x.140386 19091 task_worker_pool.cpp:1185] finish report olap table state failed. status:-1, master host: 192.168.x.143, port:9020
W1026 18:13:4x.391201 19089 utils.cpp:101] fail to get master client from cache. host=192.168.x.143, port=9020, code=7
W1026 18:13:4x.391471 19089 task_worker_pool.cpp:1060] finish report task failed. status:-1, master host: 192.168.x.143, port:9020
W1027 10:00:3x.385262 2359 data_dir.cpp:128] open file filed, error: IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id
W1027 10:00:3x.385926 2359 data_dir.cpp:95] _init_cluster_id failed, error: IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id
W1027 10:00:3x.385958 2359 storage_engine.cpp:192] Store load failed, status=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id, path=/wyyt/software/doris/be/storage
W1027 10:00:3x.386071 2353 storage_engine.cpp:148] _init_store_map failed, error: Internal error: init path failed, error=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id;
W1027 10:00:3x.386106 2353 storage_engine.cpp:96] open engine failed, error: Internal error: init path failed, error=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id;
F1027 10:00:3x.386186 2353 doris_main.cpp:189] fail to open StorageEngine, res=init path failed, error=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id;
Once the cause was found, the fix was straightforward.
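The error in item 1 comes down to a count: each tablet needs replication_num replicas on distinct hosts, so the FE rejects the CREATE TABLE when fewer BE hosts are alive than replicas requested. A minimal Python sketch of that pre-check, assuming you have already collected each BE's Alive flag (for example by hand from the Alive column of SHOW BACKENDS); the function name and input shape are hypothetical:

```python
from typing import Mapping, Optional


def check_replication(replication_num: int, backend_alive: Mapping[str, bool]) -> Optional[str]:
    """Return an error message if there are not enough alive BE hosts, else None.

    backend_alive maps a BE host to its Alive flag (hypothetical input shape).
    """
    # Replicas of one tablet must sit on different hosts, so count distinct alive hosts.
    alive_hosts = {host for host, alive in backend_alive.items() if alive}
    if len(alive_hosts) < replication_num:
        return (
            f"need {replication_num} hosts, only {len(alive_hosts)} alive: "
            f"lower replication_num or bring more BEs online"
        )
    return None


if __name__ == "__main__":
    # The situation from item 1: replication_num = 3 but one BE failed to start.
    cluster = {"be1": True, "be2": True, "be3": False}
    print(check_replication(3, cluster))
    print(check_replication(2, cluster))
```

Either lowering replication_num to the number of alive BEs or getting the failed BE back up (as below) makes the check pass.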
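In my case the cluster_id file could not be opened, so I granted it 755 permissions and restarted the BE node.
If the restart fails, delete be.pid and restart again.
2. Log permissions after the service user changed: the log files belong to whichever user starts the service.
3. Error when creating a Doris table: the field lengths added together cannot exceed 100,000 (10W). The limit can be changed through configuration, but that is not recommended.
4. Disk full: ErrorReason{code=errCode = 2, msg=failed to create task: errCode = 2, detailMessage = disk 6189104187500640169 on backend 11001 exceed limit usage}. This paused all jobs.
5. Enabling materialized views:
create materialized view test_p_user_view as select user_id,user_name from test_p_user limit 8;
ERROR 1064 (HY000): errCode = 2, detailMessage = The materialized view is coming soon
Fix: run this command on the master FE: ADMIN SET FRONTEND CONFIG ("enable_materialized_view" = "true");
At present materialized views only support Duplicate Key tables; 0.12 supports only part of the feature, and 0.13 will complete it.
6. Importing Hive data into Doris: (1) create the corresponding table in Doris; (2) run the import statement.
7. type:LOAD_RUN_FAIL; msg:errCode = 2, detailMessage = there is no scanNode Backend
Importing a large table from HDFS brought a BE node down. Fix: set the relevant parameters on the FE and have the job explicitly specify its memory. Check the BE log and the core file to see whether it was an OOM.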
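For item 7, one quick way to see whether the BE was killed by the kernel OOM killer is to scan the kernel log (dmesg output or /var/log/messages) for OOM-killer lines that name the BE process. This sketch assumes the common "Out of memory: Killed process <pid> (<name>)" wording (older kernels print "Kill process") and a BE process name of palo_be or doris_be; both are assumptions, so adjust the pattern to what your system actually logs.

```python
import re
from typing import Iterable, List, Tuple

# Matches kernel OOM-killer lines such as
#   "Out of memory: Killed process 12345 (doris_be) total-vm:..."
# Older kernels write "Kill process" instead of "Killed process".
OOM_LINE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")

# Assumed BE process names: palo_be in older Palo/Doris builds, doris_be later.
BE_NAMES = {"palo_be", "doris_be"}


def find_be_oom_kills(kernel_log: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every OOM kill of a BE process."""
    hits = []
    for line in kernel_log:
        match = OOM_LINE.search(line)
        if match and match.group(2) in BE_NAMES:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


if __name__ == "__main__":
    sample = [
        "[123.4] Out of memory: Killed process 4242 (doris_be) total-vm:64000000kB",
        "[130.0] Out of memory: Kill process 777 (java) score 900",
        "[140.0] eth0: link up",
    ]
    print(find_be_oom_kills(sample))  # [(4242, 'doris_be')]
```

No hit does not rule out memory trouble; the BE log and the core file are still worth checking.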
References:
https://blog.csdn.net/weixin_42135997/article/details/80732658
https://blog.csdn.net/qq_15437667/article/details/83934113
8. Suddenly no commands could be executed, even though the BE nodes showed as Alive.
The BE logs (be.INFO, be.WARNING) showed nothing unusual; it eventually turned out that a disk on one node had failed. Now I know how to troubleshoot this kind of problem.
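Item 8 is a case where a BE still reports Alive while one of its disks is broken. A cheap extra check is to attempt a small write in every storage directory configured in the BE's storage_root_path. This Python sketch does that with a temporary file; the directory list is something you supply yourself, and a successful write only rules out obvious failures (missing mount, read-only remount, full disk, permission errors), not every kind of disk fault.

```python
import os
import tempfile
from typing import Dict, Iterable


def probe_storage_dirs(paths: Iterable[str]) -> Dict[str, str]:
    """Try to write, fsync and delete a small file in each directory.

    Returns a mapping of path -> "ok" or the error that was raised.
    """
    results = {}
    for path in paths:
        try:
            # NamedTemporaryFile removes the file when the with-block exits.
            with tempfile.NamedTemporaryFile(dir=path, prefix=".probe_") as handle:
                handle.write(b"doris-disk-probe")
                handle.flush()
                os.fsync(handle.fileno())
            results[path] = "ok"
        except OSError as exc:
            results[path] = f"error: {exc}"
    return results


if __name__ == "__main__":
    # Hypothetical paths; replace with the entries of storage_root_path in be.conf.
    for path, status in probe_storage_dirs(["/tmp", "/nonexistent/storage"]).items():
        print(path, status)
```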
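9. Rules observed for Broker Load of HDFS data
I tested Broker Load of HDFS data into a table that uses the Unique (uniq) key model.
Rows with the same key were not overwritten in load order. Instead, the replacement was decided by the length of the second field: the row whose second field is longest wins, and among rows whose second field has the same length, the one with the latest time is kept.
If the second field is identical, the length of the third field is compared in the same way.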
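10. Doris Broker Load failed: type:LOAD_RUN_FAIL; msg:errCode = 2, detailMessage = all partitions have no load data
The source table data was null, so there was nothing to load.
11. Running several Broker Load jobs at the same time brought BE nodes down; the cause was most likely insufficient memory killing the BE.
Fix: limit each Broker Load to 1 GB of data per node, or less.
12. Routine Load error: errCode = 2, detailMessage = failed to send task: errCode = 2, detailMessage = failed
The default task concurrency is max_routine_load_task_num_per × the number of BEs; for example, with 3 BE nodes the total concurrency is 5 × 3 = 15.
13. Loading via insert into.
14. Load job failed because of insufficient memory: increase the memory.
15. ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel
This error means the data quality was poor: Doris could not parse the data, or parsing failed, so the load job was canceled. Possible causes:
varchar values that are too long;
delimiter problems, e.g. the data itself contains the delimiter;
too_many_filtered_rows.
Fixes: do not import long text, or truncate long text when importing.
16. After loading data into Doris with Broker Load, memory was not released. Fix: try upgrading Doris to 0.1x.15 to verify whether the problem goes away. Reference: https://cloud.baidu.com/doc/PALO/s/Ikivhcwb5
17. The Doris version on which the errors occurred was a 0.1x.11 patch release.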
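Several causes listed in item 15 (over-long varchar values, delimiters inside the data) can be caught before the load is submitted. A minimal pre-check sketch in Python, assuming a delimited text file with a known column count and hypothetical per-column byte limits taken from the table's VARCHAR lengths. Doris counts VARCHAR length in bytes, so each value is encoded to UTF-8 before comparing.

```python
from typing import Dict, List, Sequence, Tuple


def precheck_rows(
    lines: Sequence[str],
    delimiter: str,
    expected_columns: int,
    max_bytes: Dict[int, int],
) -> List[Tuple[int, str]]:
    """Return (1-based line number, problem) for rows likely to be filtered.

    max_bytes maps a 0-based column index (< expected_columns) to its
    VARCHAR byte limit.
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != expected_columns:
            # A delimiter inside a value shifts every following column.
            problems.append(
                (line_no, f"expected {expected_columns} columns, got {len(fields)}")
            )
            continue
        for col, limit in max_bytes.items():
            size = len(fields[col].encode("utf-8"))
            if size > limit:
                problems.append((line_no, f"column {col} is {size} bytes > {limit}"))
    return problems


if __name__ == "__main__":
    rows = [
        "1\tAlice\tok",
        "2\tBob\tnote\twith extra tab",
        "3\t" + "x" * 300 + "\tok",
    ]
    for line_no, problem in precheck_rows(rows, "\t", 3, {1: 255}):
        print(line_no, problem)
```

Rows it flags are the likely candidates for too_many_filtered_rows; fixing, truncating, or dropping them before loading should keep the filtered ratio down.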
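18. The data directory on some BE nodes became very large, while on other BE nodes it was normal.
Initial diagnosis: a cluster load problem, with Routine Load writing too frequently. Check whether the tables are healthy, and change the Routine Load parameters so that batches are sent every 60 s: (desired_concurrent_number=3, max_batch_interval=60, max_batch_rows=300000, max_batch_size=209715200, strict_mode=false, format=json)
19. Upgrading Doris to 0.1x.7 fixed the earlier Too Many Tasks problem.
20. After deploying 3 FEs of Doris 0.1x.7 on the internal network and writing data, one of the FE nodes crashed. The log: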
09:09:25,172 ERROR (heartbeat mgr|…) [BDBJEJournal.write():166] catch an exception when writing to database. sleep and retry. journal id 1526718
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160910 VLSN: 31,775,195, initiated at: 09:09:… Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: … Missing replica acks: … Timeout: 2000ms. FeederState=192.168.x.5_9010_1625132780567(…)[MASTER]
Current feeds:
 192.168.x.7_9010_1625192915300: feederVLSN=31,775,198 replicaTxnEndVLSN=31,775,193
 192.168.x.4_9010_1625132697001: feederVLSN=31,775,198 replicaTxnEndVLSN=31,775,191
    at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.txn.Txn.commit(Txn.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.txn.Txn.commit(Txn.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.Database.put(Database.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.Database.put(Database.java:…) ~[je-7.3.7.jar:7.3.7]
    at org.apache.doris.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:…) [palo-fe.jar:3.x.0]
    at org.apache.doris.persist.EditLog.logEdit(EditLog.java:…) [palo-fe.jar:3.x.0]
    at org.apache.doris.persist.EditLog.logHeartbeat(EditLog.java:…) [palo-fe.jar:3.x.0]
    at org.apache.doris.system.HeartbeatMgr.runAfterCatalogReady(HeartbeatMgr.java:…) [palo-fe.jar:3.x.0]
    at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:…) [palo-fe.jar:3.x.0]
    at org.apache.doris.common.util.Daemon.run(Daemon.java:…) [palo-fe.jar:3.x.0]
09:09:27,884 WARN (Thread-49|…) [BDBJEMetricHandler.write():117] write metric data into bdb error, key: 192.168.x.7:8030_query_err_rate_1630026555000
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160912 VLSN: 31,775,198, initiated at: 09:09:… Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: … Missing replica acks: … Timeout: 2000ms. FeederState=192.168.x.5_9010_1625132780567(…)[MASTER]
Current feeds:
 192.168.x.7_9010_1625192915300: feederVLSN=31,775,199 replicaTxnEndVLSN=31,775,196
 192.168.x.4_9010_1625132697001: feederVLSN=31,775,199 replicaTxnEndVLSN=31,775,191
    at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.txn.Txn.commit(Txn.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.txn.Txn.commit(Txn.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:…) ~[je-7.3.7.jar:7.3.7]
    at com.sleepycat.je.Database.put(Database.java:…) ~[je-7.3.7.jar:7.3.7]
    at org.apache.doris.metric.collector.BDBJEMetricHandler.write(BDBJEMetricHandler.java:…) ~[palo-fe.jar:3.x.0]
    at org.apache.doris.metric.collector.BDBJEMetricHandler.writeDouble(BDBJEMetricHandler.java:…) ~[palo-fe.jar:3.x.0]
    at org.apache.doris.metric.collector.MetricCollector.parseFeMetricJsonAndWriteMetric(MetricCollector.java:…) ~[palo-fe.jar:3.x.0]
    at org.apache.doris.metric.collector.MetricCollector.writeMetric(MetricCollector.java:…) ~[palo-fe.jar:3.x.0]
    at org.apache.doris.metric.collector.MetricCollector.lambda$init$0(MetricCollector.java:…) ~[palo-fe.jar:3.x.0]
    at java.lang.Thread.run(Thread.java:…) [?:1.8.0_162]
09:09:33,338 WARN (Thread-49|…) [BDBJEMetricHandler.write():117] write metric data into bdb error, key: 192.168.x.7:8030_quantile…75_1630026555000
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160913 VLSN: 31,775,200, initiated at: 09:09:… Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: … Missing replica acks: … Timeout: 2000ms. FeederState=192.168.x.5_9010_1625132780567(…)[MASTER]
Current feeds:
 192.168.x.7_9010_1625192915300: feederVLSN=31,775,202 replicaTxnEndVLSN=31,775,198
 192.168.x.4_9010_1625132697001: feederVLSN=31,775,202 replicaTxnEndVLSN=31,775,196
    ...
09:09:37,283 ERROR (heartbeat mgr|…) [BDBJEJournal.write():166] catch an exception when writing to database. sleep and retry. journal id 1526718
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160914 VLSN: 31,775,202, initiated at: 09:09:… Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: … Missing replica acks: … Timeout: 2000ms. FeederState=192.168.x.5_9010_1625132780567(…)[MASTER]
Current feeds:
 192.168.x.7_9010_1625192915300: feederVLSN=31,775,205 replicaTxnEndVLSN=31,775,200
 192.168.x.4_9010_1625132697001: feederVLSN=31,775,205 replicaTxnEndVLSN=31,775,196
    ...
09:09:40,305 WARN (Thread-49|…) [BDBJEMetricHandler.write():117] write metric data into bdb error, key: 192.168.x.7:8030_quantile…95_1630026555000
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160916 VLSN: 31,775,205, initiated at: 09:09:… Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: … Missing replica acks: … Timeout: 2000ms. FeederState=192.168.x.5_9010_1625132780567(…)[MASTER]
Based on the log above, my first guess was that the heartbeat timeout was set too short, because I had not tuned any parameters while testing this version.
Later I suspected that the FE failed to write its metadata while replicating it to the other FEs, and that the retries failed as well.
The FE only came back up after three restarts.
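The key line in item 20 is "Insufficient acks for policy:SIMPLE_MAJORITY": the master FE committed a journal write but did not receive enough follower acknowledgements within the 2000 ms timeout. As I understand BDB JE's SIMPLE_MAJORITY policy, a commit needs a majority of the electable nodes with the master counting as one of them, so with 3 FEs the master needs an ack from at least 1 of its 2 followers. The Python sketch below shows that arithmetic; treating a follower as having acked once its replicaTxnEndVLSN reaches the transaction's VLSN is my interpretation of the log fields, not code taken from JE or Doris.

```python
from typing import Sequence


def required_replica_acks(electable_nodes: int) -> int:
    """Replica acks needed under a simple-majority policy.

    A majority of all electable nodes must have the commit, and the master's
    own commit counts as one of them.
    """
    majority = electable_nodes // 2 + 1
    return majority - 1


def has_enough_acks(txn_vlsn: int, replica_txn_end_vlsns: Sequence[int], electable_nodes: int) -> bool:
    """Count a replica as acked once its replicaTxnEndVLSN has reached txn_vlsn
    (an interpretation of the JE 'Current feeds' fields)."""
    acked = sum(1 for vlsn in replica_txn_end_vlsns if vlsn >= txn_vlsn)
    return acked >= required_replica_acks(electable_nodes)


if __name__ == "__main__":
    print(required_replica_acks(3))  # 1
    # First failed write in the log: VLSN 31,775,195, followers at 31,775,193 and 31,775,191.
    print(has_enough_acks(31_775_195, [31_775_193, 31_775_191], 3))  # False
```

Read this way, every failed write in the log had both followers a few VLSNs behind the transaction, so no follower ack arrived within the timeout.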