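Observability Architecture for a Taobao Rebate App: Building a Prometheus and Grafana Monitoring Stack

Hello everyone, I'm 省赚客, a developer of 微赚淘客系统. In a high-concurrency, distributed environment, the stability of a rebate system depends on solid observability. 微赚淘客系统 builds an end-to-end monitoring stack on Prometheus, Grafana, and Spring Boot Actuator. It covers JVM metrics, HTTP endpoint performance, key business flows, and exception alerting, so that problems can be detected, root causes located, and trends predicted.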

Exposing Metrics from the Spring Boot Application

First, integrate Micrometer and Actuator in the juwatech.cn.rebate module:

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>

Then enable the Prometheus endpoint in application-prod.yml:

    management:
      endpoints:
        web:
          exposure:
            include: health,info,prometheus,metrics
      endpoint:
        prometheus:
          enabled: true
        health:
          show-details: always

After startup, /actuator/prometheus returns the metrics as plain text, for example (sample values elided):

    http_server_requests_seconds_count{method="GET",uri="/api/commission",status="200",} ...
    jvm_memory_used_bytes{area="heap",id="G1 Eden Space",} ...
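
To make the shape of the /actuator/prometheus output concrete, here is a minimal sketch that parses a single sample line into a name, labels, and a value. `PrometheusLineParser` and `Sample` are hypothetical helper names, and the value `42.0` in the demo is invented for illustration. The sketch assumes one sample per line with no trailing timestamp, does not unescape label values, and does not handle special values such as `+Inf`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for one sample line of the Prometheus text format (hypothetical helper). */
final class PrometheusLineParser {

    /** One parsed sample: metric name, labels in source order, numeric value. */
    static final class Sample {
        final String name;
        final Map<String, String> labels;
        final double value;

        Sample(String name, Map<String, String> labels, double value) {
            this.name = name;
            this.labels = labels;
            this.value = value;
        }
    }

    // Metric name, optional {label set}, whitespace, then a single value token.
    private static final Pattern LINE = Pattern.compile(
            "^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\\{(.*)\\})?\\s+(\\S+)$");

    // One key="value" pair; backslash-escaped characters inside the value are skipped over.
    private static final Pattern LABEL = Pattern.compile(
            "([a-zA-Z_][a-zA-Z0-9_]*)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    static Sample parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a sample line: " + line);
        }
        Map<String, String> labels = new LinkedHashMap<>();
        if (m.group(2) != null) {
            Matcher lm = LABEL.matcher(m.group(2));
            while (lm.find()) {
                labels.put(lm.group(1), lm.group(2));
            }
        }
        return new Sample(m.group(1), labels, Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        // The value 42.0 is made up for the demo.
        Sample s = parse("http_server_requests_seconds_count"
                + "{method=\"GET\",uri=\"/api/commission\",status=\"200\",} 42.0");
        System.out.println(s.name + " " + s.labels + " " + s.value);
    }
}
```

In practice Prometheus does this parsing itself; the sketch is only meant to show that each line is a metric name, a label set, and one number.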

Custom Business Metrics

For core business flows such as commission calculation and order synchronization, we inject a MeterRegistry and register custom counters and a timer:

    package juwatech.cn.rebate.service;

    import io.micrometer.core.instrument.Counter;
    import io.micrometer.core.instrument.MeterRegistry;
    import io.micrometer.core.instrument.Timer;
    import org.springframework.stereotype.Service;

    import java.math.BigDecimal;

    @Service
    public class CommissionCalculationService {

        private final Counter commissionSuccessCounter;
        private final Counter commissionFailureCounter;
        private final Timer commissionProcessTimer;

        public CommissionCalculationService(MeterRegistry meterRegistry) {
            this.commissionSuccessCounter = Counter.builder("rebate.commission.success")
                    .description("Number of successful commission calculations")
                    .register(meterRegistry);
            this.commissionFailureCounter = Counter.builder("rebate.commission.failure")
                    .description("Number of failed commission calculations")
                    .register(meterRegistry);
            this.commissionProcessTimer = Timer.builder("rebate.commission.duration")
                    .description("Commission calculation duration (seconds)")
                    .register(meterRegistry);
        }

        public BigDecimal calculate(Long orderId) {
            // record(Supplier) times the call without forcing a checked exception on callers
            return commissionProcessTimer.record(() -> {
                try {
                    // Simulated commission calculation
                    BigDecimal amount = doCalculate(orderId);
                    commissionSuccessCounter.increment();
                    return amount;
                } catch (RuntimeException e) {
                    commissionFailureCounter.increment();
                    throw e;
                }
            });
        }

        private BigDecimal doCalculate(Long orderId) {
            // Real business logic goes here; the literal is a placeholder
            return new BigDecimal("12.");
        }
    }
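
The meters above are registered with dotted names, while the PromQL queries later in the article use names such as `rebate_commission_success_total`. The sketch below approximates how that mapping works: dots become underscores, counters gain a `_total` suffix, and timers gain a `_seconds` suffix. `PrometheusNames` and `MeterKind` are hypothetical names, and this is a simplified approximation rather than Micrometer's own code, which also handles camelCase and other meter types.

```java
/** Simplified approximation of the Micrometer-to-Prometheus name mapping (hypothetical helper). */
final class PrometheusNames {

    enum MeterKind { COUNTER, TIMER, GAUGE }

    static String toPrometheusName(String micrometerName, MeterKind kind) {
        // Characters outside the Prometheus metric-name alphabet become underscores.
        String name = micrometerName.replaceAll("[^a-zA-Z0-9_:]", "_");
        switch (kind) {
            case COUNTER:
                // Counters are exposed with a _total suffix.
                if (!name.endsWith("_total")) {
                    name += "_total";
                }
                break;
            case TIMER:
                // Timers are exposed in seconds; _count, _sum and _bucket series hang off this name.
                if (!name.endsWith("_seconds")) {
                    name += "_seconds";
                }
                break;
            default:
                break;
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(toPrometheusName("rebate.commission.success", MeterKind.COUNTER));
        System.out.println(toPrometheusName("rebate.commission.duration", MeterKind.TIMER));
    }
}
```

Checking the exposed names on /actuator/prometheus before writing dashboards is still the safest approach, since the exact output depends on the Micrometer version in use.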
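
Prometheus Service Discovery Configuration

In prometheus.yml, configure Kubernetes-based service discovery so that every rebate service instance is scraped automatically:

    scrape_configs:
      - job_name: rebate-app
        # Spring Boot Actuator serves metrics here; Prometheus defaults to /metrics
        metrics_path: /actuator/prometheus
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Keep only pods labelled app=rebate-system
          - source_labels: [__meta_kubernetes_pod_label_app]
            action: keep
            regex: rebate-system
          # Keep only pods that opt in through the scrape annotation
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
          # Scrape the pod IP on port 8080
          - source_labels: [__meta_kubernetes_pod_ip]
            target_label: __address__
            replacement: "$1:8080"
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace

Make sure the Pod metadata includes the matching label and annotation:

    metadata:
      labels:
        app: rebate-system
      annotations:
        prometheus.io/scrape: "true"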
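
To illustrate what the relabeling rules above do to a discovered pod, here is a small simulation, not Prometheus code. `RelabelSketch` is a hypothetical name, and the sample IP and namespace in the demo are invented. It applies the two `keep` rules (whose regexes are fully anchored, like `String.matches`), then builds the scrape address and the `namespace` label.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Simulates the four relabel rules from the scrape config (illustrative only). */
final class RelabelSketch {

    /** Returns the resulting target labels, or empty if a keep rule drops the pod. */
    static Optional<Map<String, String>> apply(Map<String, String> discovered) {
        // keep: the app label must match "rebate-system" exactly
        if (!discovered.getOrDefault("__meta_kubernetes_pod_label_app", "")
                .matches("rebate-system")) {
            return Optional.empty();
        }
        // keep: the prometheus.io/scrape annotation must be "true"
        if (!discovered.getOrDefault("__meta_kubernetes_pod_annotation_prometheus_io_scrape", "")
                .matches("true")) {
            return Optional.empty();
        }
        Map<String, String> target = new HashMap<>();
        // replace: the default regex captures the whole pod IP as $1, then ":8080" is appended
        String podIp = discovered.getOrDefault("__meta_kubernetes_pod_ip", "");
        target.put("__address__", podIp + ":8080");
        // replace: copy the namespace into a regular label
        target.put("namespace", discovered.getOrDefault("__meta_kubernetes_namespace", ""));
        return Optional.of(target);
    }

    public static void main(String[] args) {
        Map<String, String> pod = new HashMap<>();
        pod.put("__meta_kubernetes_pod_label_app", "rebate-system");
        pod.put("__meta_kubernetes_pod_annotation_prometheus_io_scrape", "true");
        pod.put("__meta_kubernetes_pod_ip", "10.0.0.12");   // invented sample IP
        pod.put("__meta_kubernetes_namespace", "prod");     // invented sample namespace
        System.out.println(apply(pod));
    }
}
```

A pod that lacks either the label or the annotation never becomes a scrape target, which is why both must be present in the Pod metadata.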
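
Grafana Dashboard Configuration

Import or create the following key panels.

JVM memory and GC: use the official JVM (Micrometer) template (ID: 4701).

HTTP request latency distribution (p95), with the query:

    histogram_quantile(0.95, rate(http_server_requests_seconds_bucket{job="rebate-app"}[5m]))

Business success rate:

    rate(rebate_commission_success_total[5m]) / (rate(rebate_commission_success_total[5m]) + rate(rebate_commission_failure_total[5m]))

Also configure alert rules, for example for a sudden rise in the error rate:

    # alert.rules.yml
    groups:
      - name: rebate-alerts
        rules:
          - alert: HighCommissionErrorRate
            # Failure ratio, so the 10% threshold matches the description below
            expr: |
              rate(rebate_commission_failure_total[5m])
                / (rate(rebate_commission_success_total[5m]) + rate(rebate_commission_failure_total[5m])) > 0.1
            for: 2m
            labels:
              severity: critical
            annotations:
              summary: "Commission calculation failure rate is too high"
              description: "Failure rate over the past 5 minutes exceeded 10% (current value: {{ $value }})"

Load the rule file in Prometheus:

    rule_files:
      - alert.rules.yml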
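
The p95 latency panel relies on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket where the target rank falls. The sketch below reproduces that idea in simplified form. `HistogramQuantile` is a hypothetical name, the bucket bounds and counts in the demo are invented, and edge cases are handled more loosely than in Prometheus itself.

```java
/** Simplified sketch of the interpolation behind histogram_quantile (not Prometheus code). */
final class HistogramQuantile {

    /**
     * @param q                quantile in [0, 1]
     * @param upperBounds      bucket upper bounds ("le"), ascending, last one +Infinity
     * @param cumulativeCounts cumulative observation count (or rate) per bucket
     */
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        if (n < 2 || n != cumulativeCounts.length) {
            throw new IllegalArgumentException("need at least two buckets and matching arrays");
        }
        if (q < 0 || q > 1) {
            throw new IllegalArgumentException("q must be within [0, 1]");
        }
        double total = cumulativeCounts[n - 1];
        if (total == 0) {
            return Double.NaN; // no observations
        }
        double rank = q * total;

        // Find the first bucket whose cumulative count reaches the rank.
        int i = 0;
        while (i < n - 1 && cumulativeCounts[i] < rank) {
            i++;
        }
        if (i == n - 1) {
            // Rank falls in the +Inf bucket: report the highest finite bound.
            return upperBounds[n - 2];
        }
        double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
        double below = (i == 0) ? 0.0 : cumulativeCounts[i - 1];
        double inBucket = cumulativeCounts[i] - below;
        if (inBucket <= 0) {
            return lower;
        }
        // Assume observations are spread evenly inside the bucket.
        return lower + (upperBounds[i] - lower) * (rank - below) / inBucket;
    }

    public static void main(String[] args) {
        // Invented example: 100 requests across buckets le=0.1, 0.5, 1.0, +Inf.
        double[] bounds = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 90, 100, 100};
        System.out.println(quantile(0.95, bounds, counts));
    }
}
```

With these invented numbers the 95th observation lands halfway through the 0.5 to 1.0 bucket, so the estimate is about 0.75 seconds. The `_bucket` series only exist when histogram publication is enabled for the timer; in Spring Boot this is typically done with the `management.metrics.distribution.percentiles-histogram.http.server.requests` property.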
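
Linking Logs and Traces

Although this article focuses on metrics monitoring, a production setup should combine it with Loki (logs) and Jaeger (traces). We attach a Trace ID to log entries through a request filter:

    // Spring Boot 3 uses jakarta.servlet.*; on Spring Boot 2 import javax.servlet.* instead
    import jakarta.servlet.FilterChain;
    import jakarta.servlet.ServletException;
    import jakarta.servlet.http.HttpServletRequest;
    import jakarta.servlet.http.HttpServletResponse;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.slf4j.MDC;
    import org.springframework.web.filter.OncePerRequestFilter;

    import java.io.IOException;
    import java.util.UUID;

    public class TraceIdFilter extends OncePerRequestFilter {

        private static final Logger log = LoggerFactory.getLogger(TraceIdFilter.class);

        // Assumed length; the value used in the original listing is not preserved
        private static final int TRACE_ID_LENGTH = 16;

        @Override
        protected void doFilterInternal(HttpServletRequest request,
                                        HttpServletResponse response,
                                        FilterChain filterChain)
                throws IOException, ServletException {
            String traceId = UUID.randomUUID().toString().replace("-", "")
                    .substring(0, TRACE_ID_LENGTH);
            // Every log line written on this thread now carries the traceId
            MDC.put("traceId", traceId);
            log.info("Start request: {} {}", request.getMethod(), request.getRequestURI());
            try {
                filterChain.doFilter(request, response);
            } finally {
                // Clear the MDC so pooled threads do not leak the ID into later requests
                MDC.clear();
            }
        }
    }

In Grafana, Explore can correlate logs with metrics, giving a complete troubleshooting path: alert → metrics → logs → code.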
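
The filter above always generates a fresh ID. When requests pass through several services, it can help to reuse an ID supplied by the caller so that logs line up across hops. The following is a minimal sketch of that idea under stated assumptions: `TraceIds`, the length parameter, and the notion of an incoming header value (for example a hypothetical `X-Trace-Id` header) are illustrative and not part of the original filter.

```java
import java.util.UUID;
import java.util.regex.Pattern;

/** Illustrative helper: reuse a caller-supplied trace ID when it looks safe, else generate one. */
final class TraceIds {

    // Accept only short IDs made of letters, digits and hyphens, to keep log output clean.
    private static final Pattern SAFE_ID = Pattern.compile("[A-Za-z0-9-]{1,64}");

    /** Generates a lowercase hex ID of the given length (at most 32 characters). */
    static String newTraceId(int length) {
        String hex = UUID.randomUUID().toString().replace("-", "");
        int end = Math.max(1, Math.min(length, hex.length()));
        return hex.substring(0, end);
    }

    /** Returns the incoming ID if present and well-formed, otherwise a new one. */
    static String resolve(String incoming, int length) {
        if (incoming != null) {
            String trimmed = incoming.trim();
            if (SAFE_ID.matcher(trimmed).matches()) {
                return trimmed;
            }
        }
        return newTraceId(length);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, 16));          // freshly generated
        System.out.println(resolve("abc123-def", 16));  // reused as-is
    }
}
```

Inside the filter, the result of `resolve(request.getHeader(...), length)` would take the place of the generated ID before it is put into the MDC. Systems that already run a tracer such as Jaeger usually take the ID from the tracing context instead.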

Copyright of this article belongs to 微赚淘客系统