在一个微服务中,链路追踪能够准确监控所有调用链路,从而定位慢调用,失败调用。
简介
skywalking是分布式系统的应用程序性能监视工具,专为微服务、云原生架构和基于容器(Docker、K8s、Mesos)架构而设计。
发起者是吴晟,现在是Apache项目。
原理简介

- skywalking分为四个部分:探针,平台后端,存储,UI
- Probes,探针,探针因使用的语言不同而不通,收集数据并且格式化为skywalking所需的格式。
- Platform backend 平台后端,对应于zipkin server,可以集群部署,聚合,分析,将数据展示在UI中
- Storage:存储,可扩展的存储,可以使es,H2,MySQL集群
- UI 丰富的可视化功能,提供身份验证
搭建
下载
版本:6.0.0-GA
下载地址
解压
解压后目录

服务端配置
数据持久化方式
MySQL
- 创建一个数据库,例如create database wstest;
- 将MySQL的驱动包添加入解压后/oap-libs文件夹
- 修改 - config/application.yml,注释storage下的h2,添加一下内容- 1 
 2- storage: 
 mysql:
- 修改 - config/datasource-settings.properties,写入对应属性,例如- 1 
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13- jdbcUrl=jdbc:mysql://localhost:3306/swtest?useSSL=false&serverTimezone=UTC 
 dataSource.user=root
 dataSource.password=root
 dataSource.cachePrepStmts=true
 dataSource.prepStmtCacheSize=250
 dataSource.prepStmtCacheSqlLimit=2048
 dataSource.useServerPrepStmts=true
 dataSource.useLocalSessionState=true
 dataSource.rewriteBatchedStatements=true
 dataSource.cacheResultSetMetadata=true
 dataSource.cacheServerConfiguration=true
 dataSource.elideSetAutoCommits=true
 dataSource.maintainTimeStats=false
- 启动服务端,运行 - lib/startup.bat
- 检查,浏览器中输入localhost:8080
- 其它,服务端默认使用端口:8080,11800(gRPC),12800(HTTP),可以在webapp/webapp.yml和config/application.yml中更改
数据自动清理
在config/application.yml中可以设置数据过期时间1
2
3
4
5
6
7
8
9
10downsampling:
- Hour
- Day
- Month
# Set a timeout on metric data. After the timeout has expired, the metric data will automatically be deleted.
recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:90} # Unit is minute
minuteMetricsDataTTL: ${SW_CORE_MINUTE_METRIC_DATA_TTL:90} # Unit is minute
hourMetricsDataTTL: ${SW_CORE_HOUR_METRIC_DATA_TTL:36} # Unit is hour
dayMetricsDataTTL: ${SW_CORE_DAY_METRIC_DATA_TTL:45} # Unit is day
monthMetricsDataTTL: ${SW_CORE_MONTH_METRIC_DATA_TTL:18} # Unit is month
告警配置
在服务端config/alarm-settings.yml中配置,例如1
2
3
4
5
6
7
8
9
10rules:
  # Rule unique name, must be ended with `_rule`.
  service_resp_time_rule:
    indicator-name: service_resp_time
    op: ">"
    threshold: 1000
    period: 10
    count: 3
    silence-period: 5
    message: Response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes.
- service_resp_time_rule:告警规则名称 ***_rule (规则名称可以自定义但是必须以’_rule’结尾)
- indicator-name:指标数据名称: 参见https://github.com/apache/skywalking/blob/master/docs/en/setup/backend/backend-alarm.md#list-of-all-potential-metric-name
- op: 操作符: > , < , = 【当然你可以自己扩展开发其他的操作符】
- threshold:目标值:指标数据的目标数据 如sample中的1000就是服务响应时间,配合上操作符就是大于1000ms的服务响应
- period: 告警检查持续时间
- counts: 达到告警阈值的次数,在period内达到这个次数将被告警
- silence-period:在时间t告警,那么在( t , t + period )时间段内保持沉默,不再告警
- message:告警信息
- webhooks:服务告警通知服务地址客户端
- 将解压出来的agent拷贝到一个空目录
- 将服务的jar包放入agent目录
- 配置agent - agent/config/agent.config,例如- 1 
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46- # Licensed to the Apache Software Foundation (ASF) under one 
 # or more contributor license agreements. See the NOTICE file
 # distributed with this work for additional information
 # regarding copyright ownership. The ASF licenses this file
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License. You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # The agent namespace
 # agent.namespace=${SW_AGENT_NAMESPACE:default-namespace}
 # The service name in UI
 #agent.service_name=${SW_AGENT_NAME:Your_ApplicationName}
 agent.service_name=consumer
 # The number of sampled traces per 3 seconds
 # Negative number means sample traces as many as possible, most likely 100%
 # agent.sample_n_per_3_secs=${SW_AGENT_SAMPLE:-1}
 # Authentication active is based on backend setting, see application.yml for more details.
 # agent.authentication = ${SW_AGENT_AUTHENTICATION:xxxx}
 # The max amount of spans in a single segment.
 # Through this config item, skywalking keep your application memory cost estimated.
 # agent.span_limit_per_segment=${SW_AGENT_SPAN_LIMIT:300}
 # Ignore the segments if their operation names start with these suffix.
 # agent.ignore_suffix=${SW_AGENT_IGNORE_SUFFIX:.jpg,.jpeg,.js,.css,.png,.bmp,.gif,.ico,.mp3,.mp4,.html,.svg}
 # If true, skywalking agent will save all instrumented classes files in `/debugging` folder.
 # Skywalking team may ask for these files in order to resolve compatible problem.
 # agent.is_open_debugging_class = ${SW_AGENT_OPEN_DEBUG:true}
 # 配置集群时只需要添加多个address即可
 # Backend service addresses.
 collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:127.0.0.1:11800}
 # Logging level
 logging.level=${SW_LOGGING_LEVEL:DEBUG}
- 启动服务,上述配置文件中的变量,既可以添加入环境变量,也可以写入到上述配置文件中 - 1 
 2
 3- java:javaagent:/toAgentPath/skywalking-agent.jar -jar 自己的服务.jar 
 #例如
 java -javaagent:skywalking-agent.jar -jar service1-1.0-SNAPSHOT.jar
- 检查,发起一些请求,观察skywalking UI 
UI简介
https://skywalking.apache.org/zh/blog/2018-12-18-Apache-SkyWalking-5-0-UserGuide.html
