第一步: 找到CPU占用率高的進(jìn)程
3850系列交換機(jī)使用4核CPU, 使用show process cpu 命令將會(huì)看到:
3850-2#show processes cpu sort | exclude 0.0
Core 0: CPU utilization for five seconds: 53%; one minute: 39%; five minutes: 41%
Core 1: CPU utilization for five seconds: 43%; one minute: 57%; five minutes: 54%
Core 2: CPU utilization for five seconds: 95%; one minute: 60%; five minutes: 58%
Core 3: CPU utilization for five seconds: 32%; one minute: 31%; five minutes: 29%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
8525 472560 2345554 7525 31.37 30.84 30.83 0 iosd
5661 2157452 9234031 698 13.17 12.56 12.54 1088 fed
6206 19630 74895 262 1.83 0.43 0.10 0 eicored
6197 725760 11967089 60 1.41 1.38 1.47 0 pdsd
根據(jù)以上輸出,可以看到iosd和fed進(jìn)程占用大量CPU.
*Forwarding Engine Driver (FED): This is the heart of the Cisco Catalyst 3850 Series Switch and is responsible for all hardware programming/forwarding.
當(dāng)由于中斷(interrupts)導(dǎo)致CPU利用率高,將會(huì)看到IOSd和FED進(jìn)程占用大量CPU,以下子進(jìn)程將占用CPU:
FED Punject TX
FED Punject RX
FED Punject replenish
FED Punject TX complete
我們可以使用 show process cpu detailed 命令去獲得進(jìn)程的更詳細(xì)內(nèi)容.
在這個(gè)案例中,我們看看這2個(gè)進(jìn)程:
3850-2#show processes cpu detailed process iosd sort | ex 0.0
Core 0: CPU utilization for five seconds: 36%; one minute: 39%; five minutes: 40%
Core 1: CPU utilization for five seconds: 73%; one minute: 52%; five minutes: 53%
Core 2: CPU utilization for five seconds: 22%; one minute: 56%; five minutes: 58%
Core 3: CPU utilization for five seconds: 46%; one minute: 40%; five minutes: 31%
PID T C TID Runtime(ms)Invoked uSecs 5Sec 1Min 5Min TTY Process
(%) (%) (%)
8525 L 556160 2356540 7526 30.42 30.77 30.83 0 iosd
8525 L 1 8525 712558 284117 0 23.14 23.33 23.38 0 iosd
59 I 1115452 4168181 0 42.22 39.55 39.33 0 ARP Snoop
198 I 3442960 4168186 0 25.33 24.22 24.77 0 IP Host Track Proce
30 I 3802130 4168183 0 24.66 27.88 27.66 0 ARP Input
283 I 574800 3225649 0 4.33 4.00 4.11 0 DAI Packet Process
3850-2#show processes cpu detailed process fed sorted | ex 0.0
Core 0: CPU utilization for five seconds: 45%; one minute: 44%; five minutes: 44%
Core 1: CPU utilization for five seconds: 38%; one minute: 44%; five minutes: 45%
Core 2: CPU utilization for five seconds: 42%; one minute: 41%; five minutes: 40%
Core 3: CPU utilization for five seconds: 32%; one minute: 30%; five minutes: 31%
PID T C TID Runtime(ms)Invoked uSecs 5Sec 1Min 5Min TTY Process
(%) (%) (%)
5638 L 612840 1143306 536 13.22 12.90 12.93 1088 fed
5638 L 3 8998 396500 602433 0 9.87 9.63 9.61 0 PunjectTx
5638 L 3 8997 159890 66051 0 2.70 2.70 2.74 0 PunjectRx
在IOSd的輸出中,看到ARP Snoop, IP Host Track Process, 和 ARP Input 比較高. 比較常見(jiàn)的是由于A(yíng)RP包造成CPU中斷高.
第二步:確定導(dǎo)致CPU利用率高的CPU隊(duì)列
思科3850系列交換機(jī)為不同類(lèi)型的包提供了不同的隊(duì)列(FED維護(hù)32個(gè)RX CPU Queue,這些queue會(huì)直接上CPU).
監(jiān)視這些Queue可以發(fā)現(xiàn)哪些包被送上了CPU,哪些被IOSd進(jìn)程處理. 這些Queue是基于每個(gè)Port-ASIC的. 例如,有2個(gè)Port-ASIC, 接口1到24屬于Port-ASIC 0.
可以使用showplatform punt statistics port-asic cpuq direction 命令去看.
在showplatform punt statistics port-asic 0 cpuq -1 direction rx 命令中, -1參數(shù)代理所有queue, 引起,這條命令將顯示Port-ASIC 0上所有receivequeue.
現(xiàn)在,必須確定哪個(gè)Queue有大量的數(shù)據(jù)包在以高速率進(jìn)入queue. 在這個(gè)案例中, 檢查發(fā)現(xiàn)了名叫 CPU_Q_PROTO_SNOOPING的第16個(gè)queue有異常.
RX (ASIC2CPU) Stats (asic 0 qn 16lqn 16):
RXQ 16: CPU_Q_PROTO_SNOOPING
----------------------------------------
Packets received from ASIC : 79099152
Send to IOSd total attempts : 79099152
Send to IOSd failed count : 1240331
RX suspend count : 1240331
RX unsuspend count : 1240330
RX unsuspend send count : 1240330
RX unsuspend send failed count : 0
RX dropped count : 0
RX conversion failure dropped : 0
RX pkt_hdr allocation failure : 0
RX INTACK count : 0
RX packets dq'd after intack : 0
Active RxQ event : 9906280
RX spurious interrupt : 0
另一種方法是使用show platform punt client 命令去發(fā)現(xiàn):
3850-2#show platform punt client
tag buffer jumbo fallback packets received failures
65559 0/ 16/1600 0/4 0/0 0 0 0 0 0
65560 0/ 16/1600 0/4 0/0 0 0 0 0 0
s65561 421/ 512/1600 0/0 0/128 79565859131644697 478984244 0 37467
65563 0/ 512/1600 0/16 0/256 0 0 0 0 0
65564 0/ 512/1600 0/16 0/256 0 0 0 0 0
找到最多數(shù)據(jù)包這一行對(duì)應(yīng)的tag,在這個(gè)案例中, tag為65561.
*Handle:A handle can be thought of as a pointer. It is a means to discover moredetailed information about specific variables that are used in the outputs thatthe box produces. This is similar to the concept of Local Target Logic (LTL)indices on the Cisco Catalyst 6500 Series Switch.
*PacketDelivery System (PDS): This is thearchitecture and process of how packets aredelivered to and from varioussubsystem. As an example, it controls how packetsare delivered from the FED tothe IOSd and vice versa.
輸入以下命令,可以看到queue名是Rx Proto Snoop.
3850-2#show pds tag all | in Active|Tags|65561
Active Client Client
Tags HandleName TDA SDA FDA TBufD TBytD
65561 7296672 Punt Rx ProtoSnoop 79821397 79821397 0 79821397 494316524
在65561前的s表示大量入向數(shù)據(jù)包導(dǎo)致FED handle被暫?;蛘弑粔嚎?FEDhandle is suspended and overwhelmed). 如果s不消失,代表queue永久性的卡住了.
第三步:轉(zhuǎn)儲(chǔ)發(fā)送到CPU的數(shù)據(jù)包
在show pds tag all 的輸出中看到 handle是7296672.
在使用show pds client packet last sink命令之前,需要開(kāi)啟debug pds pktbuf-last.否則將看到以下輸出:
3850-2#show pds client 7296672 packet last sink
% switch-2:pdsd:This command works in debug mode only. Enable debug using
"debug pds pktbuf-last" command
開(kāi)啟debug pds pktbuf-last后,將看到以下輸出:
3850-2#show pds client 7296672 packet last sink
Dumping Packet(54528) # 0 of Length 60
-----------------------------------------
Meta-data
0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0010 00 00 16 1d 00 00 00 00 00 00 00 00 55 5a 57 f0 ............UZW.
0020 00 00 00 00 fd 01 10 df 00 5b 70 00 00 10 43 00 .........[p...C.
0030 00 10 43 00 00 41 fd 00 00 41 fd 00 00 00 00 00 ..C..A...A......
0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050 00 00 00 3c 00 00 00 00 00 01 00 19 00 00 00 00 ...<............
0060 01 01 b6 80 00 00 00 4f 00 00 00 00 00 00 00 00 .......O........
0070 01 04 d8 80 00 00 00 33 00 00 00 00 00 00 00 00 .......3........
0080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0090 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00a0 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 00 ................
Data
0000 ff ff ff ff ff ff aa bb cc dd 00 00 08 06 00 01 ................
0010 08 00 06 04 00 01 aa bb cc dd 00 00 c0 a8 01 0a ................
0020 ff ff ff ff ff ff c0 a8 01 14 00 01 02 03 04 05 ................
0030 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 ............
Meta-data是被系統(tǒng)內(nèi)部使用的,Data輸出是真實(shí)數(shù)據(jù)包的信息. 以下命令將使用Meta-data 0070開(kāi)始的前16bits.
* IIF=Interface ID Factory
3850-2#show platform port-asic ifm iif-id 0x0104d88000000033
Interface Table
Interface IIF-ID : 0x0104d88000000033
Interface Name : Gi2/0/20
Interface Block Pointer : 0x514d2f70
Interface State : READY
Interface Stauts : IFM-ADD-RCVD, FFM-ADD-RCVD
Interface Ref-Cnt : 6
Interface Epoch : 0
Interface Type : ETHER
Port Type : SWITCH PORT
Port Location : LOCAL
Slot : 2
Unit : 20
Slot Unit : 20
Acitve : Y
SNMP IF Index : 22
GPN : 84
EC Channel : 0
EC Index : 0
ASIC : 0
ASIC Port : 14
Port LE Handle : 0x514cd990
Non Zero Feature Ref Counts
FID : 48(AL_FID_L2_PM), Ref Count : 1
FID : 77(AL_FID_STATS), Ref Count : 1
FID : 51(AL_FID_L2_MATM), Ref Count : 1
FID : 13(AL_FID_SC), Ref Count : 1
FID : 26(AL_FID_QOS), Ref Count : 1
Sub block information
FID : 48(AL_FID_L2_PM), Private Data : 0x54072618
FID : 26(AL_FID_QOS), Private Data : 0x514d31b8
從以上輸出可以看到ARP數(shù)據(jù)包來(lái)源于接口Gi2/0/20. 此時(shí),如果關(guān)閉這個(gè)接口,將解決這個(gè)問(wèn)題并降低CPU利用率.
第四步:使用FED tracing
在第3步討論的方法的缺點(diǎn)是只可以解析列隊(duì)中最后一個(gè)包,也許并不是真正導(dǎo)致CPU高的數(shù)據(jù)包.
有一個(gè)更好的排錯(cuò)方法是使用 FED tracing, 可以將FED送上CPU的數(shù)據(jù)包進(jìn)行抓包.
=====
步驟:
=====
1/ 需要enable detail tracking, 才能抓包
3850-2#set trace control fed-punject-detail enable
2/ 調(diào)整抓包buffer. 按需調(diào)整detail tracing的buffer.默認(rèn)buffer大小是32768.
3850-2#show mgmt-infra trace settings fed-punject-detail
One shot Trace Settings:
Buffer Name: fed-punject-detail
Default Size: 32768
Current Size: 32768
Traces Dropped due to internal error: No
Total Entries Written: 0
One shot mode: No
One shot and full: No
Disabled: False
可以通過(guò)以下命令修改buffer大小
3850-2#set trace control fed-punject-detail buffer-size
3850-2#set trace control fed-punject-detail buffer-size ?
<8192-67108864> The new desired buffer size, in bytes
default Reset trace buffer size to default
3/ 添加抓包filter. 你可以添加不同的filter,并且使用match all 或者match any來(lái)混合使用這些filter進(jìn)行抓包.
3850-2#set trace fed-punject-detail direction rx filter_add
3850-2#set trace fed-punject-detail direction rx filter_add ?
cpu-queue rxq 0..31
field field
offset offset
3850-2#set trace fed-punject-detail direction rx ?
Clear_all Clear all debug configured
Dump_all Dump all debug info: on
Dump_all_off Dump all debug info: off
Filter_add Add debug filter condition
Filter_clear Clear a debug filter configured
Filter_disable Disable configured filter condition(s)
Filter_enable Enable configured filter condition(s)
Match_all Match all configured filter conditions
Match_any Match any configured filter condition
Show_all Show all debug configured
在第二步案例分析中, 我們看到Queue16有大量數(shù)據(jù)包送上CPU, 所以需要查看這個(gè)queue,看看哪些數(shù)據(jù)包被送上CPU了.
使用以下命令去設(shè)置查看哪個(gè)cpu queue
3850-2#set trace fed-punject-detail direction rx filter_add cpu-queue
例如:
3850-2#set trace fed-punject-detail direction rx filter_add cpu-queue 16 16
使用match all 或者match any來(lái)混合使用已經(jīng)定義了的filter來(lái)進(jìn)行抓包
3850-2#set trace fed-punject-detail direction rx match_all
3850-2#set trace fed-punject-detail direction rx filter_enable
4/顯示抓到的數(shù)據(jù)包
使用show mgmt-infra trace messages fed-punject-detail命令查看抓到的包.
3850-2#show mgmt-infra trace messages fed-punject-detail
[11/25/13 07:05:53.814 UTC 2eb0c9 5661]
00 00 00 00 00 4e 00 40 07 00 02 08 00 00 51 3b
00 00 00 00 00 01 00 00 03 00 00 00 00 00 00 01
00 00 00 00 20 00 00 0e 00 00 00 00 00 01 00 74
00 00 00 04 00 54 41 02 00 00 00 00 00 00 00 00
[11/25/13 07:05:53.814 UTC 2eb0ca 5661]
ff ff ff ff ff ff aa bb cc dd 00 00 08 06 00 01
08 00 06 04 00 01 aa bb cc dd 00 00 c0 a8 01 0a
ff ff ff ff ff ff c0 a8 01 14 00 01 02 03 04 05
06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 f6 b9 10 32
[11/25/13 07:05:53.814 UTC 2eb0cb 5661] Frame descriptors:
[11/25/13 07:05:53.814 UTC 2eb0cc 5661]
=========
fdFormat=0x4 systemTtl=0xe
loadBalHash1=0x8 loadBalHash2=0x8
spanSessionMap=0x0 forwardingMode=0x0
destModIndex=0x0 skipIdIndex=0x4
srcGpn=0x54 qosLabel=0x41
srcCos=0x0 ingressTranslatedVlan=0x3
bpdu=0x0 spanHistory=0x0
sgt=0x0 fpeFirstHeaderType=0x0
srcVlan=0x1 rcpServiceId=0x2
wccpSkip=0x0 srcPortLeIndex=0xe
cryptoProtocol=0x0 debugTagId=0x0
vrfId=0x0 saIndex=0x0
pendingAfdLabel=0x0 destClient=0x1
appId=0x0 finalStationIndex=0x74
decryptSuccess=0x0 encryptSuccess=0x0
rcpMiscResults=0x0 stackedFdPresent=0x0
spanDirection=0x0 egressRedirect=0x0
redirectIndex=0x0 exceptionLabel=0x0
destGpn=0x0 inlineFd=0x0
suppressRefPtrUpdate=0x0 suppressRewriteSideEfects=0x0
cmi2=0x0 currentRi=0x1
currentDi=0x513b dropIpUnreachable=0x0
srcZoneId=0x0 srcAsicId=0x0
originalDi=0x0 originalRi=0x0
srcL3IfIndex=0x2 dstL3IfIndex=0x0
dstVlan=0x0 frameLength=0x40
fdCrc=0x7 tunnelSpokeId=0x0
=========
[11/25/13 07:05:53.814 UTC 2eb0cd 5661]
[11/25/13 07:05:53.814 UTC 2eb0ce 5661] PUNT PATH (fed_punject_rx_process_packet:
830):RX: Q: 16, Tag: 65561
[11/25/13 07:05:53.814 UTC 2eb0cf 5661] PUNT PATH (fed_punject_get_physical_iif:
579):RX: Physical IIF-id 0x104d88000000033
[11/25/13 07:05:53.814 UTC 2eb0d0 5661] PUNT PATH (fed_punject_get_src_l3if_index:
434):RX: L3 IIF-id 0x101b6800000004f
[11/25/13 07:05:53.814 UTC 2eb0d1 5661] PUNT PATH (fed_punject_fd_2_pds_md:478):
RX: l2_logical_if = 0x0
[11/25/13 07:05:53.814 UTC 2eb0d2 5661] PUNT PATH (fed_punject_get_source_cos:638):
RX: Source Cos 0
[11/25/13 07:05:53.814 UTC 2eb0d3 5661] PUNT PATH (fed_punject_get_vrf_id:653):
RX: VRF-id 0
[11/25/13 07:05:53.814 UTC 2eb0d4 5661] PUNT PATH (fed_punject_get_src_zoneid:667):
RX: Zone-id 0
[11/25/13 07:05:53.814 UTC 2eb0d5 5661] PUNT PATH (fed_punject_fd_2_pds_md:518):
RX: get_src_zoneid failed
[11/25/13 07:05:53.814 UTC 2eb0d6 5661] PUNT PATH (fed_punject_get_acl_log_direction:
695): RX: : Invalid CMI2
[11/25/13 07:05:53.814 UTC 2eb0d7 5661] PUNT PATH (fed_punject_fd_2_pds_md:541):RX:
get_acl_log_direction failed
[11/25/13 07:05:53.814 UTC 2eb0d8 5661] PUNT PATH (fed_punject_get_acl_full_direction:
724):RX: DI 0x513b ACL Full Direction 1
[11/25/13 07:05:53.814 UTC 2eb0d9 5661] PUNT PATH (fed_punject_get_source_sgt:446):
RX: Source SGT 0
[11/25/13 07:05:53.814 UTC 2eb0da 5661] PUNT PATH (fed_punject_get_first_header_type:680):
RX: FirstHeaderType 0
[11/25/13 07:05:53.814 UTC 2eb0db 5661] PUNT PATH (fed_punject_rx_process_packet:916):
RX: fed_punject_pds_send packet 0x1f00 to IOSd with tag 65561
[11/25/13 07:05:53.814 UTC 2eb0dc 5661] PUNT PATH (fed_punject_rx_process_packet:744):
RX: **** RX packet 0x2360 on qn 16, len 128 ****
[11/25/13 07:05:53.814 UTC 2eb0dd 5661]
buf_no 0 buf_len 128
以上輸出提供了足夠信息,讓我們知道數(shù)據(jù)包從哪里來(lái) ,以及含有什么信息.
ff ff ff ff ff ff - 目的 MAC地址
aa bb cc dd 00 00 - 源 MAC地址
現(xiàn)在可以根據(jù)源MAC地址來(lái)找到相應(yīng)的接口.
在log中, 也包含了非常有用的信息:
[11/25/13 07:05:53.814 UTC 2eb0ce 5661] PUNT PATH (fed_punject_rx_process_packet:
830):RX: Q: 16, Tag: 65561
[11/25/13 07:05:53.814 UTC 2eb0cf 5661] PUNT PATH (fed_punject_get_physical_iif:
579):RX: Physical IIF-id 0x104d88000000033
通過(guò)第一條log很容易看到數(shù)據(jù)包來(lái)自哪個(gè)queue和tag.
第二條log更有用, 因?yàn)樗嗽唇涌诘腎IF-ID, 可以使用以下命令找到源接口.
3850-2#show platform port-asic ifm iif-id 0x0104d88000000033
Interface Table
Interface IIF-ID : 0x0104d88000000033
Interface Name : Gi2/0/20
Interface Block Pointer : 0x514d2f70
Interface State : READY
Interface Stauts : IFM-ADD-RCVD, FFM-ADD-RCVD
Interface Ref-Cnt : 6
Interface Epoch : 0
Interface Type : ETHER
Port Type : SWITCH PORT
Port Location : LOCAL
Slot : 2
Unit : 20
Slot Unit : 20
Acitve : Y
SNMP IF Index : 22
GPN : 84
EC Channel : 0
EC Index : 0
ASIC : 0
ASIC Port : 14
Port LE Handle : 0x514cd990
Non Zero Feature Ref Counts
FID : 48(AL_FID_L2_PM), Ref Count : 1
FID : 77(AL_FID_STATS), Ref Count : 1
FID : 51(AL_FID_L2_MATM), Ref Count : 1
FID : 13(AL_FID_SC), Ref Count : 1
FID : 26(AL_FID_QOS), Ref Count : 1
Sub block information
FID : 48(AL_FID_L2_PM), Private Data : 0x54072618
FID : 26(AL_FID_QOS), Private Data : 0x514d31b8
- 北京炫億時(shí)代專(zhuān)業(yè)服務(wù)器運(yùn)輸,機(jī)房搬遷包裝以及倉(cāng)儲(chǔ)托
- 思科CISCO無(wú)線(xiàn)AP 啟動(dòng)加載不了ios
- 【云爍俱樂(lè)部分享會(huì)】第十二期 某游戲開(kāi)發(fā)公司內(nèi)部網(wǎng)
- 【云爍俱樂(lè)部分享會(huì)】第十一期 某政府網(wǎng)絡(luò)安全架構(gòu)方
- 【云爍俱樂(lè)部分享會(huì)】第九期 中小企業(yè)網(wǎng)絡(luò)搭建思路分
- 【云爍俱樂(lè)部分享會(huì)】第八期 某倉(cāng)庫(kù)無(wú)線(xiàn)覆蓋項(xiàng)目分享
- 【云爍俱樂(lè)部分享會(huì)】第七期 工程師網(wǎng)絡(luò)設(shè)備實(shí)操理論
- 【云爍俱樂(lè)部分享會(huì)】第六期 工程師百度無(wú)線(xiàn)覆蓋項(xiàng)目
- 【云爍俱樂(lè)部分享會(huì)】第五期 工程師項(xiàng)目分享會(huì)
- 【云爍俱樂(lè)部分享會(huì)】第四期 工程師項(xiàng)目分享會(huì)
熱門(mén)文章
- 北京機(jī)房搬遷改造公司,系統(tǒng)集成有哪些公司?-2019-11-04
- 如何打造一個(gè)安全的網(wǎng)絡(luò)環(huán)境?政府要做到這四點(diǎn)-2019-11-04
- 北京辦公室網(wǎng)絡(luò)布線(xiàn),綜合布線(xiàn)施工價(jià)格-2019-11-04
- 深信服產(chǎn)品專(zhuān)業(yè)上網(wǎng)行為管理,安全設(shè)備有保障-2019-11-04
- 兩年內(nèi)網(wǎng)絡(luò)安全市場(chǎng)規(guī)模將達(dá)千億級(jí)別?來(lái)看看詳細(xì)分析-2019-11-04
- 北京炫億時(shí)代專(zhuān)業(yè)機(jī)房設(shè)備除塵,機(jī)房網(wǎng)絡(luò)改造-2019-11-04
- 企業(yè)如何保障移動(dòng)辦公的安全性?移動(dòng)辦公存在的七大安-2019-11-01
- 北京深信服AC-1000-A400產(chǎn)品租賃購(gòu)買(mǎi),專(zhuān)業(yè)IT上網(wǎng)行為管理-2019-11-01
- 互聯(lián)網(wǎng)企業(yè)軟件開(kāi)發(fā)如何才能沒(méi)有漏洞?專(zhuān)業(yè)工程師給出-2019-11-01
- 如何避免APP的“越權(quán)”行為?要靠制定相關(guān)法律法規(guī)-2019-11-01