|
1
|
|
|
2
|
- Everyone has horror stories, full of embellished gory details.
- They are like veterans talking about war wounds!
- Lack of documentation about tools and techniques
- further mystifies the black art.
- There will be some difficult hardware-only bugs.
- But the majority are quite easy to make progress on using systematic methods.
|
|
3
|
- Let's start with what you already know
- The emulator and Metrowerks CodeWarrior
|
|
4
|
- What happens when a thread panics?
- A breakpoint is hit, causing the emulator to stop at the line that caused the failure.
- A source-level call stack is shown.
- Objects and variables in all functions of the call stack can be examined.
- You are spoilt! Always try to reproduce problems on the emulator first -
it is a good debugging environment.
|
|
5
|
|
|
6
|
- Find the line of code which calls User::Panic
|
|
7
|
- Or an access violation
- e.g. trying to call Cancel() on a NULL pointer
|
|
8
|
- Make sure Just In Time debugging is enabled
- Set the following registry values:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug]
"UserDebuggerHotKey"=dword:00000000
"Debugger"="\"C:\\apps\\Metrowerks\\bin\\IDE.exe\" -p %ld -e %ld"
"Auto"="0"
- Also ensure that the following line is removed from \epoc32\data\epoc.ini:
JustInTime 0
- Debug messages also appear in %TEMP%\epocwind.out
|
|
9
|
- Enable logging of system messages…
- From the "Target Settings" panel, go to the "Debugger | Debugger Settings" options and tick the box labelled "Log System Messages"
|
|
10
|
- A panic is the Symbian term for an unexpected exit of a thread
- A thread is the unit of execution on Symbian OS
- Processes must have at least one thread to begin executing code
- A panic denotes a serious coding error:
- either the caller of a function has violated an API contract (e.g. calling a function with invalid parameters)
- or an object or memory structure has moved into a bad internal state, violating an invariant
- Panics are helpful
- They aim to inform you about the exact nature of the problem during development
|
|
11
|
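TReal PercentageToDecimal(TInt aPercentage)
    {
    // Precondition: the caller must pass a value in the range 0..100.
    __ASSERT_ALWAYS(aPercentage >= 0 && aPercentage <= 100, Panic(EInvalidInput));
    // Divide by 100.0: integer division would truncate the result.
    TReal result = aPercentage / 100.0;
    // Sanity check on the result; ASSERT is only compiled into debug builds.
    ASSERT(result >= 0.0 && result <= 1.0);
    return result;
    }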
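For readers without a Symbian toolchain to hand, the same contract-checking pattern can be sketched in standard C++. This is only an illustration, not Symbian API: the function name is reused from the slide, an exception stands in for the panic, and the standard `assert` plays the role of the debug-only check.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical portable stand-in for the Symbian function on this slide.
double PercentageToDecimal(int percentage)
{
    // Precondition (the API contract): checked in every build.
    // A Symbian build would panic here; this sketch throws instead.
    if (percentage < 0 || percentage > 100)
        throw std::invalid_argument("percentage must be in the range 0..100");

    // Dividing by 100.0 forces floating-point division; with two integer
    // operands the division would truncate.
    const double result = percentage / 100.0;

    // Postcondition: an internal sanity check, compiled out when NDEBUG is defined.
    assert(result >= 0.0 && result <= 1.0);
    return result;
}
```

The split mirrors the slide: the caller's mistake is reported in every build, while the internal sanity check only costs anything during development.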
|
|
12
|
|
|
13
|
- The cascade of functions calling functions which resulted in the panic.
- Shows some history of the current operation.
- This often gives a pretty good idea of the chain of events leading to a panic.
- Essential for tracking down problems and knowing where to put breakpoints.
- Also a good way of identifying duplicate defects.
|
|
14
|
|
|
15
|
- Provides logging for:
- memory allocations
- process and thread creation
- leaves
- more in the future?
- Main use is pinpointing the source of leaked memory
- To use this tool you need to:
- Install it on your machine: download from Symbian DevNet
- Attach the hooks to EUSER.DLL
- Run HookLogger.EXE
- Run the code to be hooked
|
|
16
|
- Run “HookEUser.cmd” from the “emulator” drive.
- x:\> HookEUser WINSCW
- ‘x’ is the drive containing the epoc32 folder
- Replaces EUSER with a hook "parasite" DLL
- Undo by using the “-r” (remove) option
|
|
17
|
- Run “HookLogger.exe”
- Connection status shown in title bar
- Set the options for monitoring heaps and threads
|
|
18
|
- Start the emulator and reproduce the memory leak
- Break into CodeWarrior
- Walk back up the stack to User::__DbgMarkEnd
- Take a note of the leaked memory location (badCell) and the thread id.
|
|
19
|
|
|
20
|
- Go to the Threads tab in the hook logger
- find the thread that leaked memory
|
|
21
|
- Right-click and select “Show allocations”
- may take 10 to 20 seconds to respond
|
|
22
|
- Order list by “Ptr”
- Find address indicated by “badCell” in part 3
- Double click to get a nice callstack
|
|
23
|
|
|
24
|
- What happens when a thread panics?
- Either a panic dialog appears
- or device reboots
- No context is stored
- Oh dear - No wonder it’s scary.
- You need tools to get the same information that the emulator gives so easily.
|
|
25
|
- Marking a thread or process as “system critical” means that it is an
integral and essential part of the system
- e.g. the file server
- The thread or process is being declared necessary for correct
functioning of the device
- If a system critical thread exits or panics the device will reboot
- This is why panics in some threads cause the device to reset
|
|
26
|
- \src\cedar\generic\base\e32\kernel\sthread.cpp
void DThread::Exit()
    {
    if (iExitType!=EExitKill && (iFlags & (KThreadFlagSystemPermanent|KThreadFlagSystemCritical)))
        K::Fault(K::ESystemThreadPanic);
    <snip>
|
|
27
|
- The most important information to get hold of
- Which thread panicked/caused an access violation
- What was the panic reason and number?
- What was the callstack of the thread when it panicked?
|
|
28
|
- There are two kinds
- Application panic
- Where a Panic dialog appears
- not critical - device carries on working
- System thread panic
- critical - the device halts and resets.
- Or the device may enter a special debug mode (called the crash debugger or debug monitor)
|
|
29
|
- Dialog will tell you
- Thread which panicked
- Panic reason
- What else do we need?
- A tool called D_EXC can provide the call stack.
|
|
30
|
- You must enable a tool called the debug monitor (or crash debugger) to get more info
- The crash debugger tells you
- which thread panicked
- the category and number of the panic
- where the stack for the panicked thread is located in memory
- The crash debugger can be coaxed to dump the callstack
|
|
31
|
|
|
32
|
- E.g. if the dialog says “KERN-EXEC 3”:
- Type “KERN-EXEC panic” into the search
- This will help you understand what to look for in the code
|
|
33
|
- To get a useful call stack two things are always needed:
- A hex dump of the memory used by the stack of the thread which panicked
- A ROM symbol file for the software flashed onto the device.
- With this information a Symbian Perl script can decode a human-readable call stack (similar to the call stack seen in the emulator).
|
|
34
|
- Run the d_exc tool on the device first
- Reproduce the panic.
- The d_exc dialog pops up, telling you some information about the panic. Press OK to save the stack to disk.
- d_exc will have dumped 2 files to disk
- a binary .stk file containing the thread’s stack
- a .txt file detailing the panic code and category
- Get those files onto a PC and have your symbol file at hand.
|
|
35
|
|
|
36
|
- Open the output in Notepad.
- Do a find for “>>>>”.
- This takes you to the top of the decoded stack!
- >>>> current stack pointer >>>>
- r00=80007204 00000000 80000368 80000003
- r04=00801bb0 00000001 00000000 00802bc4
- r08=00000002 50340f15 00802bc4 00000000
- r12=8041b36c 00801bb0 50160ff8 5000b34c
- PC = 5000b34c L..P __ArmVectorSwi(void) + 0x124
- LR = 50160ff8 ...P SvSendReceive(int, void *) + 0x1c
- >>>> current stack pointer >>>>
|
|
37
|
- Scroll down the text. Sometimes you may see this familiar fingerprint for a panic:
- >>>> current stack pointer >>>>
- r00=80007204 00000000 80000368 80000003
- r04=00801bb0 00000001 00000000 00802bc4
- r08=00000002 50340f15 00802bc4 00000000
- r12=8041b36c 00801bb0 50160ff8 5000b34c
- PC = 5000b34c L..P __ArmVectorSwi(void) + 0x124
- LR = 50160ff8 ...P SvSendReceive(int, void *) + 0x1c
- >>>> current stack pointer >>>>
- 1bb0 80000001 ....
- 1bb4 00000082 ....
- 1bb8 50161018 ...P SvSendReceiveCheck(int, void *) + 0x8
- 1bbc 5016594c LY.P RThread::Panic(TDesC16 const &, int) + 0x24
- 1bc0 ffff8001 ....
- 1bc4 00000082 ....
- 1bc8 00801bdc .... Stack + 0x1bdc
- 1bcc 0000003c <...
- 1bd0 50162024 $ .P User::Panic(TDesC16 const &, int) + 0x24
- 1bd4 ffff8001 ....
- 1bd8 5016ce20 ..P Panic(TCdtPanic) + 0x24
- 1bdc 10000004 ....
- 1be0 50178000 ...P TUnicode::CjkWidthFoldTable + 0x5408
|
|
38
|
- Look at all the functions that follow
- In my case, after cutting out the lines that looked garbled, I got:
- 1e18 50650031 1.eP CBaLockChangeNotifier::DoRunL(void) + 0x5d
- 1e24 5064fdef ..dP RBaBackupSession::GetBackupOperationEvent(..
- 1e5c 5064ff65 e.dP CBaLockChangeNotifier::RunL(void) + 0x19
- That was enough to tell me to look at the code for DoRunL(), and to put some logging in there to see what was going on.
- Those are the basics of d_exc
|
|
39
|
- Same idea - we want to get the panic reason and call stack:
- First, get the base porting people to show you how to enable a crash debugger build.
- Reproduce the problem - if the device enters the crash debugger, you can get more information.
- You use a terminal program on the PC to “talk” to the crashed device.
|
|
40
|
- Same as always
- Which thread caused a panic or access violation
- What is the panic reason and number
- What is the callstack of the thread
|
|
41
|
- Launch a terminal emulator (e.g. HyperTerminal) on your PC
- Connect the PC serial port to the device serial port which provides debug tracing
- The terminal window should show a “password” prompt
- Type in “replacement” and you have entered the debug monitor prompt
- The kernel is frozen, allowing you to interrogate its current state
|
|
42
|
- Type ‘f’ into the crash debugger to get the fault information
- If the category is KERN 4 then you are in business.
- KERN 4 simply says that a panic happened in a system thread
- The actual panic, such as KERN-EXEC 3, is hidden
- Type ‘i’ into the crash debugger to get information about the real panic reason
- Sometimes even this doesn’t work - a crash in a thread that is not system critical can still cause the process to exit if the thread is process critical, e.g. the main thread
- If another thread in the process is marked as system critical this will take down the platform.
- A foolproof method of finding the real panic is to look at the output from the KPanic debug tracing
|
|
43
|
- Type ‘r’ to get the values of all the registers
- The ones to look at depend on the processor mode!
- Type ‘c0’ to get the details of all the threads
- ‘C0’ will pause between each screenful
|
|
44
|
- The ARM Procedure Calling Standard (APCS)
- Imposes conventions on the use of registers
- So we always know the important registers to look at
|
|
45
|
- If you have KPanic debug tracing enabled use that to identify the panic
- RLibrary::Load - aFileName: BMPANSRV.DLL, -aPath: threadName: Wserv
- RLibrary::Load - OK
- RLibrary::Load ......1
- RLibrary::Load ......2
- RLibrary::Load ......3
- RLibrary::Load ......4
- RLibrary::Load ......5
- RLibrary::Load Init() - OK
- Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
- R0=00614a40 R1=806a21d7 R2=006029c4 R3=006029c4
- R4=0060f448 R5=0060f4c8 R6=006126b8 R7=0060f484
- R8=00000012 R9=00000040 R10=c8087d78 R11=00000000
- R12=8009fced R13=004060e0 R14=8108b0f8 R15=8144710c
- R13Svc=c924c000 R14Svc=80020108 SpsrSvc=00000010
- Thread 37, KernCSLocked=0
- FAULT: KERN 00000004
- Password: replacement
|
|
46
|
- The type of exception or why the processor was unhappy
- Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
|
|
47
|
- The processor mode or which registers are valid
- Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
|
|
48
|
- The Fault Address Register (FAR) indicates the dodgy address that was
accessed
- Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
- The least significant 4 bits of the Fault Status Register (FSR) indicate the MMU fault
- Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
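The fields of an "Exc" line can be picked apart mechanically. The sketch below is a minimal illustration, assuming the classic ARM layout: mode in CPSR bits 4..0, the Thumb bit in bit 5, and the ARMv4/v5 fault-status encodings for the low 4 bits of the FSR. The function names are hypothetical helpers, not part of any Symbian tool, and only a handful of fault types are named; anything else should be looked up in the ARM Architecture Reference Manual for your core.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Name of the processor mode held in CPSR bits [4:0].
std::string CpsrMode(std::uint32_t cpsr)
{
    switch (cpsr & 0x1f) {
        case 0x10: return "User";
        case 0x11: return "FIQ";
        case 0x12: return "IRQ";
        case 0x13: return "Supervisor";
        case 0x17: return "Abort";
        case 0x1b: return "Undefined";
        case 0x1f: return "System";
        default:   return "Unknown";
    }
}

// True if the T bit (bit 5) is set, i.e. the core was executing Thumb code.
bool CpsrThumb(std::uint32_t cpsr)
{
    return (cpsr & 0x20) != 0;
}

// Fault type from the least significant 4 bits of the FSR
// (assumed ARMv4/v5 encoding; only the common cases are named).
std::string FsrFaultType(std::uint32_t fsr)
{
    switch (fsr & 0xf) {
        case 0x1:
        case 0x3: return "alignment fault";
        case 0x5: return "translation fault (section)";
        case 0x7: return "translation fault (page)";
        case 0xd: return "permission fault (section)";
        case 0xf: return "permission fault (page)";
        default:  return "other";
    }
}

// Usage on two "Exc" lines quoted in these slides.
void CheckExcLines()
{
    // Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
    assert(CpsrMode(0x20000010u) == "User");
    assert(!CpsrThumb(0x20000010u));
    assert(FsrFaultType(0x00000801u) == "alignment fault");

    // Exc 1 Cpsr=68000030 FAR=00414fe8 FSR=00000807
    assert(CpsrThumb(0x68000030u));
    assert(FsrFaultType(0x00000807u) == "translation fault (page)");
}
```

Under those assumptions the first line reads as a user-mode access to a misaligned address, which agrees with the odd FAR value 0060fa6d.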
|
|
49
|
- The “i” command gives you a lot more information, but all you are
interested in is finding a fingerprint.
- <snip>
- THREAD at c8084ef0 VPTR=00000000 AccessCount=6 Owner=c80848a8
- Full name apprun.exe::Calcsoft
- Thread MState READY
- Default priority 16 WaitLink Priority 16
- ExitInfo 2,3,KERN-EXEC
- Flags 00000002, Handles c8084a70
- Supervisor stack base c9208000 size 4000
- User stack base 00402000 size 5000
- Id=29, Alctr=00600000, Created alctr=00600000, Frame=00406e1c
- <snip>
- R13_USR 8005e414 R14_USR 000002a8 SPSR_SVC c8084ef0
- R4 c8085198 R5 00000000 R6 00000000 R7 00000001
- R8 00000000 R9 8005e834 R10 000002a8 R11 c8085198
- PC 8005e81c
- TheCurrentProcess=c80848a8
- PROCESS at c80848a8 VPTR=00000000 AccessCount=7 Owner=00000000
- Full name apprun.exe
- ExitInfo 3,0,
- <snip>
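The fingerprint in this output is the "ExitInfo" line. A minimal sketch of turning it into the familiar "category number" form follows. It assumes the three fields are exit type, reason and category, with the exit type numbered as in TExitType (0 kill, 1 terminate, 2 panic, 3 pending); the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct ExitInfo {
    int type = -1;          // assumed TExitType: 0 kill, 1 terminate, 2 panic, 3 pending
    int reason = 0;         // panic number when type == 2
    std::string category;   // panic category when type == 2
};

// Parses the text that follows "ExitInfo", e.g. "2,3,KERN-EXEC".
bool ParseExitInfo(const std::string& fields, ExitInfo& out)
{
    std::istringstream in(fields);
    char comma1 = 0;
    char comma2 = 0;
    if (!(in >> out.type >> comma1 >> out.reason >> comma2))
        return false;
    if (comma1 != ',' || comma2 != ',')
        return false;
    out.category.clear();
    std::getline(in, out.category);   // stays empty when no category was recorded
    return true;
}

// Returns e.g. "KERN-EXEC 3" for a panicked thread, or "" if it did not panic.
std::string Fingerprint(const ExitInfo& info)
{
    if (info.type != 2)
        return "";
    return info.category + " " + std::to_string(info.reason);
}

// Usage on the two ExitInfo lines shown on this slide.
void CheckExitInfo()
{
    ExitInfo thread;
    const bool threadOk = ParseExitInfo("2,3,KERN-EXEC", thread);
    assert(threadOk);
    assert(Fingerprint(thread) == "KERN-EXEC 3");

    ExitInfo process;
    const bool processOk = ParseExitInfo("3,0,", process);
    assert(processOk);
    assert(Fingerprint(process).empty());   // type 3: the process has not exited
    (void)threadOk;
    (void)processOk;
}
```

On the trace above this gives "KERN-EXEC 3" for the thread apprun.exe::Calcsoft, while the owning process is still pending.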
|
|
50
|
|
|
51
|
- From the same information as the previous page
- Look at R15 and copy that number.
- This is the PC - the address of the last instruction to execute in the thread which panicked
- Be careful to get the right version depending on the “mode” of the processor
- Exc 1 Cpsr=48000030 FAR=01000003 FSR=00000001
- R0=00600080 R1=00405ee0 R2=08cc014c R3=00000000
- R4=00619ae8 R5=00000000 R6=00ffffff R7=ffffffff
- R8=00000012 R9=00000040 R10=c808a2e0 R11=00000000
- R12=800c3a35 R13=00405ee0 R14=810b1b59 R15=80728562
- R13Svc=c924c000 R14Svc=80020234 SpsrSvc=08000010
- Thread 37, KernCSLocked=0
- Thread eiksrvs.exe::!EikAppUiServer Die: 2,3,KERN-EXEC
- Thread eiksrvs.exe::!EikAppUiServer SetDefaultPriority 16
- Thread eiksrvs.exe::!EikAppUiServer SetRequiredPriority def 16 cleanup -1 nest 0
- Thread eiksrvs.exe::!EikAppUiServer MState 2 SetActualPriority 16
- Exec::ThreadId
- Exec::SemaphoreWait
- Thread eiksrvs.exe::!EikAppUiServer Panic KERN-EXEC 3
- FAULT: KERN 00000004
|
|
52
|
- Look up the PC to find the “top” of the callstack
- Either look at the symbol file directly or use printsym to decode the address
- You were probably halfway through a function
- so you may have to look for the closest match, e.g. where R15=80728562
- 80728498 0000 CAknViewAppUi::~CAknViewAppUi__sub_object() avkon.in(.text)
- 80728584 0010 CAknViewAppUi::~CAknViewAppUi__deallocating() avkon.in(.text)
- R14 (the link register – lr) may also give you a clue
- 810b1b56 001c CCoeEnv::CreateResourceReaderLC(TResourceReader&, int) const CONE.in(.text)
- 810b1b72 0074 CCoeEnv::ReadResourceAsDes16(TDes16&, int) const CONE.in(.text)
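The "closest match" rule is simply: take the last symbol whose start address is at or below the address you are decoding. A minimal sketch, assuming the symbol file has already been read into a map from start address to name (the type alias and function name are hypothetical, and the length column of the symbol file is ignored):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Start address of each symbol mapped to its name.
using SymbolTable = std::map<std::uint32_t, std::string>;

// Returns "name + 0xoffset" for the last symbol that starts at or before
// the given address, or "<no symbol>" if the address precedes every symbol.
std::string NearestSymbol(const SymbolTable& symbols, std::uint32_t address)
{
    auto it = symbols.upper_bound(address);   // first symbol starting after address
    if (it == symbols.begin())
        return "<no symbol>";
    --it;                                     // last symbol starting at or before address
    std::ostringstream out;
    out << it->second << " + 0x" << std::hex << (address - it->first);
    return out.str();
}

// Usage with the two avkon symbols and the R15 value from this slide.
void CheckSymbolLookup()
{
    const SymbolTable symbols = {
        {0x80728498u, "CAknViewAppUi::~CAknViewAppUi__sub_object()"},
        {0x80728584u, "CAknViewAppUi::~CAknViewAppUi__deallocating()"},
    };
    assert(NearestSymbol(symbols, 0x80728562u) ==
           "CAknViewAppUi::~CAknViewAppUi__sub_object() + 0xca");
}
```

An ordered map keeps the lookup logarithmic, which matters little for one address but helps when every word of a stack dump is decoded this way.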
|
|
53
|
- From the same information as the previous page
- Look at R13 and copy that number.
- This is the address of the stack.
- Exc 1 Cpsr=48000030 FAR=01000003 FSR=00000001
- R0=00600080 R1=00405ee0 R2=08cc014c R3=00000000
- R4=00619ae8 R5=00000000 R6=00ffffff R7=ffffffff
- R8=00000012 R9=00000040 R10=c808a2e0 R11=00000000
- R12=800c3a35 R13=00405ee0 R14=810b1b59 R15=80728562
- R13Svc=c924c000 R14Svc=80020234 SpsrSvc=08000010
- Thread 37, KernCSLocked=0
- Thread eiksrvs.exe::!EikAppUiServer Die: 2,3,KERN-EXEC
- Thread eiksrvs.exe::!EikAppUiServer SetDefaultPriority 16
- Thread eiksrvs.exe::!EikAppUiServer SetRequiredPriority def 16 cleanup -1 nest 0
- Thread eiksrvs.exe::!EikAppUiServer MState 2 SetActualPriority 16
- Exec::ThreadId
- Exec::SemaphoreWait
- Thread eiksrvs.exe::!EikAppUiServer Panic KERN-EXEC 3
- FAULT: KERN 00000004
|
|
54
|
- All you need to do now is
- Type command M. into the crash debugger with the address of the stack from R13
- and dump about 200 bytes of stack - that should be plenty.
- You can dump the stacks of all threads by using the ‘S’ command
- Command is…
|
|
55
|
- That will dump some hex and text to your terminal:
- 00405ee0: 00 00 00 00 00 00 00 00 44 04 77 80 c6 56 00 10 ........D.w..V..
- 00405ef0: 4c 01 cc 08 e8 9a 61 00 40 04 77 80 50 2a 60 00 L.....a.@.w.P*`.
- 00405f00: ff ff ff ff e3 78 72 80 00 00 60 00 18 00 00 00 .....xr...`.....
|
|
56
|
- Type the following into a Windows command prompt:
|
|
57
|
- Always check whether you’re suffering from a stack overflow
- A stack overflow will cause otherwise unexplainable KERN-EXEC 3 errors
- If you can’t get hold of the stack (you see a line like that shown below), it may indicate a stack overflow
- .m 00414ff8 00415fff
- Exception: Type 1 Code 80073280 Data 00414ff8 Extra 00000007
|
|
58
|
- You need the value of R13 and the thread id
- Exc 1 Cpsr=68000030 FAR=00414fe8 FSR=00000807
- R0=00415190 R1=00415190 R2=800e19bc R3=00000038
- R4=7fffffff R5=00000000 R6=00415028 R7=00000100
- R8=00000000 R9=00000040 R10=c808a2e8 R11=00000000
- R12=800c3699 R13=00414ff8 R14=800c8d43 R15=800c8a1c
- R13Svc=c931c000 R14Svc=8002014c SpsrSvc=08000010
- Thread 58, KernCSLocked=0
- Look up the details of the thread from the output of the ‘i’ or ‘c0’
commands to get the stack base
- THREAD at c809ae88 VPTR=00000000 AccessCount=3 Owner=c8089e30
- Full name eiksrvs.exe::KeySoundServerThread
- <snip>
- Supervisor stack base c9318000 size 4000
- User stack base 00415000 size 1000
- Id=58, Alctr=00600000, Created alctr=00600000, Frame=00415bd4
- <snip>
|
|
59
|
- The stack has overflowed if R13 < stack base
- Plus: the exception id will indicate a data abort
- Exc 1 Cpsr=68000030 FAR=00414fe8 FSR=00000807
- R12=800c3699 R13=00414ff8 R14=800c8d43 R15=800c8a1c
- User stack base 00415000 size 1000
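The comparison can be written down directly. This is a minimal sketch, assuming the thread was in user mode (so R13 is the user stack pointer) and that "User stack base" is the lowest valid address of a stack that grows downwards; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if the stack pointer has dropped below the lowest address of the stack.
bool StackOverflowed(std::uint32_t stackPointer, std::uint32_t stackBase)
{
    return stackPointer < stackBase;
}

// How far below the stack base the stack pointer is (0 if it is still inside).
std::uint32_t BytesBelowBase(std::uint32_t stackPointer, std::uint32_t stackBase)
{
    return stackPointer < stackBase ? stackBase - stackPointer : 0;
}

// Usage with the KeySoundServerThread values from these slides:
// R13=00414ff8 and "User stack base 00415000 size 1000".
void CheckKeySoundServerThread()
{
    assert(StackOverflowed(0x00414ff8u, 0x00415000u));
    assert(BytesBelowBase(0x00414ff8u, 0x00415000u) == 8u);
}
```

Here R13 sits 8 bytes below the base, and the FAR value 00414fe8 is below the base as well, which fits a thread that ran off the end of its 0x1000-byte stack.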
|
|
60
|
- The output will be similar to the decoded d_exc stack, except:
- The top of the printout represents the function which called Panic() (with d_exc you have to find the top)
- So your output may start with something like this:
- 00405f00: ff ff ff ff e3 78 72 80 00 00 60 00 18 00 00 00 .....xr...`.....
- = ffffffff ....
- = 807278e3 .xr. CAknNoteAttributes::ConstructFromResourceL(TResourceReader&) avkon.in(.text) + 0x1a1
- = 00600000 ..`.
- = 00000018 ....
- 00405f10: 00 00 60 00 a1 40 0d 80 a8 60 40 00 ce 7b 61 00 ..`..@...`@..{a.
- = 00600000 ..`.
- = 800d40a1 .@.. RHeap::Alloc(int) euser.in(.text) + 0x8b
- = 004060a8 .`@.
- = 00617bce .{a.
|
61
|
- D_EXC knows how to decode RAM-based symbols too
- The D_EXC .txt file lists any DLLs which were loaded into RAM (and hence are not present in the ROM symbol file)
- You have to place the .MAP file of every RAM DLL you are interested in into the same directory as the d_exc trace
- Run printstk.pl as usual, and it should pick up the addresses correctly
|
|
62
|
- Now that you know the thread and panic code, and have the stack, for both application panics and system panics:
- It gives you a good idea of what functions to put logging in
- It quickly allows you to see if a defect is a duplicate (if the callstack has already been posted on a previous defect)
|
|
63
|
- But with practice it’s a systematic method - not a black art
- Application and system thread panics cover 90% of application-side hardware crashes
- So learn how to diagnose these first, then worry about more advanced debugging facilities
- Print out the documentation as you work; it will help you with other kinds of problems and is a good reference
- Read the other debugging guides to help you understand what is going on under the hood
|
|
64
|
- Try these techniques out on a panic you put in the code yourself
- so that you are confident about the result
- Make sure the call stack matches what you know to be the problem
- When looking into a defect, it is often enough to find the component which panicked
- This is what triage may do
- The component owner may then be able to take over and apply some knowledge, logging, etc.
- Sometimes it can be helpful to make every thread a system thread
- so all panics go to the debug monitor
|
|
65
|
- 5055ced4 004c CMmPhoneTsy::UpdatePhoneIndicator(RMobileCall::TMobileCallStatus
- 5055cf20 0054 CMmPhoneTsy::UpdatePhoneIndicator(RMobilePhone::TMobilePhoneRegistr
- 5055cf74 001c CMmPhoneTsy::GetSubscriberIdL(TBuf<15> &)
- 5055cf90 0038 CMmPhoneTsy::CompleteReadNamData(int, TPtrC8)
- 5055cfc8 0058 CMmPhoneTsy::CompleteProductInfoNumId(TBuf8<50> &)
|