What | Get the whole picture of the system and choose a direction for analysis based on experience | |||
What's the issue? | What is the business? Which product components are used? How heavy is the workload? | |||
Reproducible or not? | ||||
Reproducible | Get the steps to reproduce | |||
Cannot reproduce | Keep monitoring and collect the required logs | |||
Suggestion: set up monitoring or tuning, and try to find a regular pattern | ||||
Environment Information? | ||||
Component structure | platform/HA/Cluster | |||
Vendor & Versions | Any known issues with other products | |||
Business data model | Data distribution | |||
External services | Any backend service, e.g. web service | |||
Contact information | Email & Phone | |||
Follow-up space | 1. Easy to follow next actions 2. Archive work history | 1. New email thread 2. New forum thread for discussion | 1. Notes 2. Lotus Connections community forum | |
Why | Know the current action and decide the next step based on the results from the tools | |||
Reproduce the issue locally or in production based on the steps from the customer | ||||
Local | Consider cost (new or reused environment); simulate the business and data model | |||
At customer site | 1. Provide a guide to reproduce and collect the required information 2. Get access information (host/user/pwd), and take a backup first | |||
Issue Pattern | ||||
Always/Sometimes | 1. Always: resource shortage; monitor system resource usage. 2. Sometimes: workload (hardware resource shortage or poor design) or time range (business or backend tasks). | |||
Single User/Multiple Users | 1. Single: profile the cost of every tier to locate the bottleneck 2. Multiple: check app server/DB pool usage (thread/connection…) | |||
High Resource usage | ||||
High CPU | 1. Use OS tools to find which process consumes CPU 2. Check what the incoming requests are 3. Frequent GC | 1. nmon 2. GCMV 3. perfmon | ||
Crash | Which process caused it | vmstat, iostat | ||
High Response Time or low throughput | ||||
Functions/operations | Confirm the function pattern: 1. One function: find which function is slow and understand its business logic or backend service. 2. Some functions: check whether they are similar (retrieve / create ….) 3. All functions: monitor system resources | |||
Isolate the logic tiers | 1. Log timings from custom API tests to confirm whether the issue is in the product or in the custom application 2. Use a system performance tool to monitor all system resource usage and confirm which tier is the bottleneck 3. Check the logs for timing information | 1. qatool 2. nmon 3. perfmon/ | ||
Isolate the function module | 1. Use a profiling tool with a single user. 2. Check the trace log; applies to a test environment with a single user. | 1. JProfiler 2. WAS PMI 3. WAS Performance Tuning Toolkit | ||
Goals | ||||
Customer | Discuss with the customer to agree on a goal based on the business | |||
Common | Set a common goal based on industry standards | |||
How | Propose different solutions based on skills and resources | |||
CPU | ||||
High | 1. Add CPU | |||
Low | Fewer network round trips | |||
Memory | ||||
Swap | 1. Increase memory 2. Increase heap size | |||
Disk | ||||
Busy | 1. RAID 2. Reduce write operations in the logic | iostat/perfmon | ||
Network | ||||
App server | ||||
GC | 1. GC policy 2. Heap size | |||
Cache | 1. Increase cache size | |||
Pool limit | 1. Increase pool size: thread pool / data source | |||
DB | Snapshot SQL Explain | |||
Buffer pool | ||||
Poor Index | ||||
Bad design, needs a workaround | ||||
Bad index | ||||
Too much loaded on one page | ||||
Summarize | Growth | |||
Personal | Skill improvement | 1. Technical skills: identify what knowledge is new and learn it 2. Troubleshooting skills: build on the experience | ||
Customer | Best practices for other customers | 1. Design solutions for other customers or resolve the same problem 2. Based on the current customer environment, suggest ways to keep the system stable. | ||
Product | Design suggestion | 1. Analyze the cause of the issue; is it possible to redesign the product? 2. Analyze why it was not caught in testing. 3. Suggest improvements to product monitoring |
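
The "Isolate the logic tiers" row above suggests logging time per tier to see whether the product or the custom application is slow. A minimal Python sketch of that idea follows. The tier names (`app_logic`, `database`, `backend_service`), the `TierTimer` class and the `handle_request` function are hypothetical, and the sleeps only stand in for real calls; none of this is the API of the tools listed in the table.

```python
import time
from contextlib import contextmanager


class TierTimer:
    """Accumulates wall-clock time per named tier (tier names are hypothetical)."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def measure(self, tier):
        # Time the wrapped block and add it to the running total for this tier.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[tier] = self.totals.get(tier, 0.0) + elapsed

    def slowest(self):
        # The tier with the largest accumulated time is the first suspect.
        return max(self.totals, key=self.totals.get)

    def report(self):
        # One line per tier, slowest first, with its share of the total time.
        total = sum(self.totals.values())
        lines = []
        for tier, secs in sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True):
            share = 100.0 * secs / total if total else 0.0
            lines.append(f"{tier}: {secs * 1000:.1f} ms ({share:.0f}%)")
        return lines


def handle_request(timer):
    # Stand-ins for real work; the sleeps only simulate the cost of each tier.
    with timer.measure("app_logic"):
        time.sleep(0.01)
    with timer.measure("database"):
        time.sleep(0.05)
    with timer.measure("backend_service"):
        time.sleep(0.02)


if __name__ == "__main__":
    timer = TierTimer()
    handle_request(timer)
    print("\n".join(timer.report()))
    print("slowest tier:", timer.slowest())
```

Run with a single user first, as the table recommends, so that the per-tier numbers are not distorted by pool contention.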
Means: a. system resource limitation. b. bad logic.
- Always there (means some resource is insufficient)
- Single user? (Locate which tier consumes the most time; a profiling tool is needed to find out.)
- AppServer profiling tool
- JVM profiling tool
- Trace log
- Application debug log
- Multiple users? (Locate the bottleneck.)
- AppServer
- Thread pool
- Data source
- Cache
- Database
- Agents limitation
- Locks
- Buffer pool limitation
- AppServer
- System resource limitation
- High CPU
- Memory
- Disk I/O
- Network
- JVM (optional)
- Application Performance Data (to check application logic)
- Sometimes
- Check what functional work is running at that time
- Batch operation
- Backup operation
- Migration operation
- Cache refresh
- Check for any effects from outside factors
- Specific functions (need to know the application logic and related DB operations)
- Authentication
- Create
- Update
- Delete
- Search
- Retrieve
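
The outline above is effectively a small decision tree: how often the issue occurs and how many users are involved decide what to check next. A minimal Python sketch of that lookup is below. The function name `next_checks` and the string keys are hypothetical; the checklist items are copied from the outline, and the grouping of items under each branch is an assumption about the original nesting.

```python
# Checks for issues that are always present, keyed by user load.
ALWAYS_CHECKS = {
    "single": [
        "AppServer profiling tool",
        "JVM profiling tool",
        "Trace log",
        "Application debug log",
    ],
    "multiple": [
        "AppServer: thread pool, data source, cache",
        "Database: agents limitation, locks, buffer pool limitation",
        "System resources: CPU, memory, disk I/O, network",
        "JVM (optional)",
        "Application performance data",
    ],
}

# Checks for issues that only appear sometimes.
SOMETIMES_CHECKS = [
    "Batch operation",
    "Backup operation",
    "Migration operation",
    "Cache refresh",
    "Outside factors",
]


def next_checks(frequency, users=None):
    """Return the checklist for an issue pattern.

    frequency: "always" or "sometimes"
    users: "single" or "multiple" (only used when frequency is "always")
    """
    if frequency == "sometimes":
        return list(SOMETIMES_CHECKS)
    if frequency == "always":
        if users not in ALWAYS_CHECKS:
            raise ValueError("users must be 'single' or 'multiple'")
        return list(ALWAYS_CHECKS[users])
    raise ValueError("frequency must be 'always' or 'sometimes'")


if __name__ == "__main__":
    for item in next_checks("always", "multiple"):
        print("-", item)
```

Keeping the checklist as data rather than nested `if` statements makes it easy to extend with the function-specific branch (authentication, create, update, delete, search, retrieve) once the application logic is known.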