Find the most frequently edited files in ClearCase
We are currently planning a quality improvement exercise and I would like to target the most edited files in our ClearCase VOBs. Since we've just gone through a bug-fixing phase, the most frequently edited files should be a good indicator of where the code is most error-prone, and therefore most in need of quality improvement.
Does anyone know if there is a way to get a list of the 100 most edited files? Ideally this would cover changes made across multiple branches.
(The other answer was for a simpler case: a single branch.)
Since "most dev projects did not all happen on the same branch, so version numbers do not necessarily mean most edited", a "way to get the number of checkins across all branches" would be to:
- find all versions created since the date of the last bug-fixing phase,
- sort them by file,
- then by number of occurrences.
Something along the lines of:
C:\Prog\cc\test\test>ct find -all -type f -ver "created_since(16-Oct-2009)" -exec "cleartool descr -fmt """%En~%Sn\n""" """%CLEARCASE_XPN%"""" | grep -v "\\0" | awk -F ~ "{print $1}" | sort | uniq -c | sort /R | head -100
Or, for Unix syntax:
$ ct find -all -type f -ver 'created_since(16-Oct-2009)' -exec 'cleartool descr -fmt "%En~%Sn\n" "$CLEARCASE_XPN"' | grep -v "/0" | awk -F ~ '{print $1}' | sort | uniq -c | sort -rn | head -100
- Replace the date with the date of the label marking the start of the bug-fixing phase.
- Note again the double quotes around "%CLEARCASE_XPN%" to accommodate spaces in filenames.
- %CLEARCASE_XPN% is used here, not %CLEARCASE_PN%, because we want every version.
- grep -v "/0" is here to exclude version 0 (/main/0, /main/myBranch/0, ...).
- awk -F ~ "{print $1}" is used to print only the first part of each line: C:\Prog\cc\test\test\a.txt~\main\mybranch\2 becomes C:\Prog\cc\test\test\a.txt
- From there, the counting and sorting can start:
  - sort to make sure identical lines are grouped together.
  - uniq -c to remove duplicate lines and prefix each remaining line with the number of duplicates.
  - sort -rn (or sort /R for Windows) to put the most edited files at the top.
  - head -100 to keep only the 100 most edited files.
Again, GnuWin32 comes in handy for the Windows version of the one-liner.
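The counting stage can be tried without a ClearCase view. The following is only a sketch under stated assumptions: `top_edited` and `sample_versions` are hypothetical helper names, the paths are made up, and the input simply imitates the "element~version" lines produced by the `%En~%Sn\n` format. It also anchors the grep pattern with `$` so that only a trailing `/0` is dropped, which is a small deviation from the command above.

```shell
#!/bin/sh
# Hypothetical helper: reads "element~version" lines on stdin and prints
# the N most frequently versioned elements (N defaults to 100).
# Stages, in order:
#   grep -v '/0$'  drops version 0 of every branch
#   awk            keeps only the element name (text before "~")
#   sort           groups identical names together
#   uniq -c        collapses each group and prefixes it with its count
#   sort -rn       puts the highest count first
#   head -n        keeps the top N lines
top_edited() {
    n="${1:-100}"
    grep -v '/0$' | awk -F '~' '{print $1}' | sort | uniq -c | sort -rn | head -n "$n"
}

# Made-up sample standing in for real cleartool output.
sample_versions() {
    cat <<'EOF'
/vob/src/a.c~/main/0
/vob/src/a.c~/main/1
/vob/src/a.c~/main/fix/0
/vob/src/a.c~/main/fix/1
/vob/src/a.c~/main/fix/2
/vob/src/b.c~/main/0
/vob/src/b.c~/main/1
EOF
}

sample_versions | top_edited 2
```

With this sample, a.c is listed first with a count of 3 and b.c second with a count of 1; the exact padding in front of the counts depends on the uniq implementation.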
(See the other answer for the more complex case: multiple branches.)
First, use a dynamic view: it is easier and faster to update its content and to tinker with its config spec rules.
If the bug fixing was done in a branch starting from a given label, set up a dynamic view with the following config spec:
element * .../MY_BRANCH/LATEST
element * MY_STARTING_LABEL
element * /main/LATEST
Then find all files along with their current version number (which is closely related to the number of edits):
ct find . -type f -exec "cleartool desc -fmt """%Ln\t\t%En\n""" """%CLEARCASE_PN%""""|sort /R|head -100
This is the Windows syntax (note the triple "double quotes" around %CLEARCASE_PN%, which accommodate spaces in filenames).
The head command comes from the GnuWin32 library.
The most edited file is at the top of the list.
The Unix version will be as follows:
$ ct find . -type f -exec 'cleartool desc -fmt "%Ln\t\t%En\n" "$CLEARCASE_PN"' | sort -rn | head -100
Again, the most edited file will be at the top.
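One detail that may be worth checking on your side: the version numbers printed by %Ln are not zero-padded, so a character-based reverse sort (which is what Windows sort /R performs) can rank 9 above 10, whereas sort -rn compares them as numbers. The sketch below illustrates the difference on made-up lines shaped like the `%Ln\t\t%En\n` output; `sample_latest`, `rank_numeric` and `rank_lexical` are hypothetical names used for illustration only.

```shell
#!/bin/sh
# Made-up "version<TAB><TAB>element" lines imitating the cleartool output.
sample_latest() {
    printf '9\t\t/vob/src/a.c\n10\t\t/vob/src/b.c\n2\t\t/vob/src/c.c\n'
}

# Numeric reverse sort: version 10 ranks above version 9.
rank_numeric() { sort -rn; }

# Character-based reverse sort: "9" ranks above "10" because '9' > '1'.
rank_lexical() { LC_ALL=C sort -r; }

echo "numeric:"
sample_latest | rank_numeric | head -n 1
echo "lexical:"
sample_latest | rank_lexical | head -n 1
```

If the Windows ranking looks off for files with ten or more versions, running the GnuWin32 sort with -rn instead of sort /R is one possible workaround, assuming it is the sort that gets picked up on your PATH.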
Don't forget that raw numbers are not enough for metrics; trends are important too.