Uniqing delimited file based on subset of fields
I have data like below:
1493992429103289,207.55,207.5
1493992429103559,207.55,207.5
1493992429104353,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
Due to the nature of the last two columns, their values change throughout the day, and the same values are repeated regularly. By keeping only the rows where those values change (my desired output, below), I can see every time their values changed (with the epoch time in the first column). Is there a way to achieve the desired result shown below:
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
So, I am collapsing the data based on two columns. However, the deduplication is not global (as seen with the pair 207.55,207.5 reappearing).
I tried:
uniq -f 1
However, the output only prints the first line and doesn't go through the list (uniq fields are separated by blanks, so in a comma-separated file each whole line is one field; skipping it with -f 1 makes every line compare equal).
The awk solution below deduplicates globally (it never reprints a value pair that has been printed before), and therefore gives the result below the awk code:
awk '!x[$2 $3]++'
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
I don't want to sort the data by those two columns. However, since the first column is an epoch time, the data can be sorted by the first column.
You can use awk as below,
awk 'BEGIN{FS=OFS=","} s != $2 || t != $3 {print} {s=$2; t=$3}' file
which produces output as needed.
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
The idea is to store the values of the second and third columns in the variables s and t respectively, and print the row only if the current row differs from the previous one.
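A minimal, self-contained run of this compare-with-previous idea on the sample data from the question (inlined here with printf for illustration; || is used so that a change in either column triggers a print):

```shell
# Sample rows from the question, piped straight into the awk filter.
printf '%s\n' \
    '1493992429103289,207.55,207.5' \
    '1493992429103559,207.55,207.5' \
    '1493992429104353,207.55,207.5' \
    '1493992429104491,207.6,207.55' \
    '1493992429110551,207.55,207.5' |
awk 'BEGIN{FS=OFS=","} s != $2 || t != $3 {print} {s=$2; t=$3}'
# Prints:
# 1493992429103289,207.55,207.5
# 1493992429104491,207.6,207.55
# 1493992429110551,207.55,207.5
```

The first row always prints because s and t start out empty; after that, only rows whose last two columns differ from the previous row survive.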
You can try manually (with a loop) comparing the current line with the previous line.
prev_line=""
# start at first line
i=1
# strip the first column, which we don't need to compare
sed 's@^[0-9][0-9]*,@@' ./data_file > ./transform_data_file
# for every line of the file without its first column
for current_line in $(cat ./transform_data_file)
do
    # if the previous line is the same as the current line
    if [ "x$prev_line" = "x$current_line" ]
    then
        # record the line number, to suppress it afterwards
        echo $i >> ./line_to_be_suppress
    fi
    # record the current line as the previous line
    prev_line=$current_line
    # increment the current line number
    i=$(( i + 1 ))
done
# suppress the recorded lines, bottom-up so earlier line numbers stay valid
for line_to_suppress in $(tac ./line_to_be_suppress) ; do sed -i $line_to_suppress'd' ./data_file ; done
rm -f ./line_to_be_suppress ./transform_data_file
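The same compare-with-previous idea can also be sketched without temporary files or in-place sed, reading the file line by line (a sketch; ./data_file is the input name used above):

```shell
prev=""
while IFS= read -r line; do
    # strip the first (epoch) column; it never takes part in the comparison
    rest=${line#*,}
    # print the full line only when the remaining columns changed
    [ "$rest" != "$prev" ] && printf '%s\n' "$line"
    prev=$rest
done < ./data_file
```

This prints the surviving lines instead of deleting the duplicates in place, which avoids the bookkeeping of recorded line numbers.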
Since your first field has a fixed length of 17 characters (the 16-digit timestamp plus the separator ,), you can use the -s option of uniq, which would be more optimal for large files:
uniq -s 17 file
Gives this output:
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
From man uniq:
-f num
    Ignore the first num fields in each input line when doing comparisons. A field is a string of non-blank characters separated from adjacent fields by blanks. Field numbers are one based, i.e., the first field is field one.
-s chars
    Ignore the first chars characters in each input line when doing comparisons. If specified in combination with the -f option, the first chars characters after the first num fields will be ignored. Character numbers are one based, i.e., the first character is character one.
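Counting the prefix: the 16-digit epoch timestamp plus the comma makes 17 characters, so skipping them leaves only the last two columns for uniq to compare. A quick check with the sample data inlined via printf (note that uniq, like the other answers here, only collapses adjacent repeats):

```shell
# Skip the 17-character "timestamp," prefix, then drop adjacent duplicates.
printf '%s\n' \
    '1493992429103289,207.55,207.5' \
    '1493992429103559,207.55,207.5' \
    '1493992429104353,207.55,207.5' \
    '1493992429104491,207.6,207.55' \
    '1493992429110551,207.55,207.5' |
uniq -s 17
# Prints:
# 1493992429103289,207.55,207.5
# 1493992429104491,207.6,207.55
# 1493992429110551,207.55,207.5
```

Because no sorting happens, the later reappearance of 207.55,207.5 is kept, which matches the desired output.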