Google Refine: cross-reference between row and column

I'm not sure if this can be achieved with Google Refine at all. But basically I have data like this.

[Image: first table, all users]

[Image: second table, friends]

The first table lists all users. The second table lists each user's friends. However, not every id in the "friends" column of the second table exists in the first table, and I want to get rid of those. So how can I check every id in the friends column of the second table and remove any id that doesn't exist in table 1?



1 answer


Place the two tables in separate projects (we'll call them Table1 and Table2).

In Table2, on the friends column:

  • use "Split multi-valued cells" to get each value on its own row
  • convert the friends column to numbers (or, conversely, convert user_id in Table1 to strings) so that both columns hold the same type
  • use "Add column based on this column" with the expression cross(cell,'Table1','user_id').length()



This returns 0 if there is no match, 1 if there is a match, or N > 1 if there are duplicates in Table1.
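To make the meaning of that count concrete, here is a rough sketch in plain Python (not GREL, and not how Refine implements cross()). The sample ids are made up for illustration; the point is only that the result is the number of Table1 rows whose user_id equals the friend id.

```python
from collections import Counter

# Hypothetical sample data standing in for the two projects.
table1_user_ids = [1, 2, 3, 3]   # user_id column of Table1 (3 is duplicated)
table2_friends = [2, 3, 7]       # friends column of Table2, one id per row after the split

# Number of Table1 rows per user_id; a Counter gives 0 for ids it has never seen.
rows_per_id = Counter(table1_user_ids)

# Analogue of cross(cell,'Table1','user_id').length() for each friends row.
match_counts = [rows_per_id[friend] for friend in table2_friends]
print(match_counts)  # [1, 2, 0]
```

Here 2 matches once, 3 matches twice because Table1 has a duplicate, and 7 has no match.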

If you want the data back in its original format, create a facet on the new count column to filter on it, blank out any bad values (those with a count of 0), and then use "Join multi-valued cells" to undo the split you did at the start.
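If it helps to check the result outside Refine, the whole round trip (split, drop unknown ids, join again) can be sketched in plain Python. This is only an illustration under assumptions: the function name keep_known_friends, the comma separator, and the sample ids are all hypothetical, and ids are compared as strings on both sides.

```python
def keep_known_friends(friends_cell, known_ids, separator=","):
    """Split a multi-valued cell, drop ids not in known_ids, and join the rest."""
    # Split the cell and ignore empty pieces and surrounding whitespace.
    parts = [p.strip() for p in friends_cell.split(separator) if p.strip()]
    # Keep only ids that exist in the user table (both sides are strings here).
    kept = [p for p in parts if p in known_ids]
    return separator.join(kept)


# Hypothetical user ids from table 1, stored as strings.
known_ids = {"1", "2", "3"}
print(keep_known_friends("2, 7, 3", known_ids))  # 2,3
```

Comparing strings with strings mirrors the type conversion step: an id stored as a number on one side and as text on the other would never match.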

I fixed some caching bugs in cross() for OpenRefine 2.6, so if cross() doesn't work, try stopping and restarting the Refine server.
