Find cell in HTML table containing specific icon
I am looking for code that can tell me which cell of an HTML table a particular icon is in. This is what I am working with:
u <- "http://www.transfermarkt.nl/lionel-messi/leistungsdaten/spieler/28003/saison/2014/plus/1"
doc <- rvest::html(u)
tab <- rvest::html_table(doc, fill = TRUE)[[6]]
The "Pos." column gives the player's position on the field. Some of its cells also contain an icon. I can confirm that these icons are present on the page with:
rvest::html_nodes(doc, ".kapitaenicon-table")
but that doesn't tell me where they are. I would like my code to report that the icon occurs in rows 2, 10, 11 and 27 of the "Pos." column of the table. How can I do this?
A bit more rvest and XPath magic can get you the indexes:
library(rvest)
library(magrittr)
library(XML)
pg <- html("http://www.transfermarkt.nl/lionel-messi/leistungsdaten/spieler/28003/saison/2014/plus/1")
pg %>%
  html_nodes("table") %>%
  extract2(6) %>%
  html_nodes("tbody > tr") %>%
  sapply(function(x) {
    length(xpathSApply(x, "./td[8]/span[@class='kapitaenicon-table icons_sprite']")) == 1
  }) %>% which
## [1] 2 10 11 27
This gets the 6th table, fetches its tr elements, then loops through them looking for an 8th td with the correct span/class in it. If the XPath search fails, it returns an empty list, so you can use length to determine which rows have a td with the icon in it and which don't.
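The length test described above can be tried offline on a toy table. This is only a sketch: it swaps in xml2 (a recent version) for the XML package used in the answer, and the three-row table is a made-up stand-in for the real page, with the position in the 2nd column rather than the 8th.

```r
library(xml2)

# Made-up stand-in for the real page: the icon sits in rows 1 and 3.
html <- '<table><tbody>
  <tr><td>A</td><td>FW <span class="kapitaenicon-table icons_sprite"></span></td></tr>
  <tr><td>B</td><td>MF</td></tr>
  <tr><td>C</td><td>FW <span class="kapitaenicon-table icons_sprite"></span></td></tr>
</tbody></table>'

doc  <- read_html(html)
rows <- xml_find_all(doc, "//table/tbody/tr")

# For each row, search the 2nd cell for the icon span.
# No match gives an empty node set, so its length is 0.
has_icon <- vapply(rows, function(r) {
  hits <- xml_find_all(r, "./td[2]/span[@class='kapitaenicon-table icons_sprite']")
  length(hits) == 1
}, logical(1))

icon_rows <- which(has_icon)
```

Here icon_rows holds the row positions within the tbody, which is the same kind of index the pipeline above returns for the real table.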
This:
pg %>%
  html_nodes(xpath="//table[6]/tbody/tr/td[8]") %>%
  xmlSApply(xpathApply, "boolean(./span[@class='kapitaenicon-table icons_sprite'])") %>%
  which
also works, and it's a little tighter (and faster). It uses the XPath boolean operation to test for existence. This is convenient if you have no other operations to perform on the node(s).
Here is an xml2 version, although I have to believe there is a better way to do it in xml2:
library(xml2)
library(magrittr)
pg2 <- read_html("http://www.transfermarkt.nl/lionel-messi/leistungsdaten/spieler/28003/saison/2014/plus/1")
pg2 %>%
  xml_find_all("//table[6]/tbody/tr/td[8]") %>%
  as_list %>%
  sapply(function(x) {
    inherits(try(xml_find_one(x, "./span"), silent=TRUE), "xml_node")
  }) %>% which
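One candidate for a tidier xml2 approach, offered as a sketch rather than as what the package author intended: xml2 releases newer than the 0.1.0 used here provide xml_find_lgl(), which evaluates an XPath expression that returns a boolean, so the try()/inherits() test is not needed. The toy table below is made up for illustration and keeps the position in the 2nd column.

```r
library(xml2)

# Made-up three-row table; only row 2 carries the icon.
doc <- read_html('<table><tbody>
  <tr><td>1</td><td>RW</td></tr>
  <tr><td>2</td><td>RW <span class="kapitaenicon-table icons_sprite"></span></td></tr>
  <tr><td>3</td><td>CF</td></tr>
</tbody></table>')

# One cell per row, then one TRUE/FALSE per cell from XPath boolean().
cells    <- xml_find_all(doc, "//table/tbody/tr/td[2]")
has_icon <- xml_find_lgl(cells, "boolean(./span[@class='kapitaenicon-table icons_sprite'])")

icon_rows <- which(has_icon)
```

Against the real page the cell selector would be the td[8] path used above; that part is untested here.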
UPDATE
For version 0.1.0.9000 of xml2, I had to do the following:
pg2 %>%
  xml_find_all("//table") %>%
  as_list %>%
  extract2(6) %>%
  xml_find_all("./tbody/tr/td[8]") %>%
  as_list %>%
  sapply(function(x) {
    inherits(try(xml_find_one(x, "./span"), silent=TRUE), "xml_node")
  }) %>% which
It shouldn't have to be done this way, and I filed a bug report.
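A side note on the two selection styles used above. This is general XPath behaviour, not a claim about what changed between xml2 versions: //table[6] selects every table that is the sixth table child of its own parent, while (//table)[6] selects the sixth table in document order, which is what taking element 6 from the full list of tables does. The two agree only when the tables are siblings. A minimal sketch on a made-up document, assuming a recent xml2:

```r
library(xml2)

# Hypothetical document: two <div>s holding two tables each.
doc <- read_html('<div><table id="a"></table><table id="b"></table></div>
<div><table id="c"></table><table id="d"></table></div>')

# Second table within each parent: one match per <div>.
per_parent <- xml_attr(xml_find_all(doc, "//table[2]"), "id")

# Second table in the whole document: a single match.
in_document <- xml_attr(xml_find_all(doc, "(//table)[2]"), "id")
```

With this input, per_parent picks up one table from each div, while in_document contains a single id.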
Session info -------------------------------------------------------------------------
 setting  value
 version  R version 3.2.0 (2015-04-16)
 system   x86_64, darwin13.4.0
 ui       RStudio (0.99.441)
 language (EN)
 collate  en_US.UTF-8
 tz       America/New_York
Packages -----------------------------------------------------------------------------
 package    * version date       source
 curl       * 0.5     2015-02-01 CRAN (R 3.2.0)
 devtools   * 1.7.0   2015-01-17 CRAN (R 3.2.0)
 magrittr     1.5     2014-11-22 CRAN (R 3.2.0)
 Rcpp       * 0.11.5  2015-03-06 CRAN (R 3.2.0)
 rstudioapi * 0.3.1   2015-04-07 CRAN (R 3.2.0)
 xml2         0.1.0   2015-04-20 CRAN (R 3.2.0)