RegEx PowerShell Compliance
I have the following website http://www.shazam.com/charts/top-100/australia which displays songs, I want to record songs using RegEx and PowerShell. Below is the PowerShell code:
$ie = New-Object -comObject InternetExplorer.Application
$ie.navigate('http://www.shazam.com/charts/top-100/australia')
Start-Sleep -Seconds 10
$null = $ie.Document.body.innerhtml -match 'data-chart-position="1"(.|\n)*data-track-title=.*content="(.*)"><a href(.|\n)*data-track-artist=\W\W>(.|\n)*<meta\scontent="(.*)"\sitemprop';$shazam01artist = $matches[5];$shazam01title = $matches[2]
data chart-position
track data-title
data track artist
Each of the songs listed has 3 values (above) associated with each one, I want to capture the Artist and Title for each song based on different positions (numbers) of the chart. So a regex to find the actual position of the chart, then the final Artist and Title.
If I run RegEx separately for Artist and Title (code below) it finds them, however it only finds the first Artist and Title. I need to find the Artist and Title for each song based on a different position in the chart.
$null = $ie.Document.body.innerhtml -match 'data-track-artist=\W\W>(.|\n)*<meta\scontent="(.*)"\sitemprop';$shazam01artist = $matches[2]
$null = $ie.Document.body.innerhtml -match 'data-track-title=.*content="(.*)"><a href';$shazam01title = $matches[1]
$shazam01artist
$shazam01title
source to share
Using a regular expression to parse partial HTML is an absolute nightmare, you may need to reconsider this approach.
Invoke-WebRequest
returns a named property ParsedHtml
that contains a reference to the preprocessed HTMLDocument object. Use this instead:
# Fetch the document
$Top100Response = Invoke-WebRequest -Uri 'http://www.shazam.com/charts/top-100/australia'
# Select all the "article" elements that contain charted tracks
$Top100Entries = $Top100Response.ParsedHtml.getElementsByTagName("article") |Where-Object {$_.className -eq 'ti__container'}
# Iterate over each article
$Top100 = foreach($Entry in $Top100Entries){
$Properties = @{
# Collect the chart position from the article element
Position = $Entry.getAttribute('data-chart-position',0)
}
# Iterate over the inner paragraphs containing the remaining details
$Entry.getElementsByTagName('p') |ForEach-Object {
if($_.className -eq 'ti__artist') {
# the ti__artist paragraph contains a META element that holds the artist name
$Properties['Artist'] = $_.getElementsByTagName('META').item(0).getAttribute('content',0)
} elseif ($_.className -eq 'ti__title') {
# the ti__title paragraph stores the title name directly in the content attribute
$Properties['Title'] = $_.getAttribute('content',0)
}
}
# Create a psobject based on the details we just collected
New-Object -TypeName psobject -Property $Properties
}
Now let's see how Tay-Tay is doing below:
PS C:\> $Top100 |Where-Object { $_.Artist -match "Taylor Swift" }
Position Title Artist
-------- ----- ------
42 Bad Blood Taylor Swift Feat. Kendrick Lamar
Sweet!
source to share