Scrape YouTube Internet Play list to get URL links (and Video Title)

DocAElstein
3StarLounger
Posts: 263
Joined: 18 Jan 2022, 15:59
Location: Naked, in Hof, Bavaria, Germany

Post by DocAElstein »

Hi
I am collecting, watching, and making some notes from a lot of YouTube videos from play lists. Some of those play lists have a lot of videos in them.

For various reasons it would be helpful to get a list of all the URLs from a play list and, if possible, the accompanying title. (If as a bonus I can get other info, such as the date and view count, then so much the better.)

Current example: the play list here:
https://www.youtube.com/watch?v=rM-CtC6 ... JHdtoul_9A

Code: Select all

https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A 

The links for the videos in the play list differ in more than the index number (the video ID part changes as well), so I can’t just take the first one and change index=1 to index=2 to index=3 … etc.

Code: Select all

https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A     ( <-- Play list main link )
https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A&index=1
https://www.youtube.com/watch?v=YsnmNoq6OTg&list=UULFwInqvNXb-GN0JHdtoul_9A&index=2
https://www.youtube.com/watch?v=KIx_8comGEc&list=UULFwInqvNXb-GN0JHdtoul_9A&index=3
https://www.youtube.com/watch?v=LU-nln_3q9A&list=UULFwInqvNXb-GN0JHdtoul_9A&index=4
The macro below represents just about my entire knowledge of scraping a web site. It gets me a text document, WieGehtsYouTube.txt. I don’t understand all of it. But in the past I have often been able to find the info I want from that text file.
But in this case the text file does not seem to contain any useful info for me. Maybe it does, but in my ignorance I can’t see it.
That text file is not very big. Neither are the text files from the links for each video. They are all quite small and only slightly different in content.

WieGehtsYouTubePlayList.txt https://app.box.com/s/nxt27ivwnowkt1jy4npgktxof4ix6r6g
WieGehtsYouTubeIndex1.txt https://app.box.com/s/yavaakedxs3wes1u0qyy60o80rbwoxj0
WieGehtsYouTubeIndex2.txt https://app.box.com/s/bklozkokkyztdjz6w75pjir0kq3pql4r
WieGehtsYouTubeIndex3.txt https://app.box.com/s/3y5otnjekzbtogkzq1krwnm4ftbfjxm3


Any help please? I don’t care what form I get the info in: arrays, spreadsheet, text file, etc. I am happy to manipulate the data and put it in some convenient form in a spreadsheet myself. Even if the data is hidden in a massive text file, I expect I can do some string manipulation to get out what I want.

Thanks

Alan

Code: Select all

 Option Explicit
Sub WieGehtsYouTubeURL()   '        https://excelfox.com/forum/showthread.php/2656-Automated-Search-Results-Returning-Nothing            https://excelfox.com/forum/showthread.php/973-Lookup-First-URL-From-Google-Search-Result-Using-VBA
 On Error GoTo Bed
    '_1 First section get the long text string of the HTML coding of the internet Page
    '_1(i) get the long single text string
        With CreateObject("msxml2.xmlhttp")
         .Open "GET", "https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A", False ' Just preparing the request (method, URL, mode). The third argument is the asynchronous mode flag: False = synchronous, so .send does not return control to VBA until the server has sent back a response; True = asynchronous, so .send returns immediately and the code must wait for readyState 4 itself.
         '.Open "GET", "https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A&index=1", False
         '.Open "GET", "https://www.youtube.com/watch?v=YsnmNoq6OTg&list=UULFwInqvNXb-GN0JHdtoul_9A&index=2", False
         'No extra info here for type GET
         .setRequestHeader bstrheader:="Ploppy", bstrvalue:="PooH" ' YOU MAY NEED TO TAKE OUT THIS LINE
                                                                                    '.setRequestHeader bstrheader:="If-Modified-Since", bstrvalue:="Sat, 1 Jan 2000 00:00:00 GMT" '  https://www.autohotkey.com/boards/viewtopic.php?t=9554  ---   It will caching the contents of the URL page. Which means if you request the same URL more than once, you always get the same responseText even the website changes text every time. This line is a workaround : Set cache related headers.
         .send ' varBody:= ' No extra info for type GET. .send actually makes the request
            While .readyState <> 4: DoEvents: Wend ' Wait until the response is complete. With the synchronous (False) option above, .send has already waited, so this loop is only a safeguard; it is really needed for the True (asynchronous) option
        Dim PageSrc As String: Let PageSrc = .responseText ' Save the HTML code in a procedure-level variable. ': Range("P1").Value = PageSrc 'For me for a print out copy to text file etc.    The responseText property returns the response to the request as a text string
        End With
    '_1(ii)  Optional section to put the text string into a text file, for ease of code development
    Dim FileNum2 As Long: Let FileNum2 = FreeFile(0)                                  ' https://msdn.microsoft.com/en-us/vba/language-reference-vba/articles/freefile-function
    Dim PathAndFileName2 As String
     Let PathAndFileName2 = ThisWorkbook.Path & "\" & "WieGehtsYouTube" & ".txt"   '
    Open PathAndFileName2 For Output As #FileNum2 ' ' The text file will be made if not there, and if it is there and already contains data, then the data will be overwritten
     Print #FileNum2, PageSrc '
     Close #FileNum2
    
Exit Sub  '  Normal code end in the case of no errors
Bed:
 MsgBox prompt:=Err.Number & ":  " & Err.Description: Debug.Print Err.Number & ":  " & Err.Description
End Sub   ' Code end in the case of any error
_.________________________________________________________________________________________________________________

WieGehtsYouTube.txt https://app.box.com/s/rz4nfqoeeqtuqfp9zt0fsq4obncumyj9
WieGehtsYouTube.xls https://app.box.com/s/a4q4v4zfxtwh6pzqicw6ntuz8qtddbfr
Sure you can give me a DVM instead, when you take the AVOmeter out of my cold dead hands,

YasserKhalil
PlatinumLounger
Posts: 4727
Joined: 31 Aug 2016, 09:02

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by YasserKhalil »

Maybe using this line instead is better

Code: Select all

With CreateObject("MSXML2.ServerXMLHTTP")
Now you can get a text file with much lengthier content.

Then you can dig through it to find the desired information

Code: Select all

        Dim sTitle As String
        sTitle = Split(Split(PageSrc, """title"":{""runs"":[{""text"":""")(1), """}]}")(0)
        
        Dim sViews As String
        sViews = Split(Split(PageSrc, """shortViewCount"":{""simpleText"":""")(1), """}}}")(0)
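One thing to watch with the Split lines above: if the marker text is not present in PageSrc (for example, if the response is one of the small files described in the first post), Split returns a one-element array and the (1) index raises run-time error 9, "Subscript out of range". A minimal sketch of a guarded alternative follows. TextBetween is a hypothetical helper name, not something from this thread, and it simply returns an empty string when either marker is missing.

```vba
' Hypothetical helper (an assumption, not from the thread):
' returns the text between the first occurrence of Before and the next
' occurrence of After, or "" if either marker is not found.
Function TextBetween(ByVal Src As String, ByVal Before As String, ByVal After As String) As String
    Dim StartPos As Long, EndPos As Long
    StartPos = InStr(1, Src, Before, vbBinaryCompare)
    If StartPos = 0 Then Exit Function          ' first marker missing: return ""
    StartPos = StartPos + Len(Before)           ' move past the marker itself
    EndPos = InStr(StartPos, Src, After, vbBinaryCompare)
    If EndPos = 0 Then Exit Function            ' closing marker missing: return ""
    TextBetween = Mid$(Src, StartPos, EndPos - StartPos)
End Function

' Possible use with the same markers as above:
' sTitle = TextBetween(PageSrc, """title"":{""runs"":[{""text"":""", """}]}")
' If Len(sTitle) = 0 Then Debug.Print "Title marker not found"
```

The nested Split one-liner is still the quicker thing to type while exploring; a guard like this mainly helps once the code runs unattended over many pages.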

SpeakEasy
3StarLounger
Posts: 318
Joined: 27 Jun 2021, 10:46

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by SpeakEasy »

The problem here is that Trident (MSHTML) by default presents itself as Internet Explorer (since Trident is indeed the engine that drove that browser). And unfortunately YouTube no longer supports IE.

Fortunately we can get Trident to lie ... :-)

Simply add

Code: Select all

.setRequestHeader "User-Agent", "Chrome"
directly after

Code: Select all

 .Open "GET", "https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A", False ' Just preparing the request (method, URL, mode). The third argument is the asynchronous mode flag: False = synchronous, so .send does not return control to VBA until the server has sent back a response; True = asynchronous, so .send returns immediately and the code must wait for readyState 4 itself.
This should mean you get a more expansive, more useful response from YouTube ...

However, it may not be all that useful in the end. YouTube does NOT like being scraped, so they obfuscate their links by, e.g., putting them into iframes and requiring JavaScript. I've never managed to scrape it properly; I understand people have had some success with Selenium tools.

DocAElstein
3StarLounger
Posts: 263
Joined: 18 Jan 2022, 15:59
Location: Naked, in Hof, Bavaria, Germany

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by DocAElstein »

YasserKhalil wrote:
22 Jan 2023, 20:57
Maybe using this line instead is better....

sTitle = Split(Split(PageSrc, """title"":{""runs"":[{""text"":""")(1), """}]}")(0)
sViews = Split(Split(PageSrc, """shortViewCount"":{""simpleText"":""")(1), """}}}")(0)
Wow, thanks very much Yasser.
I tried that and I have now a much much much bigger text file!

And it looks initially like possibly all the information is there somewhere.

And those code lines with the sTitle and sViews appear to be getting me that information as well

That’s great! Thanks again.

Alan

Code: Select all

 Sub WieGehtsYouTubeURLBigTextFile()   '     Yasser   https://eileenslounge.com/viewtopic.php?p=303638#p303638        https://excelfox.com/forum/showthread.php/2656-Automated-Search-Results-Returning-Nothing            https://excelfox.com/forum/showthread.php/973-Lookup-First-URL-From-Google-Search-Result-Using-VBA
 On Error GoTo Bed
    '_1 First section get the long text string of the HTML coding of the internet Page
    '_1(i) get the long single text string
        With CreateObject("MSXML2.ServerXMLHTTP")
         .Open "GET", "https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A", False ' Just preparing the request (method, URL, mode). The third argument is the asynchronous mode flag: False = synchronous, so .send does not return control to VBA until the server has sent back a response; True = asynchronous, so .send returns immediately and the code must wait for readyState 4 itself.
         '.Open "GET", "https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A&index=1", False
         '.Open "GET", "https://www.youtube.com/watch?v=YsnmNoq6OTg&list=UULFwInqvNXb-GN0JHdtoul_9A&index=2", False
         'No extra info here for type GET
         '.setRequestHeader bstrheader:="Ploppy", bstrvalue:="PooH" ' YOU MAY NEED TO TAKE OUT THIS LINE
                                                                                    '.setRequestHeader bstrheader:="If-Modified-Since", bstrvalue:="Sat, 1 Jan 2000 00:00:00 GMT" '  https://www.autohotkey.com/boards/viewtopic.php?t=9554  ---   It will caching the contents of the URL page. Which means if you request the same URL more than once, you always get the same responseText even the website changes text every time. This line is a workaround : Set cache related headers.
         .send ' varBody:= ' No extra info for type GET. .send actually makes the request
            While .readyState <> 4: DoEvents: Wend ' Wait until the response is complete. With the synchronous (False) option above, .send has already waited, so this loop is only a safeguard; it is really needed for the True (asynchronous) option
        Dim PageSrc As String: Let PageSrc = .responseText ' Save the HTML code in a procedure-level variable. ': Range("P1").Value = PageSrc 'For me for a print out copy to text file etc.    The responseText property returns the response to the request as a text string
        End With
    Dim sTitle As String
     Let sTitle = Split(Split(PageSrc, """title"":{""runs"":[{""text"":""")(1), """}]}")(0)
    
    Dim sViews As String
     Let sViews = Split(Split(PageSrc, """shortViewCount"":{""simpleText"":""")(1), """}}}")(0)
    
    
    
    '_1(ii)  Optional section to put the text string into a text file, for ease of code development
    Dim FileNum2 As Long: Let FileNum2 = FreeFile(0)                                  ' https://msdn.microsoft.com/en-us/vba/language-reference-vba/articles/freefile-function
    Dim PathAndFileName2 As String
     Let PathAndFileName2 = ThisWorkbook.Path & "\" & "WieGehtsYouTubeBig" & ".txt"   '
    Open PathAndFileName2 For Output As #FileNum2 ' ' The text file will be made if not there, and if it is there and already contains data, then the data will be overwritten
     Print #FileNum2, PageSrc '
     Close #FileNum2
    
Exit Sub  '  Normal code end in the case of no errors
Bed:
 MsgBox prompt:=Err.Number & ":  " & Err.Description: Debug.Print Err.Number & ":  " & Err.Description
End Sub   ' Code end in the case of any error
_._______________________________________

WieGehtsYouTubeBig.txt https://app.box.com/s/gs73n1roxdufyc8kzpadmscq6g3sx9nq
Last edited by DocAElstein on 23 Jan 2023, 08:09, edited 2 times in total.

DocAElstein
3StarLounger
Posts: 263
Joined: 18 Jan 2022, 15:59
Location: Naked, in Hof, Bavaria, Germany

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by DocAElstein »

SpeakEasy wrote:
22 Jan 2023, 21:26
The problem here is that Trident (MSHTML) by default presents itself as Internet Explorer (since Trident is indeed the engine that drove that browser). And unfortunately YouTube no longer supports IE......
Thanks, I thought there seemed to be a lot of text in the original small text files talking about unsupported browsers.
I will check your suggestions out as well and report back.

DocAElstein
3StarLounger
Posts: 263
Joined: 18 Jan 2022, 15:59
Location: Naked, in Hof, Bavaria, Germany

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by DocAElstein »

SpeakEasy wrote:
22 Jan 2023, 21:26
Simply add

Code: Select all

.setRequestHeader "User-Agent", "Chrome"
OK, so that’s also got me a big text file, pretty well the same size, just very slightly bigger than that from Yasser’s suggestion.

So looks like I have some work to do digging into those 2 big text files.

Thanks a lot both of you. If I get something automated working well I will report back

Alan

_._______________

WieGehtsYouTubeChrome.txt https://app.box.com/s/vgvytvrary5253woenln695fz90wynuh

Code: Select all

 Sub WieGehtsYouTubeURLChrome()   '  SpeakEasy   Mike   https://eileenslounge.com/viewtopic.php?p=303639#p303639        https://excelfox.com/forum/showthread.php/2656-Automated-Search-Results-Returning-Nothing            https://excelfox.com/forum/showthread.php/973-Lookup-First-URL-From-Google-Search-Result-Using-VBA
 On Error GoTo Bed
    '_1 First section get the long text string of the HTML coding of the internet Page
    '_1(i) get the long single text string
        With CreateObject("msxml2.xmlhttp")
         .Open "GET", "https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A", False ' Just preparing the request (method, URL, mode). The third argument is the asynchronous mode flag: False = synchronous, so .send does not return control to VBA until the server has sent back a response; True = asynchronous, so .send returns immediately and the code must wait for readyState 4 itself.
         '.Open "GET", "https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A&index=1", False
         '.Open "GET", "https://www.youtube.com/watch?v=YsnmNoq6OTg&list=UULFwInqvNXb-GN0JHdtoul_9A&index=2", False
         'No extra info here for type GET
         .setRequestHeader "User-Agent", "Chrome"
         '.setRequestHeader bstrheader:="Ploppy", bstrvalue:="PooH" ' YOU MAY NEED TO TAKE OUT THIS LINE
                                                                                    '.setRequestHeader bstrheader:="If-Modified-Since", bstrvalue:="Sat, 1 Jan 2000 00:00:00 GMT" '  https://www.autohotkey.com/boards/viewtopic.php?t=9554  ---   It will caching the contents of the URL page. Which means if you request the same URL more than once, you always get the same responseText even the website changes text every time. This line is a workaround : Set cache related headers.
         .send ' varBody:= ' No extra info for type GET. .send actually makes the request
            While .readyState <> 4: DoEvents: Wend ' Wait until the response is complete. With the synchronous (False) option above, .send has already waited, so this loop is only a safeguard; it is really needed for the True (asynchronous) option
        Dim PageSrc As String: Let PageSrc = .responseText ' Save the HTML code in a procedure-level variable. ': Range("P1").Value = PageSrc 'For me for a print out copy to text file etc.    The responseText property returns the response to the request as a text string
        End With
    Dim sTitle As String
     Let sTitle = Split(Split(PageSrc, """title"":{""runs"":[{""text"":""")(1), """}]}")(0)
    
    Dim sViews As String
     Let sViews = Split(Split(PageSrc, """shortViewCount"":{""simpleText"":""")(1), """}}}")(0)
    
    '_1(ii)  Optional section to put the text string into a text file, for ease of code development
    Dim FileNum2 As Long: Let FileNum2 = FreeFile(0)                                  ' https://msdn.microsoft.com/en-us/vba/language-reference-vba/articles/freefile-function
    Dim PathAndFileName2 As String
     Let PathAndFileName2 = ThisWorkbook.Path & "\" & "WieGehtsYouTubeChrome" & ".txt"   '
    Open PathAndFileName2 For Output As #FileNum2 ' ' The text file will be made if not there, and if it is there and already contains data, then the data will be overwritten
     Print #FileNum2, PageSrc '
     Close #FileNum2
    
Exit Sub  '  Normal code end in the case of no errors
Bed:
 MsgBox prompt:=Err.Number & ":  " & Err.Description: Debug.Print Err.Number & ":  " & Err.Description
End Sub   ' Code end in the case of any error

DocAElstein
3StarLounger
Posts: 263
Joined: 18 Jan 2022, 15:59
Location: Naked, in Hof, Bavaria, Germany

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by DocAElstein »

Hi, some feedback and solution sharing…
I got the first full working solution(s). I pretty well got what I want and need. :-)
I won’t give in-depth detail on anything, as at the moment any coding I have done so far is the very first crude, simplest novice-type, inefficient, hard-coded spreadsheet interaction rubbish. But I will tidy it all up and improve it in the future, so I will share some links to where I have the coding and results and stuff now, and where I will also have the improved stuff later.
That might be helpful to someone catching this thread much later on a “YouTube scraping” search ( and as ever help me find my messy organised stuff! )

Some brief notes of what I did, problems etc.
_ In the long play list I looked at it seems you only get a text file of all the stuff I want for a bit more than 75 videos at a time. This makes sense and ties up with the experience when you view manually in real time: The scroll box only goes up to on average a bit over the first 75. ( https://i.postimg.cc/tTh0kcxJ/Only-get- ... one-go.jpg )
Scraping that, or rather to say, playing around with the text file from the page source text from this

Code: Select all

 https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A    '  --- main play list link
, gives links of this form

Code: Select all

https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A&index=1
https://www.youtube.com/watch?v=YsnmNoq6OTg&list=UULFwInqvNXb-GN0JHdtoul_9A&index=2
……  up to about       &index=79      
If you want the next chunk of videos, and a new text file of it all, you have to click on a video towards the bottom. ( https://i.postimg.cc/65L3ydNF/Click-tow ... xt-lot.jpg )
I thought I would keep stuff in some organised order, so I tried getting all the text into a text file from each of these 9 links, the ones ending with &index=1, 76, 151, 226, 301, 376, 451, 526 and 601.
That sort of worked…. eventually…
_ I end up with 9 big text files to play with. So that is sort of Part 1. I have now got all the info I need, somewhere I expect, in those files… https://bit.ly/3wqCh1A


_ a small snag: Previously using the main link,

Code: Select all

https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A 
, gets the first 79 along with the index number, which is not essential but useful to have. But if I use a link with the extra &index=123, I can’t find or get the index number from those 9 text files. It could be hidden there somewhere. I can’t see it initially. Maybe later.
No matter, not so important
_ ( I am actually using initially a hybrid Yasser/ SpeakEasy suggestion code to get those. So
Object "MSXML2.ServerXMLHTTP"
and the
.setRequestHeader "User-Agent", "Chrome".
Maybe that’s a sort of “belt and braces” approach? I don’t know. I have not had the time to look in great detail at the differences yet in the three files. The hybrid comes out the smallest of the three.
( https://bit.ly/3H6sBym ) )
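As a rough illustration of the hybrid described above, here is a minimal sketch that wraps the request in a function, so it can be called once per link. GetPageSource is a hypothetical name used only for illustration. The object and header are the ones suggested by Yasser and SpeakEasy, and the status check is an added assumption about what a usable response looks like.

```vba
' Sketch only: fetch the page source for one URL using the
' "MSXML2.ServerXMLHTTP" object together with the "User-Agent" header.
' GetPageSource is a hypothetical helper name, not from the thread.
Function GetPageSource(ByVal Url As String) As String
    With CreateObject("MSXML2.ServerXMLHTTP")
        .Open "GET", Url, False                      ' False = synchronous: .send waits for the response
        .setRequestHeader "User-Agent", "Chrome"     ' as suggested by SpeakEasy
        .send
        ' Assumption: only an HTTP 200 response is worth keeping
        If .Status = 200 Then GetPageSource = .responseText
    End With
End Function

' Possible use:
' Dim PageSrc As String
' PageSrc = GetPageSource("https://www.youtube.com/watch?v=rM-CtC6cklI&list=UULFwInqvNXb-GN0JHdtoul_9A")
' If Len(PageSrc) = 0 Then MsgBox "No usable response"
```

Returning an empty string on failure keeps the calling code simple when looping over several links; a real macro might prefer to report the status code instead.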

Part 2
I decided to get out all the unique 11-character YouTube bits ( the video ID ) that you have in a typical YouTube video link.
I used a macro to get that for all 9 text files. ( https://bit.ly/3XTw2PI )
For now I manually copy that 9 times and stick it all in column A of my main file, WieGehtsYouTube.xls ( https://bit.ly/3XRHko5 ).
(There are a few extra videos that seem to be advertisements or some video he recommends from someone else. Doesn’t matter – it’s usually obvious from the title wots wot. ( There are also duplicates due to overlap - by clicking on the 76th link I will get, in both of the first two text files, the 11-character bit for the 76th, 77th, 78th and 79th.
The last main macro checks for and ignores duplicates ) )
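The actual macro is behind the links above. Purely as an illustration of the idea, the sketch below collects the 11 characters following each "watch?v=" in a page source string and drops duplicates with a Scripting.Dictionary. Both the marker text and the name UniqueVideoIds are assumptions; the real page source may carry the IDs in a different form, and the 11 characters are not validated here.

```vba
' Sketch only: collect the unique 11-character video IDs from a page source string.
' Assumption: the IDs appear directly after the text "watch?v=" as in the links above.
' UniqueVideoIds is a hypothetical helper name, not from the thread.
Function UniqueVideoIds(ByVal PageSrc As String) As Variant
    Const MARKER As String = "watch?v="
    Dim Dict As Object: Set Dict = CreateObject("Scripting.Dictionary")
    Dim Parts() As String, i As Long, Id As String
    Parts = Split(PageSrc, MARKER)
    For i = 1 To UBound(Parts)                 ' Parts(0) is the text before the first marker
        Id = Left$(Parts(i), 11)               ' the 11 characters after the marker
        If Len(Id) = 11 Then
            If Not Dict.Exists(Id) Then Dict.Add Id, i   ' keep the first occurrence only
        End If
    Next i
    UniqueVideoIds = Dict.Keys                 ' zero-based array, in order of first appearance
End Function

' Possible use, writing the IDs down column A of the active sheet:
' Dim Ids As Variant, r As Long
' Ids = UniqueVideoIds(PageSrc)
' For r = 0 To UBound(Ids): Cells(r + 1, 1).Value = Ids(r): Next r
```

Running the same Dictionary over all 9 text files in turn would also take care of the overlap between neighbouring files.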

Part 3 How I got all the info I wanted
The last main coding to get Title and other info

This went easier and smoother than I thought it would.
And the final spreadsheet interaction coding isn’t that slow, - it’s still speed of light compared with doing it all manually, as I was. It’s actually nice to watch the spreadsheet filling up. It’s fun when you think of the days of boring manual copying and pasting it’s saving, and you get an initial check that the data looks sensible.
I don’t want to do this a thousand times a day for a year, - more like a few times a day for a couple of weeks. So I might stay with the slower novice code, - it’s easier to check and change. (It’s a bit cold though. I might put some clothes on. I don’t need to view this in my default skin really).
There is not much point in explaining in detail how I manipulated the text file to get all the information I wanted. I expect if I did it a dozen times , forgetting every time how I did it the last time, then I would end up with as many different solutions. Just a matter of messing with string manipulation.
Here is a final macro https://bit.ly/3kITNLM

One thing I did find nice is that Split Split stuff from Yasser. Maybe lots of people know about and use that. I saw it for the first time and it’s a very nice way to get a working coding to get stuff out of a big text file. Like…

jdhAJ Ex I want this Zed llmbldsm
So split by Ex , take second array element (1) from that
, then split the result by Zed and take the first element (0) of that
Simple but nice- I had always previously done some Instr Left Right Mid stuff before
So you can have a nice Pretty one line to start with,
Split(Split(PageSource, " ")(1), " ")(0)
Find what you are looking for, then drop in a bit of the stuff either side
= Split(Split(PageSource, " Ex ")(1), " Zed ")(0)
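The steps above can be tried out in the Immediate window with a small test macro. This is just the example string from this post run through the two splits, first one at a time and then as the one-liner; SplitSplitDemo is a made-up name.

```vba
' Demo of the nested Split idea on the sample string from the post.
Sub SplitSplitDemo()
    Const SAMPLE As String = "jdhAJ Ex I want this Zed llmbldsm"
    Dim AfterEx As String
    ' Split by " Ex " and take the second element (1): everything after " Ex "
    AfterEx = Split(SAMPLE, " Ex ")(1)
    Debug.Print AfterEx                                        ' I want this Zed llmbldsm
    ' Split that by " Zed " and take the first element (0): everything before " Zed "
    Debug.Print Split(AfterEx, " Zed ")(0)                     ' I want this
    ' The same thing in one line
    Debug.Print Split(Split(SAMPLE, " Ex ")(1), " Zed ")(0)    ' I want this
End Sub
```

Note that the (1) only works when the first marker is actually in the string; if it is missing, VBA raises a "Subscript out of range" error.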

_.____________________

I am aware that the profi way to do all this is to chuck the page source text ( the big text file ) into an HTMLDoc thing, and do some OOP type workings on that to get stuff.
This might also be a thing easily done in Power Query. ( I mostly got up to Office 2013, so I probably got PQ, but don't know what to do with it, Lol )
Any comments, contributions, are of course always very welcome, but I am happy I got a good solution I can use to get on with for now.
Thanks again for the help

Alan

_.___________________

Bit more detail and info starting here
https://bit.ly/3Hsga10
https://bit.ly/3Hv96Rl

ChrisGreaves
PlutoniumLounger
Posts: 14077
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by ChrisGreaves »

SpeakEasy wrote:
22 Jan 2023, 21:26
... I've never managed to scrape it properly; I understand people have had some success with Selenium tools.
@Alan: I see that you claim that you are almost there, so this response is more or less to confirm Stuart's comment.
@All: If you've never heard of Selenium, I recommend the first of a zillion video tutorials at WiseOwl

I spent about ten years (2000-2010) scraping with IE, but then web pages got a little smarter, and in about October 2021 I tried out Selenium.
I had a modicum of success grabbing data from a succession of pages from AirBnb and used the scraped data to obtain contact details for BnBs, something which AirBnb struggles to prevent.
See the topics
Selenium and VBA - traps for Young Players, and Web Elements - Question 3 SwissCows : Selenium.Clear method doesn’t work.

I still have 60+ Word2003 & VBA documents with answers to some questions.

Cheers, Chris
A watched pot never boils over

DocAElstein
3StarLounger
Posts: 263
Joined: 18 Jan 2022, 15:59
Location: Naked, in Hof, Bavaria, Germany

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by DocAElstein »

Hi Chris,
Thx for that input…..
ChrisGreaves wrote:
25 Jan 2023, 11:44
., so this response is more or less to confirm Stuart's comment…
Stuart? He ‘aint even in this Thread – I think I recall he don’t like YouTube. (- maybe they banned him because of his green jacket, Lol)
_.____
ChrisGreaves wrote:
25 Jan 2023, 11:44
@All: If you've never heard of Selenium, I recommend the first of a zillion video tutorials at WiseOwl
That’s good to know, thanks. – because Learning Selenium is on my long list of things to try and get around to do, and that WiseOwl chap was pretty well the best YouTube learning person I came across when I was learning VBA, he is a real YouTube good teacher, IMO.
_._____________________________________________-

I doubt I will get around to learning Selenium for a while.
What I got so far does pretty well close to all of what I need just now on this one. It’s come out much better and easier than I thought. So far nothing I want is hidden as SpeakEasy suggested it might be. Could have been just lucky. The coding was easier than I thought it would be.
I have not checked out other channels play lists yet with my coding. But with getting on for 1000 videos on the subject I am interested in, the channel I am considering here will keep me busy for a while. ( On my computer generated list from my coding, some videos came out that he had recommended from other people. They scraped for me with my coding pretty well the same with no problem. )
I know that sites can change, but if I get this all done in the next few weeks, then I won’t be interested in doing anything similar for a while. Unless I am very unlucky I expect the sites will stay the same for a while.
(The only thing my coding is not doing on my current project is downloading the videos and converting to .wmv ( 1080 x 1920 res ).
I doubt VBA can do that, and there are / were a lot of problems for me just now with downloading YouTube things with strange things in the title and/or German characters like ö ä ü etc. They screw up the downloaders as well as causing problems when working on the files with VBA once they are in Windows File Explorer.)

This is the way it’s all working for me just now, ( the downloading and sorting side of it )

_1) I now have all the info I want in the Excel file. I use the list in my Excel file to copy the URL link from, and then I paste that into 4K downloader manually. So far 4K has been very tolerant of strange things in the titles of the videos.

_2) Soon I will have a full folder full of .mp4 videos.
They come out in some weird text formats sometimes in the windows file explorer.
No matter: I can manipulate/ change the titles appearing in windows explorer list manually and with some VBA. In other words I change the .mp4 file names a bit.
The .mp4 videos with the sanitised file name I then paste in AVC to convert to .wmv ( 1080 x 1920 res ).
( Previously I have just pasted the URL directly in AVC and it downloads and converts in one go. But AVC has been crapping out badly with most of the videos in this play list because of the weird stuff in the titles. Strangely so far I have not had that problem with 4K. Just a lucky break there, perhaps)
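For the file name tidying mentioned in _2), a minimal sketch of the sort of VBA that could be used is given below. It is not the code actually used here. SanitiseFileName is a hypothetical name, the list of characters is the set Windows does not allow in file names, and swapping the German characters for ae / oe / ue / ss is just one possible convention.

```vba
' Sketch only: make a video title safe to use as a Windows file name.
' SanitiseFileName is a hypothetical helper name, not from the thread.
Function SanitiseFileName(ByVal Title As String) As String
    Dim Bad As Variant, Umlauts As Variant, Plain As Variant, i As Long
    ' Characters that Windows does not allow in file names
    Bad = Array("\", "/", ":", "*", "?", """", "<", ">", "|")
    For i = LBound(Bad) To UBound(Bad)
        Title = Replace(Title, Bad(i), "_")
    Next i
    ' German characters, built with ChrW so the code does not depend on the editor's code page:
    ' 228 = ä, 246 = ö, 252 = ü, 196 = Ä, 214 = Ö, 220 = Ü, 223 = ß
    Umlauts = Array(ChrW(228), ChrW(246), ChrW(252), ChrW(196), ChrW(214), ChrW(220), ChrW(223))
    Plain = Array("ae", "oe", "ue", "Ae", "Oe", "Ue", "ss")
    For i = LBound(Umlauts) To UBound(Umlauts)
        Title = Replace(Title, Umlauts(i), Plain(i))
    Next i
    SanitiseFileName = Trim$(Title)
End Function

' Possible use when renaming a downloaded file (OldName and Folder are placeholders):
' Name Folder & "\" & OldName As Folder & "\" & SanitiseFileName(OldName)
```

The function is meant for the file name only, not a full path, since it also replaces the backslash and colon.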

That’s it. Done. Or it will be in a few days.


Alan

StuartR
Administrator
Posts: 12166
Joined: 16 Jan 2010, 15:49
Location: London, Europe

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by StuartR »

It's not that I don't like YouTube, just that I prefer reading to listening

Maybe I should change my avatar to one with my red or yellow jacket
StuartR


DocAElstein
3StarLounger
Posts: 263
Joined: 18 Jan 2022, 15:59
Location: Naked, in Hof, Bavaria, Germany

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by DocAElstein »

I also prefer reading and writing to learn stuff, and also generally, watching Telly and YouTube Stuff is not my thing.
But I found that strangely I pick up things subconsciously in a very efficient way. So what I do when learning something new is get a lot of videos together first on a very long play list, chuck them on a few of the older computers I have all over the place and let them run in the background while I am doing something else.
A few weeks later I sit down and read and write on the subject, I find that somehow the weeks before of the stuff running in the background seems to make that go better and easier. I have a problem that I always miss the point the first few times around. I have learnt that this video running in the background on the subject a few weeks before seems to strangely cure that problem a bit..


I come from a very primitive background and only by the best of luck and very hard work got to University. For my first major job interview in the last year, everyone said that at the very least I must get myself a Jacket. I had absolutely no idea about those things, what was normal, acceptable.
I tried to get a purple one. Only years later I realised one of the reasons why a lot of the people at the time were laughing behind my back :(
One thing you notice when you are English and come to Germany for the first time is the much more colorful Jackets, glasses and house colors , at least that was the case 20-30 years ago. Maybe England got prettier in the meantime, it couldn't have got much uglier.


Great Yellow!!!
I was going to suggest that!!! ( Of all the colors I do see walking about here, Yellow makes a refreshing change - I don't see much of that color , not in the otherwise very colorful Jackets here)

(Just recently I have been thinking we as humans have made a bit of a mistake putting too much emphasis on or taking for granted that everyone is in clothes almost all the time. A bit of lateral thinking is needed perhaps. I did a few things without clothes recently and got some unexpected results. Joking aside it does seem to do something psychologically sometimes, somehow it can free your thoughts a bit maybe. I really think almost anyone could benefit from trying it a few times )
(Edit:- I am often finding I re-learn things again…, As a research engineer, especially as one a bit out of my depth, I often did a ridiculous amount of unpaid overtime to get the job done. I got to know when I was almost certain to be alone there. Sometimes I walked around naked. ( Just wearing my socks. Now it’s just my socks and home made Hernia Belt/ string thing/thong)
I only just remembered that. I had a short but extremely successful career, maybe that contributed to that.. )
Sure you can give me a DVM instead, when you take the AVOmeter out of my cold dead hands,

SpeakEasy
3StarLounger
Posts: 318
Joined: 27 Jun 2021, 10:46

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by SpeakEasy »

> hidden as SpeakEasy suggested it might be

Ah, I didn't say hidden. I said obfuscated, subtly different!

See, typically we'd walk the DOM to scrape a web page. But these YouTube pages don't fully populate the DOM with the correct info until a browser renders them (at that point a bunch of JavaScript is fed a bunch of parameters to create the final contents and layout of the page). So we cannot walk the DOM, as the objects we have access to (e.g. MSHTML) do not have the ability to examine a rendered page. Selenium, on the other hand, can.

What YOU are doing is effectively hacking the source code ...

Pretty much all the stuff you are picking out from the response text is bits of JSON that represent the data that gets passed to those scripts.
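To make that concrete, here is a minimal sketch in Python (not the VBA macro from earlier in the thread) of pulling video links and titles out of JSON embedded in raw response text. The sample text, the marker `ytInitialData = ` and the key names `contents`, `playlistVideoRenderer`, `title`, `runs` and `text` are assumptions made up for illustration; the real page layout differs and changes over time, so inspect your own saved text file for the names actually present.

```python
import json

# Hypothetical fragment of response text. The structure and key names are
# assumptions for illustration only; the titles are invented.
sample = (
    '<script>var ytInitialData = {"contents":['
    '{"playlistVideoRenderer":{"videoId":"rM-CtC6cklI",'
    '"title":{"runs":[{"text":"First video"}]}}},'
    '{"playlistVideoRenderer":{"videoId":"YsnmNoq6OTg",'
    '"title":{"runs":[{"text":"Second video"}]}}}'
    ']};</script>'
)


def extract_videos(response_text, marker="ytInitialData = "):
    """Return a list of (url, title) pairs found in the embedded JSON."""
    start = response_text.find(marker)
    if start == -1:
        return []  # marker not present, nothing to parse
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here: ';</script>').
    data, _end = json.JSONDecoder().raw_decode(response_text, start + len(marker))
    videos = []
    for item in data.get("contents", []):
        renderer = item.get("playlistVideoRenderer")
        if renderer:
            url = "https://www.youtube.com/watch?v=" + renderer["videoId"]
            title = renderer["title"]["runs"][0]["text"]
            videos.append((url, title))
    return videos


for url, title in extract_videos(sample):
    print(url, title)
```

Parsing the JSON properly, rather than slicing text between two search strings, tends to survive small layout changes better. The same find-the-marker idea carries over to string functions such as `InStr` and `Mid` in VBA.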

DocAElstein
3StarLounger
Posts: 263
Joined: 18 Jan 2022, 15:59
Location: Naked, in Hof, Bavaria, Germany

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by DocAElstein »

Ah, OK, thanks, good to know these things - I think I get the general idea, or the overview at least. I can’t really understand what all that Jason Jaffa cake script stuff is about – I am well out of my depth there. But I like to hear someone who knows about it tell it like it is. Sometimes I take something in subconsciously that later gets me to twig to something of greater importance.

I did actually do something the “proper way” rather than a “hack", way back, albeit with a great deal of help from someone at a forum. I had no idea at all what was going on then, having just come out of a self imposed technology and computer coma, living as a hermit for 25 years….
I know there are typically two parts to such a scraping thing, like in this screenshot:
Image

I see what you are saying I think. Makes sense. The first bit is where I got the start bit of my original coding from here in this Thread ( the one we are in now) - I took that from the top part of that older coding of mine in the screenshot.

Back then when I did the coding in that screenshot, I got a lot with that original coding, and thought in my ignorance that I must do it that way with the two code sections. I did not initially get the point that the first bit gets a text file that is not so difficult to read/“hack”. (Rather strangely convenient that actually)**
In that screenshot, the second half is that DOM object thing you are talking about, which gives something that a profi knows what to do with in an OOP style-eo. Back then I never realised that I could fairly easily read/“hack” the text. In ignorance I assumed that text was something like this ewkfckjehkjhf kwenkfnhhkifhwkfhe98t4343ufc – totally unreadable stuff to the human eye, like secret coding stuff.

It was a pleasant surprise to find the text quite readable (as it was also a similar nice surprise to learn that I could save a word doc as .htm, then import that as a text file and hack around stuff in that quite easily).

I wonder sometimes if we humans are stupidly obfuscating and confusing ourselves, at least sometimes, unnecessarily. Like someone might learn at college all that DOM OOP stuff and get very competent at doing clever things with it. Then someone like YouTube doesn’t “play the game” anymore and doesn’t do it in the standard convention, so his DOM object model stuff doesn’t work anymore and he’s snookered. He can’t see the wood for the trees, and forgets it’s easy to hack. I know that is oversimplifying things a bit. I expect all this DOM object model stuff is optimised for great efficiency and stuff. I don’t need that. If my code does in a few seconds, or at most minutes, what otherwise takes me days, while I relax, drink my coffee and watch it, then I am very happy.


**I am puzzled that YouTube on the one hand stop the profi DOM way from working, but then don’t somehow make the text in a wzfkjwhgoiweheejj8 form that can’t be read by anyone …… maybe the answer there is that
_ they would, if we all used a browser that could understand all the wzfkjwhgoiweheejj8 form. Maybe we are lucky for the time being , - maybe later they will somehow encourage browser manufacturers to go in business with them and make stuff to suit them that we can’t make head nor tail of…
Or
_ perhaps the answer is they don’t care what little things little people like me do, whereas if the efficient DOM stuff worked, some profis could do something much more significant that they might not like.

SpeakEasy
3StarLounger
Posts: 318
Joined: 27 Jun 2021, 10:46

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by SpeakEasy »

> stop the profi DOM way from working

To be fair, they don't really, not for playlists (what they actually hide is the real file links, the actual MP4s ...). The problem is Trident. Trident doesn't run the JavaScript needed to update the DOM correctly, since it doesn't support the latest JavaScript used by YouTube. Sure, we lied about it being 'Chrome' to get past the YouTube block, but in reality it still uses the aging engine that underpinned IE11 (this affects ServerXML and WinHTTP as well), and as a result can no longer run polymer_desktop (a javascript library) applications - and the YouTube player is a polymer_desktop application.

ChrisGreaves
PlutoniumLounger
Posts: 14077
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by ChrisGreaves »

StuartR wrote:
25 Jan 2023, 13:06
... just that I prefer reading .... Maybe I should change my avatar to one with my red or yellow jacket
Stuart, who am I to argue with "never judge a book by its [dust] jacket"? :laugh:
Cheers, Chris
A watched pot never boils over

ChrisGreaves
PlutoniumLounger
Posts: 14077
Joined: 24 Jan 2010, 23:23
Location: brings.slot.perky

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by ChrisGreaves »

SpeakEasy wrote:
25 Jan 2023, 16:57
What YOU are doing is effectively hacking the source code ...
Guilty as charged.
I would like another fifty years of hacking to be taken into account ... :grin:

(truth is, I don't care how I steal data as long as the client pays me for it :rofl: :rofl: :rofl: )

SpeakEasy
3StarLounger
Posts: 318
Joined: 27 Jun 2021, 10:46

Re: Scrape YouTube Internet Play list to get URL links (and Video Title)

Post by SpeakEasy »

hah! :laugh: :laugh: :laugh: