Javascript: treat string as HTML


mmm..pi..3.14..
06-17-2005, 07:36 PM
I've really gotten myself into a pickle here...

Most people know about JavaScript methods like "document.getElementsByTagName()" and "document.getElementById()", but the problem is that they only work on the page's own document, not on an arbitrary string. What I need is to be able to use "getElementsByTagName()" on a string that's formatted like HTML. :D

My script uses an XMLHttpRequest object to send a request to this site and get back the HTML code of the page. I need to extract all link tags (<a href="blablablabla">) from the string I receive, which is just the source code of the page, so it's formatted like HTML.
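For reference: in current browsers, `new DOMParser().parseFromString(html, "text/html")` returns a detached Document, and that Document supports `getElementsByTagName("a")`, which is the "treat a string as HTML" behavior asked about here. Where no DOM parser is available, a regular expression can at least pull out the href values. Below is a minimal sketch of that regex fallback, assuming double-quoted href attributes; `extractHrefs` and the sample markup are hypothetical.

```javascript
// Hypothetical helper: collect the href value of every <a ...> start tag in an HTML string.
// Assumes double-quoted href attributes; real HTML may also use single quotes or none.
function extractHrefs(html) {
  var re = /<a\s[^>]*?\bhref\s*=\s*"([^"]*)"/gi; // g: find all matches, i: also <A HREF=...>
  var hrefs = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    hrefs.push(m[1]); // capture group 1 is the attribute value
  }
  return hrefs;
}

// Usage with a small, made-up piece of page source
var sample =
  '<p><a href="member.php3?u=1">CyanBlue</a> and ' +
  '<A class="x" HREF="showthread.php3?t=76279">a thread</A></p>';
console.log(extractHrefs(sample)); // logs both href values
```

A regex only sees text, so things like commented-out links or unusual quoting will fool it; a real HTML parser avoids that.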

This is for a Firefox extension, so the script won't work in an ordinary HTML file, but here is part of the script in case it helps anyone.


var req = new XMLHttpRequest();
req.onreadystatechange = asorg.handleReadyStateChange;
req.open("POST", "http://www.actionscript.org/forums/search.php3", true);
// the body is form data, so label it as such (PHP only fills $_POST for form content types)
req.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
req.onerror = function() {
  req.abort();
};
req.onload = function() {
  if (req.readyState == 4 && req.status == 200) {
    var sourceCode = req.responseText; // the page's HTML source, as one plain string
    alert(sourceCode);
  }
};
// encodeURIComponent() keeps usernames containing &, =, # or spaces from breaking the query
req.send("s=&do=process&searchuser=" + encodeURIComponent(username) + "&starteronly=0&exactname=1&replyless=0&replylimit=0&searchdate=30&beforeafter=after&sortby=lastpost&order=descending&showposts=0");
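A form POST like this is normally sent as application/x-www-form-urlencoded, and in that encoding every value has to be percent-encoded. That matters most for exactly the usernames this thread is about: a raw &, = or # would split or cut off the query. Below is a minimal sketch that builds the same body with `URLSearchParams`, a newer Web API (also available in Node) that does the escaping; `buildSearchBody` is a hypothetical helper, and the field names are copied from the `req.send()` call above.

```javascript
// Hypothetical helper: builds the same search body as the req.send() call,
// letting URLSearchParams do the application/x-www-form-urlencoded escaping
// (& = < > # % and friends are percent-encoded, spaces become "+").
function buildSearchBody(username) {
  var params = new URLSearchParams({
    s: "",
    "do": "process",
    searchuser: username,
    starteronly: "0",
    exactname: "1",
    replyless: "0",
    replylimit: "0",
    searchdate: "30",
    beforeafter: "after",
    sortby: "lastpost",
    order: "descending",
    showposts: "0"
  });
  return params.toString();
}

// Usage: a username full of special characters stays inside its own field
console.log(buildSearchBody("<--Billy Bob-->"));
```

In the extension this would become `req.send(buildSearchBody(username));`, with the Content-Type header set to application/x-www-form-urlencoded.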


Again, the short version: treat a string as HTML so I can use the JavaScript method "getElementsByTagName()" :)

Thanks,

Eric

acolyte
06-17-2005, 10:09 PM
Hello mmm...pi...3.14

I found a few things after googling around for about an hour :-)
(It's a little sad that with this crapload of pages out there, you can't find anything specific anymore without spending hours and hours in the Google search box.)

http://www.oracle.com/technology/products/database/htmldb/howtos/htmldb_javascript_howto2.html

http://www.pcquest.com/content/coding/2004/104080302.asp

http://www.codingforums.com/archive/index.php/t-1655.html

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemxmlxmlelementclassgetelementsbytagnametopic.asp

http://www.quirksmode.org/blog/archives/2005/02/javascript_memo.html

mmm..pi..3.14..
06-17-2005, 11:46 PM
Very cool, but I found out some bad news...

It seems there IS something that can create a "virtual" HTML document. Unfortunately, it's so new that it hasn't been implemented in browsers yet; right now it only exists in the documentation. According to the documentation I found, it would work like this:


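// createHTMLDocument() lives on document.implementation and takes a title, not the source
var HTMLDoc = document.implementation.createHTMLDocument("");
HTMLDoc.body.innerHTML = src; // parse the HTML string into the new, detached document
var divs = HTMLDoc.getElementsByTagName("DIV");
// ....blablablabla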


That seems like my only hope if I want to search by tag name or something similar. I think I'll have to fall back on plan B and use regular expressions to "strip" out the data I need. A pain in the @rse to code, but it's fairly basic, so I see no reason why it wouldn't work ;)

Thanks anyways,

Eric :)

acolyte
06-18-2005, 01:32 AM
Haha Eric, that's just friggin' funny, because when I saw this thread I was thinking: hey dude Mat, anytime you've got a problem with some shizzle-mizzle stuff you don't know, mmm..pi..3.14 :cool: answers promptly about everything.
So when I saw this, I wanted to post some links, because nobody had posted anything yet.

Is there no function similar to

myString.Split(Htmlcode[Delimiter]Htmlcode[Delimiter]Htmlcode);

in JScript? I'm not exactly a JS god, you know.
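JavaScript does have one: `String.prototype.split()` takes either a string or a regular expression as the delimiter. Splitting the page source on "</a>" leaves each link's text at the end of its own piece, so odd characters in the name no longer matter. A minimal sketch, assuming each link looks like `<a href="...find=lastposter...">name</a>` and that no username contains "</a>" itself; `lastPostersBySplit` and the sample page are hypothetical.

```javascript
// Sketch: split on "</a>", then take the text after the start tag of the
// lastposter link that ends each piece.
function lastPostersBySplit(html) {
  var names = [];
  var pieces = html.split("</a>"); // each piece ends exactly where a link's text ends
  for (var i = 0; i < pieces.length; i++) {
    var piece = pieces[i];
    var linkStart = piece.lastIndexOf("find=lastposter");
    if (linkStart === -1) continue; // no lastposter link in this piece
    var textStart = piece.indexOf('">', linkStart); // end of the <a ...> start tag
    if (textStart === -1) continue;
    names.push(piece.substring(textStart + 2)); // everything between '">' and "</a>"
  }
  return names;
}

var splitSample =
  '<a href="http://www.actionscript.org/forums/member.php3?find=lastposter&amp;t=76236">CyanBlue</a>' +
  '<br><a href="http://www.actionscript.org/forums/member.php3?find=lastposter&amp;t=76300"><--Billy Bob--></a>';
console.log(lastPostersBySplit(splitSample)); // CyanBlue and <--Billy Bob-->
```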

I hope I could help you with this :o

Cheers, Mat

mmm..pi..3.14..
06-18-2005, 04:31 AM
lol...funny :D

Unfortunately, I don't think there is... but I'm not totally sure. You might be able to come up with a good pattern to solve this, though. This was the fallback I was trying to avoid, but it's become pretty obvious that I have to do it this way. I'm using JavaScript's RegExp object to grab the username from a link that appears when you do a search (long story... not easy to explain).

The name I'm trying to grab always appears in a link in the HTML code that looks something like this:


<a href="http://www.actionscript.org/forums/member.php3?find=lastposter&amp;t=76279">!!username_Would-Go_Here!!</a>


So, using RegExp, I can easily pull out a normal name (i.e., a name without special characters), like CyanBlue, snapple, Ricod, Ruben, farafiro, etc., with the following script:


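// "g" flag: each exec() call continues from where the previous match ended (re.lastIndex)
var re = new RegExp("find=lastposter[^A-Z0-9]+[0-9]+\">[A-Za-z0-9]+", "g");
var s = req.responseText; // the data received via the XMLHttpRequest object: the page's source code
// exec() returns null once no matches are left, which ends the loop
for (var results = re.exec(s); results != null; results = re.exec(s)) {
  document.write(results[0] + "<br>"); // results[0] is the entire matched text
}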


That's easy. In my test code (copied and pasted, with only the names changed), I got the following output in my window:


find=lastposter&t=76236">CyanBlue
find=lastposter&t=76279">Ruben
find=lastposter&t=76169">snapple
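snapple">
The output above still includes the `find=lastposter&t=...">` prefix, because `results[0]` is always the whole match. Wrapping the name part of the pattern in parentheses makes `exec()` return it separately as `results[1]`. A minimal sketch of that change, using the same pattern written as a regex literal; `plainLastPosters` and the sample source are hypothetical, and like the original it only handles letters and digits.

```javascript
// Same pattern as above, with the name wrapped in a capturing group.
function plainLastPosters(source) {
  var re = /find=lastposter[^A-Z0-9]+[0-9]+">([A-Za-z0-9]+)/g;
  var names = [];
  var results;
  while ((results = re.exec(source)) !== null) {
    names.push(results[1]); // group 1: just the username
  }
  return names;
}

var plainSample =
  '<a href="http://www.actionscript.org/forums/member.php3?find=lastposter&amp;t=76236">CyanBlue</a>\n' +
  '<a href="http://www.actionscript.org/forums/member.php3?find=lastposter&amp;t=76279">Ruben</a>';
console.log(plainLastPosters(plainSample).join(", ")); // CyanBlue, Ruben
```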


Simple, eh?? ...but my problem is the people whose names contain not-so-common characters (!, @, #, >, %, <, &, *, etc.). I can't just use this RegExp:


find=lastposter[^A-Z0-9]+[0-9]+\">[^<]+


That's because it stops reading the data at the first "<" in the string. I could take a chance and just hope that nobody's name is something like "<--Billy Bob-->" (the match would stop at the very first character), but it's unlikely that no one has that character in their name. I've already found at least 20 people with it in their name, so that's out.

What I need is some way to tell it to keep reading until it reaches the closing link tag (</a>), since that seems extremely unlikely to appear in a username, and it comes immediately after the username, as shown in the first bit of code in this post. I tried the following, but of course it didn't work:


find=lastposter[^A-Z0-9]+[0-9]+\">^(</a>)+


If I was lucky, that would have kept reading until it came across the string "</a>", but Lady Luck must fear regular expressions :(

Any thoughts anyone??

Thanks,

Eric :)
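On why that attempt fails: outside square brackets, `^` is an anchor (start of input, or start of a line with the m flag), not a negation, so `\">^` can never match in the middle of the page. Inside brackets, `[^...]` only excludes single characters, never a whole string like "</a>". The usual way to say "read until the first `</a>`" is a lazy quantifier (`+?` or `*?`), which matches as few characters as possible before trying the rest of the pattern. A minimal sketch, assuming (as in the post) that no username contains "</a>" and that the href value contains no double quote; `allLastPosters` and the sample names are hypothetical.

```javascript
// Sketch: stop at the first "</a>" with a lazy quantifier instead of a negated class.
//   [^"]*       skips the rest of the href value, up to its closing quote
//   ([\s\S]+?)  captures any characters (newlines included), as few as possible
//   <\/a>       ...up to the first closing link tag
function allLastPosters(source) {
  var re = /find=lastposter[^"]*">([\s\S]+?)<\/a>/g;
  var names = [];
  var results;
  while ((results = re.exec(source)) !== null) {
    names.push(results[1]);
  }
  return names;
}

var mixedSample =
  '<a href="http://www.actionscript.org/forums/member.php3?find=lastposter&amp;t=76236">CyanBlue</a> ' +
  '<a href="http://www.actionscript.org/forums/member.php3?find=lastposter&amp;t=76300"><--Billy Bob--></a> ' +
  '<a href="http://www.actionscript.org/forums/member.php3?find=lastposter&amp;t=76301">!@#%&*</a>';
console.log(allLastPosters(mixedSample)); // logs all three names
```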

acolyte
06-18-2005, 03:08 PM
Hello Eric, this is actually beyond my skills.

Perhaps something like this:

I just pasted this into the Google search box and found something you might be able to use as a reference:



------------------snipped from the code--------------------

rxlastposter = [OGRegularExpression regularExpressionWithString: @"<a href=\"member.php.find=lastposter.t=[0-9]+\">(.*)</a>

----------------------------------------------------------
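Two things may explain why this didn't work. First, as written the pattern expects `<a href="member.php` right after the opening quote, with exactly one character between `php` and `find` and between `lastposter` and `t=`, while the link format posted earlier is an absolute URL containing `php3?` and `&amp;`, so it cannot match. Second, `(.*)` is greedy: on a line with several links it runs to the last `</a>`, while `(.*?)` stops at the first. A small JavaScript port showing both (the sample lines are hypothetical; the snippet above is Objective-C, so this only mirrors its pattern):

```javascript
// Mat's pattern, ported to a JavaScript regex literal.
var matPattern = /<a href="member.php.find=lastposter.t=[0-9]+">(.*)<\/a>/;
var postedLink =
  '<a href="http://www.actionscript.org/forums/member.php3?find=lastposter&amp;t=76279">!!username_Would-Go_Here!!</a>';
console.log(matPattern.test(postedLink)); // false: the href starts with "http://", not "member.php"

// Greedy vs. lazy capture on one line holding two links.
var line =
  '<a href="member.php3?find=lastposter&amp;t=1">CyanBlue</a> | ' +
  '<a href="member.php3?find=lastposter&amp;t=2">Ruben</a>';
var greedy = /find=lastposter[^"]*">(.*)<\/a>/.exec(line);
var lazy = /find=lastposter[^"]*">(.*?)<\/a>/.exec(line);
console.log(greedy[1]); // runs on to the second link's "Ruben"
console.log(lazy[1]);   // CyanBlue
```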

What I was wondering about during this search: with this crapload of pages nowadays, it gets harder and harder to find something specific in Google.

http://paste.lisp.org/display/2077 :cool:


Thanks, Mat

mmm..pi..3.14..
06-18-2005, 06:44 PM
nope, no workie here.. :(

very cool though :cool: