HTML Routines
AppleScript scripts are often used to read and write HTML text.
The following sub-routines help automate some common tasks
involving HTML markup.
Converting RGB to HTML Color
The following sub-routine can be used to
convert RGB color values to the HEX-based format used in HTML documents.
An RGB color is stated as a list of three numbers,
each with a value between 0 and 65535. The sub-routine
scales those 16-bit values down to 8-bit values (0 to 255),
which are then converted to their corresponding two-digit HEX values.
To use the sub-routine, pass it a list of RGB values and it
will return the HTML code matching the passed RGB color.
RBG2HTML({13107, 489, 56020})
--> "#3301DA"
on RBG2HTML(RGB_values)
    -- NOTE: this sub-routine expects the RGB values to be from 0 to 65535
    set the hex_list to ¬
        {"0", "1", "2", "3", "4", "5", "6", "7", "8", ¬
        "9", "A", "B", "C", "D", "E", "F"}
    set the hex_value to ""
    repeat with i from 1 to the count of the RGB_values
        -- scale the 16-bit value down to an 8-bit value (0 to 255)
        set this_value to (item i of the RGB_values) div 256
        -- guard against an out-of-range input of 65536
        if this_value is 256 then set this_value to 255
        -- x is the high HEX digit, y is the low HEX digit
        set x to item ((this_value div 16) + 1) of the hex_list
        set y to item ((this_value mod 16) + 1) of the hex_list
        set the hex_value to (the hex_value & x & y) as string
    end repeat
    return ("#" & the hex_value) as string
end RBG2HTML
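To make the arithmetic concrete, the following sketch walks a single 16-bit channel value through the same steps. The handler name channel_to_hex is hypothetical and exists only for this illustration; it is a minimal sketch, not part of the sub-routine above.

```applescript
-- Hypothetical helper: convert one 16-bit channel value to two HEX digits
on channel_to_hex(channel_value)
	set hex_digits to "0123456789ABCDEF"
	-- scale the 16-bit value (0 to 65535) down to 8 bits (0 to 255)
	set eight_bit to channel_value div 256
	if eight_bit > 255 then set eight_bit to 255
	-- the high digit counts whole sixteens, the low digit is the remainder
	set high_digit to character ((eight_bit div 16) + 1) of hex_digits
	set low_digit to character ((eight_bit mod 16) + 1) of hex_digits
	return high_digit & low_digit
end channel_to_hex

channel_to_hex(56020)
-- 56020 div 256 = 218; 218 div 16 = 13 ("D"); 218 mod 16 = 10 ("A")
--> "DA"
```

The value 56020 is the blue channel from the example call, which is why the returned HTML color ends in DA.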
Removing Markup Codes From Text
This sub-routine can be used to remove angle bracket enclosed tags
from text passed to the sub-routine.
set this_text to "This is a <B>great</B> time to own a Mac!"
remove_markup(this_text)
--> "This is a great time to own a Mac!"
on remove_markup(this_text)
    -- copy_flag is true while outside a tag and false while inside one
    set copy_flag to true
    set the clean_text to ""
    repeat with this_char in this_text
        set this_char to the contents of this_char
        if this_char is "<" then
            -- a tag starts: stop copying characters
            set the copy_flag to false
        else if this_char is ">" then
            -- the tag ends: resume copying with the next character
            set the copy_flag to true
        else if the copy_flag is true then
            set the clean_text to the clean_text & this_char as string
        end if
    end repeat
    return the clean_text
end remove_markup
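Because the sub-routine treats every "<" as the start of a tag and every ">" as the end of one, it also handles tags that carry attributes, but it will drop text that lies between literal angle brackets. The sketch below illustrates both cases; the sample strings are made up for this example, and the handler is repeated only so that the snippet runs on its own.

```applescript
-- The handler from above, repeated so this snippet is self-contained
on remove_markup(this_text)
	set copy_flag to true
	set the clean_text to ""
	repeat with this_char in this_text
		set this_char to the contents of this_char
		if this_char is "<" then
			set the copy_flag to false
		else if this_char is ">" then
			set the copy_flag to true
		else if the copy_flag is true then
			set the clean_text to the clean_text & this_char as string
		end if
	end repeat
	return the clean_text
end remove_markup

-- tags with attributes are removed like any other tag
remove_markup("<A HREF=\"index.html\">Home</A> | <I>News</I>")
--> "Home | News"

-- literal angle brackets are mistaken for a tag, so " 6 and 7 " is lost
remove_markup("5 < 6 and 7 > 3")
--> "5  3"
```

If the text may contain literal angle brackets, it is safer to make sure they are written as &lt; and &gt; entities before calling the sub-routine.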
Parsing an HTML File
The following large sub-routine can be used to extract
specific tags and their contents from HTML text.
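The routine will return all
matches of a specific opening and closing tag combination
passed to the sub-routine, separated by return characters.
The fourth parameter indicates whether to return only the
text between the tags (true) or to include the enclosing
tags with the returned text (false).
If an error other than end-of-file occurs, the sub-routine returns false.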
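The sub-routine tests each tag it reads with the begins with operator, so the opening tag parameter only needs to be the start of the tag, and, because AppleScript ignores case in text comparisons by default, lowercase tags match as well. The following is a minimal sketch of that test; the handler names tag_matches and tag_matches_exact_case are hypothetical and are not part of the sub-routine.

```applescript
-- Hypothetical helper: the default comparison, as used by the sub-routine
on tag_matches(this_tag, opening_tag)
	return this_tag begins with opening_tag
end tag_matches

-- Hypothetical helper: the same test with letter case taken into account
on tag_matches_exact_case(this_tag, opening_tag)
	considering case
		return this_tag begins with opening_tag
	end considering
end tag_matches_exact_case

tag_matches("<a href=\"index.html\">", "<A HREF=")
--> true

tag_matches_exact_case("<a href=\"index.html\">", "<A HREF=")
--> false
```

Note that the match is on the literal characters, so a tag written with extra spaces or with its attributes in a different order will not match a search string such as "<A HREF=".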
You can use this sub-routine to do the following:
Return All Links in an HTML Document
Pass the file path to the sub-routine
as the first parameter. Leave the other settings as shown.
read_parse(this_file, "<A HREF=", "</A>", false)
--> <A HREF="http://www.apple.com/fileA.html">click here</A>
--> <A HREF="http://www.apple.com/fileB.html">click here</A>
Return All Images in an HTML Document
Pass the file path to the sub-routine
as the first parameter. Leave the other settings as shown. Note that the value passed for
the closing tag parameter is a null string (""). When the closing tag
parameter is null, the sub-routine returns each match as a single tagged element.
read_parse(this_file, "<IMG ", "", false)
--> <IMG SRC="gfx/clipboard.gif" BORDER="0">
--> <IMG SRC="printer_stopped.gif" ALIGN=TOP WIDTH="32" HEIGHT="32" BORDER="0">
--> <IMG SRC="printer_on.gif" ALIGN=TOP WIDTH="32" HEIGHT="32" BORDER="0">
Return All Tables in an HTML Document
Pass the file path to the sub-routine
as the first parameter. Leave the other settings as shown.
read_parse(this_file, "<TABLE", "</TABLE>", false)
<TABLE WIDTH="440">
<TR>
<TD ALIGN="CENTER" VALIGN="TOP">
<FONT FACE="Geneva" SIZE="1">
<A HREF="../AppleScript%20Help">
AppleScript table of contents</A>
</FONT>
</TD>
</TR>
</TABLE>
on read_parse(this_file, opening_tag, closing_tag, contents_only)
    try
        set this_file to this_file as text
        set this_file to open for access file this_file
        set the combined_results to ""
        set the open_tag to ""
        repeat
            read this_file before "<" -- start of a tag
            set this_tag to read this_file until ">" -- end of a tag
            -- to make up for a bug in the "read before" command
            if this_tag does not start with "<" then ¬
                set this_tag to ("<" & this_tag) as string
            -- EXAMINE THE TAG
            if this_tag begins with the opening_tag then
                -- store the complete tag, not just the search string
                set the open_tag to this_tag
                -- check for single tag indicator
                if the closing_tag is "" then
                    if the combined_results is "" then
                        set the combined_results to the combined_results & ¬
                            the open_tag
                    else
                        set the combined_results to the combined_results & ¬
                            return & the open_tag
                    end if
                else
                    -- reset the text buffer
                    set the text_buffer to ""
                    -- extract the contents between the open and close tags
                    repeat
                        set the text_buffer to the text_buffer & ¬
                            (read this_file before "<") -- start of a tag
                        set the tag_buffer to read this_file until ">" -- end of a tag
                        -- to make up for a bug in the "read before" command
                        if the tag_buffer does not start with "<" then ¬
                            set the tag_buffer to ("<" & the tag_buffer) as string
                        -- check for the closing tag
                        if the tag_buffer is the closing_tag then
                            if contents_only is false then
                                set the text_buffer to the open_tag & ¬
                                    the text_buffer & the tag_buffer
                            end if
                            if the combined_results is "" then
                                set the combined_results to the combined_results & ¬
                                    the text_buffer
                            else
                                set the combined_results to the combined_results & ¬
                                    return & the text_buffer
                            end if
                            exit repeat
                        else
                            set the text_buffer to the text_buffer & the tag_buffer
                        end if
                    end repeat
                end if
            end if
        end repeat
        close access this_file
    on error error_msg number error_num
        try
            close access this_file
        on error
        end try
        -- error -39 is the end-of-file error, which is how the loop normally ends
        if error_num is not -39 then return false
    end try
    return the combined_results
end read_parse
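Since the sub-routine returns either false or a single string with the matches separated by return characters, a calling script will usually want to check the result and split it into a list. The following is a minimal sketch of one way to do that; the handler name results_as_list is hypothetical, and the sample string simply stands in for a value returned by read_parse.

```applescript
-- Hypothetical helper: turn the value returned by read_parse into a list
on results_as_list(parse_result)
	-- read_parse returns false for any error other than end-of-file
	if parse_result is false then return {}
	-- no matches were found
	if parse_result is "" then return {}
	-- each match sits on its own line, so split the text at line breaks
	return paragraphs of parse_result
end results_as_list

-- stand-in for the result of a read_parse call that found two links
set sample_result to "<A HREF=\"fileA.html\">click here</A>" & return & ¬
	"<A HREF=\"fileB.html\">click here</A>"
count of results_as_list(sample_result)
--> 2
```

This approach assumes that no single match contains a line break of its own. Multi-line matches, such as the table example above, would be split into several items and would need a different separator strategy.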