AppleScript Guidebook: Essential Sub-Routines

HTML Routines

AppleScript scripts are often used to read and write HTML text. The following sub-routines help automate some common tasks involving HTML markup.


Converting RGB to HTML Color

The following sub-routine can be used to convert RGB color values to the HEX-based format used in HTML documents.

An RGB color is stated as a list of three numbers, each with a value between 0 and 65535 (16 bits per channel). The sub-routine reduces each value to its 8-bit equivalent (0 to 255) and then converts that to the corresponding two-digit HEX value. A value of 65536 is also tolerated and is treated as 255.
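
As a worked illustration of the arithmetic described above, the short sketch below converts a single channel value by hand. The variable names are illustrative only and are not part of the sub-routine.

```applescript
-- A single 16-bit channel value (the blue channel of the example color)
set channel_value to 56020
-- Reduce to 8 bits by integer division
set eight_bit to channel_value div 256 -- 218
-- Split into two hexadecimal digits
set high_digit to eight_bit div 16 -- 13, written "D"
set low_digit to eight_bit mod 16 -- 10, written "A"
```

The two digits together give "DA", the last pair in the HTML color "#3301DA".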

To use the sub-routine, pass it a list of RGB values and it will return the HTML color code matching the passed RGB color.

RBG2HTML({13107, 489, 56020})
--> "#3301DA"

on RBG2HTML(RGB_values)
    -- NOTE: this sub-routine expects the RGB values to be from 0 to 65535
    set the hex_list to ¬
        {"0", "1", "2", "3", "4", "5", "6", "7", "8", ¬
            "9", "A", "B", "C", "D", "E", "F"}
    set the hex_value to ""
    repeat with i from 1 to the count of the RGB_values
        -- reduce the 16-bit value to 8 bits (0 to 255)
        set this_value to (item i of the RGB_values) div 256
        -- clamp an out-of-range input of 65536
        if this_value is 256 then set this_value to 255
        -- look up the high and low hexadecimal digits
        set x to item ((this_value div 16) + 1) of the hex_list
        set y to item ((this_value mod 16) + 1) of the hex_list
        set the hex_value to (the hex_value & x & y) as string
    end repeat
    return ("#" & the hex_value) as string
end RBG2HTML
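
One possible use, sketched here rather than taken from the original text, is to convert a color picked with the Standard Additions choose color command, which returns a list of three 16-bit values. The sketch assumes the RBG2HTML handler is in the same script; the variable names are illustrative.

```applescript
-- Assumes the RBG2HTML handler above is defined in this script.
-- Show the color picker, starting from the example color
set chosen_color to choose color default color {13107, 489, 56020}
-- Convert the chosen RGB list to an HTML color string
set html_color to RBG2HTML(chosen_color)
```

If the call is made inside a tell block, it would need to be written as my RBG2HTML(chosen_color).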


Removing Markup Codes From Text

This sub-routine removes angle-bracket-enclosed tags from the text passed to it.

set this_text to "This is a <B>great</B> time to own a Mac!"
remove_markup(this_text)
--> "This is a great time to own a Mac!"

on remove_markup(this_text)
    set copy_flag to true
    set the clean_text to ""
    repeat with this_char in this_text
        set this_char to the contents of this_char
        if this_char is "<" then
            set the copy_flag to false
        else if this_char is ">" then
            set the copy_flag to true
        else if the copy_flag is true then
            set the clean_text to (the clean_text & this_char) as string
        end if
    end repeat
    return the clean_text
end remove_markup
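
Because the sub-routine only tracks angle brackets, character entities are left untouched. The sketch below illustrates this; it assumes the remove_markup handler is in the same script, and the sample text is made up for the illustration.

```applescript
-- Assumes the remove_markup handler above is defined in this script.
set sample_text to "Fish &amp; <I>chips</I>"
remove_markup(sample_text)
--> "Fish &amp; chips"
```

For the same reason, a bare "<" in ordinary text switches copying off until the next ">" is found, so text containing such characters may lose content.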


Parsing an HTML File

The following large sub-routine can be used to extract specific tags and their contents from HTML text.

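The routine returns all matches of the opening and closing tag combination passed to it, as a single string with the matches separated by return characters. If an error other than reaching the end of the file occurs, it returns false.

The fourth parameter, contents_only, controls whether the enclosing tags are included with the returned text: pass false to include them, or true to return only the text between them.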

You can use this sub-routine to do the following:


Return All Links in an HTML Document

Pass the file path to the sub-routine as the first parameter. Leave the other settings as shown.

read_parse (this_file, "<A HREF=", "</A>", false)
--> <A HREF="http://www.apple.com/fileA.html">click here</A>
--> <A HREF="http://www.apple.com/fileB.html">click here</A>


Return All Images in an HTML Document

Pass the file path to the sub-routine as the first parameter. Leave the other settings as shown. Note that the value passed for the closing tag parameter is an empty string (""). When the closing tag parameter is empty, the sub-routine returns each match as a single stand-alone tag.

read_parse (this_file, "<IMG ", "", false)
--> <IMG SRC="gfx/clipboard.gif" BORDER="0">
--> <IMG SRC="printer_stopped.gif" ALIGN=TOP WIDTH="32" HEIGHT="32" BORDER="0">
--> <IMG SRC="printer_on.gif" ALIGN=TOP WIDTH="32" HEIGHT="32" BORDER="0">


Return All Tables in an HTML Document

Pass the file path to the sub-routine as the first parameter. Leave the other settings as shown.

read_parse (this_file, "<TABLE", "</TABLE>", false)

<TABLE WIDTH="440">
<TR>
<TD ALIGN="CENTER" VALIGN="TOP">
<FONT FACE="Geneva" SIZE="1">
<A HREF="../AppleScript%20Help">
AppleScript table of contents</A>
</FONT>
</TD>
</TR>
</TABLE>


on read_parse(this_file, opening_tag, closing_tag, contents_only)
    try
        set this_file to this_file as text
        set this_file to open for access file this_file
        set the combined_results to ""
        set the open_tag to ""
        repeat
            read this_file before "<" -- start of a tag
            set this_tag to read this_file until ">" -- end of a tag
            -- to make up for a bug in the "read before" command
            if this_tag does not start with "<" then ¬
                set this_tag to ("<" & this_tag) as string
            -- EXAMINE THE TAG
            if this_tag begins with the opening_tag then
                -- store the complete tag, not just the search string
                set the open_tag to this_tag
                -- check for single tag indicator
                if the closing_tag is "" then
                    if the combined_results is "" then
                        set the combined_results to the combined_results & ¬
                            the open_tag
                    else
                        set the combined_results to the combined_results & ¬
                            return & the open_tag
                    end if
                else
                    -- reset the text buffer
                    set the text_buffer to ""
                    -- extract the contents between the open and close tags
                    repeat
                        set the text_buffer to the text_buffer & ¬
                            (read this_file before "<") -- start of a tag
                        set the tag_buffer to read this_file until ">" -- end of a tag
                        -- to make up for a bug in the "read before" command
                        if the tag_buffer does not start with "<" then ¬
                            set the tag_buffer to ("<" & the tag_buffer) as string
                        -- check for the closing tag
                        if the tag_buffer is the closing_tag then
                            if contents_only is false then
                                set the text_buffer to the open_tag & ¬
                                    the text_buffer & the tag_buffer
                            end if
                            if the combined_results is "" then
                                set the combined_results to the combined_results & ¬
                                    the text_buffer
                            else
                                set the combined_results to the combined_results & ¬
                                    return & the text_buffer
                            end if
                            exit repeat
                        else
                            set the text_buffer to the text_buffer & the tag_buffer
                        end if
                    end repeat
                end if
            end if
        end repeat
        close access this_file
    on error error_msg number error_num
        try
            close access this_file
        on error
        end try
        -- error -39 is end of file, the normal way out of the loop
        if error_num is not -39 then return false
    end try
    return the combined_results
end read_parse
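

Returning Only the Contents Between Tags

The examples above all pass false for the last parameter. As a sketch of the other setting, the script below passes true so that only the text between the tags is returned, and then checks for the false value returned on failure. It assumes the read_parse handler is in the same script; the choice of the TITLE tag, the prompt text, and the variable names are illustrative.

```applescript
-- Assumes the read_parse handler above is defined in this script.
set this_file to choose file with prompt "Choose an HTML file:"
-- Pass true to return only the text between the tags
set the title_text to read_parse(this_file, "<TITLE", "</TITLE>", true)
if the title_text is false then
	display dialog "The file could not be read." buttons {"OK"} default button 1
else
	-- matches are separated by return characters
	set the match_list to paragraphs of the title_text
end if
```

Splitting on paragraphs works best for short matches such as titles or links. A match that contains its own line breaks, such as a table, would be divided into several list items.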