Getting the title from an HTML file
(The script does not account for rare commented-out lines such as
<!-- <title>ack</title> -->.)
Example usage (from the shell):
    $ ls *.html
    cgi.html          htaccess.html  mod_include.html  urlmapping.html
    configuring.html  mod_auth.html  mod_rewrite.html
    core.html         mod_cgi.html   rewriteguide.html
    $ ./title.lua *.html
    cgi.html: Apache Tutorial: Dynamic Content with CGI
    configuring.html: Configuration Files
    core.html: Apache Core Features
    htaccess.html: Apache Tutorial: .htaccess files
    mod_auth.html: Apache module mod_auth
    mod_cgi.html: Apache module mod_cgi
    mod_include.html: Apache module mod_include
    mod_rewrite.html: Apache module mod_rewrite
    rewriteguide.html: Apache 1.3 URL Rewriting Guide
    urlmapping.html: Mapping URLs to Filesystem Locations - Apache HTTP Server
Here is the Lua program title.lua:
    #!/usr/bin/env lua

    function getTitle(fname)
      local fp = io.open(fname, "r")
      if fp == nil then
        return false
      end
      -- Read up to 8KB (avoid problems when trying to parse /dev/urandom)
      local s = fp:read(8192) or ""  -- read() returns nil on an empty file
      fp:close()

      -- Remove optional spaces from the tags.
      s = string.gsub(s, "\n", " ")
      s = string.gsub(s, " *< *", "<")
      s = string.gsub(s, " *> *", ">")

      -- Put all the tags in lowercase.
      s = string.gsub(s, "(<[^ >]+)", string.lower)

      -- Non-greedy capture, so matching stops at the first closing tag.
      local i, f, t = string.find(s, "<title>(.-)</title>")
      return t or ""
    end

    if arg[1] == nil then
      print("Usage: lua " .. arg[0] .. " <filename> [...]")
      os.exit(1)
    end

    local i = 1
    while arg[i] do
      local t = getTitle(arg[i])
      if t then
        print(arg[i] .. ": " .. t)
      else
        print(arg[i] .. ": File opening error.")
      end
      i = i + 1
    end

    os.exit(0)
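Because title.lua searches the raw text for a title tag, a title that sits inside an HTML comment would be matched as well. One possible workaround, shown here only as a minimal sketch, is to strip comments before searching. The names stripComments and findTitle are hypothetical helpers introduced for illustration and are not part of title.lua; the sketch works on a string rather than a file and skips the space-trimming step.

```lua
-- Hypothetical helper (not part of title.lua): remove HTML comments.
-- "<!%-%-" matches "<!--", ".-" is the shortest possible run of any
-- characters (newlines included), and "%-%->" matches "-->".
function stripComments(s)
  -- The extra parentheses drop gsub's second return value (the count).
  return (string.gsub(s, "<!%-%-.-%-%->", ""))
end

-- Hypothetical helper: same idea as getTitle, but on a string and
-- with comments removed first.
function findTitle(html)
  local s = stripComments(html)
  s = string.gsub(s, "\n", " ")
  -- Lowercase the tag names so <TITLE> and <title> are treated alike.
  s = string.gsub(s, "(<[^ >]+)", string.lower)
  return string.match(s, "<title>(.-)</title>") or ""
end

print(findTitle("<!-- <title>ack</title> -->\n<TITLE>Real</TITLE>"))  --> Real
```

An unterminated comment is left untouched by this pattern, so the sketch only covers well-formed comments.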
Alternatively, the [lua-gumbo] library can be used:
    #!/usr/bin/env lua
    local gumbo = require "gumbo"
    local document = assert(gumbo.parseFile(arg[1] or io.stdin))
    print(document.title)
In this case, the HTML5 parser and the Document.title implementation are fully spec-compliant and should produce exactly the same results as modern browsers.
lua-gumbo is available via: luarocks install gumbo