I'm wondering whether it's possible to determine a site's "characteristic" color. For instance, TechCrunch would be green, ReadWriteWeb would be red, CNN also red, Microsoft blueish, PHP purple, etc.
It doesn't have to be accurate, just a best guess.
Some ideas I have in mind:
- parse all css rules and find the one matching the most elements
- parse all css rules and find background colors of the elements having the biggest dimensions
- get the body element's background image and find its predominant color (is this possible for an image?)
- somehow find the site's "header" (first element in the DOM with a background CSS attribute set?) and get its background
I would also need a way to eliminate blacks, greys and whites.
Is this feasible? Do you have any other ideas?
P.S. Sorry for my English
Feasible, definitely. You can use the wget tool and some simple regular expressions to parse out CSS colors. You can then collect all those colors and see which one is used most. However, that will not always be a good representation of a website's actual predominant color, since some colors may occur in many CSS rules that are rarely applied on the page.
This is actually a nontrivial project you have here.
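As a rough illustration of the regex-counting idea from the first answer, here is a minimal Python sketch. It assumes the stylesheet has already been downloaded (for example with wget) and is available as a string; `count_css_colors` and `sample_css` are hypothetical names made up for this example, and only hex notation is handled.

```python
import re
from collections import Counter

# Matches 3- or 6-digit hex colors such as #fff or #FFEEEE.
# Caveat: an ID selector that happens to look like hex (e.g. #facade)
# would also match, so treat the counts as a best guess.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")

def count_css_colors(css_text):
    """Count how often each hex color literal appears in the CSS text."""
    return Counter(match.lower() for match in HEX_COLOR.findall(css_text))

# Hypothetical stylesheet, standing in for a file fetched with wget.
sample_css = """
body { background: #ffffff; color: #333; }
a { color: #0A9600; }
h1 { color: #0a9600; border-bottom: 1px solid #0a9600; }
"""

counts = count_css_colors(sample_css)
print(counts.most_common(1))
```

As the answer warns, this counts occurrences in the CSS source, not how much of the rendered page each color actually covers.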
My approach would be as follows:
- Look at the colors of <a> tags or <h1> tags (but not if they're grey or black/white).
- Group similar colors: #FFEEEE is the same as #FFEAEA, as they're only marginally different.
- Normalize the different notations for the same color: #FFF, #FFFFFF, "white", rgb(255,255,255), and so on.
- For images, look at each pixel's components: if a pixel is R(120), G(240), B(80), it will most likely be green. Then count this for all the pixels and find the predominant component.
To sum it up, the task you're defining is worth a thesis, in my opinion :)
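The normalization, grouping and per-pixel ideas above could be sketched in Python roughly as follows. Everything here is an assumption made for illustration: the function names, the tiny `NAMED` table, the grey `tolerance`, and the bucket `step` are all hypothetical, and pixels are taken to be plain (r, g, b) tuples obtained from whatever image library you use.

```python
import re
from collections import Counter

# Deliberately incomplete table of named colors (assumption: extend as needed).
NAMED = {"white": (255, 255, 255), "black": (0, 0, 0), "red": (255, 0, 0)}

def parse_color(value):
    """Turn '#FFF', '#FFFFFF', 'white' or 'rgb(255,255,255)' into (r, g, b)."""
    value = value.strip().lower()
    if value in NAMED:
        return NAMED[value]
    if re.fullmatch(r"#[0-9a-f]{3}", value):
        return tuple(int(ch * 2, 16) for ch in value[1:])
    if re.fullmatch(r"#[0-9a-f]{6}", value):
        return tuple(int(value[i:i + 2], 16) for i in (1, 3, 5))
    m = re.fullmatch(r"rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)", value)
    if m:
        return tuple(int(g) for g in m.groups())
    return None  # notation not handled by this sketch

def is_greyish(rgb, tolerance=16):
    """Black, white and greys have nearly equal R, G and B components."""
    return max(rgb) - min(rgb) <= tolerance

def bucket(rgb, step=32):
    """Round each component down so marginally different colors share a bucket."""
    return tuple(c // step * step for c in rgb)

def dominant_component(rgb):
    """Name the largest of the three components."""
    return ("red", "green", "blue")[rgb.index(max(rgb))]

def predominant_component(pixels):
    """Let every non-grey pixel vote for its largest component."""
    votes = Counter(dominant_component(p) for p in pixels if not is_greyish(p))
    return votes.most_common(1)[0][0] if votes else None
```

Fixed-size buckets are the crudest possible way to group similar colors: two values on either side of a bucket boundary still end up apart, so a real implementation would probably want a proper color-distance measure instead.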