User agent: PHP parsing, browscap and nonconventional "browsers"

I am looking for a reliable way to find out which user agent is requesting my PHP page(s). I'm aware of get_browser as well as $_SERVER['HTTP_USER_AGENT'], but neither seems to be reliable.

With get_browser, the browscap directive in php.ini must point to an ini file that defines user agents. PHP recommends the one at http://browsers.garykeith.com/downloads.asp, so I installed the "full" version listed for LAMP.
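A minimal sketch of guarding a get_browser() call, assuming the browscap directive is set in php.ini as described. Without the directive, get_browser() emits a warning and returns false; passing true as the second argument returns an array instead of an object. The helper name browserInfo is hypothetical.

```php
<?php
// Hypothetical helper: returns get_browser() data as an array, or null when
// browscap is not configured or the lookup fails.
function browserInfo(?string $userAgent = null): ?array
{
    // get_browser() needs the browscap php.ini directive to point at an ini file;
    // without it, it warns and returns false, so check that up front.
    $browscap = ini_get('browscap');
    if ($browscap === false || $browscap === '') {
        return null;
    }

    // A null user agent makes get_browser() read $_SERVER['HTTP_USER_AGENT'].
    // Passing true as the second argument returns an array instead of an object.
    $info = get_browser($userAgent, true);
    return is_array($info) ? $info : null;
}
```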

It works well with get_browser for the usual mix of browsers, but I am specifically dealing with requests from MS Office. For those, it seems to return nothing useful, as with Excel on OS X (note: the first line is $_SERVER['HTTP_USER_AGENT'], followed by the array output of get_browser):

Mozilla/5.0 (Macintosh; Intel Mac OS X) Excel/14.0.0

Array
(
    [browser_name_regex] =     ^.*$
    [browser_name_pattern] =     *
    [browser] =     Default Browser
    [version] =     0
    [majorver] =     0
    [minorver] =     0
    [platform] =     unknown
    [alpha] =     
    [beta] =     
    [win16] =     
    [win32] =     
    [win64] =     
    [frames] =     
    [iframes] =     
    [tables] =     
    [cookies] =     
    [backgroundsounds] =     
    [javascript] =     
    [vbscript] =     
    [javaapplets] =     
    [activexcontrols] =     
    [isbanned] =     
    [ismobiledevice] =     
    [issyndicationreader] =     
    [crawler] =     
    [cssversion] =     0
    [aolversion] =     0
)
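The output above is browscap's catch-all entry (pattern `*`, browser "Default Browser"), which is what get_browser falls back to when nothing more specific matches. A small sketch for spotting that case, using only the keys shown in the output; the function name isUnmatchedBrowser is hypothetical, and it cannot catch the Windows case below, where the request does match a real (IE 7.0) entry.

```php
<?php
// Hypothetical check: true when get_browser() only matched browscap's
// catch-all entry, i.e. the user agent was not recognised at all.
function isUnmatchedBrowser(array $info): bool
{
    // Keys taken from the get_browser() output above; missing keys are
    // treated as the catch-all values.
    return ($info['browser_name_pattern'] ?? '*') === '*'
        || ($info['browser'] ?? 'Default Browser') === 'Default Browser';
}
```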

To make matters worse, in some Windows cases the user agent does not even mention Office:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)

Array
(
    [browser_name_regex] =     ^mozilla/4\.0 (compatible; msie 7\.0.*; .*windows nt 6\.1.*).*$
    [browser_name_pattern] =     Mozilla/4.0 (compatible; MSIE 7.0*; *Windows NT 6.1*)*
    [parent] =     IE 7.0
    [platform] =     Win7
    [browser] =     IE
    [version] =     7.0
    [majorver] =     7
    [win32] =     1
    [frames] =     1
    [iframes] =     1
    [tables] =     1
    [cookies] =     1
    [backgroundsounds] =     1
    [javascript] =     1
    [vbscript] =     1
    [javaapplets] =     1
    [activexcontrols] =     1
    [cssversion] =     2
    [minorver] =     0
    [alpha] =     
    [beta] =     
    [win16] =     
    [win64] =     
    [isbanned] =     
    [ismobiledevice] =     
    [issyndicationreader] =     
    [crawler] =     
    [aolversion] =     0
)

Judging by these examples, get_browser actually seems less reliable here, and more information can be gathered from $_SERVER['HTTP_USER_AGENT'], which at least includes a number of .NET references for the Office requests.

With that in mind, can anyone point me to a well-written function that breaks down $_SERVER['HTTP_USER_AGENT']? Every search I've run ends with a recommendation to use get_browser instead.
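As a starting point, a rough sketch of breaking the raw string down, loosely following the User-Agent syntax in RFC 7231 (section 5.5.3): a list of `name/version` product tokens and parenthesised comments. It is not a full parser (nested parentheses are not handled), and the function name parseUserAgent is hypothetical.

```php
<?php
// Hypothetical helper: split a User-Agent string into product tokens
// ("Name/version") and the semicolon-separated items of "(...)" comments.
function parseUserAgent(string $ua): array
{
    $products = [];
    $comments = [];

    // Match either a "(...)" comment (group 1) or a run of
    // non-space, non-parenthesis characters (group 2).
    preg_match_all('/\(([^)]*)\)|([^\s()]+)/', $ua, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        if (isset($m[2]) && $m[2] !== '') {
            // Product token, e.g. "Excel/14.0.0" => name "Excel", version "14.0.0".
            $parts = explode('/', $m[2], 2);
            $products[] = ['name' => $parts[0], 'version' => $parts[1] ?? null];
        } else {
            // Comment: split on ";" and keep each non-empty, trimmed item.
            foreach (explode(';', $m[1]) as $item) {
                $item = trim($item);
                if ($item !== '') {
                    $comments[] = $item;
                }
            }
        }
    }

    return ['products' => $products, 'comments' => $comments];
}
```

For the OS X Excel request this yields the products Mozilla/5.0 and Excel/14.0.0 plus the comments "Macintosh" and "Intel Mac OS X"; for the Windows request the Office-free comment list (MSIE 7.0, Trident/5.0, the .NET CLR entries, and so on) is all there is to work with.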

Any thoughts on why get_browser fails to identify MS Office at all on Windows-based installations are welcome too. Here are 10 tests of various users clicking links in different Office apps ($_SERVER value first, then the get_browser result): http://pastebin.com/5m2zWMrt. Notice the lack of any sign of Office after the first three examples, which are from OS X. I also asked a related question on MSDN: http://social.msdn.microsoft.com/Forums/en-US/officegeneral/thread/8ad594cd-0dfe-4110-8ffc-4d0caee4c29f

To sum up, I'd like a short-term solution built on a good parser of $_SERVER['HTTP_USER_AGENT'], ideally one that can tell whether the request is coming from MS Office. In the long term, I need to figure out why get_browser doesn't work with MS Office despite an up-to-date ini file that includes Office data.
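A heuristic sketch of the "is this MS Office?" check. Only the `Excel/` marker comes from the samples in the question; every other marker in the list is an assumption and should be checked against real logs (such as the pastebin above). The names OFFICE_UA_MARKERS and looksLikeMsOffice are hypothetical.

```php
<?php
// ASSUMPTION: apart from "Excel/" (seen in the OS X sample), these markers
// are guesses and must be verified against your own request logs.
const OFFICE_UA_MARKERS = ['Excel/', 'Word/', 'PowerPoint/', 'ms-office', 'MSOffice', 'Microsoft Office'];

// Heuristic: true if the user agent contains any of the markers above.
function looksLikeMsOffice(string $ua): bool
{
    foreach (OFFICE_UA_MARKERS as $marker) {
        // Case-insensitive substring search.
        if (stripos($ua, $marker) !== false) {
            return true;
        }
    }
    return false;
}
```

This flags the OS X Excel request but returns false for the Windows request quoted above: when the user agent never mentions Office, no string check can recover it.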

Answer

It's simply not possible for the server to reliably tell which browser or app it's talking to, because the user agents it receives are, as you've discovered for yourself, unreliable to say the least. It's fairly easy to write a user agent parser for the most-used browsers, but what about the rest? MS Office, for instance?
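To illustrate, here is a deliberately naive "most-used browsers" parser (the name guessBrowser and its token list are illustrative, not from any library). Fed the two requests from the question, it reports the Windows Office request as plain IE 7.0 and the OS X Excel request as unknown.

```php
<?php
// Deliberately naive sketch: check a few well-known browser tokens in order
// and return the first hit as "Name version". The token list is illustrative.
function guessBrowser(string $ua): string
{
    $patterns = [
        'Firefox' => '#Firefox/([\d.]+)#',
        'Chrome'  => '#Chrome/([\d.]+)#',
        'IE'      => '#MSIE ([\d.]+)#',
        'Safari'  => '#Version/([\d.]+).*Safari/#',
    ];
    foreach ($patterns as $name => $regex) {
        if (preg_match($regex, $ua, $m)) {
            return $name . ' ' . $m[1];
        }
    }
    return 'unknown';
}
```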

So no, just don't try to guess the browser on the server side, because that's what you're doing: guessing, not knowing.

Why do you need to know the browser's make and model anyway? If it's to adapt the page to the user's browser, you should instead use conditional comments (for IE-specific CSS) and/or test with JavaScript which features can be relied on. Be creative and try everything else; just don't try to guess the browser.