Given a SPARQL / SPARQL Update endpoint, how can I empirically test which RDFS/RDFS+/OWL 1/2 * (and maybe SPIN) capabilities the underlying RDF store offers? Is there a readily available set of queries?
That is, after running those queries, the store's supported inference capabilities (expressiveness) and its level of compliance could be guessed from the triples returned.
This seems like a really trivial idea (and an expert could probably come up with such a set of queries without much thought), but I can't find such material anywhere.
I'm not sure whether the SPARQL Service Description is relevant here; in any case, declaring a capability is one thing and actually delivering it is another.
I think you should have a look at SPARQL benchmarks. The DBpedia SPARQL Benchmark also does SPARQL feature analysis, although I don't know whether it covers all of the features you are looking for.
Other SPARQL benchmarks would be the Lehigh University Benchmark (LUBM), the Berlin SPARQL Benchmark (BSBM) or SP²Bench.
Further, you could take a look at LODStats, which gathers statistics about datasets. While it is currently focused on analyzing data dumps, the code already contains some capability for gathering statistics directly from SPARQL endpoints. You should be able to extend the code to also test some SPARQL features or features of the underlying triple store.
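If nothing ready-made fits, the kind of probe the question describes can be sketched by hand: insert a few premise triples into a scratch graph, ASK whether the entailed triple shows up, then clean up. The following is only a minimal Python sketch under stated assumptions, not a compliance suite. The graph IRI, the `ex:` namespace, the probe list and all function names are made up for illustration; only the SPARQL 1.1 Protocol conventions (form-encoded `query=` / `update=` POST parameters, the `boolean` field in JSON ASK results) are standard. It also assumes the endpoint allows updates and materializes or answers inferences without further configuration, which many stores do not do by default.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical scratch graph; pick an IRI that cannot clash with real data.
GRAPH = "http://example.org/probe-graph"

PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
    "PREFIX ex: <http://example.org/probe#>\n"  # hypothetical namespace
)


@dataclass(frozen=True)
class Probe:
    name: str      # construct being tested
    premises: str  # triples asserted explicitly
    expected: str  # triple that only a reasoner would add


# Illustrative selection only; extend with whatever constructs matter to you.
PROBES: List[Probe] = [
    Probe("rdfs:subClassOf",
          "ex:Cat rdfs:subClassOf ex:Animal . ex:tom a ex:Cat .",
          "ex:tom a ex:Animal ."),
    Probe("rdfs:subPropertyOf",
          "ex:hasMother rdfs:subPropertyOf ex:hasParent . ex:c ex:hasMother ex:m .",
          "ex:c ex:hasParent ex:m ."),
    Probe("rdfs:domain",
          "ex:teaches rdfs:domain ex:Teacher . ex:t ex:teaches ex:course .",
          "ex:t a ex:Teacher ."),
    Probe("owl:inverseOf",
          "ex:hasChild owl:inverseOf ex:childOf . ex:p ex:hasChild ex:q .",
          "ex:q ex:childOf ex:p ."),
    Probe("owl:TransitiveProperty",
          "ex:partOf a owl:TransitiveProperty . ex:x ex:partOf ex:y . ex:y ex:partOf ex:z .",
          "ex:x ex:partOf ex:z ."),
    Probe("owl:sameAs",
          'ex:s1 owl:sameAs ex:s2 . ex:s1 ex:label "n" .',
          'ex:s2 ex:label "n" .'),
]


def insert_update(p: Probe) -> str:
    return f"{PREFIXES}INSERT DATA {{ GRAPH <{GRAPH}> {{ {p.premises} }} }}"


def ask_query(p: Probe) -> str:
    # Look in the default graph and in any named graph, because stores
    # differ in where they expose inferred triples.
    return (f"{PREFIXES}ASK {{ {{ {p.expected} }} "
            f"UNION {{ GRAPH ?g {{ {p.expected} }} }} }}")


def cleanup_update() -> str:
    return f"DROP SILENT GRAPH <{GRAPH}>"


def run_probes(ask: Callable[[str], bool],
               update: Callable[[str], None],
               probes: List[Probe] = PROBES) -> Dict[str, bool]:
    """Run each probe in isolation and report whether the entailment was seen."""
    results: Dict[str, bool] = {}
    for p in probes:
        update(cleanup_update())
        update(insert_update(p))
        try:
            results[p.name] = ask(ask_query(p))
        finally:
            update(cleanup_update())
    return results


def http_ask(endpoint: str) -> Callable[[str], bool]:
    """ASK over HTTP using a form-encoded POST (SPARQL 1.1 Protocol)."""
    def ask(query: str) -> bool:
        data = urllib.parse.urlencode({"query": query}).encode()
        req = urllib.request.Request(
            endpoint, data=data,
            headers={"Accept": "application/sparql-results+json"})
        with urllib.request.urlopen(req) as resp:
            return bool(json.load(resp)["boolean"])
    return ask


def http_update(endpoint: str) -> Callable[[str], None]:
    """Update over HTTP using a form-encoded POST (SPARQL 1.1 Protocol)."""
    def update(request: str) -> None:
        data = urllib.parse.urlencode({"update": request}).encode()
        with urllib.request.urlopen(urllib.request.Request(endpoint, data=data)):
            pass
    return update
```

Something like `run_probes(http_ask(query_url), http_update(update_url))` would then return a dict such as `{"rdfs:subClassOf": True, ...}`, where `query_url` and `update_url` are placeholders for your store's endpoints. A `False` only tells you the entailment was not visible through this endpoint; it may simply mean reasoning is switched off or needs a rule set configured, so treat the result as a hint rather than a verdict.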