Using regular expressions makes it quite easy to capture sub strings, e.g. the string "Jaco was an American bassist" matches this regular expression (PCRE2 syntax):
(?sm)^([Jj]aco).+(was|is).+?(American|famous).+(dancer|bassist|singer|programmer|dueller)
and captures these strings
- Jaco
- was
- American
- bassist.
Here is a DCG that matches the string as well as generating all the possible strings. But it doesn't capture the specific sub strings.
jaco_bassist --> ("J" ; "j"), "aco", space, ("was" ; "is"), space, ("a" ; "an"), space,
("American" ; "famous"), space,
("dancer" ; "bassist" ; "singer" ; "programmer" ; "dueller").
space --> " ".
What would be the best - or at last a good - way of getting the same captures using Prolog's DCGs. Preferably an approach that also generates the possible strings.
For simple problems like this one can use member/2 to enumerate all the alternatives:
jaco_bassist2([Name,WasIs,Adj,Noun]) --> who(Name), space, was_is(WasIs), space,
("a" ; "an"), space, adj(Adj), space,
noun(Noun).
who(Who) --> [Who], {member(Who,["Jaco","jaco"])}.
was_is(WasIs) --> [WasIs], {member(WasIs,["was","is"])}.
adj(Adj) --> [Adj], {member(Adj,["American","famous"])}.
noun(Noun) --> [Noun], {member(Noun,["dancer","bassist","singer","programmer","dueller"])}.
To get the captures:
% ...
phrase(jaco_bassist2,[Who,WasIs,Adj,Noun], String)
A major drawback of this approach is that for more complex structures the enumeration can be a little tricky, for example if the name in the subject string instead of "[Jj]aco" would be one of the 48 spellings of my last name (kjellerstrand):
kjellerstrand --> "k", ("je" ; "ä"), "ll", ("" ; "er" ; "ar"),
("st" ; "b"), ("" ; "r"), "a", (""; "n"), "d".
Please note that I'm looking for "basic" DCG, for example those supported by e.g. B-Prolog (i.e. not requiring SWI-Prolog's fancy DCG stuff).
Let me re-
phrasethat: Given a goalphrase(NT__0, Cs0,Cs), capture the sequence described byNT__0.First of all we need to restrict ourselves to DCGs without semicontext. For a (non-empty) semicontext may be represented with two variables (which in that context do not form a difference) but cannot be captured with a single list.
append(Capture, Cs, Cs0)should be it. At least declaratively when considering only ground terms.So far, the procedural reality of Prolog is a bit different.
append/3only works for lists but not for partial lists. There infinite, rational trees show up. And the occurs-check does not help that much, it just prevents the display of such answers, but keeps non-termination.Time for a new version of
append/3,append2u/3So with the help of the occurs-check it is possible to get this right, also for the more general case. A new non-terminal
phrase_capture//2now uses the following internal definition:For systems without a built-in occurs-check like B, rewrite
append2u/3usingunify_with_occurs_check/2explicitly. That is, also for(\=)/2.Some further optimizations may be done to avoid costs that depend on the size of
Cs0+Csinstead of the length ofCapture. Like special casing forvar(Cs),Cs == [], and partial strings. IfCsis a list constructor, an internal implementation may also just skip throughCs0to find that very address ofCsfirst, and only resort to more costly means otherwise. Care must be given to ensure that this is always terminating, thus using mechanisms similar to'$skip_max_list'/4.Also, what to do if
Cs0andCsdo not fit, that is, if they are not the result of a valid grammar. Such a case may happen with generalizations to explain unexpected failure.Usage:
So this version terminates also when generating strings. And it has the potential to incur costs that are in many cases only depending on the length of the captured string. The original version using
append/3will always visit the entire string.Lest I forget, there will always be some oddities should you be into infinite lists. Think of:
These are all typical paradoxa that infinite lists ensue. First luring people into them only to smother them.
The following version of
phrase_capture//2does not rely on internal details. It uses the^s oflibrary(lambda)which are responsible for parameter passing only. (The other lambda-related construct\is for renaming.)