I'm using the yaml-cpp library to parse yaml. Abbreviated sample:
YAML::Node def = YAML::LoadFile(defFile);
for (auto itemPair = def.begin(); itemPair != def.end(); ++itemPair) {
// Grab a reference so `itemPair->second` doesn't need to be copied all over the place
auto& item = itemPair->second;
// A few instances of the below in series
if (item["key"].IsDefined()) { doSomething(item["key"].as<std::string>()); }
// Problem happens here
if (item["issue"].IsDefined()) {
if (!item["issue"].IsMap()) { continue; }
for (auto x = item["issue"].begin(); x != item["issue"].end(); ++x) {
LOG(INFO) << "Type before: " << item.Type() << " : " << itemPair->second.Type();
auto test = x->first.as<std::string>();
LOG(INFO) << "Type after: " << item.Type() << " : " << itemPair->second.Type();
// Using item as a map fails because it no longer is one!
// Next loop attempt also crashes when it attempts to use [] on item.
}
}
The problem happens in the nested loop, where the reference taken at the beginning of the snippet suddenly changes, however the variable it's referencing seems to be unaffected:
I1218 12:44:04.697798 296012 main.cpp:123] Type before: 4 : 4
I1218 12:44:04.697813 296012 main.cpp:125] Type after: 2 : 4
My understanding of references is that they act as an alias for another variable. I understand the yaml library might be doing some magic behind the scenes which would change the underlying data, but I can't comprehend why the reference seems to be getting updated yet the original value remains.
Edit: Some serious mind-blowing behaviour is happening here. The reference gets "reset" back to the correct value after any call to itemPair->second.Type()
. Thus if I add another log call:
LOG(INFO) << "Type after: " << item.Type() << " : " << itemPair->second.Type();
LOG(INFO) << "Type afterer: " << item.Type() << " : " << itemPair->second.Type();
The result:
I1218 12:58:59.965732 297648 main.cpp:123] Type before: 4 : 4
I1218 12:58:59.965752 297648 main.cpp:125] Type after: 2 : 4
I1218 12:58:59.965766 297648 main.cpp:126] Type afterer: 4 : 4
Reproducible example:
test.yaml
:
---
one:
key: x
issue:
first: 1
two:
key: y
issue:
first: 1
second: 2
main.cpp
same as above but with hardcoded test.yaml
, LOG
swapped for std::cout
, and the mock function:
#include <iostream>
#include <yaml-cpp/yaml.h>
void doSomething(std::string x) { std::cout << "Got key: " << x << std::endl; }
int main() {
YAML::Node def = YAML::LoadFile("test.yaml");
for (auto itemPair = def.begin(); itemPair != def.end(); ++itemPair) {
// Grab a reference so `itemPair->second` doesn't need to be copied all over the place
auto& item = itemPair->second;
// A few instances of the below in series
if (item["key"].IsDefined()) { doSomething(item["key"].as<std::string>()); }
// Problem happens here
if (item["issue"].IsDefined()) {
if (!item["issue"].IsMap()) { continue; }
for (auto x = item["issue"].begin(); x != item["issue"].end(); ++x) {
std::cout << "Type before: " << item.Type() << " : " << itemPair->second.Type() << std::endl;
auto test = x->first.as<std::string>();
std::cout << "Type after: " << item.Type() << " : " << itemPair->second.Type() << std::endl;
std::cout << "Type afterer: " << item.Type() << " : " << itemPair->second.Type() << std::endl;
// Using item as a map fails because it no longer is one!
// Next loop attempt also crashes when it attempts to use [] on item.
}
}
}
}
Result:
$ ./build/out
Got key: x
Type before: 4 : 4
Type after: 2 : 4
Type afterer: 4 : 4
Got key: y
Type before: 4 : 4
Type after: 2 : 4
Type afterer: 4 : 4
Type before: 4 : 4
Type after: 2 : 4
Type afterer: 4 : 4
Node
is designed to hold a reference, and the iterator behaves like a pointer tostd::pair<Node, Node>
and will return a temporaryNode
. If you bind to thatNode
, you will bind to a destroyedNode
. So you need a copy here. Changeauto&
toauto
will solve the problem.It is designed this way, because it doesn't want you to touch the underneath memory. Otherwise a possible dangling reference could be made when reallocating the memory.
An example for dangling reference:
Also, I wrote why it is designed this way. See here: https://github.com/jbeder/yaml-cpp/issues/977#issuecomment-771041297 and I'll just copy it here.
Why reference is UB here.
When using
->
, the iteratoriter
creates a temporary derefence result on the stack, returns its pointer, and destroys this object immediately after scope.This is to make
iter->second
behaves alike the(*iter).second
.It is hard to decide when to destroy that object if you put that deref result on the heap.
The behavior is expected to be the same as
(*iter).second
. But(*iter).second
is a rvalue, the compiler won't allowauto&
. That's not the case in theiter->second
, because the compiler thinksiter->second
as a lvalue.C++ standards makes
p->m
, the built-in member of pointer expression, a lvalue. So there's no way to forbid binding to reference.In conclusion, the behavior is correct when
Here are some possible amendments for the author:
detail::iterator_value
object long living or just simply leak the memory.operator->()
.auto
.Method 1 can cause much trouble. Method 2, 3 are good solutions, I think.
Why copy works like a reference here.
Node
.Node
is the public interface of the underlying memory. It is designed to become polymorphism. And the real type of the underlyding data is decided during running, which is unknown in the compile time. Soauto& list = iter->second
is impossible to bind to the correct underlying type.This could be done with some efforts. It will be something like
but still not that convenient to use.
You cannot bind to it. It only allows you to copy, not write.
Node
ensures you to use his interface to assign the value. It's quite important because assigning a data means 3 or more things to do.std::pair, std::array, std::list, std::vector, std::map, bool, Binary
NodeType
.isDefined
.When reading, it also need to decode if the data was encoded. So it shouldn't give you direct write/read access.
Copy works like a reference is a must in the current design.
Conclusion
Use
auto iter = iter->first;
or use(*iter).first
.