Closed Bug 1201314 Opened 9 years ago Closed 9 years ago

Allow considering classes that aren't ours (e.g., in std::) as if MOZ_NON_MEMMOVABLE

Categories

(Developer Infrastructure :: Source Code Analysis, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla44

People

(Reporter: jld, Assigned: jld)

References

Details

Attachments

(1 file, 2 obsolete files)

The analysis in bug 1159433, which prevents known non-memmove()-safe classes from being used to instantiate containers that would memmove() them, would be more powerful if we could apply it to classes we don't control, even if it's just a static blacklist in the plugin source.

For example, the stlport implementation of std::basic_string, and possibly others, uses an nsAutoCString-like representation for short strings; bug 1198979 comment #0 has more information.

(What would be *really* nice would be if we could detect non-memmove()-safety via an escape analysis, but manual blacklisting is much easier to implement.  Also, this lets us use a Linux desktop build to detect issues for other platforms.)
(In reply to Jed Davis [:jld] {UTC-7} from comment #0)
> The analysis in bug 1159433, which prevents known non-memmove()-safe classes
> from being used to instantiate containers that would memmove() them, would
> be more powerful if we could apply it to classes we don't control, even if
> it's just a static blacklist in the plugin source.
> 
> For example, the stlport implementation of std::basic_string, and possibly
> others, uses an nsAutoCString-like representation for short strings; bug
> 1198979 comment #0 has more information.
> 
> (What would be *really* nice would be if we could detect
> non-memmove()-safety via an escape analysis, but manual blacklisting is much
> easier to implement.  Also, this lets us use a Linux desktop build to detect
> issues for other platforms.)

I think the best option for us right now would be a blacklist directly within the plugin. Anything else is adding too much complexity without much benefit.
I'm not sure if subclassing CustomTypeAnnotation is the right way to do this, and I'm especially not sure if I'm interrogating the LLVM data structures in the right way.  (But it works.)
Attachment #8658926 - Flags: review?(ehsan)
Attachment #8658926 - Flags: feedback?(michael)
Also, this patch is based on top of the one from bug 1201309; they'll conflict if applied out of order.
Comment on attachment 8658926 [details] [diff] [review]
Patch: consider std::basic_string non-memmovable

Review of attachment 8658926 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/clang-plugin/clang-plugin.cpp
@@ +269,3 @@
>    bool hasLiteralAnnotation(QualType T) const;
> +  // Allow subclasses to apply annotations to external code:
> +  virtual bool hasFakeAnnotation(const CXXRecordDecl *D) { return false; }

Nit: please indent properly.  (You can use clang-format with LLVM style.)

@@ +290,5 @@
> +protected:
> +  bool hasFakeAnnotation(const CXXRecordDecl *D) override {
> +    // ...is there an easier way to test if this is `std::basic_string`?
> +    if (const auto* DC = D->getDeclContext()) {
> +      if (const auto* NC = dyn_cast<NamespaceDecl>(DC)) {

Please use getDeclarationNamespace() instead.
Attachment #8658926 - Flags: review?(ehsan) → feedback+
Comment on attachment 8658926 [details] [diff] [review]
Patch: consider std::basic_string non-memmovable

Review of attachment 8658926 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/clang-plugin/clang-plugin.cpp
@@ +264,5 @@
>        dumpAnnotationReason(Diag, T, Loc);
>      }
>    }
>  
> +protected:

Only hasFakeAnnotation has to be protected, I feel like we can leave the rest private.

@@ +269,3 @@
>    bool hasLiteralAnnotation(QualType T) const;
> +  // Allow subclasses to apply annotations to external code:
> +  virtual bool hasFakeAnnotation(const CXXRecordDecl *D) { return false; }

I'd generalize this to use TagDecl *D rather than CXXRecordDecl *D. That way we also cover things like Enums in the future.

Also, I see no reason for this to not be `const`, so I'd mark it as such. (Especially important if we do as I suggest and move the callsite into hasLiteralAnnotation).

@@ +288,5 @@
> +  MemMoveAnnotation()
> +      : CustomTypeAnnotation("moz_non_memmovable", "non-memmove()able") {}
> +protected:
> +  bool hasFakeAnnotation(const CXXRecordDecl *D) override {
> +    // ...is there an easier way to test if this is `std::basic_string`?

I _think_ you can do `D->getQualifiedNameAsString() == "::std::basic_string"`, but I haven't tested it. I'd `printf("%s\n", D->getQualifiedNameAsString().c_str());` to check what is dumped out, and then figure out what check has to happen here.

@@ +292,5 @@
> +    // ...is there an easier way to test if this is `std::basic_string`?
> +    if (const auto* DC = D->getDeclContext()) {
> +      if (const auto* NC = dyn_cast<NamespaceDecl>(DC)) {
> +        bool isStd = false;
> +        if (const auto* NCI = NC->getIdentifier()) {

LLVM style has *s with the variable name.

@@ +314,5 @@
> +  }
> +};
> +
> +static MemMoveAnnotation NonMemMovable = MemMoveAnnotation();
> +

extra space

@@ +800,5 @@
>    }
>  
>    // Recurse into base classes
>    if (const CXXRecordDecl *Decl = T->getAsCXXRecordDecl()) {
> +    if (hasFakeAnnotation(Decl)) {

I feel like the logic of "does this type have an annotation" should be kept in the hasLiteralAnnotation method. I'd call hasFakeAnnotation there.
Attachment #8658926 - Flags: feedback?(michael) → feedback+
Some other std:: containers usually are implemented as objects that will break when memmoved. Some that come to mind are std::list (usually implemented as a doubly-linked list with the root object pointed back to by the head and tail elements), std::set/multiset/map/multimap (usually implemented as red/black trees with parent pointers), and std::unordered_set/_multiset/_map/_multimap/ (sometimes includes std::list as member).
We should probably deal with all of those classes here too.
(In reply to Ehsan Akhgari (don't ask for review please) from comment #7)
> We should probably deal with all of those classes here too.

I briefly experimented with blacklisting all of std::, with an earlier version of the patch where the non-memmovable check still used its own traversal instead of CustomTypeAnnotation  If I recall correctly, the only exceptions I had to make to get my local build to finish were std::pair, std::atomic, and superclasses/members of std::atomic (but possibly I could have gotten away with just whitelisting std::__atomic_base).  Since I'm going to have to make a bunch of changes to the patch anyway, I can try to make that work.
Sounds good.  Note that I think we shouldn't just blacklist std::*, instead we should figure out what things are actually non-memmovable and only blacklist those.  I don't want this analysis to start become overly conservative.
(In reply to Ehsan Akhgari (don't ask for review please) from comment #9)
> Sounds good.  Note that I think we shouldn't just blacklist std::*, instead
> we should figure out what things are actually non-memmovable and only
> blacklist those.  I don't want this analysis to start become overly
> conservative.

This does require examining all supported STL implementations, and there is some risk that a new release of an implementation will invalidate previous findings. Also, not all STL implementations do things the same way. For example, STLPort implements std::list differently from VS.
How about blacklisting STL objects with non-trivial copy constructors? Would that work?
(In reply to Michael Layzell [:mystor] from comment #11)
> How about blacklisting STL objects with non-trivial copy constructors? Would
> that work?

That would disallow all types that are actually broken by memmoving, but it would also disallow every type for which memmoving violates the C++ standard but does not actually cause bad behavior, which was a concern in comment 9.
(In reply to q1 from comment #12)
> (In reply to Michael Layzell [:mystor] from comment #11)
> > How about blacklisting STL objects with non-trivial copy constructors? Would
> > that work?
> 
> That would disallow all types that are actually broken by memmoving, but it
> would also disallow every type for which memmoving violates the C++ standard
> but does not actually cause bad behavior, which was a concern in comment 9.

I think that our problem is that we don't ever truly _know_ the implementation of STL classes, so we should treat them as black boxes. If they have a non-trivial copy constructor, that means that they are black boxes which, per spec, "require some code to be run" to copy them around. That code could be equivalent to memmove, or it could be something more complex, but per the spec we can't know what it is.

Thus, I think we have to assume that memmoving std:: classes with non-trivial copy constructors is unsafe, as we don't know the implementation. My thought behind this idea was to be slightly less over-conservative than ehsan was worrying. Based on the fact that jld ran into so few problems with his attempt to make all std:: classes non-memmovable, I don't think it would end up being a big problem, and we can whitelist things if we need to.

I'm not super opinionated on this though - I'd be fine with just keeping a blacklist.
(In reply to Michael Layzell [:mystor] from comment #13)
> (In reply to q1 from comment #12)
> > (In reply to Michael Layzell [:mystor] from comment #11)
> > > How about blacklisting STL objects with non-trivial copy constructors? Would
> > > that work?
> > 
> > That would disallow all types that are actually broken by memmoving, but it
> > would also disallow every type for which memmoving violates the C++ standard
> > but does not actually cause bad behavior, which was a concern in comment 9.
> 
> I think that our problem is that we don't ever truly _know_ the
> implementation of STL classes, so we should treat them as black boxes....
> Thus, I think we have to assume that memmoving std:: classes with
> non-trivial copy constructors is unsafe, as we don't know the
> implementation.

I agree.

> My thought behind this idea was to be slightly less
> over-conservative than ehsan was worrying. Based on the fact that jld ran
> into so few problems with his attempt to make all std:: classes
> non-memmovable, I don't think it would end up being a big problem, and we
> can whitelist things if we need to.

That seems like a good idea.
(In reply to Michael Layzell [:mystor] from comment #13)
> (In reply to q1 from comment #12)
> > (In reply to Michael Layzell [:mystor] from comment #11)
> > > How about blacklisting STL objects with non-trivial copy constructors? Would
> > > that work?
> > 
> > That would disallow all types that are actually broken by memmoving, but it
> > would also disallow every type for which memmoving violates the C++ standard
> > but does not actually cause bad behavior, which was a concern in comment 9.
> 
> I think that our problem is that we don't ever truly _know_ the
> implementation of STL classes

That's not true.  We already have code to detect stlport, libstdc++ and libc++ <http://mxr.mozilla.org/mozilla-central/source/mfbt/Compiler.h#97>.  For Dinkumware, we can use _CPPLIB_VER.  I don't think we need to worry about the specific versions of these libraries.

(Note that I'm not asking Jed to do all of that work in this bug.  In general I think if std::foo is non-memmovable is one of these libraries, it should be considered as non-memmovable in all since most of our code is platform independent, and we don't run the static analysis checks in all builds.  IOW, what he has here is fine.)
Thanks for the review.  A few comments:

(In reply to Michael Layzell [:mystor] from comment #5)
> @@ +269,3 @@
> >    bool hasLiteralAnnotation(QualType T) const;
> > +  // Allow subclasses to apply annotations to external code:
> > +  virtual bool hasFakeAnnotation(const CXXRecordDecl *D) { return false; }
> 
> I'd generalize this to use TagDecl *D rather than CXXRecordDecl *D. That way
> we also cover things like Enums in the future.

Is that safe?  The other use of TagDecl is guarded by #if CLANG_VERSION_FULL >= 306.

> @@ +288,5 @@
> > +  MemMoveAnnotation()
> > +      : CustomTypeAnnotation("moz_non_memmovable", "non-memmove()able") {}
> > +protected:
> > +  bool hasFakeAnnotation(const CXXRecordDecl *D) override {
> > +    // ...is there an easier way to test if this is `std::basic_string`?
> 
> I _think_ you can do `D->getQualifiedNameAsString() ==
> "::std::basic_string"`, but I haven't tested it. I'd `printf("%s\n",
> D->getQualifiedNameAsString().c_str());` to check what is dumped out, and
> then figure out what check has to happen here.

The other thing I was worried about there was efficiency, given that this will be called on every type — especially if I make this part of hasLiteralAnnotation, which currently isn't cached (and is called before checking the cache).  The current version is kind of verbose, but the comments on the LLVM interfaces suggested it should be relatively fast.
And there's one more std:: thing I forgot about: one of the uses of std::vector in the graphics code.  I don't know if any std::vector implementations are known to be non-memmove()-safe, but we normally prefer nsTArray anyway, so I'll see if the graphics people agree.
(In reply to Jed Davis [:jld] from comment #17)
> And there's one more std:: thing I forgot about: one of the uses of
> std::vector in the graphics code.  I don't know if any std::vector
> implementations are known to be non-memmove()-safe....

Well, if you build with debug libraries (not just "DEBUG") under VS, the compiler adds self-pointers to std::vector objects, and to some other std::thingies that I forget right now.
(In reply to Jed Davis [:jld] from comment #16)
> Thanks for the review.  A few comments:
> 
> (In reply to Michael Layzell [:mystor] from comment #5)
> > @@ +269,3 @@
> > >    bool hasLiteralAnnotation(QualType T) const;
> > > +  // Allow subclasses to apply annotations to external code:
> > > +  virtual bool hasFakeAnnotation(const CXXRecordDecl *D) { return false; }
> > 
> > I'd generalize this to use TagDecl *D rather than CXXRecordDecl *D. That way
> > we also cover things like Enums in the future.
> 
> Is that safe?  The other use of TagDecl is guarded by #if CLANG_VERSION_FULL
> >= 306.

The TagDecl getter is unfortunately from a more recent version of clang than our minimum supported version, which is why we guard it, such that the plugin compiles with an older version. 
TBH, I should polyfill the TagDecl getter for older versions of clang, so that we can have consistent behavior across versions.

> > @@ +288,5 @@
> > > +  MemMoveAnnotation()
> > > +      : CustomTypeAnnotation("moz_non_memmovable", "non-memmove()able") {}
> > > +protected:
> > > +  bool hasFakeAnnotation(const CXXRecordDecl *D) override {
> > > +    // ...is there an easier way to test if this is `std::basic_string`?
> > 
> > I _think_ you can do `D->getQualifiedNameAsString() ==
> > "::std::basic_string"`, but I haven't tested it. I'd `printf("%s\n",
> > D->getQualifiedNameAsString().c_str());` to check what is dumped out, and
> > then figure out what check has to happen here.
> 
> The other thing I was worried about there was efficiency, given that this
> will be called on every type — especially if I make this part of
> hasLiteralAnnotation, which currently isn't cached (and is called before
> checking the cache).  The current version is kind of verbose, but the
> comments on the LLVM interfaces suggested it should be relatively fast.

From what I have found, you can do some pretty expensive things in the clang plugin and not really notice the performance hit. Doing a bunch of string comparisons is probably a drop in the bucket. That being said, getQualifiedNameAsString takes a lot more work than getName() (which returns a StringRef), so we could just getName and check for "basic_string", and then if it matches, do the full getQualifiedNameAsString to make sure that it's in std:: (which it almost certainly is - I don't think we have many basic_strings in non-std:: namespaces flying around).

(In reply to q1 from comment #18)
> (In reply to Jed Davis [:jld] from comment #17)
> > And there's one more std:: thing I forgot about: one of the uses of
> > std::vector in the graphics code.  I don't know if any std::vector
> > implementations are known to be non-memmove()-safe....
> 
> Well, if you build with debug libraries (not just "DEBUG") under VS, the
> compiler adds self-pointers to std::vector objects, and to some other
> std::thingies that I forget right now.

The fact that some implementations do this makes me want to be super-safe and consider everything std:: non-memmovable unless it's on a whitelist, but that may be overzealous.
(In reply to Michael Layzell [:mystor] from comment #19)
...
> > Well, if you build with debug libraries (not just "DEBUG") under VS, the
> > compiler adds self-pointers to std::vector objects, and to some other
> > std::thingies that I forget right now.
> 
> The fact that some implementations do this makes me want to be super-safe
> and consider everything std:: non-memmovable unless it's on a whitelist, but
> that may be overzealous.

I'd tend towards super-safe as well. I do wonder what the real-world performance impact of whitelisting would be, but I don't know whether Mozilla has a comprehensive-enough performance-test suite to estimate that, or even whether such a suite is practical to assemble.
With a whitelist the name handling is a little different — I need to know whether identifiers that *don't* have certain names are in ::std, so the optimization suggested in comment #19 doesn't help much.  But also, the code I have now is potentially a little wrong, because it will miss anything under ::std that's in a nested namespace / class / etc.  Which brings us back to Ehsan's advice from comment #4 and getDeclarationNamespace, which gets the outermost namespace.
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b547e4ac39b2 (on top of bug 1213491)

The exceptions are std::{pair,atomic,__atomic_base}.  (I can reformat the whitelist in the code vertically like it was before, but clang-format wanted it horizontal.)
Attachment #8658926 - Attachment is obsolete: true
Attachment #8673393 - Flags: review?(michael)
Attachment #8673393 - Flags: review?(ehsan)
Comment on attachment 8673393 [details] [diff] [review]
Patch: consider all of ::std nonmovable with a few exceptions

Review of attachment 8673393 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, though I think hasLiteralAnnotation and directAnnotationReason should still be private.

::: build/clang-plugin/clang-plugin.cpp
@@ -268,5 @@
>        dumpAnnotationReason(Diag, T, Loc);
>      }
>    }
>  
> -private:

Why are hasLiteralAnnotation and directAnnotationReason public now?

@@ +304,5 @@
> +    // Annotate everything in ::std, with a few exceptions; see bug
> +    // 1201314 for discussion.
> +    if (getDeclarationNamespace(D) == "std") {
> +      // This doesn't check that it's really ::std::pair and not
> +      // ::std::something_else::pair, but should be good enough.

I have an idea about how to fix this cheaply (basically getDeclarationNamespace would return a NamespaceDecl* instead of a std::string, and then we could compare getDeclarationNamespace's result to getDeclContext()->getEnclosingNamespaceContext() in the if statement below), but this is good enough, and we can fix it in a follow up.
Attachment #8673393 - Flags: review?(michael) → review+
(In reply to Michael Layzell [:mystor] from comment #23)
> Comment on attachment 8673393 [details] [diff] [review]
> > -private:
> 
> Why are hasLiteralAnnotation and directAnnotationReason public now?

That was an editing mistake; thanks.

> @@ +304,5 @@
> > +    // Annotate everything in ::std, with a few exceptions; see bug
> > +    // 1201314 for discussion.
> > +    if (getDeclarationNamespace(D) == "std") {
> > +      // This doesn't check that it's really ::std::pair and not
> > +      // ::std::something_else::pair, but should be good enough.
> 
> I have an idea about how to fix this cheaply (basically
> getDeclarationNamespace would return a NamespaceDecl* instead of a
> std::string, and then we could compare getDeclarationNamespace's result to
> getDeclContext()->getEnclosingNamespaceContext() in the if statement below),
> but this is good enough, and we can fix it in a follow up.

Fixing it as a followup sounds better.
Attachment #8673393 - Flags: review?(ehsan) → review+
https://hg.mozilla.org/mozilla-central/rev/19870896ecce
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Depends on: 1325694
Product: Core → Firefox Build System
Product: Firefox Build System → Developer Infrastructure
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: