Skip to content
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -1 +1,9 @@
| second.cpp:13:19:13:19 | s | This format specifier for type 'size_t' does not match the argument type '..(*)(..)'. |
| second.cpp:14:19:14:19 | s | This format specifier for type 'size_t' does not match the argument type '..(*)(..)'. |
| second.cpp:15:18:15:18 | s | This format specifier for type 'int' does not match the argument type '..(*)(..)'. |
| second.cpp:16:19:16:19 | s | This format specifier for type 'long' does not match the argument type '..(*)(..)'. |
| second.cpp:17:20:17:20 | s | This format specifier for type 'long long' does not match the argument type '..(*)(..)'. |
| second.cpp:18:18:18:18 | s | This format specifier for type 'unsigned int' does not match the argument type '..(*)(..)'. |
| second.cpp:26:18:26:39 | ... - ... | This format specifier for type 'int' does not match the argument type 'long'. |
| second.cpp:29:18:29:39 | ... - ... | This format specifier for type 'unsigned int' does not match the argument type 'long'. |
| tests.c:7:18:7:18 | 1 | This format specifier for type 'char *' does not match the argument type 'int'. |
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@

// defines type size_t plausibly
typedef unsigned long size_t;
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
// semmle-extractor-options: --expect_errors

int printf(const char * format, ...);

// defines type `myFunctionPointerType`, referencing `size_t`
typedef size_t (*myFunctionPointerType) ();

void test_size_t() {
size_t s = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add a comment on what our frontend thinks size_t is at this point?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two size_t types in the database, both are typedefs but only one is a typedef to an integral type as we would expect. I think this fits the theory that the definition of myFunctionPointerType is somehow creating an alternative size_t typedef type.

But which type does the variable "s" have? ... it looks like there is no Variable "s" extracted in the database. I only see variables "buffer" and "format" in this file. Thus, there are also no VariableAccesses to "s" in the printf calls, I see FunctionAccesses instead, understandably with function types. This seems to be the direct cause of the spurious results I'm seeing.

If I take the definition of myFunctionPointerType away, the variable "s" then exists with an error type, and it looks like we then don't extract the access to "s" at all (which means we don't get erroneous query results).

So I think we have two paths available to fixing this:

  1. fix the extraction of myFunctionPointerType so that it doesn't spuriously generate a size_t typedef, perhaps by simply erroring on this line. This approach has the upside of potentially resolving other related issues.
  2. patch the query to ignore arguments that are accesses to functions with an erroneous type (or just: that are function accesses). This approach has the advantage of being simple and immediate.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But which type does the variable "s" have? ... it looks like there is no Variable "s" extracted in the database.

That's not quite true. Because the have nested function support, there is - somewhat unexpectedly - a function s instead. We might get some better defined behavior is we turn that support off. Unfortunately there is no easy way to do this in tests (except for passing the extractor option that tells the extractor to mimic the MSFT compiler). So for now you're getting FunctionAccess as the second argument of the printf call.


printf("%zd", s); // GOOD
printf("%zi", s); // GOOD
printf("%zu", s); // GOOD [FALSE POSITIVE]
printf("%zx", s); // GOOD [FALSE POSITIVE]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is marked as GOOD because we tolerate conversions to unsigned?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I've clarified it now.

printf("%d", s); // BAD
printf("%ld", s); // BAD
printf("%lld", s); // BAD
printf("%u", s); // BAD

char buffer[1024];

printf("%zd", &buffer[1023] - buffer); // GOOD
printf("%zi", &buffer[1023] - buffer); // GOOD
printf("%zu", &buffer[1023] - buffer); // GOOD
printf("%zx", &buffer[1023] - buffer); // GOOD
printf("%d", &buffer[1023] - buffer); // BAD
printf("%ld", &buffer[1023] - buffer); // BAD [NOT DETECTED]
printf("%lld", &buffer[1023] - buffer); // BAD [NOT DETECTED]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these bad?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the pointer type is ssize_t not long int, and hose types may or may-not have the same size depending on platform. Though ... in the interests of filtering out results considered noisy, we've been increasingly lenient on this kind of thing in this query. The user may know more about the platform than we do.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the pointer type is ssize_t not long int, and hose types may or may-not have the same size depending on platform.

Ok. I think I misunderstand why we're running into problems with the query. There seem to be at least three reasons going by the tests you're adding here: size_t not defined, size_t defined elsewhere, some other reason related to pointers that I don't yet understand. Is that correct?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary issue is the results where the query claims the argument has a function type '..(*)(..)', when it doesn't. These messages are incorrect and confusing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect at least one of these to be "GOOD" given that we mostly support 32-bit and 64-bit platforms, and as ssize_t is usually similarly typedef'ed as size_t. Yet both are marked as "BAD". Why is that reasonable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its bad because %d, %ld, %lld and %u all make assumptions about the pointer size, whereas %z expects the actual pointer size on any platform. Signedness has nothing to do with what's wrong here, though it is different from the earlier test cases. Also at least one of the four cases is likely to work on any particular platform (because the bit size happens to match), but that's not robust.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work in the case of a database without parse errors? Would both be flagged as bad?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they should be, except that we have several pieces of logic in the query designed to be permissive towards things that happen to be the same size - to reduce unwanted (if technically accurate) results. People do a lot of strange things with typedefs and conditional compilation for example that mean they may know more about what sizes a type could be than we do.

You know what, I'm changing these two labels from BAD to DUBIOUS.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know what, I'm changing these two labels from BAD to DUBIOUS.

👍

printf("%u", &buffer[1023] - buffer); // BAD
}