gccrs: Implement C-style string literals #4565

Open
Polygonalr wants to merge 4 commits into Rust-GCC:master from Polygonalr:c-str

@Polygonalr Polygonalr commented May 31, 2026

Contributor

C-style string literals are not part of Rust 1.49, but backporting them is worthwhile for now because Rust for Linux uses them in a few places.

The current behaviour is similar to that of normal string literals, except that C-style string literals are type-checked as slice types instead of array types, and the size stored in the compiled fat pointer is one larger so that it includes the null terminator.
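
The size adjustment described above can be modelled outside the compiler. This is a minimal C++ sketch under stated assumptions, not the gccrs implementation: `FatPtr`, `make_str_fat_ptr` and `make_c_string_fat_ptr` are hypothetical names, and the literal's contents are assumed to be held in a `std::string`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of a fat pointer: a data pointer plus a length in bytes.
struct FatPtr
{
  const char *data;
  std::size_t len;
};

// An ordinary string literal: the length covers only the bytes written.
FatPtr
make_str_fat_ptr (const std::string &literal)
{
  return FatPtr{literal.data (), literal.size ()};
}

// A C-style string literal: the length also counts the trailing null byte.
// The returned pointer is only valid while `literal` is alive.
FatPtr
make_c_string_fat_ptr (const std::string &literal)
{
  return FatPtr{literal.c_str (), literal.size () + 1};
}
```

For the contents `abc`, the first helper reports a length of 3 and the second a length of 4, with the extra byte being the terminator.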

@Polygonalr Polygonalr force-pushed the c-str branch 6 times, most recently from d3a37b6 to 9d8970b Compare June 2, 2026 13:12
@Polygonalr Polygonalr changed the title gccrs: Implement C-style strings gccrs: Implement C-style string literals Jun 2, 2026

@CohenArthur CohenArthur left a comment

Member

I think the code is really good, but I'd like to think about the resulting &CStr type versus the [u8] type before we merge it. Can you please check whether CStr became a lang item as soon as C-style string literals were added? If it did, we should be able to refer to it within the typechecker and use that type. In that case, we also have to figure out a way to get to the CStr type with a 1.49 libcore, which does not have CStr as a lang item yet.

Comment thread gcc/rust/backend/rust-compile-expr.cc
Comment thread gcc/rust/lex/rust-lex.cc
Comment on lines +1066 to +1068
// C-style strings
else if (current_char == 'c' && peek_input () == '"')
return parse_c_string (loc);

Member

Note that in a later PR you can also add C-style raw strings the same way, but I don't know whether the kernel makes use of those.
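
As a rough illustration of how the same prefix check could extend to raw C-style strings (`cr"..."` and `cr#"..."#`), here is a standalone C++ sketch. It is not the gccrs lexer: `CStrPrefix` and `classify_c_string_prefix` are hypothetical names, and the input is assumed to be a plain `std::string` rather than the lexer's input source.

```cpp
#include <cstddef>
#include <string>

// Hypothetical classification of what starts at a given position.
enum class CStrPrefix
{
  None,
  CString,   // c"..."
  RawCString // cr"..." or cr#"..."#
};

CStrPrefix
classify_c_string_prefix (const std::string &src, std::size_t pos)
{
  if (pos >= src.size () || src[pos] != 'c')
    return CStrPrefix::None;

  std::size_t next = pos + 1;
  if (next < src.size () && src[next] == '"')
    return CStrPrefix::CString;

  if (next < src.size () && src[next] == 'r')
    {
      // Skip any number of '#' and require an opening quote afterwards,
      // so identifiers such as `crate` are not misclassified.
      std::size_t i = next + 1;
      while (i < src.size () && src[i] == '#')
        ++i;
      if (i < src.size () && src[i] == '"')
        return CStrPrefix::RawCString;
    }

  return CStrPrefix::None;
}
```

The sketch only decides which kind of token begins at `pos`; reading the body and matching the closing hashes would be left to the respective parse function.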

Comment thread gcc/rust/lex/rust-lex.cc
Comment thread gcc/rust/parse/rust-parse-impl-expr.hxx Outdated
@Polygonalr Polygonalr marked this pull request as draft June 3, 2026 01:08
dkm commented Jun 7, 2026

Member

FWIW, the changelog is not correct. You need to follow the * file.ext (item): What changed pattern.
Please do not simply point at the file :) See for example:

gcc/rust/ChangeLog:
	* lex/rust-token.h: Add C_STRING_LITERAL and RAW_C_STRING_LITERAL (unused
	for now).
	* lex/rust-lex.h: Define new parse_c_string function.
	* lex/rust-lex.cc: Implement lexing for C-style string literals.
	* hir/tree/rust-hir-literal.h: Define new C_STRING LitType.

Can you do some cleanup there? Thanks!

@Polygonalr Polygonalr force-pushed the c-str branch 4 times, most recently from 43cf3e8 to 5098778 Compare June 8, 2026 01:20
@Polygonalr

Contributor Author

@dkm I just cleaned up the commit messages. Apologies for not being specific enough in them 😓; I'll do my best not to repeat this in the future.

@Polygonalr Polygonalr marked this pull request as ready for review June 8, 2026 01:25

@CohenArthur CohenArthur left a comment

Member

Good work @Polygonalr :) well done

Comment thread gcc/rust/util/rust-lang-item.cc
Comment thread gcc/rust/util/rust-lang-item.h
Comment on lines +1925 to +1926
// +1 for null terminator, unlike Rust string literals.
tree size = build_int_cstu (type, literal_value.as_string ().size () + 1);

Member

This is correct, but don't we also need to add an extra zero byte at the end?

Member

Is it taken care of by the backend already?

Contributor Author

std::string has been guaranteed to be null-terminated since C++11, so I don't think we need to handle it.
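
The guarantee referred to here can be checked in isolation. The sketch below is a hedged illustration, not the compiler's code: `bytes_with_terminator` is a hypothetical helper, and it only demonstrates that reading `size () + 1` bytes from a `std::string` buffer yields a trailing zero byte. Whether the gccrs backend emits that extra byte depends on how the string constant is built, which this thread does not show.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Copy the string's bytes plus the terminator that c_str () guarantees
// to be present one past the last character (since C++11, data () too).
std::vector<unsigned char>
bytes_with_terminator (const std::string &literal)
{
  const char *buf = literal.c_str ();
  return std::vector<unsigned char> (buf, buf + literal.size () + 1);
}
```

Called on the contents `abc`, the helper returns four bytes, the last of which is zero.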

@CohenArthur CohenArthur added the rust-for-linux Issue related to the compilation of the Linux kernel and its crates label Jun 11, 2026
@Polygonalr Polygonalr force-pushed the c-str branch 4 times, most recently from e539e9e to 8e33e90 Compare June 12, 2026 06:37

@CohenArthur CohenArthur left a comment

Member

LGTM, sorry for the nitpick :)

Comment thread gcc/rust/backend/rust-compile-type.cc Outdated

@CohenArthur CohenArthur left a comment

Member

Please add an execute test to ensure that the last character in the CStr is indeed a zero byte. Something like this:

fn main() -> i32 {
    let a = c"something";

    last_byte(a) as i32
}
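
The idea behind this test is that the program's exit status is the last byte of the literal, so a correctly terminated string makes the test exit with 0. The review comment does not define `last_byte`; the C++ model below assumes it reads the final byte covered by the fat pointer's length. `CStrRef`, `last_byte` and `run_check` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a &CStr: data pointer plus length in bytes,
// where the length includes the null terminator.
struct CStrRef
{
  const unsigned char *data;
  std::size_t len;
};

// Return the final byte covered by the reference.
unsigned char
last_byte (CStrRef s)
{
  assert (s.len > 0);
  return s.data[s.len - 1];
}

// Mirrors the shape of the proposed test: the result is 0 only if the
// last byte is the terminator.
int
run_check ()
{
  // The array has 10 elements: 9 characters plus the terminator.
  static const unsigned char bytes[] = "something";
  CStrRef a{bytes, sizeof bytes};
  return static_cast<int> (last_byte (a));
}
```

If the length did not include the terminator, `last_byte` would return the final character instead, and the exit status would be non-zero.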

@Polygonalr Polygonalr force-pushed the c-str branch 2 times, most recently from 896c8f8 to 6b77f3f Compare June 18, 2026 15:20
Current behaviour is similar to normal string literals, except they are type-checked
as slice types instead of array, and the compiled fat pointer has +1 to size to
include the null terminator.

gcc/rust/ChangeLog:
	* lex/rust-token.h (RS_TOKEN_LIST): Add C_STRING_LITERAL and RAW_C_STRING_LITERAL
	(unused for now).
	* lex/rust-lex.h (Lexer): Define new parse_c_string function.
	* lex/rust-lex.cc (Lexer::build_token): Implement lexing for C-style string
	literals.
	(Lexer::parse_c_string): Add implementation.
	* hir/tree/rust-hir-literal.h (HIR::Literal): Define new C_STRING LitType.
	* hir/rust-ast-lower-base.cc (ASTLoweringBase::lower_literal): Add new case for
	C_STRING.
	* parse/rust-parse-impl-attribute.hxx (Parser<ManagedTokenSource>::parse_attr_input):
	Add new case for C_STRING.
	* parse/rust-parse-impl-expr.hxx (Parser<ManagedTokenSource>::parse_literal_expr):
	Add new case for parsing C_STRING.
	(Parser<ManagedTokenSource>::null_denotation_not_path): Add new case for C_STRING.
	* ast/rust-ast.h (Token::is_string_lit): Add new case for C_STRING_LITERAL.
	* ast/rust-ast.cc (AttributeParser::parse_meta_item_inner): Add new case for
	C_STRING_LITERAL.
	* ast/rust-ast-collector.cc (TokenCollector::visit): Add new case for
	C_STRING_LITERAL.
	* typecheck/rust-hir-type-check-base.cc (TypeCheckBase::resolve_literal):
	Implement type checking for the new C_STRING type.
	* backend/rust-compile-expr.h (CompileExpr): Define new function
	compile_c_string_literal.
	* backend/rust-compile-expr.cc (CompileExpr::visit(LiteralExpr)): Add new case
	for C_STRING.
	(CompileExpr::compile_c_string_literal): Implement compilation of C-style string
	literals.

Signed-off-by: Yap Zhi Heng <yapzhhg@gmail.com>
…iterals

gcc/rust/ChangeLog:
	* lang.opt: Add new -frust-c-style-string-literals option.
	* parse/rust-parse.h: Import options.h for reading flag_c_style_string_literals.
	* parse/rust-parse-impl-expr.hxx (Parser<ManagedTokenSource>::parse_literal_expr):
	Abort parsing C-style string literals if flag_c_style_string_literals is not set.
	(Parser<ManagedTokenSource>::null_denotation_not_path): Ditto.

gcc/testsuite/ChangeLog:
	* rust/execute/torture/c_string.rs: Set -frust-c-style-string-literals.
	* rust/compile/c_string_null_byte_check.rs: Set -frust-c-style-string-literals.

Signed-off-by: Yap Zhi Heng <yapzhhg@gmail.com>
gcc/rust/ChangeLog:
	* util/rust-lang-item.h (LangItem::Kind): New CSTR kind.
	* util/rust-lang-item.cc (LangItem::lang_items): Ditto.
	* backend/rust-compile-type.cc (visit(TyTy::ReferenceType)): Add specific record
	type for CStr.
	* typecheck/rust-tyty.h (ReferenceType): Add function definition for is_dyn_cstr_type.
	* typecheck/rust-tyty.cc (ReferenceType::is_dyn_object): Update to include
	is_dyn_cstr_type.
	(ReferenceType::is_dyn_cstr_type): Add implementation to check whether the
	ReferenceType is of type CStr.
	* typecheck/rust-hir-type-check-base.cc (resolve_literal): Update C_STRING case to
	resolve to the CStr language item instead.

gcc/testsuite/ChangeLog:
	* rust/execute/torture/c_string.rs: Add CStr language item definition.
	* rust/compile/c_string_null_byte_check.rs: Ditto.
	* rust/execute/torture/c_string_ensure_null_term.rs: New test.

Signed-Off-By: Yap Zhi Heng <yapzhhg@gmail.com>
@Polygonalr Polygonalr marked this pull request as draft June 22, 2026 13:20
gcc/rust/ChangeLog:

	* backend/rust-compile-expr.cc (CompileExpr::visit (FieldAccessExpr)):
	Handle DST fat pointers by reinterpreting the fat pointer as its field type.

Signed-Off-By: Yap Zhi Heng <yapzhhg@gmail.com>
@Polygonalr Polygonalr marked this pull request as ready for review June 23, 2026 14:54