[Test] C-API#4

Merged
amomchilov merged 118 commits into master from c-api on May 2, 2025

@st0012 st0012 commented Feb 24, 2025

RBS C Parser Library Refactoring

This PR refactors the RBS parser and related components into a standalone C library that no longer depends on the Ruby runtime. This architectural change enables direct integration with static analysis tools like Sorbet while potentially improving performance.

Sorbet's RBS support already runs on this new architecture and we haven't discovered any major issues around it.

Key Improvements

  • Ruby-Independent Implementation: Extracted parser from ext folder into a standalone C library with a clean API, which can now be embedded in non-Ruby tools without Ruby runtime dependency (e.g. Sorbet, JRuby)
  • Enhanced Memory Management: Implemented arena allocator to efficiently manage parser object lifecycles
  • Improved Architecture: Clear separation between public API (headers) and implementation
  • Performance: Potential performance gains from custom memory management and reduced overhead

Enhanced Memory Management

The arena allocator handles all memory for parser objects, including the parser itself, the lexer, the constant pool, strings, and so on. When the parser is freed by calling `rbs_parser_free`, the allocator frees every object it allocated. This eliminates the need to free individual objects manually and reduces the risk of memory leaks.
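
To make the lifecycle described above concrete, here is a minimal sketch of a chunked arena in C. Every name in it (`demo_arena_t`, `demo_arena_alloc`, `demo_arena_free`, `DEMO_CHUNK_SIZE`) is hypothetical and chosen for illustration; it is not the allocator shipped in this PR, only an example of the general technique of bump allocation with a single bulk free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not the RBS library's allocator. */
#define DEMO_CHUNK_SIZE 4096

typedef struct demo_chunk {
    struct demo_chunk *next; /* previously filled chunk, or NULL */
    size_t used;             /* bytes already handed out from data */
    size_t capacity;         /* bytes available in data */
    max_align_t data[];      /* payload, aligned for any object type */
} demo_chunk_t;

typedef struct {
    demo_chunk_t *head; /* chunk currently being filled */
} demo_arena_t;

/* Bump-allocate `size` bytes; grows the arena by one chunk when needed. */
static void *demo_arena_alloc(demo_arena_t *arena, size_t size) {
    assert(arena != NULL);
    const size_t align = _Alignof(max_align_t);
    size = (size + align - 1) & ~(align - 1); /* keep later pointers aligned */

    demo_chunk_t *chunk = arena->head;
    if (chunk == NULL || chunk->capacity - chunk->used < size) {
        size_t capacity = size > DEMO_CHUNK_SIZE ? size : DEMO_CHUNK_SIZE;
        chunk = malloc(sizeof(demo_chunk_t) + capacity);
        if (chunk == NULL) return NULL;
        chunk->next = arena->head;
        chunk->used = 0;
        chunk->capacity = capacity;
        arena->head = chunk;
    }

    void *pointer = (unsigned char *) chunk->data + chunk->used;
    chunk->used += size;
    return pointer;
}

/* Copy a NUL-terminated string into the arena. */
static char *demo_arena_strdup(demo_arena_t *arena, const char *source) {
    size_t length = strlen(source) + 1;
    char *copy = demo_arena_alloc(arena, length);
    if (copy != NULL) memcpy(copy, source, length);
    return copy;
}

/* Release every chunk at once; individual objects are never freed. */
static void demo_arena_free(demo_arena_t *arena) {
    assert(arena != NULL);
    demo_chunk_t *chunk = arena->head;
    while (chunk != NULL) {
        demo_chunk_t *next = chunk->next;
        free(chunk);
        chunk = next;
    }
    arena->head = NULL;
}
```

In this sketch a caller allocates strings and nodes freely and then calls `demo_arena_free` once, which mirrors the role the description gives to `rbs_parser_free`.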

Component Architecture

graph TD
    RubyClient[Ruby Client] --> RubyAPI[Ruby API]
    CClient[C Client] --> CAPI[C API]

    RubyAPI --> CExtension[C Extension]
    CExtension --> CLibrary
    CAPI --> CLibrary

    subgraph CLibrary[C Library]
        subgraph Parser1[Parser Instance 1]
            direction TB
            ConstantPool1[Constant Pool]
            Lexer1[Lexer]
            ArenaAllocator1[Arena Allocator]
        end

        subgraph Parser2[Parser Instance 2]
            direction TB
            ConstantPool2[Constant Pool]
            Lexer2[Lexer]
            ArenaAllocator2[Arena Allocator]
        end
    end

    subgraph "Public API"
        RubyAPI
        CAPI
    end

    %% Parser1 --> ConstantPool1
    %% Parser1 --> Lexer1
    %% Parser1 --> ArenaAllocator1
    %% Parser2 --> ConstantPool2
    %% Parser2 --> Lexer2
    %% Parser2 --> ArenaAllocator2

st0012 commented Feb 25, 2025

I'll leave this PR open because ruby/rbs only runs CI on pull requests, so we need this PR to have CI constantly running against changes in c-api.

amomchilov and others added 27 commits March 13, 2025 20:55
Initial template for C structs

Use allocator in node constructors
Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Add linked list implementation

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Type `Class#super_class` field

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Type fields of `RBS::Types::Block`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Type `block` fields

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Type `RBS::Types::Proc#self_type` field

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Refactor `parse_function`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Copy value in `rbs_struct_to_ruby_value`

Remove usages of `rbs_loc` from `parser.c`

Extract `rbs_location.h`

Migrate `RBS::Types::Function::Param` fields

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Type `RBS::Types::UntypedFunction` fields

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Type fields of `RBS::AST::TypeParam`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Type some more fields of `RBS::AST::Members::Attr`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Type fields in `RBS::AST::Members::MethodDefinition`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Type `RBS::AST::Directives::Use::SingleClause#new_name`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Type `RBS::Namespace#absolute`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Temporarily handle nil types

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Handle `bool` type

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Type all fields of `RBS::Types::Variable`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Migrate `RBS::TypeName`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Migrate `parse_use_clauses`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Migrate `class_instance_name`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Handle overloads as a rbs_node_list

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Remove more `builds_ruby_object_internally` flags

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Invert `builds_ruby_object_internally` default value

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Introduce `rbs_location_t`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Store C structs instead of Ruby `VALUE`s

Introduce `rbs_ast_symbol_t` and migrate to it

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Remove ZzzTmpNotImplemented node

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Remove one more instance of EMPTY_ARRAY

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Migrate from VALUE array to rbs_node_list_t

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Migrate `method_params` from taking a VALUE array

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Migrate `parse_type_list` from taking a VALUE array

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Forward all C-typed params as-is

Get types on constructor params

Handle mix of C types and Ruby VALUE

Move Ruby object construction into `new` functions

Conditionally construct `ruby_value` internally

Type Attr* field `ivar_name`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Add `AST::Bool`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Use two fewer VALUE values

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Use more instances of `bool`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Add Hash implementation

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Use C hash for `check_key_duplication`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Use C hash to represent Record fields

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Migrate `memo` to using a C hash

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Uses C hashes for keyword parameters

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Remove parser call to `todo!`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Remove calls to `rbs_struct_to_ruby_value`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

TMP symbol

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Replace 2 fake nodes by one

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Set fields for `Record::FieldType`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Make comment use a `rbs_ast_comment_t` instead of a `VALUE`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Add `rbs_ast_string_t`

Add `rbs_ast_integer_t`

Migrate `literal` to store C nodes

Remove `cached_ruby_string`

Remove useless templating stuff

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Remove `cached_ruby_value` from `rbs_node_list`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Remove `cached_ruby_value` from `rbs_hash`

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Add `rbs_string`, and use it for annotations

Add `rbs_ast_symbol_t` to model symbols in the AST

Co-Authored-By: Alexander Momchilov <alexander.momchilov@shopify.com>
And rename it to `class_constants` to disambiguate it from `rbs_constant_id`, `rbs_constant_pool`, etc.
Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>

Do not create comments using a VALUE

Use a rbs_string instead

Signed-off-by: Alexandre Terrasa <alexandre.terrasa@shopify.com>
amomchilov and others added 26 commits March 23, 2025 21:52
… some cleanups (#26)

Part of Shopify/team-ruby-dx#1436

In addition to merging the files, I also did some cleanups:
- Remove an unused function declaration
- Mark helper functions as `static` when they aren't referenced outside of the C library and don't look like public API
- Address a leftover from #25

I recommend reviewing commit by commit.
- `parserstate` -> `rbs_parser_t`
    - Variable names `state` -> `parser`
- `comment` -> `rbs_comment_t`
- `error` -> `rbs_error_t`
These names are too generic and can easily conflict with identifiers in projects that embed the library. The new
names are:

- `rbs_position_t`
- `rbs_range_t`
- `rbs_token_t`
… files (#32)

Currently, camel case names in `config.yml`, like `MethodDefinition`,
are generated as `methoddefinition`, which is not ideal. So this PR
updates `template.rb` to make sure they're generated as
`method_definition` instead.

I also renamed `rbs_typename` to `rbs_type_name`, since the unseparated
form has the same problem.

No Ruby files are touched, so this shouldn't be a breaking change for end users.
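
The naming rule described above can be illustrated with a small C sketch. The real conversion lives in `template.rb` (Ruby) and its exact rules are not shown here, so the function below, `camel_to_snake`, is a hypothetical stand-in that assumes the simplest rule: insert an underscore wherever an uppercase letter follows a lowercase letter or digit, then lowercase everything.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the RBS code base.
 * Writes the snake_case form of `camel` into `out` (capacity `out_size`,
 * including the terminating NUL). Returns false if the result does not fit.
 */
static bool camel_to_snake(const char *camel, char *out, size_t out_size) {
    assert(camel != NULL && out != NULL);
    if (out_size == 0) return false;

    size_t written = 0;
    for (size_t i = 0; camel[i] != '\0'; i++) {
        unsigned char current = (unsigned char) camel[i];

        /* A word boundary: uppercase letter right after a lowercase letter or digit. */
        bool boundary = i > 0 && isupper(current) &&
            (islower((unsigned char) camel[i - 1]) || isdigit((unsigned char) camel[i - 1]));

        if (boundary) {
            if (written + 1 >= out_size) return false;
            out[written++] = '_';
        }

        if (written + 1 >= out_size) return false;
        out[written++] = (char) tolower(current);
    }

    out[written] = '\0';
    return true;
}
```

Under this assumed rule, `MethodDefinition` becomes `method_definition` and `TypeName` becomes `type_name`. Runs of capitals are left unsplit, so a generator that needs to handle acronyms would need a more careful rule.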
Assuming the top-level `include/rbs` holds public components (e.g.
`parser.h`), and `include/rbs/util` holds internal components, more
stuff should be moved under `util`.

This follows the same convention `prism` uses.

### Before

```
include
├── rbs
│   ├── ast.h
│   ├── defines.h
│   ├── encoding.h
│   ├── lexer.h
│   ├── parser.h
│   ├── rbs_buffer.h
│   ├── rbs_encoding.h
│   ├── rbs_location.h
│   ├── rbs_location_internals.h
│   ├── rbs_string.h
│   ├── rbs_strncasecmp.h
│   ├── rbs_unescape.h
│   └── util
│       ├── rbs_allocator.h
│       ├── rbs_assert.h
│       └── rbs_constant_pool.h
└── rbs.h
```

### After

```
include
├── rbs
│   ├── ast.h
│   ├── defines.h
│   ├── lexer.h
│   ├── parser.h
│   ├── rbs_location.h
│   ├── rbs_string.h
│   └── util
│       ├── rbs_allocator.h
│       ├── rbs_assert.h
│       ├── rbs_buffer.h
│       ├── rbs_constant_pool.h
│       ├── rbs_encoding.h
│       └── rbs_unescape.h
└── rbs.h
```
…g.*` (#34)

This aligns with the naming convention of Prism:

- Private headers are prefixed with `rbs_` and placed under
`include/rbs/util/`.
- Public headers are placed under `include/rbs/` without the `rbs_`
prefix.
1. `lexerstate` should be `rbs_lexer_t` instead
2. Params representing `rbs_lexer_t` should be named `lexer` instead of
`state`
3. `rbsparser_next_token` should be called `rbs_lexer_next_token`

After this PR, all structs & functions we want to expose should have
`rbs_` prefix.
`is_power_of_two` was only used in `assert` macros, so it only needed to be
defined in debug mode. But since `rbs_assert` is a function, its argument is
always compiled, so `is_power_of_two` always needs to be defined.

Without this change, Sorbet fails to build with `c-api`.
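
The difference between a macro-style and a function-style assert can be sketched as follows. `demo_assert` and `align_up` are hypothetical names used only for illustration, and the body of `is_power_of_two` is the usual bit trick rather than a copy of the library's code. The point is that a call to a function-style assert still names its argument expression in release builds, so the helper cannot be hidden behind a debug-only guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Always defined: a function-style assert references it even when NDEBUG is set. */
static bool is_power_of_two(uintptr_t value) {
    return value != 0 && (value & (value - 1)) == 0;
}

/* Hypothetical function-style assert; only the check itself is debug-only. */
static void demo_assert(bool condition, const char *message) {
#ifndef NDEBUG
    if (!condition) {
        fprintf(stderr, "assertion failed: %s\n", message);
        abort();
    }
#else
    (void) condition;
    (void) message;
#endif
}

/* Round `value` up to the next multiple of `alignment` (must be a power of two). */
static size_t align_up(size_t value, size_t alignment) {
    /* This call compiles in every build mode, so is_power_of_two must exist. */
    demo_assert(is_power_of_two(alignment), "alignment must be a power of two");
    return (value + alignment - 1) & ~(alignment - 1);
}
```

With the standard `assert` macro, the whole argument expression disappears when `NDEBUG` is defined, which is why a debug-only definition of the helper would have been enough in that case.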
Now we can change the macro (back) to &parser->allocator if need be.
This has the handy effect of making allocations nearly free while
unfortunately having the side effect of crashing your process if you
write more than the arena size. However, if you are allocating more than
4 GiB, you likely have other problems.
`rbs_node_destroy`, `rbs_hash_free`, and `rbs_node_list_free` only
call each other recursively and contain no real freeing logic.

This is the result of previous efforts to allocate all nodes on the
arena, so we don't need these functions anymore.

Discovered while working on #41
Co-authored-by: Alexander Momchilov <amomchilov@users.noreply.github.com>
@amomchilov amomchilov merged commit 4ecd51a into master May 2, 2025
@amomchilov amomchilov deleted the c-api branch July 2, 2025 13:59