Ruby 3.2
- Released at: Dec 25, 2022 (NEWS.md file)
- Status (as of Dec 25, 2023): 3.2.2 is current stable
- This document first published: Feb 4, 2022
- Last change to this document: Dec 25, 2023
🇺🇦 🇺🇦 Before you start reading the changelog: A full-scale Russian invasion into my home country continues. The only reason I am alive and able to work on the changelog is the Armed Forces of Ukraine, and international support with weaponry, funds and information. I am in my home city Kharkiv, preparing to join the army. Please care to read two of my appeals to Ruby community before proceeding: first, second.
We need all support we can get to push invaders out and to bring peace to our land. Please spread information, lobby our cause and donate.🇺🇦 🇺🇦
The post dedicated to my war year and work on Ruby 3.2 is published on Feb 7.
Note: As already explained in Introduction, this site is dedicated to changes in the language, not the implementation, therefore the list below lacks mentions of many important optimizations introduced in 3.2, including YJIT improvements and object shapes. That’s not because they are not important, just because this site’s goals are different.
In preparation of this entry, Cookpad’s article on notable changes in Ruby 3.2 written by core developers Koichi Sasada (ko1) and Yusuke Endoh (mame) provided invaluable insight. Thank you!
Highlights
- Anonymous method argument passing
- More inspectable refinements
- Data class
- Support for pattern-matching in Time and MatchData
- Set is a built-in class
- Per-Fiber storage
- RubyVM::AbstractSyntaxTree: fault-tolerant and token-level parsing
Language changes
Anonymous arguments passing improvements
If the method declaration includes anonymous positional or keyword arguments (* or ** without associated name), those arguments can now be passed to the next method with some_method(*) or some_method(**) syntax.
- Reason: While it can be considered too cryptic a shortcut by some, the new feature is consistent with passing of all arguments with ... .
- Discussion: Feature #18351
- Documentation: Methods: Array/Hash Argument
- Code:
  def only_keywords(**) # accept keyword arguments
    p(**) # and pass them to the next method
  end

  def only_positional(*) # accept positional arguments
    p(*) # and pass them to the next method
  end

  def both(*, **) # effectively the same as ...
    p(*, **)
  end

  only_keywords(a: 1, b: 2)
  # prints "{:a=>1, :b=>2}"
  only_positional(1, 2, 3)
  # prints
  #   1
  #   2
  #   3
  both(1, 2, 3, a: :b)
  # prints
  #   1
  #   2
  #   3
  #   {:a=>:b}

  # Realistic usage: a small wrapper method, just "fall through" to the next one
  def get(url, **) = send_request(:get, url, **)

  # Named and unnamed could be freely mixed:
  def mixed_naming(*a, **)
    p a
    p(**)
  end

  mixed_naming(1, 2, 3, a: :b)
  # prints:
  #   [1, 2, 3]
  #   {:a=>:b}

  # But using anonymous forwarding with named arguments is an error
  def forward(*args) = p(*)
  # no anonymous rest parameter (SyntaxError)

  # Interestingly enough, not only calling methods, but also "repacking" values
  # into variables works:
  def repack(*, **)
    x = * # this is syntax error
    x = [*] # but this will work and put [1, 2] in x
    x, y = [*] # and this will work and put 1 in x and 2 in y
    z = {**} # this will put {a: :b} into z
  end

  repack(1, 2, a: :b)

  # While the latter example might seem just a curiosity, it could help with
  # quick debugging of pass-through code.
  # Imagine the `get` method above fails in one particular case.
  # We can adjust it this way, temporarily:
  def get(url, **)
    binding.irb if {**}.dig(:headers, :content_type) == 'application/json'
    send_request(:get, url, **)
  end

  # Parentheses are important for correct parsing.
  # Imagine this:
  def test(*)
    # this, depending on further code, will be either a SyntaxError,
    # or interpreted as call_something() * next_statement
    call_something *
    # ...
  end

  # Procs don't support anonymous arguments:
  proc { |*| p(*) }
  # Somewhat confusingly, this definition raises:
  #   no anonymous rest parameter (SyntaxError)
  # ...meaning surrounding method doesn't have them.
  # And if it does, they would be used, not proc's arguments:
  def test(*)
    proc { |*| p(*) }.call(1)
  end

  test(2) # prints 2 -- even inside proc, method's arguments are used for forwarding
- Note: Whether anonymous arguments should be supported in procs is discussed.
Constant assignment evaluation order changed
For a long time, statements like module_expression::CONST_NAME = value_expression first evaluated value_expression and then module_expression. This was changed to calculate module_expression first.
- Reason: Just making it consistent with other assignment expressions, which tend to calculate the left part before the right.
- Discussion: Bug #15928
- Documentation: —
- Code:
  # synthetic demonstration example:
  def make_a_class
    puts "Making a class"
    Class.new
  end

  make_a_class::CONST = 42.tap { puts "Calculating the value" }
  # Prints:
  # In Ruby 3.1:
  #   Calculating the value
  #   Making a class
  # In Ruby 3.2:
  #   Making a class
  #   Calculating the value

  # Even simpler:
  NonExistentModule::CONST = 42.tap { puts "Calculating the value" }
  # Ruby 3.1:
  #   Prints "Calculating the value"
  #   raises "uninitialized constant NonExistentModule" (NameError)
  # Ruby 3.2:
  #   just raises "uninitialized constant NonExistentModule" (NameError)
- Note: The problem is rarely relevant, but might eventually manifest itself in complicated metaprogramming or autoloading. Or, like in the last example, some effectful value might be calculated before discovering there is nowhere to put it in. According to discussion on the tracker, the old behavior was never intentional, it was just too hard to fix.
Behavior of module reopening/redefinition with included modules changed
When some module/class name is available at the top level context from the included modules, and a new class/module is defined, previously it was considered a reopening of existing module; since Ruby 3.2, it is a creation of a new module.
- Reason: As one file’s code has no control over what is included in other files (and the inclusion may be non-obvious to the code’s author), treating included modules as reopenable at the top level could lead to cryptic behaviors.
- Discussion: Feature #18832
- Documentation: —
- Code:
  require 'net/http'

  # ...might've happened in some of required files
  include Net

  p HTTP #=> Net::HTTP -- from included Net module

  # plan to define some of our app-specific HTTP services here
  module HTTP
    # ...
  end
  # Ruby 3.1: HTTP is not a module (TypeError)
  #   Because it assumes you are reopening HTTP class from included Net
  #   The error is hard to understand and even harder to bypass
  # Ruby 3.2: Successfully defines a new empty module, unrelated to Net::HTTP

  p HTTP #=> HTTP -- now it is a new module,
         # and Net::HTTP is available only by fully-qualified name
Keyword argument separation leftovers
A few edge cases after big keyword argument separation were fixed:
- Erroneous autosplatting of positional arguments in procs. Discussion: Bug #18633
  test = proc { |arg, **keywords| p(arg:, keywords:) }
  test.call(1, 2)
  # Prints: {:arg=>1, :keywords=>{}}, 2 is lost as an extra argument, as expected
  # But...
  test.call([1, 2])
  # Ruby 3.1: prints {:arg=>1, :keywords=>{}}, extra unexpected splatting & loss of 2
  # Ruby 3.2: prints {:arg=>[1, 2], :keywords=>{}}, as expected
- The methods that splat arguments were fixed to consistently treat keyword arguments according to ruby2_keywords tag. Discussion: Bug #18625, Bug #16466
  def method_with_keywords(**kw) = p kw

  # This should never work: method is not marked with ruby2_keywords,
  # so it shouldn't ever unpack positional arguments into keyword arguments.
  def method_with_positional(*args) = method_with_keywords(*args)

  method_with_positional(a: 1)
  # Ruby 3.1 and 3.2: behaves as expected:
  #   wrong number of arguments (given 1, expected 0) (ArgumentError)

  # This should work: method is marked with ruby2_keywords, so it
  # is able to repack hash as keyword_args
  ruby2_keywords def old_method(*args) = method_with_keywords(*args)

  old_method(a: 1)
  # Ruby 3.1 and 3.2: behaves as expected:
  #   prints {:a => 1}

  # This shouldn't work, but erroneously worked in 3.1: after `bad_old_method`
  # delegated the hash, it preserved "I am Ruby 2 keywords" through other
  # methods (even if they aren't marked to be compatible).
  ruby2_keywords def bad_old_method(*args) = method_with_positional(*args)

  bad_old_method(a: 1)
  # Ruby 3.1: prints {:a => 1}
  # Ruby 3.2: wrong number of arguments (given 1, expected 0) (ArgumentError)
Removals
- Constants:
- Methods:
Core classes and modules
Kernel#binding raises if accessed not from Ruby
binding is a method that returns “current context” (Binding) object, allowing access to local variables, self, evaluating code in that context, etc. The problem solved was that it was accessible from C methods, too, but as C method calls don’t push new “execution frames,” the binding returned was of the last calling Ruby method, which was useless and misleading. Since Ruby 3.2, the method raises an exception in such situations.
- Discussion: Bug #18487
- Documentation: Kernel#binding (no mention for the behavior in non-Ruby frame)
- Code: To demonstrate the practical implications, the C code should be written, but to get the gist of what’s happening, we can do this:
  # The callable binding object, that will "bind" itself to argument, and call it in the context
  binding_caller = Kernel.instance_method(:binding).method(:bind_call)

  binding_caller.call(nil).local_variables
  # [:binding_caller] -- it is performed in the current context

  # method just accepts block and just calls it
  def test1(&)
    local_val = 'test'
    yield(nil)
  end

  test1(&binding_caller).local_variables
  #=> [:local_val] -- we got the binding of test1, as expected

  def test2(&)
    local_val = 'test'
    # Expectations: we pass block further, so it will be performed
    # inside #map, and the binding was to "insides" of each argument
    [1].map(&)
  end

  # Reality: #map is a method defined in C, it doesn't have its own
  # "context frame",
  test2(&binding_caller).first.local_variables
  # So, in Ruby 3.1:
  #   => [:local_val] -- we still received binding of the previous method
  #   in call chain
  # In Ruby 3.2:
  #   Cannot create Binding object for non-Ruby caller (RuntimeError)
- Note: TracePoint#binding was also adjusted for C methods, see below.
Class and Module
Class#attached_object
For singleton classes, returns the object this class is for; otherwise, raises TypeError.
- Reason: The “what is this singleton class around of” is useful in metaprogramming, introspection and code analysis, especially when some class methods are added via class << self.
- Discussion: Feature #12084
- Documentation: Class#attached_object
- Code:
  String.attached_object
  # raises `String' is not a singleton class (TypeError)
  "foo".singleton_class.attached_object
  #=> "foo"

  # or
  class A
    class << self
      # here we are inside singleton class
      p attached_object #=> A
    end
  end

  # Usage for advanced metaprogramming:
  module MyCoolPlugin
    def self.prepended(mod)
      if mod.singleton_class? && mod.attached_object < Enumerable
        mod.include MyCoolPlugin::ServicesForEnumerable
      end
    end
  end

  class Simple
    class << self
      # prepends only simple version of MyCoolPlugin
      prepend MyCoolPlugin
    end
  end

  class MyArray < Array
    class << self
      # Prepends MyCoolPlugin, and includes MyCoolPlugin::ServicesForEnumerable
      prepend MyCoolPlugin
    end
  end

  # Usage for documentation/code analysis purposes:
  require 'active_support/all'

  Time.zone #=> nil, defined by ActiveSupport

  # Application-specific Time extensions
  class MyTime < Time
  end

  m = MyTime.method(:zone)
  # => #<Method: MyTime(Time).zone() /...path to implementation>

  # Now, if we want to pass just this method to some
  # documentation or introspection system, it has enough information
  # to tell who it belongs to
  m.owner #=> #<Class:Time>
  m.receiver #=> MyTime

  # But before 3.2, there was no way to programmatically go from
  # singleton class #<Class:Time>, to the regular class Time, which
  # the method is defined in, from the human point of view.
  # Now, there is:
  m.owner.attached_object # => Time
  # ...and our documentation/introspection system can properly describe
  # it as a class method of Time.
- Note: The method will not work with “special” Ruby objects (nil, true, and false) which have their singleton_class implementations redefined to return regular class:
  nil.singleton_class #=> NilClass, not #<Class:nil>
  class << nil
    attached_object # raises `NilClass' is not a singleton class (TypeError)
  end
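Because attached_object raises for anything that is not a real singleton class (including the nil/true/false case from the note above), introspection code may want a guard around it. A minimal sketch, assuming Ruby 3.2+; the helper name attached_object_or_nil is hypothetical, not part of Ruby:

```ruby
# Hypothetical helper (not a Ruby API): returns the object a singleton class
# is attached to, or nil when the given class is not a singleton class.
def attached_object_or_nil(klass)
  klass.singleton_class? ? klass.attached_object : nil
end

attached_object_or_nil(String.singleton_class) # the String class itself
attached_object_or_nil(String)                 # nil: a regular class
attached_object_or_nil(nil.singleton_class)    # nil: this is just NilClass
```

Checking Module#singleton_class? first avoids using the TypeError for control flow.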
Module#const_added
A “hook” method, called after the constant was defined in a module.
- Reason: The method was proposed as helpful for autoloader libraries (like zeitwerk), but it also can be useful in metaprogramming, like “store a registry of nested classes of a particular type.” Or “validate that some parent constant is redefined in a particular way.”
- Discussion: Feature #17881
- Documentation: Module#const_added
- Code:
  module Test
    def self.const_added(name)
      puts "const_added: #{name} = #{const_get(name)}"
    end

    # The method is called AFTER the constant is actually
    # defined, so its value is already available.
    FOO = 1
    # Prints:
    #   const_added: FOO = 1

    # Each constant override triggers a method again:
    FOO = 2
    # Prints:
    #   const_added: FOO = 2

    # Nested class definition:
    class Nested
      puts "Class definition body"
    end
    # Prints:
    #   const_added: Nested = Test::Nested
    #   Class definition body
    # Note that const_added is invoked at the BEGINNING of class being defined,
    # not after its body is processed.
  end
- Note: To understand the last example (why const_added was called before class definition is fully finished) you should consider that the class/module name becomes immediately known to Ruby after its definition is opened, allowing things like this:
  module Test
    class Nested
      # outdated way of writing def self.class_method, but it works,
      # because Nested is already known name here.
      def Nested.class_method
      end

      # or this:
      weird_class_method
      # raises nice "undefined local variable or method `weird_class_method' for Test::Nested"
      # ...because the name `Test::Nested` is already associated with current class
    end
  end
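The “registry of nested classes of a particular type” idea mentioned in the Reason could look roughly like the sketch below. It assumes Ruby 3.2+; BasePlugin, Plugins and the registry accessor are hypothetical names chosen for illustration.

```ruby
# Hypothetical base class that marks "interesting" nested classes.
class BasePlugin; end

module Plugins
  @registry = []

  class << self
    attr_reader :registry

    # Called right after each constant is defined in Plugins.
    # For classes this happens when the definition is opened, so the
    # superclass is already known and can be checked here.
    def const_added(name)
      value = const_get(name)
      registry << value if value.is_a?(Class) && value < BasePlugin
    end
  end

  class Markdown < BasePlugin; end
  class Textile < BasePlugin; end

  VERSION = '1.0'     # not a class: ignored
  class Helper; end   # not a BasePlugin descendant: ignored
end

p Plugins.registry # the two BasePlugin subclasses, in definition order
```

The hook relies on the behavior described in the note above: the constant already points at the class when const_added runs, even though the class body has not been processed yet.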
Module#undefined_instance_methods
Lists methods that some module or class removed explicitly with undef.
- Reason: Honestly, I have no idea. The discussion ticket doesn’t clarify this either! I assume it is just for completeness, to make everything that Module can do with method definitions accessible programmatically. Or, it might help with debugging of some evil/buggy code that undefines the wrong methods. –zverok
- Discussion: Feature #12655
- Documentation: Module#undefined_instance_methods
- Code:
  class ImmutableArray < Array
    undef :select!, :reject! #, ... etc
  end

  class UserArray < ImmutableArray
    undef :map
  end

  ImmutableArray.undefined_instance_methods
  #=> [:select!, :reject!]
  UserArray.undefined_instance_methods
  #=> [:map] -- only methods undefined by this module, not ancestors
Refinements
There are several new methods that improve the discovery and ability to debug for complicated code when it uses refinements. Those methods are improving answers to questions like “what methods are available in the current context and why?”, “what methods will be available if I’ll use that module” etc.
Module#refinements
Returns list of refinements the module defines.
- Discussion: Feature #12737
- Documentation: Module#refinements
- Code:
  module MathShortcuts
    refine Numeric do
      def sqrt = Math.sqrt(self)
      # ...
    end

    refine String do
      def calculate(binding) = "#{self} = #{eval(self, binding)}"
    end
  end

  module NoRefinements
  end

  MathShortcuts.refinements
  #=> [#<refinement:Numeric@MathShortcuts>, #<refinement:String@MathShortcuts>]
  NoRefinements.refinements
  #=> []

  # Introspection: what would this refinement refine?..
  # The method also added in 3.2, see below
  MathShortcuts.refinements[0].refined_class
  #=> Numeric

  # Introspection: what methods would it add?
  # (false means "don't include methods defined in ancestors")
  MathShortcuts.refinements[0].instance_methods(false)
  #=> [:sqrt]
Refinement#refined_class
- Discussion: Feature #12737
- Documentation: Refinement#refined_class
- Code: See example above that demonstrates usage of #refined_class together with Module#refinements.
- Follow-ups: 3.3: Renamed to #target, because not only classes can be refined, modules too.
Module.used_refinements
Returns instances of Refinement used in the current context.
- Discussion: Feature #14332
- Documentation: Module.used_refinements
- Code:
  # See MathShortcuts module definition above
  class Calculator
    using MathShortcuts

    # Works inside refined module...
    p Module.used_refinements
    #=> [#<refinement:Numeric@MathShortcuts>, #<refinement:String@MathShortcuts>]

    def hypotenuse(c1, c2)
      # ...and inside its methods
      p Module.used_refinements
      #=> [#<refinement:Numeric@MathShortcuts>, #<refinement:String@MathShortcuts>]
      "(c1**2 + c2**2).sqrt".calculate(binding)
    end
  end

  # Use method with refinements, triggering all the debug print
  puts Calculator.new.hypotenuse(5, 6)
  #=> (c1**2 + c2**2).sqrt = 7.810249675906654

  # Outside of refined class, refinements are empty
  p Module.used_refinements
  #=> []
- Notes:
  - Note that used_refinements is a class method of Module, and put there just for organizational purposes, while returning refinements list of the current context. There is no way to ask arbitrary module which refinements it uses (e.g., there is no Calculator.used_refinements).
  - Just as a point of interest, the method with this name was proposed shortly after the concept of refinements was introduced and was discussed even before 2.0 release. It eventually became Module.used_modules introduced in 2.4: that method just returned a list of modules with refinements, enabled in the current scope via using. The result of this method is not very fine-grained (as one refining module can refine many objects at once, and it is impossible to inspect which exactly and what methods were added). After introduction of the Refinement class in 3.1 it became reasonable to give easier access to particular refinements available in the current context.
Integer#ceildiv
The integer division that always rounds up.
- Reason: There are many simple use cases like pagination (when “21 items / 10 per page” should yield “3 pages”). It seems that the method is a direct equivalent of a.fdiv(b).ceil, and as such, annoyingly unnecessary, but fdiv, due to floating point imprecision, might produce surprising results in edge cases:
  99999999999999999.fdiv(1).ceil # => 100000000000000000
  99999999999999999.ceildiv(1) # => 99999999999999999
- Discussion: Feature #18809
- Documentation: Integer#ceildiv
- Code:
  9.ceildiv(3) #=> 3
  10.ceildiv(3) #=> 4
  -10.ceildiv(3) #=> -3 -- always rounds up, regardless of the sign

  # A non-integer divisor is accepted too; the quotient is still rounded up
  10.ceildiv(2.1) #=> 5 -- 10 / 2.1 is about 4.76
  10.ceildiv(2.6) #=> 4 -- 10 / 2.6 is about 3.85
- Note: Unlike most of other operations, #ceildiv ignores numeric coercion protocols:
  class StringNumber
    def initialize(val) = @val = val.to_s
    def coerce(other) = [other, @val.to_i]
  end

  10 / StringNumber.new('3') #=> 3, argument is first converted with #coerce if possible
  10.fdiv StringNumber.new('3') #=> 3.3333333333333335, same
  10.ceildiv StringNumber.new('3') # ArgumentError
  It is already fixed in the current master branch and will behave as expected in Ruby 3.2.1
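The pagination use case from the Reason can be sketched as follows. This is only an illustration assuming Ruby 3.2+; page_count is a hypothetical helper name.

```ruby
# Hypothetical helper: number of pages needed to show total_items,
# per_page items at a time.
def page_count(total_items, per_page)
  raise ArgumentError, 'per_page must be positive' unless per_page.positive?

  total_items.ceildiv(per_page)
end

page_count(21, 10) #=> 3
page_count(20, 10) #=> 2
page_count(0, 10)  #=> 0

# The result agrees with actually slicing the collection:
(1..21).each_slice(10).count #=> 3
```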
Strings and regexps
Byte-oriented methods
Several methods were added that operate on multibyte strings at byte-offset level, regardless of the encoding.
- Reason: Low-level processing of strings (like network middleware, or efficient search algorithms, or packing/unpacking) might need an ability to operate on a level of single bytes, regardless of original string’s encoding. It is especially important while handling variable-length encodings like UTF-8. Before these methods were introduced, the only way to perform byte-level processing was to force the string encoding into ASCII-8BIT, process it, and then force the encoding back.
- Discussion: Feature #13110 (String#byteindex, String#byterindex, MatchData#byteoffset), Feature #18598 (String#bytesplice)
- Documentation: String#byteindex, String#byterindex, String#bytesplice, MatchData#byteoffset
- Code:
  str = 'Слава Україні'
  str.index('а') #=> 2, character index
  str.byteindex('а') #=> 4, byte index, because Cyrillic letters in UTF-8 take 2 bytes each
  str.rindex('а') #=> 9: character index of the last occurrence of character 'а'
  str.byterindex('а') #=> 17: byte index

  match = str.match(/Слава\s+(?<name>.+)/)
  match.offset(1) #=> [6, 13]
  match.byteoffset(1) #=> [11, 25]
  match.offset(:name) #=> [6, 13]
  match.byteoffset(:name) #=> [11, 25]

  str = 'війна'
  str.bytesplice(2..5, '...') #=> "..." -- returns replacement string
  str #=> "в...на" -- original string's bytes 2-3, 4-5 (i.e. chars 1 and 2) are replaced

  # Unlike byteslice getter, bytesplice setter checks character boundaries:
  str = 'війна'
  str.byteslice(1..3) #=> "\xB2і" -- works, even if the slice is mid-character
  str.bytesplice(1..3, '...') # offset 1 does not land on character boundary (IndexError)
- Note:
  - After 3.2 release, bytesplice behavior had changed to return self instead of replacement string.
  - 3.3: parameters added to bytesplice to allow partial copy of the buffer.
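As an illustration of the “efficient search” use case, here is a minimal sketch (assuming Ruby 3.2+) that collects the byte offsets of every non-overlapping occurrence of a substring using String#byteindex with a start offset. The helper name byte_offsets is hypothetical.

```ruby
# Hypothetical helper: byte offsets of all non-overlapping occurrences
# of needle in haystack.
def byte_offsets(haystack, needle)
  return [] if needle.empty? # avoid looping forever on an empty needle

  offsets = []
  pos = 0
  # byteindex returns nil when nothing is found at or after the byte offset pos
  while (found = haystack.byteindex(needle, pos))
    offsets << found
    pos = found + needle.bytesize # always lands on a character boundary
  end
  offsets
end

byte_offsets("caf\u00e9 caf\u00e9", 'caf')    #=> [0, 6]
byte_offsets("caf\u00e9 caf\u00e9", "\u00e9") #=> [3, 9]
```

Note that the second argument of byteindex must land on a character boundary, which is why the loop advances by the needle's byte size rather than by one byte.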
String#dedup as an alias for -"string"
The method produces frozen and deduplicated string without changing the receiver.
- Reason: Since Ruby 2.5, -"string" produces a frozen and deduplicated copy: all instances with the same content take the same place in memory. But it is a lesser-known fact that is also hard to guess from the code or from a quick look into the docs. At the same time, it became a useful idiom for reducing the memory footprint of long-running applications. The #dedup alias is focused on the behavior, and also more chainable than unary -.
- Discussion: Feature #18595
- Documentation: String#dedup
- Code:
  protocols = %w[http https]
  domains = %w[company.com api.company.com]

  # if in various places of the program we are constructing the same URLs many times...
  # ...there might be many similar strings sitting everywhere and taking memory
  urls = 100.times.map { protocols.sample + '://' + domains.sample }
  urls.uniq.count # => 4 -- we store 4 same strings again and again
  urls.map(&:object_id).uniq.count # => 100 -- but it is 100 different objects

  urls.map!(&:dedup)
  urls.map(&:object_id).uniq.count # => 4

  # The `map!(&:dedup)` above could previously have been written as
  urls.map!(&:-@)
  # ...which calls unary minus on arguments
  # But it is both uglier, and shows the intention worse.
Regexp.new: passing flags as a string is supported
- Reason: most of those working with regexps are used to short flag names like /foo/i or /bar/m. At the same time, when a regexp is constructed dynamically with Regexp.new, it was necessary to use numeric flags Regexp::IGNORECASE | Regexp::MULTILINE. They are more formal (and can be thought of as more obvious), but the string ones are those most Rubyists remember.
- Discussion: Feature #18788
- Documentation: Regexp.new (options argument)
- Code:
  Regexp.new('username', 'i') #=> /username/i

  # All known options work:
  Regexp.new(<<~'HTML', 'imx')
    <(\w+) .*?>
      [^<]+
    </\1>
  HTML
  #=> same as
  # %r{<(\w+) .*?>
  #   [^<]+
  # </\1>}imx

  # Unknown option raises
  Regexp.new('foo', 'g') # unknown regexp option: g (ArgumentError)
- Notes: One quirk that might be surprising with a wrong use of the new feature is that Regexp.new treats any truthy value of unrecognized type as “ignore case”. So,
  # This might erroneously be thought to "work":
  Regexp.new('foo', %w[i]) #=> /foo/i, looks like array of options is also acceptable?..
  # ...but actually it is that any truthy value is treated as "ignorecase = true":
  Regexp.new('foo', %w[abc]) #=> /foo/i
Regexp: ReDoS vulnerability prevention
- Reason: The ReDoS attack is overloading the system by providing malformed regexp or string to match. The possibility for this attack is mostly theoretical, but still reported as a security vulnerability in some contexts. New Ruby version introduces several features that might mitigate the attack (or, at least, a vulnerability report):
  - Setting explicit timeout for Regexp execution;
  - Regexp.linear_time? analysis method;
  - (CRuby-specific) Cache-based optimization: many Regexps now perform in linear time even on very long strings (at the cost of increased memory consumption);
- Discussion: Feature #17837 (timeout), Feature #19104 (cache-based optimization), Feature #19194 (.linear_time?)
- Documentation: Regexp.timeout, Regexp.timeout=, Regexp.new (timeout: keyword argument), Regexp.linear_time?
- Code:
  Regexp.linear_time?(/a+$/) #=> true
  Regexp.linear_time?(/(a+)\1*$/) #=> false, backtracking is complicated

  Regexp.timeout = 0.005
  # Just a demo: simple yet very ambiguous regexp applied to very large string
  /(a+)\1*$/.match?('a' * 1_000_000)
  # Depending on your machine's performance, might raise:
  #   `match?': regexp match timeout (Regexp::TimeoutError)

  # When applied to a smaller string
  /(a+)\1*$/.match?('a' * 1_000)
  #=> true

  # This works, too:
  Regexp.new(/(a+)*$/, timeout: 0.005).match?('a' * 1_000_000)
  # Might raise:
  #   `match?': regexp match timeout (Regexp::TimeoutError)
- Note: While Regexp.linear_time? is part of the official language API, its results for the same regexps might change between versions and implementations.
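In application code, the per-regexp timeout would typically be paired with a rescue of Regexp::TimeoutError. A minimal sketch, assuming Ruby 3.2+; the helper name safe_match? and the policy of treating a timeout as “no match” are assumptions made for illustration, not a recommendation from the Ruby docs.

```ruby
# Hypothetical helper: matches input against pattern with a per-call
# time budget; a timed-out match is reported as "no match".
def safe_match?(pattern, input, timeout: 1.0)
  # Regexp.new accepts an existing Regexp plus a timeout: keyword,
  # producing a copy with its own timeout setting.
  Regexp.new(pattern, timeout: timeout).match?(input)
rescue Regexp::TimeoutError
  false
end

safe_match?(/\d+/, 'abc 123') #=> true
safe_match?(/\d+/, 'abc')     #=> false

# The timeout is stored on the regexp itself:
Regexp.new(/a/, timeout: 0.5).timeout #=> 0.5
```

Whether swallowing the timeout is acceptable depends on the application; logging or re-raising may be more appropriate when the input is untrusted.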
Time.new can parse a string
The new protocol for Time.new is introduced, that parses Time from string.
- Reason: Before Ruby 3.2, the core class Time provided no way to get back a Time value from any serialization, including even simple Time#inspect or #to_s. Time.parse is provided by the standard library time (not core functionality, doesn’t work without explicit require 'time'), and it tries to parse every imaginable format, while Time.new with string is stricter.
- Discussion: Feature #18033
- Documentation: Time.new
- Code:
  Time.new('2023-01-29 00:29:30')
  # => 2023-01-29 00:29:30 +0200

  # Desired timezone can be provided as part of a string:
  Time.new('2023-01-29 00:29:30 +08:00')
  #=> 2023-01-29 00:29:30 +0800
  # ...or like with other .new protocols, as a separate in: argument:
  Time.new('2023-01-29 00:29:30', in: '+08:00')
  #=> 2023-01-29 00:29:30 +0800

  # The accepted format is much stricter than Time.parse:
  require 'time'
  Time.parse('Jan 29, 2023')
  #=> 2023-01-29 00:00:00 +0200
  Time.new('Jan 29, 2023')
  # in `initialize': can't parse: "Jan 29, 2023" (ArgumentError)

  # Even incomplete time is considered an error (but see Notes below):
  Time.new('2023-01-29 00:29')
  # in `initialize': missing sec part: 00:29 (ArgumentError)
- Notes:
  - A few improvements are planned to be made to the parser strictness and robustness in 3.2.1 (see Bug #19296, Bug #19293), for example:
    # This works, but is considered a bug, the method should allow
    # only fully-specified time
    Time.new("2023-01-29")
    #=> 2023-01-29 00:00:00 +0200
  - Time.new('2023') works, too, but it is a feature that worked before (force-conversion of singular year argument to integer), see Bug #19293. It will probably be deprecated, but can’t be quickly removed due to backward compatibility.
- Follow-ups: 3.3: Time.new became stricter, accepting only fully-specified date-time.
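Since the string form of Time.new raises ArgumentError on anything it does not accept, code that reads timestamps from external input might wrap it. A minimal sketch, assuming Ruby 3.2+; parse_timestamp is a hypothetical helper name.

```ruby
# Hypothetical helper: parses a fully-specified timestamp with the core
# Time.new, returning nil instead of raising on unsupported input.
def parse_timestamp(text)
  Time.new(text)
rescue ArgumentError
  nil
end

t = parse_timestamp('2023-01-29 00:29:30 +08:00')
t.utc_offset #=> 28800 (8 hours, taken from the string)

parse_timestamp('Jan 29, 2023') #=> nil, the free-form format is rejected
```

Always passing a fully-specified date, time and offset keeps the behavior the same across the 3.2.x and 3.3 strictness changes mentioned above.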
Struct and Data
Struct can be initialized by keyword arguments by default
The default behavior of Struct since 3.2 is to accept both positional and keyword arguments in constructor.
- Reason: Since introduction of Struct.new(<members>, keyword_init: true) in 2.5, it was frequently criticized as clumsy.
- Discussion: Feature #16806
- Documentation: Struct.new
- Code:
  User = Struct.new(:id, :name)
  # This works:
  User.new(1, 'Joan') #=> #<struct User id=1, name="Joan">
  # Since 3.2, this works too:
  User.new(id: 1, name: 'Joan') #=> #<struct User id=1, name="Joan">

  # keyword_init: true/false still can be provided to make the behavior stricter:
  User = Struct.new(:id, :name, keyword_init: true)
  User.new(id: 1, name: 'Joan') #=> #<struct User id=1, name="Joan">
  User.new(1, 'Joan')
  # in `initialize': wrong number of arguments (given 2, expected 0) (ArgumentError)

  User = Struct.new(:id, :name, keyword_init: false)
  User.new(1, 'Joan') #=> #<struct User id=1, name="Joan">
  User.new(id: 1, name: 'Joan')
  # => #<struct User id={:id=>1, :name=>"Joan"}, name=nil>
  # Note it is not ArgumentError, but interpreting all keyword args as one positional hash
- Notes:
  - The incompatibility might be introduced by code that expected singular hash as an argument for a Struct initialization:
    Wrapper = Struct.new(:json_data)
    Wrapper.new(user: {name: 'Joan'})
    # Ruby 3.0: works
    #   #<struct Wrapper json_data={:user=>{:name=>"Joan"}}>
    # Ruby 3.1: warns, yet works
    #   warning: Passing only keyword arguments to Struct#initialize will behave differently from Ruby 3.2. Please use a Hash literal like .new({k: v}) instead of .new(k: v).
    #   #<struct Wrapper json_data={:user=>{:name=>"Joan"}}>
    # Ruby 3.2: breaks
    #   in `initialize': unknown keywords: user (ArgumentError)

    # Fixed by explicitly setting `keyword_init: false` in struct definition:
    Wrapper = Struct.new(:json_data, keyword_init: false)
    Wrapper.new(user: {name: 'Joan'})
    # => #<struct Wrapper json_data={:user=>{:name=>"Joan"}}>
    # ...on Ruby 2.5-3.2, without any warnings

    # or, alternatively, by always wrapping hashes in {} explicitly,
    # as 3.1's warning suggested:
    Wrapper.new({user: {name: 'Joan'}})
    # => #<struct Wrapper json_data={:user=>{:name=>"Joan"}}>
  - While the new behavior is convenient, one should be especially careful when redefining #initialize for Structs to not break it:
    User = Struct.new(:id, :name) do
      # suppose we want to convert id to Integer before initializing.
      # Note that it could be in `args.first`, or in `kwargs[:id]` now, so it is either this:
      def initialize(*args, **kwargs)
        if !args.empty?
          args[0] = args[0].to_i
        elsif kwargs.key?(:id)
          kwargs[:id] = kwargs[:id].to_i
        end
        super(*args, **kwargs)
      end

      # or just post-processing...
      def initialize(...)
        super(...)
        self.id = self.id.to_i
      end
    end
Data
: new immutable value object class
A new class for containing value objects: it is somewhat similar to Struct
(and reuses some of the implementation internally), but is intended to be immutable, and have more modern and cleaner API.
- Reason: Before 3.2,
Struct
was an ubiquitous data holder class in Ruby, but being designed a long time ago, it has its drawbacks, making it not suitable for all situations: it is mutable by design (have argument setters), and have APIs of both “value-alike” and “container-alike” types. But there is a lot of code usingStruct
in various ways (and for good reasons), so it can’t be just redesigned. Several approaches was considered (including adding a “configure” API toStruct
, allowing to specify “should it be mutable, should it be iterable, should it be hash-alike”), but in the end, a new class with smaller and stricter API was designed. - Discussion: Feature #16122
- Documentation:
Data
- Code: Data is completely new, well-documented class. So we wouldn’t try to demonstrate all details of its behavior, just give a brief overview.
Point = Data.define(:x, :y) # Both positional and keyword arguments can be used p1 = Point.new(1, 0) #=> #<data Point x=1, y=0> p2 = Point.new(x: 0, y: 1) #=> #<data Point x=0, y=1> # all arguments are mandatory Point.new(1) # missing keyword: :y (ArgumentError) # #initialize might be redefined to provide default arguments or argument conversions Point3D = Data.define(:x, :y, :z) do def initialize(x:, y:, z: 0) = super end Point3D.new(x: 1, y: 2) # => #<data Point3D x=1, y=2, z=0> # the redefinition above is enough to handle keyword AND position arguments: Point3D.new(1, 2) # => #<data Point3D x=1, y=2, z=0> # there is no setters or any other way to change already created object p1.x = 5 # undefined method `x=' for #<data Point x=1, y=0> (NoMethodError) p1.instance_variable_set('@z', 100) # can't modify frozen Point: #<data Point x=1, y=0> (FrozenError) # #with method can be used to construct new instances, # replacing only parts of the data: p1.with(y: 100) #=> #<data Point x=1, y=100>
- Notes:
- The class with the same name (
Data
) existed before for internal purposes—as a recommended empty base class for classes defined in C extensions. It was deprecated since Ruby 2.5, and removed in Ruby 3.0. - On
Data
immutability: note that only theData
-derived object itself is frozen, but there is no deep freezing of instance variables. So this is still possible (and up to user code to prevent, if undesirable):Result = Data.define(:array) res = Result.new([1, 2, 3]) res.instance_variable_set('@size', 3) #=> can't modify frozen Result, as expected # but... res.array << 4 # works res #=> #<data Result array=[1, 2, 3, 4]> # Can shoot yourself in the foot in code doing something like... case res in Result(array:) # unpack into local variable array.reverse! # process it inplace, considering it independent local variable... # ...pass processed somewhere else... end # ...but actually data WAS changed: res #=> #<data Result array=[4, 3, 2, 1]>
- Follow-ups:
#with
method in Ruby 3.2.0 was naive and just copies all old and new attributes to the new instance, without invoking any custom initialization methods. It was fixed to call#initialize
in 3.2.2:
    Point = Data.define(:x, :y) do
      def initialize(x:, y:) = super(x: x.to_i, y: y.to_i)
    end

    p = Point.new('1', '2')
    # => #<data Point x=1, y=2> -- conversion performed through #initialize

    # Ruby 3.2.0:
    p.with(y: '3')
    # => #<data Point x=1, y="3"> -- #initialize is bypassed

    # Since Ruby 3.2.2:
    p.with(y: '3')
    # => #<data Point x=1, y=3>
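The note on shallow freezing above leaves protection of mutable attributes to user code. One possible approach, shown here as a minimal sketch assuming Ruby 3.2+, is to copy and freeze the value inside a redefined #initialize. The copy-and-freeze policy is an illustrative choice of this sketch, not something Data does by itself; the Result name just mirrors the example from the note.

```ruby
# Sketch (Ruby 3.2+): copy and freeze a mutable attribute in #initialize,
# so the Data instance does not share state with the caller.
Result = Data.define(:array) do
  def initialize(array:)
    super(array: array.dup.freeze)
  end
end

source = [1, 2, 3]
res = Result.new(source) # positional arguments are converted to keywords

source << 4              # mutating the original array...
res.array                # ...does not affect the stored copy

# In-place modification of the stored array is rejected now:
mutation_blocked =
  begin
    res.array << 5
    false
  rescue FrozenError
    true
  end

# Value equality still works as for any other Data object:
same = Result.new([1, 2, 3]) == res
```

The price is one extra array copy per instance, which is usually acceptable for small value objects.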
Pattern matching
- “Find pattern”
value in [*, pattern, *]
is no longer experimental. Feature #18585
MatchData
: added #deconstruct
and #deconstruct_keys
As a part of the effort to make core classes more pattern matching friendly, MatchData
(the result of regexp matching) now can be deconstructed.
- Discussion: Feature #18821
- Documentation:
MatchData#deconstruct
,MatchData#deconstruct_keys
- Code:
case connection_string.match(%r{postgres://(\w+):(\w+)@(.+)}) in 'admin', password, server # do connection with admin rights in ^DEV_USERS, _, 'dev-server.local' # connect to dev server with any password in user, password, server # do regular connection end # Might be used just for quick and expressive unpacking of match results connection_string = 'postgres://admin:secret@foo.amazonaws.com' connection_string.match(%r{postgres://(\w+):(\w+)@(.+)}) => user, password, server user #=> "admin" password #=> "secret" server #=> "foo.amazonaws.com" # When named capture group is used, MatchData also provides hash unpacking: connection_string.match(%r{postgres://(?<user>\w+):(?<password>\w+)@(?<server>.+)}) => user:, password:, server: user #=> "admin" password #=> "secret" server #=> "foo.amazonaws.com"
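As a small self-contained variation of the example above, named captures can drive a case/in with hash patterns. This is only a sketch assuming Ruby 3.2+; CONNECTION_RE and classify_connection are hypothetical names introduced for illustration.

```ruby
# Sketch (Ruby 3.2+): routing on named captures via MatchData#deconstruct_keys.
CONNECTION_RE = %r{\Apostgres://(?<user>\w+):(?<password>\w+)@(?<server>.+)\z}

def classify_connection(connection_string)
  case connection_string.match(CONNECTION_RE)
  in {user: 'admin', server:}
    [:admin, server]
  in {user:, server:}
    [:regular, user, server]
  in nil
    # String#match returns nil when the string doesn't match at all
    [:invalid]
  end
end

classify_connection('postgres://admin:secret@foo.amazonaws.com')
classify_connection('postgres://john:pwd@db.local')
classify_connection('mysql://localhost')
```

The explicit `in nil` branch matters: without it, a non-matching string would raise NoMatchingPatternError.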
Time#deconstruct_keys
Time
now can be used in pattern matching too.
- Discussion: Feature #19071
- Documentation:
Time#deconstruct_keys
- Code:
# `deconstruct_keys(nil)` shows all available keys: Time.now.deconstruct_keys(nil) # => {:year=>2023, :month=>1, :day=>15, :yday=>15, :wday=>0, :hour=>17, :min=>5, :sec=>56, :subsec=>(148452241/200000000), :dst=>false, :zone=>"EET"} # Usage in pattern-matching: case timestamp in year: ...2022 puts "Far past!" in year: 2022, month: 1..3 puts "Last year's first quarter" in year: 2023, month:, day: puts "#{day} of #{month}th month!" # ... end # Check if it is the first Thursday of the current month: if Time.now in wday: 4, day: ..7 # ...
- Notes:
- It was decided that
#deconstruct
method forTime
doesn’t make much sense, because the reasonable order for all of the time components is hard to define. - Standard library classes
Date
andDateTime
also receive similar implementations (Date#deconstruct_keys
,DateTime#deconstruct_keys
):require 'date' Date.today.deconstruct_keys(nil) #=> {:year=>2023, :month=>1, :day=>15, :yday=>15, :wday=>0} DateTime.now.deconstruct_keys(nil) # => {:year=>2023, :month=>1, :day=>15, :yday=>15, :wday=>0, :hour=>17, :min=>19, :sec=>15, :sec_fraction=>(478525469/500000000), :zone=>"+02:00"}
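The keys listed above can be combined with ranges and alternatives in one case/in. A minimal sketch, assuming Ruby 3.2+; the describe helper and its categories are made up for illustration.

```ruby
# Sketch (Ruby 3.2+): classifying a Time with hash patterns.
def describe(time)
  case time
  in {wday: 4, day: ..7}
    :first_thursday
  in {wday: 0 | 6}
    :weekend
  in {hour: 9...18}
    :working_hours
  else
    :other
  end
end

describe(Time.new(2023, 1, 5, 12)) # Thursday, January 5
describe(Time.new(2023, 1, 7, 12)) # Saturday
describe(Time.new(2023, 1, 9, 10)) # Monday, 10:00
describe(Time.new(2023, 1, 9, 22)) # Monday, 22:00
```

Branch order matters here: a Thursday within the first seven days of the month is reported before the generic working-hours check is tried.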
Enumerables and collections
Enumerator.product
Generates an enumerator from several others, yielding all possible combinations of their elements.
- Discussion: Feature #18685
- Documentation:
Enumerator.product
,Enumerator::Product
- Code:
enumerator = Enumerator.product(1.., %w[test me]) # => #<Enumerator::Product: ...> enumerator.take(6) # => [[1, "test"], [1, "me"], [2, "test"], [2, "me"], [3, "test"], [3, "me"]] # The arguments can be any object responding to `each_entry`, # not necessary enumerator/enumerable class ThreeBears def each_entry yield 'Papa Bear' yield 'Mama Bear' yield 'Little Bear' end end Enumerator.product([1, 2], ThreeBears.new).to_a # => [[1, "Papa Bear"], [1, "Mama Bear"], [1, "Little Bear"], # [2, "Papa Bear"], [2, "Mama Bear"], [2, "Little Bear"]]
- Notes:
- It is currently discussed that protocol for
Enumerator.product
is unlikeArray#product
(which is a method of the first argument of the expression).
- If one of the enumerators is effectful (can be iterated through only once), the current implementation exhausts it on the first pass:
    require 'stringio'

    # This will work as expected
    io = StringIO.new('abc')
    Enumerator.product(io.each_char, [1, 2, 3]).to_a
    # => [["a", 1], ["a", 2], ["a", 3], ["b", 1], ["b", 2], ["b", 3], ["c", 1], ["c", 2], ["c", 3]]

    # But this will produce less data than the full cross-product
    io = StringIO.new('abc')
    Enumerator.product([1, 2, 3], io.each_char).to_a
    #=> [[1, "a"], [1, "b"], [1, "c"]]
This is probably a bug.
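Until the behavior with one-shot sources is settled, one possible workaround (a sketch, not an official recommendation) is to materialize such a source into an array before passing it to Enumerator.product:

```ruby
require 'stringio'

# Sketch (Ruby 3.2+): read the one-shot source once, then build the product
# from the resulting array, which can be iterated any number of times.
io = StringIO.new('abc')
chars = io.each_char.to_a

pairs = Enumerator.product([1, 2, 3], chars).to_a
pairs.size  # full cross-product: 3 * 3 pairs
pairs.first # [1, "a"]
pairs.last  # [3, "c"]
```

This obviously gives up laziness for that one argument, so it only fits sources that are small enough to hold in memory.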
Hash#shift
always returns nil
if the hash is empty
There was a bug/inconsistency with returning the default value if it is defined.
- Discussion: Bug #16908
- Documentation:
Hash#shift
- Code:
h = {a: 1} h.shift #=> [:a, 1] h.shift #=> nil, as expected # but if the default for hash is defined... h.default = :foo h.shift # 3.1: => :foo -- hard to explain, it isn't even [key, value] pair # 3.2: => nil
Set
became a built-in class
Previously a part of standard library, Set (a collection of unique elements) was promoted to core class. No more need to require 'set'
to use the class.
- Discussion: Feature #16989
- Documentation:
Set
(still mentionsrequire 'set'
, though) - Notes: As of 3.2, the only change is making the library auto-required without changing the implementation.
Set
is still not as integrated in Ruby as other collections, likeHash
andArray
:Set
is implemented in Ruby, usesHash
as its internal storage (by creating a hash ofset_element => true
pairs), and doesn’t have its own literal. There are distant plans to improve it, but with no particular schedule.
Thread::Queue
: timeouts for pop
and push
timeout: <number>
parameter was added to methods Queue#pop
, SizedQueue#push
, SizedQueue#pop
.
- Reason: As a thread queue is meant as a means of inter-thread communication, it is useful to provide a way to avoid hanging a thread forever while waiting for input from another thread (or waiting for a free place in the queue in case of SizedQueue#push).
- Discussion: Feature #18774, Feature #18944
- Documentation:
Thread::Queue#pop
,Thread::SizedQueue#pop
,Thread::SizedQueue#push
- Code:
queue = Thread::Queue.new sender = Thread.new do queue.push(1) queue.push(2) end # Expects 3 values from sender receiver = Thread.new do # This will print 1, 2, and then make receiver sleep forever: # 3.times.each { p queue.pop } # But this prints 1, 2, waits for 0.5 seconds and then prints `nil` 3.times.map { p queue.pop(timeout: 0.5) } end [sender, receiver].each(&:join) sized = Thread::SizedQueue.new(2) sized.push(1, timeout: 0.5) #=> success, returns the queue object sized.push(2, timeout: 0.5) #=> success, returns the queue object sized.push(3, timeout: 0.5) #=> waits 0.5 seconds, returns nil sized.size #=> 2, only 1 and 2 were pushed successfully
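A typical use of the new parameter might be a worker that stops by itself after being idle for a while. A minimal sketch assuming Ruby 3.2+; the 0.2-second idle limit and the doubling "job" are arbitrary illustration values.

```ruby
# Sketch (Ruby 3.2+): a worker that shuts down after an idle period.
queue = Thread::Queue.new
processed = []

worker = Thread.new do
  # #pop returns nil when nothing arrived within the timeout, ending the loop
  while (job = queue.pop(timeout: 0.2))
    processed << job * 2
  end
  processed << :idle_shutdown
end

[1, 2, 3].each { |n| queue.push(n) }
worker.join

processed
```

Note that this loop can't distinguish a timeout from an explicitly pushed nil, so nil shouldn't be used as a job value with this approach.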
Procs and methods
Proc#dup
returns an instance of subclass
- Reason: Just for consistency with other core classes behavior.
- Discussion: Bug #17545
- Documentation: —
- Code:
class MyProc < Proc # some additional custom methods... end MyProc.new { }.dup # 3.1: => #<Proc:...> # 3.2: => #<MyProc:...>
- Notes:
- In general, inheriting from core classes is a questionable practice, and you probably should avoid it;
- Despite producing an instance of a subclass now,
#dup
doesn’t call#initialize_dup
constructor, so custom data that you’ve associated with a subclass instance can’t be preserved:class TaggedProc < Proc attr_reader :tag def initialize(tag, &block) @tag = tag super(&block) end def initialize_dup(other) # this will NOT be invoked @tag = other.tag super end end t = TaggedProc.new('test') { } t.tag #=> 'test' t.dup.tag #=> nil
This is a bug.
- Follow-ups: 3.3:
#dup
properly invokes#initialize_dup
.
Proc#parameters
: new keyword argument lambda: true/false
parameters(lambda: true)
returns the Proc’s parameters description as if the proc was a lambda (i.e., the parameters without defaults are reported as mandatory), regardless of the Proc’s real “lambdiness.”
- Reason: The regular (non-lambda) proc always reports its positional arguments as optional. It corresponds to its behavior, but loses the information about which of them have default values defined. It might be inconvenient when using procs in metaprogramming, like building wrapper objects, or defining methods based on procs.
- Discussion: Feature #15357
- Documentation:
Proc#parameters
- Code:
    prc = proc { |x, y=0| p(x:, y:) }
    prc.parameters
    # => [[:opt, :x], [:opt, :y]] -- for proc, all parameters are optional
    # Which corresponds to how it actually behaves: all params can be skipped:
    prc.call
    #=> {:x=>nil, :y=>0}

    prc.parameters(lambda: true)
    # => [[:req, :x], [:opt, :y]] -- in stricter lambda protocol, first parameter is required
    # Which corresponds to how the corresponding lambda would treat
    # its parameters:
    lambda { |x, y=0| p(x:, y:) }.call
    # wrong number of arguments (given 0, expected 1..2) (ArgumentError)

    # The `lambda: false` call works, too, although arguably less useful:
    l = ->(x, y=0) { }
    l.parameters                # => [[:req, :x], [:opt, :y]]
    l.parameters(lambda: false) # => [[:opt, :x], [:opt, :y]]
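The metaprogramming use case mentioned in the reason could look like the following sketch (Ruby 3.2+ assumed). required_positional and strict_call are hypothetical helpers written for this illustration; note that only Proc#parameters accepts lambda:, Method#parameters does not.

```ruby
# Sketch (Ruby 3.2+): enforce lambda-like arity on a regular proc.
def required_positional(prc)
  prc.parameters(lambda: true).filter_map { |type, name| name if type == :req }
end

def strict_call(prc, *args)
  required = required_positional(prc)
  if args.size < required.size
    raise ArgumentError, "missing arguments: #{required.drop(args.size).join(', ')}"
  end
  prc.call(*args)
end

handler = proc { |user, role = :guest| [user, role] }

required_positional(handler) # only :user has no default
strict_call(handler, 'amy')  # role falls back to its default

rejected =
  begin
    strict_call(handler)     # a plain handler.call would silently pass nil
    false
  rescue ArgumentError
    true
  end
```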
Method#public?
, #protected?
, and #private?
are removed
Predicates to check method visibility added in Ruby 3.1 were reverted.
- Reason: The new feature’s implementation has led to several bugs in Method class behavior; while investigating the root cause of those bugs, Matz has decided that a method’s visibility is not its inherent property, but rather a property of the module/object that owns the method, and as such, is already present in the form of Module#{private,public,protected}_instance_methods and Object#{private,public,protected}_methods
- Discussion: Feature #11689#note-24
- Notes: The discussion whether the feature should be un-reverted is still ongoing!
UnboundMethod
: more consistent reporting on what module it belongs to
Since 3.2, UnboundMethod
’s #inspect
and comparison with other UnboundMethod
instances only considers the module it is defined in, not the actual module it was unbound from.
- Reason: The change just aligns auxiliary methods with the main
UnboundMethod
implementation. No usage of unbound method is affected by what was the original class or object it was unbound from, only by the place of definition. - Discussion: Feature #18798
- Affected methods:
UnboundMethod#==
,#inspect
(documentation not updated) - Code:
tally = Array.instance_method(:tally) p tally # 3.1: => #<UnboundMethod: Array(Enumerable)#tally(*)> # 3.2: => #<UnboundMethod: Enumerable#tally(*)> # The former reports "it was defined in Enumerable, but unbound from Array" orig_tally = Enumerable.instance_method(:tally) tally == orig_tally # 3.1: false -- because it was unbound from different class # 3.2: true # In reality, both are the same, and can be rebound to any class including Enumerable: orig_tally.bind("test".each_char).call # => {"t"=>2, "e"=>1, "s"=>1} tally.bind("test".each_char).call # => {"t"=>2, "e"=>1, "s"=>1} -- on 3.1, this worked, even if tally was "unbound from Array" # Therefore, reporting `tally` as belonging to Array and unequal to `orig_tally` was misleading
- Note: While it might seem like a weird unnecessary quirk, unbinding methods and then rebinding them to different objects is a useful metaprogramming technique: when some core methods are redefined, it allows to preserve and reuse the initial implementation.
IO and network
IO
: support for timeouts for blocking IO
IO#timeout
getter and setter were added to the base class, and are respected on blocking operations.
- Discussion: Feature #18630
- Documentation:
IO#timeout
,IO#timeout=
- Code:
    STDIN.timeout = 5
    print "Tell me what: "
    answer = gets
    # If you didn't type anything for 5 seconds, this raises:
    #  in `gets': Blocking operation timed out! (IO::TimeoutError)

    STDIN.timeout = nil # to remove the timeout
    answer = gets # will wait till input appears or the process is killed

    STDIN.timeout = 0
    answer = gets
    # Will raise IO::TimeoutError immediately,
    # useful for quick "take something from input buffer if it isn't empty"
- Note:
IO#timeout
in general affects reading and writing operations (including network ones, defined onSocket
). Operations likeIO.open
andIO#close
are not affected.
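The STDIN example above is hard to run non-interactively, so here is a sketch of the same behavior on a pipe (Ruby 3.2+ on a Unix-like system is assumed; the 0.1-second value is arbitrary):

```ruby
# Sketch (Ruby 3.2+): IO#timeout on a pipe.
reader, writer = IO.pipe
reader.timeout = 0.1

# Nothing has been written yet, so the blocking read gives up after 0.1s:
first_attempt =
  begin
    reader.gets
  rescue IO::TimeoutError
    :timed_out
  end

# Once data is available, the same call returns normally:
writer.puts 'ping'
second_attempt = reader.gets

reader.close
writer.close
```

Rescuing IO::TimeoutError and converting it to a marker value is just one option; in real code, it might be more appropriate to let the error propagate.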
IO#path
Any IO
object can be constructed with additional argument path:
, which will be available as a path
attribute.
- Reason:
IO
object could be created from low-level file descriptor (for example, returned by some C extension), but there was no way to specify it corresponds to some specific filesystem path. - Discussion: Feature #19036
- Documentation: IO#Open options,
IO#path
- Code:
# Always worked: f = File.open('README.md') f.path #=> 'README.md' # IO created from system-level file descriptor (which might've been returned by a C library) io = IO.new(f.fileno) # => #<IO:fd 5> io.path # 3.1: NoMethodError (undefined method `path') # 3.2: => nil # IO can't guess file path from the descriptor, but path can be provided explicitly: io = IO.new(f.fileno, path: 'README.md') # => #<IO:README.md> io.path # => "README.md" # One generalization of the new feature was to introspection of standard IO streams: STDOUT.path # 3.1: NoMethodError (undefined method `path') # 3.2: => "<STDOUT>"
Exceptions
Exception#detailed_message
The method can be redefined for providing custom “decoration” of exception messages, without redefining the main #message
.
- Reason: Standard libraries like did_you_mean (adds “did you mean other name” to NoMethodError) or error_highlight (prints the failed part of code and highlights the problematic part) previously adjusted the Exception#message method. It might not always be convenient: say, if an application wants to benefit from those gems, but also needs to report “clean” error messages to a monitoring system, it required a workaround. Starting from Ruby 3.2, there is a clear distinction:
  - #message is the original message with which the exception was raised;
  - #detailed_message might be redefined by some libraries or user’s code for convenience and better reporting (most probably);
  - #full_message (introduced in 2.5) is what the interpreter prints: detailed message + error backtrace.
- Discussion: Feature #18564
- Documentation:
Exception#detailed_message
- Code:
    # Default implementation:
    begin
      raise RuntimeError, 'test'
    rescue => e
      puts e.message
      # test
      puts e.detailed_message # adds error class
      # test (RuntimeError)
      puts e.full_message # adds backtrace
      # test.rb:3:in `<main>': test (RuntimeError)
    end

    # NoMethodError employs did_you_mean to look up the right name,
    # and error_highlight to show where exactly the error happened:
    begin
      'foo'.lenthg
    rescue => e
      puts e.message
      # undefined method `lenthg' for "foo":String
      puts e.detailed_message # class name + highlighted part of code + "Did you mean?"
      # undefined method `lenthg' for "foo":String (NoMethodError)
      #
      # 'foo'.lenthg
      #      ^^^^^^^
      # Did you mean? length
      puts e.full_message # all of the above + "where it happened"
      # test.rb:2:in `<main>': undefined method `lenthg' for "foo":String (NoMethodError)
      #
      # 'foo'.lenthg
      #      ^^^^^^^
      # Did you mean? length
    end

    # Implement the custom one:
    class LoadError
      def detailed_message(highlight: false, **)
        res = super # invoke the default implementation which will produce message + class name
        return res unless path.start_with?('vendor/')

        # Provide custom value. Ideally, the code should consider adding some
        # markup with escape codes for expressiveness if `highlight: true` is passed
        res + "\n"\
          "  Vendor library `#{path.delete_prefix('vendor/')}' not loaded\n"\
          "  Check our instructions in VENDOR.md"
      end
    end

    require 'vendor/tricky'
    # This will now raise an error which would be printed as...
    #
    # in `require': cannot load such file -- vendor/tricky (LoadError)
    #   Vendor library `tricky' not loaded
    #   Check our instructions in VENDOR.md
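The same distinction can be checked on a custom exception class without raising anything. A minimal sketch assuming Ruby 3.2+; QuotaError and the hint text are made up for illustration, and the keyword arguments are forwarded to super explicitly just to keep the sketch unambiguous.

```ruby
# Sketch (Ruby 3.2+): decorating only #detailed_message of a custom error.
class QuotaError < StandardError
  def detailed_message(highlight: false, **opts)
    base = super(highlight: highlight, **opts) # default: message + class name
    "#{base}\n  Hint: free some space or raise the quota"
  end
end

error = QuotaError.new('disk quota exceeded')

plain = error.message                               # the original text, untouched
detailed = error.detailed_message(highlight: false) # class name and the hint are added
```

A monitoring system could then be fed with `plain`, while the console output (which is built from #detailed_message) would include the hint.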
SyntaxError#path
Returns the path of the file where the error has happened.
- Reason: The feature was introduced by request of SyntaxSuggest new core library. It makes post-processing of SyntaxError easier for this and third-party libraries, say, when it is necessary to analyze the code that errored.
- Discussion: Feature #19138
- Documentation:
SyntaxError
- Code:
# Consider there is 'test.rb' such that: x = 5 y = 6 z = x** #---- begin load 'test.rb' rescue SyntaxError => e p e #=> #<SyntaxError:"tmp/test.rb:3: syntax error, unexpected end-of-input\n z = x**\n ^\n"> puts e.path #=> test.rb end
- Note: As of 3.2, there is no way to set
path
(unlike other additional exception data likeKeyError#key
that can be set in#initialize
). AsSyntaxError
is mostly meant to be generated by the Ruby parser and not by custom code, that might not be a big problem.
Concurrency
- For documentation purposes, Fiber::SchedulerInterface (3.0-3.1) was renamed to Fiber::Scheduler (3.2+). It still serves just as a documentation-level abstraction: no real class with such a name exists, see our explanations in 3.0 changelog.
Fiber
storage
Per-fiber hash-alike storage interface is introduced. It can be set up on Fiber creation, and accessed as a whole via #storage
accessors, or key-by-key with Fiber[]
accessors. By default, it is inherited on fiber creation, but can be overridden.
- Reason: The official explanation from NEWS is the best: “You should generally consider Fiber storage for any state which you want to be shared implicitly between all fibers and threads created in a given context, e.g. a connection pool, a request id, a logger level, environment variables, configuration, etc.”
- Discussion: Feature #19078
- Documentation:
Fiber.[]
,Fiber.[]=
,Fiber#storage
,Fiber#storage=
,Fiber.new
- Code:
Fiber[:user] = 'admin' Fiber[:user] #=> "admin" Fiber.current.storage #=> {user: 'admin'} # This will have no effect, storage returns a copy of internal storage Fiber.current.storage[:user] = 'John' # Still the same: Fiber[:user] #=> "admin" Fiber.current.storage = {user: 'Jane'} # warning: Fiber#storage= is experimental and may be removed in the future! Fiber[:user] #=> "Jane" # Cleaning up the storage Fiber.current.storage = nil Fiber.current.storage #=> {} Fiber[:user] = 'admin' f = Fiber.new { puts Fiber[:user] } f.resume # prints "admin", by default the storage is inherited f.storage # raises "Fiber storage can only be accessed from the Fiber it belongs to" (ArgumentError) # The storage can be overwritten on creation: Fiber.new(storage: {user: 'Jane'}) { puts Fiber[:user] }.resume # prints "Jane" # or... Fiber.new(storage: nil) { puts Fiber[:user] }.resume # prints empty string # The same as default: inherit from the creating fiber: Fiber.new(storage: true) { puts Fiber[:user] }.resume # prints "admin" # Even if inherited, fiber storage is isolated between fibers: f = Fiber.new { puts Fiber[:user] Fiber[:user] = 'Amy' } Fiber[:user] = 'Jane' f.resume # prints "admin" from fiber, change in the main fiber didn't affect inherited puts Fiber[:user] # prints "Jane", change in inherited fiber didn't affect the main one
- Notes:
- Only
Fiber#storage=
is considered experimental; the rest of API is considered stable; - There is an API discrepancy, currently discussed, between
Fiber[]
(which is class method, reading/writing current fiber’s storage) andFiber.current.storage
(instance method, but available only on class instance);
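The “request id” use case from the NEWS quote above might look like the following sketch (Ruby 3.2+ assumed; the :request_id key and its values are arbitrary):

```ruby
# Sketch (Ruby 3.2+): implicit propagation of a request id through fiber storage.
Fiber[:request_id] = 'req-42'

# By default, a new fiber gets a copy of the creating fiber's storage:
inherited = Fiber.new { Fiber[:request_id] }.resume

# An explicit storage replaces the inherited one:
overridden = Fiber.new(storage: {request_id: 'req-99'}) { Fiber[:request_id] }.resume

# Writes inside a child fiber stay inside that fiber:
changed_in_child = Fiber.new do
  Fiber[:request_id] = 'changed-in-child'
  Fiber[:request_id]
end.resume

still_original = Fiber[:request_id]
```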
Fiber::Scheduler#io_select
Implements non-blocking IO.select
- Discussion: Feature #19060
- Documentation:
Fiber::Scheduler#io_select
- Notes:
- See code examples in 3.0 changelog for general demo of using Fiber Scheduler. As no simple implementation is available, it is complicated to show an example of new hooks in play.
- Just to remind: Ruby does not include the default implementation of Fiber Scheduler, but the maintainer of the feature, Samuel Williams, provides one in his gem Async which is Ruby 3.2-compatible already.
Internals
Thread.each_caller_location
A way for enumerating backtrace entries without instantiating them all.
- Reason: There are many contexts where only a small chunk of the backtrace is necessary, but to find this chunk, the whole backtrace needs to be materialized with #caller_locations. For example, consider “send to the monitoring system the first line in app/ that called this (library) query code.” In large apps under high load, the call stack might be really large, and the cost of its materialization into Ruby objects on frequent calls might be significant. The new method allows to go through stack frames one by one, and break as soon as the necessary one(s) is reached.
- Discussion: Feature #16663
- Documentation:
Thread.each_caller_location
- Code:
# test.rb def inner Thread.each_caller_location { p [_1, _1.class] } end def outer inner end outer # prints: # ["test.rb:8:in `outer'", Thread::Backtrace::Location] # ["test.rb:11:in `<main>'", Thread::Backtrace::Location] # More realistic usage: def method_to_debug # ... app_frame = nil Thread.each_caller_location { if _1.path.match?('/app') app_frame = _1 break end } Monitoring.notify "Method was invoked by #{app_frame}" # ... end
- Notes:
- Note that while each item is printed as a regular string, they are actually instances of a utility class
Thread::Backtrace::Location
. - The method intentionally doesn’t have a block-less version (which should’ve returned
Enumerator
as Enumerable’s method like#each
or#map
do): this would defy the point of efficient backtrace analysis at the current frame, adding more frames of Enumerable/Enumerator implementations; - For the reason of efficiency,
each_caller_location
returns nothing (again, to avoid materializing unnecessary objects), so if the goal is to find one location, as in the example above, or to select some part of the call stack, the only way to do it is non-idiomatic code:
    lib = []
    # Goal: take the first caller locations while they are inside our app's lib/ folder:
    Thread.each_caller_location {
      break unless _1.path.start_with?('lib/')
      lib << _1
    }
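The “collect into a local variable and break” pattern can be wrapped into a small helper. A sketch assuming Ruby 3.2+; first_frames, inner and outer are hypothetical names, and Thread::Backtrace::Location#base_label is used to get method names without file and line information.

```ruby
# Sketch (Ruby 3.2+): take only the first N caller frames, without
# materializing the rest of the stack.
def first_frames(limit)
  labels = []
  Thread.each_caller_location do |location|
    labels << location.base_label
    break if labels.size >= limit
  end
  labels
end

def inner = first_frames(2)
def outer = inner

frames = outer # the two nearest callers of first_frames
```

The iteration starts from the caller of the method that invoked each_caller_location, so the helper itself doesn't appear in the result.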
GC.latest_gc_info
: add need_major_by:
key
- Reason: The information (whether the next garbage collection would be minor or major) might be useful for highload systems, where it might make sense to trigger garbage-collection preemptively if the next one would be major, before entering the performance-critical part of the code.
- Discussion: GH-6791
- Documentation:
GC.latest_gc_info
(the possible keys aren’t documented) - Code:
GC.latest_gc_info # 3.1: # => {:major_by=>nil, :gc_by=>:newobj, :have_finalizer=>false, :immediate_sweep=>false, :state=>:sweeping} # 3.2: # => {:major_by=>nil, :need_major_by=>nil, :gc_by=>:newobj, :have_finalizer=>false, :immediate_sweep=>false, :state=>:none} # Or: GC.latest_gc_info(:need_major_by) #=> nil
- Notes: The author of this changelog is not a GC expert, and the matter is not very well documented, so I can only say that the possible values (besides nil), according to the feature’s code, are :nofree, :oldgen, :shady, :force, and they are the same as the possible values of major_by:. As far as I can guess, major_by: describes what caused the latest major GC, while need_major_by: describes the upcoming one; if it is nil, a major GC is probably not upcoming?..
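The “trigger GC preemptively” idea from the reason above could be sketched like this (Ruby 3.2+ assumed). gc_before_critical_section is a hypothetical helper, and whether such preemptive collection actually pays off depends on the application, so treat it as an illustration of the API rather than a tuning advice.

```ruby
# Sketch (Ruby 3.2+): run a full GC before a latency-sensitive section
# if the next collection is expected to be a major one anyway.
def gc_before_critical_section
  reason = GC.latest_gc_info(:need_major_by) # nil or a Symbol
  GC.start if reason
  reason
end

reason = gc_before_critical_section
key_present = GC.latest_gc_info.key?(:need_major_by)
```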
ObjectSpace
: dumping object shapes
Object Shapes is a large and interesting new internal object structuring approach which we (being focused on language API) wouldn’t explain here. The explanation and discussion can be found at Feature #18776. The only way the Ruby-level API is affected by the change is a new parameter for ObjectSpace.dump_all
method, that allows to dump shapes defined so far.
- Discussion: GH-6868
- Documentation:
ObjectSpace#dump_all
(docs not fully updated, though) - Code:
    require 'objspace'

    # To only output what would be put into ObjectSpace since this point
    gc_generation = GC.count
    since_id = RubyVM.stat(:next_shape_id)

    # New shapes are defined when instance vars for objects are set, so let's make one!
    class User
      def initialize(id, name)
        @id = id
        @name = name
      end
    end

    User.new(1, 'Yuki')

    ObjectSpace.dump_all(output: :stdout, since: gc_generation, shapes: since_id)
    # {"address":"0x7f6490e00da0", "type":"SHAPE", "id":237, "parent_id":5, "depth":3, "shape_type":"IVAR","edge_name":"@id", "edges":1, "memsize":120}
    # {"address":"0x7f6490e00dc0", "type":"SHAPE", "id":238, "parent_id":237, "depth":4, "shape_type":"IVAR","edge_name":"@name", "edges":0, "memsize":32}
This reads: setting @id creates a new shape (with "id":237), and setting @name creates the next one, inherited from that ("id":238, "parent_id":237). To understand the deep meaning and consequences of this behavior, though, we’ll refer to the original discussion.
TracePoint#binding
returns nil
for c_call
/c_return
- Reason: See
Kernel#binding
explanations above: C methods don’t have their own binding, so before Ruby 3.2,TracePoint#binding
for their call confusingly returned the binding of the first Ruby caller in the call stack. - Discussion: Bug #18487
- Documentation:
TracePoint#binding
(still has docs for old behavior, though) - Code:
TracePoint.new(:c_call) do |tp| p [tp.method_id, tp.binding, tp.binding&.local_variables] end.enable { x = [5] x.map { } } # In Ruby 3.1, this prints: # [:map, #<Binding:0x00007fbfb9b36d40>, [:x]] -- so, we have a binding of surrounding block, not insides of `map` # In Ruby 3.2: # [:map, nil, nil]
TracePoint
for a block defaults to tracing the current thread
- Reason: In block form, the intention of the developer is to trace what’s happening in the specified block. In complicated applications, though, other threads might work at the same time and pollute the tracing with unrelated occurrences.
- Discussion: Bug #16889
- Documentation:
TracePoint#enable
. - Code:
    def test = nil

    other = Thread.start {
      sleep(0.1) # to give TracePoint time to start
      test
    }
    Thread.current.name = 'main'
    other.name = 'other'
    # Note: each example below needs to restart the "other" thread.

    TracePoint.new(:call) do |tp|
      puts "Called from #{Thread.current}" if tp.method_id == :test
    end.enable do
      test
      other.join
    end
    # Ruby 3.1:
    #   Called from #<Thread:...@main run>
    #   Called from #<Thread:...@other run>
    # Ruby 3.2:
    #   Called from #<Thread:...@main run>

    # The desired thread to trace can be specified explicitly:
    TracePoint.new(:call) do |tp|
      puts "Called from #{Thread.current}" if tp.method_id == :test
    end.enable(target_thread: other) do
      test
      other.join
    end
    # Ruby 3.1 and 3.2:
    #   Called from #<Thread:...@other run>

    # Only block form is affected:
    tp = TracePoint.new(:call) do |tp|
      puts "Called from #{Thread.current}" if tp.method_id == :test
    end
    tp.enable
    test
    other.join
    # Ruby 3.1 & 3.2:
    #   Called from #<Thread:...@main run>
    #   Called from #<Thread:...@other run>

    # If target for tracing is explicitly specified, all threads are traced:
    TracePoint.new(:call) do |tp|
      puts "Called from #{Thread.current}" if tp.method_id == :test
    end.enable(target: method(:test)) do
      test
      other.join
    end
    # Ruby 3.1 & 3.2:
    #   Called from #<Thread:...@main run>
    #   Called from #<Thread:...@other run>
RubyVM::AbstractSyntaxTree
error_tolerant: true
option for parsing
With this option, parsing can be performed even on incomplete and syntactically incorrect scripts, replacing unparseable parts with ERROR
token.
- Reason: The new option opens the road for using Ruby’s native parser in various language tools that work on the fly, while the code is being written (like LSP), or that provide advice and possible fixes for erroneous code. It is important that the “official” language parser supports such cases out-of-the-box.
- Discussion: Feature #19013
- Documentation:
AbstractSyntaxTree.parse
- Code:
    src = <<~RUBY
      def test
    RUBY

    RubyVM::AbstractSyntaxTree.parse(src)
    # in `parse': syntax error, unexpected end-of-input (SyntaxError)

    root = RubyVM::AbstractSyntaxTree.parse(src, error_tolerant: true)
    pp root
    # Shortening for the sake of this changelog, the structure of the tree would be:
    #
    #   (SCOPE@1:0-1:8
    #     body:
    #       (DEFN@1:0-1:8
    #         mid: :test
    #         body:
    #           (SCOPE@1:0-1:8
    #             args:
    #               (ARGS@1:8-1:8 ...)
    #             body: nil)))
    #
    # I.e. the code is correctly parsed as "a beginning of a method `test`
    # without a body"

    # The parser also tries to recover from errors in the middle of the script:
    src = <<~RUBY
      def bad
        x +
      end

      def good
        puts 'test'
      end
    RUBY

    root = RubyVM::AbstractSyntaxTree.parse(src, error_tolerant: true)
    pp root
    # Shortened output again...
    #
    #   (SCOPE@1:0-7:3
    #     body:
    #       (BLOCK@1:0-7:3
    #         (DEFN@1:0-3:3
    #           mid: :bad
    #           body:
    #             (SCOPE@1:0-3:3
    #               body: (ERROR@2:2-3:3)))
    #         (DEFN@5:0-7:3
    #           mid: :good
    #           body:
    #             (SCOPE@5:0-7:3
    #               body: (FCALL@6:2-6:13 :puts (LIST@6:7-6:13 (STR@6:7-6:13 "test") nil))))))
    #
    # Note the ERROR node in the middle of method :bad, but then properly parsed
    # body of method :good
- Notes:
- Recovery not guaranteed
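A tool built on top of this option would probably start with locating the ERROR nodes. The following is a minimal sketch assuming CRuby 3.2+; error_nodes is a hypothetical helper, and, as noted above, whether and where the parser places ERROR nodes for a particular source is not guaranteed.

```ruby
# Sketch (CRuby 3.2+): collect all ERROR nodes from an error-tolerant parse.
def error_nodes(node, found = [])
  # children may also contain symbols, nils, and other non-node values
  return found unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)

  found << node if node.type == :ERROR
  node.children.each { |child| error_nodes(child, found) }
  found
end

src = <<~RUBY
  def bad
    x +
  end

  def good
    puts 'test'
  end
RUBY

root = RubyVM::AbstractSyntaxTree.parse(src, error_tolerant: true)
errors = error_nodes(root)

# Each node knows its position, which can be used to report the broken region:
errors.each { |node| puts "unparseable code at lines #{node.first_lineno}-#{node.last_lineno}" }
```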
keep_tokens: true
option for parsing
With keep_tokens: true
option provided, AbstractSyntaxTree.parse
will attach corresponding code tokens array to each node of the syntax tree.
- Reason: Like the previous feature, this one is useful for implementing code analysis tools: there are several ways to write code that will produce exactly the same syntax tree; and while it doesn’t affect interpreting, it does affect style checking, suggestions etc.
- Discussion: Feature #19070
- Documentation:
AbstractSyntaxTree.parse
,Node#tokens
,Node#all_tokens
- Code:
RubyVM::AbstractSyntaxTree.parse("puts 'test'", keep_tokens: true).tokens # => # [[0, :tIDENTIFIER, "puts", [1, 0, 1, 4]], # [1, :tSP, " ", [1, 4, 1, 5]], # [2, :tSTRING_BEG, "'", [1, 5, 1, 6]], # [3, :tSTRING_CONTENT, "test", [1, 6, 1, 10]], # [4, :tSTRING_END, "'", [1, 10, 1, 11]]] RubyVM::AbstractSyntaxTree.parse("puts('test')", keep_tokens: true).tokens # => # [[0, :tIDENTIFIER, "puts", [1, 0, 1, 4]], # [1, :"(", "(", [1, 4, 1, 5]], # [2, :tSTRING_BEG, "'", [1, 5, 1, 6]], # [3, :tSTRING_CONTENT, "test", [1, 6, 1, 10]], # [4, :tSTRING_END, "'", [1, 10, 1, 11]], # [5, :")", ")", [1, 11, 1, 12]]] RubyVM::AbstractSyntaxTree.parse("puts('test', )", keep_tokens: true).tokens # => # [[0, :tIDENTIFIER, "puts", [1, 0, 1, 4]], # [1, :"(", "(", [1, 4, 1, 5]], # [2, :tSTRING_BEG, "'", [1, 5, 1, 6]], # [3, :tSTRING_CONTENT, "test", [1, 6, 1, 10]], # [4, :tSTRING_END, "'", [1, 10, 1, 11]], # [5, :",", ",", [1, 11, 1, 12]], # [6, :tSP, " ", [1, 12, 1, 13]], # [7, :")", ")", [1, 13, 1, 14]]]
Note that all three scripts are exactly equivalent execution-wise and will produce the same syntax tree; but from the point of view of code analysis tool, they are different. For example, the first one might cause the suggestion to add parentheses (if that’s the preferred style setting), and the last one might imply that the user waits for suggestions for possible local variables to add to output.
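The style-checking scenario described above might be sketched as follows (CRuby 3.2+ assumed). call_style is a hypothetical and deliberately simplistic helper: it only looks at the token right after the first identifier, relying on the token layout shown in the listing above.

```ruby
# Sketch (CRuby 3.2+): tell `puts 'test'` from `puts('test')` by tokens,
# even though both produce the same syntax tree.
def call_style(source)
  root = RubyVM::AbstractSyntaxTree.parse(source, keep_tokens: true)
  # each token is [id, type, text, location]
  texts = root.tokens.map { |_id, _type, text, _location| text }

  name_index = texts.index { |text| text.match?(/\A\w+\z/) }
  return :unknown unless name_index

  texts[name_index + 1] == '(' ? :parenthesized : :bare
end

call_style("puts 'test'")
call_style("puts('test')")
```

A real style checker would need to walk the tree and use per-node tokens, this sketch only demonstrates that the distinction is available at all.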
Standard library
By Ruby 3.1 release, most of the standard library is extracted to either default or bundled gems; their development happens in separate repositories, and changelogs are either maintained there, or absent altogether. Either way, their changes aren’t mentioned in the combined Ruby changelog, and I’ll not be trying to follow all of them.
stdgems.org project has a nice explanations of default and bundled gems concepts, as well as a list of currently gemified libraries and links to their docs.
“For the rest of us” this means that the libraries are developed in separate GitHub repositories and are just packaged with the main Ruby before release. You can therefore open an issue or PR against any of them independently, without going through the tougher development process of core Ruby.
A few changes to mention, though:
- Pathname#lutime.
- FileUtils.ln_sr and relative: option for FileUtils.ln_s. Discussion: Feature #18925.
- CGI.escapeURIComponent and CGI.unescapeURIComponent are added. This is an attempt to mitigate the discrepancy between various helper methods throughout the standard libraries like URI, ERB and CGI. Discussion: Feature #18822
  - The difference with CGI.escape/unescape is only in encoding and decoding the ' ' character (escape follows application/x-www-form-urlencoded, which converts it to +, while escapeURIComponent follows RFC 3986 and converts it to '%20')
  - Previously, the goal could’ve been achieved with URI.escape, but it was deprecated since 1.9 and removed in 3.0, being too vague and generic (it was actually meant to replace all “unsafe” characters on URI construction).
  - The method names, unusual for Ruby, mimic well-known JS ones like encodeURIComponent.
- Coverage:
  - Internal changes in the interpreter were made so that the standard library can measure code coverage of eval-ed code.
    - Discussion: Feature #19008.
    - Docs: Coverage.setup (enabled with eval: true, or :all arguments)
  - .supported?. Discussion: Feature #19026
- There are many awesome changes in Ruby’s console IRB, see the gem author’s article What’s new in Ruby 3.2’s IRB?.
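To make the CGI.escape vs. CGI.escapeURIComponent difference from the list above concrete, here is a small sketch (the sample strings are made up for illustration; it needs Ruby 3.2+ or a recent cgi gem):

```ruby
require 'cgi'

text = "Ruby 3.2 & friends"

# application/x-www-form-urlencoded style: space becomes "+"
form_style = CGI.escape(text)
# RFC 3986 style: space becomes "%20"
uri_style = CGI.escapeURIComponent(text)

puts form_style # => Ruby+3.2+%26+friends
puts uri_style  # => Ruby%203.2%20%26%20friends

# Decoding differs the same way: only CGI.unescape treats "+" as a space.
form_decoded = CGI.unescape("a+b%20c")
uri_decoded  = CGI.unescapeURIComponent("a+b%20c")

puts form_decoded # => a b c
puts uri_decoded  # => a+b c
```

Everything except the space (and, on decoding, the plus sign) is handled identically by both pairs of methods.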
Version updates
Default gems
- RubyGems 3.4.1
- abbrev 0.1.1
- benchmark 0.2.1
- bigdecimal 3.1.3
- bundler 2.4.1
- cgi 0.3.6
- csv 3.2.6
- date 3.3.3
- delegate 0.3.0
- did_you_mean 1.6.3
- digest 3.1.1
- drb 2.1.1
- english 0.7.2
- erb 4.0.2
- error_highlight 0.5.1
- etc 1.4.2
- fcntl 1.0.2
- fiddle 1.1.1
- fileutils 1.7.0
- forwardable 1.3.3
- getoptlong 0.2.0
- io-console 0.6.0
- io-nonblock 0.2.0
- io-wait 0.3.0
- ipaddr 1.2.5
- irb 1.6.2
- json 2.6.3
- logger 1.5.3
- mutex_m 0.1.2
- net-http 0.3.2
- net-protocol 0.2.1
- nkf 0.1.2
- open-uri 0.3.0
- open3 0.1.2
- openssl 3.1.0
- optparse 0.3.1
- ostruct 0.5.5
- pathname 0.2.1
- pp 0.4.0
- pstore 0.1.2
- psych 5.0.1
- racc 1.6.2
- rdoc 6.5.0
- readline-ext 0.1.5
- reline 0.3.2
- resolv 0.2.2
- resolv-replace 0.1.1
- securerandom 0.2.2
- stringio 3.0.4
- strscan 3.0.5
- syntax_suggest 1.0.2
- syslog 0.1.1
- tempfile 0.1.3
- time 0.2.1
- timeout 0.3.1
- tmpdir 0.1.3
- tsort 0.1.1
- un 0.2.1
- uri 0.12.0
- weakref 0.1.2
- win32ole 1.8.9
- yaml 0.2.1
- zlib 3.0.0
Bundled gems
- minitest 5.16.3
- power_assert 2.0.3
- test-unit 3.5.7
- net-ftp 0.2.0
- net-imap 0.3.4
- net-pop 0.1.2
- net-smtp 0.3.3
- rbs 2.8.2
- typeprof 0.21.3
- debug 1.7.1
Standard library content changes
New libraries
- syntax_suggest (formerly dead_end) gem added. It provides helpful error messages for wrong syntax, trying to guess the place of the error. For example, assuming this test.rb:
def foo
  [1, 2, 3].each {
end
an attempt to run it with Ruby 3.1 produces:
test.rb:3: syntax error, unexpected `end'
while Ruby 3.2 produces:
Unmatched `{', missing `}' ?
  23 def foo
> 24   [1, 2, 3].each {
  25 end
test.rb:3: syntax error, unexpected `end' (SyntaxError)