JavaScript, Ruby and C are not call by reference

The difference between call by value, call by reference and a recap of call by sharing.

Written by
Derk-Jan Karrenbeld
Published on
6/27/2019
updated: 6/28/2019
Languages
1. JavaScript
2. Ruby
3. C
4. PHP
License
CC BY-NC-SA 4.0
This license applies to the textual content of the article. The images might have their own license.

🛑 This article is a response to various articles in the wild which state that JavaScript and Ruby are "Call/Pass by reference" for objects and "Call/Pass by value" for primitives.

Many of these articles provide a lot of valuable information, and this article does not mean to say unequivocally that those articles should not have been written or are useless. Instead, this article attempts to explore the semantic, yet pedantic, meanings and definitions of:

call by reference

pass a reference

reference type

reference

First, I would like to make a few statements, after which I'll try to explore what these statements actually mean and why I've made them, contrary to various articles in the wild.

☕ When you see this emoji (☕), I try to give a non-code analogy to help you better understand what's going on. These abstractions are pretty leaky and might not hold up, but they're only meant in the context of the paragraphs that surround them. Take them with a grain of salt.

Black and yellow metal signage beside green grasses during daytime, in Yangmingshan, Taipei, Taiwan — Photo by Treddy Chen on Unsplash

Statements

JavaScript is always call by value.
Ruby is always call by value.
C is always call by value.
The terminology is confusing and perhaps even flawed.
The terminology only applies to function (procedure) parameters.
Pointers are an implementation detail, and their presence doesn't say anything about the evaluation of function parameters.

History and Definitions

I've tried to look up the origins of the terms as mentioned above, and there is quite a bit of literature out there from the earlier programming languages.

The Main Features of CPL (D. W. Barron et al., 1963):

Three modes of parameter call are possible; call by value (which is equivalent to the ALGOL call by value), call by substitution (equivalent to ALGOL call by name), and call by reference. In the latter case, the LH value of the actual parameter is handed over; this corresponds to the "call by simple name" suggested by Strachey and Wilkes (1961).

It is important to note that here the literature talks about modes of parameter call. It distinguishes three modes: call by value, call by name (called call by substitution in CPL) and call by reference.

Further literature gives a good, yet technical, definition of these three and a fourth strategy (namely copy-restore), as published in Semantic Models of Parameter Passing (Richard E. Fairly, 1973). I've quoted 2 of the 4 definitions below, after which I'll break them down and explain what they mean in more visual terms.

Call by Value

[...] Call by Value parameter requires that the actual parameter be evaluated at the time of the procedure call. The memory register associated with the formal parameter is then initialized to this value, and references to the formal parameter in the procedure body are treated as references to the local memory register in which the initial value of the actual parameter was stored. Due to the fact that a copy of the value associated with the actual parameter is copied into the local memory register, transformations on the parameter value within the procedure body are isolated from the actual parameter value. Because of this isolation of values, Call by value can not be used to communicate calculated values back to the calling program.

Roughly, this means that the argument (actual parameter) is completely evaluated before the function (procedure) is called. The resulting value is then assigned to the identifier inside the function (the formal parameter). In many programming languages this is done by copying the value to a second memory address, which isolates changes made inside the function (procedure body) to that function.

In other words: the contents of the original memory address (the one used to store the evaluated expression before passing it into the function) cannot be changed by code inside the function, and changes made to the value inside the function are not propagated to the caller.
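A minimal JavaScript sketch of that isolation (the function name doubleLocally is made up for this illustration): reassigning the parameter only touches the function's own copy, so, as the quote says, the only way to hand a calculated value back is to return it.

```javascript
// doubleLocally is a hypothetical example function.
function doubleLocally(n) {
  n = n * 2; // reassigns the formal parameter (the local copy) only
  return n;  // returning is the way to communicate the result back
}

let count = 21;
const doubled = doubleLocally(count);

console.log(count);   // 21: the caller's variable is untouched
console.log(doubled); // 42
```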

☕ When you order a coffee and someone asks for your name, they might write it down incorrectly. This doesn't affect your actual name and the change is only propagated to the cup.

Call by Reference

[...] In Call by Reference, the address (name) of the actual parameter at the time of the procedure call is passed to the procedure as the value to be associated with the corresponding formal parameter. References to the formal parameter in the procedure body result in indirect addressing references through the formal parameter register to the memory register associated with the actual parameter in the calling procedure. Thus, transformations of formal parameter values are immediately transmitted to the calling procedure, because both the actual parameter and the formal parameter refer to the same register.

Roughly, this means that, just like before, the argument is evaluated, but this time its memory address (address / name) is passed to the function (procedure). Changes made to the parameter inside the function (the formal parameter) are actually made at that memory address and therefore propagate back to the caller.
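To make the two definitions concrete, here is a toy model sketched in JavaScript. It is purely hypothetical: memory is an array of "registers", an address is an index into it, and allocate, callProcedure and increment are invented names. Under call by value the formal parameter gets a new register holding a copy; under call by reference it is handed the caller's own address.

```javascript
// Hypothetical toy model: an address is just an index into this array.
const memory = [];

function allocate(value) {
  memory.push(value);
  return memory.length - 1; // address of the newly allocated register
}

// Binds the formal parameter according to `mode`, then runs the body.
function callProcedure(body, actualAddress, mode) {
  const formalAddress =
    mode === 'reference'
      ? actualAddress                    // call by reference: same register
      : allocate(memory[actualAddress]); // call by value: new register with a copy
  body(formalAddress);
}

// Every use of the formal parameter goes through its register.
const increment = (formal) => {
  memory[formal] = memory[formal] + 1;
};

const x = allocate(41);
callProcedure(increment, x, 'value');
console.log(memory[x]); // 41: only the copy was incremented
callProcedure(increment, x, 'reference');
console.log(memory[x]); // 42: the caller's register was changed
```

The model only mirrors the wording of the 1973 definitions; real languages and engines don't expose memory like this.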

☕ When you go to a support store for one of your hardware devices and ask for it to be fixed, they might give you a replacement device. This replacement device is still yours, you own it just like before, but it might not be the exact same one you gave to be fixed.

Reference (and value) types

This is not the complete picture. There is one vital part left that causes most of the confusion. Right now I'll explain what a reference type is, which has nothing to do with arguments/parameters or function calls.

Reference types and value types are usually explained in the context of how a programming language stores values in memory, which also explains why some languages choose to have both, but this entire concept is worthy of (a series of) articles on its own. The Wikipedia page is, in my opinion, not very informative, but it does refer to various language specs that do go into technical detail.

A data type is a value type if it holds a data value within its own memory space. It means variables of these data types directly contain their values.

Unlike value types, a reference type doesn't store its value directly. Instead, it stores the address where the value is being stored.

In short, a reference type is a type that points to a value somewhere in memory, whereas a value type directly contains its value.
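A small JavaScript sketch of the difference, with no function call involved at all (the variable names are arbitrary). Whether an engine actually stores primitives inline is an implementation detail; what you can observe is that assigning a primitive copies the value, while assigning an object copies the reference to it.

```javascript
// Value semantics: the new variable gets its own value.
let a = 1;
let b = a;
b = 2;
console.log(a); // 1

// Reference semantics: assignment copies the reference, not the object.
const x = { n: 1 };
const y = x;
y.n = 2;
console.log(x.n);     // 2: x and y point at the same object
console.log(x === y); // true
```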

☕ When you pay online and enter your card details (for example, your card number), the card itself cannot be changed. However, the balance of the bank account will be affected. You can see your card as a reference to your balance (and multiple cards can all reference the same balance).

☕ When you pay offline, that is with cash, the money leaves your wallet. Your wallet holds its own value, just like the cash inside your wallet. The value is directly where the wallet/cash is.

Show me the code proof

function reference_assignment(myRefMaybe) {
  myRefMaybe = { key: 42 };
}

var primitiveValue = 1;
var someObject = { is: 'changed?' };

reference_assignment(primitiveValue);
primitiveValue;
// => 1

reference_assignment(someObject);
someObject;
// => { is: 'changed?' }

As shown above, someObject has not been changed, because the parameter was not a reference to someObject. In terms of the earlier definitions: it was not the memory address of someObject that was passed, but a copy of its value.

A language that does support pass by reference is PHP, but it requires special syntax to change from the default of passing by value:

function change_reference_value(&$actually_a_reference)
{
    $actually_a_reference = $actually_a_reference + 1;
}

$value = 41;
change_reference_value($value);
// => $value equals 42

I tried to keep the semantics as close as possible to the JS code.

As you can see, the PHP example actually changes the value the input argument refers to. This is because the memory address of $value can be accessed by the parameter $actually_a_reference.
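JavaScript has no equivalent of PHP's & parameter. A common workaround, sketched below with made-up names (incrementBox and a { current } container), is to pass a mutable object and change its contents:

```javascript
// incrementBox and the { current } shape are hypothetical names.
function incrementBox(box) {
  box.current = box.current + 1; // mutates the object both names point at
}

const box = { current: 41 };
incrementBox(box);
console.log(box.current); // 42
```

This is still call by value: the function receives a copy of the reference to box, so assigning a new object to box inside incrementBox would not affect the caller. Only the contents of the shared object change.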

What's wrong with the nomenclature?

Reference types and "boxed values" make this more confusing, and they are also why I believe that the nomenclature is perhaps flawed.

The term call-by-value is problematic. In JavaScript and Ruby, the value that is passed is a reference. That means that, indeed, the reference to the boxed primitive is copied, and therefore changing a primitive inside a function doesn't affect the primitive on the outside. That also means that, indeed, the reference to a reference type, such as an Array or Object, is copied and passed as the value.

Because reference types refer to their value, copying a reference type makes the copy still refer to that same value. This is also why copying an object gives you a shallow copy rather than a deep copy (clone).
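The same effect shows up when copying objects in JavaScript. A sketch using object spread, which copies only the top-level properties (the object and property names are illustrative):

```javascript
const original = { label: 'list', items: [1, 2] };

// Spread copies top-level properties; for `items` that copies a reference.
const copy = { ...original };

copy.label = 'copy'; // replaces a property on the copy only
copy.items.push(3);  // mutates the array that both objects still share

console.log(original.label);                // list
console.log(original.items);                // [ 1, 2, 3 ]
console.log(copy.items === original.items); // true
```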

Whoa. Okay. Here is an example that explores both these concepts:

function appendOne(list) {
  list.push(1);
}

function replaceWithOne(list) {
  list = [];
}

const first = [];
const second = [];

appendOne(first);
first;
// => [1]

replaceWithOne(second);
second;
// => []

In the first example it outputs [1], because the push method modifies the object on which it is called (the object referenced by the name list). The change propagates because list still refers to the original object first: its reference was copied and passed as a value, so list holds that copy, which points to the same data in memory because an Array, like any Object, is a reference type.

In the second example it outputs [] because the re-assignment doesn't propagate to the caller. In the end it is not re-assigning the original reference but only a copy.

Here is another way to write this down. 👉🏽 indicates a reference to a different location in memory.

first_array   = []
second_array  = []

first         = 👉🏽 first_array
list          = copy(first) = 👉🏽 first_array
list.push     = (👉🏽 first_array).push(...)

// => (👉🏽 first_array) was changed

second        = 👉🏽 second_array
list          = copy(second) = 👉🏽 second_array
replace_array = []
list          = 👉🏽 replace_array

// => (👉🏽 second_array) was not changed

What about pointers?

C is also always pass by value / call by value, but it allows you to pass a pointer, which can simulate pass by reference. Pointers are an implementation detail; C#, for example, uses them to enable pass by reference.

In C, however, pointers are reference types! The syntax *pointer allows you to follow the pointer to the value it refers to. In the comments in this code, I've tried to explain what is going on under the hood.

void modifyParameters(int value, int* pointerA, int* pointerB) {
    // passed by value: only the local parameter is modified
    value = 42;

    // passed by value or "reference", check call site to determine which
    *pointerA = 42;

    // passed by value or "reference", check call site to determine which
    *pointerB = 42;
}

int main() {
    int first = 1;
    int second = 2;
    int random = 100;
    int* third = &random;

    // "first" is passed by value, which is the default
    // "second" is passed by reference by creating a pointer,
    //         the pointer is passed by value, but it is followed when
    //         using *pointerA, and thus this is like passing a reference.
    // "third" is passed by value. However, it's a pointer and that pointer
    //         is followed when using *pointerB, and thus this is like
    //         passing a reference.
    modifyParameters(first, &second, third);

    // "first" is still 1
    // "second" is now 42
    // "random" is now 42
    // "third" is still a pointer to "random" (unchanged)
    return 0;
}

A lesser-known term that was coined for this is call by sharing, which applies to Ruby, JavaScript, Python, Java and so forth. It implies that all values are objects, all values are boxed, and passing a value copies a reference to it. Unfortunately, the usage of this concept in the literature is not consistent, which is probably also why it's less known and less used.

For the purpose of this article, call-by-sharing is call by value, but the value is always a reference.
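A minimal JavaScript sketch of call by sharing (shareThenRebind and shared are hypothetical names): inside the function the parameter is the very same object as the caller's, so mutations are visible outside, but rebinding the parameter only changes the function's local copy of the reference.

```javascript
const shared = { count: 0 };

function shareThenRebind(param) {
  const sameObject = param === shared; // the copied value refers to the caller's object
  param.count += 1;                    // mutation through that reference is visible outside
  param = { count: 100 };              // rebinding changes only the local copy of the reference
  return sameObject;
}

console.log(shareThenRebind(shared)); // true
console.log(shared.count);            // 1
```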

Conclusion

In short: it's always pass by value, but the value of the variable is a reference. All methods on primitives return a new value, so a primitive cannot be modified; objects and arrays can have methods that modify their value, so they can be modified.
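For example, in JavaScript a string method hands back a new string and leaves the original alone, while an array method such as sort works in place. A short illustration (variable names are arbitrary):

```javascript
const greeting = 'hello';
const shouted = greeting.toUpperCase(); // string methods return a new string
console.log(greeting); // hello
console.log(shouted);  // HELLO

const numbers = [3, 1, 2];
const sorted = numbers.sort(); // sorts in place and returns the same array
console.log(numbers);            // [ 1, 2, 3 ]
console.log(sorted === numbers); // true
```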

You can not affect the memory address of the parameter directly in the languages that use call-by-value, but you may affect what the parameter refers to. That is, you may affect the memory the parameter points to.

The statement "Primitive data types are passed by value and objects are passed by reference" is incorrect.

Photo of the Centrale Bibliotheek in Rotterdam, The Netherlands: an industrial looking building with metallic walls and various yellow pipes on the side. — Photo by Boudewijn Huysmans on Unsplash
