Atomic
Indivisible. An instruction sequence
executed by a strand is atomic if it
appears at any moment to any other
strand as if either no instructions in the
sequence have been executed or all
instructions in the sequence have been
executed.
Chip multiprocessor
A general-purpose multiprocessor
implemented as a single multicore chip.
Cilk
An extension to the C and C++
programming languages that allows a
programmer to express task parallelism.
Important milestones in the history of Cilk include
Cilk-1, Cilk-5,
Cilk++, Cilk Plus,
and OpenCilk.
Cilk Plus
An important milestone in the history of Cilk
developed by Intel Corporation, Cilk Plus extended Cilk++
with transparent interoperability with legacy C/C++ binary executables.
Cilk++
An important milestone in the history of Cilk
developed by CilkArts, Inc., Cilk++ extended Cilk-5
to C++ and introduced reducer hyperobjects as an efficient means for resolving races on nonlocal variables.
Cilk-1
The first important milestone in the history of Cilk,
Cilk-1 was developed at MIT and provided provably efficient work-stealing runtime support but little linguistic support.
Cilk-5
The next important milestone after Cilk-1
in the history of Cilk,
Cilk-5 was developed at MIT and provided simple
linguistic extensions to ANSI C for multithreading.
Cilksan
The Cilksan race detector is a tool
provided in OpenCilk for
finding race condition defects in Cilk code.
cilk_for
A keyword in the Cilk language that
indicates a for
loop whose iterations
can be executed independently in
parallel.
cilk_scope
A keyword in the Cilk language that indicates that all spawned children within the scoped region must complete before proceeding.
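A brief sketch, using hypothetical functions f and g, of how cilk_scope joins its spawned children at the closing brace:

    #include <cilk/cilk.h>

    void f(void);
    void g(void);

    void run_both(void) {
        cilk_scope {
            cilk_spawn f();   // f() may run in parallel with g()
            g();
        }                     // every child spawned in the scope has returned here
    }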
cilk_spawn
A keyword in the Cilk language that
indicates that the named subroutine can
execute independently and in parallel
with the caller.
cilk_sync
A keyword in the Cilk language that
indicates that all functions spawned
within the current function must complete
before statements following the
cilk_sync
can be executed.
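The classic Fibonacci routine illustrates cilk_spawn and cilk_sync together (a sketch, not a tuned implementation):

    #include <cilk/cilk.h>

    long fib(long n) {
        if (n < 2) return n;
        long x = cilk_spawn fib(n - 1);  // may run in parallel with the continuation
        long y = fib(n - 2);
        cilk_sync;                       // wait for the spawned fib(n - 1) to return
        return x + y;
    }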
Commutative operation
An operation ⊗ over a type T is
commutative if x ⊗ y = y ⊗ x for
any two objects x and y of type T.
Integer addition and set union are
commutative, but string concatenation is
not.
Concurrent agent
A processor, process, thread, strand, or other entity that executes a program
instruction sequence in a computing
environment containing other such
entities.
Core
A single processor unit of a multicore
chip. The terms "processor" and "CPU"
are often used in place of "core,"
although industry usage varies.
Archaic: A solid-state memory made of
magnetized toroidal memory elements.
CPU
Central Processing Unit. We use this
term as a synonym for "core," or a single
processor of a multicore chip.
Critical section
The code executed by a strand while holding a lock.
Critical-path length
See span.
Data race
A race condition that occurs when two or
more parallel strands, holding no lock in
common, access the same memory
location and at least one of the strands
performs a write. Compare with
determinacy race.
Deadlock
A situation when two or more strands
are each waiting for another to
release a resource, and the "waiting-for"
relation forms a cycle so that none can
ever proceed.
Determinacy race
A race condition that occurs when two
logically parallel strands access the same
memory location and at least one strand
performs a write.
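A minimal sketch of a determinacy race (here also a data race, since no locks are held): logically parallel iterations read and write the same variable. The function racy_sum and the array a are illustrative.

    #include <cilk/cilk.h>

    long sum = 0;

    void racy_sum(const int *a, long n) {
        cilk_for (long i = 0; i < n; ++i) {
            sum += a[i];   // parallel strands read and update sum without a lock
        }
    }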
Determinism
The property of a program when it
behaves identically from run to run when
executed on the same inputs.
Deterministic programs are usually
easier to debug.
Distributed memory
Computer storage that is partitioned
among several processors. A distributed-memory multiprocessor is a computer in
which processors must send messages
to remote processors to access data in
remote processor memory. Contrast with
shared memory.
Execution time
How long a program takes to execute on a given computer system.
Also called running time.
Fake lock
A construct that Cilksan
treats as
a lock but which behaves like a no-op
during actual running of the program. A
fake lock can be used to suppress the
reporting of an intentional race condition.
False sharing
The situation that occurs when two
strands access different memory
locations residing on the same cache
block, thereby contending for the cache
block. For more information, see the
Wikipedia entry
https://en.wikipedia.org/wiki/False_sharing.
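A sketch of code prone to false sharing, assuming counts[0] and counts[1] fall on the same cache block; the helper sum_into is hypothetical:

    #include <cilk/cilk.h>

    long counts[2];   // adjacent elements usually share one cache block

    static void sum_into(long *slot, const int *a, long lo, long hi) {
        for (long i = lo; i < hi; ++i) *slot += a[i];
    }

    void count_halves(const int *a, long n) {
        cilk_scope {
            cilk_spawn sum_into(&counts[0], a, 0, n / 2);
            sum_into(&counts[1], a, n / 2, n);
        }
        // The two strands repeatedly write neighboring locations and so
        // contend for the cache block, even though they never share data.
    }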
Global variable
A variable that is bound outside of all local scopes.
See also nonlocal variable.
Hyperobject
A linguistic construct supported by the
OpenCilk runtime system
that allows many strands
to coordinate in updating a shared
variable or data structure independently
by providing different views
of the hyperobject to different strands at
the same time. The reducer is the only
hyperobject currently provided by OpenCilk.
Instruction
A single operation executed by a processor.
Knot
A point at which the end of one strand
meets the end of another. If a knot has
one incoming strand and one outgoing
strand, it is a serial knot. If it has one
incoming strand and two outgoing
strands, it is a spawn knot. If it has
multiple incoming strands and one
outgoing strand, it is a sync knot. A
Cilk program does not produce serial
knots or knots with both multiple
incoming and multiple outgoing strands.
Linear speedup
Speedup proportional to the processor count.
See also perfect linear speedup.
Lock
A synchronization mechanism for
providing atomic operation by limiting
concurrent access to a resource.
Important operations on locks include
acquire (lock) and release (unlock).
Many locks are implemented as a mutex,
whereby only one strand can hold the
lock at any time.
Lock contention
The situation wherein multiple strands vie for the same lock.
Multicore
A semiconductor chip containing more than one processor core.
Multiprocessor
A computer containing multiple general-purpose processors.
Mutex
A "mutually exclusive" lock that only one
strand can acquire at a time, thereby
ensuring that only one strand executes
the critical section protected by the
mutex at a time.
For example, Linux supports Pthreads pthread_mutex_t objects.
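A minimal Pthreads sketch of a mutex protecting a critical section; the counter and the function bump are illustrative:

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;

    void bump(void) {
        pthread_mutex_lock(&m);     // acquire the mutex
        ++counter;                  // critical section: one strand at a time
        pthread_mutex_unlock(&m);   // release the mutex
    }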
Nondeterminism
The property of a program when it
behaves differently from run to run when
executed on exactly the same inputs.
Nondeterministic programs are usually
hard to debug.
Nonlocal variable
A program variable that is bound outside
of the scope of the function, method, or
class in which it is used. In Cilk
programs, we also use this term to refer
to variables with a scope outside a
cilk_for
loop.
OpenCilk
A task-parallel programming platform for multicore computers based on Cilk technology.
Parallel loop
A for
loop all of whose iterations can be
run independently in parallel. The
cilk_for
keyword designates a parallel
loop.
Parallelism
The ratio of work to span, which is the
largest speedup an application could
possibly attain when run on an infinite
number of processors.
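For example, a computation with 1,000 seconds of work and a 10-second span has parallelism 1,000/10 = 100, so it cannot attain more than 100-fold speedup no matter how many processors are used.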
Perfect linear speedup
Speedup equal to the processor count.
See also linear speedup.
Process
A self-contained concurrent agent that by
default executes a serial chain of
instructions. More than one thread may
run within a process, but a process does
not usually share memory with other
processes. Scheduling of processes is
typically managed by the operating system.
Processor
A processor implements the logic to
execute program instructions
sequentially; we use the term "core" as a
synonym. This document does not use
the term "processor" to refer to multiple
processing units on the same or multiple
chips, although other documents may
use the term that way.
Race condition
A source of nondeterminism whereby the
result of a concurrent computation
depends on the timing or relative order of
the execution of instructions in each
individual strand.
Receiver
A variable to receive the result of a function call.
Reducer
A hyperobject with a defined (usually
associative) reduce()
binary operator
which the OpenCilk runtime system uses to
combine the views of the separate strands.
A reducer must have two methods:
- A default constructor which initializes the
reducer to its identity value
- A
reduce()
method, which merges the value of the
right reducer into the left (this) reducer.
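A minimal sketch of a sum reducer, assuming the C-style cilk_reducer syntax of recent OpenCilk releases; the callbacks zero_long and add_long and the function sum_array are illustrative:

    #include <cilk/cilk.h>

    void zero_long(void *view) { *(long *)view = 0; }   // identity constructor
    void add_long(void *left, void *right) {            // reduce operation
        *(long *)left += *(long *)right;
    }

    long sum_array(const int *a, long n) {
        long cilk_reducer(zero_long, add_long) sum = 0;
        cilk_for (long i = 0; i < n; ++i)
            sum += a[i];   // each strand updates its own view; views are
                           // merged with add_long as strands join
        return sum;
    }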
Response time
The time it takes to execute a
computation from the time a human user
provides an input to the time the user
gets the result.
Running time
How long a program takes to execute on a given computer system.
Also called execution time.
Scale down
The ability of a parallel application to run efficiently on one
or a small number of processors.
Scale up
The ability of a parallel application to run efficiently
on a large number of processors.
See also linear speedup.
Sequential consistency
The memory model for concurrency
wherein the effect of concurrent agents is
as if their operations on shared memory
were interleaved in a global order
consistent with the orders in which each
agent executed them. This model was
advanced in 1979 by Leslie Lamport.
Serial execution
Execution of the serial projection of a Cilk program.
Serial projection
The C or C++ program that results from
stubbing out the keywords of a Cilk
program, where cilk_spawn, cilk_scope,
and cilk_sync are elided and cilk_for is
replaced with an ordinary for. The
serial projection can be used for debugging
and, in the case of a converted C/C++
program, will behave exactly as the
original C/C++ program. The terms "serialization" and "serial elision" are used in some of the literature.
Also, see "serial semantics".
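A brief sketch of the idea; the helper update and the function work are hypothetical:

    #include <cilk/cilk.h>

    void update(double *p);   // hypothetical helper

    // Cilk version
    void work(double *a, double *b, double *c, int n) {
        cilk_scope {
            cilk_spawn update(a);
            update(b);
        }
        cilk_for (int i = 0; i < n; ++i) c[i] += 1.0;
    }

    // Its serial projection: cilk_scope and cilk_spawn are elided
    // and cilk_for becomes an ordinary for.
    void work_serial(double *a, double *b, double *c, int n) {
        {
            update(a);
            update(b);
        }
        for (int i = 0; i < n; ++i) c[i] += 1.0;
    }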
Serial semantics
The behavior of a Cilk program when executed as the
serial projection of the program.
See the following article:
Four Reasons Why Parallel Programs Should Have Serial Semantics.
Serialization
See serial projection.
Shared memory
Computer storage that is shared among
several processors. A shared-memory
multiprocessor is a computer in which
each processor can directly address any
memory location. Contrast with
distributed memory.
Span
The theoretically fastest execution time
for a parallel program when run on an
infinite number of processors,
discounting overheads for
communication and scheduling. Often
denoted by T∞ in the literature, and
sometimes called critical-path length.
Spawn
To call a function without waiting for it to
return, as in a normal call. The caller can
continue to execute in parallel with the
called function. See also cilk_spawn.
Speedup
How many times faster a program is
when run in parallel than when run on
one processor. Speedup can be
computed by dividing the running time T1
of the program on one processor by its
running time TP on P processors.
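For example, a program that runs in 60 seconds on one processor and 15 seconds on 8 processors achieves a speedup of 60/15 = 4.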
Strand
A serial chain of executed instructions without any parallel
control (such as a spawn, sync, return
from a spawn, etc.)
Sync
To wait for a set of spawned functions to
return before proceeding. The current
function is dependent upon the spawned
functions and cannot proceed in parallel
with them. See also cilk_sync.
Task-parallel
Task-parallel platforms provide a layer of software on top of threads to coordinate, schedule, and manage the processors of a multicore. Some task-parallel platforms are built as runtime libraries, but others provide full-fledged parallel languages with compiler and runtime support.
Task-parallel programming allows parallelism to be specified in a "processor-oblivious" fashion, where the programmer identifies what computational tasks may run in parallel but does not indicate which thread or processor performs the task. Thus, the programmer is freed from worrying about communication protocols, load balancing, and other vagaries of thread programming. The task-parallel platform contains a scheduler, which automatically load-balances the tasks across the processors, thereby greatly simplifying the programmer’s chore. Task-parallel algorithms provide a natural extension to ordinary serial algorithms, allowing performance to be reasoned about mathematically using "work/span analysis."
Thread
A thread executes a serial instruction chain.
Scheduling of threads is typically managed by the operating
system.
Throughput
A number of operations performed per unit time.
View
The state of a hyperobject as seen by a given strand.
Work
The running time of a program when run on one processor,
sometimes denoted by T1.
Work stealing
A scheduling strategy where processors
post parallel work locally and, when a
processor runs out of local work, it steals
work from another processor. Work-stealing schedulers are notable for their efficiency, because they incur no
communication or synchronization
overhead when there is ample
parallelism. The OpenCilk runtime system
employs a work-stealing scheduler.
Worker
A thread that, together with other workers,
implements the OpenCilk runtime system's work stealing scheduler.