Start on a manual - libzahl - big integer library

commit d6206bd03f351333ceb83ddb403e1f3ffb0bbfe4
parent e59c5e3120b30addac913d253160ef09c43bab27
Author: Mattias Andrée <maandree@kth.se>
Date:   Mon,  9 May 2016 05:21:33 +0200

Start on a manual

Signed-off-by: Mattias Andrée <maandree@kth.se>

Diffstat:
M .gitignore  | 4 ++++
M Makefile  | 15 +++++++++++++--
A doc/libzahl.tex  | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A doc/libzahls-design.tex  | 252 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A doc/what-is-libzahl.tex  | 174 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

5 files changed, 534 insertions(+), 2 deletions(-)
diff --git a/.gitignore b/.gitignore
@@ -20,3 +20,7 @@
 *.pdf
 *.ps
 *.dvi
+*.idx
+*.maf
+*.mtc*
+*.toc
diff --git a/Makefile b/Makefile
@@ -69,7 +69,13 @@ INLINE_FUN =\
 	zzero
 
 DOC =\
-	refsheet.pdf
+	refsheet.pdf\
+	libzahl.pdf
+
+TEXSRC =\
+	doc/libzahl.tex\
+	doc/what-is-libzahl.tex\
+	doc/libzahls-design.tex
 
 HDR_PUBLIC = zahl.h $(HDR_SEMIPUBLIC)
 HDR        = $(HDR_PUBLIC) $(HDR_PRIVATE)
@@ -150,6 +156,10 @@ refsheet.pdf: doc/refsheet.tex
 	pdflatex doc/refsheet.tex </dev/null
 	pdflatex doc/refsheet.tex </dev/null
 
+libzahl.pdf: $(TEXSRC)
+	pdflatex doc/libzahl.tex </dev/null
+	pdflatex doc/libzahl.tex </dev/null
+
 check: test
 	./test
 
@@ -182,7 +192,8 @@ uninstall:
 clean:
 	-rm -- *.o *.su *.a *.so test test-random.c 2>/dev/null
 	-rm -- benchmark benchmark-zrand benchmark-func 2>/dev/null
-	-rm -- *.aux *.log *.out 2>/dev/null
+	-rm -- *.aux *.log *.out *.idx *.maf *.mtc* *.toc 2>/dev/null
 	-rm -- refsheet.pdf refsheet.dvi refsheet.ps 2>/dev/null
+	-rm -- libzahl.pdf libzahl.dvi libzahl.ps 2>/dev/null
 
 .PHONY: all check clean install uninstall
diff --git a/doc/libzahl.tex b/doc/libzahl.tex
@@ -0,0 +1,91 @@
+\documentclass[11pt,b5paper,openright]{book}
+\special{papersize=176mm,250mm}
+\usepackage[utf8]{inputenc}
+\usepackage[T1]{fontenc}
+\usepackage{algorithmic, algorithm, colonequals, alltt}
+\usepackage{amsmath, amssymb, mathtools, MnSymbol, mathrsfs, esvect}
+\usepackage{tipa, color, graphicx}
+\usepackage{microtype}
+\usepackage{shorttoc, minitoc}
+\usepackage[english]{babel}
+\selectlanguage{english}
+\definecolor{linkcolour}{RGB}{112, 0, 0}
+\definecolor{urlcolour}{RGB}{0, 0, 112}
+\usepackage[unicode,pdfencoding=auto]{hyperref}
+\hypersetup{
+  pdfborder={0 0 0},
+  colorlinks=true,
+  linkcolor=linkcolour,
+  urlcolor=urlcolour,
+  linktoc=all,
+  pdfsubject={Computer science},
+  pdfauthor={Mattias Andrée},
+  pdftitle={libzahl},
+  pdfkeywords={libzahl, big integer, big number, bigint, bignum, multiple-precision, arbitrary precision}
+}
+\hypersetup{linktocpage}
+\usepackage{makeidx}
+\makeindex
+\usepackage{geometry}
+\geometry{margin=1in}
+%\DisableLigatures{encoding = *, family = *}
+
+\newcommand{\chapref}[1]{\hyperref[#1]{Chapter~\ref*{#1} [\nameref*{#1}], page \pageref*{#1}}}
+\newcommand{\secref}[1]{\hyperref[#1]{Section~\ref*{#1} [\nameref*{#1}], page \pageref*{#1}}}
+\newcommand{\appxref}[1]{\hyperref[#1]{Appendix~\ref*{#1} [\nameref*{#1}], page \pageref*{#1}}}
+\newcommand{\pchapref}[1]{(see \hyperref[#1]{Chapter~\ref*{#1} [\nameref*{#1}], page \pageref*{#1}})}
+\newcommand{\psecref}[1]{(see \hyperref[#1]{Section~\ref*{#1} [\nameref*{#1}], page \pageref*{#1}})}
+\newcommand{\pappxref}[1]{(see \hyperref[#1]{Appendix~\ref*{#1} [\nameref*{#1}], page \pageref*{#1}})}
+\definecolor{c}{rgb}{0.45, 0.45, 0.45}
+
+\begin{document}
+
+\frontmatter
+
+\title{{\Huge \bf libzahl version 1.1}}
+\author{Mattias Andrée $\langle$\href{mailto:maandree@kth.se}{\texttt{maandree@kth.se}}$\rangle$}
+\date{\vspace{3in}}
+\maketitle
+
+\thispagestyle{empty}
+\null
+\vfill
+\noindent
+Copyright \copyright{} 2016 $~$  Mattias Andrée $\langle$\href{mailto:maandree@kth.se}{\texttt{maandree@kth.se}}$\rangle$
+\vspace{1ex}
+
+\noindent
+{\small
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of this document and associated files (the ``Document''), to deal in
+the Document without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Document, and to permit persons to whom the Document is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Document.
+}
+\newpage
+
+\shorttoc{Short contents}{0}
+\setcounter{tocdepth}{2}
+\dominitoc
+\tableofcontents
+
+
+\mainmatter
+
+\input doc/what-is-libzahl.tex
+\input doc/libzahls-design.tex
+
+
+\appendix
+
+
+\backmatter
+
+\addcontentsline{toc}{chapter}{Index}
+\printindex
+
+\end{document}
diff --git a/doc/libzahls-design.tex b/doc/libzahls-design.tex
@@ -0,0 +1,252 @@
+\chapter{libzahl's design}
+\label{chap:libzahl's design}
+
+In this chapter, the design of libzahl is discussed.
+
+\vspace{1cm}
+\minitoc
+
+
+\newpage
+\section{Memory pool}
+\label{sec:Memory pool}
+
+Allocating memory dynamically is an expensive operation.
+To improve performance, libzahl never deallocates memory
+before the library is uninitialised, instead it pools
+memory, that is no longer needed, for reuse.
+
+Because of the memory pooling, this is a pattern to the
+allocation sizes. In an allocation, a power of two
+elements, plus a few elements that are discussed in
+\secref{sec:Integer structure}, are allocated. That is,
+the number multiplied the size of an element.
+Powers of two (growth factor 2) is not the most memory
+efficient way to do this, but it is the simplest and
+performance efficient. This power of two (sans the few
+extra elements) is used to calculate --- getting the index
+of the only set bit --- the index of the bucket in
+which the allocation is stored when pooled. The buckets
+are dynamic arrays with the growth factor 1.5. The
+growth factor 1.5 is often used for dynamic arrays, it
+is a good compromise between memory usage and performance.
+
+libzahl also avoids allocating memory by having a set
+of temporary variables predefined.
+
+
+\newpage
+\section{Error handling}
+\label{sec:Error handling}
+
+In C, it is traditional to return a sentiel value
+if case an error has occurred, and set the value
+of a globale variable to describe the error that
+has occurred. The programmer can choose whether to
+check for errors, ignore errors where it does not
+matter, or simple ignore errors all together and let
+the program eventual crash. This is a simple
+technique that gives the programmer a better
+understanding of what can happend. A great advantage
+C has over most programming languages.
+
+Another technique is to use long jumps on error.
+This technique is not too common, but is has one
+significant advantage. Error-checks need only be
+preformed where the error can first be detected.
+There is no need to check the return value at every
+function return. This leads to cleaner code, if
+there are many functions that can raise exceptional
+conditions, and greater performance under some
+conditions. This is why this technique is sometimes
+used in high-performance libraries. libzahl uses
+this technique.
+
+Rather than writing
+
+\begin{alltt}
+   if (zadd(a, b, c))
+       goto out;
+\end{alltt}
+
+\noindent
+or a but cleaner, if ther is a lot of calls,
+
+\begin{alltt}
+   #define TRY(...) do if (__VA_ARGS__) goto out; while (0)
+   \textcolor{c}{/* \textrm{\ldots} */}
+   TRY(zadd(a, b, c));
+\end{alltt}
+
+\noindent
+you write
+
+\begin{alltt}
+   jmp_buf env;
+   if (setjmp(env))
+       goto out;
+   zsetup(env);
+   \textcolor{c}{/* \textrm{\ldots} */}
+   zadd(a, b, c);
+\end{alltt}
+
+You only need to call {\tt setjmp} and {\tt zsetup}
+once, but can may update the return point by calling
+them once more.
+
+If you don't need to check for errors, you can
+disable error detection at compile-time. By defining
+the {\tt ZAHL\_UNSAFE} C preprocessor definition
+when compiling libzahl, and when compiling your
+software that uses libzahl.
+
+
+\newpage
+\section{Integer structure}
+\label{sec:Integer structure}
+
+The data type used to represent a big integer with
+libzahl is {\tt z\_t}, defined as
+
+\begin{alltt}
+   typedef struct zahl z_t[1];
+\end{alltt}
+
+\noindent
+where {\tt struct zahl} is defined as
+
+\begin{alltt}
+   struct zahl \{
+       int sign;
+       size_t used;
+       size_t alloced;
+       zahl_char_t *chars;
+   \};
+\end{alltt}
+
+\noindent
+where {\tt zahl\_char\_t} is defined as
+
+\begin{alltt}
+   typedef uint64_t zahl_char_t;
+\end{alltt}
+
+\noindent
+As a user, try not to think about anything else than
+
+\begin{alltt}
+   typedef \textcolor{c}{/* \textrm{ignore what is here} */} z_t[1];
+\end{alltt}
+
+\noindent
+details can change in future versions of libzahl.
+
+{\tt z\_t} is defined as a single-element array.
+This is often called a reference. There are some
+flexibility issues with this, why {\tt struct zahl}
+has beed added, but for most uses with big integers,
+it makes things simpler. Particularly, you need not
+work prepend {\tt \&} to variable when making function
+calls, but the existence of {\tt struct zahl} allows
+you do so if you so choose.
+
+The {\tt .sign} member, is either $-1$, 0, or 1,
+when the integer is negative, zero, or positive,
+respectively. Whenever, {\tt .sign} is 0, the value
+of {\tt .used} and {\tt .chars} are undefined.
+
+{\tt .used} holds to the number of elements used in
+{\tt .chars}, and {\tt .alloced} holds the allocation
+side of {\tt .chars} measured in elements minus a few
+extra elements that are always added to the allocation.
+{\tt .chars} is a little-endian array of 64-bit digits,
+these 64-bit digits are called `characters' in libzahl.
+{\tt .chars} holds the absolute value of the
+represented value.
+
+Unless {\tt .sign} is 0, {\tt .chars} always contains
+four extra elements, refered to as fluff. These are
+merely allocated so functions can assume that they can
+always manipulate groups of four characters, and need
+not care about cases where the number of characters is
+not a multiple of four. There are of course a few cases
+when the precise number of characters is important.
+
+
+\newpage
+\section{Parameters}
+\label{sec:Parameters}
+
+The general order of parameters in libzahl functions
+are: output integers, input integers, input data,
+output data, parametric values. For example, in
+addition, the out parameter is the first parameter.
+But for marshalling and unmarshalling the buffer
+is last. For random number generation the order is:
+output, device, distribution, distribution parameters.
+Whilst the distribution parameters are big integers,
+they are not considered input integers. The order
+of the input parameters are that of the order you
+would write them using mathematical notation, this
+also holds true if you include the output parameter
+(as long as there is exactly one output,) for example
+
+\vspace{1ex}
+$a \gets b^c \mod d$
+\vspace{1ex}
+
+\noindent
+is written
+
+\begin{verbatim}
+   zmodpow(a, b, c, d);
+\end{verbatim}
+
+\noindent
+or
+
+\begin{verbatim}
+   zmodpowu(a, b, c, d);
+\end{verbatim}
+
+Like any self respecting bignum library, libzahl
+supports using the same big integer reference as
+for output as input, as long as the all output
+parameters are unique to each other. For example
+
+\begin{alltt}
+   a += b;
+\end{alltt}
+
+\noindent
+or
+
+\begin{alltt}
+   a = a + b;
+\end{alltt}
+
+\noindent
+is written, using libzahl, as
+
+\begin{alltt}
+   zadd(a, a, b);
+\end{alltt}
+
+For commutative functions, like {\tt zadd}, the
+implementation is optimised to assume that this
+order is more likely to be used than the alternative.
+That is, you should, for example, write
+
+\begin{alltt}
+   zadd(a, a, b);
+\end{alltt}
+
+\noindent
+rather than
+
+\begin{alltt}
+   zadd(a, b, a);
+\end{alltt}
+
+This is assumption is not made for non-commutative
+functions.
diff --git a/doc/what-is-libzahl.tex b/doc/what-is-libzahl.tex
@@ -0,0 +1,174 @@
+\chapter{What is libzahl?}
+\label{chap:What is libzahl?}
+
+In this chapter, it is discussed what libzahl is,
+why it is called libzahl, why it exists, why
+you should use it, what makes it different, and
+what is its limitations.
+
+\vspace{1cm}
+\minitoc
+
+
+\newpage
+\section{The name and the what}
+\label{sec:The name and the what}
+
+In mathematics, the set of all integers is represented
+by a bold uppercase `Z' ({\bf Z}), or sometimes
+double-stroked (blackboard bold) ($\mathbb{Z}$). This symbol
+is derived from the german word for integers: `Zahlen'
+[\textprimstress{}tsa\textlengthmark{}l\textschwa{}n],
+whose singular is `Zahl' [tsa\textlengthmark{}l]. libzahl
+[l\textsecstress{}\textsci{}b\textprimstress{}tsa\textlengthmark{}l]
+is a C library capable of representing very large integers,
+limited by the memory address space and available memory.
+Whilst this is almost none of the elements in {\bf Z},
+it is substantially more than available using the intrinsic
+integer types in C. libzahl of course also implements
+functions for performing arithmetic operations over
+integers represented using  libzahl. Libraries such as
+libzahl as called bigint libraries, big integer libraries,
+multiple precision integer libraries, arbitrary precision
+integer libraries,\footnote{`Multiple precision integer'
+and `arbitrary precision integer' are misnomers, precision
+is only relevant for floating-point numbers.} or bignum
+libraries, or any of the previous with `number' substituted
+for `integer'. Some libraries that refer to themself bignum
+libraries or any of using the word `number' support other
+number types than integers. libzahl only supports integers.
+
+
+\newpage
+\section{Why does it exist?}
+\label{sec:Why does it exist?}
+
+libzahl's main competitors are GNU MP (gmp),\footnote{GNU
+Multiple Precision Arithmetic Library} LibTomMath (ltm),
+and TomsFastMath (tfm). All of these have problems:
+
+\begin{itemize}
+\item
+GNU MP is extremely bloated, can only be complied with
+GCC, can only be dynamically linked, and requires glibc.
+Additionally, whilst its performance is generally good,
+it can still be improved. Furthermore, GNU MP cannot be
+used for robust applications.
+
+\item
+LibTomMath is very slow, infact performance is not its
+priority, rather its simplicit is the priority. Despite
+this, it is not really that simple.
+
+\item
+TomsFastMath is slow, complicated, and is not a true
+big integer library and is specifically targeted at
+cryptography.
+\end{itemize}
+
+libzahl is developed under the suckless.org umbrella.
+As such, it attempts to follow the suckless
+philosophy.\footnote{\href{http://suckless.org/philosophy}
+{http://suckless.org/philosophy}} libzahl is simple,
+very fast, simple to use, and can be used in robust
+applications. Currently however, it does not support
+multithreading, but it has better support multiprocessing
+and distributed computing than its competitor.
+
+
+\newpage
+\section{How is it different?}
+\label{sec:How is it different?}
+
+All big number libraries have in common that both input
+and output integers are parameters for the functions.
+There are however two variants of this: input parameters
+followed by output parameters, and output parameters
+followed by input parameters. The former variant is the
+conventional for C functions. The latter is more in style
+with primitive operations, pseudo-code, mathematics, and
+how it would look if the output was return. In libzahl,
+the latter convention is used. That is, why write
+
+\begin{alltt}
+   zadd(sum, augend, addend);
+\end{alltt}
+
+\noindent
+rather than
+
+\begin{alltt}
+   zadd(augend, addend, sum);
+\end{alltt}
+
+\noindent
+This can be compared to
+
+\vspace{1ex}
+$sum \gets augend + addend$
+\vspace{1ex}
+
+\noindent
+versus
+
+\vspace{1ex}
+$augend + addend \rightarrow sum$.
+\vspace{1ex}
+
+libzahl, GNU MP, and Hebimath uses the output-first
+convention. LibTomMath and TomsFastMath uses the
+input-first convention.
+
+Unlike other bignum libraries, errors in libzahl are
+caught using {\tt setjmp}. This ensure that it can be
+used in robust applications, catching errors does not
+become a mess, and it minimises the overhead of
+catching errors. Errors are only checked when they can
+occur, not also after each function-return.
+
+Additionally, libzahl tries to keep the functions'
+names simple and natural rather than techniqual or
+mathematical. The names resemble those of the standard
+integer operators.
+
+
+\newpage
+\section{Limitations}
+\label{sec:Limitations}
+
+libzahl is not recommended for cryptographic
+applications, it is not mature enough, and its author
+does not have the necessary expertise. And in,
+particular it does not implement constant time
+operations. Additionally, libzahl is not thread-safe.
+
+libzahl is also only designed for POSIX systems.
+It will probably run just fine on any modern
+system. But it makes some assumption that POSIX
+stipulates or are unpractical not to implement
+for machines that should support POSIX (or even
+support modern software):
+
+\begin{itemize}
+\item
+Bytes are octets.
+
+\item
+There is an integer type that is 64-bits wide.
+(The compiler needs to support it, but it is not
+strictly necessary for it to be an CPU-intrinsic,
+but that would be favourable for performance.)
+
+\item
+Two's complement is used.
+(The compiler needs to support it, but it is not
+strictly necessary for it to be an CPU-intrinsic,
+but that would be favourable for performance.)
+\end{itemize}
+
+These limitations may be removed later. And there
+is some code that does not make these assumptions
+but acknowledge that it may be a case. On the other
+hand, these limitations could be fixed, and agnostic
+code could be rewritten to assume that these
+restrictions are met.

	libzahl big integer library
	git clone git://git.suckless.org/libzahl
	Log \| Files \| Refs \| README \| LICENSE

M	.gitignore	\|	4	++++
M	Makefile	\|	15	+++++++++++++--
A	doc/libzahl.tex	\|	91	+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A	doc/libzahls-design.tex	\|	252	+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A	doc/what-is-libzahl.tex	\|	174	+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++