libzahl

big integer library
git clone git://git.suckless.org/libzahl
Log | Files | Refs | README | LICENSE

commit d6206bd03f351333ceb83ddb403e1f3ffb0bbfe4
parent e59c5e3120b30addac913d253160ef09c43bab27
Author: Mattias Andrée <maandree@kth.se>
Date:   Mon,  9 May 2016 05:21:33 +0200

Start on a manual

Signed-off-by: Mattias Andrée <maandree@kth.se>

Diffstat:
M.gitignore | 4++++
MMakefile | 15+++++++++++++--
Adoc/libzahl.tex | 91+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adoc/libzahls-design.tex | 252+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adoc/what-is-libzahl.tex | 174+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 534 insertions(+), 2 deletions(-)

diff --git a/.gitignore b/.gitignore @@ -20,3 +20,7 @@ *.pdf *.ps *.dvi +*.idx +*.maf +*.mtc* +*.toc diff --git a/Makefile b/Makefile @@ -69,7 +69,13 @@ INLINE_FUN =\ zzero DOC =\ - refsheet.pdf + refsheet.pdf\ + libzahl.pdf + +TEXSRC =\ + doc/libzahl.tex\ + doc/what-is-libzahl.tex\ + doc/libzahls-design.tex HDR_PUBLIC = zahl.h $(HDR_SEMIPUBLIC) HDR = $(HDR_PUBLIC) $(HDR_PRIVATE) @@ -150,6 +156,10 @@ refsheet.pdf: doc/refsheet.tex pdflatex doc/refsheet.tex </dev/null pdflatex doc/refsheet.tex </dev/null +libzahl.pdf: $(TEXSRC) + pdflatex doc/libzahl.tex </dev/null + pdflatex doc/libzahl.tex </dev/null + check: test ./test @@ -182,7 +192,8 @@ uninstall: clean: -rm -- *.o *.su *.a *.so test test-random.c 2>/dev/null -rm -- benchmark benchmark-zrand benchmark-func 2>/dev/null - -rm -- *.aux *.log *.out 2>/dev/null + -rm -- *.aux *.log *.out *.idx *.maf *.mtc* *.toc 2>/dev/null -rm -- refsheet.pdf refsheet.dvi refsheet.ps 2>/dev/null + -rm -- libzahl.pdf libzahl.dvi libzahl.ps 2>/dev/null .PHONY: all check clean install uninstall diff --git a/doc/libzahl.tex b/doc/libzahl.tex @@ -0,0 +1,91 @@ +\documentclass[11pt,b5paper,openright]{book} +\special{papersize=176mm,250mm} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{algorithmic, algorithm, colonequals, alltt} +\usepackage{amsmath, amssymb, mathtools, MnSymbol, mathrsfs, esvect} +\usepackage{tipa, color, graphicx} +\usepackage{microtype} +\usepackage{shorttoc, minitoc} +\usepackage[english]{babel} +\selectlanguage{english} +\definecolor{linkcolour}{RGB}{112, 0, 0} +\definecolor{urlcolour}{RGB}{0, 0, 112} +\usepackage[unicode,pdfencoding=auto]{hyperref} +\hypersetup{ + pdfborder={0 0 0}, + colorlinks=true, + linkcolor=linkcolour, + urlcolor=urlcolour, + linktoc=all, + pdfsubject={Computer science}, + pdfauthor={Mattias Andrée}, + pdftitle={libzahl}, + pdfkeywords={libzahl, big integer, big number, bigint, bignum, multiple-precision, arbitrary precision} +} +\hypersetup{linktocpage} +\usepackage{makeidx} +\makeindex +\usepackage{geometry} +\geometry{margin=1in} +%\DisableLigatures{encoding = *, family = *} + +\newcommand{\chapref}[1]{\hyperref[#1]{Chapter~\ref*{#1} [\nameref*{#1}], page \pageref*{#1}}} +\newcommand{\secref}[1]{\hyperref[#1]{Section~\ref*{#1} [\nameref*{#1}], page \pageref*{#1}}} +\newcommand{\appxref}[1]{\hyperref[#1]{Appendix~\ref*{#1} [\nameref*{#1}], page \pageref*{#1}}} +\newcommand{\pchapref}[1]{(see \hyperref[#1]{Chapter~\ref*{#1} [\nameref*{#1}], page \pageref*{#1}})} +\newcommand{\psecref}[1]{(see \hyperref[#1]{Section~\ref*{#1} [\nameref*{#1}], page \pageref*{#1}})} +\newcommand{\pappxref}[1]{(see \hyperref[#1]{Appendix~\ref*{#1} [\nameref*{#1}], page \pageref*{#1}})} +\definecolor{c}{rgb}{0.45, 0.45, 0.45} + +\begin{document} + +\frontmatter + +\title{{\Huge \bf libzahl version 1.1}} +\author{Mattias Andrée $\langle$\href{mailto:maandree@kth.se}{\texttt{maandree@kth.se}}$\rangle$} +\date{\vspace{3in}} +\maketitle + +\thispagestyle{empty} +\null +\vfill +\noindent +Copyright \copyright{} 2016 $~$ Mattias Andrée $\langle$\href{mailto:maandree@kth.se}{\texttt{maandree@kth.se}}$\rangle$ +\vspace{1ex} + +\noindent +{\small +Permission is hereby granted, free of charge, to any person obtaining a +copy of this document and associated files (the ``Document''), to deal in +the Document without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Document, and to permit persons to whom the Document is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Document. +} +\newpage + +\shorttoc{Short contents}{0} +\setcounter{tocdepth}{2} +\dominitoc +\tableofcontents + + +\mainmatter + +\input doc/what-is-libzahl.tex +\input doc/libzahls-design.tex + + +\appendix + + +\backmatter + +\addcontentsline{toc}{chapter}{Index} +\printindex + +\end{document} diff --git a/doc/libzahls-design.tex b/doc/libzahls-design.tex @@ -0,0 +1,252 @@ +\chapter{libzahl's design} +\label{chap:libzahl's design} + +In this chapter, the design of libzahl is discussed. + +\vspace{1cm} +\minitoc + + +\newpage +\section{Memory pool} +\label{sec:Memory pool} + +Allocating memory dynamically is an expensive operation. +To improve performance, libzahl never deallocates memory +before the library is uninitialised, instead it pools +memory, that is no longer needed, for reuse. + +Because of the memory pooling, this is a pattern to the +allocation sizes. In an allocation, a power of two +elements, plus a few elements that are discussed in +\secref{sec:Integer structure}, are allocated. That is, +the number multiplied the size of an element. +Powers of two (growth factor 2) is not the most memory +efficient way to do this, but it is the simplest and +performance efficient. This power of two (sans the few +extra elements) is used to calculate --- getting the index +of the only set bit --- the index of the bucket in +which the allocation is stored when pooled. The buckets +are dynamic arrays with the growth factor 1.5. The +growth factor 1.5 is often used for dynamic arrays, it +is a good compromise between memory usage and performance. + +libzahl also avoids allocating memory by having a set +of temporary variables predefined. + + +\newpage +\section{Error handling} +\label{sec:Error handling} + +In C, it is traditional to return a sentiel value +if case an error has occurred, and set the value +of a globale variable to describe the error that +has occurred. The programmer can choose whether to +check for errors, ignore errors where it does not +matter, or simple ignore errors all together and let +the program eventual crash. This is a simple +technique that gives the programmer a better +understanding of what can happend. A great advantage +C has over most programming languages. + +Another technique is to use long jumps on error. +This technique is not too common, but is has one +significant advantage. Error-checks need only be +preformed where the error can first be detected. +There is no need to check the return value at every +function return. This leads to cleaner code, if +there are many functions that can raise exceptional +conditions, and greater performance under some +conditions. This is why this technique is sometimes +used in high-performance libraries. libzahl uses +this technique. + +Rather than writing + +\begin{alltt} + if (zadd(a, b, c)) + goto out; +\end{alltt} + +\noindent +or a but cleaner, if ther is a lot of calls, + +\begin{alltt} + #define TRY(...) do if (__VA_ARGS__) goto out; while (0) + \textcolor{c}{/* \textrm{\ldots} */} + TRY(zadd(a, b, c)); +\end{alltt} + +\noindent +you write + +\begin{alltt} + jmp_buf env; + if (setjmp(env)) + goto out; + zsetup(env); + \textcolor{c}{/* \textrm{\ldots} */} + zadd(a, b, c); +\end{alltt} + +You only need to call {\tt setjmp} and {\tt zsetup} +once, but can may update the return point by calling +them once more. + +If you don't need to check for errors, you can +disable error detection at compile-time. By defining +the {\tt ZAHL\_UNSAFE} C preprocessor definition +when compiling libzahl, and when compiling your +software that uses libzahl. + + +\newpage +\section{Integer structure} +\label{sec:Integer structure} + +The data type used to represent a big integer with +libzahl is {\tt z\_t}, defined as + +\begin{alltt} + typedef struct zahl z_t[1]; +\end{alltt} + +\noindent +where {\tt struct zahl} is defined as + +\begin{alltt} + struct zahl \{ + int sign; + size_t used; + size_t alloced; + zahl_char_t *chars; + \}; +\end{alltt} + +\noindent +where {\tt zahl\_char\_t} is defined as + +\begin{alltt} + typedef uint64_t zahl_char_t; +\end{alltt} + +\noindent +As a user, try not to think about anything else than + +\begin{alltt} + typedef \textcolor{c}{/* \textrm{ignore what is here} */} z_t[1]; +\end{alltt} + +\noindent +details can change in future versions of libzahl. + +{\tt z\_t} is defined as a single-element array. +This is often called a reference. There are some +flexibility issues with this, why {\tt struct zahl} +has beed added, but for most uses with big integers, +it makes things simpler. Particularly, you need not +work prepend {\tt \&} to variable when making function +calls, but the existence of {\tt struct zahl} allows +you do so if you so choose. + +The {\tt .sign} member, is either $-1$, 0, or 1, +when the integer is negative, zero, or positive, +respectively. Whenever, {\tt .sign} is 0, the value +of {\tt .used} and {\tt .chars} are undefined. + +{\tt .used} holds to the number of elements used in +{\tt .chars}, and {\tt .alloced} holds the allocation +side of {\tt .chars} measured in elements minus a few +extra elements that are always added to the allocation. +{\tt .chars} is a little-endian array of 64-bit digits, +these 64-bit digits are called `characters' in libzahl. +{\tt .chars} holds the absolute value of the +represented value. + +Unless {\tt .sign} is 0, {\tt .chars} always contains +four extra elements, refered to as fluff. These are +merely allocated so functions can assume that they can +always manipulate groups of four characters, and need +not care about cases where the number of characters is +not a multiple of four. There are of course a few cases +when the precise number of characters is important. + + +\newpage +\section{Parameters} +\label{sec:Parameters} + +The general order of parameters in libzahl functions +are: output integers, input integers, input data, +output data, parametric values. For example, in +addition, the out parameter is the first parameter. +But for marshalling and unmarshalling the buffer +is last. For random number generation the order is: +output, device, distribution, distribution parameters. +Whilst the distribution parameters are big integers, +they are not considered input integers. The order +of the input parameters are that of the order you +would write them using mathematical notation, this +also holds true if you include the output parameter +(as long as there is exactly one output,) for example + +\vspace{1ex} +$a \gets b^c \mod d$ +\vspace{1ex} + +\noindent +is written + +\begin{verbatim} + zmodpow(a, b, c, d); +\end{verbatim} + +\noindent +or + +\begin{verbatim} + zmodpowu(a, b, c, d); +\end{verbatim} + +Like any self respecting bignum library, libzahl +supports using the same big integer reference as +for output as input, as long as the all output +parameters are unique to each other. For example + +\begin{alltt} + a += b; +\end{alltt} + +\noindent +or + +\begin{alltt} + a = a + b; +\end{alltt} + +\noindent +is written, using libzahl, as + +\begin{alltt} + zadd(a, a, b); +\end{alltt} + +For commutative functions, like {\tt zadd}, the +implementation is optimised to assume that this +order is more likely to be used than the alternative. +That is, you should, for example, write + +\begin{alltt} + zadd(a, a, b); +\end{alltt} + +\noindent +rather than + +\begin{alltt} + zadd(a, b, a); +\end{alltt} + +This is assumption is not made for non-commutative +functions. diff --git a/doc/what-is-libzahl.tex b/doc/what-is-libzahl.tex @@ -0,0 +1,174 @@ +\chapter{What is libzahl?} +\label{chap:What is libzahl?} + +In this chapter, it is discussed what libzahl is, +why it is called libzahl, why it exists, why +you should use it, what makes it different, and +what is its limitations. + +\vspace{1cm} +\minitoc + + +\newpage +\section{The name and the what} +\label{sec:The name and the what} + +In mathematics, the set of all integers is represented +by a bold uppercase `Z' ({\bf Z}), or sometimes +double-stroked (blackboard bold) ($\mathbb{Z}$). This symbol +is derived from the german word for integers: `Zahlen' +[\textprimstress{}tsa\textlengthmark{}l\textschwa{}n], +whose singular is `Zahl' [tsa\textlengthmark{}l]. libzahl +[l\textsecstress{}\textsci{}b\textprimstress{}tsa\textlengthmark{}l] +is a C library capable of representing very large integers, +limited by the memory address space and available memory. +Whilst this is almost none of the elements in {\bf Z}, +it is substantially more than available using the intrinsic +integer types in C. libzahl of course also implements +functions for performing arithmetic operations over +integers represented using libzahl. Libraries such as +libzahl as called bigint libraries, big integer libraries, +multiple precision integer libraries, arbitrary precision +integer libraries,\footnote{`Multiple precision integer' +and `arbitrary precision integer' are misnomers, precision +is only relevant for floating-point numbers.} or bignum +libraries, or any of the previous with `number' substituted +for `integer'. Some libraries that refer to themself bignum +libraries or any of using the word `number' support other +number types than integers. libzahl only supports integers. + + +\newpage +\section{Why does it exist?} +\label{sec:Why does it exist?} + +libzahl's main competitors are GNU MP (gmp),\footnote{GNU +Multiple Precision Arithmetic Library} LibTomMath (ltm), +and TomsFastMath (tfm). All of these have problems: + +\begin{itemize} +\item +GNU MP is extremely bloated, can only be complied with +GCC, can only be dynamically linked, and requires glibc. +Additionally, whilst its performance is generally good, +it can still be improved. Furthermore, GNU MP cannot be +used for robust applications. + +\item +LibTomMath is very slow, infact performance is not its +priority, rather its simplicit is the priority. Despite +this, it is not really that simple. + +\item +TomsFastMath is slow, complicated, and is not a true +big integer library and is specifically targeted at +cryptography. +\end{itemize} + +libzahl is developed under the suckless.org umbrella. +As such, it attempts to follow the suckless +philosophy.\footnote{\href{http://suckless.org/philosophy} +{http://suckless.org/philosophy}} libzahl is simple, +very fast, simple to use, and can be used in robust +applications. Currently however, it does not support +multithreading, but it has better support multiprocessing +and distributed computing than its competitor. + + +\newpage +\section{How is it different?} +\label{sec:How is it different?} + +All big number libraries have in common that both input +and output integers are parameters for the functions. +There are however two variants of this: input parameters +followed by output parameters, and output parameters +followed by input parameters. The former variant is the +conventional for C functions. The latter is more in style +with primitive operations, pseudo-code, mathematics, and +how it would look if the output was return. In libzahl, +the latter convention is used. That is, why write + +\begin{alltt} + zadd(sum, augend, addend); +\end{alltt} + +\noindent +rather than + +\begin{alltt} + zadd(augend, addend, sum); +\end{alltt} + +\noindent +This can be compared to + +\vspace{1ex} +$sum \gets augend + addend$ +\vspace{1ex} + +\noindent +versus + +\vspace{1ex} +$augend + addend \rightarrow sum$. +\vspace{1ex} + +libzahl, GNU MP, and Hebimath uses the output-first +convention. LibTomMath and TomsFastMath uses the +input-first convention. + +Unlike other bignum libraries, errors in libzahl are +caught using {\tt setjmp}. This ensure that it can be +used in robust applications, catching errors does not +become a mess, and it minimises the overhead of +catching errors. Errors are only checked when they can +occur, not also after each function-return. + +Additionally, libzahl tries to keep the functions' +names simple and natural rather than techniqual or +mathematical. The names resemble those of the standard +integer operators. + + +\newpage +\section{Limitations} +\label{sec:Limitations} + +libzahl is not recommended for cryptographic +applications, it is not mature enough, and its author +does not have the necessary expertise. And in, +particular it does not implement constant time +operations. Additionally, libzahl is not thread-safe. + +libzahl is also only designed for POSIX systems. +It will probably run just fine on any modern +system. But it makes some assumption that POSIX +stipulates or are unpractical not to implement +for machines that should support POSIX (or even +support modern software): + +\begin{itemize} +\item +Bytes are octets. + +\item +There is an integer type that is 64-bits wide. +(The compiler needs to support it, but it is not +strictly necessary for it to be an CPU-intrinsic, +but that would be favourable for performance.) + +\item +Two's complement is used. +(The compiler needs to support it, but it is not +strictly necessary for it to be an CPU-intrinsic, +but that would be favourable for performance.) +\end{itemize} + +These limitations may be removed later. And there +is some code that does not make these assumptions +but acknowledge that it may be a case. On the other +hand, these limitations could be fixed, and agnostic +code could be rewritten to assume that these +restrictions are met.