A PKI based on decentralized trust for P2P transaction processing [2004]
Context
The text below is Malte Ubl's diploma thesis from 2004. Diploma was the name of the primary academic title for engineering degrees in Germany before Europe switched to the US Bachelor/Masters system.
The thesis was originally written in German which clearly was a mistake since the audience was the worldwide computer science community. In fact, I've been sad about this decision ever since. AI to the rescue. The text below is machine-translated by Google's Gemini 1.5 Pro.
Publishing the thesis now has no practical purpose, but it is a fun historical artifact providing a perspective into the early-2000s internet. Summarized with the benefit of hindsight, the thesis proposes a blockchain-like mechanism for p2p transaction processing.
The text has a lot of boilerplate establishing baseline facts required for diploma theses. Things get a bit more interesting from 6 Trust.
1 Introduction
2004 - The Internet has changed the world. The World Wide Web has driven globalization forward at an unprecedented speed, often completely eliminating the spatial distance between business partners. Who can deliver a truckload of coffee machines to my receiving dock next Wednesday? This question can be easily posed on international marketplaces today. The advantages for companies are obvious, but entirely new possibilities are also emerging for consumers. Abercrombie & Fitch doesn't have outlets in Germany - so I'll just order from the USA. Your daughter has outgrown her Barbie dolls - auction won. Global freedom is what the Internet gives us - and yet this freedom is currently limited. Even the seller on eBay is a customer of a paid service. People on the Internet are primarily users and only secondarily providers.
Wouldn't you think a computer with 2 gigaflops of processing power, half a terabyte hard drive, and a 3 Mbit DSL Internet connection is good for more than just displaying a web browser on the screen? These previously untapped resources of the Internet enable a migration away from the client-server architecture towards a peer-to-peer (P2P) architecture. This thesis will show a concrete way how, based on current technology, a network can be created that emancipates people on the Internet from their role as mere users. The monopoly of companies on providing services is broken and made accessible to the masses - the technology represents at least a paradigm shift, if not a revolution.
The goal of this thesis is to enable participants in a fully decentralized P2P network to efficiently conduct business processes with each other. After a brief introduction to the world of P2P networks, the thesis first addresses the elementary technological building blocks that support the execution of complex business processes in computer networks. One building block is enabling verifiable communication or electronic signatures as a prerequisite for contract conclusions in virtual space. However, this purely technological level is not the only area that makes trading in P2P networks a challenge. In a global market, the issue of trust between the participants in a transaction is crucial because only where trust exists can efficient, spontaneous decisions be made without unnecessary caution. In contrast to traditional business contacts, however, the factor of personal relationships is eliminated. Therefore, there must be novel procedures within the network to create trust relationships between participants. Such decentralized trust management forms the second building block for business processes in P2P networks.
An interesting characteristic of P2P networks is that while the network as a whole is practically unlimitedly available, the connections of individual participants are very unstable. This contrast contradicts the assumptions made in the traditional client-server architecture. Existing technologies are therefore often partially unsuitable for use in P2P networks. Filling this technology gap is a central part of this thesis. One step in this direction is the introduction of P2P transactions. P2P transactions allow, for the first time, participants in a network with extremely unreliable connections to conduct transactions with each other, after which both the client and the provider of a transaction can prove that the transaction took place and what the result of the transaction was. In conjunction with decentralized trust management, P2P transactions therefore form a solid infrastructure for conducting business processes in P2P networks.
For a more vivid presentation, the procedures and methods in this thesis are considered using a consistent scenario - a C2C online auction platform. However, the developed infrastructure could pave the way for a variety of applications into the P2P world - a step that is very attractive, not least for business reasons.
2 P2P Networks
Peer-to-peer (P2P) networks themselves are not a new concept. For example, the mail delivery protocol SMTP^(1){ }^{1}and even the DNS protocol^(2){ }^{2}, the backbone of the Internet as we know it today, are P2P protocols. The name "P2P", however, has only emerged in recent years. It makes it easier to talk about a phenomenon that has been particularly associated with copyright infringement by the general public since the late 1990s, but which technologically represents a paradigm shift that will lastingly change the role of every single "Internet surfer". Brookshier writes in this sense that P2P technology is not a new concept, but a revolution. A revolution not in the technical sense - but a revolution like the French Revolution.^(3){ }^{3}The participant in a P2P network is the emancipated Internet user. The concentration of resources on the servers in the client-server model gives way to a system in which each participant can assume the role of either the server or the client, depending on the situation.^(4){ }^{4}
2.1 Roles of Network Participants
A special feature of the P2P architecture is the role distribution of the participants in the network. This will be presented here in contrast to the client-server architecture.
The computers in a network built according to the client-server architecture can be divided into those that assume the role of clients and those that assume the role of servers. The number of clients is usually significantly larger, while the servers are particularly powerful computers. The clients provide the user interface, while the network services (for example, print service, file service, and database service) run on the servers. There is no direct communication between the clients.^(5){ }^{5}
Figure 1: Data flows in a client-server network.
In a P2P network, such a clear distinction between client and server cannot be made.^(6){ }^{6}Each participant in the network has a user interface for client functionalities and offers services for other participants in the network. Ideally, every participant should be able to offer every transaction supported by the network.^(7){ }^{7}
Figure 2: Data flow in a P2P network.
A P2P network thus forms a kind of community in which members "pay" for the use of other members' resources by making their own resources available.^(8){ }^{8}
2.2 Organization
A pure P2P network is organized entirely decentrally. There is no central unit (e.g., a server) that acts as a coordinator; rather, the network's behavior is the product of the independent behavior of the participants. Due to the lack of a controlling entity, a P2P network requires a kind of incentive system^(9){ }^{9}that guides the behavior in the network in a common direction.^(10){ }^{10}P2P networks that deviate from the strict decentralized organization are called hybrid P2P networks. The choice of a hybrid structure is usually made to improve performance when a distributed algorithm is significantly less efficient than a comparable centralized algorithm. However, this step introduces a single point of failure, which reduces the network's robustness to failures.^(11){ }^{11}
So-called super-peer P2P networks are a hybrid of the pure P2P architecture and a hybrid structure. In such a network, the super-peers take on more tasks than ordinary participants. One can speak here of several P2P networks at different hierarchical levels. Depending on the size and openness of the individual levels, the complexity, degree of organization, and robustness of the individual levels can be manipulated.^(12){ }^{12}
In the following, only decentrally organized P2P networks will be considered.
2.3 Structuring
P2P networks can be differentiated according to their degree of structuring. In an unstructured P2P network, participants have no information about the resources of other participants. This means that, for example, searching the network is only possible by sending a request to as many other participants as possible. Unstructured P2P networks are extremely fault-tolerant, as the failure of a single participant has no effect on the overall system. However, this comes at the cost of efficiency and predictability of system behavior.
In structured P2P networks, participants have knowledge about the available resources of a set of other participants. This enables efficient routing mechanisms (e.g., for searching), which are implemented, for example, by distributed hash tables.^(13){ }^{13}Managing this information represents an additional effort. This is all the more true since the information must be constantly adjusted when participants leave or join the network.^(14){ }^{14}
2.4 Differentiation from Other Distributed Architectures
In addition to P2P networks, there are other distributed computer network architectures, which will be contrasted here with P2P networks for better differentiation.
2.4.1 Clusters
A cluster is a parallel and possibly distributed system consisting of a set of computers that are used as a single, unified computing resource. A cluster thus has an external effect like a single computer. The distribution of tasks and resources within the cluster is carried out by a central entity, and the computers usually belong to a single local network.^(15){ }^{15}Compared to P2P networks, clusters therefore differ particularly in terms of spatial distribution and organization. While the computers in a cluster are coordinated by a central unit to work towards a specific goal, the behavior of the participants in a P2P network is not subject to central control. In this respect, a cluster works much more purposefully than a P2P network, but scalability is limited by two decisive factors. Only as many computers can be added to the cluster as the central control can handle and as the owner can finance. The central control limits the availability of the cluster because the failure of the control puts the entire system out of operation.^(16){ }^{16}
2.4.2 Grid Computing
A grid is a parallel and distributed system that allows geographically distributed, autonomous participants to share their resources. The distribution takes place dynamically at runtime based on various criteria such as the availability and computational speed of the participants.^(17){ }^{17}The participants often belong to different organizations. However, the degree of organization between the participants is usually quite high. Current implementations of grid computing networks often still use centralized mechanisms for resource management. However, a decentralized implementation with P2P technology is also possible. Such grid computing networks are then special forms of P2P networks specifically designed for the distribution of resources.^(18){ }^{18}
2.5 Advantages and Disadvantages of P2P Technology
From a business perspective, operating an application for end-user use based on a P2P network offers significant advantages over a client-server architecture. Since the systems control themselves and data is exchanged directly between participants, all costs associated with operating the application are eliminated. This particularly affects costs for operating a data center or for traffic caused by customers.
A disadvantage of P2P technology is that the provider of a P2P network has little influence on the events taking place within it. Also, developing a P2P application is generally significantly more complex than developing client-server applications, although this could change with the development of P2P frameworks like JXTA^(19){ }^{19}.
Fully decentralized P2P networks are practically unlimitedly available because the failure of individual participants does not affect the functionality of the entire network. The scalability of P2P networks with respect to a given problem depends on the existence of suitable algorithms for use in decentrally organized, distributed environments. However, there is no bottleneck in P2P networks that would impose an upper limit on performance. In this respect, P2P networks are infinitely scalable.
2.6 Applications
In addition to the examples DNS and SMTP mentioned above, P2P networks are found in the following areas today:^(20){ }^{20}
File sharing - controversially represented by Napster^(21){ }^{21}, Kazaa^(22){ }^{22}, Gnutella^(23){ }^{23}, etc. Exchange of often copyrighted content over the P2P network.
Content distribution - represented by BitTorrent^(24){ }^{24}, among others. Distribution of the traffic load of a download, which is accessed by many, among all downloaders.
Resource sharing - represented by SETI@Home^(25){ }^{25}, among others. Participants make their unused resources available to the network.
Communication - represented by ICQ^(26){ }^{26}, among others. Participants communicate with each other via direct channels.
2.7 Characteristics
For a more efficient presentation, this thesis uses the term P2P network as a synonym for a hypothetical network with characteristics that can be considered realistic for a P2P network based on existing P2P networks. The characteristics are: /P2P 10/ Every participant can, in principle, carry out every transaction supported by the P2P network. /P2P 20/ Participants can publish information in the network. /P2P 30/ Information is kept massively redundant. Its availability is therefore independent of the online status of a single participant. /P2P 40/ Information in the network can be searched efficiently and decentrally. /P2P 50/ All participants can communicate unhindered. In particular, the existence of firewalls is ignored for simplification.^(27){ }^{27} /P2P 60/ In contrast to the availability of information throughout the network (/P2P 30/), there are no guarantees regarding the availability of a single participant. It must be assumed that participants regularly drop out.
The contrast between the characteristics /P2P 30/ and /P2P 60/ is one of the most striking characteristics of P2P networks. The network as a whole is highly available, while individual participants switch between online and offline status at relatively short intervals. The average online time of a participant in a P2P file-sharing network is approximately 60 minutes.^(28){ }^{28}The maximum online time of participants is limited by most Internet providers used by end customers, as the internet connections are closed after a maximum of 24 hours. This volatility of connections makes many mechanisms developed for highly available network topologies unsuitable for use in P2P networks.
From the characteristics /P2P 20-40/, it follows that the P2P network as a whole represents a kind of distributed database that provides services for publishing and searching for information.
Figure 3: P2P network as an abstract server.
To facilitate the presentations and visualization, the P2P network will therefore be represented in figures (see Figure 3) in the following as a simple server that is available to the participants. The characteristic of massive spatial distribution is, however, ignored.
2.8 Identifiers
An identifier is a string that identifies an entity in a network. Unless otherwise stated, it is globally unique in the sense of the network. The identifier for an entity iiis called Identifier _(i)_{i}.
2.9 Functions
Based on the characteristics mentioned above, the P2P network provides some functions: The term tuple is used for related lists of data that can be passed to and returned from functions. For efficient representation, any declaration of data types is omitted. The contents of the tuples should be clear from the context.
publish(Tuple)
Every participant can publish any tuple of information in the network (/P2P 20/).
Participant iisends Tuple _(1)_{1}to participant jjand receives Tuple _(2)_{2}back as a response.
The functions together serve as an interface to the underlying network. They therefore represent a further level of abstraction between the transport layer and the application layer of the network.
3 Scenario
3.1 Business Perspective
This thesis deals with the establishment of an infrastructure for P2P networks that, among other things, supports the processing of business processes between participants. For a more vivid presentation, the facts are presented using a scenario.
Since all participants are technically on the same level in P2P networks, such business processes can be most easily mapped onto P2P networks in which the participants are also on the same level. This is true for hardly any business process more than for consumer-to-consumer online auctions. Here, each participant can be both a consumer and a provider, and the knowledge of each participant about the other participant is so limited that, for example, social differences hardly come into play.
Online auctions offer the advantage over other sales processes that they come very close to the economic ideal of perfect competition. Both the number of buyers and the number of sellers is large. For example, while online shops of well-known brands may reach a similar number of buyers, the number of sellers is always vanishingly small in comparison.^(29){ }^{29}
3.1.1 Business Process
This thesis considers C2C online auctions in the simplest possible form. The auction determines the price for a good through bids from potential customers. Each bid must be higher than the previous one. By placing a bid, the bidder declares their agreement to enter into a purchase contract at the bid price. The purchase contract is concluded if the bid is not exceeded.^(30){ }^{30}
Figure 4: Online auction business process as EPK (see [Schröder 2004]); (Color coding is for categorization only).
The business process of an online auction can be roughly divided into three parts, assuming that a number of providers have already created and published auctions.
Offer selection (marked turquoise in Figure 4)
Bid submission (marked yellow)
Exchange of money for goods (marked orange)
This thesis is limited to the consideration of offer selection and bid submission, as the exchange of money for goods would primarily take place outside the computer network anyway. The submission of bids until the winner of an auction is determined is called an auction transaction. Once the auction transaction is completed, the provider and the winner can submit an evaluation of the auction process, whereby they can, of course, also take into account the subsequent exchange of money for goods.
3.1.2 Assumptions
The following assumptions are made for the online auction under consideration: /Ann 10/ The number of buyers is large. /Ann 20/ The number of sellers is large. /Ann 25/ The number of offers is large. /Ann 30/ All buyers have the same opportunity to sufficiently inform themselves about the offers. /Ann 40/ The offers can be easily compared with each other. /Ann 50/ The large number of buyers and sellers makes it very unlikely that a buyer and a seller know each other directly. /Ann 60/ The majority of participants are benevolent.
3.1.3 Data Modeling
Figure 5: Modeling an auction as an ER diagram (see [Wikipedia Entity-Relationship Model]).
Figure 5 shows - in a greatly simplified form - the data of an auction platform (the description of the auction is arbitrarily complex. It can contain text, images, etc.).
3.1.4 Risk
All participants in an auction, both the seller and all bidders, have an interest in not being cheated. For the bidder, the question arises whether the seller will actually send the auctioned item after the money has been transferred and whether the quality of the goods corresponds to the auction description. As a seller, it is important whether the bidders actually have an interest in purchasing the goods or whether they only want to artificially drive up the price to benefit their own auctions. Another scenario would be that a bidder participates in several auctions for similar goods and chooses the cheapest offer in the end. Certainly, not all possible misbehaviors have been listed here, but the problem can be reduced to the fact that there must be a basis of trust between bidder and seller that allows participants to assess the risk of a transaction. In general, the decision to participate in a transaction or to allow another's participation will boil down to the formula:
Risk of transaction * Value of goods > Pain threshold for loss
Each participant will define the pain threshold for a loss for themselves. The value of the goods is known. However, /Ann 60/ makes assessing the risk difficult. An online auction platform must therefore support the determination of the risk of a transaction between complete strangers.
3.1.5 Requirements
The following requirements are placed on an auction platform: /Anf 10/ Participants must be able to create and publish auctions. /Anf 20/ Participants must be able to search for auctions. /Anf 25/ Participants must be able to determine the amount of the current bid. /Anf 30/ Participants must be able to assess the risk of a transaction with another participant. /Anf 40/ Bidders must be able to submit bids.
Submitting a bid is the transaction in the narrower sense. The following requirements apply to this: /Anf 50/ The bidder must have proof that they have submitted a bid. /Anf 60/ The seller must have proof that the bidder has submitted a bid. /Anf 70/ If the bidder and seller are honest, the evidence from /Anf 50/ and /Anf 60/ must never contradict each other.
For /Anf 10/ up to but not including /Anf 30/, it is assumed that these requirements can be met by the capabilities of today's widespread file-sharing P2P networks. The remaining requirements will be addressed in this thesis.
3.1.6 Use Cases
Selected use cases within the auction platform are considered in detail.^(31){ }^{31}
Figure 6: Selected use cases of the auction platform.
The participants are divided into bidders (also buyers) and sellers, whereby these are only temporary roles that each participant can also assume simultaneously for different auctions. A participant is already referred to as a bidder as soon as they intend to participate in an auction.
/Af 10/ Evaluate Transaction Actor: Bidder/Seller Precondition: Evaluable auction transaction has been executed. Postcondition: Transaction has been evaluated. Process Description: After a bidder has won an auction, the winner and seller are given the opportunity to evaluate it with a short text and the rating "positive", "negative", or "neutral".
/Af 20/ Create Auction Actor: Seller Precondition: Seller wants to auction goods. Postcondition: Auction can be found via the search function. Process Description: Seller defines the attributes of an auction via an input mask (see Figure 6).
/Af 30/ Select Auction Actor: Bidder Precondition: Bidder wants to bid on goods. Postcondition: Bidder has found an attractive auction or bidder has not found an attractive auction. Process Description: See turquoise colored functions in Figure 4.
/Af 40/ Submit Bid Actor: Bidder Precondition: Bidder wants to participate in an auction. Postcondition: Bidder has submitted a bid. If, in addition, the auction has ended and no further bid has been submitted after this one, the bidder is the winner of the auction. A purchase contract for the auctioned item is concluded between the winner and the seller of the auction. Exceptions: Bid could not be submitted because a higher bid has already been submitted in the meantime or the auction has ended. Process Description: See yellow colored function in Figure 4.
3.2 Implementation
3.2.1 Current Online Auction Platforms
As part of this thesis, an online auction platform based on P2P technology is designed. This contrasts with all current mainstream C2C auction platforms, represented in particular by eBay^(32){ }^{32}, which are implemented in a client-server architecture.
To better highlight the special characteristics of P2P technology, the technological basis as well as indirectly relevant aspects of eBay's business model will be briefly presented here.
3.2.1.1 Architecture
Figure 7: Client-server architecture of eBay.
The eBay auction platform is a classic web application.^(33){ }^{33}The object marked in blue in Figure 7 symbolizes the server. This may be composed of various computing units - logically, however, a single server is created on which all transactions run. The clients communicate with the server. Communication between clients at the eBay level does not take place.^(34){ }^{34}
3.2.1.2 Business Model
eBay's business model is based on fees being charged to the seller after a successful auction. On the cost side, eBay has to account for the operation of its application in addition to administration and marketing. In addition to the acquisition and maintenance of the server platform, eBay incurs costs due to usage-related traffic.
3.2.1.3 Reputation Model
As described in chapter 3.1.4 and /Anf 30/, users of an auction platform need to be able to assess the risk of a transaction. eBay uses a simple reputation model based on buyers and sellers being able to rate each other after a transaction has ended. The rating history is then available to future potential auction participants as a basis for decision-making.
3.2.2 Implementation Approach
The architecture of the auction platform in the P2P network represents a natural mapping of the business process. Abstractly speaking, data, goods, and money flows follow the same path.
Figure 8: Data, goods, and money flows between participants in an auction.
The data flow runs directly between the participants in an auction via the P2P network. A detour via a third party, in particular the server of a service provider, is eliminated.
By eliminating the server, there is no central database where, for example, reputation data about the participants could be stored. Section 6.6 will show a method for calculating reputation without such a central database and how reputation can be used as an indicator of the risk of a transaction.
Another function of the server in auction platforms is the role of the arbitrator. Conflicts can arise during an auction if, for example, a seller claims to have received a bid, but the alleged bidder denies this. The auction platform in the P2P network must be able to resolve such conflicts as far as possible without the influence of third parties. Chapters 4 and 5 explain the approach to how such low-conflict self-administration is possible. The design of the auction platform is finally presented in Chapter 7.
4 Cryptography
Cryptography is the science of encrypting and obscuring information. The main goals of cryptography are:^(35){ }^{35}
Confidentiality (Secrecy) The goal of confidentiality is to transmit a message to a recipient without third parties learning about the transmission or even the content.
Data Integrity The goal of data integrity is to ensure that a message is not altered during transmission.
Authentication
Participant Authentication The goal of participant authentication is the unambiguous identification of a participant, whereby local identification is also considered here (e.g., to an ATM), so communication over an insecure channel does not necessarily have to be assumed.
Message Authentication (Non-repudiation, Attribution) The goal of message authentication is to prove the authorship of a document in order to be able to unambiguously determine the author and to prevent them from denying authorship. In daily life, this is usually guaranteed by signatures. Message authentication is thus the basis for many business transactions.
4.1 Symmetric Encryption
For millennia, attempts have been made to achieve these goals through the means of symmetric cryptography. Here, a message is transformed from plaintext to ciphertext using a key, whereby the same key (or a key easily derived from the key) is used to recover the plaintext from the ciphertext.^(36){ }^{36}
Figure 9: Encryption and decryption in a symmetric cryptosystem. After Fig. 2.1 in [Adams Lloyd 2002], p. 8.
The problem with this method is the necessity of exchanging the key between the participants in a message transmission. This exchange must take place over a secure channel, which is often difficult to implement, for example, when the participants are spatially distant from each other. An even bigger problem with symmetric cryptography, however, is the large number of keys. Each participant must have a key to every other participant with whom they want to communicate. With 1,000 participants, the number of keys can be in the range of half a million. In general, for nnusers, up to n^(2)//2n^{2} / 2keys will be needed, which also have a limited lifespan in all common encryption methods. Managing such quantities of keys therefore presents a significant bureaucratic challenge and can be described as impractical for many purposes.^(37){ }^{37}
4.2 Asymmetric Encryption
If two scientists, Whitfield Diffie and Martin Hellman, hadn't had a groundbreaking idea in 1976, the world would probably have taken a different path in the development towards the information society - with a mathematical trick, they solved the problem of key distribution described above. The fundamental question is: How can a message be sent to another participant without knowing a secret (the key in symmetric cryptography)? The answer given by Diffie and Hellman was to use a key pair instead of a single key, whereby one key - the public key - is publicly known and the other key - the private key - only needs to be known by its owner. Participant Bob can now send participant Alice a message by encrypting the message with Alice's public key. The message can now only be decrypted by Alice with her private key.^(38){ }^{38}
Figure 10: Bob sends Alice a message using public-key encryption.
A so-to-speak "byproduct" of asymmetric cryptography is the possibility of a digital signature. If Bob encrypts a message with his own private key, a signature is created. This can be verified by decrypting the signature with Bob's public key and comparing it with the original text.^(39){ }^{39}In addition to the participant authentication guaranteed by the signature, it also guarantees the integrity of the data associated with it.
Figure 11: Bob signs a message. The signature is verified with Bob's public key.
Unfortunately, asymmetric cryptography also has a key distribution problem, albeit of a different kind than in symmetric encryption - the public key problem.
4.2.1 Public Key Problem
For Bob to send Alice an encrypted message, Bob must know Alice's public key, and for Alice to verify a signature from Bob, she must know Bob's public key. The problem now is: How do participants obtain the public key of another participant? A search on the Internet could provide many answers. The question is, which answer is correct - one, several, or none? The most obvious way to solve the problem is to exchange keys through personal contact. However, in the age of the Internet, this path is often impractical due to large spatial distances.
Various strategies have been developed to support the exchange of public keys, which are usually based on providing the keys with a certificate that proves their authenticity. The procedures differ mainly in the way these certificates are generated. Chapter 5 describes common procedures for issuing certificates. Chapter 7 will demonstrate that under certain conditions, the key distribution problem does not exist at all.
4.3 Cryptographic Hash Functions
Cryptographic hash functions are collision-free one-way functions that map messages of arbitrary length to a hash value of fixed length. This hash value serves as a fingerprint of the original message since it is assumed that no other message (and especially no other meaningful message) can be found that produces the same hash value. Similarly, it must not be possible to deduce the message from a hash value. Thus, it is possible to verify the knowledge of a shared secret between two participants without revealing the secret.^(40){ }^{40}
4.4 Notation
Public Key
Pub_(i)Pub_{i}is the public key of participant ii.
Private Key
Priv_(i)Priv_{i}is the private key of participant ii.
Electronic Signature
The electronic signature of a message mmfrom participant iiis sig_(i)(m)sig_{i}(m). In a tuple (e_(1),e_(2),dotse_(n),sig_(i))\left(e_{1}, e_{2}, \ldots e_{n}, sig_{i}\right), sig_(i)sig_{i}is the signature over the preceding elements.
Public Key Encryption
A message mmis encrypted with the public key of participant iiby crypt_(i)(m)crypt_{i}(m).
Cryptographic Hash Function
The cryptographic hash value of a message mmis hash(m)hash(m). In a tuple (e_(1),e_(2),dotse_(n),hash)\left(e_{1}, e_{2}, \ldots e_{n}, hash\right), hashhashis the hash value over the preceding elements.
Random Number Generator
The procedure rand()rand()returns a true random number with sufficient entropy^(41){ }^{41}.
5 Public Key Infrastructure
A Public Key Infrastructure (PKI) is the basis for a pervasive security infrastructure^(42){ }^{42}whose services are implemented using public key concepts and techniques.^(43){ }^{43}
The word infrastructure indicates that it is not the goal of a PKI to support a single application. Rather, the PKI offers an environment that allows various applications to operate within a security architecture. Instead of developing incompatible isolated solutions for specific applications, the PKI enables a consistent security landscape across application and even entire platform boundaries. The comparison with other infrastructures, such as the road or electricity network, is therefore not far-fetched. Accordingly, the security infrastructure must also meet similar, abstract requirements as the above networks:
The interface must be known and easy to use.
The results of using the infrastructure must be predictable and useful.
The way these results are generated does not need to be known to use the infrastructure.
For the user of the infrastructure, its existence should be largely transparent. This means that, apart from logging in and in exceptional cases (failure of the infrastructure), the user should not have to interact with the infrastructure in any way.
Probably the most important aspect of a security infrastructure is that it implements a single, trusted security technology that is available to the entire environment. Applications that use the services of the PKI can communicate and conduct transactions with each other undisturbed and securely.
5.1 Goals of PKI
Adams and Lloyd describe a PKI as an application enabler.^(44){ }^{44}The goal of the PKI is therefore to support applications. In a narrower sense, the PKI implements the goals of cryptography for its users in as transparent a manner as possible. However, it does not do this for its own sake. The real reason for implementing a PKI is the need to conduct transactions of any kind.
5.2 PKI Components and Services
A PKI provides a set of services and components to implement the security infrastructure. The extent to which and how each point is implemented naturally depends on the needs of the target group of the PKI.^(45){ }^{45}
5.2.1 Certificate
A certificate is the assurance by a third party that a given public key belongs to a given identity for a certain period. The certificate is signed with the private key of the third party.
5.2.2 Certification Authority
Certification Authorities (CAs) are designated participants in a PKI who confirm the authenticity of public keys through their signatures. Usually, there will be only a few CAs compared to the total number of participants. However, it is important that the public keys of the CAs are known to a large number (all) of participants for this model to work.
5.2.3 Certificate Repository
In addition to certified public keys, a PKI needs the ability to determine the public key for a given identity. The Certificate Repository^(46){ }^{46}serves this purpose.
5.2.4 Certificate Revocation
The statement expressed by a certificate - the binding between an identity and a key pair - can become false. This is especially the case when a private key is compromised. For these situations, there must be a way to revoke a certificate. This process is called Certificate Revocation.
5.2.5 Key Backup and Recovery
In a realistic scenario, users of the PKI will sooner or later lose access to their private key (e.g., due to a hard drive crash or forgetting a password). For these cases, options for backing up the key or another type of recovery should be created.
5.2.6 Automatic Key Update
Certificates usually have a limited lifespan. For such certificates, a PKI should provide an automatic way to update the certificate (and especially the key).
5.2.7 Key History
When certificates expire, there must still be a way to retrieve old keys for certain purposes (e.g., if a document was encrypted with an old key).
5.2.8 Cross-Certification
While it is conceivable that all certificates in a PKI refer to the same so-called root certificate, this scenario is unrealistic. Since this scenario is unrealistic, it will happen that two CAs certify each other. This process is called cross-certification.
5.2.9 Support for Non-Repudiation
Public key cryptography allows the signing of messages. A PKI must ensure that the binding between message and author given by a signature cannot be broken. In addition to the authorship of a document, this also guarantees the non-repudiation of receipt, delivery, and consent.
In fact, the electronic signature only means that a specific private key was used. The transitive conclusion to the owner of the private key is not beyond any doubt (e.g., in case of key theft). A PKI can therefore only provide support. In case of doubt, a judge is still needed to evaluate the evidence.
Non-repudiation plays a particularly important role in PKI services. The ability to prove the occurrence of a transaction, as well as the identity and consent of the participants afterwards, is essential for the handling of complex business processes over a network.
5.2.10 Time Stamping
It is especially important for ensuring non-repudiation that there is a secure source for time within the PKI, because due to the limited validity of the certificates and the possibility of revocation, it must be possible to subsequently verify whether a signature was made at a time when the key was still valid.
5.2.11 Client Software
Some form of client software will always be necessary for implementing a PKI. However, its scope depends on the respective system.
5.3 Trust Models
If the assignment of public keys to identities takes place via certificates, the question remains how trustworthy the certificate itself is. If any participant acts as a CA, the certificate is practically worthless. Rather, the certificate must represent a gain in trust - in other words, the CA must be more trustworthy than the certified party.
5.3.1 Strict Hierarchy of Certification Authorities
In a strict hierarchy of Certification Authorities, there is a Root CA from which all trust in the PKI originates. This means that all participants must trust the Root CA. The Root CA's certificate is self-signed.^(47){ }^{47}There can be further levels of CAs below the Root CA, whose certificates are signed by a CA at the respective higher level. The leaves of the resulting tree are the actual participants in the PKI. Valid participant certificates refer to the Root CA via the certificate chain by tracing the respective CAs.^(48){ }^{48}
Figure 12: Strict hierarchy of CAs.
All processes performed by a CA are associated with bureaucratic effort. When issuing a certificate, the identity of the certificate holder must be established beyond doubt, which is usually done by means of hand-signed documents transmitted by mail or fax. This procedure must be repeated for any changes to the certificate. Issuing a certificate is therefore usually associated with costs^(49){ }^{49}.
5.3.1.1 Architecture with Distributed Trust
The introduction of a strict hierarchy for all CAs in all PKIs of this world is unrealistic. Distributed trust provides a remedy. Here, several inherently strict hierarchies exist in parallel, whose respective Root CAs cross-certify each other (cross-certification).^(50){ }^{50}
5.3.2 Web of Trust
The Web of Trust, developed for PGP^(51){ }^{51}, attempts to manage the certification of public keys without the use of dedicated CAs. The approach assumes that each participant can certify the identity and public keys of a set of other participants.^(52){ }^{52}This set is called friends. If Alice wants to write an email to Bob, but cannot verify Bob's public key herself, she looks at the set of keys that her friends certify, and so on. This way, she will hopefully find one or more certificate paths to Bob.^(53){ }^{53}The shorter the path and the more redundant the route, the more certain the validity of Bob's key. The assessment of validity is now up to Alice herself and is therefore not to be mastered without technical understanding.^(54){ }^{54}
Figure 13: The certification path from Alice to Bob, according to [Feisthammel 2002].
The Web of Trust is a first approach to using the existing trust in a network to solve the public key problem. Chapter 6 will therefore deal more intensively with the concept of trust.
6 Trust
This thesis has so far dealt with trust in two different places: in Chapter 3.1.4 regarding the risk of a transaction and in Chapter 5.3 regarding the legitimacy of a public key. In the first case, trust refers directly to a participant; in the second case, the trustworthiness of a particular CA is examined. However, the underlying meaning of the term is the same:
Trust is the subjective expectation of a person regarding the future behavior of another person, based on previous encounters^(55){ }^{55}between these persons.^(56){ }^{56}
Trust is of central importance for the efficient execution of business processes, as it allows the protagonists to make spontaneous decisions. In an environment without a basis of trust, on the other hand, a lot of time is wasted taking precautions to prevent possible misconduct by business partners. Distrust, in turn, has a similarly positive effect as trust itself, as it also helps to make conscious decisions about cooperating with others.^(57){ }^{57}
In the online auction scenario, the business partners are completely unknown to each other.^(58){ }^{58}The existence of a direct trust relationship can therefore be excluded. However, since auctions take place within a limited time frame, efficient execution is essential. For the success of the auction platform, it is therefore of utmost importance to support the participants in building trust.
6.1 Transitive Trust
Transitive trust expresses that individuals have a certain basis of trust with individuals who are within the trust circle of their trusted contacts.^(59){ }^{59}This is how the so-called Web of Trust^(60){ }^{60}is formed.
Figure 14: Visualization of the trust relationship between Bob, Alice and Christian.
Figure 14 visualizes the trust relationship between Alice, Bob, and Christian. The transitive trust between Alice and Christian arises through Bob, who has a relationship with both. The strength of the transitive trust depends on many factors and will not be discussed further here. However, it can be assumed that transitive trust is usually weaker than, and at most identical to, a direct trust relationship.
6.2 Small-World Networks
In 1967, psychologist Stanley Milgram conducted an experiment with an astonishing result. The approach was that various people within the USA should send a letter to another, completely unknown person living in a distant part of the USA. Instead of sending it directly, however, the letter could only be sent to personal friends. The number of steps a letter needed to reach its destination was then examined. The surprising result was that the letters needed an average of only 6 steps to reach their destination - the small-world phenomenon^(61){ }^{61}was born.^(62){ }^{62}
The reason for the small number of steps between any two people is that people live in a social network and these networks are small. A social network can practically only have two states. Either all people live in small, isolated communities with no connections to each other, or they live in a single large group where everyone is connected to everyone else through a few steps. There is no middle ground, e.g., given by a few large groups that are self-contained. This is surprising insofar as people traditionally think in categories such as social classes or races, which, however, do not seem to have a dramatic influence on the connection distance between any two people in the network.
Figure 15: Graphs with ordered and random connections. Based on [Watts 2003], p. 87, Fig. 3.6.
To understand the origin of the small attribute of a network, it helps to consider three types of graphs. The left graph in Figure 15 shows a completely ordered network. The distance between two opposite nodes is thus quite large. In the right graph, the edges between the nodes are chosen randomly. For such random graphs, it has been proven that connectivity is high as soon as more than one edge per node occurs^(63){ }^{63}- if they were an accurate abstraction of reality, they would explain the small-world phenomenon. However, since social relationships do not develop purely randomly (the probability of someone from Hamburg knowing someone else from Hamburg is significantly higher than knowing a native of the Brazilian rainforest), this model is not realistic.
The middle graph, on the other hand, shows a representation that could well serve as a model of a social network. Each node is part of a closely connected group. However, there are also individual edges that connect distant nodes. These "shortcuts" have a dramatic effect on the average distance between two nodes in the network and are the key to the small-world phenomenon. In a social network, it is indeed the case that people tend to interact with a group that is closely intertwined. However, individuals are always active in several such groups. These people form the "shortcuts" in the middle graph.^(64){ }^{64}
A small-world network combines local group formation and globally short connections. The consequence is that from the subjective perspective of a single node, the small characteristic never becomes apparent because it can only overview its immediate surroundings. Only a global view reveals the true degree of networking.
6.2.1 Trust in Small-World Networks
Chapter 6.1 describes how a Web of Trust can be built using transitive trust. A kind of chain of trust is established between any two participants. If the underlying network is small, then it can be assumed that this chain will always be relatively short, regardless of the number of nodes in the network. In other words, every participant in the P2P network has a transitive trust relationship with every other participant, which runs through a relatively short trust chain.^(65){ }^{65}From this, it can be concluded that a certain basis of trust exists in the network, even if no participant can subjectively perceive this. However, it shows that there is potential that could be exploited by special algorithms.
Technically, with regard to small-world networks, it remains questionable whether it is possible to find these short paths with reasonable effort. However, it will be shown later that a sufficient assessment of a person's trustworthiness can also be made without explicitly finding the short paths.
6.3 Trust Relationships
Trust relationships between two people can be very diverse. This paper limits its consideration to the states: trust, distrust, and no relationship.
6.4 Trust Metrics
Trust metrics are computational models that provide a quantitative assessment of the trust that one person should have in another. Trust metrics are inherently backward-looking in their consideration. Therefore, they cannot predict sudden changes in human behavior.
A simple trust metric would, for example, traverse the trust chain between two participants and return the lowest trust rating. However, there are many other techniques, a complete review of which is beyond the scope of this thesis. More complex trust metrics must take social and psychological aspects of trust into account while remaining scalable.^(66){ }^{66}
6.4.1 Use in P2P Networks
Trust metrics for use in P2P networks must meet special requirements:^(67){ }^{67} /Ver 10/ The system must be completely self-regulating. Correct behavior within the network must be enforced by the participants themselves, as there is no central authority. /Ver 20/ The system must guarantee anonymity for users. Trust must be mapped to identifiers that have no externally associable meaning.^(68){ }^{68} /Ver 30/ New participants in the network should not enjoy an advantage. This prevents malicious participants from gaining an advantage by frequently changing their identity. /Ver 40/ The overhead in terms of computation, infrastructure, storage space, and message complexity should be minimal. /Ver 50/ The system must also be robust against collectives of malicious users. /Ver 60/ For use in decentralized P2P networks, the calculation must be able to be performed distributedly.
6.5 Global Trust
The definition used in this paper primarily considers trust as a local phenomenon. However, global trust, which takes a different perspective, is also useful. Global trust maps the aggregated trust of an entire network onto a single participant. The algorithms are usually based on Google's^(69){ }^{69}PageRank^(70){ }^{70}algorithm for evaluating internet pages. PageRank calculates the importance of a page based on the importance of the pages that link to it. The same concept can be applied to trust - trust statements from people who are trusted by more people are worth relatively more. The recursive nature of these models automatically leads to a complete consideration of the underlying network. Trust metrics that use a global concept of trust have the advantage that they also work without an efficient algorithm for finding local transitive trust relationships, the short paths in the underlying small-world network.
6.6 EigenTrust
An algorithm for calculating the global trust or reputation of participants in a P2P network is the EigenTrust algorithm. EigenTrust is demonstrated in [Kamvar Schlosser Garcia-Molina 2003] using the example of a file-sharing network in which the problem of participants distributing no or fake files is to be minimized. Here, EigenTrust is applied to the online auction scenario.^(71){ }^{71}
Let QQbe the set of participants in the network. The winner and the seller of an auction are given the opportunity to evaluate each other after the auction has ended. The local reputation (history of previous auctions) between participants i in Qi \in Qand j in Qj \in Qis defined as follows:
Other participant jjbehaved positively towards iiin an auction: (tr(i,j)=1)(tr(i, j) = 1). This is the case, for example, if an auction winner paid promptly and in full.
Other participant jjbehaved negatively towards iiin an auction: (tr(i,j)=-1)(tr(i, j) = -1). This is the case, for example, if the auctioneer does not deliver after the auction.
All participants trust themselves: (tr(i,i)=1)(tr(i, i) = 1).
The local trust value from iito jjis s_(ij)=sum tr_(ij)s_{ij} = \sum tr_{ij}.
EigenTrust now determines the global reputation of a participant iifrom the local trust of other participants towards iiand their global trust.
6.6.1 Normalization of Local Trust Values
Local trust values must be normalized to be aggregated securely, as otherwise the system could be circumvented by assigning arbitrarily high trust values. The normalized local trust value is defined as:
The term max(s_(ij),0)\max(s_{ij}, 0)sets all negative s_(ij)s_{ij}to 0. This facilitates normalization but has the consequence that the distinction between a bad relationship and no relationship between two participants is lost.^(72){ }^{72}The value c_(ij)c_{ij}is always between 0 and 1.
6.6.2 Aggregation of Local Trust Values
It is now necessary to aggregate the local trust values. To do this, participant iiasks all their acquaintances jjabout their opinion of kk. The statements of the acquaintances are in turn weighted by the trust of iiin them.
t_(ik)t_{ik}now expresses the trust of iiin kk, taking into account one level of transitive trust relationships. Note that the formula is not recursive.
The above formula can be represented in matrix notation: Let
CCbe the matrix c_(ij)c_{ij},
vec(t)_(i)\vec{t}_{i}be the vector t_(ij)t_{ij},
vec(c)_(i)\vec{c}_{i}be the vector of local normalized trust values of ii, then:
By repeatedly multiplying the matrix C^(T)C^{T}by itself, the number of transitive relationship levels considered is increased. Through a sufficiently large number of repetitions, iigains a complete overview of the network.
In doing so, vec(t)_(i)\vec{t}_{i}converges for all iito the same vector^(73){ }^{73}vec(t)\vec{t}, the global trust vector of the network.
6.6.3 Probabilistic Interpretation
Since EigenTrust is based on PageRank^(74){ }^{74}, the trust values can be interpreted as probabilities similar to the Random Surfer Model. The Random Surfer Model considers an idealized web surfer who randomly clicks on links and thus moves through the WWW. After a certain number of clicks, the probability is higher that the random surfer has arrived at a page that many links point to than at a page that fewer links point to. The Google search engine uses this probability as an indicator of the quality of a website.
Figure 16: The trust network of participants a to f. The edge label is the respective normalized local trust value between the connected participants.
The random surfer in an EigenTrust network moves from participant to participant. If they are at participant ii, they jump to participant jjwith probability c_(ij)c_{ij}. After the random surfer has moved like this for a while, they are more likely to be at a trustworthy participant than at an untrustworthy participant. The global trust vector vec(t)\vec{t}is the stationary distribution of the Markov chain defined by CC.
6.6.4 Simplified EigenTrust Algorithm
Apparently vec(t)\vec{t}converges to the same vector for all vec(c)_(i)\vec{c}_{i}. We therefore choose the same vector vec(e)\vec{e}for all ii, whose values are each (1//"number of participants")(1 / \text{number of participants}), i.e., a uniform probability distribution.
From this, a simple version of the EigenTrust algorithm can be derived, which assumes that a central location knows all the values of the matrix CC.
delta\deltais assigned the length of the vector resulting from the subtraction of the vector vec(t)\vec{t}from the last iteration from the vector vec(t)\vec{t}of the current iteration. The algorithm terminates when this length (i.e., the gain in accuracy through one iteration) becomes smaller than a given value epsilon\epsilon.
6.6.4.1 Refinement
To be used in a real-world scenario, the algorithm needs to be extended by three aspects.
There will usually be a set of pre-trusted participants^(75){ }^{75}in a network. As will be shown later, there are several advantages for the EigenTrust algorithm if this set is taken into account. Mathematically, this can be represented by replacing the vector vec(e)\vec{e}with a distribution vec(p)\vec{p}that has the following properties:
Let PPbe the set of pre-trusted participants.
p_(i)=1//|P|p_{i} = 1 / |P|if i in Pi \in P, otherwise
p_(i)=0p_{i} = 0
In vec(t)=(C^(T))^(n) vec(p)\vec{t} = (C^{T})^{n} \vec{p}, all trust originates from pre-trusted participants. The algorithm will thus converge faster in the presence of malicious collectives that rate each other positively, since the collectives are only considered in later iterations.
There will always be inactive participants (e.g., new participants) who do not trust anyone or do not even know anyone. It is defined for these participants that they trust the pre-trusted participants PP.^(76){ }^{76}This results in the following redefinition for the matrix CC:
To minimize the risk of subversion of the algorithm by malicious collectives (groups of participants who rate each other very well but give bad ratings to outsiders), an additional extension is necessary: Let aabe a constant between 0 and 1.
The new line 5 prevents the trust in truly trustworthy participants from ever becoming so small that the algorithm gets stuck in a collective. Otherwise, the risk would be that all trust outside the collective would become vanishingly small compared to the malicious participants. This shows that it must be avoided under all circumstances that a participant from PPbecomes part of a malicious collective. The number of participants in PPshould therefore be reduced to a minimum.
6.6.4.2 Convergence
The EigenTrust algorithm generally converges quickly. Simulations have shown that delta\deltano longer changes significantly after fewer than 10 iterations.
A proof of the convergence of the EigenTrust algorithm will not be provided here. However, it should be noted that all values in C^(T)C^{T}are less than or equal to 1 and greater than or equal to 0. It is therefore intuitive that the length of the vector (C^(T))^(k) vec(t)(C^{T})^{k} \vec{t}decreases with increasing kkand that this decrease slows down. A detailed consideration of the convergence of PageRank, which can be transferred to EigenTrust, is given in [Haveliwala Kamvar 2003].
6.6.5 Distributed EigenTrust Algorithm
The distributed EigenTrust algorithm is also initially presented in a simplified version. For all participants ii, it is assumed that they know both their normalized local trust vector vec(c)_(i)\vec{c}_{i}and their own global trust value t_(i)t_{i}. Other participants can therefore learn t_(i)t_{i}by asking ii.
For the distributed algorithm, the assumption that CCis completely known to the computer is dropped. Instead, all participants collaborate to calculate the c_(ij)c_{ij}. To visualize the calculation process, it is helpful to imagine a kind of chess game with an arbitrary number of players. The game is divided into rounds, and in each round, each player performs a calculation whose result they pass on to a set of other players. The data a player receives at the end of each round forms the basis for the calculations of the next round. A distributed algorithm is only usable in practice if the number and size of messages transmitted between participants do not overload the network. The approach to distributing EigenTrust is to break down the term (1-a)C^(T) vec(t^((k)))+a vec(p)(1-a) C^{T} \vec{t^{(k)}} + a \vec{p}into its component notation: (1-a)(c_(1i)t_(1)^((k))+dots+c_(ni)t_(n)^((k)))+ap_(i)(1-a) (c_{1i} t_{1}^{(k)} + \ldots + c_{ni} t_{n}^{(k)}) + a p_{i}. It must now be ensured that iiknows the values of c_(ni)c_{ni}and t_(n)^((k))t_{n}^{(k)}for all nnand kkor knows that one of them is 0. A property of the distributed algorithm is that only the pre-trusted participants need to know their value for p_(i)p_{i}. They therefore remain anonymous.
Definitions:^(78){ }^{78}
A_(i)A_{i}: Set of participants who have made a trust statement about ii.^(79){ }^{79}
B_(i)B_{i}: Set of participants about whom iihas made a trust statement.
Theorem 1:
If j inA_(i)j \in A_{i}, then i inB_(j)i \in B_{j}or j inA_(i)=>i inB_(j)j \in A_{i} \Rightarrow i \in B_{j}. Proof by reflection: If jjhas made a trust statement about ii, then jjhas made a trust statement about ii.
Function Definitions:
Let kkbe the round index and a,b,ca, b, cbe participant indices.
step(k,a,b)=c_(ab)t_(a)^(k)step(k, a, b) = c_{ab} t_{a}^{k}
Let kkbe the round index and iibe the participant index.
compute_round(k,i)=(1-a)(step(k,1,i)+step(k,2,i)+dots+step(k,n,i))+ap_(i)compute\_round(k, i) = (1-a) (step(k, 1, i) + step(k, 2, i) + \ldots + step(k, n, i)) + a p_{i}
Let t_(i)^((a))t_{i}^{(a)}be global variables for a=ka = kand a=k+1a = k+1.