Portal:Complex Systems Digital Campus/RAPSODY
the repository for Open Questions, Challenges and Resources of the
The phenomenological and theoretical reconstruction of multilevel stochastic models of complex systems is the main, long-standing “computational problem” (fig. 1). It encompasses the main current challenges in data science, machine learning and AI. Such reconstruction lies at the heart of Complex Systems Science as well as of the new integrative and predictive sciences of complex matter, biology, individual and social cognition, and multilevel territories in the ecosphere.
The main CS-DC scientific challenge consists of two related reconstructions of the observed multiscale dynamics through multilevel stochastic models:
- a phenomenological reconstruction of the multiscale dynamics as the stochastic dynamics of its agents, with their measured actions and interactions: this step consists in creating exchangeable cohorts of agents' trajectories at the different organisational levels.
- a theoretical reconstruction of the agents' trajectories in the multilevel exchangeable cohorts by a multilevel stochastic model.
With the 2nd internet revolution, massive decentralized dynamical data can be shared (IPFS: InterPlanetary File System), and a decentralized computational ecosystem can federate a heterogeneous network of IoT/laptops/GPU/HPC. A smart contract for sharing scientific data can fairly recognize the combined efforts of experimental and theoretical scientists working toward the new integrative and predictive sciences. With IPFS, the decentralized computational ecosystem will let scientists work as if they were using a single computer for storing and processing data. The 2nd internet revolution will thus provide these new sciences with their main decentralized measurement device, as the LHC does for High Energy Physics.
All of the above is especially crucial in the WWW DAO (WorldWide Wellbeing Decentralized Autonomous Organization) for addressing the self-governance problem of each person's individuated lifelong mutual wellbeing. A smart contract for sharing personal data with science respects, by design, the data protection principles of the GDPR (General Data Protection Regulation). Such sharing provides the most satisficing highway toward the new mutual wellbeing science, which can bring lifelong advice when difficulties arise. It is in everyone's best interest to participate in such a smart sharing contract. It is also a new ethical obligation toward the new generations that are continually entering human society.
multiscale data, multilevel agents, multilevel dynamics, multilevel model, 2nd internet revolution, smart contract, General Data Protection Regulation, internet of objects
- Paul Bourgine (chair | firstname.lastname@example.org)
- Pierre Baudot, Jeffrey Johnson, Carlos Barrios, Nadine Peyriéras, Pierre Collet and chairs
- GSI & TGSI forum
- A. Jeannin, A. Bruyant, N. Toussaint, I. Diouf, P. Collet, P. Parrend, Management of digital records inspired by Complex Systems with RADAR, Journal of Robotics, Networking and Artificial Life, Atlantis Press, Volume 5, No. 1, June 2018
Phenomenological reconstruction (e-team)
The phenomenological reconstruction of multiscale dynamics produces the multilevel dynamics of the agents with their measured actions and interactions. This step creates cohorts of agents that can be considered 'exchangeable': a judgment that all statistical inferences are independent of the order of the trajectories in the cohort. Exchangeability is a precondition for any machine learning method and for any efficient statistical method.
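As a minimal sketch of what exchangeability means operationally, the snippet below checks that an inference built on an order-invariant summary gives the same result for any ordering of the cohort. The cohort is synthetic and the function name `cohort_summary` is hypothetical; it is an illustration, not a CS-DC implementation.

```python
import random
from statistics import fmean


def cohort_summary(cohort):
    """Order-invariant summary of a cohort of trajectories.

    Each trajectory is a list of measurements of equal length; the summary
    is the per-time-step mean across the cohort.
    """
    n_steps = len(cohort[0])
    return [fmean(traj[t] for traj in cohort) for t in range(n_steps)]


# Hypothetical cohort: 5 trajectories with 4 time steps each.
rng = random.Random(0)
cohort = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]

# Reorder the trajectories: an exchangeable cohort must give the same inference.
shuffled = cohort[:]
rng.shuffle(shuffled)

original_summary = cohort_summary(cohort)
shuffled_summary = cohort_summary(shuffled)
```

Any statistic that reads the trajectories by position (for example "the first trajectory minus the last") would break this invariance, which is why the cohort has to be built before such inferences are made.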
The multiscale dynamics resemble a 2D+t or 3D+t multiscale video: each image comprises at least two levels of organization, the individuated agent and its environment. In figure 1, which shows the first cell divisions of embryogenesis, the reconstruction comprises three levels: the cell, the cell groups at the origin of organs (cohorts of similar cells) and the environment. More generally, a video can be interpreted, through a segmentation of its components, as the dynamics of a 'CW complex' or piecewise linear manifold. The time between two images can be uniform, as in a video, or variable. In the WWW DAO, videos will show humans and their interactions, or the phytosociology of vegetables cultivated together, etc.
Deep learning has recently made great progress in recognizing animal species through ‘supervised learning’, where the database contains the species name for each image. The aim here is ‘unsupervised learning’, similar to the way young children or animals categorize a cohort of similar shapes without a name. Young children do not need the very long simultaneous repetition of shape and name that supervised learning requires: they learn the name in one shot.
Fig. 2 illustrates how unsupervised learning uses algebraic topology. The topological signature given by the ‘persistence diagram’ provides distances between shapes (top), allowing clustering methods to be applied to shapes in order to build exchangeable cohorts (bottom). More generally, for a dynamical shape combined with measurements, the method is to compute the (co)homology (i.e. the topology) of a dynamical data manifold (its invariants: shape, symmetries, critical points, multilevel structure …) in order to build exchangeable cohorts. In machine learning terms, homology quantifies and decomposes the structure and patterns of the data points in each dimension.
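The idea of a topological signature that yields distances between shapes can be sketched in its simplest case, 0-dimensional persistent homology (connected components) of a point cloud. The code below is an illustration under stated assumptions: the point clouds are made up, and `signature_distance` is a simplified stand-in for the bottleneck or Wasserstein distances normally used on persistence diagrams.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Death times of connected components in a Vietoris-Rips filtration.

    Every component is born at scale 0 and dies when an edge merges it into
    another component, so the deaths are the minimum-spanning-tree edge
    lengths (Kruskal's algorithm with a union-find structure).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            deaths.append(length)
    return deaths  # ascending order, len(points) - 1 values


def signature_distance(deaths_a, deaths_b):
    """Sup-norm between sorted death times (assumes equal-sized clouds)."""
    return max(abs(a - b) for a, b in zip(deaths_a, deaths_b))


# Hypothetical shapes: a square, the same square translated, two distant pairs.
one_blob = [(0, 0), (1, 0), (0, 1), (1, 1)]
one_blob_shifted = [(5, 5), (6, 5), (5, 6), (6, 6)]
two_blobs = [(0, 0), (1, 0), (10, 0), (11, 0)]

same_shape = signature_distance(h0_death_times(one_blob), h0_death_times(one_blob_shifted))
different_shape = signature_distance(h0_death_times(one_blob), h0_death_times(two_blobs))
```

The signature ignores position, so the translated square is at distance 0 from the original, while the two-cluster cloud is far from both. Feeding such pairwise distances to any clustering method groups similar shapes into candidate cohorts.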
(Co-)homology is a highway for building exchangeable cohorts.
- Pierre Baudot (chair)
- Jeffrey Johnson (co-chair)
- Paul Bourgine (co-chair)
- Jacques Demongeot (co-chair)
- Masatoshi Funabashi, Salma Mesmoudi, Pierre Parrend, Ludo Seifert (members)
- See all the results in Topological Data Analysis, notably persistent homology
- See all the results of the European project TOPDRIM
- See all the recorded talks of the TGSI Conference in Marseille available on the CS-DC forum
- MMS Dapp (Megadiversity Management System) & synecoculture Dapp
- HOPE Dapp (Health Optimisation through 4P Ecosystem)
- Baby Dapp
- Sport Dapp
Reconstructing multilevel models (e-team)
The second main problem for Complex Systems Science and 'Mutual Wellbeing Science' is the reconstruction of the multilevel model manifold from the cohort of multilevel agents' trajectories. By definition, a cohort has the property of exchangeability as discussed above: statistical inferences are independent of the order of the trajectories in the cohort.
There are two main ways of using the mathematical principle of figure 3:
- One principle of the Geometrical Science of Information (GSI) for machine learning is to minimize the Kullback-Leibler divergence ∆ in order to find the best model. Modelling dynamical systems requires Artificial Recurrent Neural Networks (ARNN). Note that machine learning always implicitly uses exchangeable cohorts, as constructed by the phenomenological reconstruction.
- Summary statistics exist under the 'exchangeability' property. Paul Ressel's theorem says that under exchangeability there always exists a group that summarizes all the information of the cohort and incorporates new information incrementally through the group's operator. Unfortunately, the theorem is not constructive. When the group construction is not known, it is always possible to adopt optimal statistics, i.e. statistics using the maximum entropy principle (Jaynes). Optimal statistics are incremental algorithms with accumulators, which is a requirement for a huge decentralized measurement of multilevel dynamics.
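The first approach, choosing the model that minimizes the Kullback-Leibler divergence to the data, can be sketched on a discrete toy case. The empirical distribution and the three candidate models below are hypothetical; a real application would minimize over a parametric family (for example the weights of a recurrent network) rather than over a short list.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions.

    Terms with p_i == 0 contribute 0; q_i must be > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical empirical distribution over three states of an agent.
empirical = [0.5, 0.3, 0.2]

# Hypothetical candidate models over the same three states.
candidate_models = {
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "skewed": [0.6, 0.3, 0.1],
    "close": [0.5, 0.25, 0.25],
}

divergences = {
    name: kl_divergence(empirical, model)
    for name, model in candidate_models.items()
}
best_model = min(divergences, key=divergences.get)
```

Here `best_model` is the candidate with the smallest divergence from the empirical distribution; the divergence is zero only when a model reproduces the data distribution exactly.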
Optimal statistics are the 'default strategy' in the WWW DAO as well as in the CS-DC. Olympiads of models will be organized to find a better multilevel model manifold, either by finding the summarizing group or by finding a better Artificial Recurrent Neural Network (ARNN). Olympiads of models are defined below.
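An incremental statistic with accumulators can be sketched as follows, assuming the simplest maximum entropy case: when only the mean and variance are constrained, the maximum entropy model is Gaussian and its sufficient statistics are the count, the sum and the sum of squares. The class name `MomentAccumulator` and the two "nodes" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MomentAccumulator:
    """Count, sum and sum of squares of the observations seen so far."""

    n: int = 0
    s1: float = 0.0
    s2: float = 0.0

    def add(self, x):
        """Incorporate one new observation incrementally."""
        return MomentAccumulator(self.n + 1, self.s1 + x, self.s2 + x * x)

    def merge(self, other):
        """Combine two accumulators; the operation is commutative and associative."""
        return MomentAccumulator(
            self.n + other.n, self.s1 + other.s1, self.s2 + other.s2
        )

    def mean(self):
        return self.s1 / self.n

    def variance(self):
        """Population variance (naive formula; unstable for ill-scaled data)."""
        return self.s2 / self.n - self.mean() ** 2


def accumulate(values):
    acc = MomentAccumulator()
    for v in values:
        acc = acc.add(v)
    return acc


# Two hypothetical measurement nodes, each summarizing its own local data.
node_a = accumulate([1.0, 2.0])
node_b = accumulate([3.0, 4.0])

merged_ab = node_a.merge(node_b)
merged_ba = node_b.merge(node_a)
```

Because merging does not depend on the order in which nodes report, each device can keep only its accumulator and exchange that instead of raw trajectories, which is the property a decentralized measurement needs.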
- Paul Bourgine (chair)
- Pierre Baudot (co-chair)
- Juan Simoës, ...
- See Lauritzen for summary statistics
- Paul Ressel. De Finetti-type Theorems: an Analytical Approach. The Annals of Probability, 1985, Vol. 13, No. 3, 898-922
- Nihat Ay, Jürgen Jost, Hong Van Le, Lorenz Schwachhöfer. Information Geometry. Springer, 2017
- Shun-ichi Amari, Information Geometry and Its Applications. Springer Japan (2016), Applied Mathematical Sciences vol 194
- Ezra Miller, Bernd Sturmfels, Combinatorial Commutative Algebra, Graduate Texts in Mathematics, vol. 227, Springer-Verlag, New York City, 2005. ISBN 0-387-22356-8
- Geometric Science of Information - Third International Conference, GSI 2017, Paris, France, November 7-9, 2017, Proceedings. Editors: Nielsen, Frank, Barbaresco, Frédéric (Eds.)
- Geometric Science of Information - Second International Conference, GSI 2015, Proceedings. Editors: Nielsen, Frank, Barbaresco, Frédéric (Eds.). Springer, 2015
- Gabriel Peyré, Marco Cuturi. Computational Optimal Transport. 2018, arXiv:1803.00567
- Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2014) Volume 1641, Editor: AIP Publishing
- See all the results about large network dynamics from the European call on "Modeling multilevel dynamics"
CS-DC decentralized computational ecosystem (e-team)
The CS-DC decentralized computational ecosystem within the 2nd internet revolution: The CS-DC computational ecosystem (4th commitment of our Cooperation Programme with UNESCO) will participate in and contribute to the emergence of the global computational ecosystem federating a heterogeneous network of IoT/laptops/GPU/HPC.
Given a multilevel problem and the precision and time required to solve it, finding the most efficient way to use distributed computational resources is a scientific challenge known as « autonomic configuration » and « autonomic workflow »; the answer depends on the particular multilevel reconstruction.
Thus each multilevel reconstruction challenge (especially during the Olympiads of multilevel models) will require distributed computational resources (IoT/laptops/GPU/HPC) to be combined in the most efficient way with respect to cost and energy. The best combination depends on the particular multilevel models to be reconstructed.
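A toy version of this selection problem can be sketched as follows. Everything here is an assumption made for illustration: the `Resource` fields, the throughput, price and power figures, and the rule of picking a single resource type. Real autonomic configuration has to handle mixed allocations, data movement and changing availability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    work_units_per_hour: float  # hypothetical throughput
    cost_per_hour: float        # hypothetical price
    watts: float                # hypothetical power draw


def plan(resources, work_units, deadline_hours, energy_weight=0.0):
    """Return (objective, name, hours) for the best feasible resource, or None.

    A resource is feasible if it finishes before the deadline; the objective
    is monetary cost plus energy_weight times the energy used in kWh.
    """
    feasible = []
    for r in resources:
        hours = work_units / r.work_units_per_hour
        if hours <= deadline_hours:
            cost = hours * r.cost_per_hour
            energy_kwh = hours * r.watts / 1000.0
            feasible.append((cost + energy_weight * energy_kwh, r.name, hours))
    return min(feasible) if feasible else None


# Purely illustrative figures, not measurements.
resources = [
    Resource("laptop", work_units_per_hour=10, cost_per_hour=0.05, watts=50),
    Resource("gpu", work_units_per_hour=200, cost_per_hour=1.0, watts=300),
    Resource("hpc", work_units_per_hour=2000, cost_per_hour=20.0, watts=5000),
]

relaxed = plan(resources, work_units=1000, deadline_hours=10)
urgent = plan(resources, work_units=1000, deadline_hours=1)
impossible = plan(resources, work_units=1000, deadline_hours=0.1)
```

With these made-up figures the relaxed deadline selects the GPU, the urgent one forces the HPC option, and the last request has no feasible resource. The point is only that the best choice changes with the precision and time requirements of each reconstruction.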
For the main problem of §1, i.e. a data science problem, « machine learning » and « evolutionary computation » methods are expected to be relevant. Fortunately, both can be parallelized on GPU-based HPC, which is a powerful and inexpensive way to implement them.
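Why evolutionary computation parallelizes well can be seen in a minimal (mu + lambda) evolution strategy: within a generation, every fitness evaluation is independent of the others. In the sketch below the objective, the parameter values and the thread pool are assumptions; the pool only stands in for the distributed workers or GPU kernels a real deployment would use, and Python threads do not speed up CPU-bound work.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(x):
    """Hypothetical objective to minimize: sphere function, optimum at the origin."""
    return sum(v * v for v in x)


def evolve(dim=3, mu=5, lam=20, generations=40, sigma=0.3, seed=0, workers=4):
    """Minimal (mu + lambda) evolution strategy with parallel fitness evaluation."""
    rng = random.Random(seed)
    parents = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(mu)]
    best_history = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Mutation: each offspring is a Gaussian perturbation of a random parent.
            offspring = [
                [v + rng.gauss(0, sigma) for v in rng.choice(parents)]
                for _ in range(lam)
            ]
            candidates = parents + offspring
            # Independent evaluations: this is the step that can be distributed.
            scores = list(pool.map(fitness, candidates))
            ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0])
            # Elitist selection: keep the mu best of parents and offspring.
            parents = [candidate for _, candidate in ranked[:mu]]
            best_history.append(ranked[0][0])
    return parents[0], best_history


best, history = evolve()
```

Because parents compete with their offspring, the best score in `history` can never get worse from one generation to the next.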
- Carlos Barrios (chair)
- Pierre Collet (co-chair)
- Nadine Peyriéras (co-chair)
- See the international conferences on « autonomic configuration » and « autonomic workflow » of the last 10 years
- See the IeX ICO in Ethereum
RAPSODY & Trust in algorithm
A common principle for each WWW DAO Dapp is the 'trust in algorithms': an algorithm has to do what we believe it is doing. It means that any Dapp has to be 'certified' as ‘implementing its logical and mathematical specifications’:
- the compilers have to be certified (e.g. Haskell, OCaml, Scheme, ..) or have to compile into a certified language (e.g. Pyramid Scheme compiles to the Ethereum Virtual Machine; the Coq proof assistant extracts programs to Haskell, OCaml or Scheme and produces a LaTeX report)
- given an algebraic mathematical problem to be solved and existing solvers, formal calculus can be used to rewrite the problem into a form that the existing solvers accept
As a result, a new Dapp (as a problem to be solved) has to be built from certified Dapps (as problems already solved).
RAPSODY will use SAGE to bring 'trust in algorithms'. It will be used internally to decide a fork between two or more implementations, and externally during the Olympiads of models.
SAGE can be used as a 'mathematical programming' language for specifying algebraic mathematical problems, existing solvers, and the theorems that reformulate a problem in terms of its solvers (i.e. already solved problems, e.g. existing Dapps). Mathematics as a language has a precise meaning that is the same in all human languages. SAGE thus brings concision, clarity, language neutrality and precise meaning. The mathematical specification of a problem can be understood not only by mathematicians but also by a large part of the human population. This is especially the case in RAPSODY, because phenomenological and theoretical reconstruction are essentially based on algebraic topology and algebraic geometry: topological and geometrical problems and theorems can easily be drawn and understood by anyone, and the equivalent algebraic part can be executed exactly by computers.
SAGE has its own 'Mathematical Knowledge Base' (mathKB). This mathKB will be augmented when necessary from the mathematical part of the IKB (CS-DC Integrated Knowledge Base) and from the mathematical transcription of arXiv in MathML.
The different RAPSODY challenges correspond to a small number of well-posed mathematical problems with standard solving methods. But RAPSODY Dapps in the WWW DAO provide advice that has to be as good as possible. The role of the Olympiads is to compare different methods and identify those that provide better advice than the current standard method. The competitors use SAGE both as a facility and for 'trust in algorithms'. They use the CS-DC decentralized computational ecosystem.
The grading of competitors involves weighted multicriteria:
- reducing the measured distance ∆ between advice and choices
- computational tractability as measured by the CS-DC decentralized computational ecosystem above
- mathematical and logical soundness w.r.t. the specifications
The Jury will grade and rank competitors with the 'MAJ Dapp'. It will recommend whether or not the first-prize method should replace the standard method until the next Olympiad.
The Jury is the scientific committee of the Dapp, with the addition of senior scientists chosen for their special expertise. It plays the same role, with the same criteria, for the internal fork of methods.
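The weighted multicriteria grading listed above can be sketched as a weighted mean of normalised scores. The weights, the competitor names and the scores are all hypothetical, and this simple aggregation is an assumption for illustration; it does not describe the procedure actually used by the Jury.

```python
def weighted_score(criteria, weights):
    """Weighted mean of criterion scores, each normalised to [0, 1] (higher is better)."""
    total_weight = sum(weights.values())
    return sum(weights[k] * criteria[k] for k in weights) / total_weight


# Hypothetical weights for the three criteria of the grading.
weights = {"distance_reduction": 0.5, "tractability": 0.2, "soundness": 0.3}

# Hypothetical normalised scores for the standard method and two challengers.
competitors = {
    "standard": {"distance_reduction": 0.50, "tractability": 0.90, "soundness": 1.00},
    "method_a": {"distance_reduction": 0.80, "tractability": 0.60, "soundness": 1.00},
    "method_b": {"distance_reduction": 0.90, "tractability": 0.70, "soundness": 0.50},
}

scores = {name: weighted_score(c, weights) for name, c in competitors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these made-up numbers `method_a` ranks first: `method_b` reduces the distance ∆ more, but loses on soundness, which shows how the weights encode the trade-off between the three criteria.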
- Paul Bourgine (chair)
- Pierre Baudot (co-chair)
- Nadine Peyriéras (co-chair)
- Juan Simoes, Mark Hammons, Pierre Parrend, Pierre Collet, Linda Oulebsir Boumghar, Nadine Peyriéras, Masatoshi Funabashi, Salma Mesmoudi
- EASYcloud for decentralized resolution of optimisation problems by 'evolutionary computing'. Note that evolutionary computing can exhibit superlinear speedup when the computation is distributed over an increasing number of computing nodes.
- BioEmergences for reconstructing and tracking the image components (i.e. a CW complex) of a time lapse + management of cohorts
- Free software of the Machine Learning Community
- Free software of the Geometrical Science of Information (GSI) and Topological Science of Information (TSI)
- Free software attached to the ArXiv Knowledge Base in MathML
- Pandoc for the translation of a formatted text into other formats.
- MathML: https://fr.wikipedia.org/wiki/MathML