COMPUTATIONAL METHODS FOR LARGE SYSTEMS

Author:
Reimers J.R. (ed.)


COMPUTATIONAL METHODS FOR LARGE SYSTEMS Electronic Structure Approaches for Biotechnology and Nanotechnology

Edited by

Jeffrey R. Reimers

A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2011 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002. 
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Computational methods for large systems : electronic structure approaches for biotechnology and nanotechnology / [edited by] Jeffrey R. Reimers.
p. cm.
Includes index.
ISBN 978-0-470-48788-4 (hardback)
1. Nanostructured materials–Computer simulation. 2. Nanotechnology–Data processing. 3. Biotechnology–Data processing. 4. Electronics–Materials–Computer simulation. I. Reimers, Jeffrey R.
TA418.9.N35C6824 2011
620.50285–dc22 2010028359

Printed in Singapore

oBook ISBN: 978047093077-9
ePDF ISBN: 978047093076-2
ePub ISBN: 978047093472-2

10 9 8 7 6 5 4 3 2 1

To Noel Hush who showed me the importance of doing things to understand the critical experiments of the day and the need for simple models of complex phenomena, and to George Bacskay who taught me the importance of getting the right answer for the right reason.

Contents

Contributors

Preface: Choosing the Right Method for Your Problem

A DFT: THE BASIC WORKHORSE

1 Principles of Density Functional Theory: Equilibrium and Nonequilibrium Applications
Ferdinand Evers
1.1 Equilibrium Theories
1.2 Local Approximations
1.3 Kohn–Sham Formulation
1.4 Why DFT Is So Successful
1.5 Exact Properties of DFTs
1.6 Time-Dependent DFT
1.7 TDDFT and Transport Calculations
1.8 Modeling Reservoirs In and Out of Equilibrium

2 SIESTA: A Linear-Scaling Method for Density Functional Calculations
Julian D. Gale
2.1 Introduction
2.2 Methodology
2.3 Future Perspectives

3 Large-Scale Plane-Wave-Based Density Functional Theory: Formalism, Parallelization, and Applications
Eric Bylaska, Kiril Tsemekhman, Niranjan Govind, and Marat Valiev
3.1 Introduction
3.2 Plane-Wave Basis Set
3.3 Pseudopotential Plane-Wave Method
3.4 Charged Systems
3.5 Exact Exchange
3.6 Wavefunction Optimization for Plane-Wave Methods
3.7 Car–Parrinello Molecular Dynamics
3.8 Parallelization
3.9 AIMD Simulations of Highly Charged Ions in Solution
3.10 Conclusions

B HIGHER-ACCURACY METHODS

4 Quantum Monte Carlo, Or, Solving the Many-Particle Schrödinger Equation Accurately While Retaining Favorable Scaling with System Size
Michael D. Towler
4.1 Introduction
4.2 Variational Monte Carlo
4.3 Wavefunctions and Their Optimization
4.4 Diffusion Monte Carlo
4.5 Bits and Pieces
4.6 Applications
4.7 Conclusions

5 Coupled-Cluster Calculations for Large Molecular and Extended Systems
Karol Kowalski, Jeff R. Hammond, Wibe A. de Jong, Peng-Dong Fan, Marat Valiev, Dunyou Wang, and Niranjan Govind
5.1 Introduction
5.2 Theory
5.3 General Structure of Parallel Coupled-Cluster Codes
5.4 Large-Scale Coupled-Cluster Calculations
5.5 Conclusions

6 Strongly Correlated Electrons: Renormalized Band Structure Theory and Quantum Chemical Methods
Liviu Hozoi and Peter Fulde
6.1 Introduction
6.2 Measure of the Strength of Electron Correlations
6.3 Renormalized Band Structure Theory
6.4 Quantum Chemical Methods
6.5 Conclusions

C MORE-ECONOMICAL METHODS

7 The Energy-Based Fragmentation Approach for Ab Initio Calculations of Large Systems
Wei Li, Weijie Hua, Tao Fang, and Shuhua Li
7.1 Introduction
7.2 The Energy-Based Fragmentation Approach and Its Generalized Version
7.3 Results and Discussion
7.4 Conclusions
7.5 Appendix: Illustrative Example of the GEBF Procedure

8 MNDO-like Semiempirical Molecular Orbital Theory and Its Application to Large Systems
Timothy Clark and James J. P. Stewart
8.1 Basic Theory
8.2 Parameterization
8.3 Natural History or Evolution of MNDO-like Methods
8.4 Large Systems

9 Self-Consistent-Charge Density Functional Tight-Binding Method: An Efficient Approximation of Density Functional Theory
Marcus Elstner and Michael Gaus
9.1 Introduction
9.2 Theory
9.3 Performance of Standard SCC-DFTB
9.4 Extensions of Standard SCC-DFTB
9.5 Conclusions

10 Introduction to Effective Low-Energy Hamiltonians in Condensed Matter Physics and Chemistry
Ben J. Powell
10.1 Brief Introduction to Second Quantization Notation
10.2 Hückel or Tight-Binding Model
10.3 Hubbard Model
10.4 Heisenberg Model
10.5 Other Effective Low-Energy Hamiltonians for Correlated Electrons
10.6 Holstein Model
10.7 Effective Hamiltonian or Semiempirical Model?

D ADVANCED APPLICATIONS

11 SIESTA: Properties and Applications
Michael J. Ford
11.1 Ethynylbenzene Adsorption on Au(111)
11.2 Dimerization of Thiols on Au(111)
11.3 Molecular Dynamics of Nanoparticles
11.4 Applications to Large Numbers of Atoms

12 Modeling Photobiology Using Quantum Mechanics and Quantum Mechanics/Molecular Mechanics Calculations
Xin Li, Lung Wa Chung, and Keiji Morokuma
12.1 Introduction
12.2 Computational Strategies: Methods and Models
12.3 Applications
12.4 Conclusions

13 Computational Methods for Modeling Free-Radical Polymerization
Michelle L. Coote and Ching Y. Lin
13.1 Introduction
13.2 Model Reactions for Free-Radical Polymerization Kinetics
13.3 Electronic Structure Methods
13.4 Calculation of Kinetics and Thermodynamics
13.5 Conclusions

14 Evaluation of Nonlinear Optical Properties of Large Conjugated Molecular Systems by Long-Range-Corrected Density Functional Theory
Hideo Sekino, Akihide Miyazaki, Jong-Won Song, and Kimihiko Hirao
14.1 Introduction
14.2 Nonlinear Optical Response Theory
14.3 Long-Range-Corrected Density Functional Theory
14.4 Evaluation of Hyperpolarizability for Long Conjugated Systems
14.5 Conclusions

15 Calculating the Raman and HyperRaman Spectra of Large Molecules and Molecules Interacting with Nanoparticles
Nicholas Valley, Lasse Jensen, Jochen Autschbach, and George C. Schatz
15.1 Introduction
15.2 Displacement of Coordinates Along Normal Modes
15.3 Calculation of Polarizabilities Using TDDFT
15.4 Derivatives of the Polarizabilities with Respect to Normal Modes
15.5 Orientation Averaging
15.6 Differential Cross Sections
15.7 Surface-Enhanced Raman and HyperRaman Spectra
15.8 Application of Tensor Rotations to Raman Spectra for Specific Surface Orientations
15.9 Resonance Raman
15.10 Determination of Resonant Wavelength
15.11 Summary

16 Metal Surfaces and Interfaces: Properties from Density Functional Theory
Irene Yarovsky, Michelle J. S. Spencer, and Ian K. Snook
16.1 Background, Goals, and Outline
16.2 Methodology
16.3 Structure and Properties of Iron Surfaces
16.4 Structure and Properties of Iron Interfaces
16.5 Summary, Conclusions, and Future Work

17 Surface Chemistry and Catalysis from Ab Initio–Based Multiscale Approaches
Catherine Stampfl and Simone Piccinin
17.1 Introduction
17.2 Predicting Surface Structures and Phase Transitions
17.3 Surface Phase Diagrams from Ab Initio Atomistic Thermodynamics
17.4 Catalysis and Diffusion from Ab Initio Kinetic Monte Carlo Simulations
17.5 Summary

18 Molecular Spintronics
Woo Youn Kim and Kwang S. Kim
18.1 Introduction
18.2 Theoretical Background
18.3 Numerical Implementation
18.4 Examples
18.5 Conclusions

19 Calculating Molecular Conductance
Gemma C. Solomon and Mark A. Ratner
19.1 Introduction
19.2 Outline of the NEGF Approach
19.3 Electronic Structure Challenges
19.4 Chemical Trends
19.5 Features of Electronic Transport
19.6 Applications
19.7 Conclusions

Index

Contributors

Jochen Autschbach, University at Buffalo–SUNY, Buffalo, New York
Eric Bylaska, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington
Lung Wa Chung, Fukui Institute for Fundamental Chemistry, Kyoto University, Kyoto, Japan
Timothy Clark, Computer-Chemie-Centrum, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Michelle L. Coote, ARC Centre of Excellence for Free-Radical Chemistry and Biotechnology, Research School of Chemistry, Australian National University, Canberra, Australia
Wibe A. de Jong, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington
Marcus Elstner, Institute of Physical Chemistry, Universität Karlsruhe, Karlsruhe, Germany; Institute for Physical and Theoretical Chemistry, Technische Universität Braunschweig, Braunschweig, Germany
Ferdinand Evers, Institute of Nanotechnology and Institut für Theorie der Kondensierten Materie, Karlsruhe Institute of Technology, Karlsruhe, Germany
Peng-Dong Fan, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington
Tao Fang, School of Chemistry and Chemical Engineering, Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, Nanjing University, Nanjing, China
Michael J. Ford, School of Physics and Advanced Materials, University of Technology, Sydney, NSW, Australia
Peter Fulde, Max-Planck-Institut für Physik komplexer Systeme, Dresden, Germany; Asia Pacific Center for Theoretical Physics, Pohang, Korea
Julian D. Gale, Department of Chemistry, Curtin University, Perth, Australia
Michael Gaus, Institute of Physical Chemistry, Universität Karlsruhe, Karlsruhe, Germany; Institute for Physical and Theoretical Chemistry, Technische Universität Braunschweig, Braunschweig, Germany
Niranjan Govind, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington
Jeff R. Hammond, The University of Chicago, Chicago, Illinois
Kimihiko Hirao, Advanced Science Institute, RIKEN, Saitama, Japan
Liviu Hozoi, Max-Planck-Institut für Physik komplexer Systeme, Dresden, Germany
Weijie Hua, School of Chemistry and Chemical Engineering, Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, Nanjing University, Nanjing, China
Lasse Jensen, Pennsylvania State University, University Park, Pennsylvania
Kwang S. Kim, Center for Superfunctional Materials, Pohang University of Science and Technology, Pohang, Korea
Woo Youn Kim, Center for Superfunctional Materials, Pohang University of Science and Technology, Pohang, Korea
Karol Kowalski, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington
Shuhua Li, School of Chemistry and Chemical Engineering, Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, Nanjing University, Nanjing, China
Wei Li, School of Chemistry and Chemical Engineering, Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, Nanjing University, Nanjing, China
Xin Li, Fukui Institute for Fundamental Chemistry, Kyoto University, Kyoto, Japan
Ching Y. Lin, ARC Centre of Excellence for Free-Radical Chemistry and Biotechnology, Research School of Chemistry, Australian National University, Canberra, Australia
Akihide Miyazaki, Toyohashi University of Technology, Toyohashi, Japan
Keiji Morokuma, Fukui Institute for Fundamental Chemistry, Kyoto University, Kyoto, Japan; Cherry L. Emerson Center for Scientific Computation and Department of Chemistry, Emory University, Atlanta, Georgia
Simone Piccinin, CNR-INFM DEMOCRITOS National Simulation Center, [email protected] Group, Trieste, Italy
Ben J. Powell, Centre for Organic Photonics and Electronics, School of Mathematics and Physics, The University of Queensland, Queensland, Australia
Mark A. Ratner, Northwestern University, Evanston, Illinois
George C. Schatz, Northwestern University, Evanston, Illinois
Hideo Sekino, Toyohashi University of Technology, Toyohashi, Japan
Ian K. Snook, Applied Physics, School of Applied Sciences, RMIT University, Victoria, Australia
Gemma C. Solomon, Northwestern University, Evanston, Illinois
Jong-Won Song, Advanced Science Institute, RIKEN, Saitama, Japan
Michelle J. S. Spencer, Applied Physics, School of Applied Sciences, RMIT University, Victoria, Australia
Catherine Stampfl, School of Physics, The University of Sydney, Sydney, Australia
James J. P. Stewart, Stewart Computational Chemistry, Colorado Springs, Colorado
Michael D. Towler, TCM Group, Cavendish Laboratory, Cambridge University, Cambridge, UK
Kiril Tsemekhman, University of Washington, Seattle, Washington
Marat Valiev, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington
Nicholas Valley, Northwestern University, Evanston, Illinois
Dunyou Wang, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington
Irene Yarovsky, Applied Physics, School of Applied Sciences, RMIT University, Victoria, Australia

Preface: Choosing the Right Method for Your Problem

Computational methods have now advanced to the point where a choice is available for almost any problem in nanotechnology and biotechnology. In this book, the various methods available are presented and applications developed. Given the difficulty in solving (relativistic) quantum mechanical equations for systems containing thousands of atoms, this situation is truly amazing and demonstrates the results of dedicated work by many researchers over a long period of time. Once demeaned by researchers as being useless for everything practical, computational methods have come into their own, providing fresh insight and predictive design power for wide-ranging problems: from superconductivity to semiconductivity to giant magnetoresistance to molecular electronics to spintronics to natural and synthetic polymer composition and properties to color design to nonlinear optics to energy flow to electron transport to catalysis to protein function to drug design.

Although much modern software is to be commended for its accessibility and ease of use, this advantage can be an alluring trap. Electronic structure calculations on systems of any size are never simple. Many things can go wrong, and just because a method has always done the job in the past doesn’t mean that it will continue to do so for a new problem that may appear very similar but which in fact embodies an additional unexpected effect. Proper understanding of the methods, including their strengths and weaknesses, is always essential. This book sets out to provide the background required for a range of approaches, containing extensive literature references to many of the subtle features that can arise. Practical examples of how this knowledge should be applied are then given.
Amazing as progress has been, many significant problems in physics, chemistry, biology, and engineering will forever remain outside the reach of direct quantum mechanical electronic structure calculations. This by no means implies that the technologies now available cannot be usefully employed to tackle these problems, however, and a significant part of this book is devoted to multiscale-linking methods. For example, the surfaces of most heterogeneous catalysts are extremely complex, and hundreds of chemical reactions may be involved. Problems of this type include the combustion of fossil fuels, atmospheric pollution modeling, and many industrial chemical reactions and smelting processes. Natural and synthetic polymers present similar challenges. What existing electronic structure methods offer is the data to go into more complex, perhaps multiscale, models of the phenomena. Other examples in quite different areas include protein folding, biological processes on the microsecond-to-second time scale, including the origin of intelligence, and long-range strong electron correlations in superconductors and other materials.

The fortunate position that we are in today is owed primarily to the development of density functional theory (DFT). This is the basic workhorse for electronic structure computations on large systems, being appropriate for biological, chemical, and physical problems. Part A of the book is devoted to the fundamentals of DFT, stressing the basics in Chapter 1 and then its two most common implementation strategies, atomic basis sets in Chapter 2 and plane-wave basis sets in Chapter 3. In the early days, atomic basis sets were designed to solve the burning issues at the time, such as the nature of the hydrogen molecule and the water molecule, while plane-wave basis sets could tackle problems of similar difficulty, such as the structure of simple metals. Today, both types of methods can be applied to almost any problem, each with its own advantages and disadvantages. An important feature of Chapter 1 is that it describes not only traditional DFT for the ground state of molecules and materials but also modern time-dependent approaches designed for excited states and nonequilibrium transport environments.

Deliberately missing from this book is an extensive discussion of which density functional to use. This may seem a terrible oversight in a book that is really intended as a practical tool for a new science. DFT gives the exact answer if the exact density functional is used, but alas this is unknown and perhaps even unknowable. So what we now have is a situation in which computational programs can let the user select between hundreds of proposed approximate functionals, or even make a new one. However, from a practical perspective, the situation is not that bad.
Only a handful of density functionals are in common use, with just 14 mentioned in this book (B3LYP, B97D, BLYP, BOP, BP86, CAM-B3LYP, LC-BOP, LDA, LDA+DMFT, LDA+U, PBE, PBE0, PW91, and SOAP), and the most commonly used being B3LYP, LDA, PBE, and PW91. B3LYP is the most commonly used functional for chemical problems, owing to its inclusion of more physical effects, whereas PW91 and PBE are the most commonly used functionals in the physics community, as they are typically good enough in these applications and are much faster to implement. A density functional is not a single unit but usually comes as a combination of various parts, each intended to include some physical effect. Choosing a functional that includes all of the physical effects relevant to a particular application is thus essential. In this book the applications chapters provide significant discussion as to which functionals are appropriate for common applications. Many specialized functionals exist that are not discussed, so although the book describes what is good for most, experienced users should be aware that other attractive options do exist.

The most common physical effects included in modern density functionals are short-range correlation, short-range exchange, long-range correlation, long-range exchange, asymptotic correction, and strong correlation. All density functionals include short-range correlation and short-range exchange, with LDA including only these contributions and thus being one of the simplest and most computationally expedient functionals available. LDA gives the exact answer for the free-electron gas, a problem to which many simple metals can realistically be compared. When the nature of the atomic nuclei becomes important, this functional takes the wrong qualitative form, however. Nevertheless, it provides a useful starting point even in the worst-case scenarios and hence forms a simple and useful approach. It does not provide results of sufficient accuracy to address any chemical question, however, so its realistic use is confined to a few problems involving simple metals.

The next simplest functionals improve on LDA by adding a derivative correction to the local correlation description and are generically termed generalized-gradient approximations (GGAs), with classic functionals of this type including BP86, PW91, and PBE. In general, GGAs provide descriptions that attain chemical accuracy and hence can be widely applied. Sometimes LDA provides results in better agreement with experiment than common GGAs, however, and researchers are thus tempted simply to use LDA. This is a very bad practice, as GGAs always contain more of the essential physics than does LDA, and what is required instead is to move to a more complex functional that includes even more interactions. Get the right answer for the right reason.

In widespread use for chemical properties are hybrid functionals such as B3LYP and PBE0, which include long-range exchange contributions in the density functional. This improves magnetic properties, long-range interactions, excited- and transition-state energetics, and so on. Such methods are intrinsically much more expensive than GGAs, however. Recent advances of great relevance to biological simulations include the development of density functionals containing long-range correlation, such as B97D, as is required to model dispersive van der Waals intermolecular interactions.
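
The advice to choose a functional that contains every physical effect relevant to the application can be pictured as a simple lookup. The following is only a minimal sketch: the names `FUNCTIONALS` and `candidates` are hypothetical, the effect labels follow the coarse descriptions given in this preface (including the long-range-corrected and LDA+U approaches it mentions), and treating the hybrid, dispersion-corrected, and long-range-corrected functionals as GGA-based is an added assumption. It is not a substitute for the functional discussions in the applications chapters.

```python
# Illustrative only: effect labels follow the preface's coarse description,
# not a rigorous classification of each functional.
SHORT_RANGE = frozenset({"short-range exchange", "short-range correlation"})
# Assumption: every non-LDA functional below builds on a gradient correction.
GGA = SHORT_RANGE | {"gradient correction"}

FUNCTIONALS = {
    "LDA": SHORT_RANGE,
    "BP86": GGA,
    "PW91": GGA,
    "PBE": GGA,
    "B3LYP": GGA | {"long-range exchange"},
    "PBE0": GGA | {"long-range exchange"},
    # "dispersion" is the long-range correlation effect behind van der Waals forces.
    "B97D": GGA | {"dispersion"},
    "CAM-B3LYP": GGA | {"long-range exchange", "asymptotic correction"},
    "LC-BOP": GGA | {"long-range exchange", "asymptotic correction"},
    "LDA+U": SHORT_RANGE | {"strong correlation"},
}


def candidates(required):
    """Return, alphabetically, the functionals whose labels cover every required effect."""
    required = set(required)
    known = set().union(*FUNCTIONALS.values())
    unknown = required - known
    if unknown:
        raise ValueError(f"unknown effect label(s): {sorted(unknown)}")
    return sorted(name for name, effects in FUNCTIONALS.items() if required <= effects)


if __name__ == "__main__":
    # Charge-separation problems need balanced long-range behavior.
    print(candidates({"long-range exchange", "asymptotic correction"}))
    # → ['CAM-B3LYP', 'LC-BOP']
```

A table of this kind only narrows the field; whether a shortlisted functional actually works for a given problem still has to be checked against model systems for which the correct answer is known.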
As the exchange and correlation parts of the density functionals are obtained independently, physical constraints concerning their balance are not usually met, leading to errors in their properties at long range that become important for charge separation processes, extended conjugation, band alignment at metal–molecule interfaces, and so on. Modern functionals such as CAM-B3LYP and LC-BOP contain corrections that reestablish the proper balance, improving the properties computed. Finally, approaches such as LDA+U provide explicit empirical corrections for the extremely short range, strong electron correlation effects that dominate the chemistry of the rare earth elements, for example, and are often relevant for metal-to-insulator transitions and superconductivity.

Over the next decade, the future for density functional theory looks bright. There is much current interest not only in developing corrections to account for the shortcomings of standard GGA-type functionals but also in developing new classes of functionals that intrinsically contain the correct asymptotic properties for electrons in molecules. This should dramatically simplify functional design and implementation, making the use of DFT much easier for users.

Certainly the most significant issue with current implementations of DFT is that no systematic process exists for improving functionals toward the elusive exact functional. This is where alternative computational strategies of an ab initio nature can be very useful. Part B of the book looks at methods that can be used when modern DFT just doesn’t work. Historically, the most common ab initio method for electronic structure calculation has been Hartree–Fock configuration-interaction theory. This involves use of a simplistic approximation, that proposed by Hartree and Fock, followed by expansions that converge to or even explicitly determine the exact answer (within the basis set used). The Hartree–Fock approximation itself is about as accurate as LDA and is not suitable for studying chemical problems, but like LDA can provide good insight into the operation of more realistic approaches. Although codes exist that can in principle give the exact solution to any problem, in practice this can only be achieved for the smallest systems, certainly nothing of relevance to this book. As a result, some empirically determined level of truncation of the ab initio expansion is necessary (coupled to a choice of basis set, of course), making their practical use rather similar to that of DFT: always find out what works for your problem using model systems for which the correct answer is known.

The coupled-cluster method provides the “gold standard” for chemical problems, often producing results an order of magnitude more accurate than can be achieved by DFT, but at much greater computational expense. Nevertheless, how such methods can be applied to large systems of nanotechnological and biotechnological relevance is shown in Chapter 5. These methods fail for metals, however, and so are less popular in solid-state physics applications.
They handle strong electron correlations properly and easily, of course, and Chapter 6 describes how they may be combined with DFT to solve such key problems as those relevant to metal–insulator transitions and superconductivity, the combination allowing the strengths of each method to be exploited while circumventing the weaknesses.

Hartree–Fock-based approaches will always scale extremely poorly as the system size increases, and an alternative ab initio method exists that scales much better while being applicable to molecules and metals alike: quantum Monte Carlo. The problem with this method has always been its startup cost, as even the simplest systems require enormous computational effort. But the time has now come where algorithms and computers are fast enough to solve many chemical and physical problems to a specifiable degree of accuracy. The method has come of age, and these advances are reviewed in Chapter 4. Because of the excellent scaling properties of this method, applications to larger and larger systems can now be expected to appear at a rapid rate.

But no matter how far computational methods such as DFT, configuration interaction, or quantum Monte Carlo advance, the researcher will hunger for the ability to treat larger systems, even if at a more approximate level. Part C of this book addresses these needs. Chapter 7 covers approximate but accurate schemes for implementing DFT and other methods that allow complex systems to be broken down into discrete fragments, achieving considerable computational savings while allowing chemical intuition to be used to ensure accuracy. Chapter 8 describes semiempirical Hartree–Fock-based approaches in which most of the interactions are neglected and the remainder parameterized, leaving a priori computation schemes that at times achieve chemical accuracy and are available for all atoms except the rare earths. A similar approach, but this time modeled after DFT, is described in Chapter 9. The DFT-based approach is widely applicable to both biological systems and materials science but requires parameters to be determined for every pair of atoms in the periodic table, providing increased accuracy at the expense of severe implementational complexity. It is now sufficiently parameterized to meet wide-ranging needs in biotechnology and nanotechnology.

Even so, some problems, such as superconductivity and the Kondo effect, require the study of electron correlations on length scales well beyond the reach of semiempirical electronic structure calculations. In Chapter 10 we look at a range of basic chemical models that describe the essential features of such systems empirically, leaving out all nonessential aspects of the phenomena in question. These methods follow from the analytical models used to put together the basics of chemical bonding and band structure theories in the 1930s–1960s, with the semiempirical methods described in Chapters 8 and 9 also originating from these sources. Accurate electronic structure calculations remain important, but in Chapter 10 we see that they only need to be applied to model systems to generate the empirical parameters that go into the electronic structure problem of the full system.

So, no matter what the size of the system, electronic structure methods are now in a position to contribute to the modeling of real-world problems in nanotechnology and biotechnology. Choosing whether to use empirical models parameterized by high-level calculations, use the DFT workhorse, or use methods that allow systematic improvement toward the exact answer is now a pleasant problem for researchers to ponder.
Just because a certain type of problem has been solved historically by one type of approach does not mean that this is the best thing to do now. I hope that this book will allow informed choices to be made and set new directions for the future.

Part D presents applications of electronic structure methods to nanoparticle and graphene structure (Chapter 11), photobiology (Chapter 12), control of polymerization processes (Chapter 13), nonlinear optics (Chapter 14), nanoparticle optics (Chapter 15), heterogeneous catalysis (Chapters 16 and 17), spintronics (Chapter 18), and molecular electronics (Chapter 19).

This book has its origins in the Computational Methods for Large Systems satellite meeting at the very successful WATOC-2008 conference organized by Leo Radom in Sydney, Australia. I hope the book captures some of the excitement of that meeting and the overwhelming feeling that we are now at the tip of an enormous expansion of electronic structure computation into everyday research in newly emerging technologies and sciences. I have had a go at most things described in this book at some stage of my career, and can vouch for a lot of it. As for the rest, well, they are things that I always wanted to do! I hope that you enjoy reading the book as much as I have enjoyed editing it.


Color Figures

Color versions of selected figures can be found online at ftp://ftp.wiley.com/public/sci_tech_med/computational_methods

Acknowledgments

I would like to thank Dianne Fisher and Rebecca Jacob for their help in assembling the book, Anita Lekhwani at Wiley for the suggestion of making a book based around WATOC-2008, Leo Radom for organizing WATOC-2008, and the many referees whose anonymous but difficult work helped so much with its production.

Jeffrey R. Reimers
School of Chemistry
The University of Sydney
January 2010

PART A DFT: The Basic Workhorse

1

Principles of Density Functional Theory: Equilibrium and Nonequilibrium Applications

FERDINAND EVERS
Institute of Nanotechnology and Institut für Theorie der Kondensierten Materie, Karlsruhe Institute of Technology, Karlsruhe, Germany

Arguably, the most important method for electronic structure calculations in intermediate- and large-scale atomic or molecular systems is density functional theory (DFT). In this introductory chapter we discuss fundamental theoretical aspects underlying this framework. Our aim is twofold. First, we briefly explain our view on several aspects of DFTs as we understand them. Second, we discuss the fundamentals underlying applications of DFT to transport problems. Here, we offer a derivation of the salient equations which is based on single-particle scattering theory; the more standard approach relies on the nonequilibrium Green’s function (or Keldysh) technique. More practical aspects of applying DFT to large systems such as nanoparticles, liquids, large molecules, and proteins are described in Chapter 2 (using atomic basis sets) and Chapter 3 (using plane-wave basis sets). Other recent reviews of basic application procedures by Kümmel and Kronik¹ and Neese² are also available. Chapters 11 to 19 focus on applications, introducing extensions of the basis methods when required.

1.1 EQUILIBRIUM THEORIES

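The interacting $N$-electron problem is a formidable challenge for the theoretical disciplines of physics and chemistry. It is formulated in terms of a Hamiltonian, $\hat{H}$, which has the general structure

$$\hat{H} = \sum_{i} \left[ \varepsilon(\hat{\mathbf{p}}_i) + v_{\mathrm{ex}}(\hat{\mathbf{r}}_i) \right] + \frac{1}{2} \sum_{i \neq j} u(\hat{\mathbf{r}}_i - \hat{\mathbf{r}}_j) \tag{1.1}$$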

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


Here we have introduced the following notation: $v_{\mathrm{ex}}$ describes the system-specific time-independent external potential, which is generated, for example, by the atomic nuclei. $\varepsilon(p)$ denotes the dispersion of the free particle, establishing the relation between the momentum of the particle and its energy in free space (i.e., in the absence of $v_{\mathrm{ex}}$ and of the third term, which contains $u$). For example, a single free particle with mass $m$ has a dispersion $\varepsilon(p) = p^2/2m$. The third term introduces the two-particle interactions [e.g., $u(\mathbf{r}) = e^2/|\mathbf{r}|$ for the Coulomb case]. (We indicate an operator by $\hat{O}$ to distinguish it from its eigen- or expectation values.)

Density functional theory in its simplest incarnation serves to calculate several ground-state (GS) properties of this interacting many-body system. For example, one obtains the GS particle density, $n(\mathbf{r})$, the GS energy, $E_0$, or the workfunction (ionization energy), $W$. DFT owes its attractiveness to the fact that all of this can be obtained, in principle, by solving an optimization problem for the GS density alone without going through the often impractical explicit calculation of the GS wavefunction, $\Psi_0$, of the Hamiltonian (1.1). The actual task is to find a density profile, $n(\mathbf{r})$, so that the functional inside the brackets,

$$E_0 = \min_{\tilde{n}(\mathbf{r})} \left[ F[\tilde{n}] + \int d\mathbf{r}\, v_{\mathrm{ex}}(\mathbf{r})\, \tilde{n}(\mathbf{r}) \right] \tag{1.2}$$

is invariant under small variations, $\delta\tilde{n}(\mathbf{r})$. Here $F$ is a certain functional of the test density $\tilde{n}(\mathbf{r})$ that depends on the free dispersion, $\varepsilon(p)$, and the type of two-particle interactions, but not on the (static) environment, $v_{\mathrm{ex}}(\mathbf{r})$. [The explicit definition of $F$ is given in Eq. (1.10).] The optimizing density coincides with the GS density and the optimum value of the functional inside the brackets delivers the GS energy.

1.1.1 Density as the Basic Variable

At first sight, the very existence of a formalism that allows us to obtain the GS properties mentioned without evaluating $\Psi_0$ itself may perhaps be surprising. After all, the particle density appears to involve a lot fewer degrees of freedom than $\Psi_0$, which is the canonical starting point for calculation of the expectation values of the observables. Indeed, $\Psi_0(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ is a complex field that depends on the individual coordinates of each of the $N$ particles. By contrast, the density is an expectation value of the density operator,

$$\hat{n}(\mathbf{r}) = \sum_{i=1}^{N} \delta(\mathbf{r} - \hat{\mathbf{r}}_i) \tag{1.3}$$

which may be obtained by integrating out most of the coordinates (“details”) of $\Psi_0$:

$$n(\mathbf{r}) = \int d\mathbf{r}_1 \cdots d\mathbf{r}_N \sum_{i} \delta(\mathbf{r} - \mathbf{r}_i)\, |\Psi_0(\mathbf{r}_1, \ldots, \mathbf{r}_N)|^2 \tag{1.4}$$

$n(\mathbf{r})$ is a real field depending on a single coordinate only.
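As a small numerical illustration of Eq. (1.4), the sketch below integrates out one coordinate of a two-particle wavefunction on a grid. The setup is an assumption made purely for illustration and is not taken from the text: two spinless fermions in a one-dimensional box occupying the two lowest box orbitals, with hypothetical names such as `box_orbital`.

```python
import numpy as np

# Hypothetical toy setup (not from the text): two spinless fermions in a 1D box
# of length L, occupying the two lowest particle-in-a-box orbitals.
L = 1.0
M = 200                                  # number of interior grid points
x = np.arange(1, M + 1) * L / (M + 1)    # grid x_j = j L / (M + 1)
dx = L / (M + 1)

def box_orbital(k):
    """k-th particle-in-a-box orbital sampled on the grid."""
    return np.sqrt(2.0 / L) * np.sin(k * np.pi * x / L)

phi1, phi2 = box_orbital(1), box_orbital(2)

# Antisymmetric two-particle wavefunction Psi(x1, x2) stored as an M x M array.
psi = (np.outer(phi1, phi2) - np.outer(phi2, phi1)) / np.sqrt(2.0)

# Eq. (1.4): the delta functions select x1 = x or x2 = x, and the remaining
# coordinate is integrated out.
prob = np.abs(psi) ** 2
density = prob.sum(axis=1) * dx + prob.sum(axis=0) * dx

n_particles = density.sum() * dx         # total particle number
```

For this single-determinant example the resulting `density` coincides with the sum of the two orbital densities, which shows how a function of two coordinates collapses to a real field of one coordinate.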


At a second glance, however, the essential concepts underlying DFT are quite naturally understood. From a certain perspective, most of the information content of the ground state $\Psi_0$ is redundant. To see why this is the case, we discuss an example. Consider all thermodynamic properties of a system described by the Hamiltonian (1.1). Each property corresponds to calculating some ratio of expectation values:

$$O = \frac{\mathrm{Tr}\,[\hat{O}\, e^{-\beta \hat{H}}]}{\mathrm{Tr}\,[e^{-\beta \hat{H}}]} \tag{1.5}$$

with an inverse temperature, $\beta = 1/kT$, and $\hat{O}$ denoting the operator corresponding to the observable of interest. The important thing to notice is that the system characteristics enter the average only via $\hat{H}$. Therefore, within a given set of systems with members sharing the same kinetic energy and two-body interaction (“universality class”), all system specifics (i.e., observables) are determined uniquely by specifying the external potential, so $O$ is a functional of $v_{\mathrm{ex}}$: $O[v_{\mathrm{ex}}]$. This simple observation already implies that within such a universality class, the system behavior can be reconstructed from knowledge of a scalar field [here $v_{\mathrm{ex}}(\mathbf{r})$], and in this sense most of the information content of $\Psi_0$ is redundant. In the Schrödinger theory, the classifying scalar field is the external potential. DFT amounts to a change of variables that replaces $v_{\mathrm{ex}}(\mathbf{r}) \to n(\mathbf{r})$. Such a transformation is feasible because the density operator and the external potential $v_{\mathrm{ex}}$ enter $\hat{H}$ as a product, $\sum_{i=1}^{N} v_{\mathrm{ex}}(\hat{\mathbf{r}}_i) = \int d\mathbf{r}\, v_{\mathrm{ex}}(\mathbf{r})\, \hat{n}(\mathbf{r})$. Therefore, the average density and $v_{\mathrm{ex}}$ are conjugate variables and a relation

$$n(\mathbf{r}) = \frac{\partial E_0[v_{\mathrm{ex}}]}{\partial v_{\mathrm{ex}}(\mathbf{r})} \tag{1.6}$$

holds true. Under the assumption that Eq. (1.6) can be inverted (at least “piecewise”), we can employ a Legendre transformation to facilitate the change in variables from $v_{\mathrm{ex}}$ to $n$:

$$F[n] = E_0[v_{\mathrm{ex}}] - \int d\mathbf{r}\, n(\mathbf{r})\, v_{\mathrm{ex}}(\mathbf{r}) \tag{1.7}$$

where the external potential is now the dependent variable given by

$$v_{\mathrm{ex}}(\mathbf{r}) = -\frac{\partial F[n]}{\partial n(\mathbf{r})} \tag{1.8}$$

Thus, it is suggested that the density $n$ can also be considered the fundamental variable, so that observables are functionals thereof. The ground-state energy is an example of this. Summarizing: Underlying DFT is the insight that within a given universality class, each physical system can be identified uniquely either by its associated “environment,” $v_{\mathrm{ex}}(\mathbf{r})$, or by its GS density, $n(\mathbf{r})$. Therefore, in principle, knowing just the ground-state density is enough information to determine any observable (equilibrium) quantity of a many-body system.
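The conjugate relation between density and external potential, Eq. (1.6), can be checked numerically on a lattice, where the functional derivative reduces to an ordinary partial derivative with respect to the potential on each site. The following is a minimal sketch under assumptions that are not part of the text: noninteracting spinless fermions on an open tight-binding chain, with hypothetical function names.

```python
import numpy as np

def ground_state(v, n_particles, hopping=1.0):
    """Noninteracting spinless fermions on an open chain (illustrative model).

    Returns the ground-state energy E0[v] and the site densities n_i.
    """
    m = len(v)
    h = np.diag(v) - hopping * (np.eye(m, k=1) + np.eye(m, k=-1))
    eps, phi = np.linalg.eigh(h)
    occupied = phi[:, :n_particles]
    return eps[:n_particles].sum(), (occupied ** 2).sum(axis=1)

def density_from_energy_derivative(v, n_particles, step=1e-5):
    """Central finite-difference estimate of dE0/dv_i, the lattice analog of Eq. (1.6)."""
    grad = np.zeros(len(v))
    for i in range(len(v)):
        dv = np.zeros(len(v))
        dv[i] = step
        e_plus, _ = ground_state(v + dv, n_particles)
        e_minus, _ = ground_state(v - dv, n_particles)
        grad[i] = (e_plus - e_minus) / (2.0 * step)
    return grad

v_ex = 0.3 * np.cos(np.arange(8))        # arbitrary external potential on 8 sites
e0, n_direct = ground_state(v_ex, n_particles=3)
n_from_derivative = density_from_energy_derivative(v_ex, n_particles=3)
```

The density obtained from the occupied orbitals and the density obtained by differentiating the ground-state energy agree to within the finite-difference error, which is the content of Eq. (1.6) for this toy model.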


Remarks

• A formal proof that the density can act as the fundamental variable was presented by Hohenberg and Kohn³; see Section 1.1.1.
• A generalization of DFT to spin or current DFT may be indicated for systems with degeneracies. Then additional fields such as magnetization and current density are needed to distinguish among the system states.

1.1.2 Variational Principle and Levy’s Proof

Just the mere statement that equilibrium expectation values of observables can be calculated from some functionals once the GS density, $n$, is known, is not very helpful. For DFT to be self-consistent, also needed is a procedure to obtain this GS density by not referring to anything other than the functionals of $n$ itself. This is where the variational principle kicks in, which says that the GS has a unique property in that it minimizes the system’s total energy. This implies, in particular, that the GS has a density that minimizes (for a fixed environment $v_{\mathrm{ex}}$) the functional $E_0[n]$. Hence, we can find $n$ by solving the optimization problem (1.2), involving only variations of the density. A particularly instructive derivation of Eq. (1.2) has been given by Levy.⁴ We summarize the essential logical steps, to remind ourselves that the connection between the variational principle and DFT is actually deep and not related only to practical matters. In fact, Levy’s proof starts with the variational principle for the GS. It implies that there is a configuration space, $C$, of totally antisymmetric functions, $\tilde{\Psi}$, with the normalization property $N = \int d\mathbf{r}\, \langle \tilde{\Psi} | \hat{n}(\mathbf{r}) | \tilde{\Psi} \rangle$, together with a functional $E[\tilde{\Psi}] = \langle \tilde{\Psi} | \hat{H} | \tilde{\Psi} \rangle$ defined on this space, which is optimized by the GS, $\Psi_0$, with the GS energy, $E_0$, being the optimum value; explicitly,

$$E[\tilde{\Psi}] = \langle \tilde{\Psi} | \hat{T} + \hat{U} | \tilde{\Psi} \rangle + \int d\mathbf{r}\, v_{\mathrm{ex}}(\mathbf{r})\, \tilde{n}(\mathbf{r}) \tag{1.9}$$

where $\hat{T}$ abbreviates the kinetic energy and $\hat{U}$ the interaction energy appearing in Eq. (1.1), and $\tilde{n}$ is the particle density associated with $\tilde{\Psi}$. The trick in Levy’s argument is to organize the minimum search in two steps. In the first step the total configuration space, $C$, is subdivided into subspaces such that all wavefunctions inside a given subspace have identical density profiles $\tilde{n} = \langle \tilde{\Psi} | \hat{n}(\mathbf{r}) | \tilde{\Psi} \rangle$. Next, within each subspace a search is launched for the elements that minimize $E$. Thus, a submanifold, $M_{\mathrm{preopt}}$, is identified which contains a set of “preoptimized” elements. By construction, each element $\Psi_{\tilde{n}}$ of $M_{\mathrm{preopt}}$ is uniquely labeled by the associated density profile $\tilde{n}$ (see Fig. 1.1). In the second step, the minimum search is continued, but it can now be restricted to finding the one element, $\Psi_0$, of $M_{\mathrm{preopt}}$ that minimizes $E$.

The motivation behind this particular way of organizing the search is the following: The division procedure in step 1 has been constructed such that the second term in Eq. (1.9) does not contribute to preoptimizing; within a given subspace it is just a constant. In this step, only the first term is minimized, with an extremal value,

$$F[\tilde{n}] \equiv \langle \Psi_{\tilde{n}} | \hat{T} + \hat{U} | \Psi_{\tilde{n}} \rangle \tag{1.10}$$

Fig. 1.1 (color online) Schematic representation of the constraint search strategy in $C$ space. One sorts the space of all possible (i.e., antisymmetrized, normalizable) wavefunctions into submanifolds. By definition, wavefunctions belonging to the same submanifold generate the same density profile, $\tilde{n}(\mathbf{r})$. Each submanifold has a wavefunction $\Psi[\tilde{n}(\mathbf{r})]$ (at fixed external potential $v_{\mathrm{ex}}$), which has the lowest energy. These wavefunctions sit on a hypersurface (a “line”) in the configuration space which is parameterized by $\tilde{n}(\mathbf{r})$. The surface is continuously connected if the evolution of $\Psi[\tilde{n}(\mathbf{r})]$ with the density profile is smooth (i.e., if degenerate shells with more than one optimum state do not exist). (We identify with each other states that differ only by a spatially homogeneous phase.) Typically, for every external potential, $v_{\mathrm{ex}}$, there is exactly one such surface. The ground-state energy is found by going over the surface and searching for the global energy minimum.

The important observation is that by construction the functional $F[\tilde{n}]$ is universal (i.e., independent of external conditions, $v_{\mathrm{ex}}$). (This statement is contained in the Hohenberg–Kohn theorem.³) Therefore, $F$ is found by preoptimizing once and for all. After $F$ has been identified, the calculation of system-specific properties (depending on $v_{\mathrm{ex}}$), which was described in Eq. (1.2), requires only a restricted search within the submanifold $M_{\mathrm{preopt}}$. The benefit is tremendous, since the volume to be searched, $M_{\mathrm{preopt}}$, is tiny compared to the original wavefunction space $C$.

Remarks

• $F[n]$ has the exact property

$$\left. \frac{\partial F[\tilde{n}]}{\partial \tilde{n}(\mathbf{r})} \right|_{\tilde{n} = n} + v_{\mathrm{ex}}(\mathbf{r}) = \mu$$

Proof: The ground-state density, $n$, is an extremal point by construction under the constraint $N = \int d\mathbf{r}\, \tilde{n}(\mathbf{r})$. Introducing a Lagrange parameter, $\mu$, we can release the constraint and perform an unrestricted search minimizing $F[\tilde{n}] + \mu N + \int d\mathbf{r}\, [v_{\mathrm{ex}}(\mathbf{r}) - \mu]\, \tilde{n}(\mathbf{r})$. The claim follows after functional differentiation.
• The minimum search in Eq. (1.2) is in a space of scalar functions $\tilde{n}$, which have the property that they are “$\Psi$-representable”: For a given $\tilde{n}(\mathbf{r})$ there is at least one element $\tilde{\Psi}$ of $C$ with the property $\tilde{n}(\mathbf{r}) = \langle \tilde{\Psi} | \hat{n}(\mathbf{r}) | \tilde{\Psi} \rangle$. This implies, for example, positivity: $\tilde{n} \geq 0$.
• We presented Levy’s argument for ground-state DFT. It is obvious, however, that the restriction to GS and the collective mode “density” was not crucial. Only the variational principle and a linear coupling of an environmental field to some collective mode (e.g., density, spin density, current density) should be kept. Therefore, generalizations of ground-state DFT to many other cases have been devised: for example, (equilibrium) thermodynamic DFT at nonzero temperature, magnetic properties (spin DFT and current DFT), and relativistic DFTs. Moreover, it has been shown that certain excited states can also be calculated exactly with a ground-state (spin) DFT. This happens when the Hamiltonian, $\hat{H}$, exhibits symmetries, such as spin rotational invariance. Then the Hilbert space decomposes into invariant subspaces each carrying its own quantum number(s), $q$: for example, a spin multiplicity. The minimum search may then proceed in every subspace, separately, giving a separate functional $F_q$ for each of them. The local $q$-minima thus obtained are valid eigenstates of the full Hamiltonian (Gunnarsson–Lundqvist theorem⁵).
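Levy’s two-step search can be made concrete on a system small enough to enumerate. The sketch below is an illustration under assumptions that do not come from the text: a two-site Hubbard model with two electrons in the singlet sector, hopping `t`, on-site repulsion `u`, and site potentials of plus and minus `dv / 2`. Step 1 preoptimizes the kinetic-plus-interaction energy at fixed density difference, and step 2 minimizes over the density variable alone, as in Eq. (1.2). Both minimizations use simple grid searches, so the result is only accurate to the grid resolution.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def exact_energy(t, u, dv):
    """Ground-state energy of the two-site, two-electron singlet problem.

    Basis: both electrons on site 1, both on site 2, one electron on each site.
    """
    h = np.array([[u + dv, 0.0, -SQRT2 * t],
                  [0.0, u - dv, -SQRT2 * t],
                  [-SQRT2 * t, -SQRT2 * t, 0.0]])
    return np.linalg.eigvalsh(h)[0]

def universal_functional(dn, t, u, n_s=2001):
    """Step 1: minimize <T + U> over wavefunctions with fixed n1 - n2 = dn."""
    # Real amplitudes (a, b, c) >= 0 with a^2 - b^2 = dn / 2 and a^2 + b^2 = 2 s.
    s = np.linspace(abs(dn) / 4.0, 0.5, n_s)
    a = np.sqrt(np.clip(s + dn / 4.0, 0.0, None))
    b = np.sqrt(np.clip(s - dn / 4.0, 0.0, None))
    c = np.sqrt(np.clip(1.0 - 2.0 * s, 0.0, None))
    return np.min(2.0 * u * s - 2.0 * SQRT2 * t * c * (a + b))

def levy_energy(t, u, dv, n_dn=801):
    """Step 2: minimize F(dn) + (dv / 2) * dn over the density variable alone."""
    dn_grid = np.linspace(-2.0, 2.0, n_dn)
    f = np.array([universal_functional(dn, t, u) for dn in dn_grid])
    return np.min(f + 0.5 * dv * dn_grid)

t, u, dv = 1.0, 2.0, 0.7
e_exact = exact_energy(t, u, dv)
e_levy = levy_energy(t, u, dv)
```

Note that `universal_functional` never sees `dv`: the preoptimization is done once per density and can be reused for any external potential, which is the sense in which $F$ is universal.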

1.2 LOCAL APPROXIMATIONS

The precise analytical dependency of the energy functional $F[n]$ on the density $n(\mathbf{r})$ is not known, of course. Available approximations employ knowledge, analytical and computational, about homogeneous interacting Fermi gases (i.e., the case $v_{\mathrm{ex}} = \mathrm{const}$). Indeed, it turns out that the homogeneous system also provides a very useful starting point to build up a zeroth-order description in the inhomogeneous environments that are relevant for describing atoms and molecules.

1.2.1 Homogeneous Electron Gas

Homogeneous gases are relatively simple. The particle density, $n$, is just a parameter and all functionals, which in general involve multiple spatial integrals over expressions involving $n(\mathbf{r})$ at different positions in space (nonlocality property), turn into functions of $n$. Analytical expressions for them can usually be derived from perturbative treatments of $E_0(n)$, which are justified in two limiting cases: where a control parameter, $r_s$, is either very large or very small.

For the homogeneous electron gas, $r_s$ can easily be identified: It is the ratio of two energies. The first energy is the typical strength of the interaction that two particles feel in the electron gas in three-dimensional space: $(e^2/\varepsilon_0)\, n^{1/3}$. To see whether or not this energy is actually sizable, one should compare it to another energy. The correct energy scale to consider will be a measure of the kinetic energy of the particles. The average kinetic energy of a fermion depends on the gas density, $n$. To derive an explicit expression, we recall that due to the Pauli principle, all particles that share the same spin state must be in different momentum states, $|\mathbf{p}\rangle$. Therefore, when filling up the volume, higher and higher momentum states, up to a maximum momentum value, $p_F$, will be occupied. The kinetic energy of the particles occupying the highest-energy (Fermi energy) states, $\varepsilon_F(n) \equiv \varepsilon(p_F)$, will be a good measure for the typical kinetic energy of a gas particle.

The situation is best visualized recalling the familiar quantum mechanical textbook problem of “a particle in a box” with box size $L$. The energy levels of the box can be ordered according to the number of nodes exhibited by the corresponding wavefunctions. The spatial distance between two nodes gives half the wavelength, $\lambda/2$, with an associated wavenumber $k = 2\pi/\lambda$ and momentum $p = \hbar k$. The shortest wavelength reached by $N$ particles (with spin $\tfrac{1}{2}$) filling the box is $\lambda_F/2 = L/(N/2) = 2/n$, giving rise to a maximum wavenumber, the Fermi wavenumber $k_F = \pi n/2$, and a maximum momentum $p_F = \hbar k_F$. In three dimensions, similar considerations yield $4\pi k_F^3/3 = (2\pi)^3 (n/2)$. Employing these results, our dimensionless parameter can now be specified as $r_s \sim e^2 n^{1/3}/\varepsilon_0\, \varepsilon_F(n)$, which conventionally is cast into the form

$$\frac{4\pi}{3}\, r_s^3 = \frac{1}{n a_0^3}$$

stipulating a parabolic dispersion $\varepsilon(p) = p^2/2m$ ($\varepsilon_0$: effective dielectric constant; $a_0 = 4\pi\varepsilon_0 \hbar^2/me^2 \approx 0.529$ Å: Bohr’s radius). Analytical expansions of $E_0(n)$ are available in the limiting cases $1/r_s \ll 1$ or $r_s \ll 1$. Typically, in particular with molecular systems, one has the marginal case $r_s \approx 1$. Here, computational methods such as quantum Monte Carlo calculations (see Chapter 4) help to interpolate the gap.

Motivated from the weakly interacting limit ($r_s \ll 1$), conventionally we consider the following splitting of the GS energy per unit volume†:

$$\breve{\varepsilon}_0(n) = 2 \sum_{|\mathbf{k}| \leq k_F(n)} \varepsilon(\mathbf{k}) + v_{\mathrm{XC}}(n) \tag{1.11}$$

† For homogeneous densities, the Hartree term reads $n \int d\mathbf{r}'\, u(\mathbf{r} - \mathbf{r}')$. Since the spatial summation over the Coulomb potential, $\sim 1/r$, does not converge, the integral makes a contribution to the energy balance which is formally infinite. This divergence is an artifact of modeling the interacting electron gas without taking into consideration the (positive) charge of those atomic nuclei (“counter charges”) that provide the source of the electrons to begin with. The physical system is always (close to) charge neutral, so that (on average) $n_{\mathrm{nuclei}} = -n_{\mathrm{electrons}}$. This implies that the nuclei provide a “background” potential, $n_{\mathrm{nuclei}} \int d\mathbf{r}'\, u(\mathbf{r} - \mathbf{r}')$, that leads to an exact cancellation of the divergent contribution in the Hartree term. Therefore, this particular term should be ignored when dealing with the homogeneous electron system (the jellium model).


where the factor of 2 accounts for the electron spin. The first term comprises the kinetic energy of the free gas. Its dependency on the density is regulated via the Fermi wavenumber, $k_F(n)$. The second term includes the remaining correlation effects and therefore has a weak coupling expansion. For the Coulomb case, the leading term is $\sim 1/r_s$ with subleading corrections,⁶

$$v_{\mathrm{XC}}(n) = -n\, \frac{0.9163}{r_s} + n \left[ -0.094 + 0.0622 \ln r_s + 0.018\, r_s \ln r_s + O(r_s) \right] \tag{1.12}$$

in Rydberg units ($E_{\mathrm{Ry}} = E_{\mathrm{Hartree}}/2 \approx 13.6$ eV).

1.2.2 Local Density Functional

The information taken from homogeneous systems for constructing functionals describing inhomogeneous systems is the dependency of the GS energy per volume on the particle density, $\breve{\varepsilon}_0(n)$. A leading-order approximation for the general $F$-functional is obtained by

$$F[n] = \int d\mathbf{r}\, \breve{\varepsilon}_0(n(\mathbf{r})) \tag{1.13}$$

This approximation is valid if the inhomogeneous system is real-space separable, meaning that it can be decomposed into a large number of subsystems that (1) still contain sufficient particles to allow for treatment as an electron gas with a finite density, (2) are already small enough to be nearly homogeneous in density, and (3) have negligible interaction with each other. Systems exhibiting a relative change of density, which is large even on the shortest length scale available, the Fermi wavelength $\lambda_F$, do not satisfy (1) and (2) simultaneously. So a minimal condition for the applicability of Eq. (1.13) is

$$\lambda_F\, \frac{|\nabla n|}{n} \ll 1 \tag{1.14}$$
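The quantities entering Eqs. (1.12) to (1.14) are simple functions of the local density and can be evaluated directly. The sketch below assumes atomic units (so $a_0 = 1$) and uses the hydrogen 1s density as a test case; both choices, and all function names, are illustrative assumptions rather than part of the text.

```python
import numpy as np

# Atomic units are assumed throughout (hbar = m = e = a0 = 1); this choice and
# all function names are illustrative, not taken from the text.

def fermi_wavenumber(n):
    """k_F from (4 pi / 3) k_F^3 = (2 pi)^3 n / 2, i.e. k_F = (3 pi^2 n)^(1/3)."""
    return (3.0 * np.pi ** 2 * n) ** (1.0 / 3.0)

def wigner_seitz_radius(n):
    """r_s from (4 pi / 3) r_s^3 = 1 / n (with a0 = 1)."""
    return (3.0 / (4.0 * np.pi * n)) ** (1.0 / 3.0)

def v_xc_high_density(n):
    """Eq. (1.12) without the O(r_s) remainder, in Rydberg per unit volume."""
    rs = wigner_seitz_radius(n)
    return n * (-0.9163 / rs - 0.094 + 0.0622 * np.log(rs) + 0.018 * rs * np.log(rs))

def locality_parameter(n, grad_n):
    """Left-hand side of Eq. (1.14): lambda_F |grad n| / n."""
    lambda_f = 2.0 * np.pi / fermi_wavenumber(n)
    return lambda_f * np.abs(grad_n) / n

# Test density: hydrogen 1s, n(r) = exp(-2 r) / pi, so that dn/dr = -2 n.
r = np.linspace(0.0, 5.0, 6)
n_1s = np.exp(-2.0 * r) / np.pi
parameter = locality_parameter(n_1s, -2.0 * n_1s)
```

For this test density the parameter is already well above 1 at the nucleus and grows with distance, so a hydrogen-like density lies outside the regime described by relation (1.14).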

Remarks

• Condition (3) implies that the interaction is short range, ideally $u(\mathbf{r} - \mathbf{r}') \sim \delta(\mathbf{r} - \mathbf{r}')$. For the Coulomb case, we separate from the $1/|\mathbf{r} - \mathbf{r}'|$ interaction a long-range term, which is then treated by introducing an extra term, the Hartree potential.
• Since the Fermi wavelength itself depends on the density, $\lambda_F \sim n^{-1/d}$, relation (1.14) is satisfied typically only in the large-$n$ limit. There, the main contribution to the energy (1.13) stems from the kinetic term in Eq. (1.11). Therefore, the leading error in the local functional (1.13) usually comes from the fact that the Thomas–Fermi approximation [$k_F(\mathbf{r}) \equiv k_F(n(\mathbf{r}))$],

$$\langle \hat{T} \rangle \approx 2 \int d\mathbf{r} \sum_{|\mathbf{k}| \leq k_F(\mathbf{r})} \varepsilon(\mathbf{k}) \tag{1.15}$$

gives only a very poor estimate of the kinetic energy of an inhomogeneous electron gas, even for noninteracting particles.
• The failure of the Thomas–Fermi approximation is the main reason that orbital-free DFT has a predictive power too limited for most practical demands. The search for more accurate representations of the kinetic energy in terms of $n$-functionals is at present an active field of research.⁷,⁸
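How poor the Thomas–Fermi estimate of Eq. (1.15) can be is easy to check on a one-electron example. The sketch below makes several assumptions that are not in the text: atomic units, a parabolic dispersion (for which the sum over $\mathbf{k}$ gives a kinetic energy density of $(3/10)(3\pi^2)^{2/3} n^{5/3}$), and the hydrogen 1s density as the test case. Applying the spin-unpolarized expression to a single electron is itself part of the illustration.

```python
import numpy as np

def thomas_fermi_kinetic_energy(r, n):
    """Thomas-Fermi kinetic energy (Hartree) of a spherical density n(r).

    Uses the spin-unpolarized free-gas result with parabolic dispersion,
    t(n) = (3/10) (3 pi^2)^(2/3) n^(5/3), integrated over space.
    """
    c_f = 0.3 * (3.0 * np.pi ** 2) ** (2.0 / 3.0)
    integrand = 4.0 * np.pi * r ** 2 * c_f * n ** (5.0 / 3.0)
    # Trapezoidal rule written out explicitly.
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))

r = np.linspace(0.0, 30.0, 30001)
n_1s = np.exp(-2.0 * r) / np.pi          # hydrogen 1s density
t_tf = thomas_fermi_kinetic_energy(r, n_1s)
t_exact = 0.5                            # exact 1s kinetic energy in Hartree
```

Under these assumptions the local estimate comes out near 0.29 Hartree against the exact 0.5 Hartree, an error of roughly 40% for a system with no electron-electron interaction at all.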

1.3 KOHN–SHAM FORMULATION

Better estimates for the kinetic energy can be obtained within the Kohn–Sham formalism.⁹ One addresses the optimization problem (1.2) by reintroducing an orbital representation of the density with single-particle states,

$$n(\mathbf{r}) = \sum_{\ell=1}^{\tilde{N}} |\phi_\ell(\mathbf{r})|^2 \tag{1.16}$$

called the Kohn–Sham or molecular orbitals. The orbitals $\phi_\ell$ are sought to be orthonormalized; the parameter $\tilde{N}$ is free, in principle. However, with an eye on approximating the kinetic energy of the interacting system by the energy of the free gas, $\tilde{N}$ is usually chosen to be equal to the number of particles, $\tilde{N} = N$. With this choice, the optimization problem formally reads

$$\frac{1}{2}\, \frac{\partial}{\partial \phi_\ell^*(\mathbf{r})} \Big[ E_0[n(\mathbf{r})] - \varepsilon_\ell \left( \langle \phi_\ell | \phi_\ell \rangle - 1 \right) \Big] = 0 \tag{1.17}$$

featuring the Kohn–Sham energies (or molecular orbital energies), $\varepsilon_\ell$, which play the role of Lagrange multipliers ensuring normalization. Equation (1.17) can be cast conveniently into a form reminiscent of a Schrödinger equation of $N$ single particles:

$$[\varepsilon(p) + v_s(\mathbf{r})]\, \phi_\ell(\mathbf{r}) = \varepsilon_\ell\, \phi_\ell(\mathbf{r}) \tag{1.18}$$

where we have employed a substitution ($p = -i\hbar\partial_x$),

$$\frac{1}{2}\, \frac{\partial E_0[n(\mathbf{r})]}{\partial \phi_\ell^*(\mathbf{r})} = [\varepsilon(p) + v_s(\mathbf{r})]\, \phi_\ell(\mathbf{r}) \tag{1.19}$$

which is merely a definition of an auxiliary quantity, the effective potential $v_s(\mathbf{r})$. The set of $N$ equations given by Eq. (1.18) constitutes the Kohn–Sham equations.

Remarks

• The Kohn–Sham (KS) formalism should give a much improved description of the kinetic energy, because by construction it reproduces exactly the kinetic energy of the inhomogeneous, noninteracting gas.
• The fictitious KS particles live in an effective potential which modulates their environment such that their density and all related properties coincide with those of a true many-body system.
• The potential term has a decomposition

$$v_s(\mathbf{r}) = v_{\mathrm{ex}}(\mathbf{r}) + v_{\mathrm{H}}(\mathbf{r}) + v_{\mathrm{XC}}(\mathbf{r}) \tag{1.20}$$

where the second term includes the Hartree interaction, which for a specific two-body interaction potential $u(\mathbf{r} - \mathbf{r}')$ reads $v_{\mathrm{H}}(\mathbf{r}) = \int d\mathbf{r}'\, u(\mathbf{r} - \mathbf{r}')\, n(\mathbf{r}')$. The third term, the exchange–correlation potential, incorporates all the remaining, more complicated many-body contributions. In particular, we have also lumped the difference between the free and interacting kinetic energies into this term.
• Solving the KS equations requires diagonalization of a KS-Hamiltonian:

$$\hat{H}_{\mathrm{KS}} = \varepsilon(\hat{p}) + v_s(\hat{\mathbf{r}}) \tag{1.21}$$

The dimension of the corresponding Hilbert space, $N_\phi$, usually exceeds the particle number substantially: $N_\phi \gg N$. Therefore, occupied (real) eigenstates that finally enter the construction of the density [Eq. (1.16)] need to be distinguished from unoccupied (virtual) ones. The selection process follows the variational principle.
• Similar to the Hartree theory and in pronounced contrast to the Schrödinger equation for a single particle, the KS equations pose a self-consistency problem: The potential $v_s(\mathbf{r})$ is a functional of $n(\mathbf{r})$, so it needs to be determined “on the fly.” We emphasize that even though the functional $v_s[n](\mathbf{r})$ may exhibit a very complicated (in particular, nonlocal) dependency on the ground-state particle density, the effective potential that finally is felt by the KS particles is perfectly local in space. It provides an effective environment for the KS particles, so that the many-body density can be reproduced.
• The self-consistent field (SCF) problem in DFT is much easier to solve than the Hartree–Fock (HF) equations, which are nonlocal in space and, what is much worse, even orbital dependent. As a consequence of the orbital dependency of the Fock operator, a real HF orbital interacts with $N - 1$ other real orbitals, whereas a virtual orbital interacts with $N$ real orbitals. The situation in DFT is much simpler in the sense that occupied and unoccupied orbitals all feel the same effective potential $v_s[n](\mathbf{r})$. Notice, however, that this computational advantage comes at the expense of the derivative discontinuity, an unphysical feature of exact exchange correlation functionals (see Section 1.5.3) that is very difficult to implement in efficient approximation schemes.
• Our derivation of the Kohn–Sham equations was tacitly assuming the following: The density of any electron system, including the interacting systems, can be represented in the manner of Eq. (1.16), where the orbitals $\phi_\ell$ are normalizable solutions of a (single-particle) Schrödinger equation. Is this really true? The answer is: Not always. That is, systems with degenerate ground states may exhibit a particle density that can only be represented as a sum of independent contributions coming from a number $g$ of single Slater determinants. A general statement that is valid for all practical purposes is that any fermionic density may be represented uniquely as a weighted average of $g$ degenerate ground-state densities of some effective single-particle Schrödinger problem [Eq. (1.18)].¹⁰,¹¹

1.3.1 Is the Choice of the KS–Hamiltonian Unique?

For an interacting many-body system, splitting between kinetic and potential energy as suggested in Eqs. (1.19) and (1.20) is not as unique as it may appear at first sight. To give a straight argument, recall that the dispersion relation of the free particles, $\varepsilon(p)$, can be altered substantially by interaction effects. For example, the mass of the electron describes how the particle’s energy depends on its momentum. In the presence of interactions, an electron always moves together with its own screening cloud, brought about by the presence of other electrons. Although this does not change the wavelength (i.e., the momentum) of the electron, it does change its velocity. It tends to make it slower, so that the “effective” mass increases. Such interaction effects on parameters such as the mass, the thermodynamic density of states, and the magnetic susceptibility are called Fermi-liquid renormalizations. Having this in mind, one could easily imagine another splitting featuring a renormalized kinetic energy, $\varepsilon^*(p)$, which would give a better-adapted description of the dispersion of charged excitations (e.g., the propagation of screened electrons) in the interacting quantum liquid.¹² A remaining, residual interaction, $V_{\mathrm{XC}}^{\mathrm{res}}$, would then appear, designed so that the ground-state density produced by this effective system would also coincide with the true density. Such a renormalized splitting is rarely employed in practice, perhaps because a good approximation for the residual functionals is not available. For the effective single-particle problem that yields the exact ground-state density, we conclude that various choices are possible, the choices differing from one another in the dispersion $\varepsilon(p)$ that enters the kinetic part of the KS-Hamiltonian. Very few restrictions on the possible functional forms of $\varepsilon(p)$ exist; the parabolic shape and the trivial form $\varepsilon \equiv 0$ (with proper readjustments of $v_{\mathrm{XC}}$) are just two choices out of many.

1.4 WHY DFT IS SO SUCCESSFUL

The precise dependency of the exchange–correlation potential vXC on the density n(r) is not known. In the simplest approximation, the local density approximation (LDA), one takes for vXC the result obtained from the homogeneous electron gas [Eq. (1.12)], but replacing the homogeneous density with n(r) (see Section 1.2.2).
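The combination of the Kohn–Sham equations (1.18) with a local approximation for $v_{\mathrm{XC}}$ leads to a self-consistency loop that can be sketched in a few lines. The example below is a toy model under loudly stated assumptions, none of which come from the text: a one-dimensional grid in atomic units, a harmonic external potential, a soft-Coulomb interaction, $\tilde{N} = N$ singly occupied orbitals as in Eq. (1.16), and the three-dimensional local exchange expression $-(3n/\pi)^{1/3}$ used as a stand-in for $v_{\mathrm{XC}}$. It is meant to show the structure of the iteration, not to produce physically meaningful numbers.

```python
import numpy as np

def kohn_sham_1d(n_elec=2, n_grid=201, box=10.0, mix=0.3, tol=1e-6, max_iter=500):
    """Toy self-consistent solution of Eq. (1.18) on a 1D grid (atomic units)."""
    x = np.linspace(-box / 2.0, box / 2.0, n_grid)
    dx = x[1] - x[0]

    # Parabolic dispersion p^2 / 2 as a finite-difference second derivative.
    laplacian = (np.eye(n_grid, k=1) - 2.0 * np.eye(n_grid) + np.eye(n_grid, k=-1)) / dx ** 2
    kinetic = -0.5 * laplacian

    v_ex = 0.5 * x ** 2                                        # assumed harmonic trap
    u = 1.0 / np.sqrt((x[:, None] - x[None, :]) ** 2 + 1.0)    # assumed soft-Coulomb u

    density = np.full(n_grid, n_elec / (n_grid * dx))          # flat starting guess
    converged = False
    for iteration in range(1, max_iter + 1):
        v_h = u @ density * dx                                 # Hartree potential
        v_xc = -(3.0 * density / np.pi) ** (1.0 / 3.0)         # local exchange stand-in
        h_ks = kinetic + np.diag(v_ex + v_h + v_xc)            # Eq. (1.21)
        eps, phi = np.linalg.eigh(h_ks)
        phi = phi / np.sqrt(dx)                                # sum |phi|^2 dx = 1
        new_density = (phi[:, :n_elec] ** 2).sum(axis=1)       # Eq. (1.16)
        change = np.max(np.abs(new_density - density))
        density = (1.0 - mix) * density + mix * new_density    # linear mixing
        if change < tol:
            converged = True
            break
    return x, density, eps[:n_elec], converged

x, density, ks_energies, converged = kohn_sham_1d()
```

The effective potential is rebuilt from the current density in every cycle, which is the “on the fly” determination of $v_s$; the simple linear mixing is one common way to damp the iteration and is chosen here only for brevity.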


Remarks

• The universal success of DFT in chemistry and condensed matter physics came with the empirical finding that the combination of KS theory with LDA (and its close relatives) works in a sufficiently quantitative way to make it possible to calculate ground-state energies (and hence to determine molecular and crystal structure) even outside the naive regime of the validity of LDA as given by relation (1.14). This is due to a cancellation of errors in the kinetic and exchange correlation part of the KS-Hamiltonian (1.21).¹³
• In analogy with Hartree–Fock theory, a fictitious “KS ground-state” wavefunction is often considered. It is constructed by building a Slater determinant from the real KS orbitals. In contrast to HF, this state is not optimal in an energetic sense. It does, however, reproduce the exact particle density. In the same spirit, KS energies are often interpreted as single-particle energies, even though from a dogmatic point of view there is no (close) connection between the Lagrange multipliers and the true many-body excitations; indeed, to the best of our knowledge, a precise justification of this practice has never been given. Still, the pragmatic approach has established itself widely, since it often gives semiquantitative estimates for Fermi-liquid renormalizations, which are important, for example, in band structure calculations.
• The implementation of efficient codes is much easier in DFT than in HF theory, due to the fact that functionals are only density and not orbital dependent. For this reason, many powerful codes are readily available in the marketplace.
• At present, because of the virtues noted above, DFT is by far the most widely used tool in electronic structure theory (lattice structures, band structures) and quantum chemistry (molecular configurations), with further applications in many other fields, such as nuclear physics, strongly correlated systems, and material science.

1.5 EXACT PROPERTIES OF DFTs

Since there is no analytic solution of the general interacting many-body problem, it is not surprising that exact statements about exchange correlation functionals are scarce. Precise information is, however, available in the presence of an interface to the vacuum. Imagine a situation in which a molecule or a piece of material is embedded in a vacuum. The material is associated with an attractive KS potential “well,” vs , which binds N electrons to the nuclei (or atomic ion cores). Outside the material, the binding potential and the particle density rapidly approach their asymptotic zero values. Exact information is available about how the asymptotic value is approached.


1.5.1 Asymptotic Behavior of vXC

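Consider the Hartree term

$$v_{\mathrm{H}}(\mathbf{r}) = \sum_{\ell'=1}^{\mathrm{occ}} \int d\mathbf{r}'\, u(\mathbf{r} - \mathbf{r}')\, |\phi_{\ell'}(\mathbf{r}')|^2 \tag{1.22}$$

in the KS equations

$$[\varepsilon(\hat{p}) + v_{\mathrm{ex}}(\mathbf{r}) + v_{\mathrm{H}}(\mathbf{r}) + v_{\mathrm{XC}}(\mathbf{r})]\, \phi_\ell(\mathbf{r}) = \varepsilon_\ell\, \phi_\ell(\mathbf{r}) \tag{1.23}$$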

It contains at $\ell' = \ell$ a piece $u(\mathbf{r}-\mathbf{r}')|\phi_\ell(\mathbf{r}')|^2$, which incorporates an interaction of a particle in the occupied orbital $\phi_\ell$ with its own density. This spurious, nonphysical interaction is known as a self-interaction error. In principle, it should be eliminated by a counterpiece contained in the exchange part of $v_{XC}$.† The construction and application of empirical corrections for this effect are the subject of Chapter 14. The Hartree term is known exactly in the asymptotic region. This is the reason that it is possible to draw a rigorous conclusion about $v_{XC}$. To be specific, we consider the case of Coulomb interactions. In the asymptotic regime a distance $r$ away from the material's center, where the particle density is totally negligible, all spurious contributions made by an occupied orbital add up to $e^2/r$. To cancel this piece we must have

$$ v_{XC}(\mathbf{r}) \;\xrightarrow[r\to\infty]{}\; -\frac{e^2}{r} + \frac{-\alpha_{N-1}}{2r^4} + \cdots \tag{1.24} $$

whenever the particle density vanishes. The correction term, which we have also given here, describes the polarizability, $\alpha_{N-1}$, of the many-body system (with $N-1$ particles). This term incorporates the interactions with the fluctuating charge density of the mother system that particles feel when they explore the asymptotic region.

† This cancellation may be seen explicitly within the Hartree–Fock approximation. That is, the interaction term reads

$$ \sum_{\ell'}\sum_{\sigma'=\uparrow,\downarrow} \int d\mathbf{r}'\, u(\mathbf{r}-\mathbf{r}')\,\phi^*_{\ell'\sigma'}(\mathbf{r}')\,[\phi_{\ell'\sigma'}(\mathbf{r}')\phi_{\ell\sigma}(\mathbf{r}) - \delta_{\sigma\sigma'}\,\phi_{\ell'\sigma'}(\mathbf{r})\phi_{\ell\sigma}(\mathbf{r}')] $$

so that the piece with $\ell' = \ell$, $\sigma' = \sigma$ in the first (Hartree) term is eliminated by a corresponding piece in the second (Fock) term.


Remarks

• A more intuitive way to rationalize the leading asymptotics of $v_{XC}$ is to recall that an electron that makes a virtual excursion from its host material into vacuum still interacts with the hole that it leaves behind. The first term in Eq. (1.24) describes the interaction with this virtual hole.
• Neither of the terms appearing in Eq. (1.24) is recovered in local approximation schemes, such as LDAs and generalized-gradient approximations (GGAs), which stipulate the form $v_{XC}(\mathbf{r}) \approx v_{XC}(n(\mathbf{r}), \nabla n(\mathbf{r}), \ldots)$. The statement is obvious, because the density is exponentially small in the asymptotic region (see Section 1.5.2), whereas the potential (1.24) is not. This defect has very serious consequences, since the van der Waals dispersion interactions, $v_{XC} \sim -\alpha_{N-1}/r^4$, ignored in LDAs and GGAs, provide the dominating intermolecular forces that prevail, for example, in biochemical environments. To address this problem, Grimme14 has proposed an ad hoc empirical procedure that adds a long-range term to standard energy functionals. The added term contains specific parameters, essentially modeling the local polarizability of single atoms or molecular groups, chosen so that a rough description of the van der Waals interaction is retained.

1.5.2 Workfunction

Now, consider the KS potential well in its ground state with $N$ occupied bound orbitals $\phi_\ell$. Generically, every such orbital contributes to the particle density $n(\mathbf{r})$ at a point $\mathbf{r}$ unless it happens that $\phi_\ell$ has a node there: $\phi_\ell(\mathbf{r}) = 0$. This is also true in the asymptotic region far away from the well's center. However, in this region the state $\phi_{\rm HOMO}$ with the largest KS energy [highest occupied molecular (or material) orbital (HOMO)] gives the dominating contribution almost everywhere (i.e., at all points where $|\phi_{\rm HOMO}(\mathbf{r})|^2 > 0$). It is easy to see why this is. In the asymptotic region $v_s(\mathbf{r})$ decays in a power-law manner with the distance $r$ from the well's center (Fig. 1.2). Therefore, the KS equations read

$$ -\frac{\hbar^2}{2m}\,\partial_r^2\,(r\phi_\ell) = \epsilon_\ell\,(r\phi_\ell) \tag{1.25} $$

where $\epsilon_\ell < 0$ denotes the ionization energy of a bound KS state. The solution is

$$ \phi_\ell \sim \frac{1}{r}\, e^{-\sqrt{2m|\epsilon_\ell|/\hbar^2}\; r} \tag{1.26} $$

so that generically the HOMO orbital has the smallest KS energy by modulus, |εHOMO |. At large enough distances, it will give the only relevant contribution. [Exceptions to the rule occur only in the case of a vanishing prefactor not written in Eq. (1.26).] For this reason, the KS energy of the highest occupied molecular level is actually a physical observable; it gives the ionization energy or workfunction (Janak’s theorem15,16 ).
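The dominance of the HOMO in the density tail can be checked numerically from Eq. (1.26). The following is a minimal sketch, assuming units with $\hbar = m = 1$, three hypothetical KS energies, and equal prefactors for all orbitals (the prefactors are not specified in the text); the function names are illustrative only.

```python
import numpy as np

HBAR = 1.0  # assumption: units with hbar = 1
MASS = 1.0  # assumption: units with m = 1


def tail_density(r, eps):
    """|phi|^2 in the asymptotic region, from phi ~ exp(-kappa r) / r, Eq. (1.26)."""
    kappa = np.sqrt(2.0 * MASS * abs(eps)) / HBAR
    return np.exp(-2.0 * kappa * r) / r**2


def homo_fraction(r, energies):
    """Fraction of the total tail density carried by the highest occupied orbital."""
    dens = np.array([tail_density(r, e) for e in energies])  # shape (orbitals, radii)
    homo = np.argmax(energies)  # bound states: the least negative energy is the HOMO
    return dens[homo] / dens.sum(axis=0)


if __name__ == "__main__":
    radii = np.array([2.0, 5.0, 10.0, 20.0])
    ks_energies = [-1.2, -0.6, -0.3]  # hypothetical occupied KS energies
    for r, frac in zip(radii, homo_fraction(radii, ks_energies)):
        print(f"r = {r:5.1f}   HOMO share of density = {frac:.6f}")
```

The share grows monotonically with $r$ and approaches one, because the HOMO has the smallest decay constant.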

[Figure 1.2: plot of $v_s$ versus $r$; labels shown: 0, $W$, $-e^2/r$, $-|\epsilon_{\rm HOMO}|$.]

Fig. 1.2 Effective potential (solid line) near a surface of a simple metal. Surface atoms (dark balls) and the electron liquid (light background) are also indicated.

1.5.3 Derivative Discontinuity

The derivative discontinuity17,18 (DD) is perhaps one of the less intuitive properties that an exact XC potential must exhibit. We discuss it here in some detail, since the fact that local approximations are not capable of capturing it even qualitatively often leads to very important artifacts in the KS spectra which are not a genuine feature of DFT itself but, rather, of the LDA. We will see that the DD is related intimately to the fact that the $N$ (real) particles in a many-body system interact with only $N-1$ partners, while an infinitesimal test (virtual) charge in such a system would interact with $N$ (i.e., all the other particles). Since $v_{XC}[n]$ has access to the total density only, it cannot easily distinguish real and virtual orbitals with their different interacting environments (as HF does). It turns out that the way DFT implements such behavior is via a very sharp (i.e., nonanalytic) dependence of $v_{XC}[n]$ on the particle density $n(\mathbf{r})$.

1.5.3.1 Isolated System

Consider an isolated quantum dot, such as a single atom or a molecule, with $N$ electrons. The corresponding KS system exhibits $N$ KS particles that occupy the $N$ lowest-lying KS states. It is important to recall that each KS particle interacts with the total charge density, $v_{XC}[n_N]$, only, including the density contribution that comes from itself. In this respect, KS particles are fundamentally different from physical particles, which do not interact with themselves, of course. Next, add one additional particle, the excess charge, $\delta N = 1$; to be specific, put it into the lowest unoccupied molecular orbital (LUMO$_N$). The new XC functional of the “anion” will be $v_{XC}[n_{N+1}]$. What are the consequences of charging for the KS energies? Due to the change $n_N \to n_{N+1}$, every original particle interacts with one more charge, $\delta N$, the excess particle in the LUMO$_N$.

Therefore, the energy of every one of the first $N$ orbitals shifts by the amount $U$, which measures the interaction with the excess particle (see Fig. 1.3). Notice also that the energy of the LUMO$_N$ (now, better, HOMO$_{N+1}$) has shifted by $U$ after it was occupied. This is because in KS theory, all orbitals, occupied and unoccupied, are calculated in the same effective potential.


[Figure 1.3: level diagram; labels shown: HOMO$_N$, LUMO$_N$, HOMO$_{N+1}$, and the shift $U$.]

Fig. 1.3 Evolution of the energy of KS frontier orbitals with increasing electron number from $N$ (left) to $N+1$ (right). The KS-LUMO$_N$ jumps upon occupation by an amount $U$. By contrast, in Hartree–Fock (HF) theory the HF-LUMO$_N$ is already calculated anticipating an interaction with one more particle (as compared to HF-HOMO$_N$). Therefore, such a jump does not occur in HF theory.

So far, no peculiarities have appeared. To see that there is indeed something looming on the horizon, now add a fractional excess charge, say an infinitesimally small one, $\delta N \ll 1$, rather than an integer charge. Then the original KS orbitals should remain invariant by definition, since the perturbation is infinitesimally small so that the charge density is not disturbed. But what are the energy and shape of the newly occupied orbital? The salient point is that a real particle does not interact with itself. Therefore, the energy of a physical orbital should not be sensitive to its occupation. Hence, the workfunction of an atom with a fractionally occupied HOMO is the same as that of one with an integer occupation. We conclude that the fractionally occupied orbital must have the energy of HOMO$_{N+1}$, which exceeds the energy of the empty orbital LUMO$_N$ by the amount $U$. So the evolution of the energy of HOMO$_{N+\delta N}$ with $\delta N$ is not smooth; an arbitrarily small change in the density, $\delta N$, must result in a finite reaction of $v_{XC}[n]$ if the particle number, $N$, is near integer values:

$$ \Delta_{XC}(\mathbf{r}) = \left.\frac{\delta E_{XC}[n]}{\delta n(\mathbf{r})}\right|_{N+\delta N} - \left.\frac{\delta E_{XC}[n]}{\delta n(\mathbf{r})}\right|_{N-\delta N} \tag{1.27} $$

This is the (in)famous derivative discontinuity (DD).

1.5.3.2 Coupled Subsystems (Partial Charge Transfer)

To illustrate the importance of the DD, we now give a typical example where fractional charge occurs.


Consider two subsystems, which are partially decoupled in the sense that electronic wavefunctions interact only weakly. Such could be, for example, two functional groups in the same molecule or two neighboring molecules in a biological environment. To be specific, we imagine here the atom $A$ from Section 1.5.3.1 and a second many-body system, a metal surface $S$. Each system has its own workfunction: for example, $W_A^{N+1} > W_S$. Let us bring the atom into the vicinity of the surface, but keep their distance $d$ extremely large. Since only the total particle number $N = N_A + N_S$ is conserved, there will be a net exchange of charge, $\delta N$, from $S$ to $A$. This implies that the atomic orbitals acquire a finite broadening, $\Gamma$, which, however, is small, $\Gamma \ll |W_A^{N+1} - W_S|$, since $d$ is large. In this situation and in the absence of ionization, the net particle flow from $S$ to $A$ is exponentially small. As a consequence, the HOMO$_A^{N+1}$ fills up, but only with a very small fraction of an electron.

To describe correctly how the HOMO$_A^{N+1}$ fills upon approach of the two subsystems, it is crucial that the piece of the XC functional describing $A$ indeed reacts to the flow, so that the LUMO$_A^N$ of the coupled atom is shifted upward against the uncoupled atom by $U$. If $U$ is on the order of the mean level spacing or even bigger, as it tends to be for nanoscopic systems such as atoms and small molecules, this shift is important for understanding charge transfer in DFT. On a qualitative level, the DD suppresses charge fluctuations between weakly coupled subsystems.

Remarks

• The spatial modulation of $v_{XC}$ induced by the DD reflects the differences in the workfunction seen in different charge states of the isolated subsystems before they have been coupled. Therefore, quantitative estimates of the size of the DD-induced modulations can be obtained by calculating workfunctions of the constituent subsystems and their anions/cations.
• The DD enters in a crucial way the DFT-based description of the gate dependence of the charge inside a quantum dot. Without the DD, the width of the Coulomb oscillations is $U$ rather than $\max(\Gamma, T)$ and therefore qualitatively wrong.19
• In LDA-type approximations the DD is missing, since by construction the potentials evolve smoothly when an infinitesimal probing charge is added. Currently, attempts are under way to design orbital-dependent functionals which can take the DD into account (in a spirit similar to HF theory). Kümmel and Kronik1 have compiled a review of the most recent developments in this direction.
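The remark on Coulomb oscillations can be illustrated with a toy model that is not taken from the text: a single spin-degenerate level $\varepsilon$ with charging energy $U$ in contact with a reservoir at chemical potential zero and temperature $T$ (broadening $\Gamma$ is neglected). The "exact" occupation follows from the grand-canonical weights of the four many-body states, so the level effectively jumps by $U$ upon occupation. The "smooth" occupation assumes instead a mean-field level $\varepsilon + Un/2$ that shifts continuously with the occupation $n$, mimicking a potential without DD. All names and parameter values are hypothetical.

```python
import numpy as np


def fermi(x, temp):
    """Fermi function at chemical potential zero."""
    return 1.0 / (1.0 + np.exp(x / temp))


def occupation_exact(eps, u, temp):
    """Grand-canonical occupation of one spin-degenerate level with charging energy u."""
    w1 = 2.0 * np.exp(-eps / temp)             # two singly occupied states
    w2 = np.exp(-(2.0 * eps + u) / temp)       # doubly occupied state
    return (w1 + 2.0 * w2) / (1.0 + w1 + w2)


def occupation_smooth(eps, u, temp, iters=200):
    """Solve n = 2 f(eps + u n / 2) by bisection (level shifts smoothly with n)."""
    lo, hi = 0.0, 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # g(n) = n - 2 f(eps + u n / 2) increases monotonically with n
        if mid - 2.0 * fermi(eps + 0.5 * u * mid, temp) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    u, temp = 1.0, 0.02                        # hypothetical values with temp << u
    for eps in np.linspace(-1.4, 0.4, 10):
        print(f"eps = {eps:+.2f}  exact n = {occupation_exact(eps, u, temp):.4f}"
              f"  smooth n = {occupation_smooth(eps, u, temp):.4f}")
```

In this sketch the exact occupation changes in steps of width of order $T$ separated by a plateau, whereas the smooth model ramps linearly over a gate-voltage window of order $U$.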

1.6 TIME-DEPENDENT DFT

Since the 1980s, attempts have been made to generalize equilibrium theory to time-dependent phenomena. A detailed account of its foundations may be found in recent monographs.20,21 We discuss only those most basic aspects which are important to shed some light on the connection between TDDFT and transport calculations. Consider the time-dependent Schrödinger equation

$$ i\hbar\,\partial_t \Psi(t) = \Big[\hat{T} + \hat{U} + \hat{V}_{\rm ex} + \int d\mathbf{r}\,\phi_{\rm ex}(\mathbf{r}t)\,\hat{n}(\mathbf{r})\Big]\Psi(t) \tag{1.28} $$

where $\hat{T}$ and $\hat{U}$ abbreviate the kinetic and interaction energies given explicitly in Eq. (1.2) and, again, $\hat{V}_{\rm ex} = \int d\mathbf{r}\, v_{\rm ex}(\mathbf{r})\,\hat{n}(\mathbf{r})$ describes the electrostatic environment. The time evolution of all observables is fixed by (1) the time-dependent external potential $\phi_{\rm ex}(\mathbf{r}t)$ and (2) the initial conditions (i.e., the wavefunction $\Psi_i$ at the initial time $t = 0$). This suggests that the response of all those systems, which have been prepared in an identical way and therefore share the same initial state, is dictated by a single scalar field $v_{\rm ex}(t)$. In this respect, the situation is very reminiscent of the equilibrium case. To prove that the density may serve as the fundamental variable for time-dependent phenomena as well, one should demonstrate that an invertible relation analogous to Eq. (1.6) exists, at least in principle, which allows reconstruction of the probing potential $\phi_{\rm ex}(t)$ from knowledge of $n(t)$ (and $\Psi_i$) at all times $t \ge 0$. A proof that this indeed is the case for a wide class of potentials $\phi_{\rm ex}(t)$ was constructed first by Runge and Gross22 and corroborated by many later authors, in particular by van Leeuwen.23

1.6.1 Runge–Gross Theorem

The Runge–Gross theorem emphasizes that the time evolution of the density $n(t)$ is a unique characteristic of the probing potential $\phi_{\rm ex}(t)$: Two probing fields, which differ by more than a homogeneous shift in space, invoke two different density evolutions. This insight is then later used to argue that a density profile, $n(\mathbf{r}t)$, that is driven in one system with interaction $\hat{U}$ by $\phi_{\rm ex}(t)$ can also be seen in another system with a different interaction $\hat{U}'$ after $\phi_{\rm ex}(t)$ has been replaced by the appropriate modulation $\phi'_{\rm ex}(t)$. In particular, $\hat{U}'$ can also be zero, which is the foundation of the time-dependent DFT. We offer a proof of these statements which relies on the familiar fact that a solution of a partial differential equation (here in time) is unique once the initial situation and the evolution law have been specified.

Proof

The strategy is to relate the probing field $\phi_{\rm ex}$ to the second time derivative $\ddot{n}$. For the first time derivative, Heisenberg's equation of motion tells us that

$$ \dot{n}(\mathbf{r}t) = \frac{1}{i\hbar}\,\langle\Psi(t)|[\hat{n}(\mathbf{r}), \hat{T}]|\Psi(t)\rangle \tag{1.29} $$


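because all other terms in $\hat{U}$, $\hat{V}_{\rm ex}$, and $\phi_{\rm ex}$ commute with the density operator $\hat{n}(\mathbf{r})$. By comparing with the continuity equation,

$$ \dot{n}(\mathbf{r}t) + \partial_{\mathbf{r}}\,\langle\Psi(t)|\hat{\mathbf{j}}(\mathbf{r})|\Psi(t)\rangle = 0 \tag{1.30} $$

one may identify the proper definition of a current density operator, $\hat{\mathbf{j}}(\mathbf{r})$. The procedure is familiar from elementary textbooks on quantum mechanics. The second derivative reads

$$ \ddot{n}(\mathbf{r}t) = \Big(\frac{1}{i\hbar}\Big)^2 \langle\Psi(t)|\big[[\hat{n}(\mathbf{r}), \hat{T}], \hat{H}(t)\big]|\Psi(t)\rangle \tag{1.31} $$

where $\hat{H}(t)$ is the Hamiltonian driving the time evolution in Eq. (1.28). This equation is readily recast into the shape

$$ \delta\ddot{n}(\mathbf{r}t) = -\int d\mathbf{r}'\,\Pi(\mathbf{r}t, \mathbf{r}'t)\,\partial_{\mathbf{r}'}\phi_{\rm ex}(\mathbf{r}'t) \tag{1.32} $$

where we have introduced a correlator,

$$ \Pi(\mathbf{r}t, \mathbf{r}'t) = \frac{i}{\hbar}\,\langle\Psi(t)|[\hat{\mathbf{j}}(\mathbf{r}'), \hat{n}(\mathbf{r})]|\Psi(t)\rangle \tag{1.33} $$

and the abbreviation

$$ \delta\ddot{n}(\mathbf{r}t) = \ddot{n}(\mathbf{r}t) + \frac{1}{i\hbar}\,\partial_{\mathbf{r}}\,\langle\Psi(t)|[\hat{\mathbf{j}}(\mathbf{r}), \hat{T} + \hat{U} + \hat{V}_{\rm ex}]|\Psi(t)\rangle \tag{1.34} $$

The second term appearing in this expression describes the internal relaxation of the electron system (“gas” or “liquid”; e.g., due to viscoelastic forces). The equal-time commutator in Eq. (1.33) is closely related to the density matrix; in terms of fermionic field operators, one has

$$ n(\mathbf{r}t, \mathbf{r}'t) = \tfrac{1}{2}\,\langle\Psi(t)|\hat{\psi}^\dagger(\mathbf{r})\hat{\psi}(\mathbf{r}') + \hat{\psi}^\dagger(\mathbf{r}')\hat{\psi}(\mathbf{r})|\Psi(t)\rangle $$

so that

$$ \Pi(\mathbf{r}t, \mathbf{r}'t) = \frac{1}{m}\,\big[n(\mathbf{r}t, \mathbf{r}'t)\,\partial_{\mathbf{r}'}\delta(\mathbf{r}-\mathbf{r}') - \delta(\mathbf{r}-\mathbf{r}')\,\partial_{\mathbf{r}'}n(\mathbf{r}t, \mathbf{r}'t)\big] \tag{1.35} $$

Feeding this expression back into Eq. (1.32) and recalling that $n(\mathbf{r}t, \mathbf{r}t) \equiv n(\mathbf{r}t)$, we recover Newton's second law,

$$ \delta\ddot{n}(\mathbf{r}t) = \frac{1}{m}\,\partial_{\mathbf{r}}\big[n(\mathbf{r}t)\,\partial_{\mathbf{r}}\phi_{\rm ex}(\mathbf{r}t)\big] \tag{1.36} $$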

as we should. Clearly, a spatially homogeneous part of the probing potentials can never be recovered from the density evolution, since such potentials do not


exert a force. By contrast, the inhomogeneous piece can be reconstructed from its accelerating effect on the density.† Technically speaking, Eq. (1.36) represents a linear, first-order (in space) differential equation for the probing field $\phi_{\rm ex}(t)$. Combining with the Schrödinger equation (1.28),

$$ i\hbar\,\partial_t\Psi(t) = \hat{H}(t)\Psi(t) $$

one obtains a system of two linear equations, which are local in time and readily integrated starting from the initial time $t = 0$. This is how, in principle, the probing field may be reconstructed (up to a homogeneous constant), if only $n(\mathbf{r}t)$ is known: $n(\mathbf{r}t) \to \phi_{\rm ex}(\mathbf{r}t)$. Since the other direction, $\phi_{\rm ex}(\mathbf{r}t) \to n(\mathbf{r}t)$, is provided trivially by the Schrödinger equation, we readily conclude that

$$ \phi_{\rm ex}(\mathbf{r}t) \leftrightarrow n(\mathbf{r}t) $$

Extension

So far we have shown how the probing potential $\phi_{\rm ex}(\mathbf{r}t)$ can be calculated if the density evolution and the initial state are given. It is also tacitly understood here that the Hamiltonian (i.e., the dispersion, $\hat{T}$, the electrostatic environment, $\hat{V}_{\rm ex}$, and the interaction, $\hat{U}$) is known. Its structure cannot be reconstructed from $n(\mathbf{r}t)$. In conjunction with Eq. (1.36), this last observation has an important implication. Consider, for example, two systems with two different interactions, $\hat{U}$ and $\hat{U}'$, and two different initial states, $\Psi_i$ and $\Psi'_i$, that both satisfy the condition that their initial densities $n(\mathbf{r}t_i)$, together with the time derivatives $\dot{n}(\mathbf{r}t_i)$, coincide. Under this condition, for both systems an equation of the type (1.36) holds true, since the derivation made no special assumption about the structure of $\hat{U}$. Therefore, for any (reasonable) interaction $\hat{U}$ we can find a time-dependent single-particle potential such that the density of the many-body system follows a predefined time evolution $n(\mathbf{r}t)$. We can even go a step further. In fact, we have shown how to calculate $\hat{U}$-dependent single-particle potentials, $v_s$, such that systems with different interactions can exhibit the same time-dependent density.

This means, in particular, that we can model the time evolution $n(\mathbf{r}t)$ of interacting systems driven by $\phi_{\rm ex}(\mathbf{r}t)$ by studying a reference system of noninteracting particles that experience a particular driving field $v_s(\mathbf{r}t)$. This field can be constructed from the (invertible) mapping

$$ \phi_{\rm ex}(\mathbf{r}t) \;\overset{\hat{U}}{\underset{\text{Eq. (1.28)}}{\longleftrightarrow}}\; n(\mathbf{r}t) \;\overset{\hat{U}=0}{\underset{\text{Eq. (1.36)}}{\longleftrightarrow}}\; v_s(\mathbf{r}t) \tag{1.37} $$

at least in principle. Some of the conclusions at which we have arrived here were presented earlier by van Leeuwen24 based on the same equations but with somewhat different arguments.‡

† This statement is true in those spatial regions where the particle density is nonvanishing, $n(\mathbf{r}) > 0$.
‡ We thank G. Stefanucci for bringing Ref. 24 to our attention and for a related discussion.
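The step from Eq. (1.32) to Eq. (1.36) in the proof can be spelled out as follows. This is a sketch written for one spatial dimension; it assumes that boundary terms vanish in the integration by parts and uses the symmetry of the density matrix defined in the proof, $n(rt, r't) = n(r't, rt)$, which implies $\partial_{r'} n(rt, r't)|_{r'=r} = \tfrac{1}{2}\partial_r n(rt)$.

```latex
\begin{align*}
\delta\ddot n(rt)
 &= -\frac{1}{m}\int dr'\,\Big[ n(rt,r't)\,\partial_{r'}\delta(r-r')
      - \delta(r-r')\,\partial_{r'} n(rt,r't) \Big]\,\partial_{r'}\phi_{\rm ex}(r't) \\
 &= \frac{1}{m}\Big[ \partial_{r'}\big( n(rt,r't)\,\partial_{r'}\phi_{\rm ex}(r't) \big)
      + \partial_{r'} n(rt,r't)\,\partial_{r'}\phi_{\rm ex}(r't) \Big]_{r'=r} \\
 &= \frac{1}{m}\Big[ 2\,\partial_{r'} n(rt,r't)\big|_{r'=r}\,\partial_r\phi_{\rm ex}(rt)
      + n(rt)\,\partial_r^2\phi_{\rm ex}(rt) \Big] \\
 &= \frac{1}{m}\,\partial_r\big[ n(rt)\,\partial_r\phi_{\rm ex}(rt) \big].
\end{align*}
```

The last line is the divergence of the force density $-n\,\partial_r\phi_{\rm ex}$ divided by the mass, with the sign fixed by the continuity equation (1.30).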


Remarks

• By including, in addition to the scalar probing potential $\phi_{\rm ex}(t)$, a vector probing potential, $\mathbf{A}_{\rm ex}(t)$, and keeping the current density explicit as a second collective field, one can generalize the argument presented above to derive a time-dependent current DFT. A proof in the spirit of van Leeuwen24 has been given by Vignale.25
• Exactly the same arguments that have been presented for the case of a single wavefunction $\Psi(t)$ also apply to an ensemble of wavefunctions characterized by a statistical operator $\hat{\rho}$ with only minor modifications: (1) quantum mechanical expectation values turn into ensemble averages, and (2) the Schrödinger equation is replaced by the von Neumann equation

$$ \dot{\hat{\rho}} = \frac{i}{\hbar}\,[\hat{\rho}, \hat{H}(t)] \tag{1.38} $$

This prompts a generalization of TDDFT to finite temperatures.
• In principle, one can in this way also consider systems with a coupling to a heat bath (e.g., bosons). The only essential modification occurs in Newton's law, which now needs to account, for example, for a change in the effective dispersion $1/m$ due to the electron–boson coupling. First attempts to develop a TDDFT for a system coupled to reservoirs have been reported.26–28
• Notice that the appearance of the gradients in Eq. (1.36) is due to particle number conservation. The reason is that symmetric correlators of the type

$$ \langle\Psi(t)|\big[[\hat{n}(\mathbf{r}), \hat{O}], \hat{n}(\mathbf{r}')\big]|\Psi(t)\rangle $$

vanish after integration over one of the spatial coordinates if $\hat{O}$ commutes with the total particle number operator, $[\hat{O}, \hat{N}] = 0$. Indeed, in Eq. (1.31) this is the case, because any term in the Hamiltonian commutes with the total particle number operator $\hat{N}$. Hence, such correlators have vanishing (real space) Fourier components at zero wavenumber, $q = 0$. Assuming analyticity, we can say that the correlator is proportional to the product of two wavenumbers, $q$ and $q'$, and for this reason two gradients appear in Eq. (1.36).
• The validity of time-dependent DFT is based on three elementary observations, all of which relate to the fact that (quantum) mechanics is governed by linear differential equations in time:
1. The total force can be deduced from its action on the particle density.
2. This force can be split into an external and an internal component; the internal component acting at time $t$ can be calculated knowing just $\Psi(t)$.
3. To calculate $\Psi(t)$, only forces acting prior to $t$ and the initial conditions have to be known.
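The third observation is what makes step-by-step time propagation possible. As a minimal sketch, assuming a single particle in one dimension, units with $\hbar = m = 1$, a finite-difference grid, and a Crank–Nicolson step (none of which is prescribed by the text), one can propagate a wave packet in a linear probing potential $\phi_{\rm ex} = -Fx$ and check that the center of the density accelerates as Newton's law demands. The function name `propagate` and all parameter values are hypothetical.

```python
import numpy as np


def propagate(psi0, x, potential, dt, n_steps, hbar=1.0, mass=1.0):
    """Crank-Nicolson propagation of i hbar d/dt psi = [T + v(x, t)] psi on a grid."""
    n = x.size
    dx = x[1] - x[0]
    # Second-order finite-difference Laplacian with hard-wall boundaries.
    lap = (np.diag(np.ones(n - 1), 1) - 2.0 * np.eye(n) + np.diag(np.ones(n - 1), -1)) / dx**2
    kinetic = -(hbar**2 / (2.0 * mass)) * lap
    eye = np.eye(n)
    psi = psi0.astype(complex)
    for k in range(n_steps):
        # The potential is evaluated at the midpoint of the time step.
        h = kinetic + np.diag(potential(x, (k + 0.5) * dt))
        forward = eye - 0.5j * dt / hbar * h
        backward = eye + 0.5j * dt / hbar * h
        psi = np.linalg.solve(backward, forward @ psi)   # unitary Cayley step
    return psi


if __name__ == "__main__":
    x = np.linspace(-15.0, 15.0, 201)
    dx = x[1] - x[0]
    psi0 = np.exp(-x**2 / 2.0)
    psi0 = psi0 / np.sqrt(np.sum(np.abs(psi0)**2) * dx)
    force = 1.0                                          # hypothetical constant force
    psi = propagate(psi0, x, lambda grid, t: -force * grid, dt=0.01, n_steps=100)
    density = np.abs(psi)**2
    print("norm        :", np.sum(density) * dx)
    print("<x> at t = 1:", np.sum(x * density) * dx, "(Newton: F t^2 / 2m = 0.5)")
```

Only the wavefunction at the previous step and the potential during the step enter each update, which is the discrete counterpart of the locality in time used in the proof.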


1.6.2 Dynamical Kohn–Sham Theory

The Runge–Gross theorem and its extensions teach us that there is a reference system of noninteracting particles living in a potential $v_s(\mathbf{r}t)$ [Eq. (1.37)], so that at $t > 0$ its density evolves in time in exactly the same way that it does for the many-body system. The dynamics of this reference system are governed by an effective Schrödinger-type equation, the dynamic Kohn–Sham equations. With the decomposition $v_s = v_{\rm ex} + v_H + v_{XC} + \phi_{\rm ex}$, they read

$$ i\hbar\,\partial_t\phi_\ell(\mathbf{r}t) = [\epsilon(\hat{p}) + v_{\rm ex}(\mathbf{r}) + \phi_{\rm ex}(\mathbf{r}t) + v_H(\mathbf{r}t) + v_{XC}(\mathbf{r}t)]\,\phi_\ell(\mathbf{r}t) \tag{1.39} $$

where $\phi_{\rm ex}(\mathbf{r}t)$ is the time-dependent probing field and

$$ n(\mathbf{r}, t) = \sum_{\ell=1}^{N} |\phi_\ell(\mathbf{r}t)|^2, \qquad v_H[n](\mathbf{r}t) = \int d\mathbf{r}'\, u(\mathbf{r}-\mathbf{r}')\,n(\mathbf{r}'t) \tag{1.40} $$

The functional $v_{XC}[n](\mathbf{r}t)$ is the piece of $v_s[n](\mathbf{r}t)$ that accommodates the interactions beyond the mean field (Hartree) type. It depends on the time-dependent particle density, including its history. Moreover, as a first-order differential equation, Eq. (1.39) needs to be complemented with an initial condition. Part of this is, of course, that $n(\mathbf{r}, t = 0)$ coincides with the density of the many-body system at $t = 0$. However, in addition, the functional $v_{XC}$ will in general also depend on the many-body wavefunction of the initial state, $\Psi_I \equiv \Psi(t = 0)$, which may be, but does not have to be, an equilibrium state.

1.6.3 Linear Density Response

Consider a situation where the many-body system is in thermal equilibrium at times $t < 0$ before the probing field $\phi_{\rm ex}(\mathbf{r}t)$ is switched on. Moreover, assume that the perturbation is going to be very weak, so that the requirements for the application of linear response theory are met. Under this condition, an explicit expression for the XC functional $v_{XC}$ is readily written down. Indeed, there is a matrix $\chi(\mathbf{r}t, \mathbf{r}'t')$, the density susceptibility, which relates the probing field to the (linear) system response, $\Delta n = n - n^{\rm eq}$:

$$ \Delta n(\mathbf{r}t) = \int dt' \int d\mathbf{r}'\, \chi(\mathbf{r}t, \mathbf{r}'t')\,\phi_{\rm ex}(\mathbf{r}'t') \tag{1.41} $$

The matrix $\chi(t, t')$ is an equilibrium correlation function of the system, and it therefore depends only on the time difference $t - t'$. We can use its inverse, $\chi^{-1}$, to define an operator kernel $f_{XC}$ via the decomposition

$$ \chi^{-1} = \chi_{KS}^{-1} - f_H - f_{XC} \tag{1.42} $$


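The operator $\chi_{KS}$ describes the density response of the equilibrium KS system, ignoring the feedback of $\phi_{\rm ex}(t)$ into $v_H$ and $v_{XC}$ [Eq. (1.39)]; explicitly,

$$ \chi_{KS}(\mathbf{r}\mathbf{r}'z) = \sum_{\ell,\ell'} \frac{f(\epsilon_\ell) - f(\epsilon_{\ell'})}{\epsilon_\ell - \epsilon_{\ell'} - z}\; \langle\ell|\hat{n}(\mathbf{r})|\ell'\rangle\langle\ell'|\hat{n}(\mathbf{r}')|\ell\rangle $$

where $|\ell\rangle$, $|\ell'\rangle$ and $\epsilon_\ell$, $\epsilon_{\ell'}$ denote the unperturbed ($\phi_{\rm ex} \equiv 0$) KS orbitals and KS energies and $z = \omega + i\eta$ lies in the complex plane. The feedback is then taken into account by $f_H = u(\mathbf{r}-\mathbf{r}')$ for the Hartree term $v_H$ and by $f_{XC}$ for the exchange correlation potential, $v_{XC}$, in Eq. (1.39). From this point of view it is obvious how to construct the dynamic correction of the XC functional to the equilibrium functional:

$$ v_{XC}[n](\mathbf{r}t) = v_{XC}^{\rm eq}[n^{\rm eq}](\mathbf{r}) + \int dt' \int d\mathbf{r}'\, f_{XC}(\mathbf{r}, \mathbf{r}'; t - t')\,\Delta n(\mathbf{r}'t') \tag{1.43} $$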
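The relation between $\chi_{KS}$ and $\chi$ in Eq. (1.42) can be tried out on a small lattice. The sketch below is a toy under stated assumptions, not a method from the text: spinless particles on a four-site tight-binding chain at zero temperature, real orbitals, and a purely on-site kernel standing in for $f_H + f_{XC}$. Because a uniform potential shift produces no density response, $\chi_{KS}$ is singular on a finite grid; the code therefore uses the algebraically equivalent form $\chi = (1 - \chi_{KS} f)^{-1}\chi_{KS}$ instead of inverting $\chi_{KS}$. All function names are hypothetical.

```python
import numpy as np


def kohn_sham_susceptibility(h, n_occ, z):
    """chi_KS(r, r'; z) from the orbital sum, for a real symmetric lattice Hamiltonian h."""
    eps, phi = np.linalg.eigh(h)                       # columns of phi are the orbitals
    occ = (np.arange(eps.size) < n_occ).astype(float)  # zero-temperature occupations
    chi = np.zeros(h.shape, dtype=complex)
    for l in range(eps.size):
        for lp in range(eps.size):
            weight = (occ[l] - occ[lp]) / (eps[l] - eps[lp] - z)
            overlap = phi[:, l] * phi[:, lp]           # <l|n(r)|l'> for real orbitals
            chi += weight * np.outer(overlap, overlap)
    return chi


def interacting_susceptibility(chi_ks, kernel):
    """Solve chi = chi_KS + chi_KS f chi, equivalent to Eq. (1.42) where inverses exist."""
    identity = np.eye(chi_ks.shape[0])
    return np.linalg.solve(identity - chi_ks @ kernel, chi_ks)


if __name__ == "__main__":
    hopping = -(np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1))  # four-site chain
    z = 0.3 + 0.05j                                    # omega + i eta
    chi_ks = kohn_sham_susceptibility(hopping, n_occ=2, z=z)
    kernel = 0.8 * np.eye(4)                           # hypothetical on-site kernel
    chi = interacting_susceptibility(chi_ks, kernel)
    print("row sums of chi (particle conservation):", np.round(chi.sum(axis=1), 12))
```

The vanishing row sums express that a spatially homogeneous probing potential does not change the density, in line with the discussion of Eq. (1.36).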

Remarks

• We have just constructed a single-particle theory, which has the property that it gives the correct linear dynamical response of the many-body system. The procedure relies on the familiar notions of linear response theory only and does not make reference to the underpinnings of the time-dependent DFT. It is emphasized here that the genuine statements of time-dependent DFT, when applied to systems that are in equilibrium at $t < 0$, reside in the claim that an effective single-particle description exists even outside the linear regime.
• Much of the recent improvement29 in quantitative calculations of optical spectra of single molecules is due to including the terms $f_H$ and in particular $f_{XC}$ in the analysis (in addition to $\chi_{KS}$), which have often been ignored before. In this way the single-particle spectrum of the bare Kohn–Sham system is dressed so as to produce the correct many-body excitations. Often, the success of this procedure is attributed to the time-dependent DFT. This is misleading, however, since it is merely the consequence of a proper application of the standard theory of linear response.
• The most widely used approximation for $f_{XC}$ is the adiabatic LDA (ALDA). It comprises two steps. First is the adiabatic approximation,

$$ f_{XC}^{\rm ad}(\mathbf{r}t, \mathbf{r}'t') = \left.\frac{\partial v_{XC}^{\rm eq}[n](\mathbf{r}')}{\partial n(\mathbf{r})}\right|_{n(\mathbf{r}t)} \delta(t - t') \tag{1.44} $$

This step, by definition, erases all memory effects, so a $\delta$-function in time appears. The complete absence of memory suggests one more approximation, which also eliminates nonlocal correlations in space. This is necessary, because signal propagation occurs with a finite velocity and therefore always has a retardation time. Therefore, density fluctuations in different spatial regions cannot be correlated instantaneously. This aspect is built into

$$ f_{XC}^{\rm ALDA}(\mathbf{r}t, \mathbf{r}'t') = \left.\frac{dv_{XC}^{\rm eq}(n)}{dn}\right|_{n(\mathbf{r}t)} \delta(\mathbf{r} - \mathbf{r}')\,\delta(t - t') \tag{1.45} $$

automatically, where $v_{XC}^{\rm eq}$ in Eq. (1.44) has been replaced by its LDA approximant.
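To make Eq. (1.45) concrete, the sketch below evaluates the ALDA kernel for the exchange part only, assuming the standard Dirac (LDA) exchange potential in Hartree atomic units, $v_x(n) = -(3/\pi)^{1/3} n^{1/3}$; correlation is omitted, and the grid representation of $\delta(\mathbf{r} - \mathbf{r}')$ as a diagonal matrix divided by the grid spacing is an assumption made for illustration. The function names are hypothetical.

```python
import numpy as np


def v_x_lda(n):
    """LDA exchange potential (Hartree atomic units): v_x = -(3/pi)^(1/3) n^(1/3)."""
    return -(3.0 / np.pi) ** (1.0 / 3.0) * n ** (1.0 / 3.0)


def f_x_alda(n):
    """Local ALDA exchange kernel, the derivative dv_x/dn entering Eq. (1.45)."""
    return -(1.0 / 3.0) * (3.0 / np.pi) ** (1.0 / 3.0) * n ** (-2.0 / 3.0)


def alda_kernel_matrix(density, dx):
    """Grid version of f(n(r)) delta(r - r'): a diagonal matrix scaled by 1/dx."""
    return np.diag(f_x_alda(density) / dx)


if __name__ == "__main__":
    x = np.linspace(-3.0, 3.0, 7)
    density = 0.2 * np.exp(-x**2)                  # hypothetical smooth density profile
    kernel = alda_kernel_matrix(density, dx=x[1] - x[0])
    print("diagonal of the ALDA exchange kernel:", np.round(np.diag(kernel), 4))
```

The kernel is instantaneous and strictly local, so it carries neither the memory nor the spatial nonlocality discussed in the remark.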

1.6.4 Time-Dependent Current DFT

The frequency structure of $f_{XC}$ has been worked out in the hydrodynamic regime of small wavenumbers and frequencies by Kohn, Vignale, and co-workers.30,31 It is seen explicitly there that severe memory effects indeed exist due to general conservation laws, which express themselves as singular behavior in correlation functions with respect to wavenumber and frequency. As usual, singularities may be partly eliminated by reformulating in terms of correlation functions of the (generalized) velocities. In the case of the particle density, one introduces the longitudinal current density,

$$ j_\ell(q\omega) = \frac{-i\omega}{q}\,n(q\omega) \tag{1.46} $$

In this way one absorbs factors $q^{-1}$, thus removing nonlocal behavior in the density kernels, which indicates, for example, the slow density relaxation due to particle number conservation. In this spirit the time-dependent current DFT (TDCDFT) was developed.30,31 Apart from the fact that it works with current-density kernels, which are more local than those in TDDFT, TDCDFT offers yet another attraction. In addition to the density [or $j_\ell$, Eq. (1.46)] it also features a second independent collective field, the transverse currents $\mathbf{j}_t$. Therefore, TDCDFT can in principle also describe the orbital response to probing vector potentials (i.e., magnetic fields).

1.6.5 Appendix: Variational Principle

Unlike the case with equilibrium theory, a variational principle is not required in order to derive the dynamical Kohn–Sham equations. Still, it is desirable to have a formulation of TDDFT available in terms of an action, for example, because one may hope to be able to calculate $v_s$ by performing a functional derivative. In this section we investigate the “naive” trial action

$$ A[\tilde{\Psi}] = \int_0^\infty dt\, \langle\tilde{\Psi}(t)|i\hbar\partial_t - \hat{H}(t)|\tilde{\Psi}(t)\rangle = \int_0^\infty dt\, \langle\tilde{\Psi}(t)|i\hbar\partial_t - \hat{T} - \hat{U} - \hat{V}_{\rm ex}|\tilde{\Psi}(t)\rangle - \int_0^\infty dt \int d\mathbf{r}\, \phi_{\rm ex}(\mathbf{r}t)\,\tilde{n}(\mathbf{r}t) \tag{1.47} $$


which is defined over the space $C_I$ of complex fields $\tilde{\Psi}(t)$ with constraints given by (1) the antisymmetry requirement in all $N$ coordinates $\mathbf{r}_1 \cdots \mathbf{r}_N$, and (2) the initial condition $\tilde{\Psi}(0) = \Psi_I$. The solution of the Schrödinger equation for a given external field $\phi_{\rm ex}(\mathbf{r}t)$ is the one element $\Psi(t)$ of $C_I$ that optimizes $A[\tilde{\Psi}]$. In full analogy to the equilibrium case, the functional equation (1.47) can be used as a basis to find an action functional of the density alone by preoptimizing. We first perform a decomposition of $C_I$ into subsets; the elements of each subset have the same evolution $\tilde{n}(\mathbf{r}t)$. Second, we find within each one of these subsets those states $\Psi_{\tilde{n}}(t)$ that are optimal with respect to $A[\tilde{\Psi}]$. These states form the ensemble $M_{\rm preopt}$ of preoptimized fields.† In this way we arrive at an action functional, which is defined on $M_{\rm preopt}$:

$$ S_I[\tilde{n}] = \int_0^\infty dt\, \langle\Psi_{\tilde{n}}(t)|i\hbar\partial_t - \hat{T} - \hat{U}|\Psi_{\tilde{n}}(t)\rangle \tag{1.48} $$

$S_I[\tilde{n}]$ is the dynamical analog of $F$ [Eq. (1.37)]. The Schrödinger time evolution of the density, $n(\mathbf{r}t)$, is the single one that optimizes the full action,

$$ A_I[v_{\rm ex}, \tilde{n}] = S_I[\tilde{n}] - \int_0^\infty dt \int d\mathbf{r}\, [v_{\rm ex}(\mathbf{r}) + \phi_{\rm ex}(\mathbf{r}t)]\,\tilde{n}(\mathbf{r}t) \tag{1.49} $$

The variational space associated with this action is spanned by all those $\tilde{n}(\mathbf{r}t)$ which are $\Psi$-representable: There is at least one element $\tilde{\Psi}(t)$ of $C_I$ such that $\tilde{n}(\mathbf{r}t) = \langle\tilde{\Psi}(t)|\hat{n}(\mathbf{r})|\tilde{\Psi}(t)\rangle$.

Remarks

• Preoptimizing is a constrained minimum search in the subspace of possible wavefunctions that satisfy the initial condition (2). Therefore, each initial condition carries its own functional: $S_I[n]$.
• By construction, the search over $\Psi$-representable densities leads to a variational equation,

$$ \left.\frac{\delta S_I[\tilde{n}]}{\delta\tilde{n}(\mathbf{r}t)}\right|_{\tilde{n}(\mathbf{r}t) = n(\mathbf{r}t)} = \phi_{\rm ex}(\mathbf{r}t) + v_{\rm ex}(\mathbf{r}) \tag{1.50} $$

Its solution, $n(\mathbf{r}t)$, defines the Schrödinger dynamics for the density corresponding to a given probing field $\phi_{\rm ex}(\mathbf{r}t)$. A more explicit expression for the left-hand side may be obtained by taking the time derivative and comparing with Eq. (1.36).

† With every optimum $\Psi_{\tilde{n}}(t)$, the related function $e^{i\varphi(t)}\Psi_{\tilde{n}}(t)$ with $\varphi(0) = 0$ is an optimum, which differs by a time-dependent, spatially homogeneous phase shift. The shift merely reflects the necessity to fix the zero of energy. We identify all those states with one another that differ only by a spatially homogeneous phase $\varphi(t)$.


• Consider generating all possible solutions of Eq. (1.50) by scanning through the space of all allowed (i.e., sufficiently smooth) probing fields $\phi_{\rm ex}(\mathbf{r}t)$. This subset of the $\Psi$-representable variational space is called v-representable. An arbitrary element of the variational space $\tilde{n}(\mathbf{r}t)$ is certainly $\Psi$-representable but may not be v-representable.
• The Schrödinger dynamics is unitary: $N = \int d\mathbf{r}\, n(\mathbf{r}t)$ is an invariant of motion. v-representable states obey unitarity, but $\Psi$-representable states may not.
• By taking a functional derivative,

$$ \frac{\partial}{\partial n(\mathbf{r}t)} \left.\frac{\delta S_I[\tilde{n}]}{\delta\tilde{n}(\mathbf{r}'t')}\right|_{\tilde{n} = n} = \frac{\partial\phi_{\rm ex}(\mathbf{r}'t')}{\partial n(\mathbf{r}t)} = \chi^{-1}(\mathbf{r}'\mathbf{r}, (t' - t)) \tag{1.51} $$

a relation to the reciprocal of the density correlation function is derived. Note that the $\partial$ derivative relates to density differences within the set of all $n(\mathbf{r}t)$ that are v-representable. Our notation emphasizes this difference from the earlier $\delta$ derivative [Eq. (1.50)].
• The right-hand side of Eq. (1.51) is subject to causality; the density $n(\mathbf{r}t)$ indicates changes in the probing potential $\phi_{\rm ex}(\mathbf{r}t')$ only at later times, $t > t'$. Equation (1.51) pays respect to this asymmetry, since the $\partial$ and $\delta$ derivatives must not be interchanged.
• The causality issue noted above makes it very obvious that an action principle should not be based solely on the variational space of v-representable histories $n(\mathbf{r}t)$. This issue has been discussed in detail by van Leeuwen.23,32 In response, this author derives an action $S$ employing the Keldysh formalism. The procedure by itself does not appear to lead to fundamentally new insights. However, compared with the naive starting point [Eq. (1.47)], it has the charming feature that only one (enlarged) variational space for $n(\mathbf{r}t)$ appears. In addition, there is an important conceptual advantage, since within this approach it is clear, in principle, how one can calculate $v_{XC}$ in a systematic perturbation theory.

1.7 TDDFT AND TRANSPORT CALCULATIONS

In this section we discuss the application of TDDFT in the context of charge transport. The focus will be on the dc limit. There are various ways to formulate the transport problem; we shall elaborate on the consequences of the linear response and scattering approaches. We concentrate on the presentation of those elementary facts that are specific to a treatment of transport within the framework of TDDFT. An attempt is made to be as self-contained as possible.

1.7.1 Linear Current Response

One way to establish a current flow in a system that is initially in thermodynamic equilibrium is to switch on an electric field $\mathbf{E}_{\rm ex}(\mathbf{r}t)$. This field is not the one that an electron feels when it accelerates. The accelerating (local) field, $\mathbf{E}$, also contains an induced component,

$$ \mathbf{E} = \mathbf{E}_{\rm ex} + \mathbf{E}_{\rm ind} \tag{1.52} $$

We restrict ourselves to initial situations that respect time-reversal invariance. Then the induced field is generated by a shift of charges, $e\,\Delta n$, under the influence of $\mathbf{E}_{\rm ex}$; we have

$$ \mathbf{E}_{\rm ind}(\mathbf{r}t) = -\partial_{\mathbf{r}} \int d\mathbf{r}'\, u(\mathbf{r}-\mathbf{r}')\,\Delta n(\mathbf{r}'t) \tag{1.53} $$

By definition, the conductivity matrix, $\sigma_{ij}$, relates the total field, $\mathbf{E}$ (rather than the external field alone), to the linear response of the current density by

$$ j_i(\mathbf{r}\omega) = \int d\mathbf{r}'\, \sigma_{ij}(\mathbf{r}, \mathbf{r}', \omega)\,E_j(\mathbf{r}'\omega) \tag{1.54} $$

To make contact with TDDFT, we decompose $\mathbf{j}$ into a longitudinal (curl free) piece, $\mathbf{j}_\ell$, and a transverse (source free) field, $\mathbf{j}_t$.

1.7.1.1 Magnetization (Transverse) Currents

By construction, $\mathbf{j}_t$ incorporates the orbital ring currents that may be understood as a local magnetization density defined via $\mathbf{j}_t(\mathbf{r}t) = c\,\partial_{\mathbf{r}} \times \mathbf{m}(\mathbf{r}t)$, where $c$ denotes the velocity of light. Nonvanishing magnetizations occur in equilibrium systems only in the presence of (spontaneously) broken time-reversal invariance. In these cases, the current DFT (CDFT) has to be employed, where the magnetization is explicitly kept as a second collective field in addition to the particle density. We consider here only systems that are invariant under time reversal. Then, ring currents vanish in the initial state, $\mathbf{j}_t = 0$. In such systems transverse currents can emerge in the presence of external driving fields.† Since they are not accompanied by density fluctuations, TDDFT does not monitor them. This implies, in particular, that the transverse currents of the time-dependent KS system do not, in general, coincide with the physical magnetization currents.

1.7.1.2 Longitudinal Currents

The continuity equation connects $\mathbf{j}_\ell$ with the time dependence of the particle density. Therefore, the physical longitudinal current density and the longitudinal KS currents coincide. Hence, it makes sense to introduce a conductivity of the KS particles via

$$ j_i(\mathbf{r}, \omega) = \int d\mathbf{r}'\, \sigma_{KS,ij}(\mathbf{r}, \mathbf{r}', \omega)\,[\mathbf{E}_{\rm ex} + \mathbf{E}_{\rm ind} + \mathbf{E}_{XC}]_j(\mathbf{r}', \omega) \tag{1.55} $$

† As an example we mention a ring current flowing in a perfectly conducting cylinder that closes around a time-dependent magnetic flux.


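Just like physical particles, KS particles do not react to the external field but, rather, to the local field. This field contains the same Hartree-type term that originates from $v_{\mathrm{H}}$ in Eq. (1.39) and that was already present for the physical particles [Eq. (1.53)]. However, for KS particles not only $v_{\mathrm{H}}$ but also $v_{\mathrm{XC}}$ acquires a correction with a change in the density, since

$$f_{\mathrm{XC}}(\mathbf{r},\mathbf{r}',t-t') = \frac{\partial v_{\mathrm{XC}}[n](\mathbf{r}t)}{\partial n(\mathbf{r}'t')} \tag{1.56}$$

does not vanish [see Eq. (1.43)]. The resulting excess force $\mathbf{E}_{\mathrm{XC}}$ from this contribution reads

$$\mathbf{E}_{\mathrm{XC}}(\mathbf{r}\omega) = -\partial_{\mathbf{r}} \int d\mathbf{r}'\, f_{\mathrm{XC}}(\mathbf{r},\mathbf{r}',\omega)\,\delta n(\mathbf{r}',\omega) \tag{1.57}$$

in full analogy with Eq. (1.53).

Remark

• The exchange–correlation field $\mathbf{E}_{\mathrm{XC}}$ comprises a piece that originates from the adiabatic term given in Eq. (1.44). On the level of the ALDA, we have

$$\mathbf{E}^{\mathrm{ALDA}}_{\mathrm{XC}}(\mathbf{r}\omega) = -\partial_{\mathbf{r}} \left[ \left.\frac{d v^{\mathrm{eq}}_{\mathrm{XC}}(n)}{dn}\right|_{n^{\mathrm{eq}}(\mathbf{r})} \delta n(\mathbf{r},\omega) \right] \tag{1.58}$$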

In addition, $\mathbf{E}_{\mathrm{XC}}$ also comprises a second piece, which brings in the viscoelastic properties of the correlated electron liquid. This piece is usually ignored in TDDFT, because it is very difficult to formulate in a purely density-based language. This is not surprising, because the viscosity is intimately related to shear forces within the liquid that derive from mixed terms $\partial j_x/\partial y$ typical of transverse current patterns. Such forces are more naturally described within time-dependent current DFT [30,31].

1.7.1.3 Quasi-One-Dimensional Wire

We consider as an illustrative example the dc response of a quasi-one-dimensional wire of length $L$ to an electric field in the longitudinal direction, $\mathbf{E}(\mathbf{r}) = \mathbf{e}_z E(z)$. The dc current, $I$, is given by

$$I = \int_0^L dz'\, g_{\mathrm{KS}}(z,z')\,[E_{\mathrm{ex}} + E_{\mathrm{ind}} + E_{\mathrm{XC}}](z') \tag{1.59}$$

$$g_{\mathrm{KS}}(z,z') = \int d\mathbf{r}_\perp\, d\mathbf{r}'_\perp\, \sigma_{\mathrm{KS}}(\mathbf{r},\mathbf{r}') \tag{1.60}$$

where it was assumed that the longitudinal field components have negligible variation in the perpendicular wire direction, $\mathbf{r}_\perp$. Since any configuration of driving fields has an associated dc current $I$ that is the same for all observation points $z$, we conclude that the kernel (1.60) is independent of its arguments and define a KS conductance, $G_{\mathrm{KS}} = g_{\mathrm{KS}}(z,z')$:

$$I = G_{\mathrm{KS}} \int_0^L dz'\, [E_{\mathrm{ex}} + E_{\mathrm{ind}} + E_{\mathrm{XC}}](z') \tag{1.61}$$

The first two terms in the integral add up to the physical voltage drop, V , along the wire. The appearance of the third term indicates that the KS particles experience another voltage, which differs by the amount

$$V_{\mathrm{XC}} = \int_0^L dz'\, E_{\mathrm{XC}}(z') \tag{1.62}$$

Remarks

• The ALDA contribution to the effective driving field is conservative, so it may be written as a gradient of a potential,

$$\int_0^L dz'\, E^{\mathrm{ALDA}}_{\mathrm{XC}}(z') = -\,v^{\mathrm{eq}}_{\mathrm{XC}}(n(z))\Big|_{n(0)}^{n(L)}$$

• As long as observation times are considered such that the effect of the charge transfer on the local charge density is still negligibly small (long-wire limit), we can take $n(L) = n(0)$, so that the ALDA contribution vanishes (for macroscopically homogeneous wires). Nonzero contributions to $V_{\mathrm{XC}}$ come from the viscous term.

The viscosity tends to reduce the response of the electron liquid to external forces. Density functional theories take this behavior into account by "renormalizing" the true forces with $\mathbf{E}_{\mathrm{XC}}$. On a very qualitative level, the viscous forces tend to hinder the current flow through narrow constrictions with "sticky" walls. For this reason, their effect has been investigated in the context of current flows through single molecules [33]. However, as pointed out previously [19] (and as underlies the debate [34,35]), borrowing concepts from hydrodynamics and applying them on the molecular scale is not straightforward. Consider the viscosity, for example. It describes how much momentum is transferred per unit time from a fast-moving stream to a neighboring one that flows in the same direction but with a lower speed. On a microscopic level, momentum exchange is mediated by collisions between the flowing particles. Therefore, a description in terms of the macroscopic parameter "viscosity" can be valid only on length and time scales that substantially exceed the interparticle scattering length and time. Both scales become very large in fermion systems at low temperature, and in particular can easily exceed the dimensions of the atomistic or molecular systems that one would like to treat. Applications in mesoscopic semiconductors enjoy a much better justification.
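The first remark above can be checked numerically. The following is a minimal sketch and not part of the original derivation: it assumes a one-dimensional wire on a uniform grid, uses the LDA exchange potential $v_x(n) = -(3n/\pi)^{1/3}$ (Hartree atomic units) as a stand-in for $v^{\mathrm{eq}}_{\mathrm{XC}}$, and the profiles `n_eq`, `dn_closed`, and `dn_open` are hypothetical. It integrates the linearized ALDA field of Eq. (1.58) along the wire and compares the result with the boundary term.

```python
import numpy as np

def dv_x_lda(n):
    """d v_x / d n for the LDA exchange potential v_x(n) = -(3 n / pi)**(1/3) (atomic units)."""
    return -(1.0 / 3.0) * (3.0 / np.pi) ** (1.0 / 3.0) * n ** (-2.0 / 3.0)

def alda_voltage(z, n_eq, delta_n):
    """Integral of E_XC^ALDA(z) = -d/dz [v_x'(n_eq) * delta_n] over a uniform grid z."""
    h = z[1] - z[0]                         # uniform grid spacing (assumption)
    potential = dv_x_lda(n_eq) * delta_n    # linear-response change of the XC potential
    field = -np.gradient(potential, h)      # one-dimensional version of Eq. (1.58)
    # trapezoidal rule, written out by hand
    return h * (0.5 * field[0] + field[1:-1].sum() + 0.5 * field[-1])

def boundary_term(n_eq, delta_n):
    """Boundary expression -[v_x'(n_eq) * delta_n] taken between z = 0 and z = L."""
    potential = dv_x_lda(n_eq) * delta_n
    return -(potential[-1] - potential[0])

# Hypothetical profiles, chosen only for illustration
L = 10.0
z = np.linspace(0.0, L, 2001)
n_eq = 0.02 + 0.005 * np.cos(2.0 * np.pi * z / L)   # same density at both wire ends
dn_closed = 1e-4 * np.sin(np.pi * z / L) ** 2       # no density change at the ends
dn_open = 1e-4 * z / L                              # density change at z = L only

V_closed = alda_voltage(z, n_eq, dn_closed)
V_open = alda_voltage(z, n_eq, dn_open)
```

Because the ALDA field is a pure gradient, `V_closed` vanishes up to rounding error however the density is redistributed inside the wire, whereas `V_open` is fixed entirely by the density change at the wire ends.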


1.7.2 Scattering Theory

Linear response theory is a framework for calculating the dynamical reaction of any many-body system to linear order in the probing field. Its advantage is that it is completely general. For the same reason, situations are easily identified where alternative formalisms are better adapted and therefore allow a simpler and more transparent analysis. In this section we consider one such example: dc transport through a quantum dot (e.g., a molecule) that has been wired to a left and a right reservoir (see Fig. 1.4). We consider quasi-one-dimensional, well-screened wires, so that particles inside the wire do not interact with each other. The traveling waves along the wire are categorized by scattering states. Each such state is equipped with a continuous longitudinal degree of freedom associated with a wavenumber, $k$, a discrete transverse degree of freedom, the channel index $n$ [which should not be confused with the particle density $n(\mathbf{r})$], and a dispersion relation $E_n(k)$. In this language the current flowing through the wire is described by a superposition of scattering states. How the particles that enter the wire from a reservoir distribute over the available scattering states is dictated by distribution functions, $f_{L,R}(E)$, which are properties solely of the left and right reservoirs. The specifics of the quantum dot enter the construction of the scattering states in terms of the reflection and transmission coefficients, $\tilde{r}_{n'n}(E,E')$ and $\tilde{t}_{n'n}(E,E')$. They describe the probability amplitude for a particle that approaches the quantum dot with energy $E$ in channel $n$ to be either reflected or transmitted into the channel $n'$ with energy $E'$.

Fig. 1.4 (color online) Wiring a molecule to source and drain reservoirs: scattering states description with longitudinal ($k$) and transverse ($n$) quantum numbers.

1.7.2.1 Landauer Theory

The scattering description is particularly convenient if scattering is elastic, so that in each single scattering process the state of the quantum dot is preserved; in particular, each scattering event conserves the energy of the incoming particle, $E = E'$. Under this specific condition, the current is simply given by the Landauer formula,

$$I = \int dE\, T(E)\,[f_L(E) - f_R(E)] \tag{1.63}$$

with a transmission function

$$T(E) = \sum_{nn'} |t_{n'n}(E)|^2 \equiv \mathrm{Tr}\, t t^\dagger \tag{1.64}$$

where $t_{n'n} = \tilde{t}_{n'n}\,(v_{n'}/v_n)^{1/2}$, with $v_n = \partial E_n(k)/\partial k$ being the group velocity of particles traveling in channel $n$ with energy $E$. Here we follow the common convention that each reservoir acts as a thermal bath characterized by a temperature and an electrochemical potential, $\mu_{L,R}$. Then the distributions $f_{L,R}$ are simply Fermi functions with bath parameters.

1.7.2.2 Scattering Theory and TDDFT: Relaxation Problem

Scattering theory describes a nonequilibrium situation that is (quasi-)stationary in time. Even though a current flows, expectation values of local (intensive) operators, in particular of $\hat{\mathbf{j}}(\mathbf{r})$ and $\hat{n}(\mathbf{r})$, are time independent.† By contrast, TDDFT has been developed to describe the time evolution of the density, $n(\mathbf{r}t)$, under the action of a time-dependent potential, $\phi_{\mathrm{ex}}(t)$, away from some initial condition. Both approaches may apply simultaneously if in the course of time evolution a quasistationary nonequilibrium situation develops [36–38]. This can happen if the superposition of $\phi_{\mathrm{ex}}(t)$ and the induced field, $v_{\mathrm{ind}}(t)$, shifts the electrochemical potentials of the two reservoirs against each other:

$$\bigl[v_{\mathrm{ex}}(\mathbf{r}t) + v_{\mathrm{ind}}(\mathbf{r}t)\bigr]_L^R \;\xrightarrow{\;t \gg \tau_{\mathrm{trans}}\;}\; \mu_R - \mu_L \tag{1.65}$$

Then, after waiting a time $\tau_{\mathrm{trans}}$ in which transient dynamic phenomena have died out due to internal relaxation processes, a flow may establish itself that is indeed quasistationary. The current will be monitored properly by TDDFT, since it equals the flux of particles out of one of the reservoirs: $I = \dot{N}_L = -\dot{N}_R$.

† We are assuming here that the reservoirs are ideal. They remain in thermodynamic equilibrium with fixed temperature, chemical potential, and so on, even in the presence of a current flow. In reality, this condition requires a separation of scales: macroscopic reservoirs and microscopic currents.

In this quasistationary regime, by definition the particle and current densities are time independent. One might then suspect that the KS potentials should also have become stationary. This point is perhaps not quite as obvious as it might look. The fact that the density is time independent does not by itself always imply that the Hamiltonian is stationary. For example, homogeneous ring systems that close around time-dependent fluxes can exhibit time-dependent ring currents that leave the density completely invariant. To exclude such artifacts, one can operate with probing fields $\phi_{\mathrm{ex}}(t)$ that couple to the density itself and that become time independent after switching on. Then, at least in the linear response regime, functionals are guaranteed to become time independent, since they derive from linear-response kernels [Eq. (1.43)] (see the remark below).

Once we accept that potentials become stationary, we may define scattering states. However, whether this concept is useful or not depends on whether one can identify the rules pertaining to how the physical current should be constructed from them. Whether or not the rules that work for the truly noninteracting case also apply to the KS scattering states of TDDFT is not a priori clear, however. Indeed, after switching on the bias voltage, $V$, the workfunction of each reservoir shifts against the vacuum level. Apart from this effect, each reservoir stays in complete thermal equilibrium at all times because of its macroscopic size. According to the general principles of DFT outlined in earlier sections, the distribution function of KS particles inside each reservoir should still be given by $f_{L,R}$ with the appropriate chemical potentials $\mu_{L,R}$ and $eV = \mu_L - \mu_R$, as usual. This is the point of view that has been adopted elsewhere [36]. However, this conclusion is not fully consistent with a result that we derived above. As we have seen in the linear response theory, the KS voltage does not in general coincide with the difference of the reservoir workfunctions. This effect has been incorporated [37,38] using Fermi functions with chemical potentials that do not coincide with physical values. Here it remains an open question how this finding could be reconciled with the requirement that each reservoir must stay in its own equilibrium. This apparent inconsistency of DFT-based scattering theory is at the moment seemingly unresolved.

Remarks

• The precise conditions under which a nonequilibrium current flows in a quasistationary manner are very difficult to state. That flow at small enough currents is always quasistationary is supported by linear response analysis. It suggests (1) that linear responses to a sufficiently weak field never mix frequencies (i.e., they simply follow the external stimulus in time). Furthermore, (2) slow-enough driving fields, $\omega\tau_{\mathrm{trans}} \ll 1$, signal the dc behavior. So, combining (1) and (2), one concludes that the linear regime should always be quasistationary.
• A breakdown of the quasistationary regime at sufficiently large currents is suggested by analogy to hydrodynamics as described by the Navier–Stokes equations. Here it is known that a laminar (i.e., quasistationary) regime should be separated from turbulence that develops at larger currents. Since, at least on a qualitative level, the micro- or nanoscopic flow of the electron liquid is also a hydrodynamic phenomenon, a "turbulent" regime could exist here as well. This is also supported by the observation that the TDDFT equations are nonlinear in the density and therefore should host chaotic regimes.

1.8 MODELING RESERVOIRS IN AND OUT OF EQUILIBRIUM

1.8.1 External and Internal Hilbert Spaces

Scattering theory operates in a basis of scattering states; that is, it uses those quantum numbers that reflect the behavior of wavefunctions in the asymptotic (i.e., free of scattering potential) region of space (the external Hilbert space).

Fig. 1.5 (color online) Partitioning of the scattering zone near a molecule or quantum dot underlying the Hamiltonian, Eq. (1.66).

For some applications, this representation is suboptimal. From a computational perspective, this can happen if the Hilbert space of states in the vicinity of the scatterer (the internal or microscopic Hilbert space) is very large or complicated, so that computations do not allow us to keep explicit track of additional degrees of freedom. For example, if one is to describe the current flow through a molecule (molecular electronics) or a quantum dot, one can keep molecular states that incorporate the molecule itself plus the states of a few lead atoms. The entire contact, which encompasses $10^{23}$ atoms, certainly cannot be dealt with on a computer. In more technical terms, we consider a partitioning of the system into left and right asymptotic regions, which are connected by a center region as shown in Fig. 1.5 and detailed in the Hamiltonian

$$H = \begin{pmatrix} H_L & u^\dagger & 0 \\ u & H_C & v \\ 0 & v^\dagger & H_R \end{pmatrix} \tag{1.66}$$

The matrices $H_{L,R}$ comprise all the leads and are macroscopic, whereas $H_C$ describes only the scattering region and therefore should have a microscopic size. If $H_C$ is still very complicated, a formulation is desired that does not refer explicitly to the external, macroscopic Hilbert space (leads and reservoirs) but just focuses on the internal space. Roughly speaking, one would like to convert the trace over the external, channel degrees of freedom [Eq. (1.64)] into another trace, which is only over the internal space of the molecule or quantum dot. A formal way to derive such a representation employs the Keldysh technique, also referred to as the nonequilibrium Green's function method [39]. For noninteracting particles it yields predictions for physical observables that are identical to those of scattering theory. Similar to earlier authors [40], we employ the latter method here to derive the key formulas that underlie a great many applications of ab initio transport calculations for nanostructures.

1.8.2 Born Approximation, $\hat{T}$-Matrix, and Transmission Function

Consider the situation where the left and the right leads are decoupled, $u = v = 0$, for $t < 0$. As before, we denote their eigenstates by a pair of indices, $|nk\rangle$ (left) and $|n'k'\rangle$ (right). When contact is established at $t = 0$, an initial state $|nk\rangle$ becomes unstable. It can decay into the state $|n'k'\rangle$. The rate for this process is given to lowest order by the Born approximation, which is equivalent to the familiar "golden rule" when applied to the scattering problem:

$$\tau^{-1}_{n'n}(E_n(k)) = 2\pi\,\delta\bigl(E_n(k) - E_{n'}(k')\bigr)\,\bigl|\langle n'k'|\hat{T}(E_n(k))|nk\rangle\bigr|^2 \tag{1.67}$$

Here, we have already refined the bare expression by introducing the $\hat{T}$-matrix, which makes it formally exact. How to relate $\hat{T}$ to the original Hamiltonian, (1.66), will be shown in Section 1.8.3. The right-going current injected in this way from a left-hand-side wire state $|nk\rangle$ into the right lead is just

$$\sum_{n'} \int dk'\, \tau^{-1}_{n'n}(E_n(k))\, f_L(E_n(k))\,\bigl[1 - f_R(E_{n'}(k'))\bigr]$$

where $f_L(E_n(k))$ is the occupation of the initial state and $1 - f_R(E_{n'}(k'))$ is a measure of the available space in the final state. The total current is the difference between all right- and left-flowing components:

$$I = e \sum_{nn'} \int dk\, dk'\, \tau^{-1}_{n'n}(E_n(k))\,\bigl[f_L(E_n(k)) - f_R(E_{n'}(k'))\bigr] \tag{1.68}$$

Comparing this expression with the Landauer formula, Eq. (1.63), we conclude that

\begin{align}
T(E) &= \sum_{nn'} \int dk\, dk'\, \delta\bigl(E - E_n(k)\bigr)\, \tau^{-1}_{n'n}(E) \tag{1.69} \\
&= (2\pi)^2 \sum_{nn'} \int dk\, dk'\, \delta\bigl(E - E_n(k)\bigr)\, \delta\bigl(E - E_{n'}(k')\bigr)\, \bigl|\langle n'k'|\hat{T}(E)|nk\rangle\bigr|^2 \tag{1.70} \\
&= \sum_{nn'} \frac{(2\pi)^2\, \bigl|\langle n'k'|\hat{T}(E)|nk\rangle\bigr|^2}{|v_{n'} v_n|} \tag{1.71}
\end{align}

where the last line should be complemented with $E = E_n(k) = E_{n'}(k')$. Keeping Eq. (1.64) in mind, we have the identification (up to a phase factor)

$$t_{n'n} = \frac{2\pi\, \langle n'k'|\hat{T}(E)|nk\rangle}{\sqrt{|v_{n'} v_n|}} \tag{1.72}$$

Equation (1.70) has a compact notation if one introduces separate traces $\mathrm{Tr}_{L,R,C}$ over the Hilbert spaces of $H_{L,R,C}$:

$$T(E) = (2\pi)^2\, \mathrm{Tr}_R\bigl[\delta(E - H_R)\,\hat{T}(E)\,\delta(E - H_L)\,\hat{T}^\dagger(E)\bigr] \tag{1.73}$$
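The step from Eq. (1.70) to Eq. (1.71) is not spelled out. A plausible reading, assuming that in each channel the energy $E$ crosses the dispersion $E_n(k)$ at a single relevant wavenumber $k_0$, is that each wavenumber integral is carried out with the standard rule for a delta function of a function ($F$ is an arbitrary smooth test function introduced here for illustration):

```latex
\int dk\, \delta\bigl(E - E_n(k)\bigr)\, F(k)
  = \frac{F(k_0)}{\left|\partial E_n(k)/\partial k\right|_{k = k_0}}
  = \frac{F(k_0)}{|v_n|},
\qquad E_n(k_0) = E .
```

Applying this once for $k$ and once for $k'$ produces the factor $1/|v_{n'} v_n|$ in Eq. (1.71) and fixes $E = E_n(k) = E_{n'}(k')$, as stated after that equation.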


1.8.3 $\hat{T}$-Matrix and Resolvent Operator

We now specify how to relate $\hat{T}$ to the original Hamiltonian, $H$, detailed in Eq. (1.66). Our derivation starts with the observation that all information about transport across the center region is encoded in the resolvent operator,

$$G(z) = \frac{1}{z - H} \tag{1.74}$$

Retarded (advanced) operators are defined via $G^{\mathrm{ret}}(E) = G(E + i\eta)$ [$G^{\mathrm{av}}(E) = G(E - i\eta)$]; the matrix elements $\langle x|G^{\mathrm{ret,av}}(E)|x'\rangle$ define the Green's functions.† Actually, we care only for transfer processes, so only those matrix elements $\langle n'k'|G(z)|nk\rangle$ are of interest that connect states in the left and right leads. The corresponding off-diagonal sector of the full resolvent matrix may be obtained from an elementary matrix inversion. Its matrix elements have the property

$$\langle n'k'|G(z)|nk\rangle = \langle n'k'|\,g_R(z)\,[v^\dagger G_C(z)\, u]\,g_L(z)\,|nk\rangle \tag{1.75}$$

The matrix product that appears here inside $\langle \cdots \rangle$ has the form familiar from the Dyson equation in $T$-matrix notation [41]:

$$G = G_0 + G_0 \hat{T} G_0 \tag{1.76}$$

where $G_0^{-1} = z - H_0$ defines the bare Green's function in the absence of an interlead coupling, $u, v = 0$. In Eq. (1.75) the first term in the Dyson equation is missing, since the off-diagonal matrix elements that connect different leads vanish if there is no transmission. Thus it is clear that the desired relation is just

$$\hat{T}(z) = v^\dagger G_C(z)\, u \tag{1.77}$$

with the resolvent operators of the central region and the leads

$$G_C(z) = \frac{1}{z - H_C - \Sigma_R - \Sigma_L} \tag{1.78}$$

$$g_{R,L}(z) = \frac{1}{z - H_{R,L}} \tag{1.79}$$

and self-energies

$$\Sigma_L(z) = u\, g_L(z)\, u^\dagger \qquad \Sigma_R(z) = v\, g_R(z)\, v^\dagger \tag{1.80}$$

† The infinitesimal parameter $\eta$ in Eq. (1.74) shifts the poles of $G$ into the complex plane. In this way it is ensured that the density of states, $-(1/\pi)\,\mathrm{Im}\,G(E + i\eta)$, becomes a smooth function of energy. Otherwise, the Hamiltonian (1.66) could not model metallic reservoirs, which by definition have a smooth, nonvanishing density of states near the Fermi energy.
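The "elementary matrix inversion" behind Eqs. (1.75) and (1.77)–(1.80) can be verified numerically. The sketch below is an illustration under stated assumptions rather than part of the derivation: the block sizes, the random Hermitian blocks, and the complex test value `z` are arbitrary choices, and the helper names are hypothetical. It builds the full Hamiltonian in the layout of Eq. (1.66), inverts it directly, and compares the lead-to-lead and center blocks with the partitioned expressions.

```python
import numpy as np

def random_hermitian(n, rng):
    """Random n x n Hermitian matrix (illustrative stand-in for H_L, H_C, H_R)."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return 0.5 * (a + a.conj().T)

def partition_errors(nl=4, nc=3, nr=5, z=0.3 + 0.2j, seed=0):
    """Maximum deviations of the partitioned formulas from the directly inverted resolvent."""
    rng = np.random.default_rng(seed)
    HL, HC, HR = (random_hermitian(n, rng) for n in (nl, nc, nr))
    u = rng.normal(size=(nc, nl)) + 1j * rng.normal(size=(nc, nl))  # left lead -> center
    v = rng.normal(size=(nc, nr)) + 1j * rng.normal(size=(nc, nr))  # right lead -> center

    # Full Hamiltonian in the block layout of Eq. (1.66)
    H = np.block([
        [HL, u.conj().T, np.zeros((nl, nr))],
        [u, HC, v],
        [np.zeros((nr, nl)), v.conj().T, HR],
    ])
    G = np.linalg.inv(z * np.eye(nl + nc + nr) - H)              # Eq. (1.74)

    gL = np.linalg.inv(z * np.eye(nl) - HL)                      # Eq. (1.79)
    gR = np.linalg.inv(z * np.eye(nr) - HR)
    sigma_L = u @ gL @ u.conj().T                                # Eq. (1.80)
    sigma_R = v @ gR @ v.conj().T
    GC = np.linalg.inv(z * np.eye(nc) - HC - sigma_L - sigma_R)  # Eq. (1.78)
    T = v.conj().T @ GC @ u                                      # Eq. (1.77)

    err_lead_to_lead = np.max(np.abs(G[nl + nc:, :nl] - gR @ T @ gL))  # Eq. (1.75)
    err_center = np.max(np.abs(G[nl:nl + nc, nl:nl + nc] - GC))
    return err_lead_to_lead, err_center

err_offdiag, err_center = partition_errors()
```

Both deviations should be at the level of floating-point rounding, which also illustrates that the center block of the full resolvent is exactly $G_C$.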


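Notice that $G_C$ and $\Sigma_{R,L}$ act on the Hilbert space of $H_C$ only, whereas $g_{R,L}$ acts on the spaces of $H_{R,L}$. With this result, we can rewrite Eq. (1.73),

$$T(E) = \mathrm{Tr}_C\bigl[\Gamma_L\, G_C^{\mathrm{ret}}(E)\, \Gamma_R\, G_C^{\mathrm{av}}(E)\bigr] \tag{1.81}$$

where we have introduced

$$\Gamma_L = 2\pi\, u\, \delta(E - H_L)\, u^\dagger \qquad \Gamma_R = 2\pi\, v\, \delta(E - H_R)\, v^\dagger \tag{1.82}$$

so that $\Gamma_{R,L} = -2\,\mathrm{Im}\,\Sigma^{\mathrm{ret}}_{R,L}$.† Equation (1.81) is the desired relation. The leads appear only implicitly in the self-energies, $\Sigma_{L,R}$; they have been "integrated out."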
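To make Eq. (1.81) concrete, here is a minimal sketch for a model that is not discussed in the text: a nearest-neighbor tight-binding chain with hopping $-t$, attached to two semi-infinite chains of the same kind. For this assumed model the retarded lead self-energy is known in closed form inside the band, $\Sigma(E) = \tfrac{1}{2}\bigl(E - i\sqrt{4t^2 - E^2}\bigr)$, and acts only on the end sites of the central region. The function names are hypothetical.

```python
import numpy as np

def lead_self_energy(E, t=1.0):
    """Retarded self-energy of a semi-infinite tight-binding chain, valid for |E| < 2t."""
    return 0.5 * (E - 1j * np.sqrt(4.0 * t**2 - E**2))

def transmission(E, onsite, t=1.0):
    """T(E) = Tr_C[Gamma_L G^ret Gamma_R G^av], Eq. (1.81), for a chain with given onsite energies."""
    onsite = np.asarray(onsite, dtype=complex)
    n = len(onsite)                                   # number of central sites (n >= 2)
    hop = -t * np.ones(n - 1)
    HC = np.diag(onsite) + np.diag(hop, 1) + np.diag(hop, -1)

    sigma_L = np.zeros((n, n), dtype=complex)
    sigma_R = np.zeros((n, n), dtype=complex)
    sigma_L[0, 0] = lead_self_energy(E, t)            # left lead couples to the first site
    sigma_R[-1, -1] = lead_self_energy(E, t)          # right lead couples to the last site

    gamma_L = -2.0 * sigma_L.imag                     # Gamma = -2 Im Sigma^ret
    gamma_R = -2.0 * sigma_R.imag
    G_ret = np.linalg.inv(E * np.eye(n) - HC - sigma_L - sigma_R)   # Eq. (1.78)
    G_av = G_ret.conj().T
    return np.trace(gamma_L @ G_ret @ gamma_R @ G_av).real

# Clean chain: every in-band state is perfectly transmitted
T_clean = transmission(0.3, np.zeros(6))

# Single impurity of strength eps0: compare with the textbook scattering result
eps0, E = 0.8, 0.5
T_imp = transmission(E, [0.0, 0.0, eps0, 0.0, 0.0])
T_imp_exact = 1.0 / (1.0 + eps0**2 / (4.0 - E**2))
```

The central region here contains only a handful of sites, yet the result agrees with the scattering solution of the infinite chain, because the self-energies account for the leads exactly in this model.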

† We have used $\delta(E) = (i/2\pi)[G(E + i\eta) - G(E - i\eta)]$.

Remarks

• Formula (1.81) is most useful whenever (1) one can give recursive algorithms, so that $\Sigma$ can be calculated without having to deal with the full Hilbert space at once, or (2) one can design approximations for $\Sigma$ so that it is not necessary to deal with the Hilbert space of the leads at all. One can argue that simple but highly accurate approximations can indeed be given if $H_C$ is "large enough" (i.e., comprises a sufficiently large part of the leads).

• Almost all scientific works that perform a channel decomposition begin by rewriting Eq. (1.81) in terms of the matrix

$$\tau = \Gamma_L^{1/2}\, G_C\, \Gamma_R^{1/2} \tag{1.83}$$

so that by construction, $T(E) = \mathrm{Tr}_C\, \tau\tau^\dagger$. Authors interpret $\tau$ as a transmission matrix and hence identify the eigenvectors of $\tau\tau^\dagger$ as the transmission channels. We wish to point out here that this widespread practice has to be taken with a grain of salt.

1. The trace in Eq. (1.81) is over the states of the central region and not over the (transverse) Hilbert space of the leads. Ironically, this is why we have derived it in the first place. Therefore, the matrix product in $\mathrm{Tr}_C[\cdots]$ acts on a Hilbert space that is disconnected from the transverse lead space, where the product $tt^\dagger$ that appears in the Landauer formula, Eq. (1.63), lives. Hence, the channels of the leads and the eigenvectors of $\tau\tau^\dagger$ have nothing to do with each other.

2. In particular, $\tau$ should not be confused with the true transfer matrix $t$, given in Eq. (1.72).

3. One of the irritating artifacts that an uncontemplated adoption of this practice may prompt is related to the fact that the size of the central Hilbert space is a matter of convention. For this reason, the common channel analysis produces results that cannot, in general, be model independent. For example, the number of transmitting states (evanescent and propagating ones) may increase with the Hilbert space size. A more detailed discussion of this and related issues can be found elsewhere [42,43].

1.8.4 Nonequilibrium Density Matrix

So far, we have used scattering theory to describe the current flow through a nanojunction or molecule. A very similar analysis allows us to derive an even slightly more general object, the density matrix, $\rho(x,x')$, in the presence of nonequilibrium. It is a matrix representation of the operator

$$\hat{\rho} = \sum_n \int dk\, |nk\rangle_r\, {}_r\langle nk|\, f_L(E_n(k)) + \sum_{n'} \int dk'\, |n'k'\rangle_l\, {}_l\langle n'k'|\, f_R(E_{n'}(k')) \tag{1.84}$$

where $|nk\rangle_r$ ($|n'k'\rangle_l$) denote the right (left)-going states emerging from the left (right) electrodes. The diagonal elements are of particular importance, since they give the particle density, $n(x) = \rho(x,x)$, at any position $x$:

$$\rho(x,x) = \sum_n \int dk\, \bigl|\langle x|nk\rangle_r\bigr|^2 f_L(E_n(k)) + \sum_{n'} \int dk'\, \bigl|\langle x|n'k'\rangle_l\bigr|^2 f_R(E_{n'}(k')) \tag{1.85}$$

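In this section we repeat what we did in the previous section for the Landauer formula, but now for the density matrix. We derive an expression for those elements of $\hat{\rho}$ that belong to the central Hilbert space, in terms of $G_C$ and $\Sigma_{L,R}$ alone. Indeed, consider the expression for the equilibrium density per spin inside the central region:

$$n^{\mathrm{eq}}(x) = \int dE\, \langle x|\delta(E - H)|x\rangle\, f^{\mathrm{eq}}(E) \tag{1.86}$$

Employing a series of standard transformations, which rely upon nothing but the definitions given in the preceding section, we may cast it into a form that is already similar to Eq. (1.85):

\begin{align}
n^{\mathrm{eq}}(x) &= -\frac{1}{2i\pi} \int dE\, \langle x|\,G_C^{\mathrm{ret}}(E) - G_C^{\mathrm{av}}(E)\,|x\rangle\, f^{\mathrm{eq}}(E) \tag{1.87} \\
&= -\frac{1}{\pi} \int dE\, \langle x|\,G_C^{\mathrm{ret}}(E)\, \mathrm{Im}\bigl[\Sigma_L^{\mathrm{ret}} + \Sigma_R^{\mathrm{ret}}\bigr]\, G_C^{\mathrm{av}}(E)\,|x\rangle\, f^{\mathrm{eq}}(E) \tag{1.88} \\
&= \frac{1}{2\pi} \int dE\, \langle x|\,G_C^{\mathrm{ret}}(E)\, [\Gamma_L + \Gamma_R]\, G_C^{\mathrm{av}}(E)\,|x\rangle\, f^{\mathrm{eq}}(E) \tag{1.89} \\
&= \sum_n \int dk\, \bigl|\langle x|G_C^{\mathrm{ret}}(E_n(k))\, u\,|nk\rangle\bigr|^2 f^{\mathrm{eq}}(E_n(k)) + \sum_{n'} \int dk'\, \bigl|\langle x|G_C^{\mathrm{ret}}(E_{n'}(k'))\, v\,|n'k'\rangle\bigr|^2 f^{\mathrm{eq}}(E_{n'}(k')) \tag{1.90}
\end{align}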
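The step from Eq. (1.87) to Eq. (1.89) rests on the operator identity $G_C^{\mathrm{ret}}(\Gamma_L + \Gamma_R)G_C^{\mathrm{av}} = i\,(G_C^{\mathrm{ret}} - G_C^{\mathrm{av}})$, which follows from $(G_C^{\mathrm{av}})^{-1} - (G_C^{\mathrm{ret}})^{-1} = \Sigma^{\mathrm{ret}} - \Sigma^{\mathrm{av}}$. The sketch below checks it numerically under assumptions made only for illustration: small random matrices stand in for $H_{L,C,R}$, $u$, and $v$, the leads are finite with a finite broadening `eta`, and $\Gamma$ is taken as $i(\Sigma - \Sigma^\dagger)$, which is the matrix form of $-2\,\mathrm{Im}\,\Sigma^{\mathrm{ret}}$.

```python
import numpy as np

def spectral_identity_error(E=0.1, eta=0.05, nc=3, nlead=6, seed=1):
    """Max deviation of G^ret (Gamma_L + Gamma_R) G^av from i (G^ret - G^av)."""
    rng = np.random.default_rng(seed)

    def hermitian(n):
        a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
        return 0.5 * (a + a.conj().T)

    HL, HC, HR = hermitian(nlead), hermitian(nc), hermitian(nlead)
    u = rng.normal(size=(nc, nlead))                  # lead-center couplings (illustrative)
    v = rng.normal(size=(nc, nlead))

    gL = np.linalg.inv((E + 1j * eta) * np.eye(nlead) - HL)   # retarded lead resolvents
    gR = np.linalg.inv((E + 1j * eta) * np.eye(nlead) - HR)
    sigma = u @ gL @ u.conj().T + v @ gR @ v.conj().T         # Sigma_L + Sigma_R, Eq. (1.80)
    gamma = 1j * (sigma - sigma.conj().T)                     # Gamma_L + Gamma_R

    G_ret = np.linalg.inv(E * np.eye(nc) - HC - sigma)        # Eq. (1.78)
    G_av = G_ret.conj().T
    lhs = G_ret @ gamma @ G_av
    rhs = 1j * (G_ret - G_av)
    return np.max(np.abs(lhs - rhs))

identity_error = spectral_identity_error()
```

With this identity, the prefactor $-1/(2i\pi)$ of Eq. (1.87) turns into the $1/(2\pi)$ of Eq. (1.89).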

The states $|nk\rangle$ ($|n'k'\rangle$) denote the eigenstates of the left (right) lead in the absence of a coupling, $u, v = 0$. Comparing Eq. (1.90) with the equilibrium limit of Eq. (1.85), $f^{\mathrm{eq}} = f_L = f_R$, suggests the identification

$$\langle x|nk\rangle_r = \langle x|\,G_C^{\mathrm{ret}}(E_n(k))\, u\,|nk\rangle \tag{1.91}$$

$$\langle x|n'k'\rangle_l = \langle x|\,G_C^{\mathrm{ret}}(E_{n'}(k'))\, v\,|n'k'\rangle \tag{1.92}$$

for a point $x$ inside the central region. The educated reader may recognize the relations above as an incarnation of the well-known Lippmann–Schwinger equation. Thus equipped, we rephrase the original expression for the density operator in the following way:

$$\hat{\rho} = \int \frac{dE}{2\pi}\, \bigl[G_C^{\mathrm{ret}}\, \Gamma_L\, G_C^{\mathrm{av}}\, f_L(E) + G_C^{\mathrm{ret}}\, \Gamma_R\, G_C^{\mathrm{av}}\, f_R(E)\bigr] \tag{1.93}$$

which is valid inside the central region (matrix notation suppresses the energy argument, $E$). This equation is the main result of the present section. Needless to say, by differentiating off-diagonal elements of $\hat{\rho}$, the current density and therefore also the Landauer formula may be rederived.

1.8.5 Comment on Applications

By far the largest fraction of the vast body of DFT-based transport literature employs scattering theory in the formulation of the preceding section. The logic is that one solves the KS equations (1.39) with a particle density, $n(x)$, which is calculated from the nonequilibrium density operator (1.93), which also takes the reservoirs into account. The KS Hamiltonian is then used, in turn, to construct the central Green's function and finally, also, the transmission function, (1.81), and the current, (1.63). In this final section we comment briefly on several general aspects of this research. Practical aspects of applications in spintronics and molecular electronics are highlighted in Chapters 18 and 19, respectively.

Transmission functions, $T(E)$, are of interest mostly near the Fermi energy, $E_F$, since one has for the zero-bias conductance, $G = T(E_F)$. In this region, $T(E)$ usually is dominated by the resonances originating from just two (transport) frontier orbitals. Calculations should yield the positions $E_{\mathrm{Ho,Lu}}$ and the broadenings $\Gamma_{\mathrm{Ho,Lu}}$ of the resonances. In the case of resonances that do not interfere with others (isolated resonances), these parameters may be extracted by simply fitting a Breit–Wigner (Lorentzian) lineshape to $T(E)$. Sometimes more complicated situations exist, where electrons can flow through the molecule via different paths that interfere with each other [44]. In this case the lineshape is not just a Lorentzian, but may, for example, be of the Fano type. This structure, too, is characterized by very few parameters, which may be extracted from a suitable fit. The numerical accuracy of both types of parameters, resonance positions and line widths, that one can get from the DFT-transport calculation depends, of course, on the approximations made in the underlying exchange–correlation (XC) functional. In transport calculations additional complications arise due to the presence of the electrodes (or reservoirs), which make it necessary to find a good approximation for the self-energies $\Sigma_{R,L}$.

1.8.5.1 Self-Energies $\Sigma_{R,L}$

The self-energies are crucial for the calculation of the resonance width. This is obvious, since without them, $\Sigma_{R,L} = 0$, there would be no level broadening at all: Each transport resonance would be arbitrarily sharp. Therefore, care is needed with the construction of these objects. However, quite in contrast to a widespread perception in the scientific community, it is not necessary (and in practice not even always helpful) to perform an exact construction of $\Sigma_{R,L}$ along the lines of Eq. (1.80). This point has been made earlier [19,45,46] and we rephrase it here. Consider the KS equation of the central region in the presence of a coupling to the electrodes:

$$[E - H_C - \Sigma_L(E) - \Sigma_R(E)]\,|\psi\rangle = 0 \tag{1.94}$$

The Hermitian sector of $\Sigma$ adds to the Hamiltonian $H_C$ and therefore shifts the bare eigenvalues of $H_C$. The anti-Hermitian sector, $\Gamma_{L,R}$, leads to a violation of the continuity equation; it shifts eigenvalues away from the real axis into the complex plane, thus providing a finite lifetime. The physics that is incorporated in this way is transparent: Any traveling wave that moves toward the interface between the central region and the left and right electrodes will just penetrate it without being backscattered. From the viewpoint of the central system, the interface is absorbing. It has been well known since the early days of nuclear physics that absorbing boundaries are properly modeled by optical (i.e., non-Hermitian) potentials. This is exactly what the self-energy does. With this picture in mind, it is obvious that any interface modeling of $\Sigma_{L,R}$ with the property that incident waves are fully absorbed will give the same values for positions and lifetimes of transport resonances. Therefore, as long as the boundary of the central region does not itself hinder the current flow, a modeling of $\Sigma$ in terms of an optical potential will give accurate results. All the material specifics that are contained in the exact $\Sigma_{L,R}$ matrices can readily be ignored. To meet the condition for simple modeling, in practical terms the central region should comprise pieces of the electrodes that are large enough. Then complete absorption may be achieved with a leakage rate per interface site, $\eta$, that is still small enough to prevent feedback into the resonance energies.
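The link between an anti-Hermitian self-energy, a complex eigenvalue, and a Breit–Wigner resonance can be illustrated with the simplest possible case. The sketch below is an assumption-laden toy model, not the construction used in any particular transport code: the central region is a single level `eps0`, and each electrode is represented by an energy-independent optical potential $\Sigma_{L,R} = -i\gamma_{L,R}/2$ (the values of `eps0`, `g_L`, and `g_R` are arbitrary).

```python
import numpy as np

def transmission(E, HC, sigma_L, sigma_R):
    """T(E) = Tr_C[Gamma_L G^ret Gamma_R G^av], Eq. (1.81), for energy-independent self-energies."""
    n = HC.shape[0]
    gamma_L = 1j * (sigma_L - sigma_L.conj().T)     # anti-Hermitian sector of Sigma_L
    gamma_R = 1j * (sigma_R - sigma_R.conj().T)
    G_ret = np.linalg.inv(E * np.eye(n) - HC - sigma_L - sigma_R)
    return np.trace(gamma_L @ G_ret @ gamma_R @ G_ret.conj().T).real

# Toy model: one level with purely absorbing (optical-potential) self-energies
eps0, g_L, g_R = 0.2, 0.03, 0.05
HC = np.array([[eps0]], dtype=complex)
sigma_L = np.array([[-0.5j * g_L]])
sigma_R = np.array([[-0.5j * g_R]])

# Complex eigenvalue of H_C + Sigma, cf. Eq. (1.94): real part = position, -2 * imag part = width
pole = np.linalg.eigvals(HC + sigma_L + sigma_R)[0]
position, width = pole.real, -2.0 * pole.imag

# Compare Eq. (1.81) with the Breit-Wigner lineshape built from the pole parameters
energies = np.linspace(eps0 - 0.5, eps0 + 0.5, 11)
T_numeric = np.array([transmission(E, HC, sigma_L, sigma_R) for E in energies])
T_lorentz = g_L * g_R / ((energies - eps0) ** 2 + (width / 2.0) ** 2)
```

In this toy model the resonance position and width read off from the complex eigenvalue reproduce the Lorentzian exactly; for larger central regions the same eigenvalue analysis gives a first estimate of the parameters one would otherwise obtain from a fit to $T(E)$.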


1.8.5.2 System-Size Dependency: Separation of Scales

To the best of our knowledge, all prominent DFT-based transport codes work with approximated self-energies. Unfortunately, a systematic check of quantitative results on the approximation scheme used is still not a standard procedure. If optical potentials with strength $\eta$ are employed, the transmission resonances, , that we would ultimately like to calculate should be invariant under a change of $\eta$ by a factor of 10 or more. The existence of such an invariance is a consequence of a separation of scales. The transport resonances reflect the lifetime of a state located in that subregion ("bottleneck") of the central region, which determines the resistance (see Fig. 1.5). Once the particle has escaped this region, in reality it vanishes into the leads once and for all. To capture this aspect, the modeling parameter $\eta$ just has to be big enough to prevent the model particle from returning to the bottleneck. If the size of the central region is taken sufficiently large, much larger than the bottleneck, one can allow for η , and a separation of scales has been achieved.

Remark

• Self-energies, $\Sigma$, offer a rich toolbox for including effects of reservoirs with precision without keeping a large number of degrees of freedom explicit in the calculations. Recent applications of the principle describe systems with an inhomogeneous magnetization [47]. Also in this context, working with model self-energies rather than (formally) exact expressions proves reasonably accurate and highly useful [48].
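The last step of the workflow described in Section 1.8.5, from a transmission function to a current via Eq. (1.63), can be sketched as follows. This is an illustration under stated assumptions: energies are in arbitrary units, the prefactor convention of Eq. (1.63) is used (so that the zero-bias conductance is $G = T(E_F)$), the integral is evaluated with a plain trapezoidal rule on a finite window, and the two transmission functions are hypothetical examples.

```python
import numpy as np

def fermi(E, mu, kT):
    """Fermi function 1 / (exp((E - mu) / kT) + 1), in a numerically stable tanh form."""
    return 0.5 * (1.0 - np.tanh((E - mu) / (2.0 * kT)))

def landauer_current(transmission, mu_L, mu_R, kT, E_min, E_max, n_grid=20001):
    """I = integral dE T(E) [f_L(E) - f_R(E)], Eq. (1.63), by the trapezoidal rule."""
    E = np.linspace(E_min, E_max, n_grid)
    integrand = transmission(E) * (fermi(E, mu_L, kT) - fermi(E, mu_R, kT))
    return 0.5 * np.sum((integrand[1:] + integrand[:-1]) * np.diff(E))

def flat(E):
    """Energy-independent transmission T = 0.5 (illustrative)."""
    return 0.5 * np.ones_like(E)

def lorentzian(E, eps0=0.1, gamma=0.2):
    """Isolated Breit-Wigner resonance with symmetric coupling (illustrative)."""
    return (gamma / 2.0) ** 2 / ((E - eps0) ** 2 + (gamma / 2.0) ** 2)

# Flat transmission: the current is T times the difference of the chemical potentials
I_flat = landauer_current(flat, mu_L=0.05, mu_R=-0.05, kT=0.01, E_min=-1.0, E_max=1.0)

# Small bias and low temperature: I / V approaches T(E_F), here with E_F = 0
V = 1e-3
I_small = landauer_current(lorentzian, mu_L=V / 2, mu_R=-V / 2, kT=1e-3, E_min=-1.0, E_max=1.0)
G_numeric = I_small / V
```

For the flat case the result is independent of temperature, while for the resonance the small residual difference between `G_numeric` and `lorentzian(0.0)` reflects thermal smearing over the curvature of $T(E)$.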

Acknowledgments

In this chapter I give a pedagogical introduction to the field, which has grown partly out of several lectures given at Karlsruhe University in recent years. This explicit style is at the expense of accounting for a great many interesting developments pursued by many of my colleagues. Therefore, the chapter cannot serve as—and certainly has not been meant to be—a fair and proper review of the field. Finally, it is a pleasure to thank numerous colleagues for generously sharing their insights with me. Most notably, I am indebted to Alexei Bagrets, Kieron Burke, Peter Schmitteckert, and Gianluca Stefanucci for useful discussions that took place over recent years. Also, I am grateful to Alexei Bagrets and Soumya Bera for critical proofreading of the manuscript.

REFERENCES 1. 2. 3. 4. 5.

K¨ummel, A.; Kronik, L. Rev. Mod. Phys. 2008, 80 , 3. Neese, F. Coord. Chem. Rev . 2009, 253 , 526–563. Hohenberg, P.; Kohn, W. Phys. Rev . 1964, 136 , 864. Levy, M. Proc. Natl. Acad. Sci. USA 1979, 76 , 6062. Gunnarsson, O.; Lundqvist, B. I. Phys. Rev. B 1976, 13 , 4274; ibid., 1977, 15 , 6006.

REFERENCES

43

6. Mahan, G. D. Many Particle Physics, Plenum Press, New York, 2000. 7. Parr, R.; Yang, W. Density Functional Theory of Atoms and Molecules, Oxford University Press, New York, 1989. 8. Igor, V.; Ovchinnikov,; Neuhauser, D. J. Chem. Phys. 2006, 124 , 024105. 9. Kohn, W.; Sham, L. J. Phys. Rev . 1965, 140 , 1133. 10. Ullrich, C. A.; Kohn, W. Phys. Rev. Lett. 2002, 89 , 156401–1. 11. Chayes, J. T.; Chayes, L.; Ruskai, M. B. J. Stat. Phys. 1985, 38 , 497. 12. Ho, K. M.; Schmalian, J.; Wang, C. Z. Phys. Rev. B 2008, 77 , 073101. 13. Burke, K. The ABC of DFT, chem.ps.uci.edu, 2007. 14. Grimme, S. J. Comput. Chem. 2004, 15 , 1463. 15. Janak, J. F. Phys. Rev. B 1978, 18 , 7165–7168. 16. Almbladh, C.-O.; von Barth, U. Phys. Rev. B 1985, 31 , 3231. 17. Perdew, J. P.; Parr, R. G.; Levy, M.; Balduz, J. L. Phys. Rev. Lett. 1982, 49 , 1691. 18. Perdew, J. P.; Levy, M. Phys. Rev. Lett. 1983, 51 , 1884. 19. Koentopp, M.; Burke, K.; Evers, F. Phys. Rev. B 2006, 73 , 121403. 20. Dreizler, R. M.; Gross, E. K. U. Density Functional Theory, Springer-Verlag, Berlin, 1990. 21. Marques, M. A. L.; Ullrich, C. A.; Nogueira, F.; Rubio, A.; Burke, K.; Gross, E. K. U., Eds. Time-Dependent Density-Functional Theory, Springer Lecture Notes in Physics, Vol. 706. Springer-Verlag, Berlin, 2006. 22. Runge, E.; Gross, E. K. U. Phys. Rev. Lett. 1984, 52 , 997. 23. van Leeuwen, R. Phys. Rev. Lett. 1998, 80 , 1280. 24. van Leeuwen, R. Phys. Rev. Lett. 1999, 82 , 3863. 25. Vignale, G. Phys. Rev. B 2004, 70 , 201102. 26. Burke, K.; Car, R.; Gebauer, R. Phys. Rev. Lett. 2005, 94 , 146803. 27. D’Agosta, R.; Di Ventra, M. Phys. Rev. B 2008, 78 , 165105. 28. Hyldgaard, P. Phys. Rev. B 2008, 78 , 165109. 29. Onida, G.; Reining, L.; Rubio, A. Rev. Mod. Phys. 2002, 74 , 601–659. 30. Vignale, G.; Kohn, W. Phys. Rev. Lett. 1996, 77 , 2037–2040. 31. Vignale, G.; Ullrich, C. A.; Conti, S. Phys. Rev. Lett. 1997, 79 , 4878. 32. van Leeuwen, R. Int. J. Mod. Phys. B 2001, 15 , 1969. 33. 
Sai, N.; Zwolak, M.; Vignale, G.; Di Ventra, M. Phys. Rev. Lett. 2005, 94 , 186810. 34. Sai, N.; Zwolak, M.; Vignale, G.; Di Ventra, M. Phys. Rev. Lett. 2007, 98 , 259702. 35. Jung, J.; Bokes, P.; Godby, R. W. Phys. Rev. Lett. 2007, 98 , 259701. 36. Evers, F.; Weigend, F.; Koentopp, M. Phys. Rev. B 2004, 69 , 235411. 37. Stefanucci, G.; Almbladh, C.-O. Europhys. Lett. 2004, 67 , 14. 38. Stefanucci, G.; Almbladh, C.-O. Phys. Rev. B 2004, 69 , 195318. 39. Meir, Y.; Wingreen, N. S. Phys. Rev. Lett. 1992, 68 , 2512. 40. Khomyakov, P. A.; Brocks, G.; Karpan, V.; Zwierzycki, M.; Kelly, P. J. Phys. Rev. B 2005, 72 , 035450. 41. Ferry, D. K.; Goodnick, S. M. Transport in Nanostructures, Cambridge Studies in Semiconductor Physics and Microelectronic Engineering, Cambridge University Press, New York, 1997.


42. Bagrets, A.; Papanikolaou, N.; Mertig, I. Phys. Rev. B 2007, 75 , 235448. 43. Solomon, G. C.; Gagliardi, A.; Pecchia, A.; Frauenheim, T.; Di Carlo, A.; Reimers, J. R.; Hush, N. S. Nano Lett. 2006, 6 , 2431–2437. 44. Cardamone, D. M.; Stafford, C. A.; Mazumbdar, S. Nano Lett. 2006, 6 , 2422. 45. Evers, F.; Arnold, A. Molecular conductance from ab initio calculations: self energies from absorbing boundary conditions, arXiv:cond-mat/0611401, Lecture Notes, Summerschool on Nano-Electronics, Bad Herrenalb, Germany, 2005. 46. Arnold, A.; Weigend, F.; Evers, F. J. Chem. Phys. 2007, 126 , 174101. 47. Jacob, D.; Rossier, J. F.; Palacios, J. J. Phys. Rev. B 2005, 71 , 220403. 48. Bagrets, A. Unpublished, 2009.

2 SIESTA: A Linear-Scaling Method for Density Functional Calculations

JULIAN D. GALE
Department of Chemistry, Curtin University, Perth, Australia

This chapter provides a practical overview of the basic theory required to perform density functional calculations on nanoparticles, materials, and large biological systems using the SIESTA program. This program uses discrete atomic basis sets to enable rapid interpretation of results in terms of chemical models, a feature key to many applications, including an understanding of transport properties of materials. It achieves linear scaling (the computer resources required scale linearly with system size for very large systems) using basis set confinement techniques. Many examples of the use of SIESTA are provided in Chapter 11.

2.1 INTRODUCTION

The past two decades have seen the rise of density functional theory (DFT) from a technique largely confined to solid-state physics to arguably the most popular quantum mechanical technique, embraced by chemists, geologists, and most scientific disciplines concerned with the atomic structure of nature. This popularity has arisen largely from its ability to provide a reasonable-quality description of properties at a relatively modest computational cost in comparison to traditional wavefunction theory–based approaches. Whereas DFT in its purest sense is an exact theory,1 the practical realization through modern functionals is recognized as having several limitations, including the lack of a pathway for continuous improvement of the answers in the manner possible within post-Hartree–Fock techniques. Despite such caveats, there are many systems for which density functional theory is a valuable and worthwhile approach.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


In this chapter we do not set out to critique the use of DFT, but assume that the reader has already studied Chapter 1, which covers this approach to electronic structure theory, and determined that it represents an appropriate choice to solve the problem of interest. Instead, we focus on another aspect of DFT that has led to its widespread use: the plurality of numerical implementations of the method and the availability of efficient software. Because of the focus on the density for the exchange and correlation potentials, which typically represent the most complex contributions to calculate within electronic structure theory, Kohn–Sham DFT has lent itself to a far greater diversity of practical calculation schemes. While wavefunction theory (WFT) has been dominated by the use of Gaussian basis sets to expand the eigenstates (see, e.g., Chapter 5), DFT has seen a plethora of choices, including plane waves (see Chapter 3), Slater orbitals (see Chapter 15), Gaussians, grids, finite elements, and wavelets, to name but a few.

Nanoscience has pushed experiments to the lower limits of the length scale for the fabrication of materials. Conversely, for computational methods it has led to a push toward calculations with a greater number of atoms than ever before. Given that many nanoscale phenomena are related to the effects of quantum confinement on electronic properties, this has, in particular, driven the desire to perform large-scale theoretical studies based on electronic structure techniques rather than force-field approaches. Although simplified quantum mechanical approaches, such as tight binding (see Chapter 10) or semiempirical (see Chapter 8) methods, have a valuable role to play in this realm, ideally it would be possible to use first-principles methods to ensure the reliability of results.

In light of the above and the fact that there are many different numerical schemes for density functional theory, it is possible to reconsider the choice of algorithms and ask what represents the optimal implementation for large systems. Although there will never be an unambiguous answer to this question, we can define the key characteristics of any such method. First, the method must scale with the lowest power possible of the size of the system, typically related to the number of basis functions required, $N$, or number of atoms. Second, the cost per basis function, which represents the prefactor, or slope of the cost versus system size, must be as low as possible.

If we consider Hartree–Fock or Kohn–Sham theory specifically, there are two main steps in a calculation: the construction of the Hamiltonian matrix and determination of the eigenstates at a self-consistent field. For small system sizes in a localized basis set, such as Gaussians, the first step is the dominant expense and scales formally as $N^4$, since the Hartree energy depends on the interaction of two density contributions and therefore up to four different basis functions. However, this can be reduced to $N^3$ for Kohn–Sham theory via density fitting.2 In practice, for large systems, the scaling is typically reduced through neglect of terms against a threshold. As system size increases, the solution for the eigenstates becomes the major cost since they must be orthogonalized with respect to each other, which leads to a scaling of $N^3$.

The key to achieving improved scaling is locality, which is usually considered to be in real space. For example, if an atom were only to interact with other particles out to a given radius, then once the dimensions of the system exceed this cutoff value, the number of interactions per atom remains constant regardless of increasing size. In other words, the total cost will scale linearly with the dimensions of the system. This will be equally true regardless of whether the system is a finite nanoparticle or a periodic solid. This raises the question of whether it is likely to be feasible in electronic structure theory to confine interactions to within a finite range. Given the central role of the long-range Coulomb potential in the Hamiltonian, at first sight it might be thought that this would not be possible. However, through screening, it turns out that even such interactions lead to quite short-ranged behavior in real space, leading to the near-sightedness principle.3 For example, in an insulating or semiconducting material it is known that states decay exponentially with distance, where the rate of decay depends on the bandgap of the substance. Even metals, where there is no gap, exhibit power-law decay behavior.

Provided that it is possible to reformulate density functional theory in a way which ensures that both the generation and solution of the Kohn–Sham equations exploit the inherent locality that exists in many systems, it should be possible to achieve linear scaling of the computational expense for large enough problems. The challenge then becomes to lower the prefactor (i.e., cost per atom) sufficiently that the crossover point at which such algorithms become more efficient than traditional ones is as low as possible. Linear-scaling methods will only be of value if this occurs for numbers of atoms that are currently accessible and of interest for scientific study. Although the specific crossover point can vary strongly according to the details of the method, linear-scaling methods typically become competitive with established algorithms for a few hundred atoms in density functional theory.
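
The two ideas in the preceding discussion, a constant number of interactions per atom once the system exceeds the cutoff, and a crossover between cubic and linear cost, can be illustrated with a minimal sketch. The cost models, the prefactor values, and all function names below are assumptions chosen purely for illustration; they are not timings of SIESTA or of any other code.

```python
import math


def cubic_cost(n_atoms, prefactor):
    """Cost model for a conventional solver limited by O(N^3) diagonalization."""
    return prefactor * n_atoms ** 3


def linear_cost(n_atoms, prefactor):
    """Cost model for a linear-scaling solver: O(N), usually with a larger prefactor."""
    return prefactor * n_atoms


def crossover_size(cubic_prefactor, linear_prefactor):
    """System size N at which a*N^3 equals b*N, i.e. N = sqrt(b / a)."""
    return math.sqrt(linear_prefactor / cubic_prefactor)


def neighbours_of_central_atom(n_shells, cutoff):
    """Count atoms within `cutoff` of the central atom in a simple cubic cluster
    of unit spacing that extends n_shells lattice sites in each direction."""
    sites = range(-n_shells, n_shells + 1)
    count = 0
    for i in sites:
        for j in sites:
            for k in sites:
                d2 = i * i + j * j + k * k
                if 0 < d2 <= cutoff * cutoff:
                    count += 1
    return count


# Hypothetical prefactors in arbitrary units, chosen only for illustration.
A_CUBIC, B_LINEAR = 1.0e-6, 0.09

if __name__ == "__main__":
    print("crossover near N =", round(crossover_size(A_CUBIC, B_LINEAR)))
    for n_shells in (1, 2, 4):
        print(n_shells, neighbours_of_central_atom(n_shells, cutoff=2.0))
```

With a cutoff of two lattice spacings, the neighbour count of the central atom grows from 26 to 32 when the cluster is enlarged from one to two shells, and then stays at 32 however large the cluster becomes, which is the origin of the linear total cost.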
Having set the scene, the objective of this chapter is to present an overview of one approach for achieving linear-scaling density functional theory, known as the SIESTA methodology4 and embodied in the code of the same name. This is just one of several possible methods, and a list of some of the other most widely used candidates is given in Table 2.1. It would take too long to review the relative strengths and weaknesses of each particular implementation. However, the main differences between methods usually involve a compromise between the ability to have a systematically improvable basis set (similar to the manner that is possible with plane waves) and the lowering of the prefactor of the linear scaling, which requires the most compact basis set representation. To place the SIESTA approach in context, it targets the lowest prefactor by using physically motivated basis functions while sacrificing the arbitrary convergence with respect to the size of the basis.

The aim of this chapter is to provide a conceptual and practical guide to the use of SIESTA that will be useful to those encountering the program that implements the methodology for the first time. For full mathematical details of the SIESTA methodology we refer the reader to the original manuscripts, where these can be found.4 Although the focus will be specifically on SIESTA, it is hoped that an understanding of the motivation and background will also be valuable to those wishing to engage in linear-scaling electronic structure theory, regardless of the particular implementation.


TABLE 2.1 Various Methodologies for Linear-Scaling Density Functional Theory, Classified According to the Nature of the Basis Functions(a)

Basis Set                    Implementation        Availability                      Ref.
Gaussian atomic orbitals     FreeOn (MONDO set)    GPL                               5
                             GAUSSIAN              Commercial(c)                     6
                             Q-CHEM(b)             Commercial                        7
Gaussians/plane waves        QUICKSTEP             GPL                               8
Numerical atomic orbitals    SIESTA                Free to academics                 4
                             PLATO                 Contact authors                   9
                             OpenMX                GPL                               10
Blips                        CONQUEST              Contact authors (GPL proposed)    11
Periodic sinc functions      ONETEP                Commercial                        12

(a) Note that this tabulation aims to highlight the most widely known implementations rather than being exhaustively comprehensive. It is also subject to constant change due to developments in the field.
(b) The construction of the Fock matrix can be linear scaling, but diagonalization is used to solve the SCF.
(c) Features required for a fully linear-scaling calculation may not be available in the distributed version.

2.2 METHODOLOGY

2.2.1 Density Functional Theory

The fundamentals of density functional theory were outlined in Chapter 1, so only a concise statement of the relevant aspects is made here. For the purposes of the present discussion, we focus solely on the Kohn–Sham formulation of DFT, where a set of orthogonal wavefunction-like one-electron states is introduced to facilitate calculation of the kinetic energy, and the exchange-correlation potential is formulated as a local functional of the density and, where appropriate, its curvature. Thus, we will consider the linear-scaling implementation of the local density approximation (LDA) and the generalized-gradient approximation (GGA) formulations of DFT.13 Extension to other forms of approximation, such as meta-GGAs,14 hybrid functionals,15 or LDA + U,16 is possible, but beyond the scope of the present chapter.

2.2.2 Pseudopotentials

When solving for the electronic structure of a system, in principle, all electrons must be included since they contribute to the potential experienced by other particles and determine the nodal structure of the eigensolutions. In practice, it is intuitive that the core electrons of an atom are weakly perturbed by chemical changes to the geometry and bonding arrangements, in comparison to the valence electrons, and therefore, several approximate methods have evolved to treat these core states in order to reduce computational expense. At the simplest level, the frozen-core approximation can be made in which the occupancy of the core states is fixed to remove them from the self-consistent procedure. Alternatively, the core electrons and nucleus, which have opposite sign charges and therefore partially cancel each other, can be replaced by a combined effective potential, known as a pseudopotential.

In brief, the concept of a pseudopotential is that it replaces the exact potential due to nucleus and core electrons, within a given radius of the atomic center, by an effective potential. Within this distance, known as the core radius, the potential is smoothed and tends to a finite value at the nucleus while matching the true potential at the boundary. Due to the smoothing of the potential, the radial nodes of the valence states are lost in the core region since there is no longer a requirement to maintain orthogonality to the core states.

In nearly all cases, a nonlocal pseudopotential is used, which implies that there is a different potential for each angular momentum channel, $l$, with a separate core radius, $r_{\mathrm{core}}$, appropriate to that channel. Outside the core radii, all channels, regardless of angular momentum, experience exactly the same potential, known as the local component. Thus, the nonlocal contribution to the pseudopotential acts only within a small spherical region close to the nucleus. Nonlocal pseudopotentials are most commonly formulated according to the prescription of Kleinman and Bylander.17 While in many implementations the local component of the pseudopotential is chosen to be one of the angular momentum channels, there is no requirement to do so. Indeed, SIESTA exploits the freedom to select the local component independently and chooses the potential that results from the smooth electron density:

$$\rho_{\mathrm{local}}(r) \propto \exp\left\{-\left[\frac{\sinh(1.82\,r/r_{\mathrm{core}})}{\sinh(1)}\right]^{2}\right\} \tag{2.1}$$
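
To give a feel for how quickly this smooth density is confined to the core region, the following minimal sketch evaluates the unnormalized form $\exp\{-[\sinh(1.82\,r/r_{\mathrm{core}})/\sinh(1)]^2\}$. The function name is hypothetical, the normalization constant is omitted, and the step from this density to the local potential (a solution of Poisson's equation) is not shown.

```python
import math


def rho_local(r, r_core):
    """Unnormalized smooth density of Eq. (2.1) at radius r (same units as r_core)."""
    x = math.sinh(1.82 * r / r_core) / math.sinh(1.0)
    return math.exp(-x * x)


if __name__ == "__main__":
    r_core = 1.9  # example core radius in a.u., the value quoted for Si 3s in Fig. 2.1
    for r in (0.0, 0.5, 1.0, 1.5, 1.9):
        print(f"r = {r:4.2f} a.u.   rho_local = {rho_local(r, r_core):.3e}")
```

The density equals one at the nucleus in these arbitrary units and has fallen by more than two orders of magnitude at the core radius, so the potential it generates is smooth inside the core and essentially that of a point charge outside it.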

The construction of a pseudopotential generally involves satisfying at least four criteria:

1. Boundary matching. Beyond the core radius, the all-electron and pseudo-wavefunctions must match for each angular momentum channel.
2. Smoothness. Within the core radius, the pseudovalence wavefunction should have no radial nodes.
3. Eigenvalue matching. The eigenvalues for the pseudopotential problem must match the all-electron values for the atomic reference state chosen.
4. Norm conservation. The integral of the valence electron density from the nucleus to the core radius must be equal in the pseudopotential and all-electron cases.
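
Three of these criteria (boundary matching, smoothness, and norm conservation) can be checked directly on tabulated radial functions, as in the minimal sketch below. It assumes that the wavefunctions are stored as $u(r) = rR(r)$ on a common radial grid, so that the radial density is $u^2$; the function names are hypothetical and are not part of the ATOM program. Eigenvalue matching requires solving the radial equation and is not covered here.

```python
def count_radial_nodes(u, r, r_max):
    """Number of sign changes of u(r) for 0 < r < r_max (exact zeros are skipped)."""
    nodes = 0
    previous = None
    for ri, ui in zip(r, u):
        if ri <= 0.0 or ri >= r_max or ui == 0.0:
            continue
        if previous is not None and (ui > 0.0) != (previous > 0.0):
            nodes += 1
        previous = ui
    return nodes


def norm_inside(u, r, r_max):
    """Trapezoidal integral of u(r)^2 from r[0] up to r_max."""
    total = 0.0
    for i in range(1, len(r)):
        if r[i] > r_max:
            break
        total += 0.5 * (u[i] ** 2 + u[i - 1] ** 2) * (r[i] - r[i - 1])
    return total


def matches_outside(u_ae, u_ps, r, r_core, tol=1.0e-6):
    """True if the two functions agree to within tol at every r >= r_core."""
    return all(abs(a - p) <= tol
               for a, p, ri in zip(u_ae, u_ps, r) if ri >= r_core)


def check_pseudo_wavefunction(u_ae, u_ps, r, r_core, tol=1.0e-6):
    """Return the outcome of the three grid-based tests as a dictionary."""
    return {
        "boundary_matching": matches_outside(u_ae, u_ps, r, r_core, tol),
        "smoothness": count_radial_nodes(u_ps, r, r_core) == 0,
        "norm_conservation": abs(norm_inside(u_ae, r, r_core)
                                 - norm_inside(u_ps, r, r_core)) <= tol,
    }
```

Passing an all-electron function that has a radial node inside the core radius as its own pseudo-wavefunction satisfies boundary matching and norm conservation trivially but fails the smoothness test, which is the situation a pseudopotential construction is designed to remove.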


Other conditions may also be imposed; for example, the logarithmic derivatives and their first energy derivatives may also be required to match outside the core region.18 An all-electron and a pseudo-wavefunction are compared in Fig. 2.1. Although the conditions noted above are necessary for most pseudopotentials, this does not lead to a unique definition of what form the potential should take, so numerous schemes for the generation of pseudopotentials have arisen. In the case of SIESTA, pseudopotentials are usually generated through the use of a separate program known as ATOM, which presently supports three types of pseudopotential: improved Troullier–Martins (TM2),19 Hamann–Schlüter–Chiang (HSC),18 and Kerker.20 Of these, the Troullier–Martins scheme has become the standard choice for use with SIESTA.

Fig. 2.1 All-electron (solid line) versus pseudovalence state (- - -) for the silicon 3s orbital, plotted as wavefunction against radius (a.u.). The core radius for the 3s state is 1.9 a.u. For comparison, a poorly constructed pseudo-3s state (– · –) is included for the case when the core radius is too small (1.1 a.u.), leading to an inner maximum.

In the plane-wave community, the use of pseudopotentials is almost mandatory for practical calculations since the effective potential is smoothed out and the nuclear cusp removed, thereby drastically reducing the number of basis functions required to construct the Fourier expansion of the eigenstates. Even when working with localized orbitals there are some benefits to the use of pseudopotentials, aside from the reduction of the number of electrons and orbitals. The core electrons are much more strongly bound than the valence electrons and therefore dominate the total energy. Because electronic structure calculations often rely on computing small energy differences between large total energies, the inclusion of the core electrons can decrease the level of numerical precision in such quantities. Furthermore, as the atomic number of an element increases, it becomes important to correct the calculation for relativistic effects, which most strongly affect the core electrons. Through the use of a pseudopotential it is possible to subsume the majority of the relativistic effects into the effective potential, such that a full relativistic calculation is required for the isolated atom only during generation of the pseudopotential, rather than for the entire problem. Of course, it is important to note that some relativistic effects must be taken into account explicitly when necessary, such as spin-orbit coupling.

Recent years have seen a number of developments in the area of pseudopotentials with the advent of the ultrasoft pseudopotential (USP)21 and projector augmented wave (PAW)22 methods. For USPs, the requirement of norm conservation is relaxed and this is compensated for by the addition of an augmentation charge density. The PAW approach focuses on the augmentation of the wavefunction, rather than the density, and thus makes it possible to recover all-electron properties in the frozen core limit. Both methods lead to a dramatic reduction in the reciprocal space cutoff associated with the pseudopotential, which greatly accelerates the computation. In the case of SIESTA, which as we shall see works with real-space-localized basis functions, there is likely to be little benefit associated with a switch to either of these more contemporary pseudopotential types, while the complexity of implementation is greatly increased. Consequently, SIESTA continues to employ norm-conserving pseudopotentials, which are generally more robust and easier to construct (see, e.g., an article by Bilić and Gale23). Although it is impossible to give a comprehensive guide to the generation of pseudopotentials, some important general guidelines can be given.

2.2.2.1 Choice of Electronic Configuration
When generating a norm-conserving pseudopotential it is necessary to specify an atomic configuration whose eigenvalues and wavefunctions will be reproduced outside the core region. Usually, this is chosen to be the ground state for the isolated atom.
However, for the study of ionic materials there may be merit in using a positively ionized state if this is closer to the real oxidation state of the cation. Although, in principle, a pseudopotential is supposed to be transferable across a range of charge states, it will be more accurate closer to the state for which it is generated. In the case of anions in ionic materials (e.g., the oxide ion), it is not generally a good idea to use the negatively charged state since this will be very diffuse and may be unbound (as is the case for $\mathrm{O}^{2-}$).

2.2.2.2 Choice of Functional
It is important to use the same density functional for generation of the pseudopotential as you intend to employ in the explicit valence calculation. Although the use of an LDA pseudopotential in a GGA calculation can often lead to fortuitously good results with respect to experimental data, it is important to remember that the objective is to reproduce the all-electron limit for a single given functional.

2.2.2.3 Choice of Core Radius
The general guiding principle in the choice of the core radius is that a larger radius leads to a softer (and for plane waves, therefore more efficient) pseudopotential, whereas a smaller radius should ensure greater transferability and reliability. Beyond this broad statement, there are a number of limitations on the upper and lower bounds to the core radius. If the radius becomes too large, there is a risk that the core regions of two adjacent atoms might overlap and this would invalidate the calculation. On the lower bound, the core radius must lie farther from the nucleus than the last radial node of the all-electron wavefunction; otherwise, the removal of nodal structure will not be possible. In practice, making the core radius too small can lead to spurious features in the pseudo-wavefunction, such as inner maxima, due to enforcement of the norm-conservation condition (see Fig. 2.1 for an example of what happens as the core radius becomes too small). The optimal choice for the core radius usually will lie close to the outer maximum in the all-electron wavefunction. With the Troullier–Martins construction scheme, the core radius can lie outside the maximum, and the wavefunction will still be well reproduced beyond the turning point.

2.2.2.4 Choice of Core–Valence Split
For many elements, especially those toward the right-hand side of the periodic table, there is no ambiguity as to the valence electrons of an atom. However, for quite a large number of elements there may be cause for careful consideration, depending on the material to be studied. For example, aluminum has the electron configuration $[1s^2 2s^2 2p^6]3s^2 3p^1$, where the brackets delimit the conventional core electrons. If one were to perform a study of aluminum nanoparticles, for example, only including the 3s and 3p electrons in the valence would be a reasonable choice, since the atom is close to the charge neutral state. However, if one were instead to study the material $\mathrm{Al_2O_3}$, where the nominal oxidation state is Al(III), the 3s and 3p electrons have been largely ionized.
Here the 2p electrons then become the highest occupied state of aluminum, and the conventional choice of valence would lead to a poor pseudopotential description. For elements toward the beginning of a new block of the periodic table, it is therefore necessary to modify the pseudopotential choice to allow for these semicore states.

2.2.2.5 Evaluating Pseudopotential Accuracy
A good indicator as to whether semicore states need to be included is whether there is any significant overlap between the electron density of the valence and core electrons (see Fig. 2.2, which shows the case of Fe where there is significant overlap between the 4s/3d states and the underlying 3s/3p). There are two common methods for handling semicore states: either the electrons can be explicitly included in the calculation, or partial core corrections can be applied.24 Partial core corrections, also known as nonlinear core corrections, aim to correct for the fact that the exchange-correlation potential depends on the total electron density and is therefore not readily separable into core and valence contributions if there is any overlap of the density between regions. To handle this, partial core corrections operate by including a smooth piece of frozen electron density that matches the exact core density down to a given radius and then tends smoothly to zero at the nucleus. This density is then added back during calculation of the exchange-correlation potential to capture the nonlinearity in the region of density overlap. Note that this extra density is not included in the norm-conservation requirement of the pseudopotential. The choice of the radius for the partial core corrections is a compromise between being small enough to describe sufficient core electron density and large enough to minimize the computational work associated with evaluating accurately the exchange-correlation potential for the combined density. While for plane-wave methods the use of partial core corrections is often the preferable approach to semicore states since it reduces the size of the basis set significantly, for the SIESTA method the two approaches are similar in cost, and therefore the use of explicit semicore states may be favored.

Fig. 2.2 Electron density for an iron atom, showing the all-electron curve (core contribution in - - - and valence in – – -), the valence-only contribution from the pseudopotential-generated orbitals (solid line), and the partial core correction density (– · –) as a function of radius (in a.u.). Legend entries: AE core charge, AE valence charge, PS core charge, PS valence charge. Note the overlap between the core and valence densities in the region between 0.2 and 0.7 a.u. that leads to the need for partial core correction.

Having generated a new pseudopotential and inspected its properties visually to check that there are no untoward characteristics, the next important step is to test it by comparing the energies for changes in atomic state between the all-electron- and the pseudopotential-based calculation. Configurations for testing might usually include ionization from the various valence orbitals, as well as promoting electrons from one angular momentum to another. If the pseudopotential passes this examination, it is ready for validation in a full calculation of a molecule or solid.

2.2.3 Basis Sets

Numerical solution of the Kohn–Sham equations is performed by expanding the orbitals or bands in terms of computationally convenient mathematical functions: the basis set. The coefficients that determine how much these functions contribute are found by applying the variational principle. As mentioned in the introduction, there are many possible choices that could be made for the basis set, although Gaussians25 have dominated the molecular community while plane waves have been the de facto standard for solid-state physics. In choosing the optimal basis set for large linear-scaling calculations, we are guided by the need for locality in real space and the requirement to minimize the number of basis functions needed to obtain reasonable numerical precision. Clearly, a physically motivated basis set that takes into account the shape of atomic orbitals will best satisfy the latter criterion. If pseudopotentials of the form described in the preceding section are employed, then neither existing Slater nor Gaussian basis sets will be of the correct form, due to the modification of shape in the nuclear region.

Taking the discussion above into account, it can be seen that the optimal compact basis set is to work with exact solutions to the pseudopotential form of the atomic problem, provided that they can be represented. Following the approach taken by other researchers, such as Becke and Dickson26 in the NUMOL code and Delley27 in DMol, the basis set can conveniently be represented by a numerical tabulation rather than a specific, but approximate, analytical form. In the SIESTA methodology, the standard choice of basis set is pseudoatomic orbitals (PAOs), $\phi^{A}_{nlm}$ for atom $A$, which are tabulated on a logarithmic radial grid for each angular momentum and then multiplied by the appropriate spherical harmonics:

$$\phi^{A}_{nlm}(r,\theta,\varphi) = R^{A}_{nl}(r)\,Y_{lm}(\theta,\varphi) \tag{2.2}$$
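
The product form of Eq. (2.2) can be illustrated with a minimal sketch that tabulates a radial function on a logarithmic grid, interpolates it, and multiplies by a spherical harmonic. Several choices here are assumptions made for illustration only: the radial function is a simple exponential rather than a true PAO, linear interpolation is used, real spherical harmonics are written out explicitly for $l = 0$ and $l = 1$ only, and all function names are hypothetical.

```python
import math
from bisect import bisect_right


def log_grid(r_min, r_max, n_points):
    """Logarithmic radial grid: points equally spaced in log(r)."""
    step = math.log(r_max / r_min) / (n_points - 1)
    return [r_min * math.exp(i * step) for i in range(n_points)]


def interp_radial(r_grid, values, r):
    """Linear interpolation of a tabulated radial function; zero beyond the table."""
    if r >= r_grid[-1]:
        return 0.0
    if r <= r_grid[0]:
        return values[0]
    i = bisect_right(r_grid, r)
    t = (r - r_grid[i - 1]) / (r_grid[i] - r_grid[i - 1])
    return (1.0 - t) * values[i - 1] + t * values[i]


def real_ylm(l, m, theta, phi):
    """Real spherical harmonics for l = 0 and l = 1 (theta polar, phi azimuthal)."""
    if (l, m) == (0, 0):
        return 0.5 / math.sqrt(math.pi)
    if l == 1 and m in (-1, 0, 1):
        c = math.sqrt(3.0 / (4.0 * math.pi))
        if m == 0:
            return c * math.cos(theta)
        if m == 1:
            return c * math.sin(theta) * math.cos(phi)
        return c * math.sin(theta) * math.sin(phi)
    raise ValueError("only l = 0 and l = 1 are implemented in this sketch")


def pao_value(r_grid, radial, l, m, r, theta, phi):
    """Basis function value: tabulated radial part times the angular part."""
    return interp_radial(r_grid, radial, r) * real_ylm(l, m, theta, phi)


if __name__ == "__main__":
    grid = log_grid(1.0e-3, 6.0, 400)
    radial = [math.exp(-r) for r in grid]  # stand-in radial shape, not a real PAO
    print(pao_value(grid, radial, 0, 0, 1.0, 0.0, 0.0))
    print(pao_value(grid, radial, 1, 0, 1.0, 0.0, 0.0))
```

Returning zero beyond the last tabulated point anticipates the strict radial cutoff that the confined orbitals require.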

These PAOs can be determined conveniently during generation of the pseudopotential and represent a “perfect” basis set for describing the isolated atom. While the PAOs above decay rapidly with distance, as do other atomic-centered basis functions, they only tend asymptotically to zero at infinite radius. To achieve linear scaling it is necessary to impose on the Hamiltonian strict locality in real space. The most common approach to achieving this is to introduce a drop tolerance in some form and to neglect integrals when they fall below a certain magnitude. However, this is fundamentally unappealing since it corresponds to modifying the Hamiltonian being solved, although this may be a philosophical point rather than a practical difficulty. In the SIESTA methodology, an alternative approach is taken in which the basis functions are localized rather than the Hamiltonian being modified. Following the fireball concept of Sankey and Niklewski,28 the eigenfunctions of the pseudoatomic problem are found within the confines of a spherical boundary at which the potential becomes infinite. In this way, the tails of the PAOs are modified such that they go rigorously to zero at a given radius, as shown in Fig. 2.3. This radius, $r_c$, can be selected to be different for each angular momentum.

Fig. 2.3 (A) Pseudoatomic orbitals (PAOs) for oxygen 2s, plotted as wavefunction against radius (a.u.), illustrating the shape for the unconfined orbital (solid line), hard confinement with an energy shift of 0.02 Ry (- - -), and soft confinement with an energy shift of 0.02 Ry, a potential $V_0$ of 50 Ry, and a radius of soft confinement commencement of 0.8 times the hard confinement radius (– · –). (B) Close-up of the region where the confined orbitals approach the cutoff radius of 3.2 a.u.

Radial confinement is clearly an approximation, but it allows a choice to be made readily between higher precision, corresponding to large $r_c$, or greater computational efficiency as the radius decreases. Although there is the flexibility to choose an individual radius for each orbital in the valence of every atom, it is preferable to have a more systematic method for selecting radii. Choosing a single fixed radius of confinement for all atoms is obviously not a sensible approach, since atoms with different atomic radii will be affected to varying extents. Hence, the calculation would be biased toward the precise description of light atoms. When an orbital is radially confined, its energy increases with respect to the free atom. Therefore, a natural concept to aid in the selection of appropriate radii is the energy shift. Here, a single energy value is specified for all atoms and the radius of confinement found that raises the energy of each orbital by this amount. Typically, energy shifts in the range 0.001 to 0.02 Ry (1 Ry = 0.5 Ha ≈ 13.6 eV) are useful depending on whether precision or speed is being sought, respectively. As with all approximations, it is important to test the consequences of a given choice for the specific property of interest before proceeding. Although the default energy shift–based scheme provides a good first estimate of the radii in many cases, there are alternative approaches to refining the truncation of the orbitals.

2.2.3.1 Soft Confinement
In the default confinement scheme the orbital goes to zero at the cutoff radius. However, there is a discontinuity in the derivatives of the orbital, which can lead to difficulties during structural optimization and more acutely during phonon calculations. The solution to this problem is to use a potential that tends asymptotically to infinity in a smooth manner rather than applying a discontinuous hard-wall potential.29 The form of the potential currently used is

$$V_{\mathrm{soft}}(r) = V_0 \, \frac{e^{-(r_c - r_s)/(r - r_s)}}{r_c - r} \qquad (2.3)$$
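Equation (2.3) is straightforward to evaluate numerically. The following minimal sketch is only an illustration: the function name is hypothetical, the potential is assumed to be zero inside r_s and infinite at and beyond the cutoff radius r_c, and the parameter values loosely follow the caption of Fig. 2.3 (V0 = 50 Ry, r_c = 3.2 a.u., r_s = 0.8 r_c).

```python
import math

def soft_confinement(r, v0, r_s, r_c):
    """Soft-confinement potential of Eq. (2.3), in the same energy units as v0.

    Assumed piecewise extension: zero for r <= r_s, infinite for r >= r_c.
    """
    if r <= r_s:
        return 0.0            # the potential has not yet switched on
    if r >= r_c:
        return math.inf       # the orbital must vanish at the cutoff radius
    return v0 * math.exp(-(r_c - r_s) / (r - r_s)) / (r_c - r)

# Illustrative parameters (assumptions based on the caption of Fig. 2.3).
V0, R_C = 50.0, 3.2
R_S = 0.8 * R_C

# Sample the potential between r_s and just inside r_c.
samples = [R_S + f * (R_C - R_S) for f in (0.0, 0.25, 0.5, 0.75, 0.99)]
values = [soft_confinement(r, V0, R_S, R_C) for r in samples]
for r, v in zip(samples, values):
    print(f"r = {r:.3f} a.u.   V_soft = {v:.4g} Ry")
```

The potential starts from zero with zero slope at r_s and grows without bound as r approaches r_c, which is what removes the derivative discontinuity of the hard wall.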

The potential of Eq. (2.3) introduces two new parameters that determine the shape of the basis set tail: the radius at which the potential begins, rs, and the magnitude, V0.

2.2.3.2 Basis Set Enthalpy

In a further alternative scheme, an external pressure, Pext, can be applied to the atomic orbitals. This leads to determination of the radii through the associated enthalpy by adding a Pext V term to the intrinsic energy, where V represents the volume of the confinement sphere.30 Under this scheme, the confinement radii now correspond to equal hardness among the basis functions, rather than energy perturbation.

Occasionally, it may be beneficial to intervene manually in the choice of radii. For example, in the case of negatively charged species such as the oxide ion, which is nominally O2−, the radii determined by typical energy shift values as being appropriate for a neutral oxygen atom may be too confined to allow a good description of the anion in an ionic crystal.

Although the formulation of PAOs above provides a good starting point for a basis set, it is well known that increased variational freedom is required to allow the system to respond to the changes associated with chemical bonding, external fields, or other perturbations to the electronic structure. In the Gaussian community this is achieved through the use of multiple-zeta basis sets, where one or more Gaussians (usually the outermost function) are decontracted from the


Slater-type orbital to allow the effective atom size to respond to its environment. When working with a numerical representation of the valence orbitals on a radial grid, there is no equivalent means of creating distinct “zetas.” Indeed, there is the flexibility to choose any arbitrary partitioning of the valence orbital into multiple components. From experience it is known that the objective is to allow the outermost part of the radial function to vary independently of the inner part while maintaining the smoothness of the basis functions. In the current SIESTA methodology, the division of the radial function into multiple components is achieved using the split-norm concept. Here a second, or higher, radial function is designed to possess the same tail as the full valence orbital, $\phi_l^{1\zeta}$, outside a split radius, $r_s$, while inside this value it decays according to a polynomial to be zero at the nucleus:

$$\phi_l^{2\zeta}(r) = \begin{cases} r^l \left(a_l - b_l r^2\right) & r < r_s \\ \phi_l^{1\zeta}(r) & r \ge r_s \end{cases} \qquad (2.4)$$

The polynomial coefficients are determined by matching the function and its derivative at the split radius. If this new function is subtracted from the original valence orbital, the result is a contracted basis function that goes to zero at the split radius. Motivated by similar arguments to the use of the energy shift, the split radius is usually chosen indirectly by specifying the norm of the valence state to be included in the outer function. Typically, an outer zeta should contain on the order of 15% of the total norm. For hydrogen, in a double-zeta basis set, a value closer to 50% can prove more effective, given that the variation in effective size between a neutral hydrogen atom and a proton-like state can be particularly extreme. Conversely, very small values for the split norm can represent a poor choice since their effect is negligible and can lead to linear-dependence issues in the basis set.
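The matching conditions of Eq. (2.4) give two linear equations for the two polynomial coefficients. As a hedged sketch, the code below solves them in closed form for a model first-zeta function; the function names and the model orbital r exp(−r) are illustrative assumptions and do not correspond to a real PAO or to any SIESTA routine.

```python
import math

def split_norm_coefficients(l, r_s, phi_s, dphi_s):
    """Return (a, b) such that r**l * (a - b*r**2) reproduces the value phi_s
    and the radial derivative dphi_s of the first-zeta orbital at r = r_s."""
    b = (l * phi_s / r_s - dphi_s) / (2.0 * r_s ** (l + 1))
    a = (phi_s + b * r_s ** (l + 2)) / r_s ** l
    return a, b

def second_zeta(r, l, r_s, a, b, phi_1zeta):
    """Second-zeta radial function of Eq. (2.4)."""
    if r < r_s:
        return r ** l * (a - b * r * r)   # polynomial inside the split radius
    return phi_1zeta(r)                   # original tail outside it

# Hypothetical first-zeta radial function for l = 1 and its derivative.
L_QN = 1
phi = lambda r: r * math.exp(-r)
dphi = lambda r: (1.0 - r) * math.exp(-r)

R_SPLIT = 2.0
a, b = split_norm_coefficients(L_QN, R_SPLIT, phi(R_SPLIT), dphi(R_SPLIT))

# The contracted function (first zeta minus second zeta) vanishes for r >= r_s.
contracted_at_split = phi(R_SPLIT) - second_zeta(R_SPLIT, L_QN, R_SPLIT, a, b, phi)
print(f"a = {a:.6f}, b = {b:.6f}, contracted(r_s) = {contracted_at_split:.1e}")
```

In practice the split radius would be located first, from the requested split norm, and the coefficients would then follow from the numerical orbital and its derivative on the radial grid.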
There are several things to note regarding the choice of the split-norm approach to increasing the radial variational freedom of the basis set. As already pointed out, this is just one possible choice and there are many other possible approaches. In the all-electron numerical methodology of Delley,27 an alternative strategy is employed in which the basis functions for charged atomic states are used for the additional radial functions to describe more contracted environments. Alternatively, one could use extra Gaussian functions to mimic a standard multiple-zeta basis set from conventional molecular quantum mechanics.31 A strength of the split-norm approach is that the operation can be applied as many times as desired to create a basis set of arbitrary size in a systematic fashion. Usually, a double- or triple-zeta basis is sufficient unless trying to achieve plane-wave levels of numerical convergence. We should note that the use of the terms double zeta (DZ) and triple zeta (TZ) is a matter of conforming to the nomenclature that has arisen in the Gaussian community, although strictly speaking it is incorrect since there are no “zetas” (i.e., Gaussian exponents) in the present approach. In the terminology of Delley, the basis sets are referred to more correctly as double numeric (DN), triple numeric (TN), and so on.


It may be questioned whether an approach that allows atoms to adopt a smaller effective radius, but not a larger one, is always sufficient. The answer is usually in the affirmative. If the minimal basis set is constructed for the neutral atom, when an atom is placed in a crystal, or even in a molecule, the rate of decay of the valence states will usually be increased by Pauli repulsion due to the neighboring atoms. Hence, a shorter-range basis set is generally appropriate, although with some exceptions.

Although the split-norm approach provides increased radial variational freedom, there is also the need to consider angular augmentation of the basis set. For example, a minimal basis set for hydrogen would only include the 1s orbital, but the moment an external field is applied, or the hydrogen forms a covalent bond to another atom, there is a need to describe asymmetric contributions to the electronic structure about the hydrogen nucleus. Therefore, it is necessary to include basis functions of higher angular momentum than those from the occupied valence states alone, and these are known as polarization functions. Typically, functions with a value of the angular momentum quantum number, l, one higher than that of the highest occupied state are needed as a minimum requirement for a reliable description of the electronic structure (e.g., 2p for H, 3d for C, 4f for Fe, etc.). Hence, the default basis set, and minimum recommended quality, for SIESTA would be double-zeta polarized (DZP). Although some special cases, such as bulk silicon, are relatively well described with a minimal basis set, these are the exceptions rather than the rule. The key question with polarization functions is how to obtain the radial form of these basis functions.
Unfortunately, the excited states of the pseudopotential atomic problem tend to be either rather extended in space, or even unbound, and therefore taking the hard-confined unoccupied orbitals as basis functions can often be unsatisfactory. In an attempt to circumvent this problem, the default method for the generation of polarization functions uses perturbation theory. By applying an electric field to the atomic problem, states of higher angular momentum are created, and these are taken as the polarization functions. The choice of good polarization functions is the most difficult part of the basis set creation and is often responsible for lower-quality results, as can be demonstrated in an example. If we compare the results for the molecule SO2 obtained using the default DZP basis set in SIESTA with those from the same density functional and a range of standard Gaussian basis sets, it can be seen that there is some discrepancy (Table 2.2). If instead of using the default polarization functions, the shape of the radial part of this basis set is tuned by using a soft-confinement potential to lower the energy of the system variationally, a significant improvement is achieved. Indeed, the results for the DZP basis set are now very close to those for the equivalent Gaussian basis set.

While default basis sets can be generated within the SIESTA methodology, according to the energy shift, split-norm, and perturbative polarization function schemes described above, there is also a possibility for the user to control the


TABLE 2.2 Comparison of Optimized Structural Parameters for the Molecule SO2 with the PBE Functional as a Function of Basis Set Quality

Basis Set                       r(S–O) (Å)    ∠(O–S–O) (deg)
STO-6G                          1.628         107.40
6-31G                           1.634         114.67
6-311G                          1.630         114.66
6-31G*                          1.483         119.34
6-311G*                         1.477         119.04
DZP (standard/0.01 Ry)          1.509         118.71
DZP (optimized polarization)    1.482         119.34

basis fully. Accordingly, there are methods to tune the basis set performance in a number of ways.

2.2.3.3 Charge State

By default the basis set is generated for the reference state used in pseudopotential generation. However, a charge on a species can also be specified during basis set creation. Here a positive charge will lead to more contracted basis functions, while a small negative charge will result in more diffuse PAOs. Note that a large negative charge would not be sensible since species become formally unbound.

2.2.3.4 Variational Optimization

The experience of other communities that have adapted molecular basis sets to the solid state shows that optimization of the basis set parameters with respect to the total energy of a target material can improve the results substantially.32 Although this compromises transferability, it allows the best results to be obtained for a particular problem while maintaining a low prefactor for the computational cost.

As with all numerical approximations, it is important to test the influence of basis set quality before embarking on any scientific study. While DZP should be adequate to obtain at least qualitatively correct results for most problems, this should not be assumed a priori for a new class of problem. It is also important to consider the consequences of radial confinement for the study to be undertaken. For example, if considering the decay properties of the electronic states of a surface into vacuum, by construction the answer will be in error unless steps are taken to rectify this.33 The present method will also share much of the cautionary advice common to all localized, atomic-centered basis sets, including basis set superposition error (BSSE) and the need for floating functions when describing states that involve electron density in a region away from atomic centers (e.g., a defect such as an F-center). BSSE can be a particular issue, since the overlap of basis functions from different atoms allows the radial confinement to be released, thereby artificially inflating the binding energy even more than usual. Therefore, when considering molecular adsorption, particularly if it is weak, it is essential to work with a low value for the energy shift and to apply a counterpoise correction34 to the final result in order to extract a meaningful binding energy.
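The bookkeeping behind a counterpoise correction can be written down in a few lines. The sketch below assumes the usual scheme in which each fragment is recomputed, at its geometry in the complex, with the partner's basis functions retained as "ghost" orbitals; the function names and all energies are hypothetical values chosen only to illustrate the arithmetic.

```python
def uncorrected_binding_energy(e_complex, e_a_own, e_b_own):
    """Binding energy with each fragment computed in its own basis only."""
    return e_complex - e_a_own - e_b_own

def counterpoise_binding_energy(e_complex, e_a_ghost, e_b_ghost):
    """Binding energy with each fragment computed in the full basis of the
    complex (the partner's orbitals are kept as ghosts, without atoms)."""
    return e_complex - e_a_ghost - e_b_ghost

def bsse(e_a_own, e_b_own, e_a_ghost, e_b_ghost):
    """Basis set superposition error: the spurious stabilization the
    fragments gain by borrowing each other's basis functions."""
    return (e_a_own - e_a_ghost) + (e_b_own - e_b_ghost)

# Hypothetical total energies in eV (illustrative numbers, not real results).
E_AB = -1250.40
E_A_OWN, E_B_OWN = -939.68, -310.50
E_A_GHOST, E_B_GHOST = -939.71, -310.53

raw = uncorrected_binding_energy(E_AB, E_A_OWN, E_B_OWN)
corrected = counterpoise_binding_energy(E_AB, E_A_GHOST, E_B_GHOST)
error = bsse(E_A_OWN, E_B_OWN, E_A_GHOST, E_B_GHOST)
print(f"raw = {raw:.2f} eV, corrected = {corrected:.2f} eV, BSSE = {error:.2f} eV")
```

With these made-up numbers the uncorrected result overbinds, and the corrected binding energy equals the uncorrected one plus the BSSE.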


2.2.4 Construction of the Kohn–Sham Equations

Once the basis set is defined, it is then possible to define the Kohn–Sham equations for the system of interest (see Section 1.3). Note that because the basis set is nonorthogonal, the overlap matrix must also be computed, in addition to the Hamiltonian. Although the average user of the SIESTA methodology need not understand all the details of how the elements of the Hamiltonian and overlap matrices are computed, it is essential to possess some appreciation of the underlying concepts and the numerical approximations that influence calculation quality. In considering the construction of the Kohn–Sham equations, it is possible to break the problem down into several components:

• Overlap matrix elements between basis functions
• Kinetic energy of basis functions
• Nonlocal contribution of the pseudopotential (confined to core region)
• Local contribution of the pseudopotential (long-range)
• Hartree potential (mean-field Coulomb interaction of electrons)
• Exchange-correlation contribution; either LDA or GGA

As emphasized previously, the key is to evaluate the terms in a manner that is linear scaling and efficient. The components naturally break down into two different classes of integral to be evaluated: those that depend on the basis functions only, and those that depend on the electron density or are potentially long-range.

Considering first the overlap matrix elements, kinetic energy matrix elements, and the nonlocal contribution of the pseudopotential, these are all strictly local in real space, due to the finite range of the basis set. The first two terms depend on pairs of overlapping orbitals, and therefore the range is at most twice the largest orbital cutoff radius for any species. In the case of the nonlocal pseudopotential projectors, these give rise to matrix elements between the atomic center associated with the pseudopotential and the basis functions of up to two neighboring atoms. Hence, the range is slightly greater, spanning twice the largest orbital cutoff radius, plus twice the largest core radius for any pseudopotential. However, the range of interaction is still readily predefined. Evaluation of these two- or three-center integrals can be performed readily by use of a Fourier expansion (see the original papers for full details8,28). The key point is that these integrals are performed with a default reciprocal space cutoff of 2000 Ry, which is sufficient to ensure that they are numerically well converged in all but the most extreme circumstances. Furthermore, the cost of these matrix elements is usually a minor part of the total computing time of any calculation. Therefore, the user need not be particularly concerned with the evaluation of these contributions to the Hamiltonian and overlap matrix.

The remaining contributions to the potential and energy are more complex than the terms above since they involve the electron density rather than acting directly on the basis functions. The electron density is, of course, expanded in terms of


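the basis functions:

$$\rho(\mathbf{r}) = \sum_{\mu\nu} \rho_{\mu\nu}\, \phi_\nu^{*}(\mathbf{r})\, \phi_\mu(\mathbf{r}) \qquad (2.5)$$

$$\rho_{\mu\nu} = \sum_i \int_{\mathrm{BZ}} c_{\mu i}(\mathbf{k})\, o_i(\mathbf{k})\, c_{i\nu}(\mathbf{k})\, e^{i\mathbf{k}\cdot(\mathbf{r}_\nu - \mathbf{r}_\mu)}\, d\mathbf{k} \qquad (2.6)$$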

where the coefficients are stored as the density matrix elements, ρμν. Here integration over the Brillouin zone is explicitly included and oi(k) represents the occupancy of eigenstate i at a given point in reciprocal space. If evaluated simplistically, this would make the Coulomb interaction between two points of electron density a long-range interaction that scales as the fourth power of the number of basis functions. Fortunately, this is less problematic than it appears for two reasons. First, the contribution due to the local part of the pseudopotential is of opposite sign to the interaction with the electron density. For a charge-neutral system, these two contributions cancel in the long-range limit, so the Coulomb interaction is ultimately screened. Second, the use of an auxiliary basis set to represent the electron density is well known to reduce the scaling problem and improve computational efficiency.2

Many different choices could be made to converge Coulomb sums efficiently, such as fast multipole methods,35 and to represent the electron density in an auxiliary basis set. In the SIESTA methodology, the choice was made to represent the electron density on a uniform Cartesian grid of points in real space. This decision can be justified for a number of reasons. First, unlike in some localized basis sets, there is no natural representation to choose for the density expansion; although the basis functions themselves have some of the correct properties, it is difficult to extend the minimal set to ensure an accurate representation of the density at all points. A Cartesian grid is systematic and basis set shape independent; as the fineness of the grid increases, the aliasing error should decrease, as all Fourier components become representable. Second, the construction of the electron density is rigorously linear scaling. As shown in Fig.
2.4, only basis functions within the maximum cutoff radius can contribute to the electron density at a given grid point, and therefore the cost per point does not depend on the overall system size. Third, calculation of the exchange-correlation contribution for both LDA and GGA becomes a trivial summation over grid points. In the case of GGAs, calculation of the gradient of the density is facilitated by the use of a finite difference expansion36 over the neighboring grid points (and equally important, the additional contribution to the potential from the GGA is straightforward to determine in the same way). Once the total electron density on the grid points is known, it is possible to begin computation of the electrostatic potential, consisting of the electron–electron interaction (Hartree potential) and the electron–local component of the pseudopotential interaction. We note that the Hartree term is based on the interaction between the electron density at all points to give a single orbital-independent potential and therefore contains the self-interaction of an


Fig. 2.4 Calculation of the density based on two orbitals (large circles) on an underlying Cartesian mesh. Here the density contribution would only be nonzero at the mesh points (small circles).

electron with its own density, as is the norm within standard Kohn–Sham theory. Rather than working directly with the total electron density, it is advantageous to divide the electrostatic contributions into two parts: the neutral contribution and the deformation density. The electron density of the neutral atoms can readily be computed on the grid and subtracted from the total electron density to leave the deformation density. The neutral atom density can then be added to the local part of the pseudopotential to yield a potential that goes strictly to zero at the outermost core radius. Being local, the electrostatic contribution of the neutral atoms is readily computed. Having determined the deformation density on a uniform grid, δρ, the calculation of electrostatic potential due to this quantity, δVH, can be made through solution of Poisson’s equation:

$$\delta\rho(\mathbf{r}) = \rho_{\mathrm{tot}}(\mathbf{r}) - \rho_{\mathrm{NA}}(\mathbf{r}) = -\frac{1}{4\pi} \nabla^2 \delta V_H(\mathbf{r}) \qquad (2.7)$$
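For a periodic cell, Eq. (2.7) becomes algebraic in reciprocal space: each Fourier component of the potential is 4π/G² times the corresponding component of the deformation density. The following one-dimensional sketch illustrates the idea only. It is not the SIESTA implementation: a naive discrete Fourier transform stands in for an FFT library, and the function names, grid size, cell length, and model density are all assumptions made for the example.

```python
import cmath
import math

def dft(values, sign):
    """Naive O(N^2) discrete Fourier transform, unnormalized.
    sign = -1 gives the forward transform, sign = +1 the inverse."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * k * j / n)
                for j, v in enumerate(values))
            for k in range(n)]

def solve_poisson_periodic(delta_rho, cell_length):
    """Solve -(1/4 pi) d2V/dx2 = delta_rho on a periodic 1D grid.

    In reciprocal space V(G) = 4 pi delta_rho(G) / G^2. The G = 0 term is
    set to zero, which assumes the deformation density integrates to zero.
    """
    n = len(delta_rho)
    rho_g = dft(delta_rho, -1)
    v_g = [0j] * n
    for k in range(1, n):
        m = k if k <= n // 2 else k - n        # signed frequency index
        g = 2.0 * math.pi * m / cell_length
        v_g[k] = 4.0 * math.pi * rho_g[k] / (g * g)
    return [v.real / n for v in dft(v_g, +1)]

# Model deformation density: a single cosine, for which the exact answer
# is V(x) = 4 pi delta_rho(x) / G1^2.
N_GRID, CELL = 32, 10.0
G1 = 2.0 * math.pi / CELL
xs = [CELL * i / N_GRID for i in range(N_GRID)]
delta_rho = [0.01 * math.cos(G1 * x) for x in xs]

potential = solve_poisson_periodic(delta_rho, CELL)
max_error = max(abs(v - 4.0 * math.pi * d / G1 ** 2)
                for v, d in zip(potential, delta_rho))
print(f"max deviation from the analytic potential: {max_error:.2e}")
```

Replacing the naive transform with a fast Fourier transform changes only the cost, from N² to N ln N, and not the result.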

At present, SIESTA solves for the potential through the use of a fast Fourier transform (FFT), as many efficient libraries are available to perform this task. Although this approach is not strictly linear scaling (it scales as N ln N), the relatively low scaling, combined with the efficiency of the method, ensures that the contribution to the computational cost is negligible, and therefore the deviation from linear scaling due to this contribution has yet to be observed. Arguably a more significant drawback of the use of FFTs, with practical consequences for the user, is the requirement that all systems must have three-dimensional periodic boundary conditions. In the implementation of the SIESTA method, all systems are automatically enclosed within a periodic cell, regardless


of whether it is a molecule, a polymer, a surface, or a solid. For cases where there is no natural periodicity, the fictitious cell parameter(s) is chosen so as to ensure that there is no overlap between the basis functions of images. Although this guarantees that there are no direct matrix elements between periodic repeats, there is a potential for interaction via electrostatic terms. Consequently, for systems with a strong dipole or higher-order moment, it is recommended that the explicit convergence with respect to cell size be tested. Unlike plane-wave methods, the cost of including a large region of vacuum is generally small since there is no change in the basis set associated with this, and the only computational cost lies in the Fourier transform step to compute the potential. Hence, it is usually straightforward and inexpensive to ensure that the interaction between periodic images is negligible. An alternative to the use of fast Fourier transforms is to employ multigrid methods to solve the problem.37,38 This has the advantage of being linear scaling and can be adapted to any set of boundary conditions that are required. Although it has been explored in conjunction with the SIESTA method,39 the absolute performance remains slower than the use of FFTs, so it has not yet been adopted within the distributed implementation. Once the potential due to the deformation density is determined, by either FFTs or multigrid, the contribution to the energy from this term can be calculated by summing the product of this potential with the total electron density across the mesh. Having discussed the background to the evaluation of the electron density–oriented contributions to the Hamiltonian, it remains to consider the practical consequences for the use of the methodology. The most significant point is that there will always be a numerical error in the integral of quantities involving the electron density. 
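The origin of this integration error can be seen in a toy model. In the sketch below a normalized one-dimensional Gaussian stands in for a localized piece of electron density, and its integral is approximated by a sum over a uniform grid. The Gaussian width, the grid spacings, and the function names are illustrative assumptions with no connection to actual SIESTA settings.

```python
import math

def grid_integral(center, spacing, sigma=0.2, half_width=4.0):
    """Approximate the integral of a normalized Gaussian centered at
    `center` by summing over grid points at integer multiples of `spacing`."""
    n = int(half_width / spacing) + 2
    i0 = round(center / spacing)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(i0 - n, i0 + n + 1):
        x = i * spacing
        total += norm * math.exp(-0.5 * ((x - center) / sigma) ** 2)
    return total * spacing

def ripple(spacing, n_offsets=20):
    """Spread of the grid integral as the Gaussian is moved across one
    grid cell; the exact integral is 1 for every position."""
    values = [grid_integral(spacing * k / n_offsets, spacing)
              for k in range(n_offsets)]
    return max(values) - min(values)

coarse = ripple(0.5)    # grid spacing comparable to the Gaussian width
fine = ripple(0.25)     # spacing halved
print(f"spread on coarse grid: {coarse:.3e}")
print(f"spread on fine grid:   {fine:.3e}")
```

The integral depends on where the function sits relative to the grid points, and the dependence falls off very rapidly as the spacing is reduced relative to the width of the function being integrated.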
While the description of the electron density at the grid points is correct, the integration between adjacent points is approximate. As the grid spacing is reduced, the numerical integration becomes more precise. Rather than specifying the grid spacing directly, the fineness is controlled by a kinetic energy value, known as the mesh cutoff , for the highest-energy Fourier component that can be represented. For periodic systems, the grid spacings allowed are constrained by the requirement to be commensurate with the unit cell, so the nearest mesh cutoff above the target specified is chosen. Typical mesh cutoffs are between 80 and 400 Ry, although higher values may be required for very precise calculations. Ultimately, the value required will depend on the pseudopotentials present or basis set shape and must be tested for convergence behavior. Note that the use of partial core corrections often necessitates the use of higher mesh cutoffs, due to the larger total electron density to be integrated. The practical consequence of the numerical integration error above is that there will be a small breaking of translational invariance (i.e., the energy of a system will change slightly according to its absolute Cartesian position relative to the underlying mesh). This is referred to as space rippling or the “egg-box” effect. In addition to affecting the energy, this will also lead to numerical deviations in the


forces. As a result, there can be slight symmetry breaking of structures or convergence slowdown during geometry optimization if the mesh cutoff is too low. It should be noted that this issue is common to most methods that use non-atom-centered basis (or auxiliary basis) sets, although it can be hidden through explicit symmetry constraints, or reduced through the use of softer pseudopotentials/basis function shapes.

A number of practical schemes to reduce the influence of the “egg box” have evolved. Obviously, increasing the mesh cutoff is one, but since the mesh dominates the computational expense for small to moderately sized systems, this is not the ideal solution. A more efficient technique is referred to as grid-cell sampling. Imagine an isolated atom being displaced relative to the underlying grid. The energy of the system will vary with the periodicity of the grid and may exhibit a behavior that to first order resembles either a simple sine or cosine wave (see Fig. 2.5). If this were the case, the energy and forces could be evaluated for two positions displaced by half of a grid spacing relative to each other and then averaged. The result would then be invariant to absolute position. While the situation for molecules and solids is more complex, with many Fourier components, the averaging over several displacements with respect to the grid points can lead to a reduction of the numerical error in the forces. This is the grid-cell sampling technique.

On the face of it, this may not appear to represent a computational saving over increasing the mesh cutoff, since multiple energy/force evaluations appear to be required. However, it transpires that the breaking of translational invariance is much more significant for the forces than for the potential. Consequently, the self-consistent field procedure (see Section 2.2.5) can be performed for a single mesh position and then only the force evaluation need be conducted

Fig. 2.5 Egg-box effect for a Ne atom with a DZP basis set and an energy shift of 0.01 Ry. The total energy (in eV) is plotted as a function of atom position relative to the underlying mesh, in fractions of the mesh spacing. The curves shown are for cutoffs of 150 Ry, 250 Ry, 450 Ry, and 250 Ry with two-point grid-cell sampling.


for multiple grid positions, thereby representing a considerable efficiency gain. The validity of this approximation can be seen in Fig. 2.5, where the grid-cell sampling correction largely removes the oscillation for a single atom. There are several further methodologies for the reduction of space-rippling effects. For example, the basis functions and pseudopotentials can be explicitly Fourier filtered to reduce the components beyond the mesh cutoff.40 Although this guarantees almost no invariance breaking for an isolated atom, it is difficult to limit the Fourier components that arise from combinations of basis functions from different atoms when they overlap. Ultimately, the only way to ensure that translational invariance is obeyed exactly is to use atom-centered integration grids, such as the radial grid techniques that have been employed for numerical basis sets.41 In such cases it is necessary to include the derivatives associated with the movement of the integration grid and the change of weights; terms that are often neglected for simplicity in some implementations, although there can also be numerical benefits to considering the grid to be fixed in some cases. So far we have focused on the requirements to achieve linear scaling in the CPU time cost of a calculation. However, for a scheme to be useful it is also necessary for the memory usage of an algorithm to increase linearly while being small in absolute size; otherwise, this will become the bottleneck that prevents large-scale calculations from being performed. The memory usage of a SIESTA calculation can be dominated by one of two things. First, there is the storage of the matrices used in the construction of the Kohn–Sham equations and subsequent quantities, which consists of the Hamiltonian, overlap, density, and energy-density matrices. 
Second, storage of the nonzero orbital values at the mesh points can represent a large amount of data, especially for high mesh cutoffs, and is often the dominant memory use. Other mesh-related quantities are typically much smaller since there can be several tens of orbitals that contribute to each mesh point in a dense solid, whereas other arrays involve just one number per grid point. In cases where the storage of the orbitals on the grid becomes a limiting factor, there is a direct-phi algorithm in which orbital values are recomputed on the fly (analogous to the direct SCF concept in Gaussian methods, but for different quantities). This approach greatly reduces memory usage at the expense of additional computational cost. The key to reducing the memory usage to linear scaling is to recognize that the Hamiltonian and overlap matrices are both sparse, due to the finite basis set range. Indeed, the number of nonzero elements per row or column remains fixed as the system size increases once the dimensions of the problem exceed the maximum interaction range. To exploit this, all matrices are stored in compressed row storage format, which is a standard technique for storing just the nonzero elements of a sparse array, at the cost of storing two extra integer pointer arrays to allow mapping of the stored elements to the dense matrix representation. To reduce this overhead, the overlap matrix is presently treated as possessing the same sparsity pattern as the Hamiltonian, even though it actually has a greater number of null elements. Along similar lines, the approximation is made that the density matrix obeys the same sparsity pattern as the Hamiltonian. Although the


density matrix is not physically constrained to be zero where the Hamiltonian is, the matrix elements that match the nonzero terms in the Hamiltonian capture the contributions that are important for the total energy.

2.2.5 Solving the Kohn–Sham Equations

Once the Hamiltonian and overlap matrices have been constructed, the next key step in any calculation is to solve for the new density matrix and then to iterate to self-consistency. The traditional approach to this problem has been to use matrix diagonalization to determine the Kohn–Sham eigenstates and then to use the coefficients of the basis functions to construct the next density matrix in the iterative sequence. This approach has the benefit of being able to determine both the occupied and unoccupied Kohn–Sham eigenstates, making it possible to compute properties such as the bandgap and densities of states. We note, of course, that these quantities should be interpreted with care since the Kohn–Sham wavefunctions do not represent true one-electron eigenstates as a result of self-interaction error.

For periodic systems it is necessary to integrate all observables across the Brillouin zone. This is usually approximated by a sum over discrete points in reciprocal space, and most commonly a uniform grid of k-points is chosen according to the scheme of Monkhorst and Pack.42 In the case of small unit cells it is necessary to take the same approach within the SIESTA methodology. One specific feature of the actual implementation is the standard method of choosing the grid size. Here a quantity called the K-grid cutoff can be chosen as a single value with units of distance. This methodology, due to Moreno and Soler,43 exploits the relationship between reciprocal space sampling on a grid of k-points and the equivalent sampling through the use of supercells (e.g., a 2 × 2 × 2 grid of k-points allows the same phase factors to be sampled as creating a 2 × 2 × 2 supercell in real space). By specifying the real space supercell length that is desired, the equivalent reciprocal space sampling for a single cell can be determined.
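The supercell equivalence can be turned into a simple rule of thumb. The sketch below assumes an orthorhombic cell and a diagonal k-point grid, and takes the desired real-space supercell length directly as input; the function name and cell dimensions are hypothetical, and the general-lattice algorithm and the precise convention of the program's input parameter may differ from this simplification.

```python
import math

def kgrid_for_supercell_length(cell_lengths, target_length):
    """Number of k-points along each axis of an orthorhombic cell such that
    the equivalent real-space supercell is at least `target_length` long
    in every direction."""
    return tuple(max(1, math.ceil(target_length / a)) for a in cell_lengths)

# Hypothetical orthorhombic cell (lengths in angstrom) and target length.
cell = (4.0, 6.0, 25.0)
grid = kgrid_for_supercell_length(cell, 20.0)
equivalent_supercell = tuple(n * a for n, a in zip(grid, cell))
print(grid, equivalent_supercell)
```

A single length therefore produces denser sampling along short cell vectors and a single k-point along directions that are already long, such as the vacuum direction of a slab.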
Through the use of a single control value it is possible to try to achieve consistent convergence across a range of different systems, provided that the bandgap and dispersion are similar. Of course, to be certain, the user must always check the convergence for each system. The SIESTA methodology is designed to target large systems containing several hundreds to thousands of atoms. Thus, by the time such dimensions are reached, it is often a good approximation to consider only the Brillouin zone center (gamma point) for sampling purposes. This greatly simplifies the calculation and leads to a dramatic increase in computational speed since the Hamiltonian and overlap matrices become real rather than complex. Hence, from this point onward the assumption will be made that the integration over the Brillouin zone can be dropped and the system will be treated at the gamma point only. Since there are many efficient machine-optimized libraries for dense matrix diagonalization, usually based on the LAPACK and BLAS routines, this approach can be highly competitive up to relatively large system sizes. However, the problem of cubic scaling and the need to work typically with dense matrices


ultimately dominates the computational cost. As a result, there has been considerable research over the last two decades into alternative techniques to determine the density matrix during self-consistency.44,45 Although improvements can be made to the diagonalization approach, such as solving for only the occupied states and iterative techniques for sparse matrices,46 there is a need for more radical alternatives to achieve linear scaling. The major difficulty when working with a localized atomic orbital basis set is the need to solve the generalized eigenvalue problem:

$$H\psi = \varepsilon S\psi \qquad (2.8)$$

which involves first transforming the problem to a standard eigenvalue equation:

$$H'\psi' = \varepsilon\psi' \qquad (2.9)$$
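A minimal dense-matrix sketch of the step from Eq. (2.8) to Eq. (2.9) is given here, assuming a real symmetric Hamiltonian (gamma point), a Cholesky factorization S = L Lᵀ, and a closed-shell occupation of two electrons per state. The function name `solve_generalized` and the random test matrices are illustrative assumptions, and the explicit inverse of L is used only because the example is small.

```python
import numpy as np

def solve_generalized(H, S, n_occ):
    """Solve H psi = eps S psi by reduction to a standard eigenproblem."""
    L = np.linalg.cholesky(S)              # S = L L^T
    L_inv = np.linalg.inv(L)               # dense inverse: acceptable for a small demo
    H_std = L_inv @ H @ L_inv.T            # H' = L^-1 H L^-T
    eps, C_std = np.linalg.eigh(H_std)     # H' psi' = eps psi'
    C = L_inv.T @ C_std                    # back-transform: psi = L^-T psi'
    D = 2.0 * C[:, :n_occ] @ C[:, :n_occ].T   # closed-shell density matrix (assumption)
    return eps, C, D

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
H = 0.5 * (A + A.T)                        # symmetric model Hamiltonian
S = B @ B.T / 6.0 + np.eye(6)              # symmetric positive-definite model overlap
eps, C, D = solve_generalized(H, S, n_occ=2)
print(eps)
```

Even when H and S are sparse, `L_inv` is generally much denser, which is the fill-in problem that motivates the alternatives discussed in the text.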

To do this implies the multiplication of the Hamiltonian by the effective inverse of the overlap matrix, which is often achieved indirectly through the use of Cholesky decomposition. Although both the Hamiltonian and overlap matrices may be very sparse, the difficulty is that the inverse of the overlap matrix is potentially much less sparse or even dense. While reordering techniques can reduce the degree of potential fill-in that occurs,47 and other factorization schemes48 may improve the level of sparsity of an effective inverted overlap matrix, the main challenge remains how to handle the nonorthogonality of the basis set while achieving linear scaling. One of the first linear-scaling methods to be proposed was the divide-and-conquer method of Yang.49 The principle of the approach is to reduce the total set of Kohn–Sham equations into a series of smaller overlapping subproblems from which the overall electron density could be constructed. For example, a partition could be created centered on each atom of the system whereby all Hamiltonian and overlap matrix elements within a cutoff distance are collected and solved using diagonalization. Provided that the cutoff radius is much smaller than the total system size, the cost of each separate diagonalization is much less than that for solving for the whole system together, and will be independent of the number of atoms for the entire problem. Hence, linear scaling is achieved while retaining the use of efficient matrix diagonalization for small problems. The remaining issue is how to reconstruct the total density from the sum of the subproblems, since the same contribution will appear in many different partitions. While first formulated in terms of the electron density itself, the divide-and-conquer scheme was later also cast in terms of the coefficients of a density matrix,50 which is more appropriate here. Accordingly, the overlapping contributions can be partitioned as follows:

$$\rho_{\mu\nu} = \sum_{\alpha} P^{\alpha}_{\mu\nu}\,\rho^{\alpha}_{\mu\nu} \qquad (2.10)$$

$$P^{\alpha}_{\mu\nu} = \begin{cases} 1 & \mu \in \alpha,\ \nu \in \alpha \\ \tfrac{1}{2} & \mu \in \alpha,\ \nu \notin \alpha \ \text{or}\ \mu \notin \alpha,\ \nu \in \alpha \\ 0 & \mu \notin \alpha,\ \nu \notin \alpha \end{cases} \qquad (2.11)$$

where α represents a partition label. The density matrix divide-and-conquer approach above has recently been implemented in SIESTA and shown to be an effective linear-scaling solution.51 Divide and conquer, as described above, is a simple and appealing approach to achieving linear scaling and has found considerable favor in some communities.52 However, it is important to recognize the limitations. First, for reasons of simplicity, the division of the Hamiltonian into submatrices is usually made based on a distance cutoff. However, decay lengths for matrix elements and the density matrix in different systems can vary substantially according to the nature of the bandgap, atoms involved, and so on. Therefore, truncation methods that are more adaptable to the physical problem are arguably superior. Second, the prefactor for the divide-and-conquer method is relatively high because a large amount of duplicate work is being performed (i.e., the same density matrix element is being computed many times over as a result of partition overlap). Third, all the subsystems are connected by the requirement that the Fermi energy must be globally the same; otherwise, electron density would flow from one partition to another until the chemical potential was equalized. Hence, once the submatrices have been diagonalized to obtain the local eigenspectrum, the population of the states cannot be determined without knowledge of the eigenvalues for all partitions simultaneously. Consequently, either the eigenvalues and eigenvectors for all subsystems must be stored, which represents a large amount of memory, or multiple diagonalizations must be performed for each partition, thus further raising the prefactor. Because of the issues described above relating to divide and conquer, especially the second factor, there has been a search for more efficient algorithms that act on a single sparse density matrix. 
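The partition weights of Eq. (2.11) and the assembly of Eq. (2.10) can be sketched in a few lines. The example assumes that every basis function belongs to exactly one partition and, purely for illustration, hands every partition the same reference matrix as its local density matrix; in a real calculation each local matrix would come from diagonalizing the subproblem within the cutoff. The names `partition_weights`, `assemble_density`, and `owner` are hypothetical.

```python
import numpy as np

def partition_weights(owner, alpha):
    """P^alpha_{mu nu}: 1 if both functions lie in alpha, 1/2 if one does, 0 otherwise."""
    inside = (np.asarray(owner) == alpha).astype(float)
    return 0.5 * (inside[:, None] + inside[None, :])

def assemble_density(owner, local_density):
    """rho_{mu nu} = sum_alpha P^alpha_{mu nu} * rho^alpha_{mu nu}."""
    n = len(owner)
    rho = np.zeros((n, n))
    for alpha, rho_alpha in local_density.items():
        rho += partition_weights(owner, alpha) * rho_alpha
    return rho

owner = [0, 0, 1, 1, 2]                    # partition that owns each basis function
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
rho_ref = A + A.T                          # stand-in for a converged density matrix
local = {alpha: rho_ref.copy() for alpha in set(owner)}

rho = assemble_density(owner, local)
weight_sum = sum(partition_weights(owner, a) for a in set(owner))
print(weight_sum)
```

The weights sum to one for every matrix element, so no contribution is double counted even though each element is computed in more than one partition, which is the duplicated work referred to in the text.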
All methods involve dropping negligible contributions to the density matrix in one way or another, and are generally applicable to materials with a HOMO/LUMO or bandgap. Within this there are two general classes of method: those that impose truncation on the density matrix and those that invoke localization of the wavefunction, similar to divide and conquer. Considering first the former class of methods, they recognize that the density matrix can be used directly without recourse to the Kohn–Sham wavefunctions. However, in doing so, the conditions of N-representability must be observed (i.e., the density matrix must be derivable from an underlying antisymmetric N particle wavefunction).53 For an orthonormal basis set, the density matrix must therefore obey the following conditions:

• Symmetry. $D = D^{T}$, where $D$ is the density matrix and $D^{T}$ is its transpose.
• Trace. $\mathrm{Tr}(D) = N_e$, where Tr represents the trace of a matrix and $N_e$ is the number of electrons.
• Idempotency. $D^{2} = D$, since eigenvalues are either 0 or 1.
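These conditions are easy to check numerically. The sketch below builds a slightly non-idempotent trial matrix in an orthonormal basis and applies the cubic McWeeny map of Eq. (2.12), 3D² − 2D³, until the idempotency error is negligible. The matrix size, the noise level, and the function name `mcweeny` are illustrative assumptions; the map converges only if the starting eigenvalues are already close to 0 or 1.

```python
import numpy as np

def mcweeny(D, tol=1e-12, max_iter=50):
    """Iterate D <- 3 D^2 - 2 D^3 until ||D^2 - D|| < tol."""
    for _ in range(max_iter):
        D2 = D @ D
        if np.linalg.norm(D2 - D) < tol:
            break
        D = 3.0 * D2 - 2.0 * D2 @ D
    return D

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))    # orthonormal eigenvectors
occ_exact = np.array([1.0] * 3 + [0.0] * 5)         # three occupied states
occ_noisy = occ_exact + 0.1 * rng.uniform(-1.0, 1.0, 8)
D_trial = Q @ np.diag(occ_noisy) @ Q.T              # symmetric, not idempotent

D_pure = mcweeny(D_trial)
print(np.trace(D_pure), np.linalg.norm(D_pure @ D_pure - D_pure))
```

Only matrix products appear, so the cost can scale linearly when D is sparse and small elements are truncated.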


Given these constraints, a trial density matrix can be converged to an approximation to the true density matrix by invoking one of two broad classes of approach. In the first class, purification formulas are used to iteratively transform an approximate density matrix, $D$, into one that is more nearly idempotent, $\tilde{D}$. The most widely known purification transformation is that due to McWeeny:54

$$\tilde{D} = 3D^{2} - 2D^{3} \qquad (2.12)$$

although this has recently been generalized to higher orders by Niklasson.55 The second class of density matrix–based methods involve minimization of an energy functional of the trial matrix, subject to the constraints above, based on the Hamiltonian. One of the best known examples is the method of Li et al.,56 with further refinements by other groups.57,58 All of the techniques above are valuable approaches to linear-scaling generation of the density matrix. However, they perform optimally for a basis set that is orthonormal. For a localized atom-centered basis there is the extra complexity of transforming the Hamiltonian or carrying the effective inverse of the overlap matrix through the formulas. For this reason, the SIESTA methodology currently employs a different class of method that focuses on the localization of the wavefunction. It is possible to perform a unitary transformation of a set of extended wavefunctions into a localized set of states known as Wannier functions. Although this is a nonunique transformation, there are well-developed approaches for this process, such as maximally localized Wannier functions.59 It should also be noted that when discussing the locality of these Wannier functions, this usually implies an exponential decay rather than strict confinement. The culmination of several developments led to the Kim–Mauri–Galli (KMG)60 order-N functional for linear-scaling construction of the Wannier functions, and thereby the density matrix. This represents the default approach for achieving true linear scaling within SIESTA. Here the Wannier functions are forced to be strictly local through the use of a cutoff radius, so the approach has much in common with the philosophy of the density matrix divide-and-conquer method, but avoids the duplicate generation of matrix elements. Each atomic center carries a number of localized Wannier functions (LWFs), such that the total number of localized states exceeds the number of occupied states. 
The number assigned to a given atom is specified by (Ne + 2)/2 for KMG. Within the KMG method, the orbital coefficients within the localized states are determined by minimization of a functional that depends on the Hamiltonian and overlap matrix, as well as the chemical potential, μ, of the electrons:

$$U_{\mathrm{KMG}} = 2\sum_{ij}\,(2\delta_{ij} - S_{ij})(H_{ij} - \mu S_{ij}) \qquad (2.13)$$

Here the use of the distinct subscripts i and j indicates that the Hamiltonian and overlap matrices have been transformed to the basis of the localized Wannier functions according to the coefficients of the orbital basis set within the LWFs.
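As an illustration of Eq. (2.13), the following sketch evaluates the functional for a coefficient matrix C whose columns are the localized states. It makes several simplifying assumptions that are not part of the KMG method itself: the atomic orbital basis is taken to be orthonormal, no localization radius is imposed, and the number of states equals the number of occupied levels. The name `kmg_functional` is hypothetical.

```python
import numpy as np

def kmg_functional(C, H_ao, S_ao, mu):
    """U = 2 * sum_ij (2 delta_ij - S_ij) (H_ij - mu S_ij) in the LWF basis."""
    H = C.T @ H_ao @ C                     # Hamiltonian between localized states
    S = C.T @ S_ao @ C                     # overlap between localized states
    n = S.shape[0]
    return 2.0 * np.sum((2.0 * np.eye(n) - S) * (H - mu * S))

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
H_ao = 0.5 * (A + A.T)
S_ao = np.eye(6)                           # orthonormal basis (simplifying assumption)
eps, V = np.linalg.eigh(H_ao)
n_occ = 2
mu = 0.5 * (eps[n_occ - 1] + eps[n_occ])   # chemical potential placed inside the gap

C = V[:, :n_occ]                           # exact occupied eigenvectors
U_min = kmg_functional(C, H_ao, S_ao, mu)
U_scaled = kmg_functional(0.9 * C, H_ao, S_ao, mu)   # non-normalized states
print(U_min, U_scaled)
```

At the orthonormal solution the functional reduces to twice the sum of (ε_i − μ) over the occupied states, and rescaling the states raises its value, which is the energy penalty for deviation from orthonormality described in the text.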


The conceptual key to achieving linear scaling is that this expression avoids the need for explicit orthogonalization, but instead, imposes an energy penalty for the deviation from orthogonality [the first term in parentheses in expression (2.13) represents a truncated polynomial expansion of the inverse of the overlap matrix]. During the minimization the localized states therefore gradually become orthonormal until this condition is met at convergence. It is important to note that this minimization is an extra iterative step that lies within each self-consistent field (SCF) cycle. The greatest challenge within the KMG approach is the determination of the chemical potential, which represents the Fermi energy of the system. Because there is no determination of eigenstates in this method, the Fermi energy is not computed directly, although techniques exist to evaluate subsets of the eigenvalue spectrum of a matrix at a considerably lower cost than full diagonalization. However, this extra calculation is generally undesirable and would have to be repeated at every step of the self-consistent field procedure, since the Fermi energy changes as a function of the density matrix. In the KMG method, the chemical potential need not be exactly equal to the true Fermi energy; it must just lie above the top of the valence band/HOMO and below the conduction band/LUMO. For an insulator, or even many semiconducting materials, the bandgap is sufficiently large and the Fermi energy is known to be in the vicinity of zero, such that it is possible to “guess” a value of the chemical potential that satisfies this requirement. Alternatively, a trial-and-error approach can be used. If the chemical potential is set too low, the number of electrons in the system will lie below the actual number, while if it is set too high, the converse will be true. 
Should the value lie within a band, the minimization procedure can diverge, again providing an indication that the value chosen is not suitable. Where it can be afforded, a practical scheme that avoids the difficulty in setting the chemical potential correctly is the following. First, a small number of iterations of diagonalization are performed to obtain a good approximation to the density matrix, and the Fermi energy can be determined, as well as being seen to be stable. Having written out the unconverged density matrix, the calculation can then be restarted to use the KMG scheme, taking the Fermi energy from this calculation. Although the first step may represent a considerable initial overhead for the initial geometry, the cost rapidly becomes insignificant if an extensive geometry optimization or molecular dynamics simulation is subsequently to be run. Let us now consider the convergence behavior of the minimization of the KMG functional, assuming that the chemical potential has been chosen correctly to lie within the correct energy window. In Table 2.3 the number of iterations required to achieve minimization of the KMG functional is quoted for the simple case of bulk silicon. There are several trends to note in the behavior. First, the initial minimization of the orbital coefficients within the LWFs is very slow to converge and can take over 1000 iterations. This is because the initial guess for the localized states involves the use of random coefficients to avoid artificially biasing the symmetry of the solutions. Minimization uses conjugate gradients, and therefore

TABLE 2.3 Number of Iterations Required to Converge the Localized Wannier Functions at Each of the First Five SCF Iterations and the Total Number of SCF Iterations Required for Convergence for Bulk Si^a

RcLWF (bohr)   Iter. 1   Iter. 2   Iter. 3   Iter. 4   Iter. 5   No. of SCF Cycles
 6               502        16         6         6         6          7
 8               902       171        30        18        10         12
10              1202       302       302       100         6         10
12               902       302         5         5         7         13
14              1502       302         7         7         5          9
16               902       302         1         1         3          8

^a The basis set and parameters are as in Fig. 2.6.

convergence is naturally slow. Attempts at using more sophisticated minimization algorithms have, however, generally proved no more effective. Second, subsequent SCF cycles require progressively fewer minimization steps since the LWFs from the previous cycle are reused and the number of iterations drops rapidly to fewer than 10. Third, the number of iterations required can decrease as the radius of confinement for the LWF (RcLWF) increases, especially for the later SCF cycles. Consequently, a more accurate calculation can actually be as fast overall, so the use of very small radii to confine the LWFs is not advisable. The variation of calculation quality as a function of the radius used for the localized states is illustrated in Fig. 2.6 for the case of bulk silicon. As can be seen, sensitivity to the localization radius varies according to the property being studied. While the energy converges to within an acceptable error (i.e., less than ambient thermal energy) relatively quickly, the error in lattice parameter is slightly larger, and the curvature-related properties, such as bulk modulus, greater still. Of course, the rate of convergence is also dependent on the bandgap, which influences the decay of the states, and therefore testing the influence of this approximation is important for each material of interest.

Fig. 2.6 Percentage error in the total energy and optimized lattice parameter as a function of the localization radius, RcLWF (bohr), for bulk silicon. Calculations are based on a 3 × 3 × 3 supercell containing 216 atoms for a SZ basis set and an energy shift of 0.01 Ry. The mesh cutoff is 250 Ry and the converged reference is for diagonalization using the gamma point only. The converged values for the total energy per atom and single-cell lattice parameter are −106.98172 eV and 5.541 Å, respectively.

Before concluding the topic of solving the Kohn–Sham equations, it is worth briefly mentioning two topics that are common to all numerical implementations: spin and SCF convergence acceleration. For the case where diagonalization is used to achieve self-consistency, the SIESTA code allows the user to include spin polarization where either the total spin may be fixed or the electrons allowed to flow between spin states to attain a common Fermi energy. In addition, there is the option to use noncollinear spin to describe spiral magnetic states.61 If using a linear-scaling solver, in particular the KMG form, the options for treatment of spin are more limited. Spin polarization is still allowed, but control of the spin state is achieved via the specification of two separate values of the chemical potential for alpha and beta spin. Turning to the second topic, there are a number of methods for assisting the convergence of the self-consistent field procedure that might otherwise diverge or require a larger number of iterations. The simplest technique is static mixing, which may be applied to either the Hamiltonian or the density matrix, but is applied more conventionally to the latter.
Here the density matrix for a new iteration is taken to be a combination of the old density matrix with the undamped result of the current solution step, i (either diagonalization or order N), in a proportion controlled by the mixing parameter, α:

$$D_{\mathrm{in}}^{\,i+1} = \alpha D_{\mathrm{out}}^{\,i} + (1 - \alpha)\,D_{\mathrm{in}}^{\,i} \qquad (2.14)$$

Typically, values of the mixing parameter in the range 0.05 to 0.35 are used, where a small value is used for a poorly convergent system, while the larger value is appropriate for a wide-gap material. If too large a value is used, there is a risk that the SCF procedure may start to oscillate. Even in cases that are intrinsically convergent, the iterative process may take numerous cycles to converge as a result of the damped mixing, so there are acceleration techniques to deal with this. SIESTA has the option to use either Pulay mixing62 or the Broyden–Vanderbilt–Louie–Johnson scheme,63 both of which store information from previous iterations, such as the density matrix, and then extrapolate forward. These methods can reduce the number of iterations considerably, though as a caution it should be noted that they could also prevent convergence in some problematic cases. Although there are numerous other convergence techniques, such as level shifting,64 dynamic mixing, and exponential transformation,65 these have yet to be combined with the SIESTA implementation but may be available in the future.
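The effect of static mixing, Eq. (2.14), can be demonstrated on a toy problem. The linear map used for `step` below is not a real SCF step; it is an assumed stand-in chosen so that the undamped iteration (α = 1) diverges while a damped one converges. The function name `scf_linear_mixing` and all numerical values are illustrative.

```python
import numpy as np

def scf_linear_mixing(step, D_in, alpha, tol=1e-10, max_iter=500):
    """Iterate D_in <- alpha * D_out + (1 - alpha) * D_in until D_out matches D_in."""
    for iteration in range(1, max_iter + 1):
        D_out = step(D_in)
        if np.linalg.norm(D_out - D_in) < tol:
            return D_in, iteration, True
        D_in = alpha * D_out + (1.0 - alpha) * D_in
    return D_in, max_iter, False

A = np.array([[1.0, 0.2], [0.2, 0.5]])

def step(D):
    """Toy 'solver' whose response overshoots: fixed point at A / 2.5."""
    return A - 1.5 * D

D0 = np.zeros((2, 2))
D_damped, n_damped, ok_damped = scf_linear_mixing(step, D0, alpha=0.3)
D_plain, n_plain, ok_plain = scf_linear_mixing(step, D0, alpha=1.0)
print(n_damped, ok_damped, ok_plain)
```

In this model the error is multiplied by (1 − 2.5α) at each step, so α = 0.3 converges steadily whereas α = 1 oscillates with growing amplitude, mirroring the behavior described for overly large mixing parameters.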


2.3 FUTURE PERSPECTIVES

This chapter has sought to present a perspective on the key background aspects of the SIESTA methodology that will be of value to a new user of the technique. A complementary chapter in this volume (Chapter 11) highlights some applications of the SIESTA approach that are possible, with a focus on the area of nanoscience. Unlike other mature computational methods, the SIESTA methodology could be considered an evolving approach that may develop further in the future as we learn about the optimal methods for creating numerical basis sets in particular. In addition, implementation in the SIESTA code will develop in response to new trends and advances in the field of density functional theory, where this is compatible with linear scaling. For example, there is no reason why the method cannot be extended to encompass Hartree–Fock exchange, hybrid functionals, and localized post-HF correlation methods, as has been the case for other solid-state codes.

Acknowledgments

The author would like to express his grateful thanks to all those who have been involved in the development of the SIESTA methodology and software, whose hard work and inspiration the present chapter draws on significantly, while stressing that any opinions expressed are personal ones. The Australian Research Council is also thanked for support through the Discovery Program and for an Australian Professorial Fellowship.

REFERENCES

1. Hohenberg, P.; Kohn, W. Phys. Rev. 1964, 136, B864.
2. Dunlap, B. I.; Connolly, J. W. D.; Sabin, J. R. J. Chem. Phys. 1979, 71, 3396.
3. Kohn, W. Phys. Rev. Lett. 1996, 76, 3168.
4. Artacho, E.; Sánchez-Portal, D.; Ordejón, P.; García, A.; Soler, J. M. Phys. Status Solidi (b) 1999, 215, 809.
5. Bock, N.; Challacombe, M.; Chee-Kwan, G.; Henkleman, G.; Nemeth, K.; Niklasson, A.-M.-N.; Odell, A.; Schwegler, E.; Tymczak, C.-J.; Weber, V. Los Alamos National Laboratory (LA-CC 01-2. LA-CC-04-086).
6. Shao, Y. et al., PCCP 2006, 8, 3172.
7. VandeVondele, J.; Krack, M.; Mohamed, F.; Parrinello, M.; Chassaing, T.; Hutter, J. Comput. Phys. Commun. 2005, 167, 103.
8. Soler, J. M.; Artacho, E.; Gale, J. D.; García, A.; Junquera, J.; Ordejón, P.; Sánchez-Portal, D. J. Phys. Condens. Matter 2002, 14, 2745.
9. Kenny, S. D.; Horsfield, A. P.; Fujitani, H. Phys. Rev. B 2000, 62, 4899.
10. Ozaki, T. Phys. Rev. B 2003, 67, 155108.
11. Bowler, D. R.; Choudhury, R.; Gillan, M. J.; Miyazaki, T. Phys. Status Solidi (b) 2006, 243, 989.


12. Skylaris, C. K.; Haynes, P. D.; Mostofi, A. A.; Payne, M. C. J. Phys. Condens. Matter 2008, 20, 064209.
13. Perdew, J. P. Physica B 1991, 172, 1.
14. Perdew, J. P.; Kurth, S.; Zupan, A.; Blaha, P. Phys. Rev. Lett. 1999, 82, 2544.
15. Becke, A. D. J. Chem. Phys. 1993, 98, 5648.
16. Anisimov, V. I.; Zaanen, J.; Andersen, O. K. Phys. Rev. B 1991, 44, 943.
17. Kleinman, L.; Bylander, D. M. Phys. Rev. Lett. 1982, 48, 1425.
18. Hamann, D. R.; Schlüter, M.; Chiang, C. Phys. Rev. Lett. 1979, 43, 1494.
19. Troullier, N.; Martins, J. L. Phys. Rev. B 1991, 43, 1993.
20. Kerker, G. P. J. Phys. C 1980, 13, L189.
21. Vanderbilt, D. Phys. Rev. B 1990, 41, 7892.
22. Blöchl, P. E. Phys. Rev. B 1994, 50, 17953.
23. Bilić, A.; Gale, J. D. Phys. Rev. B 2009, 79, 174107.
24. Louie, S. G.; Froyen, S.; Cohen, M. L. Phys. Rev. B 1982, 26, 1738.
25. Ahlrichs, R.; Taylor, P. R. J. Chim. Phys. Phys. Chim. Biol. 1981, 78, 315.
26. Becke, A. D.; Dickson, R. M. J. Chem. Phys. 1990, 92, 3610.
27. Delley, B. J. Chem. Phys. 1990, 92, 508.
28. Sankey, O. F.; Niklewski, D. J. Phys. Rev. B 1989, 40, 3979.
29. Junquera, J.; Paz, O.; Sánchez-Portal, D.; Artacho, E. Phys. Rev. B 2001, 64.
30. Anglada, E.; Soler, J. M.; Junquera, J.; Artacho, E. Phys. Rev. B 2002, 66, 205101.
31. Sánchez-Portal, D.; Ordejón, P.; Artacho, E.; Soler, J. M. Int. J. Quantum Chem. 1997, 65, 453.
32. Causa, M.; Dovesi, R.; Pisani, C.; Roetti, C. Phys. Rev. B 1986, 33, 1308.
33. García-Gil, S.; García, A.; Lorente, N.; Ordejón, P. Phys. Rev. B 2009, 79, 075441.
34. Boys, S. B.; Bernardi, F. Mol. Phys. 1970, 19, 553.
35. Greengard, L.; Rokhlin, V. J. Comput. Phys. 1987, 73, 325.
36. Chelikowsky, J. R.; Troullier, N.; Wu, K.; Saad, Y. Phys. Rev. B 1994, 50, 11355.
37. Brandt, A. Math. Comput. 1977, 31, 333.
38. Briggs, E. L.; Sullivan, D. J.; Bernholc, J. Phys. Rev. B 1995, 52, R5471.
39. Artacho, E.; Anglada, E.; Dieguez, O.; Gale, J. D.; García, A.; Junquera, J.; Martin, R. M.; Ordejón, P.; Pruneda, J. M.; Sánchez-Portal, D.; Soler, J. M. J. Phys. Condens. Matter 2008, 20, 064208.
40. Anglada, E.; Soler, J. M. Phys. Rev. B 2006, 73, 115122.
41. Becke, A. D. J. Chem. Phys. 1988, 88, 2547.
42. Monkhorst, H. J.; Pack, J. D. Phys. Rev. B 1976, 13, 5188.
43. Moreno, J.; Soler, J. M. Phys. Rev. B 1992, 45, 13891.
44. Goedecker, S. Rev. Mod. Phys. 1999, 71, 1085.
45. Bowler, D. R.; Fattebert, J. L.; Gillan, M. J.; Haynes, P. D.; Skylaris, C. K. J. Phys. Condens. Matter 2008, 20, 290301.
46. Lehoucq, R. B.; Sorensen, D. C.; Yang, C. ARPACK Users’ Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, Society for Industrial and Applied Mathematics, Philadelphia, 1998.
47. Karypis, G.; Kumar, V. SIAM J. Sci. Comput. 1999, 20, 359.

48. Benzi, M.; Meyer, C. D.; Tuma, M. SIAM J. Sci. Comput. 1996, 17.
49. Yang, W. Phys. Rev. Lett. 1991, 66, 1438.
50. Yang, W.; Lee, T.-S. J. Chem. Phys. 1995, 103, 5674.
51. Cankurtaran, B. O.; Gale, J. D.; Ford, M. J. J. Phys. Condens. Matter 2008, 20, 294208.
52. van der Vaart, A.; Gogonea, V.; Dixon, S. L.; Merz, Jr, K. M. J. Comput. Chem. 2000, 21, 1494.
53. Coleman, A. J. Rev. Mod. Phys. 1963, 35, 668.
54. McWeeny, R. Rev. Mod. Phys. 1960, 32, 335.
55. Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115.
56. Li, X. P.; Nunes, R. W.; Vanderbilt, D. Phys. Rev. B 1993, 47, 10891.
57. Millam, J. M.; Scuseria, G. E. J. Chem. Phys. 1997, 106, 5569.
58. Challacombe, M. J. Chem. Phys. 1999, 110, 2332.
59. Marzari, N.; Vanderbilt, D. Phys. Rev. B 1997, 56, 12847.
60. Kim, J.; Mauri, F.; Galli, G. Phys. Rev. B 1995, 52, 1640.
61. García-Suárez, V. M.; Newman, C. M.; Lambert, C. J.; Pruneda, J. M.; Ferrer, J. J. Phys. Condens. Matter 2004, 16, 5453.
62. Pulay, P. Chem. Phys. Lett. 1980, 73, 393.
63. Johnson, D. D. Phys. Rev. B 1988, 38, 12807.
64. Saunders, V. R.; Hillier, I. H. Int. J. Quantum Chem. 1973, 7, 699.
65. Douady, J.; Ellinger, Y.; Subra, R.; Levy, B. J. Chem. Phys. 1980, 72, 1452.

3

Large-Scale Plane-Wave-Based Density Functional Theory: Formalism, Parallelization, and Applications

ERIC BYLASKA
William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

KIRIL TSEMEKHMAN
University of Washington, Seattle, Washington

NIRANJAN GOVIND and MARAT VALIEV
William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

The basic density functional formalism presented in Chapter 1 is applied to the simulation of large materials, solutions, and molecules using plane-wave basis sets. This parallels the applications developed in Chapter 2 for similar systems using atomic basis sets. Much attention is focused on the pseudopotentials that describe the interaction of the atomic nuclei and their inner-shell electrons (“ions”) with the valence electrons. Methods for simulating charged systems are described, as well as the use of hybrid density functionals in simulations of chemical properties. Advances in numerical methods and software (contained in the NWChem package) are described that allow for both geometry optimization and multi-picosecond time scale Car–Parrinello molecular dynamics simulations of very large systems. Sample applications including the structure of hematite and the aqueous solvation of cations are described.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


3.1 INTRODUCTION

The development of fast and efficient ways to calculate density functional theory (DFT) using plane-wave basis sets1–8 combined with parallel supercomputers7,9–16 has opened the door to new classes of large-scale first-principles simulations. It is now routine at this level of theory to perform simulations containing hundreds of atoms,17 and simulations containing over 1000 atoms are feasible on today’s parallel supercomputers,20 making realistic descriptions of a variety of systems possible. Several techniques are responsible for the efficiency of plane-wave DFT programs. The central feature is the representation of the electronic orbitals in terms of a plane-wave basis set. In this representation, one can take advantage of fast Fourier transform (FFT) algorithms21 for fast calculations of total energies and forces. Periodic boundary conditions (PBCs) are also incorporated automatically as a result. However, the plane-wave basis sets do have an important shortcoming: their inefficient description of the electronic wavefunction in the vicinity of the atomic nucleus or core region. Valence wavefunctions vary rapidly in this region and much more slowly in the interstitial regions (or bonding regions) (see Fig. 3.1). Accurate description of the rapid variation of the wavefunction inside the atomic or core region would require very large plane-wave basis sets. The pseudopotential plane-wave (PSPW) method can be used to resolve this problem.22–25 In this approach the fast-varying core regions of the atomic potentials and the core electrons are removed or pseudized and replaced by smoothly varying pseudopotentials.
The pseudopotentials are constructed such that the scattering properties of the resulting pseudoatoms are the same as those of the original atoms.26,27 The rationale behind the pseudopotential approach is that changes in the electronic wavefunctions during bond formation occur only in the valence region, and therefore proper removal of the core from the problem should not affect the prediction of bonding properties of the system. The projector augmented plane-wave (PAW) method developed by Blöchl is a further enhancement of the pseudopotential approach in that it addresses some of the shortcomings encountered in a traditional PSPW approach. Since the main computational algorithms are essentially the same in the two approaches, we will not specifically discuss the PAW approach and refer the reader to comprehensive reviews.8,15,28–31

Fig. 3.1 Valence wavefunction.


3.2 PLANE-WAVE BASIS SET

Plane waves are natural for solid-state applications, since crystals are readily represented using periodic boundary conditions where the system is enclosed in a unit cell defined by the primitive lattice vectors $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$, as shown in Fig. 3.2. However, periodic plane-wave basis sets can also be used for molecular simulations as long as the unit cell is large enough to minimize the image interactions between cells.

Fig. 3.2 Unit cell in periodic boundary conditions. The solid arrows represent the Bravais lattice vectors.

In terms of plane waves, the molecular orbitals are represented as

$$\psi_i(\mathbf{r}) = \frac{1}{\sqrt{\Omega}} \sum_{\{\mathbf{G}\}} \psi_i(\mathbf{G})\, e^{i\mathbf{G}\cdot\mathbf{r}} \qquad (3.1)$$

where $\Omega$ is the volume of the primitive cell ($\Omega = [\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3] = \mathbf{a}_1 \cdot (\mathbf{a}_2 \times \mathbf{a}_3)$). Since the system is periodic, the plane-wave expansion must consist of only the plane waves $e^{i\mathbf{G}\cdot\mathbf{r}}$ that have the periodicity of the lattice, which can be determined using the constraint

$$e^{i\mathbf{G}\cdot(\mathbf{r}+\mathbf{L})} = e^{i\mathbf{G}\cdot\mathbf{r}} \qquad (3.2)$$

where $\mathbf{L}$ is the Bravais lattice vector ($\mathbf{L} = n_1\mathbf{a}_1 + n_2\mathbf{a}_2 + n_3\mathbf{a}_3$, with $n_1, n_2, n_3$ = integers) and $\mathbf{G}$ represents the wave vectors, which can be defined in terms of the reciprocal lattice vectors:

$$\mathbf{G}_{i_1 i_2 i_3} = \left(i_1 - \frac{N_1}{2}\right)\mathbf{b}_1 + \left(i_2 - \frac{N_2}{2}\right)\mathbf{b}_2 + \left(i_3 - \frac{N_3}{2}\right)\mathbf{b}_3 \qquad (3.3)$$


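where $N_1$, $N_2$, and $N_3$ are chosen sizes of the lattice vector grid, which can range from 1 to ∞; $i_1$, $i_2$, and $i_3$ are integers defined in the ranges of $1 \cdots N_1$, $1 \cdots N_2$, and $1 \cdots N_3$, respectively, and

$$\mathbf{b}_1 = 2\pi\,\frac{\mathbf{a}_2 \times \mathbf{a}_3}{\Omega} \qquad \mathbf{b}_2 = 2\pi\,\frac{\mathbf{a}_3 \times \mathbf{a}_1}{\Omega} \qquad \mathbf{b}_3 = 2\pi\,\frac{\mathbf{a}_1 \times \mathbf{a}_2}{\Omega} \qquad (3.4)$$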

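are the primitive reciprocal lattice vectors. A real space grid that is dual to the reciprocal lattice grid can be defined and is given by

$$\mathbf{r}_{i_1 i_2 i_3} = \left(\frac{i_1}{N_1} - \frac{1}{2}\right)\mathbf{a}_1 + \left(\frac{i_2}{N_2} - \frac{1}{2}\right)\mathbf{a}_2 + \left(\frac{i_3}{N_3} - \frac{1}{2}\right)\mathbf{a}_3 \qquad (3.5)$$

The transformation between the reciprocal and real space representations is achieved via the discrete Fourier transform:

$$f(\mathbf{r}_{i_1 i_2 i_3}) = \frac{1}{\sqrt{\Omega}} \sum_{j_1=1}^{N_1}\sum_{j_2=1}^{N_2}\sum_{j_3=1}^{N_3} F(\mathbf{G}_{j_1 j_2 j_3})\, e^{i\mathbf{G}_{j_1 j_2 j_3}\cdot\mathbf{r}_{i_1 i_2 i_3}}$$

$$F(\mathbf{G}_{i_1 i_2 i_3}) = \frac{\sqrt{\Omega}}{N_1 N_2 N_3} \sum_{j_1=1}^{N_1}\sum_{j_2=1}^{N_2}\sum_{j_3=1}^{N_3} f(\mathbf{r}_{j_1 j_2 j_3})\, e^{-i\mathbf{G}_{i_1 i_2 i_3}\cdot\mathbf{r}_{j_1 j_2 j_3}} \qquad (3.6)$$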
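The transform pair of Eq. (3.6) can be checked on a one-dimensional analogue of the grids in Eqs. (3.3) and (3.5). The sketch below is written with explicit matrices rather than an FFT so that each factor can be compared with the formulas; the cell length `a` plays the role of Ω, and the grid size and random coefficients are illustrative assumptions.

```python
import numpy as np

N, a = 8, 10.0                         # grid size and cell length (1D stand-in for Omega)
idx = np.arange(1, N + 1)              # indices run from 1 to N, as in the text
b = 2.0 * np.pi / a                    # one-dimensional reciprocal lattice vector
G = (idx - N / 2) * b                  # Eq. (3.3) in one dimension
r = (idx / N - 0.5) * a                # Eq. (3.5) in one dimension

E = np.exp(1j * np.outer(r, G))        # E[j, k] = exp(i G_k r_j)

rng = np.random.default_rng(0)
F = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # reciprocal-space data

f = E @ F / np.sqrt(a)                         # reciprocal space -> real space
F_back = np.sqrt(a) / N * (E.conj().T @ f)     # real space -> reciprocal space
print(np.max(np.abs(F_back - F)))
```

The round trip recovers the original coefficients because the plane waves are orthogonal on the dual grid; an FFT evaluates the same sums in O(N log N) operations instead of O(N²).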

These transformations can be calculated efficiently via fast Fourier transform (FFT) algorithms.21 In typical plane-wave calculations, the plane-wave expansion is truncated in that only the reciprocal lattice vectors whose kinetic energy is lower than a predefined maximum cutoff energy,

$$\tfrac{1}{2}|\mathbf{G}|^{2} < E_{\mathrm{cut}} \qquad (3.7)$$

are kept in the expansion, while the rest of the coefficients are set to zero. The density is also expanded using plane waves,

$$\rho(\mathbf{r}) = \sum_{i} \psi_i^{*}(\mathbf{r})\,\psi_i(\mathbf{r}) = \sum_{\mathbf{G}} \rho(\mathbf{G})\, e^{i\mathbf{G}\cdot\mathbf{r}} \qquad (3.8)$$

Since the density is the square of the wavefunctions, it can vary twice as rapidly. Hence, for translational symmetry to be formally maintained, the density should contain eight times more plane waves than the corresponding wavefunction expansion. Often, the density cutoff energy is chosen to be the same as the wavefunction cutoff energy; this approximation is known as dualing. An added complication arises in the calculation of crystalline systems. In these systems the orbitals may have long-wavelength contributions that span over a large number of primitive unit cells. To account for the infinite number of electrons in the periodic system, an infinite number of k-points are required.
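The factor of eight can be checked by counting reciprocal lattice vectors. The sketch below assumes a simple cubic cell and atomic units (lengths in bohr, energies in hartree); doubling the maximum |G| corresponds to quadrupling the cutoff energy in Eq. (3.7). The function name `count_plane_waves` and the numerical values are illustrative.

```python
import numpy as np

def count_plane_waves(cell_length, e_cut):
    """Count G vectors of a simple cubic cell with |G|^2 / 2 < e_cut."""
    b = 2.0 * np.pi / cell_length
    n_max = int(np.floor(np.sqrt(2.0 * e_cut) / b))
    n = np.arange(-n_max, n_max + 1)
    nx, ny, nz = np.meshgrid(n, n, n, indexing="ij")
    g2 = b**2 * (nx**2 + ny**2 + nz**2)
    return int(np.count_nonzero(0.5 * g2 < e_cut))

n_wavefunction = count_plane_waves(20.0, 10.0)       # wavefunction cutoff
n_density = count_plane_waves(20.0, 4.0 * 10.0)      # |G| doubled -> 4 x cutoff energy
ratio = n_density / n_wavefunction
print(n_wavefunction, n_density, ratio)
```

The ratio approaches 8 as the cutoff sphere grows, since the number of G vectors scales with the volume of the sphere, that is, with the cube of its radius.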


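The Bloch theorem, however, helps restate this problem of calculating an infinite number of wavefunctions to one of calculating a finite number of wavefunctions at an infinite number of k-points or BZ points:

$$\psi_i(\mathbf{r}) = \frac{e^{i\mathbf{k}\cdot\mathbf{r}}}{\sqrt{\Omega}} \sum_{\mathbf{G}} \psi_i(\mathbf{G})\, e^{i\mathbf{G}\cdot\mathbf{r}} \qquad (3.9)$$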

Since the occupied states at each k-point contribute to the electronic potential, an infinite number of calculations are required in principle. However, experience tells us that wavefunctions at nearby k-points are almost identical. As a result, one can replace the k-point summations or integrals in the DFT expressions with sums over only a small set of special k-points in the Brillouin zone. There are a number of prescriptions to generate these special points. Since a detailed discussion of the various prescriptions is beyond the scope of this chapter, we refer the reader to more comprehensive papers and reviews.1,32–34 Obviously, for molecular systems there is no need for k-point sampling. Systems with large unit cells (disordered systems) and systems with large bandgaps also require little or no k-point sampling, because the long-wavelength components are typically contained within the unit cell in the former case and the electronic states are localized in the latter. In this work we restrict ourselves to the Γ-point (k = 0), since we are interested in isolated systems and systems with large unit cells.

3.3 PSEUDOPOTENTIAL PLANE-WAVE METHOD

The pseudopotential plane-wave method (PSPW) has its roots in the work on orthogonalized plane waves35 and core state projector methods,23 and empirical pseudopotentials have been used for some time in plane-wave calculations.25,36–38 However, this method was not considered entirely reliable until the development of norm-conserving pseudopotentials.26,39–41 It is currently a very popular method for solving DFT equations. In particular, PSPW can perform ab initio molecular dynamics very efficiently,3 and treat unit cells up to a couple of thousand atoms.4,6,7,17 Another advantage of PSPW methods is their transferability across a wide range of systems. In this section we describe implementation of the norm-conserving PSPW method. Formulas for the total energy, wavefunction gradient, and nuclear gradients are given in terms of a plane-wave basis set at the Γ-point.

3.3.1 Pseudopotentials

Pseudopotentials (effective core potentials) are based on two observations. First, in almost any system one can identify a set of core orbitals which change little from their atomic counterparts. Second, the remainder, or valence orbitals, acquire their oscillating behavior as a result of their orthogonality to the core orbitals. This also keeps valence electrons away from the core. In the pseudopotential approximation the original atoms that constitute a given chemical system are modified by removing the core states and replacing their effect with a repulsive pseudopotential. This removes the rapid oscillations from the atomic valence orbitals and allows efficient application of a plane-wave basis set expansion. The resulting pseudoatoms will in general acquire a nonlocal potential term. There have been many ways to define pseudopotentials.1,23,24,27,40–58 The original procedure of Phillips and Kleinman formed pseudopotentials from pseudowavefunctions in which atomic core wavefunctions were added to the valence wavefunctions.23 Unfortunately, this procedure and related later developments44–46 resulted in “hard-core” potentials that contained singularities. These pseudopotentials were not useful in plane-wave calculations, since the nonregularized singularities could not be expanded using a reasonable number of plane waves. At about the same time, “soft-core” empirical pseudopotentials were developed.24,25,36–38 These potentials, which were made up of smooth functions with a few parameters, were fitted to reproduce one-electron eigenvalues and orbital shapes. Such soft-core pseudopotentials were readily expanded using plane waves. However, pseudopotentials generated in this way were not transferable, yielding pseudowavefunctions that differed from the true valence wavefunctions by a few percent outside the core. Later it was realized that soft-core pseudopotentials needed to maintain norm conservation for them to be transferable.26,39–41 The principle of norm conservation states that if the charges of the real valence density and the pseudovalence density are identical inside the core region, the real valence wavefunction and the pseudowavefunction will be identical outside the core region. This procedure was refined over the years, and most soft-core pseudopotentials are now designed to have the following properties54:

• The valence pseudowavefunction generated from the pseudopotentials should not contain nodes.
• The pseudowavefunctions near zero approach $\tilde{\varphi}_l(r) \to r^{l+1}$. This criterion removes the singularities from the pseudopotential.
• Real and pseudovalence eigenvalues agree for a chosen “prototype” atomic configuration ($\varepsilon_l^{AE} = \varepsilon_l^{PP}$).
• Real and pseudoatomic valence wavefunctions agree beyond a chosen core radius $r_c$.
• Real and pseudovalence charge densities agree for $r > r_c$.
• Logarithmic derivatives and the first energy derivatives agree for $r > r_c$.

These types of pseudopotentials are called norm-conserving pseudopotentials. Here we review briefly the construction of pseudopotentials suggested by Troullier and Martins.54 The first step is to solve the radial Kohn–Sham equation self-consistently for a given atom:

\[ \left[ -\frac{1}{2}\frac{d^2}{dr^2} + \frac{l(l+1)}{2r^2} + V_{AE}(r) \right] \varphi_{nl}(r) = \varepsilon_{nl}\, \varphi_{nl}(r) \tag{3.10} \]

to obtain a set of radial atomic orbitals, $\{\varphi_{nl}\}$. The self-consistent potential $V_{AE}(r)$ is given by

\[ V_{AE}(r) = -\frac{Z}{r} + \int \frac{\rho(r')}{|r - r'|}\, dr' + V_{xc}(\rho(r)) \tag{3.11} \]

where the density, $\rho(r)$, is given by the sum of the occupied orbital densities,

\[ \rho(r) = \sum_{nl} f_{nl} \left| \frac{\varphi_{nl}(r)}{r} \right|^2 \tag{3.12} \]

and $V_{xc}(\rho(r))$ is the exchange–correlation potential. In Eq. (3.12), $f_{nl}$ is the occupancy of the nl state. Pseudopotential construction starts by introducing a smooth pseudovalence wavefunction, $\tilde{\varphi}_l(r)$, such that it and at least one derivative continuously approach the all-electron valence wavefunction, $\varphi_l^{AE}(r)$, beyond a chosen cutoff radius $r_{cl}$. In addition, to avoid a hard-core pseudopotential (i.e., a singularity in the pseudopotential), the pseudowavefunctions near zero have to approach $\tilde{\varphi}_l(r) \to r^{l+1}$. The actual functional form of $\tilde{\varphi}_l(r)$ could be chosen in many different ways. Troullier and Martins suggested the following form for the pseudowavefunctions:

\[ \tilde{\varphi}_l(r) = \begin{cases} \varphi_l^{AE}(r) & \text{if } r \ge r_{cl} \\ r^{l+1} e^{p(r)} & \text{if } r < r_{cl} \end{cases} \tag{3.13} \]

where p(r) is a polynomial of order 12:

\[ p(r) = \sum_{n=0}^{6} c_n r^{2n} \tag{3.14} \]

The seven coefficients are then determined using the following constraints:

• Norm conservation within the core region
• Continuity of the pseudowavefunction and its first four derivatives at $r_{cl}$
• Zero curvature of the screened pseudopotential at the origin

An explicit procedure to do this can be found in the paper by Troullier and Martins.54 The next step is to generate the screened pseudopotentials, which are easily obtained by inverting the radial Schrödinger equation:

\[ V_l^{scr}(r) = \varepsilon_l - \frac{l(l+1)}{2r^2} + \frac{1}{2\tilde{\varphi}_l(r)} \frac{d^2}{dr^2}\tilde{\varphi}_l(r) = \begin{cases} V_{AE}(r) & \text{if } r \ge r_{cl} \\[4pt] \varepsilon_l + \dfrac{l+1}{r}\, p'(r) + \dfrac{p''(r) + [p'(r)]^2}{2} & \text{if } r < r_{cl} \end{cases} \tag{3.15} \]
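As a sanity check on the inversion in Eq. (3.15), the sketch below applies it to the hydrogen 1s radial function $\varphi(r) = r e^{-r}$ with $\varepsilon = -1/2$ and $l = 0$, for which the result should be the bare Coulomb potential $-1/r$. The function names, the grid, and the choice $p(r) = -r$ (which is not a Troullier–Martins polynomial and is used only to exercise the second form of the equation) are assumptions made for illustration.

```python
import numpy as np

def screened_potential(r, phi, eps, l):
    """Numerical inversion: V = eps - l(l+1)/(2 r^2) + phi'' / (2 phi)."""
    d2phi = np.gradient(np.gradient(phi, r), r)
    return eps - l * (l + 1) / (2.0 * r**2) + d2phi / (2.0 * phi)

def screened_potential_from_p(r, dp, d2p, eps, l):
    """Closed form for phi = r^(l+1) exp(p(r)), given p'(r) and p''(r)."""
    return eps + (l + 1) / r * dp + 0.5 * (d2p + dp**2)

r = np.linspace(0.05, 10.0, 4001)
phi_1s = r * np.exp(-r)                      # hydrogen 1s, l = 0, eps = -1/2

v_numeric = screened_potential(r, phi_1s, eps=-0.5, l=0)
v_closed = screened_potential_from_p(r, dp=-1.0, d2p=0.0, eps=-0.5, l=0)

# Skip a few end points, where the finite differences are one-sided.
interior = slice(5, -5)
max_err = np.max(np.abs(v_numeric[interior] + 1.0 / r[interior]))
```

Both routes recover $-1/r$, the first up to finite-difference error and the second exactly.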


Three important properties of pseudopotentials result from Eq. (3.15). First, the pseudopotential will not be continuous if the pseudowavefunction does not have at least two continuous derivatives. Second, a hard-core singularity will be present in Eq. (3.15) if $\tilde{\varphi}_l(r)$ does not approach $r^{l+1}$ at zero. Third, the pseudopotentials may contain discontinuities if the pseudowavefunctions have nodes. For rare gases, where all the electrons are in the core, these are the correct pseudopotentials to use. However, in cases where one wants to include valence electrons in a calculation, the screened potentials must be unscreened to remove the effects of the valence electrons from the pseudopotential, thus generating an ionic pseudopotential. This is done by subtracting the Hartree and exchange–correlation potentials calculated from the valence pseudowavefunctions from the screened pseudopotential:

\[ V_l^{ion}(r) = V_l^{scr}(r) - \frac{4\pi}{r}\int_0^r \tilde{\rho}(r')\, r'^2\, dr' - 4\pi \int_r^\infty \tilde{\rho}(r')\, r'\, dr' - V_{xc}(\tilde{\rho}(r)) \tag{3.16} \]

where

\[ \tilde{\rho}(r) = \sum_l f_l \left| \frac{\tilde{\varphi}_l(r)}{r} \right|^2 \tag{3.17} \]
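The two radial integrals subtracted in Eq. (3.16) are the Hartree potential of a spherical density. The following sketch evaluates them on a radial grid and compares with the analytic result for the hydrogen 1s density; it assumes a density normalized so that $4\pi\int \rho\, r^2\, dr$ equals the number of electrons, and the helper names and grid are illustrative.

```python
import numpy as np

def radial_hartree(r, rho):
    """V_H(r) = (4 pi / r) int_0^r rho r'^2 dr' + 4 pi int_r^inf rho r' dr'."""
    def cumulative_trapezoid(y):
        out = np.zeros_like(y)
        out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(r))
        return out

    inner = cumulative_trapezoid(rho * r**2)     # integral from ~0 to r
    outer_all = cumulative_trapezoid(rho * r)
    outer = outer_all[-1] - outer_all            # integral from r to r_max
    return 4.0 * np.pi * (inner / r + outer)

r = np.linspace(1e-6, 30.0, 60001)
rho_1s = np.exp(-2.0 * r) / np.pi                # one electron in total

v_h = radial_hartree(r, rho_1s)
v_h_at_1 = np.interp(1.0, r, v_h)
v_exact_at_1 = 1.0 - 2.0 * np.exp(-2.0)          # 1/r - (1 + 1/r) exp(-2r) at r = 1
```

In an actual unscreening step the exchange–correlation potential of the pseudovalence density would be subtracted as well; that term is omitted here.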

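In Eq. (3.17), $f_l$ is the occupancy of the valence state l. Based on these atomic pseudopotentials, the pseudopotential for the entire system takes the form

\[ V_{psp}(\mathbf{r}, \mathbf{r}') = \sum_{l=0}^{l_{max}} \sum_{m=-l}^{l} Y_{lm}(\hat{\mathbf{r}}) \left( V_l^{ion}(|\mathbf{r}|)\, \delta(|\mathbf{r}| - |\mathbf{r}'|) \right) Y_{lm}^{*}(\hat{\mathbf{r}}') \tag{3.18} \]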

where $Y_{lm}(\hat{\mathbf{r}})$ are spherical harmonic functions. Because of the explicit angular dependence of the pseudopotentials, applying the ionic pseudopotentials of Eq. (3.18) to nonspherical systems is fairly difficult. In this semilocal form, the pseudopotential is computationally difficult to calculate with a plane-wave basis set, since the kernel integration is not separable in $\mathbf{r}$ and $\mathbf{r}'$. This form of the pseudopotential is usually simplified by rewriting the potential kernel in a separable form suggested by Kleinman and Bylander,59 which was later shown by Blöchl60 to be the first term of a complete series expansion using atomic pseudowavefunctions. Equation (3.18) rewritten within the Kleinman–Bylander form is

\[ V_{psp}^{KB}(\mathbf{r}, \mathbf{r}') = V_{local}(\mathbf{r}) + \sum_{l=0}^{l_{max}} \sum_{m=-l}^{l} P_{lm}(\mathbf{r})\, h_l\, P_{lm}^{*}(\mathbf{r}') \tag{3.19} \]

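where the atom-centered projectors $P_{lm}(\mathbf{r})$ are of the form

\[ P_{lm}(\mathbf{r}) = \left[ V_l^{ion}(|\mathbf{r}|) - V_{local}(|\mathbf{r}|) \right] \tilde{\varphi}_l(|\mathbf{r}|)\, Y_{lm}(\hat{\mathbf{r}}) \tag{3.20} \]

and the coefficient

\[ h_l = \left[ 4\pi \int_0^\infty \left[ V_l^{ion}(r) - V_{local}(r) \right] \tilde{\varphi}_l(r)^2\, r^2\, dr \right]^{-1} \tag{3.21} \]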

where $\tilde{\varphi}_l(r)$ are the zero-radial-node pseudowavefunctions corresponding to $V_l^{ion}(r)$. The choice of the local potential, $V_{local}(r)$, is somewhat arbitrary, but it is usually chosen to be the highest angular momentum pseudopotential.27,54 When a larger series expansion in atomic wavefunctions is used,49,60 it is easy to show that Eq. (3.19) will have the general form

\[ V_{psp}(\mathbf{r}, \mathbf{r}') = V_{local}(\mathbf{r}) + \sum_{l=0}^{l_{max}} \sum_{m=-l}^{l} \sum_{n=1}^{n_{max}} \sum_{n'=1}^{n_{max}} P_{nlm}(\mathbf{r})\, h^{l}_{n,n'}\, P_{n'lm}^{*}(\mathbf{r}') \tag{3.22} \]

It is known that the norm-conservation condition results in harder pseudopotentials for some elements. For example, the p states in the first-row elements (oxygen, 2p) and the d states in the first-row transition elements (copper, 3d) do not have core counterparts of the same angular momentum. As a result, these states are compact and close to the core compared to the other valence states, which leads to higher plane-wave cutoffs. The ultrasoft pseudopotentials developed by Vanderbilt52,61 relax the norm-conservation condition by generalizing the norm-conservation sum rule. This results in pseudopotentials that are smoother and consequently require a lower plane-wave cutoff. We do not discuss the details of these pseudopotentials in this chapter and refer the reader to more comprehensive reviews.7,8,28,31,62

3.3.2 Total Energy

The total energy in the pseudopotential plane-wave method can be written as a sum of kinetic, external (i.e., pseudopotential), electrostatic, and exchange and correlation energies:

\[ E_{total} = E_{kinetic} + E_{pseudopotential} + E_{electrostatic} + E_{xc} \tag{3.23} \]

The kinetic energy can be written

\[ E_{kinetic} = \frac{1}{2} \sum_{i,\mathbf{G}} f_i\, G^2\, |\psi_i(\mathbf{G})|^2 \tag{3.24} \]

where $f_i$ are the occupation numbers. To simplify our presentation, we restrict ourselves here to spin-unpolarized systems, with $f_i = 2$. The extension to spin-polarized systems is straightforward and will not be discussed here. The pseudopotential energy $E_{pseudopotential}$ is given as a sum of local and nonlocal contributions:

\[ E_{pseudopotential} = E_{pseudopotential}^{local} + E_{pseudopotential}^{nonlocal} \tag{3.25} \]
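The reciprocal-space kinetic energy of Eq. (3.24) can be sketched with a fast Fourier transform as follows. The example assumes a cubic cell, atomic units, and orbitals sampled on a uniform real-space grid and normalized to one over the cell; the function name and the FFT normalization convention are choices made for this illustration.

```python
import numpy as np

def kinetic_energy(psi_r, box_length, occupations):
    """E_kin = 1/2 sum_{i,G} f_i G^2 |psi_i(G)|^2; psi_r has shape (n_orb, n, n, n)."""
    n = psi_r.shape[-1]
    g1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_length / n)
    gx, gy, gz = np.meshgrid(g1d, g1d, g1d, indexing="ij")
    g2 = gx**2 + gy**2 + gz**2
    energy = 0.0
    for f_i, orbital in zip(occupations, psi_r):
        # Scale so that sum_G |psi(G)|^2 = 1 for an orbital normalized over the cell.
        psi_g = np.fft.fftn(orbital) / orbital.size * np.sqrt(box_length**3)
        energy += 0.5 * f_i * np.sum(g2 * np.abs(psi_g) ** 2)
    return energy

# Single plane wave exp(i G0 . r) / sqrt(volume) with G0 = (2 pi / L) (1, 2, 0).
L, n = 8.0, 16
x = np.arange(n) * L / n
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
g0 = 2.0 * np.pi / L
psi = np.exp(1j * g0 * (X + 2.0 * Y)) / np.sqrt(L**3)

e_kin = kinetic_energy(psi[None], L, [2.0])
e_expected = 0.5 * 2.0 * 5.0 * g0**2      # f_i * |G0|^2 / 2
```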


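The local portion of the pseudopotential energy can be evaluated as

\[ E_{pseudopotential}^{local} = \sum_I \int V_{local}^{I}(\mathbf{r})\, \rho(\mathbf{r})\, d\mathbf{r} = \sum_{I,\mathbf{G}} V_{local}^{I}(\mathbf{G})\, \rho^{*}(\mathbf{G}) \tag{3.26} \]

The valence electron density in reciprocal space, $\rho(\mathbf{G})$, is obtained from its real-space representation, $\rho(\mathbf{r}) = \sum_n f_n |\psi_n(\mathbf{r})|^2$, using a fast Fourier transform. The local potential is defined to be periodic and is represented as a sum of piecewise functions on the Bravais lattice by

\[ V_{local}^{I}(\mathbf{r}) = \sum_{\mathbf{L}} V_{local}^{I}(|\mathbf{r} - \mathbf{R}_I - \mathbf{L}|) \tag{3.27} \]

where $\mathbf{R}_I$ is the location of atom I, $\mathbf{L}$ is a Bravais lattice vector, and $V_{local}^{I}(r)$ is the radial local potential for the ion defined in Section 3.3.1. The local pseudopotential in reciprocal space is found by a spherical Bessel transform

\[ V_{local}^{I}(\mathbf{G}) = \frac{4\pi}{\sqrt{\Omega}}\, e^{i\mathbf{G}\cdot\mathbf{R}_I} \int_0^\infty V_{local}^{I}(r)\, j_0(|\mathbf{G}| r)\, r^2\, dr \tag{3.28} \]

where $j_0(x) = \sin(x)/x$ is the spherical Bessel function. The nonlocal part of the pseudopotential energy is given by

\[ E_{pseudopotential}^{nonlocal} = \sum_i f_i \sum_I \sum_{\mathbf{G},\mathbf{G}'} \psi_i^{*}(\mathbf{G})\, \hat{V}_{NL}^{I}(\mathbf{G}, \mathbf{G}')\, \psi_i(\mathbf{G}') \tag{3.29} \]

where

\[ \hat{V}_{NL}^{I}(\mathbf{G}, \mathbf{G}') = \sum_{lm} P_{lm}^{I}(\mathbf{G})\, h_l^{I}\, P_{lm}^{I*}(\mathbf{G}') \tag{3.30} \]

and $P_{lm}^{I}(\mathbf{G})$ is the reciprocal-space representation of the nonlocal projector [e.g., Eq. (3.20)], which can be obtained using the spherical Bessel transform

\[ P_{lm}^{I}(\mathbf{G}) = \frac{4\pi}{\sqrt{\Omega}}\, e^{-i\mathbf{G}\cdot\mathbf{R}_I}\, i^{-l}\, Y_{lm}(\hat{\mathbf{G}}) \int_0^\infty P_{lm}^{I}(r)\, j_l(|\mathbf{G}| r)\, r^2\, dr \tag{3.31} \]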

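The electron–electron repulsion energy can be written as

\[ E_{electrostatic}^{e\text{-}e} = \frac{1}{2} \int V_H(\mathbf{r})\, \rho(\mathbf{r})\, d\mathbf{r} = \frac{1}{2} \sum_{\mathbf{G}} \rho(\mathbf{G})\, V_H^{*}(\mathbf{G}) \tag{3.32} \]

where the Hartree potential, $V_H(\mathbf{r})$, is defined as

\[ V_H(\mathbf{r}) = \sum_{\mathbf{L}} \int \frac{\rho(\mathbf{r}' - \mathbf{L})}{|\mathbf{r} - \mathbf{r}' + \mathbf{L}|}\, d\mathbf{r}' \tag{3.33} \]

and in reciprocal space it is calculated as

\[ V_H(\mathbf{G}) = \begin{cases} \dfrac{4\pi}{G^2}\, \rho(\mathbf{G}) & \mathbf{G} \ne 0 \\[4pt] 0 & \mathbf{G} = 0 \end{cases} \tag{3.34} \]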
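Equation (3.34) translates almost directly into a few lines of FFT code. The sketch below assumes a cubic cell and atomic units and tests the result on a density with a single Fourier component, for which the potential is known in closed form; the function name and grid are illustrative.

```python
import numpy as np

def hartree_potential(rho_r, box_length):
    """Periodic Hartree potential: V_H(G) = 4 pi rho(G) / G^2, with V_H(G=0) = 0."""
    n = rho_r.shape[0]
    g1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_length / n)
    gx, gy, gz = np.meshgrid(g1d, g1d, g1d, indexing="ij")
    g2 = gx**2 + gy**2 + gz**2
    rho_g = np.fft.fftn(rho_r)
    v_g = np.zeros_like(rho_g)
    nonzero = g2 > 0.0
    v_g[nonzero] = 4.0 * np.pi * rho_g[nonzero] / g2[nonzero]
    return np.fft.ifftn(v_g).real

# Density cos(g0 x): the exact periodic potential is (4 pi / g0^2) cos(g0 x).
L, n = 10.0, 16
g0 = 2.0 * np.pi / L
x = np.arange(n) * L / n
rho = np.cos(g0 * x)[:, None, None] * np.ones((n, n, n))

v_h = hartree_potential(rho, L)
v_exact = 4.0 * np.pi / g0**2 * rho
```

Setting the $\mathbf{G} = 0$ component to zero corresponds to the neutralizing background discussed for charged systems in Section 3.4.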

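The ion–ion electrostatic energy for a periodic system can be evaluated using the Ewald decomposition63:

\[ E_{electrostatic}^{ion\text{-}ion} = \frac{1}{2\Omega} \sum_{\mathbf{G} \ne 0} \frac{4\pi}{|\mathbf{G}|^2} \exp\!\left( -\frac{|\mathbf{G}|^2}{4\varepsilon^2} \right) \left[ \sum_{I,J} Z_I \exp(i\mathbf{G}\cdot\mathbf{R}_I)\, Z_J \exp(-i\mathbf{G}\cdot\mathbf{R}_J) \right] \]
\[ \qquad + \frac{1}{2} \sum_{\mathbf{L}} \sum_{I,J \in |\mathbf{R}_I - \mathbf{R}_J + \mathbf{L}| \ne 0} Z_I Z_J\, \frac{\mathrm{erfc}(\varepsilon |\mathbf{R}_I - \mathbf{R}_J + \mathbf{L}|)}{|\mathbf{R}_I - \mathbf{R}_J + \mathbf{L}|} - \frac{\varepsilon}{\sqrt{\pi}} \sum_I Z_I^2 - \frac{\pi}{2\varepsilon^2 \Omega} \left( \sum_I Z_I \right)^2 \tag{3.35} \]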

where ε is a constant (typically on the order of 1) and $\mathbf{L}$ is a lattice vector. The exchange–correlation energy $E_{xc}$ within the LDA or GGA approximation is given by

\[ E_{xc} = \int f_{xc}(\rho(\mathbf{r}), |\nabla\rho(\mathbf{r})|)\, d\mathbf{r} \approx \frac{\Omega}{N_r} \sum_{i_1 i_2 i_3} f_{xc}(\rho(\mathbf{r}_{i_1 i_2 i_3}), |\nabla\rho(\mathbf{r}_{i_1 i_2 i_3})|) \tag{3.36} \]

where $f_{xc}$ is the exchange–correlation energy density, Ω is the volume of the unit cell, and $N_r$ is the number of real-space grid points $\mathbf{r}_{i_1 i_2 i_3}$ in the FFT grid.

3.3.3 Electronic Gradient

During the course of a total energy minimization or a Car–Parrinello molecular dynamics simulation, it is necessary to calculate the electron gradient, defined as

\[ S_i = \frac{\delta E_{total}}{\delta \psi_i^{*}} \tag{3.37} \]

Part of the electron gradient is evaluated in reciprocal space and the other part in real space:

\[ S_i = S_i^{G} + S_i^{r} \tag{3.38} \]


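The reciprocal-space portion contains contributions from the kinetic and nonlocal pseudopotential energy terms:

\[ S_i^{G}(\mathbf{G}) = \frac{\partial E_{kinetic}}{\partial \psi_i^{*}(\mathbf{G})} + \frac{\partial E_{pseudopotential}^{nonlocal}}{\partial \psi_i^{*}(\mathbf{G})} = \frac{1}{2} G^2 \psi_i(\mathbf{G}) + \sum_I \sum_{\mathbf{G}'} \hat{V}_{NL}^{I}(\mathbf{G}, \mathbf{G}')\, \psi_i(\mathbf{G}') \tag{3.39} \]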

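The real-space portion is given by

\[ S_i^{r}(\mathbf{r}) = \frac{\partial}{\partial \psi_i^{*}(\mathbf{r})} \left( E_{pseudopotential}^{local} + E_{electrostatic}^{e\text{-}e} + E_{xc} \right) = \left[ V_H(\mathbf{r}) + \sum_I V_{local}^{I}(\mathbf{r}) + V_{xc}(\mathbf{r}) \right] \psi_i(\mathbf{r}) \tag{3.40} \]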

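where $V_H(\mathbf{r})$ and $V_{local}^{I}(\mathbf{r})$ are the Hartree potential and the local pseudopotential, respectively. The exchange–correlation potential is given by64

\[ V_{xc}(\mathbf{r}_{i_1 i_2 i_3}) = \frac{\delta E_{xc}}{\delta \rho(\mathbf{r}_{i_1 i_2 i_3})} = \frac{\partial f_{xc}}{\partial \rho(\mathbf{r}_{i_1 i_2 i_3})} - \frac{1}{N_r} \sum_{\mathbf{G}, \mathbf{r}'} i\mathbf{G} \cdot \frac{\nabla\rho(\mathbf{r}')}{|\nabla\rho(\mathbf{r}')|}\, \frac{\partial f_{xc}}{\partial |\nabla\rho(\mathbf{r}')|}\, e^{i\mathbf{G}\cdot(\mathbf{r}_{i_1 i_2 i_3} - \mathbf{r}')} \tag{3.41} \]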

Equivalently, all the real-space expressions above can be derived from a completely reciprocal-space representation using the convolution theorem. The real-space forms above are, however, considerably more efficient to compute.

3.3.4 Atomic Forces

The force acting on the atoms in the system is defined as

\[ F_I = -\frac{\partial E_{total}}{\partial \mathbf{R}_I} \tag{3.42} \]

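Only the pseudopotential and ion–ion electrostatic energies contribute to the force:

\[ F_I = F_{pseudopotential}^{I} + F_{ion\text{-}ion}^{I} \]

The force due to the pseudopotential is given by

\[ F_{pseudopotential}^{I} = -\frac{\partial E_{pseudopotential}^{local}}{\partial \mathbf{R}_I} - \frac{\partial E_{pseudopotential}^{nonlocal}}{\partial \mathbf{R}_I} \tag{3.43} \]

\[ \qquad = i \sum_{\mathbf{G}} \mathbf{G}\, \rho^{*}(\mathbf{G})\, V_{local}^{I}(\mathbf{G}) - 2\, \mathrm{Re} \sum_i \sum_{lm} \left( \sum_{\mathbf{G}} \psi_i^{*}(\mathbf{G})\, P_{lm}^{I}(\mathbf{G}) \right) h_l^{I}\, \nabla_{\mathbf{R}_I} \left( \sum_{\mathbf{G}'} P_{lm}^{I*}(\mathbf{G}')\, \psi_i(\mathbf{G}') \right) \tag{3.44} \]

where $\nabla_{\mathbf{R}_I} \sum_{\mathbf{G}'} P_{lm}^{I*}(\mathbf{G}')\, \psi_i(\mathbf{G}') = i \sum_{\mathbf{G}'} \mathbf{G}'\, P_{lm}^{I*}(\mathbf{G}')\, \psi_i(\mathbf{G}')$.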

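The force due to the ion–ion interaction is given by

\[ F_{ion\text{-}ion}^{I} = -\frac{\partial E_{electrostatic}^{ion\text{-}ion}}{\partial \mathbf{R}_I} \]
\[ \qquad = \sum_{\mathbf{L}} \sum_{J \in |\mathbf{R}_I - \mathbf{R}_J + \mathbf{L}| \ne 0} Z_I Z_J\, (\mathbf{R}_I - \mathbf{R}_J + \mathbf{L}) \left[ \frac{\mathrm{erfc}(\varepsilon |\mathbf{R}_I - \mathbf{R}_J + \mathbf{L}|)}{|\mathbf{R}_I - \mathbf{R}_J + \mathbf{L}|^3} + \frac{2\varepsilon}{\sqrt{\pi}}\, \frac{\exp(-\varepsilon^2 |\mathbf{R}_I - \mathbf{R}_J + \mathbf{L}|^2)}{|\mathbf{R}_I - \mathbf{R}_J + \mathbf{L}|^2} \right] \]
\[ \qquad + \frac{1}{\Omega} \sum_{\mathbf{G} \ne 0} \frac{4\pi \mathbf{G}}{|\mathbf{G}|^2} \exp\!\left( -\frac{|\mathbf{G}|^2}{4\varepsilon^2} \right) Z_I\, \mathrm{Im}\!\left[ \exp(i\mathbf{G}\cdot\mathbf{R}_I) \sum_J Z_J \exp(-i\mathbf{G}\cdot\mathbf{R}_J) \right] \tag{3.45} \]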
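The Ewald energy that underlies these ion–ion terms can be sketched in a few dozen lines. The example below is an illustration under stated assumptions, not the implementation used in any particular code: it uses the standard Ewald decomposition with the complementary error function in the real-space sum and a Gaussian factor $\exp(-|\mathbf{G}|^2/4\varepsilon^2)$ in the reciprocal-space sum, fixed truncation ranges (`nmax`, `lmax`) rather than adaptive cutoffs, and a rock-salt test cell whose energy is known from the NaCl Madelung constant (about 1.7475646).

```python
import numpy as np
from math import erfc, pi, sqrt

def ewald_energy(cell, positions, charges, eps=2.5, nmax=5, lmax=3):
    """Ewald ion-ion energy; rows of `cell` are the lattice vectors."""
    cell = np.asarray(cell, dtype=float)
    pos = np.asarray(positions, dtype=float)
    z = np.asarray(charges, dtype=float)
    volume = abs(np.linalg.det(cell))
    recip = 2.0 * pi * np.linalg.inv(cell).T       # rows are reciprocal vectors

    # Reciprocal-space sum over G != 0.
    rng = np.arange(-nmax, nmax + 1)
    n = np.array(np.meshgrid(rng, rng, rng, indexing="ij")).reshape(3, -1).T
    n = n[np.any(n != 0, axis=1)]
    g = n @ recip
    g2 = np.sum(g * g, axis=1)
    structure = np.exp(1j * (g @ pos.T)) @ z       # sum_I Z_I exp(i G . R_I)
    e_recip = 0.5 / volume * np.sum(
        4.0 * pi / g2 * np.exp(-g2 / (4.0 * eps**2)) * np.abs(structure) ** 2)

    # Real-space sum over lattice translations, skipping zero separations.
    lr = np.arange(-lmax, lmax + 1)
    shifts = np.array(np.meshgrid(lr, lr, lr, indexing="ij")).reshape(3, -1).T @ cell
    e_real = 0.0
    for i in range(len(z)):
        for j in range(len(z)):
            d = np.linalg.norm(pos[i] - pos[j] + shifts, axis=1)
            d = d[d > 1e-12]
            e_real += 0.5 * z[i] * z[j] * sum(erfc(eps * r) / r for r in d)

    e_self = -eps / sqrt(pi) * np.sum(z**2)
    e_background = -pi / (2.0 * eps**2 * volume) * np.sum(z) ** 2
    return e_recip + e_real + e_self + e_background

# Rock-salt test: unit cube with four +1 and four -1 ions, nearest distance 0.5.
na = [(0, 0, 0), (0.5, 0.5, 0), (0.5, 0, 0.5), (0, 0.5, 0.5)]
cl = [(0.5, 0, 0), (0, 0.5, 0), (0, 0, 0.5), (0.5, 0.5, 0.5)]
e_nacl = ewald_energy(np.eye(3), na + cl, [1.0] * 4 + [-1.0] * 4)
e_madelung = -4.0 * 1.7475646 / 0.5                # four formula units per cell
```

The result should not depend on the splitting parameter once both sums are converged, which provides a convenient self-check when changing `eps`.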

3.4 CHARGED SYSTEMS

As we have discussed so far, plane waves are ideal for describing systems that are intrinsically periodic. However, periodic and aperiodic systems are very different within a periodic boundary condition (PBC) framework, and this is compounded further if the system is charged (e.g., charged defects, charged ions). The electrostatic energy in these systems is, in principle, divergent. A standard approach to dealing with this issue is to impose a charge-neutrality condition via a uniform charge background. This implicitly introduces a jellium background. Makov and Payne66 have shown that this procedure results in errors that go as $L^{-1}$ for charged systems and $L^{-3}$ for isolated neutral systems in three dimensions, where L is the size of a cubic unit cell. One approach to minimizing these errors is to use the scheme developed by Leslie and Gillan65 and improved by Makov and Payne.66 They derived an analytic expression for the electrostatic correction between charged unit cells as follows:

\[ E_{Makov\text{-}Payne} = E_{total} - \frac{q^2 \alpha}{2L} - \frac{2\pi q Q}{3L^3} + O\!\left( \frac{1}{L^5} \right) \tag{3.46} \]

where $E_{total}$ is the calculated energy of the charged cell, α is the Madelung constant for the lattice, q is the total charge of the cell, and Q is the quadrupole moment of the cell, given by

\[ Q = \int r^2 \rho(\mathbf{r})\, d\mathbf{r} \tag{3.47} \]

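Another approach for charged systems is via free-space boundary conditions. Provided that the density has decayed to zero at the edge of the supercell, free-space boundary conditions can be implemented by restricting the integration to just one isolated supercell, Ω,

\[ E_{Coulomb} = \frac{1}{2} \iint_{\Omega} \rho(\mathbf{r})\, g(\mathbf{r}, \mathbf{r}')\, \rho(\mathbf{r}')\, d\mathbf{r}\, d\mathbf{r}', \qquad V_H(\mathbf{r}) = \int_{\Omega} g(\mathbf{r}, \mathbf{r}')\, \rho(\mathbf{r}')\, d\mathbf{r}' \tag{3.48} \]

This essentially defines a modified Coulomb interaction

\[ g(\mathbf{r}, \mathbf{r}') = \begin{cases} \dfrac{1}{|\mathbf{r} - \mathbf{r}'|} & \text{for } \mathbf{r}, \mathbf{r}' \in \Omega \\[4pt] 0 & \text{otherwise} \end{cases} \tag{3.49} \]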

Hockney and Eastwood showed that an interaction of the form of Eq. (3.49) could still be used in conjunction with the fast Fourier transform convolution theorem.67,68 In their algorithm, the interaction between neighboring supercells is removed by padding the density with an external region of zero density. In the specific case of a density defined in a cubic supercell of length L, the density is extended to a cubic supercell of length 2L, where the original density is defined as before on the $[0, L]^3$ domain and the remainder of the $[0, 2L]^3$ domain is set to zero. The extended grid is eight times larger than the conventional grid. The Coulomb potential is calculated by convolving the density with the Green’s function kernel on the extended grid. The density on the extended grid is defined by copying the conventional grid into the extended grid and putting zeros where the conventional grid is not defined. After the aperiodic convolution, the free-space potential is obtained by restricting the extended grid to the conventional grid. In his original work, Hockney suggested that the cutoff Coulomb kernel could be defined by

\[ g(\mathbf{r}_{i,j,k}) = \begin{cases} \dfrac{\text{constant}}{h} & \text{for } |\mathbf{r}_{i,j,k}| = 0 \\[4pt] \dfrac{1}{|\mathbf{r}_{i,j,k}|} & \text{otherwise} \end{cases} \tag{3.50} \]

where $h^3$ is the constant volume of the subintervals, defined as the unit cell volume divided by the number of conventional FFT grid points.67 Hockney suggested a constant at |r| = 0 between 1 and 3. Barnett and Landman in their implementation defined the constant to be69

\[ \frac{1}{h^2} \int_{h^3} \frac{1}{r}\, d\mathbf{r} \approx \begin{cases} 2.380077 & \text{for SC lattice} \\ 0.910123 & \text{for FCC lattice} \\ 1.447944 & \text{for BCC lattice} \end{cases} \tag{3.51} \]
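The zero-padding construction described above can be sketched as follows for a simple cubic grid, using the real-space kernel of Eq. (3.50) with the simple cubic constant from Eq. (3.51). The function names are illustrative, and a brute-force double sum is included only to confirm that the padded FFT convolution reproduces the aperiodic sum; no claim is made about accuracy relative to the continuum potential.

```python
import numpy as np

def free_space_potential(rho, h, g0_constant=2.380077):
    """Aperiodic Coulomb convolution on an n^3 grid via a zero-padded (2n)^3 FFT."""
    n = rho.shape[0]
    m = 2 * n
    # Offsets on the doubled grid: index k > n represents the offset k - 2n.
    idx = np.arange(m)
    off = np.where(idx <= n, idx, idx - m) * h
    dx, dy, dz = np.meshgrid(off, off, off, indexing="ij")
    dist = np.sqrt(dx**2 + dy**2 + dz**2)
    kernel = np.empty_like(dist)
    kernel[dist > 0] = 1.0 / dist[dist > 0]
    kernel[0, 0, 0] = g0_constant / h              # regularized value at r = 0
    padded = np.zeros((m, m, m))
    padded[:n, :n, :n] = rho                       # zeros outside the original cell
    conv = np.fft.ifftn(np.fft.fftn(padded) * np.fft.fftn(kernel)).real
    return h**3 * conv[:n, :n, :n]                 # restrict to the original grid

def direct_potential(rho, h, g0_constant=2.380077):
    """Brute-force reference: V_i = h^3 sum_j g(r_i - r_j) rho_j."""
    n = rho.shape[0]
    grid = np.arange(n)
    pts = np.array(np.meshgrid(grid, grid, grid, indexing="ij")).reshape(3, -1).T * h
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, 1.0)                       # placeholder, overwritten below
    g = 1.0 / d
    np.fill_diagonal(g, g0_constant / h)
    return (h**3 * g @ rho.reshape(-1)).reshape(rho.shape)

rho = np.random.default_rng(0).random((4, 4, 4))
v_fft = free_space_potential(rho, h=0.5)
v_ref = direct_potential(rho, h=0.5)
```

Because every offset between two points of the original grid fits inside the doubled grid without wrapping onto another offset, the circular convolution equals the aperiodic one on the restricted region.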

Regardless of the choice of the constant, the singular nature of g(r) in real space can lead to significant numerical error. James addressed this problem somewhat by expanding the Coulomb kernel to higher orders in real space.70 The convolution theorem suggests that defining g(r) in reciprocal space will lead to much higher accuracy. A straightforward definition in reciprocal space is

\[ g(\mathbf{r}) = \sum_{\mathbf{G}} g_{uniform}(\mathbf{G})\, e^{i\mathbf{G}\cdot\mathbf{r}}, \qquad g_{uniform}(\mathbf{G}) = \frac{1}{h^3} \int_{\Omega'} \frac{e^{-i(\mathbf{G}\cdot\mathbf{r}/2)}}{r}\, d\mathbf{r} \tag{3.52} \]

where Ω′ is the volume of the extended unit cell and $h^3$ is the volume of the unit cell divided by the number of conventional FFT grid points. The reciprocal-space definition gains accuracy because the singularity at $\mathbf{r} = \mathbf{r}'$ in Eq. (3.48) is integrated out analytically. Even when Eq. (3.52) is used to define the kernel, a slight inexactness in the calculated electron–electron Coulomb energy will always be present, due to the discontinuity introduced in the definition of the extended density, which is forced to be zero in the extended region outside Ω. However, this discontinuity is small, since the densities we are interested in decay to zero within Ω, thus making the finite Fourier expansion of the extended densities extremely close to zero in the extended region outside Ω. Equation (3.52) could be calculated numerically; however, we have found that alternative definitions can be used with little loss of numerical accuracy. In an earlier work71,72 we suggested that the cutoff Coulomb kernel could be defined as

\[ g(\mathbf{r}) = \begin{cases} \sum_{\mathbf{G}} g_a(\mathbf{G})\, e^{i\mathbf{G}\cdot\mathbf{r}} & \text{for } |\mathbf{r}| \le R_{max} - \delta \\[4pt] \dfrac{1}{|\mathbf{r}|} & \text{otherwise} \end{cases} \]
\[ g_a(\mathbf{G}) = \begin{cases} \dfrac{2\pi (R_{max})^2}{h^3} & \text{for } \mathbf{G} = 0 \\[4pt] \dfrac{4\pi}{h^3 G^2} \left[ 1 - \cos(|\mathbf{G}| R_{max}) \right] & \text{otherwise} \end{cases} \]
\[ R_{max} = \begin{cases} L & \text{(simple cubic)} \\ \frac{\sqrt{2}}{2} L & \text{(face-centered cubic)} \\ \frac{\sqrt{3}}{2} L & \text{(body-centered cubic)} \end{cases} \qquad \delta = \text{small constant} \tag{3.53} \]

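Other forms have been suggested and could also be used.7,73–75 The Fourier-represented kernels improve the integration accuracy by removing the singularity at $\mathbf{r} = \mathbf{r}'$ in a trapezoidal integration. A disadvantage of the kernel defined by Eq. (3.53) is that only regular-shaped cells can be used. To extend this method to irregular-shaped cells, a short- and long-range decomposition can be used15:

\[ g(\mathbf{r}) = g_{shortrange}(\mathbf{r}) + g_{longrange}(\mathbf{r}), \qquad g_{shortrange}(\mathbf{r}) = \sum_{\mathbf{G}} g_{shortrange}(\mathbf{G})\, e^{i\mathbf{G}\cdot\mathbf{r}} \]
\[ g_{shortrange}(\mathbf{G}) = \begin{cases} \dfrac{4\pi}{h^3 G^2} \left( 1 - e^{-|\mathbf{G}|^2/4\varepsilon^2} \right) & \text{for } \mathbf{G} \ne 0 \\[4pt] \dfrac{\pi}{h^3 \varepsilon^2} & \text{for } \mathbf{G} = 0 \end{cases} \]
\[ g_{longrange}(\mathbf{r}) = \begin{cases} \dfrac{\mathrm{erf}(\varepsilon r)}{r} & \text{for } r \ne 0 \\[4pt] \dfrac{2\varepsilon}{\sqrt{\pi}} & \text{for } r = 0 \end{cases} \tag{3.54} \]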

We have found this kernel to give very high accuracy, even for highly noncubic supercells. Marx and Hutter recently proposed the use of this kernel as well.7 Other kernel definitions are possible (e.g., using a short- and long-range decomposition based on a Lorentzian).74 Other schemes involve the use of countercharges, represented by Gaussian densities, whose potential can be derived analytically. Since a detailed discussion of the various approaches to this problem is beyond the scope of this chapter, we refer the reader to various papers on the subject.65,66,76–78

3.5 EXACT EXCHANGE

A number of failures are known to exist in DFT (see Chapter 1), such as underestimating bandgaps, the inability to localize excess spin density, and underestimating chemical reaction barriers. These problems are a consequence of having to rely on computationally efficient approximations to the exact exchange–correlation functional (e.g., LDA and GGA) used by plane-wave DFT programs; that is, they reflect an accuracy–performance trade-off. It is generally agreed that the largest error in these approximations is their failure to completely cancel out the orbital self-interaction energies, or in plain terms that electrons partially “see” themselves.79,80 In the Hartree–Fock approximation, the exchange energy is calculated exactly and no self-interaction is present; however, by definition all electron correlation effects are missing from it. In all practical implementations of DFT the exchange energy is calculated approximately, and cancellation of the self-interaction is incomplete. Experience has shown that many of the failures associated with the erroneous self-interaction term can be corrected by approaches in which DFT exchange–correlation functionals are improved by inclusion of the nonlocal exchange term (hybrid-DFT, e.g., B3LYP and PBE081),82

\[ E_{x\text{-}exact} = -\frac{1}{2} \sum_{\sigma=\uparrow,\downarrow} \sum_i \sum_j \iint \frac{\rho_{ij}^{\sigma}(\mathbf{r})\, \rho_{ji}^{\sigma}(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, d\mathbf{r}\, d\mathbf{r}' \tag{3.55} \]

where the overlap densities are given by

\[ \rho_{ij}^{\sigma}(\mathbf{r}) = \psi_i^{\sigma*}(\mathbf{r})\, \psi_j^{\sigma}(\mathbf{r}) \tag{3.56} \]

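Using the expanded Bloch states83 representation

\[ \psi_{i\mathbf{k}}^{\sigma}(\mathbf{r}) = \frac{e^{i\mathbf{k}\cdot\mathbf{r}}}{\sqrt{\Omega}} \sum_{\mathbf{G}} \psi_{i\mathbf{k}}^{\sigma}(\mathbf{G})\, e^{i\mathbf{G}\cdot\mathbf{r}} \tag{3.57} \]

the exchange term takes the form

\[ E_{x\text{-}exact} = -\frac{1}{2} \sum_{\sigma=\uparrow,\downarrow} \left( \frac{\Omega}{8\pi^3} \right)^2 \int_{BZ} d\mathbf{k} \int_{BZ} d\mathbf{l} \sum_i \sum_j \sum_{\mathbf{G}} \frac{4\pi}{|\mathbf{G} - \mathbf{k} + \mathbf{l}|^2}\, \rho_{j\mathbf{l};i\mathbf{k}}^{\sigma}(-\mathbf{G})\, \rho_{i\mathbf{k};j\mathbf{l}}^{\sigma}(\mathbf{G}) \tag{3.58} \]

where

\[ \rho_{i\mathbf{k};j\mathbf{l}}^{\sigma}(\mathbf{G}) = \sum_{\mathbf{G}'} \psi_{i\mathbf{k}}^{\sigma*}(\mathbf{G}')\, \psi_{j\mathbf{l}}^{\sigma}(\mathbf{G}' + \mathbf{G}) \tag{3.59} \]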

As pointed out by Gygi and Baldereschi84–86 and others,87–91 this expression must be evaluated with some care, especially for small Brillouin zone samplings and small unit cell sizes, because of the singularity at $\mathbf{G} - \mathbf{k} + \mathbf{l} = 0$. A better alternative for the evaluation of $E_{x\text{-}exact}$ for Γ-point (k = 0) calculations with large unit cells can be found in terms of localized Wannier orbitals.92,93


The standard approach for the generation of Wannier orbitals, which uses a unitary transformation over k,

\[ w_i^{\sigma}(\mathbf{r} - \mathbf{L}) = \frac{\Omega}{8\pi^3} \int_{BZ} e^{-i\mathbf{k}\cdot\mathbf{L}}\, \psi_{i\mathbf{k}}^{\sigma}(\mathbf{r})\, d\mathbf{k} \tag{3.60} \]

is not applicable for the Γ-point case. Instead, one can follow a Marzari–Vanderbilt localization procedure (which is the counterpart of the Foster–Boys transformation for finite systems),92–94 forming linear combinations of $\psi_{i\mathbf{k}=0}^{\sigma}(\mathbf{r})$ over different band indices to produce a new set of Γ-point Bloch functions, $\bar{w}_{i\mathbf{k}=0}^{\sigma}(\mathbf{r})$. These new periodic orbitals are extremely localized within each cell for nonmetallic systems with sufficiently large unit cells93 (see Fig. 3.3). In that case $\bar{w}_{i\mathbf{k}=0}^{\sigma}(\mathbf{r})$ can be represented as a sum of piecewise localized functions, $w_i^{\sigma}(\mathbf{r} - \mathbf{L})$, on the Bravais lattice

\[ \bar{w}_{i\mathbf{k}=0}^{\sigma}(\mathbf{r}) = \sum_{\mathbf{L}} w_i^{\sigma}(\mathbf{r} - \mathbf{L}) \tag{3.61} \]

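with the exchange term per unit cell written as

\[ E_{x\text{-}exact} = -\frac{1}{2} \sum_i \sum_j \iint \frac{w_i^{*}(\mathbf{r})\, w_j(\mathbf{r})\, w_j^{*}(\mathbf{r}')\, w_i(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, d\mathbf{r}\, d\mathbf{r}' \tag{3.62} \]

Fig. 3.3 (color online) Periodic localized function $\bar{w}_{i\mathbf{k}=0}(\mathbf{r})$ for a 72-atom unit cell of a SiO2 crystal.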


Evaluation of this integral in a plane-wave basis set requires some care, since representing overlap densities [$w_i^{*}(\mathbf{r})\, w_j(\mathbf{r})$] with a plane-wave expansion [i.e., $\bar{w}_i^{*}(\mathbf{r})\, \bar{w}_j(\mathbf{r})$] will result in the inclusion of redundant periodic images. Interactions between such images can be eliminated95,96 by replacing the standard Coulomb kernel, 1/r, in Eq. (3.62) by the following cutoff Coulomb kernel:

\[ f_{cutoff}(r) = \frac{1 - \left[ 1 - e^{-(r/R_c)^{N_c+2}} \right]^{N_c}}{r} \tag{3.63} \]

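where $N_c$ and $R_c$ are adjustable parameters. This kernel decays rapidly to zero at distances larger than $R_c$. Hence, Eq. (3.62) can be transformed to

\[ E_{x\text{-}exact} = -\frac{1}{2} \sum_{\sigma=\uparrow,\downarrow} \sum_i \sum_j \iint \bar{w}_i^{\sigma*}(\mathbf{r})\, \bar{w}_j^{\sigma}(\mathbf{r})\, f_{cutoff}(|\mathbf{r} - \mathbf{r}'|)\, \bar{w}_j^{\sigma*}(\mathbf{r}')\, \bar{w}_i^{\sigma}(\mathbf{r}')\, d\mathbf{r}\, d\mathbf{r}' \tag{3.64} \]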
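The behavior of the cutoff kernel is easy to inspect numerically. The sketch below evaluates the screened form $f_{cutoff}(r) = \{1 - [1 - e^{-(r/R_c)^{N_c+2}}]^{N_c}\}/r$; the parameter values $R_c = 4$ and $N_c = 8$ are arbitrary choices for illustration, not recommended settings.

```python
import numpy as np

def cutoff_coulomb(r, r_c, n_c):
    """Screened kernel: close to 1/r for r << r_c and close to zero for r >> r_c."""
    r = np.asarray(r, dtype=float)
    x = (r / r_c) ** (n_c + 2)
    damp = (-np.expm1(-x)) ** n_c        # [1 - exp(-x)]^Nc, accurate for small x
    return (1.0 - damp) / r

r_c, n_c = 4.0, 8
inside = cutoff_coulomb(1.0, r_c, n_c)   # well inside r_c: matches 1/r
outside = cutoff_coulomb(8.0, r_c, n_c)  # at 2 r_c: numerically zero
at_rc = cutoff_coulomb(r_c, r_c, n_c)    # transition region
```

The sharpness of the transition around $R_c$ increases with $N_c$, which is why $2R_c$ only needs to be slightly smaller than the shortest cell dimension.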

That is, replacing $w_i(\mathbf{r})$ with $\bar{w}_i(\mathbf{r})$ in Eq. (3.62), combined with using Eq. (3.63), will give the same energy, since the cutoff Coulomb interaction of an orbital is nearly 1/r with itself and zero with its periodic images. The parameter $R_c$ must be chosen carefully. It has to exceed the size of each Wannier orbital to include all of the orbital in the integration, while $2R_c$ must be smaller than the shortest linear dimension of the unit cell to exclude periodic interactions. Finally, we note that when one uses the cutoff Coulomb kernel, localized orbitals are not needed to calculate the exchange term, since Eq. (3.62) can be unitarily transformed, resulting in

\[ E_{x\text{-}exact} = -\frac{1}{2} \sum_{\sigma=\uparrow,\downarrow} \sum_i \sum_j \iint \psi_i^{\sigma*}(\mathbf{r})\, \psi_j^{\sigma}(\mathbf{r})\, f_{cutoff}(|\mathbf{r} - \mathbf{r}'|)\, \psi_j^{\sigma*}(\mathbf{r}')\, \psi_i^{\sigma}(\mathbf{r}')\, d\mathbf{r}\, d\mathbf{r}' \tag{3.65} \]

and

\[ \frac{\delta E_{x\text{-}exact}}{\delta \psi_i^{\sigma*}(\mathbf{r})} = -\sum_j \psi_j^{\sigma}(\mathbf{r}) \int f_{cutoff}(|\mathbf{r} - \mathbf{r}'|)\, \psi_j^{\sigma*}(\mathbf{r}')\, \psi_i^{\sigma}(\mathbf{r}')\, d\mathbf{r}' \tag{3.66} \]

We note that while using the localized functions is not required in this formulation, one should still evaluate the set of maximally localized Wannier functions in order to estimate their extent and, consequently, the minimal size of the unit cell.

3.6 WAVEFUNCTION OPTIMIZATION FOR PLANE-WAVE METHODS

In DFT calculations it is necessary to determine the set of orthonormal one-electron wavefunctions {ψi} that minimize the Kohn–Sham energy functional. There are two classes of methods available for optimizing the Kohn–Sham energy functional: the self-consistent field approach and the direct minimization approach.


3.6.1 Self-Consistent Field Method

The steps involved in the self-consistent field procedure are as follows:

1. Set the iteration number m = 0 and choose an initial set of trial molecular orbitals $\{\psi_i\}$ and input charge density $\rho^{(0)}(\mathbf{r})$; for example,
\[ \rho^{(0)}(\mathbf{r}) = \sum_{i=1}^{occ} |\psi_i(\mathbf{r})|^2 \]
2. Use the input charge density to construct an effective potential, which is the sum of the Hartree and exchange–correlation potentials:
\[ V_{eff}(\mathbf{r}) = V_H[\rho^{(m)}, \mathbf{r}] + V_{xc}[\rho^{(m)}, \mathbf{r}] \]
3. Generate a new set of molecular orbitals by solving the linearized Kohn–Sham equations via an iterative scheme:
\[ \left[ -\frac{1}{2}\nabla^2 + \sum_I \left( V_{local}^{I}(\mathbf{r}) + \hat{V}_{NL}^{I} \right) + V_{eff}(\mathbf{r}) \right] \psi_i(\mathbf{r}) = \varepsilon_i\, \psi_i(\mathbf{r}) \]
4. Use the new set of molecular orbitals to construct an output density:
\[ \rho_{out}^{(m)}(\mathbf{r}) = \sum_{n=1}^{occ} |\tilde{\psi}_n(\mathbf{r})|^2 \]
5. Generate a new input density by mixing the output density with the previous input density:
\[ \rho^{(m+1)}(\mathbf{r}) \Leftarrow \rho^{(m)}, \rho_{out}^{(m)} \]
6. If self-consistency is not achieved, set m = m + 1 and go to step 2.

In this scheme, self-consistency is achieved when the distance between the input and output charge densities is zero:

\[ D[\rho_{out}, \rho] = \langle \rho_{out} - \rho \,|\, \rho_{out} - \rho \rangle \tag{3.67} \]

For plane-wave methods, where the molecular orbitals are expanded using ∼10,000 to several million basis functions, an efficient iterative method for diagonalizing the Kohn–Sham Hamiltonian is needed. Many iterative methods have been developed4,6,97–101 and several good reviews on the subject are available in the literature. Two of the more popular algorithms used for plane-wave methods are the conjugate gradient algorithm applied to plane-wave calculations proposed by Teter et al.99 and the residual minimization method direct inversion in the iterative subspace (RMM-DIIS) proposed by Pulay.97,98 A preconditioning scheme is generally used with these methods.4,6,7,99 An important step in the self-consistent field procedure is the generation of a new trial density, $\rho^{(m+1)}$, from the prior input, $\rho^{(m)}$, and output, $\rho_{out}^{(m)}$, densities. A simpleminded iteration,

\[ \rho^{(m+1)} = \rho_{out}^{(m)} \tag{3.68} \]

in which the input density is replaced by the output density will usually result in the development of charge oscillations, which cause the algorithm to diverge. The simplest way to control these oscillations is to dampen them during the iteration process by a simple mixing algorithm,

\[ \rho^{(m+1)} = (1 - \alpha)\, \rho^{(m)} + \alpha\, \rho_{out}^{(m)} \tag{3.69} \]

where α is a parameter in the interval [0, 1]. In many cases convergence can be achieved by using a suitable choice of α (e.g., 0.1 ≤ α ≤ 0.5). Several other iteration schemes have been developed besides simple mixing.6,97,102–113

3.6.2 Direct Methods

An alternative approach is to treat the DFT energy functional as an optimization problem and minimize it directly.4,7,114–116 Interest in this method began with the introduction of the Car–Parrinello algorithm.3 These methods stand out in that they rarely, if ever, fail to achieve self-consistency. The simplest of this class of methods is the fixed-step steepest descent algorithm, which is effectively the Car–Parrinello algorithm (see Section 3.7) with the velocity set to zero at every step in the iteration. Orthonormality constraints are handled by Lagrange multipliers. A significantly more powerful approach is the conjugate gradient method on the Grassmann manifold developed by Edelman et al.117 This method is very fast and has been shown to exhibit superlinear convergence near the minimum. In this algorithm, the set of wavefunctions $\psi_i$ is written in terms of a tall and skinny $N_{basis} \times N_e$ matrix:

\[ Y = \begin{bmatrix} \psi_1(\varphi_1) & \psi_2(\varphi_1) & \cdots & \psi_{N_e}(\varphi_1) \\ \psi_1(\varphi_2) & \psi_2(\varphi_2) & \cdots & \psi_{N_e}(\varphi_2) \\ \psi_1(\varphi_3) & \psi_2(\varphi_3) & \cdots & \psi_{N_e}(\varphi_3) \\ \vdots & \vdots & \ddots & \vdots \\ \psi_1(\varphi_{N_{basis}}) & \psi_2(\varphi_{N_{basis}}) & \cdots & \psi_{N_e}(\varphi_{N_{basis}}) \end{bmatrix} \tag{3.70} \]

where the matrix is written in terms of the orthonormal basis $\varphi_j(\mathbf{r})$ (or $e^{i\mathbf{G}_j\cdot\mathbf{r}}$ for a plane-wave basis) by

\[ \psi_i(\mathbf{r}) = \sum_{j=1}^{N_{basis}} \psi_i(\varphi_j)\, \varphi_j(\mathbf{r}) \tag{3.71} \]

and obeys the orthogonality constraint $Y^t Y = I$. The following steps illustrate this algorithm:

1. Given an initial $Y_0$ such that $Y_0^t Y_0 = I$, calculate the tangent residual:
\[ G_0 = (1 - Y_0 Y_0^t) \left. \frac{\delta E}{\delta Y^t} \right|_{Y = Y_0} \]
2. Set $H_0 = -G_0$ and $E_{new} = E_{total}(Y_0)$.
3. Find the compact singular value decomposition of $H_0$:
\[ H_0 \to U \Sigma V^t \]
4. Minimize $E_{total}(Y(\theta))$ along the following geodesic line parameterized by θ:
\[ Y(\theta) = Y_0 V \cos(\Sigma\theta)\, V^t + U \sin(\Sigma\theta)\, V^t \]
5. Set $Y_1 = Y(\theta)$, $E_{old} = E_{new}$, and $E_{new} = E_{total}(Y_1)$.
6. Calculate the tangent residual:
\[ G_1 = (1 - Y_1 Y_1^t) \left. \frac{\delta E}{\delta Y^t} \right|_{Y = Y_1} \]
7. Parallel-transport the previous search direction along the geodesic:
\[ T_0 = \left[ -Y_0 V \sin(\Sigma\theta) + U \cos(\Sigma\theta) \right] \Sigma V^t \]
8. Compute the new search direction,
\[ H_1 = -G_1 + \frac{\mathrm{Tr}[G_1^t G_1]}{\mathrm{Tr}[G_0^t G_0]}\, T_0 \]
9. Set $Y_0 = Y_1$, $G_0 = G_1$, and $H_0 = H_1$.
10. If $E_{old} - E_{new} >$ tolerance, go to step 3.

3.7 CAR–PARRINELLO MOLECULAR DYNAMICS

The development of fast and efficient ab initio molecular dynamics methods (AIMD), such as Car–Parrinello molecular dynamics,3 has opened the door to the study of strongly interacting many-body systems by direct dynamics simulation without the introduction of empirical interactions. In AIMD simulations the electronic degrees of freedom are continuously updated at each step in the simulation and all the changes in the electronic structure are properly accounted for. The forces are calculated as derivatives of the total energy calculated with respect to the atomic positions. Hence, the dynamical simulation automatically includes


all many-body interactions and effects, such as changes in coordination, bond saturation, and polarization. Applications for this first-principles method include the calculation of free energies, search for global minima, explicit simulation of solvated molecules, and so on. This important generalization of molecular dynamics methods to include the essential physics of the interactions of complex systems comes at a considerable price. However, with present-day algorithms and parallel supercomputers, simulations of hundreds of atoms for a time scale of several picoseconds are feasible. Although this is far less, both in numbers of particles and in time, than is possible with conventional MD, AIMD simulations might be the only option for systems with complex chemistry where even qualitative interpretation requires proper description of interatomic interactions. In the Car–Parrinello version of AIMD the electronic and ionic degrees of freedom are updated simultaneously. This is accomplished by introducing a fictitious electronic kinetic energy functional

$$
KE(\{\psi_i\}) = \frac{1}{2}\sum_i \mu \int \dot{\psi}_i^{*}(\mathbf{r})\,\dot{\psi}_i(\mathbf{r})\,d\mathbf{r}
\tag{3.72}
$$

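where $\mu$ is a fictitious mass assigned to the electronic degrees of freedom. The equations of motion for the ions, $\mathbf{R}_I$, and the Kohn–Sham orbitals, $\psi_i$, are found by taking the first variation of the auxiliary Lagrangian:

$$
L(\{\psi_i\},\{\mathbf{R}_I\}) = \frac{1}{2}\sum_i \mu \int \dot{\psi}_i^{*}(\mathbf{r})\,\dot{\psi}_i(\mathbf{r})\,d\mathbf{r} + \frac{1}{2}\sum_I M_I |\dot{\mathbf{R}}_I|^2 - E_{total}(\{\psi_i\},\{\mathbf{R}_I\}) + \sum_{i,j}\Lambda_{j,i}\left(\int \psi_i^{*}(\mathbf{r})\,\psi_j(\mathbf{r})\,d\mathbf{r} - \delta_{i,j}\right)
\tag{3.73}
$$

The resulting equations of motion are

$$
\mu\ddot{\psi}_i(\mathbf{r}) = -H\psi_i(\mathbf{r}) + \sum_j \psi_j(\mathbf{r})\,\Lambda_{j,i}
\tag{3.74}
$$

$$
M_I \ddot{\mathbf{R}}_I = \mathbf{F}_I
\tag{3.75}
$$

where

$$
\frac{\delta E_{total}}{\delta \psi_i^{*}(\mathbf{r})} = H\psi_i(\mathbf{r})
\tag{3.76}
$$

Given the equations of motion (Sections 3.3.3 and 3.3.4), the electronic and ionic degrees of freedom can be integrated using the Verlet algorithm:

$$
\psi_i^{t+\Delta t}(\mathbf{r}) = 2\psi_i^{t}(\mathbf{r}) - \psi_i^{t-\Delta t}(\mathbf{r}) + \frac{(\Delta t)^2}{\mu}\left[-H\psi_i^{t}(\mathbf{r}) + \sum_j \psi_j^{t}(\mathbf{r})\,\Lambda_{j,i}^{t+\Delta t}\right]
\tag{3.77}
$$

$$
\mathbf{R}_I^{t+\Delta t} = 2\mathbf{R}_I^{t} - \mathbf{R}_I^{t-\Delta t} + \frac{(\Delta t)^2}{M_I}\,\mathbf{F}_I
\tag{3.78}
$$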
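The ionic Verlet update of Eq. (3.78) is easy to try on a toy problem. The sketch below is only an illustration: it assumes a single one-dimensional "ion" in a harmonic potential with force $F = -kR$ (not a DFT force), and the function names are hypothetical. The electronic update of Eq. (3.77) has the same form but additionally requires the constraint matrix $\Lambda$, which is not treated here.

```python
import numpy as np

def verlet_step(r_now, r_prev, force, mass, dt):
    """One position-Verlet update: R(t+dt) = 2 R(t) - R(t-dt) + dt^2 F(R(t)) / M."""
    return 2.0 * r_now - r_prev + dt**2 * force(r_now) / mass

def run_verlet(r0, r_prev, force, mass, dt, n_steps):
    """Return the positions R(0), R(dt), ..., R(n_steps * dt)."""
    traj = [r_prev, r0]
    for _ in range(n_steps):
        traj.append(verlet_step(traj[-1], traj[-2], force, mass, dt))
    return np.array(traj[1:])

# Harmonic test case (an assumption of this sketch): M = k = 1, so R(t) = cos(t).
dt, mass, k = 0.01, 1.0, 1.0
traj = run_verlet(1.0, np.cos(-dt), lambda r: -k * r, mass, dt, 100)
exact = np.cos(dt * np.arange(101))
max_error = np.max(np.abs(traj - exact))
```

The scheme needs positions at two consecutive time steps to start, which is why the exact value at $t = -\Delta t$ is supplied here.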

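The matrix $\Lambda_{j,i}^{t+\Delta t}$ is determined by the orthogonality constraint

$$
\int \psi_i^{*\,t+\Delta t}(\mathbf{r})\,\psi_j^{t+\Delta t}(\mathbf{r})\,d\mathbf{r} = \delta_{i,j}
\tag{3.79}
$$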

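This constraint yields the matrix Riccati equation [to simplify the following equations the following symbols are used: $\bar{\psi}_i^{t}(\mathbf{r}) = 2\psi_i^{t}(\mathbf{r}) - \psi_i^{t-\Delta t}(\mathbf{r})$ and $\alpha = \Delta t^2/\mu$]:

$$
\begin{aligned}
I &= \int \psi_i^{*\,t+\Delta t}(\mathbf{r})\,\psi_j^{t+\Delta t}(\mathbf{r})\,d\mathbf{r} \\
&= \int \left\{\bar{\psi}_i^{*t}(\mathbf{r}) - \alpha\,[H\psi_i^{t}(\mathbf{r})]^{*} + \alpha\sum_k \Lambda_{i,k}\,\psi_k^{*t}(\mathbf{r})\right\}
\left\{\bar{\psi}_j^{t}(\mathbf{r}) - \alpha\,[H\psi_j^{t}(\mathbf{r})] + \alpha\sum_l \psi_l^{t}(\mathbf{r})\,\Lambda_{l,j}\right\} d\mathbf{r} \\
&= A + X^t B + B^t X + X^t C X
\end{aligned}
\tag{3.80}
$$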

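where $X_{ij} = \alpha\Lambda_{ij}$ and the matrices $A_{i,j}$, $B_{i,j}$, and $C_{i,j}$ are given by

$$
A_{ij} = \int \left\{\bar{\psi}_i^{*t}(\mathbf{r}) - \alpha\,[H\psi_i^{t}(\mathbf{r})]^{*}\right\}\left\{\bar{\psi}_j^{t}(\mathbf{r}) - \alpha\,[H\psi_j^{t}(\mathbf{r})]\right\} d\mathbf{r}
\tag{3.81}
$$

$$
B_{ij} = \int \psi_i^{*t}(\mathbf{r})\left\{\bar{\psi}_j^{t}(\mathbf{r}) - \alpha\,[H\psi_j^{t}(\mathbf{r})]\right\} d\mathbf{r}
\tag{3.82}
$$

$$
C_{ij} = \int \psi_i^{*t}(\mathbf{r})\,\psi_j^{t}(\mathbf{r})\,d\mathbf{r}
\tag{3.83}
$$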

Bl¨ochl28 suggested the following iteration for solving this matrix Riccatti equation: A(0) = A (n)

A

=A

(n+1) = Xrs

(n−1)

(3.84) +X

(n−1)t

B + BX

(n−1)

t Urit Uij (A(n) j k − δj k )Ukl Uls i,j,k,l

bi + bl

+X

(n−1)t

CX

(n−1)

(3.85) (3.86)

where the eigenvalues $b$ and the unitary matrix $U$ are obtained from diagonalizing $B$: $B_{ij} = \sum_l U^t_{il}\, b_l\, U_{lj}$.


3.8 PARALLELIZATION

During the course of a total energy minimization or molecular dynamics simulation the electron gradient $\delta E_{total}/\delta\psi_i^{*}$ [Eq. (3.37)] needs to be calculated as efficiently as possible. For a pseudopotential plane-wave calculation the main parameters that determine the cost of a calculation are $N_g$, $N_e$, $N_a$, and $N_{proj}$, where $N_g$ is the size of the three-dimensional FFT grid, $N_e$ is the number of occupied orbitals, $N_a$ is the number of atoms, and $N_{proj}$ is the number of projectors per atom. In most plane-wave DFT programs the solution of the eigenvalue equations is typically approached by means of a conjugate gradient algorithm or, for dynamics, a Car–Parrinello algorithm, both of which require many evaluations of the electron gradient. The operation counts for each part of the electron gradient are shown in Fig. 3.4. The three (or four) major computational pieces of the gradient are:

1. The Hartree potential $V_H$, including the local exchange and correlation potentials $V_x + V_c$. The main computational kernel in these computations is the calculation of $N_e$ three-dimensional FFTs.
2. The nonlocal pseudopotential, $\hat{V}_{NL}$. The major computational kernel in this computation can be expressed by the following matrix multiplications: $W = P^t \cdot Y$ and $Y_2 = P \cdot W$, where $P$ is an $N_g \times (N_{proj} \cdot N_a)$ matrix, $Y$ and $Y_2$ are $N_g \times N_e$ matrices, and $W$ is an $(N_{proj} \cdot N_a) \times N_e$ matrix. We note that for most pseudopotential plane-wave calculations, $N_{proj} \cdot N_a \approx N_e$.
3. Enforcing orthogonality. The major computational kernels in this computation are the following matrix multiplications: $S = Y^t \cdot Y$ and $Y_2 = Y \cdot S$, where $Y$ and $Y_2$ are $N_g \times N_e$ matrices, and $S$ is an $N_e \times N_e$ matrix.
4. When exact exchange is included, the exact exchange operator $K_{ij}\psi_j$. The major computational kernel in this computation involves the calculation of $(N_e + 1) \cdot N_e$ three-dimensional FFTs.
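The matrix-multiplication kernels named in items 2 and 3 can be written down directly. The sketch below is only an illustration with small random real matrices standing in for the projector matrix $P$ and the orbital matrix $Y$ (real plane-wave coefficients are complex, and production codes distribute these products across processors). The function names and the flop-count helper are hypothetical.

```python
import numpy as np

def nonlocal_kernel(P, Y):
    """Item 2: W = P^t . Y followed by Y2 = P . W."""
    W = P.T @ Y          # (Nproj*Na) x Ne
    return W, P @ W      # Y2 is Ng x Ne

def orthogonality_kernel(Y):
    """Item 3: S = Y^t . Y followed by Y2 = Y . S."""
    S = Y.T @ Y          # Ne x Ne overlap matrix
    return S, Y @ S      # Y2 is Ng x Ne

def matmul_flops(m, k, n):
    """Approximate flop count of an (m x k) times (k x n) real matrix product."""
    return 2 * m * k * n

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
n_g, n_e, n_proj_times_n_a = 64, 4, 5
Y, _ = np.linalg.qr(rng.standard_normal((n_g, n_e)))   # orthonormal columns
P = rng.standard_normal((n_g, n_proj_times_n_a))
W, Y2_nonlocal = nonlocal_kernel(P, Y)
S, Y2_orth = orthogonality_kernel(Y)
flops_W = matmul_flops(n_proj_times_n_a, n_g, n_e)
```

Both kernels cost on the order of $N_g N_e^2$ operations when $N_{proj} \cdot N_a \approx N_e$, which is why they compete with the FFTs as the system grows.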

Fig. 3.4 (color online) Operation count of $H\psi$ in a plane-wave DFT simulation.


There are several ways to parallelize a plane-wave Hartree–Fock and DFT program.7,9,11,12,15,18 For many solid-state calculations the computation can be distributed over the Brillouin zone sampling space.11 This approach cannot be used for Γ-point (k = 0) calculations with large unit cells. Another approach is to distribute the one-electron orbitals across processors.12 The drawback of this method is that the orthogonality parts of the computation will involve a lot of message passing. Furthermore, this method will not work for simulations with very large cutoff energy requirements (i.e., using large numbers of plane waves to describe the one-electron orbitals) on parallel computers that have nodes with a small amount of memory, because a complete one-electron orbital must be stored on each node. Hence this approach is not practical for Car–Parrinello simulations with large unit cells; however, this approach can work well for simulations with modest-size unit cells and with small cutoff energies, when used in combination with minimization algorithms that perform orthogonalization sparingly (e.g., RMM-DIIS). Another straightforward way is to do a spatial decomposition of the one-electron orbitals.7,9,15 This approach is versatile, easily implemented, and well suited for performing Car–Parrinello simulations with large unit cells and cutoff energies. However, a parallel three-dimensional fast Fourier transform (FFT) must be used, which is known not to scale beyond $\sim N_g^{1/3}$ processors (or processor groups), where $N_g$ is the number of FFT grid points. In Fig. 3.5, an example of timings versus the number of CPUs for this type of parallelization is shown.
These timings were taken from a Car–Parrinello simulation of the hydrated uranyl cation $\mathrm{UO_2^{2+} + 122\,H_2O}$ using the plane-wave DFT module (PSPW) in NWChem.118 These calculations were performed on all four cores on the quad-core Cray-XT4 system (NERSC Franklin), composed of a 2.3-GHz single-socket quad-core AMD Opteron processor (Budapest). The NWChem program was compiled using a Portland Group FORTRAN 90 compiler, version 7.2.4, linked with the Cray MPICH2 library, version 3.02, for message passing. The performance of the program is reasonable, with an overall parallel efficiency of 84% on 128 CPUs, dropping to 26% on 1024 CPUs. However, not every part of the program scales in exactly the same way. For illustrative purposes, the timings of the FFTs, nonlocal pseudopotential, and orthogonality are also shown. The efficiency of the FFTs is by far the biggest bottleneck in this implementation. At smaller processor sizes the inefficiency of the FFTs is damped out because these parts of the code make up less than 5% of the overall computation, and the largest part of the calculation is the nonlocal pseudopotential evaluation. Ultimately, however, the lack of scalability of the three-dimensional FFT algorithm beyond $\sim N_g^{1/3}$ processors prevails, causing the simulation not to speed up. Recently, Gygi et al. have come up with an approach that can be used to improve the overall efficiency of a plane-wave DFT program.18 In this approach, both the spatial and the orbital dimensions are distributed in a two-dimensional processor geometry, as shown in Fig. 3.6. Using simple scaling arguments, it can be shown that with this decomposition the algorithms will require only $O(\log p_1) + O(\log p_2)$ communications per CPU as opposed to $O(\log P)$ communications per CPU for algorithms in which only the spatial or orbital dimensions are


Fig. 3.5 (color online) Overall and component timings from AIMD simulations of $\mathrm{UO_2^{2+} + 122\,H_2O}$ using a one-dimensional processor geometry. Overall best timings are also shown for a two-dimensional processor grid. Timings are from calculations on the Franklin Cray-XT4 computer system at NERSC.

Fig. 3.6 (color online) Parallel distribution (shown on the left), implemented in most plane-wave DFT software. Each of the one-electron orbitals is identically spatially decomposed. The two-dimensional parallel distribution suggested by Gygi et al.18 is shown on the right.


distributed (where the total number of processors, $P$, can be written as $P = p_1 p_2$). The overall performance of our plane-wave DFT simulations was found to improve considerably using this new approach. Using the optimal processor geometries, the running time per step dropped from 2699 s (45 min) on 1 CPU to 3.7 s, with 70% parallel efficiency, on 1024 CPUs. The fastest running time found was 1.8 s with 36% parallel efficiency on 4096 CPUs. As shown in Fig. 3.7, these timings were found to be very sensitive to the layout of the two-dimensional processor geometry. For 256, 512, 1024, and 2048 CPUs, the optimal processor geometries were 64 × 4, 64 × 8, 128 × 8, and 128 × 16 processor grids, respectively. The timings of the FFTs, nonlocal pseudopotential, and orthogonality are also shown in Fig. 3.7. Not every part of the program scaled perfectly. The parallel efficiency of several other key operations depends strongly on the shape of the processor geometry. It was found that distributing the processors over the orbitals significantly improved the efficiency of the FFTs and the nonlocal pseudopotential, while distributing the processors over the spatial dimensions favored the orthogonality computations. The two-dimensional processor geometry method can also be used to parallelize the computation of the exact exchange operator. This operator has a cost of $O(N_e^2 \cdot N_g \cdot \log N_g)$, and when it is included in a plane-wave DFT calculation it is by far the most demanding term. The exchange term is well suited for this method, whereas if only the spatial or orbital dimensions are distributed, it does not scale well. When only the spatial dimensions are distributed, each of the $N_e(N_e + 1)$ FFTs is computed one at a time, using the entire machine for each evaluation. The drawback of this approach is that we are underutilizing the resources; parallel efficiency is effectively bounded to $\sim N_g^{1/3}$ processors.
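The efficiency figures and the FFT scaling bound quoted in this discussion follow from simple arithmetic, which the short sketch below makes explicit. It assumes the usual definition of parallel efficiency, $T_1/(P \cdot T_P)$, which the text does not state, and the function names are hypothetical. Small differences from the quoted percentages are to be expected because the per-step times are given to only two significant figures.

```python
def parallel_efficiency(t_serial, t_parallel, n_cpus):
    """Efficiency = T_1 / (P * T_P), assuming the conventional definition."""
    return t_serial / (n_cpus * t_parallel)

def fft_processor_limit(n_grid_points):
    """Rough scaling bound ~Ng^(1/3) for a spatially decomposed 3D FFT."""
    return round(n_grid_points ** (1.0 / 3.0))

# Per-step timings quoted for the two-dimensional processor geometry.
eff_1024 = parallel_efficiency(2699.0, 3.7, 1024)   # about 0.71
eff_4096 = parallel_efficiency(2699.0, 1.8, 4096)   # about 0.37
limit_72_cubed = fft_processor_limit(72**3)          # a 72 x 72 x 72 FFT grid
```

For a 72 × 72 × 72 grid the bound is only 72 processors along the spatial dimension, which illustrates why distributing the orbitals as well is needed to use thousands of CPUs.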
When only the orbital dimensions are distributed, the parallelization is realized by multicasting the O(Ne ) orbitals to set up the O(Ne 2 ) wavefunction pairs. This multicast is followed by a multireduction which reverses the pattern. We note that with this type of

Fig. 3.7 (color online) Overall and component timings in seconds for $\mathrm{UO_2^{2+} + 122\,H_2O}$ plane-wave DFT simulations at various processor sizes ($N_p$) and processor grids ($n_j$, $n_i = N_p/n_j$). Timings are from calculations on the Franklin Cray-XT4 computer system at NERSC.


algorithm one could readily fill a very large parallel machine by assigning a few FFTs to each processor. However, to obtain reasonable performance from this algorithm it is vital to mask latency, since the interconnects between the processors will be flooded with $O(N_e)$ streams, each of long messages comprising $N_g$ floating-point words of data. When both the spatial and orbital dimensions are distributed, only the parallel three-dimensional FFTs along the processor grid columns need to be computed. Compared with a multicast across all processors, the benefit of this approach is to reduce latency costs, since broadcasting is done across the rows of the two-dimensional processor grid only. The overall best timings for hybrid-DFT calculations of an 80-atom supercell of hematite (Fe2O3) with an FFT grid of $N_g = 72^3$ ($N_e^{up} = 272$, $N_e^{down} = 272$), and a 160-atom supercell of hematite (Fe2O3) with an FFT grid of $N_g = 144 \times 72 \times 72$ ($N_e^{up} = 544$ and $N_e^{down} = 544$) (wavefunction cutoff energy = 100 Ry and density cutoff energy = 200 Ry) are shown in Fig. 3.8. The overall best timing per step found for the 80-atom supercell was 3.6 s on 9792 CPUs, and for the 160-atom supercell

Fig. 3.8 (color online) Overall fastest timings for 80- and 160-atom Fe2O3 hybrid-DFT energy calculations. Timings are from calculations on the Franklin Cray-XT4 computer system at NERSC.


of hematite, was 17.7 s on 23,936 CPUs. The timing results are somewhat uneven, since a limited number of processor grids were tried at each processor size. However, even with this limited amount of sampling, these calculations were found to have speedups to at least 25,000 CPUs. We expect that further improvements will be obtained by trying more processor geometry layouts.

3.9 AIMD SIMULATIONS OF HIGHLY CHARGED IONS IN SOLUTION

An understanding of the structure and dynamics of the water molecules in the hydration shells surrounding ions is essential to the interpretation of many chemical processes in aqueous solutions. X-ray and neutron scattering results have been reported which provide direct results about shell structure for many ionic species.119,120 Information about the dynamics of water molecules in this region has also been obtained from other probes, such as NMR, infrared spectroscopy, and inelastic neutron scattering.119,120 For singly charged ions (Na+ , Li+ ), a structured first hydration shell can be identified. The residence time in this shell is short (e.g. j

$$
\frac{A}{r_{ij}}\left(1 - e^{-r_{ij}/F_{\sigma_i,\sigma_j}}\right)
\tag{4.8}
$$

Here $r_{ij}$ is the distance between electrons $i$ and $j$, the $\sigma$ subscript is the spin label, and the parameter $F$ is chosen so that the electron–electron cusp conditions are obeyed (i.e., $F_{\uparrow\uparrow} = \sqrt{2A}$ and $F_{\uparrow\downarrow} = \sqrt{A}$). The value of $A$ could be optimized using variance minimization or whatever. For systems with both electrons and nuclei present, one can write a standard Jastrow with all three terms (ignoring the spin dependence for clarity) as follows:

$$
J(\mathbf{R},\{\mathbf{r}_I\}) = \sum_{i>j}^{N} u(r_{ij}) + \sum_{i=1}^{N}\sum_{I=1}^{N_I} \chi_I(r_{iI}) + \sum_{i>j}^{N}\sum_{I=1}^{N_I} f_I(r_{ij}, r_{iI}, r_{jI})
\tag{4.9}
$$
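The cusp values of $F$ quoted with Eq. (4.8) can be checked numerically. The sketch below assumes the two-body term has the form $u(r) = (A/r)(1 - e^{-r/F})$, read off from the surviving part of Eq. (4.8), and works in Hartree atomic units. The function names and the value of $A$ are arbitrary choices for illustration. Expanding for small $r$ gives $u'(0) = -A/(2F^2)$, so $F = \sqrt{2A}$ and $F = \sqrt{A}$ should give slopes of magnitude 1/4 and 1/2, respectively.

```python
import numpy as np

def u(r, A, F):
    """Assumed two-body Jastrow term u(r) = (A / r) * (1 - exp(-r / F))."""
    return -(A / r) * np.expm1(-r / F)

def slope_near_origin(A, F, r=1e-3, h=1e-5):
    """Central finite-difference estimate of du/dr at small r."""
    return (u(r + h, A, F) - u(r - h, A, F)) / (2.0 * h)

A = 1.3   # arbitrary illustrative value
slope_parallel = slope_near_origin(A, np.sqrt(2.0 * A))   # close to -1/4
slope_antiparallel = slope_near_origin(A, np.sqrt(A))     # close to -1/2
```

The slopes are independent of $A$, which is why $A$ remains free to be optimized once $F$ is tied to it.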

Other terms, such as an extra plane-wave expansion in the electron–electron separation for periodic systems or an additional three-body term, are part of our standard Jastrow54 and can be useful in certain circumstances but are not usually necessary. For particles with attractive interactions one finds that the usual Slater–Jastrow form is not appropriate, and in order to get a better description of exciton formation one might use a determinant of “pairing orbitals” instead.57 A further recent advance by members of our group has been the development of a completely general functional form for the Jastrow factor which allows the inclusion of arbitrary higher-order terms (depending on, for example, the separation of four or more particles); this has now been implemented in our code.58 To convince yourself that the Slater–Jastrow function is doing what it should, consider Fig. 4.2. These are the results of simple VMC calculations of the spin-dependent pair correlation function (PCF) in a silicon crystal with an electron fixed at a bond center.21 The figure on the left is for parallel spins and corresponds to the Fermi or exchange hole. The figure on the right is for antiparallel spins and corresponds to the correlation hole; note that the former is much wider and deeper than the latter. We have here then a pretty illustration of the different levels of theory that we use. In Hartree theory (where we use a Hartree product of all the orbitals as a wavefunction, and which thus corresponds to entirely uncorrelated electrons), both PCFs would have a value of 1 everywhere. In Hartree–Fock theory, the left-hand plot would look very similar, but the antiparallel PCF on the right would be 1 everywhere. The energy lowering over Hartree theory caused by the fact that parallel spin electrons tend to avoid each other is essentially the exchange energy, which correctly has a negative sign.
It is slightly sobering to note that the entire apparatus of quantum chemistry (an expansion in billions of determinants) is devoted to modeling the little hole on the right and thereby evaluating the correlation energy. In QMC our quantum of solace comes from


Fig. 4.2 (color online) VMC plots of the pair correlation function for (on the left) parallel spins and (on the right) antiparallel spins using a Slater–Jastrow wavefunction. The data are shown for crystalline silicon in the (110) plane passing through the atoms and shows the pair correlation functions around a single electron fixed at a bond center. The atoms and bonds in the (110) plane are represented schematically. (From Ref. 20, with permission. Copyright © 1997 by The American Physical Society.)

our compact representation; with a Slater–Jastrow function we can do the same thing in VMC using a simple polynomial expansion involving a few tens of parameters, and if this is not accurate enough we can make the necessary minor corrections to it using the DMC algorithm. However, we do not know a priori what the shape of the hole is, and we must therefore optimize the various parameters in the Slater–Jastrow function in order to find out. The usual procedure is to leave the Slater determinant part alone and optimize the Jastrow factor. With a full inhomogeneous Jastrow such as that of Eq. (4.9), we generally optimize the coefficients of the various polynomial expansions (which appear linearly in the Jastrow factor) and the cutoff radii of the various terms (which are nonlinear). The linearity or otherwise of these terms clearly has a bearing on their ease of optimization. There is, of course, no absolute prohibition on optimizing the Slater part and one might also envisage, for example, optimization of the coefficients of the determinants of a multideterminant wavefunction, or even the orbitals in the Slater determinants themselves (although the latter is quite difficult to do in general, and often pointless). A higher-order technique called backflow , to be explained in a subsequent section, also involves functions with optimizable parameters. We thus turn our attention to the technicalities of the optimization procedure. Now, optimization of the wavefunction is clearly a critical step; it is also a numerically difficult one. It is apparent that the parameters appear in many different contexts, they need to be optimized in the presence of noise, and there can be a great many of them. As has already been stated, there are two basic


approaches. Until recently, the most widely used was the optimization of the variance of the energy,

$$
\sigma_E^2(\alpha) = \frac{\int [\Psi_T^{\alpha}(\mathbf{R})]^2\,[E_L^{\alpha}(\mathbf{R}) - E_V^{\alpha}]^2\,d\mathbf{R}}{\int [\Psi_T^{\alpha}(\mathbf{R})]^2\,d\mathbf{R}}
\tag{4.10}
$$

where $E_V$ is the variational energy, with respect to the set of parameters $\{\alpha\}$. Now, of course, there is no reason that one may not optimize the energy directly, and because wavefunctions corresponding to the minimum energy turn out to have more desirable properties, this has become the preferred approach in the last few years. Historically, variance minimization was much more widely used,60,61 not just for trivial reasons such as the variance having a known lower bound of zero, but most importantly because of the difficulties encountered in designing a robust, numerically stable algorithm to minimize the energy, particularly in the case of large systems. First, I briefly summarize how a simple variance minimization is done. Beginning with an initial set of parameters $\alpha_0$ (generated, for example, simply by zeroing the Jastrow polynomial coefficients), we proceed to minimize $\sigma_E^2(\alpha)$ with respect to them. A correlated-sampling approach turns out to be most efficient. First, a set of some thousands of configurations distributed according to $|\Psi_T^{\alpha_0}|^2$ is generated. Practically speaking, a configuration in this sense is just a snapshot of the system taken at intervals during a preliminary VMC run and consists of the current particle positions and the associated interaction energies written on a line of a file. We then calculate the variance in the energies for the fully sampled set of configurations. This is the objective function to be minimized. Now, unfortunately, every time we modify the parameters slightly, the wavefunction changes and our configurations are no longer distributed according to the square of the current $\Psi_T^{\alpha}$, but to the square of the initial wavefunction $\Psi_T^{\alpha_0}$. In principle, therefore, we should regenerate the configurations, a relatively expensive procedure.
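The correlated sampling is what allows us to avoid this; we reuse the initial set of configurations simply by including appropriate weights $w$ in the formula for the variance:

$$
\sigma_E^2(\alpha) = \frac{\int [\Psi_T^{\alpha_0}(\mathbf{R})]^2\,w_{\alpha_0}^{\alpha}\,[E_L^{\alpha}(\mathbf{R}) - E_V(\alpha)]^2\,d\mathbf{R}}{\int [\Psi_T^{\alpha_0}(\mathbf{R})]^2\,w_{\alpha_0}^{\alpha}\,d\mathbf{R}}
\tag{4.11}
$$

where

$$
E_V(\alpha) = \frac{\int [\Psi_T^{\alpha_0}(\mathbf{R})]^2\,w_{\alpha_0}^{\alpha}\,E_L^{\alpha}(\mathbf{R})\,d\mathbf{R}}{\int [\Psi_T^{\alpha_0}(\mathbf{R})]^2\,w_{\alpha_0}^{\alpha}\,d\mathbf{R}}
\tag{4.12}
$$

and the weight factors $w_{\alpha_0}^{\alpha}$ are given simply by

$$
w_{\alpha_0}^{\alpha} = \frac{[\Psi_T^{\alpha}(\mathbf{R})]^2}{[\Psi_T^{\alpha_0}(\mathbf{R})]^2}
\tag{4.13}
$$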
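Equations (4.11) to (4.13) can be tried out on a problem small enough to check by hand. The following is a minimal sketch, assuming a one-dimensional harmonic oscillator ($\hat{H} = -\tfrac{1}{2}d^2/dx^2 + \tfrac{1}{2}x^2$) with the trial function $\Psi_T^{\alpha} = e^{-\alpha x^2}$, for which the local energy is $E_L^{\alpha}(x) = \alpha + x^2(\tfrac{1}{2} - 2\alpha^2)$. This model and all names in the code are illustrative assumptions, not part of the method as implemented in any particular QMC code.

```python
import numpy as np

def local_energy(x, alpha):
    """E_L = alpha + x^2 (1/2 - 2 alpha^2) for Psi = exp(-alpha x^2)."""
    return alpha + x**2 * (0.5 - 2.0 * alpha**2)

def reweighted_variance(x, alpha, alpha0):
    """Monte Carlo estimates of E_V(alpha) and sigma_E^2(alpha), Eqs. (4.11)-(4.13)."""
    w = np.exp(-2.0 * (alpha - alpha0) * x**2)          # Eq. (4.13)
    e_loc = local_energy(x, alpha)
    e_v = np.sum(w * e_loc) / np.sum(w)                  # Eq. (4.12)
    var = np.sum(w * (e_loc - e_v)**2) / np.sum(w)       # Eq. (4.11)
    return e_v, var

# Configurations sampled once from |Psi_T^{alpha0}|^2, a Gaussian of variance 1/(4 alpha0).
rng = np.random.default_rng(1)
alpha0 = 0.4
x = rng.normal(0.0, np.sqrt(1.0 / (4.0 * alpha0)), size=5000)

results = {a: reweighted_variance(x, a, alpha0) for a in (0.40, 0.45, 0.50, 0.55)}
e_v_exact, var_exact = results[0.50]
```

At $\alpha = 0.5$ the trial function is the exact ground state, the local energy is constant, and the estimated variance vanishes regardless of which configurations were sampled; at the other values the variance is positive.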

The parameters $\alpha$ are then adjusted until $\sigma_E^2(\alpha)$ is minimized. This may be done using standard algorithms which perform an unconstrained minimization of a sum of $m$ squares of functions that contain $n$ variables (where $m \ge n$) without requiring the derivatives of the objective function (see, e.g., Ref. 59). Although in principle we do not need to regenerate the configurations at all, one finds in practice that it usually pays to recalculate them occasionally when the wavefunction strays very far from its initial value. Generally, this needs to be done only a couple of times before we obtain complete convergence within the statistical noise. There is a problem, however. Thus far we have described the optimization of what is known as the reweighted variance. In the limit of perfect sampling, the reweighted variance is equal to the actual variance, and is therefore independent of the initial parameters and the configuration distribution, so that the optimized parameters would not change over successive cycles. The problem arises from the fact that the weights may vary rapidly as the parameters change, especially for large systems. This can lead to severe numerical instabilities. For example, one or a few configurations acquire an exceedingly large weight, incorrectly reducing the estimate of the variance almost to zero. Somewhat surprisingly, perhaps, it usually turns out that the best solution to this is to do without the weights at all; that is, we minimize the unreweighted variance. We can do this because the minimum value of the variance (zero) is obtained only if the local energy is constant throughout configuration space, and this is possible only for eigenfunctions of the Hamiltonian. This procedure turns out to have a number of advantages beyond improving the numerical stability. The self-consistent minimum in the unreweighted variance almost always turns out to give lower energies than the minimum in the reweighted variance.
(For some examples of this for model systems, see Ref. 62.) It was recognized only relatively recently62 that one can obtain a huge speedup in the optimization procedure for parameters that occur linearly in the Jastrow, that is, for Jastrows expressible as $\sum_n \alpha_n f_n(\mathbf{R})$. These are the most important optimizable parameters in almost all wavefunctions that we use. The reason this can be done is that the unreweighted variance can be written analytically as a quartic function of the linear parameters. This function usually has a single minimum in the parameter space, and as the minima of multidimensional quartic functions may be found very rapidly, the optimization is extraordinarily efficient compared to the regular algorithm, in particular because we no longer need to generate large numbers of configurations to evaluate the variance. The main nonlinear parameters in the Jastrow factor are the cutoff lengths where the function is constrained to go to zero. These are important variational parameters, and some attempt to optimize them should always be made. We normally recommend that


a (relatively cheap) calculation using the standard variance minimization method should be carried out in order to optimize the cutoff lengths, followed by an accurate optimization of the linear parameters using the fast minimization method. For some systems, good values of the cutoff lengths can be supplied immediately (e.g., in periodic systems at high density with small simulation cells, the cutoff length $L_u$ should be set equal to the Wigner–Seitz radius of the simulation cell).

Let us now move on to outlining the theory of energy minimization. We know that except in certain trivial cases the usual trial wavefunction forms cannot in general provide an exact representation of energy eigenstates. The minima in the energy and variance therefore do not coincide. Energy minimization should thus produce lower VMC energies, and although it does not necessarily follow that it produces lower DMC energies, experience indicates that more often than not, it does. It is also normally stated that the variance of the DMC energy is more or less proportional to the difference between the VMC and DMC energies,63,64 so one might suppose that energy-optimized wavefunctions may be more efficient in DMC calculations. For a long time, efficient energy minimization with QMC was extremely problematic. The methods that have now been developed are based on a well-known technique for finding approximations to the eigenstates of a Hamiltonian. One expands the wavefunction in some set of basis states, $\Psi_T(\mathbf{R}) = \sum_{i=1}^{N} a_i \phi_i(\mathbf{R})$. Following calculation of the Hamiltonian and overlap matrix elements, $H_{ij} = \langle\phi_i|\hat{H}|\phi_j\rangle$ and $S_{ij} = \langle\phi_i|\phi_j\rangle$, the two-sided eigenproblem $\sum_j H_{ij} a_j = E \sum_j S_{ij} a_j$ may be solved through standard diagonalization techniques. People have tried to do this in QMC directly,65 but it is apparent that the results converge slowly with the number of configurations used to evaluate the integrals because of statistical noise in the matrix elements. As shown in Ref. 66, however, far fewer configurations are required if the diagonalization is first reformulated as a least-squares fit. Let us assume that the result of operating with $\hat{H}$ on any basis state $\phi_i$ is just some linear combination of all the functions $\phi_i$ (technically speaking, the set $\{\phi_i\}$ is then said to span an invariant subspace of $\hat{H}$). We may thus write (for all $i$)

$$
\hat{H}\phi_i(\mathbf{R}) = \sum_{j=1}^{N} A_{ij}\,\phi_j(\mathbf{R})
\tag{4.14}
$$
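As a concrete illustration of Eq. (4.14) treated as a least-squares fit, consider a case where the invariant-subspace assumption holds exactly. The sketch below assumes the one-dimensional harmonic oscillator ($\hat{H} = -\tfrac{1}{2}d^2/dx^2 + \tfrac{1}{2}x^2$) with the basis $\phi_0 = e^{-x^2/2}$ and $\phi_1 = x^2 e^{-x^2/2}$, for which $\hat{H}\phi_0 = \tfrac{1}{2}\phi_0$ and $\hat{H}\phi_1 = -\phi_0 + \tfrac{5}{2}\phi_1$. Both sides are divided by $\Psi_T = \phi_0$, and an overdetermined system over $M \gg N$ sampled points is solved with `numpy.linalg.lstsq`. The model and variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 200                                        # number of sampled configurations, M >> N = 2
x = rng.normal(0.0, np.sqrt(0.5), size=M)      # drawn from |Psi_T|^2 = exp(-x^2)

# Basis functions divided by Psi_T = phi_0: columns are phi_0/Psi_T and phi_1/Psi_T.
phi = np.column_stack([np.ones(M), x**2])
# H acting on each basis function, also divided by Psi_T.
h_phi = np.column_stack([0.5 * np.ones(M), -1.0 + 2.5 * x**2])

# Eq. (4.14) at every configuration m: h_phi[m, i] = sum_j A[i, j] * phi[m, j],
# i.e. h_phi = phi @ A.T, solved in the least-squares sense.
A_transposed, *_ = np.linalg.lstsq(phi, h_phi, rcond=None)
A = A_transposed.T
energies = np.sort(np.linalg.eigvals(A).real)  # eigenvalues of the fitted A matrix
```

Because the basis here spans an invariant subspace, the fitted $A_{ij}$ do not depend on which configurations were drawn, and the eigenvalues come out as the exact $n = 0$ and $n = 2$ oscillator energies, 0.5 and 2.5. For realistic trial functions the fit is only approximate and the choice of configurations matters.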

To compute the required eigenstates and associated eigenvalues of $\hat{H}$, we then simply diagonalize the $A_{ij}$ matrix. Within a Monte Carlo approach we could evaluate the $\phi_i(\mathbf{R})$ and $\hat{H}\phi_i(\mathbf{R})$ for $N$ uncorrelated configurations generated by a VMC calculation and solve the resulting set of linear equations for the $A_{ij}$. For problems of interest, however, the assumption that the set $\{\phi_i\}$ spans an invariant subspace of $\hat{H}$ does not hold, and there exists no set of $A_{ij}$ that solves Eq. (4.14). If we took $N$ configurations and solved the set of $N$ linear equations, the values of $A_{ij}$ would depend on which configurations had been chosen. To overcome this problem, a number of configurations $M \gg N$ is sampled to obtain


an overdetermined set of equations which can be solved in a least-squares sense using the singular value decomposition technique. In Ref. 66 it is recommended that Eq. (4.14) be divided by $\Psi_T(\mathbf{R})$ so that in the limit of perfect sampling the scheme corresponds precisely to standard diagonalization. The method of Ref. 66 is pretty good for linear parameters. How might we generalize it for nonlinear parameters? The obvious way is to consider the basis of the initial trial wavefunction ($\phi_0 = \Psi_T$) and its derivatives with respect to the variable parameters, $\phi_i = \partial\Psi_T/\partial a_i|_{a_i^0}$. The simplest such algorithm is, in fact, unstable, and this turns out to be because the implied first-order approximation is often not good enough. To overcome this problem, Umrigar et al. introduced a stabilized method67,68 that works well and is quite robust (the details need not concern us here). The VMC energies given by this method are usually slightly lower than those obtained from variance minimization. David Ceperley once asked: “How many graduate students’ lives have been lost optimizing wavefunctions?”69 That was in 1996. To give a more twenty-first century feeling for the time scale involved in optimizing wavefunctions, I can tell you about the weekend a few years back when I added the entire G2-1 set70,71 to the examples included with the CASINO distribution. This is a standard set of 55 molecules with various experimentally well-characterized properties intended for benchmarking of different quantum chemistry methods (see, e.g., Ref. 72). Grossman has published the results of DMC calculations of these molecules using pseudopotentials,16 and we have now done the same with all-electron calculations.73,74 It took a little over three days using only a few single-processor workstations to create all 55 sets of example files from scratch, including optimizing the Jastrow factors for each molecule.
Although if one concentrated very hard on each individual case, one might be able to pull a little more energy out of a VMC simulation, the optimized Jastrow factors were all good enough to be used as input to DMC simulations. The entire procedure of variance minimization can be, and in CASINO is, thoroughly automated, and provided that a systematic approach is adopted, optimizing VMC wavefunctions is not the complicated time-consuming business that it once was. This is certainly the case if one requires the optimized wavefunction only for input into a DMC calculation, in which case one need not be overly concerned with lowering the VMC energy as much as possible. I suggest that the process is sufficiently automated these days that graduate students are better employed elsewhere; certainly we have not suffered any fatalities here in Cambridge.

4.4 DIFFUSION MONTE CARLO

Let us imagine that we are ignorant, or have simply not been paying attention in our quantum mechanics class, and that we believe that the wavefunction of the hydrogen atom has the shape of a big cube centered on the nucleus. If we tried to calculate the expectation value of the Hamiltonian using VMC, we would obtain an energy that was substantially in error. What DMC does, in essence, is to automatically correct the shape of the guessed square box wavefunction so


that it looks like the correct exponentially decaying one before calculating the expectation value. In principle it can do this even though our formula for the VMC wavefunction that we have spent so long justifying turns out not to have enough variational freedom to represent the true wavefunction. This is clearly a nice trick, particularly when, as is more usual, we have very little practical idea of what the exact many-electron wavefunction looks like. As one might expect, the DMC algorithm is necessarily rather more involved than that for VMC. I think that an approachable way of understanding it is to focus on the properties of quantum mechanical propagators, so we begin by reminding ourselves about these. Let’s say that we wish to integrate the time-dependent Schrödinger equation,

$$
i\hbar\frac{\partial\Psi(\mathbf{R},t)}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2\Psi(\mathbf{R},t) + V(\mathbf{R},t)\,\Psi(\mathbf{R},t) = \hat{H}\Psi(\mathbf{R},t)
\tag{4.15}
$$

where $\mathbf{R} = \{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N\}$, $V$ is the potential energy operator, and $\nabla = (\nabla_1, \nabla_2, \ldots, \nabla_N)$ is the $3N$-dimensional gradient operator. Integrating this is equivalent to wanting a formula for $\Psi$, and to find this, we must invert this differential equation. The result is an integral equation involving the propagator $K$:

$$ \Psi(\mathbf{R},t) = \int K(\mathbf{R},t;\mathbf{R}',t')\,\Psi(\mathbf{R}',t')\,d\mathbf{R}' \tag{4.16} $$

The propagator is interpreted as the probability amplitude for a particle to travel from one place to another (in this case, from $\mathbf{R}'$ to $\mathbf{R}$) in a given time $t - t'$. It is a Green's function for the Schrödinger equation. We see that the probability amplitude for a particle to be at $\mathbf{R}$ sometime in the future is given by the probability amplitude of it traveling there from $\mathbf{R}'$ (which is just $K(\mathbf{R},t;\mathbf{R}',t')$), weighted by the probability amplitude of it actually starting at $\mathbf{R}'$ in the first place (which is $\Psi(\mathbf{R}',t')$), summed over all possible starting points $\mathbf{R}'$. This is a straightforward concept.

How might we calculate the propagator? A typical way might be to use the Feynman path-integral method. For given start and end points $\mathbf{R}'$ and $\mathbf{R}$, one gets the overall amplitude by summing the contributions of the infinite number of all possible "histories" or paths that include those points. It doesn't matter why for the moment (look it up!), but the amplitude contributed by a particular history is proportional to $e^{iS_{cl}/\hbar}$, where $S_{cl}$ is the classical action of that history (i.e., the time integral of the classical Lagrangian $\frac{1}{2}mv^2 - V$ along the corresponding phase-space path of the system). The full expression for the propagator in Feynman's method may then be written as

$$ K^{F}(\mathbf{R},t;\mathbf{R}',t') = N \sum_{\text{all paths}} \exp\left[\frac{i}{\hbar}\int_{t'}^{t} L_{cl}(t'')\,dt''\right] \tag{4.17} $$

An alternative way to calculate the propagator is to use the de Broglie–Bohm pilot-wave interpretation of quantum mechanics,52 where the electrons both objectively exist and have the obvious definite trajectories derived from a straightforward analysis of the streamlines of the quantum mechanical probability current. From this perspective we find that we can achieve precisely the same result as that obtained using the Feynman method, by integrating the quantum Lagrangian $L_q(t) = \frac{1}{2}mv^2 - (V + Q)$ along precisely one path (the path that the electron actually follows), as opposed to linearly superposing amplitudes obtained from the classical Lagrangian associated with the infinite number of all possible paths. Here $Q$ is the quantum potential, which is the potential energy function of the quantum force (the force that the wave field exerts on the electrons). It is easy to show that the equivalent pilot-wave propagator is

$$ K^{B}(\mathbf{R},t;\mathbf{R}',t') = \frac{1}{J(t)^{1/2}} \exp\left[\frac{i}{\hbar}\int_{t'}^{t} L_q(t'')\,dt''\right] \tag{4.18} $$

where $J$ is a simple Jacobian factor. This formula should be contrasted with Eq. (4.17). One should also note that because de Broglie–Bohm trajectories do not cross, one need not sum over all possible starting points $\mathbf{R}'$ to compute $\Psi(\mathbf{R},t)$; one simply uses the $\mathbf{R}'$ that the unique trajectory passes through.

What is the connection of all this with the diffusion Monte Carlo method? Well, in DMC an arbitrary starting wavefunction is evolved using a (Green's function) propagator just like the ones we have been discussing. The main difference is that the propagation occurs in imaginary time $\tau = it$ as opposed to real time $t$. For reasons that will shortly become apparent, this has the effect of "improving" the wavefunction (i.e., making it look more like the ground state as imaginary time passes). For technical reasons, it also turns out that the propagation has to take place in a sequence of very short hops in imaginary time, so our evolution equation now looks like this:

$$ \Psi(\mathbf{R},\tau+\delta\tau) = \int K^{\mathrm{DMC}}(\mathbf{R},\mathbf{R}',\delta\tau)\,\Psi(\mathbf{R}',\tau)\,d\mathbf{R}' \tag{4.19} $$

The evolving wavefunction is not represented in terms of a basis set of known analytic functions but by the distribution in space and time of randomly diffusing electron positions over an ensemble of copies of the system ("configurations"). In other words, the DMC method is a stochastic projector method whose purpose is to evolve or project out the solution to the imaginary-time Schrödinger equation from an arbitrary starting state. We shall write this equation (which is simply what you get by taking the regular time-dependent equation and substituting $\tau$ for the time variable $it$) in atomic units as

$$ -\frac{\partial \Psi_{\mathrm{DMC}}(\mathbf{R},\tau)}{\partial\tau} = -\frac{1}{2}\nabla^2\Psi(\mathbf{R},\tau) + \left(V(\mathbf{R}) - E_T\right)\Psi(\mathbf{R},\tau) \tag{4.20} $$

Here the real variable $\tau$ measures the progress in imaginary time, and for purposes to be revealed presently, I have included a constant $E_T$, an energy offset to the zero of the potential which affects only the wavefunction normalization.

How, then, does propagating our trial function in imaginary time "improve" it? For eigenstates, the general solution to the usual time-dependent Schrödinger equation is clearly $\phi(\mathbf{R},t) = \phi(\mathbf{R},0)\,e^{-i(\hat{H}-E_T)t}$. By definition, we may expand an arbitrary "guessed" $\Psi(\mathbf{R},t)$ in terms of a complete set of these eigenfunctions of the Hamiltonian $\hat{H}$:

$$ \Psi(\mathbf{R},t) = \sum_{n=0}^{\infty} c_n\,\phi_n(\mathbf{R})\,e^{-i(E_n - E_T)t} \tag{4.21} $$

On substituting $it$ with imaginary time $\tau$, the oscillatory time dependence of the complex exponential phase factors becomes an exponential decay:

$$ \Psi(\mathbf{R},\tau) = \sum_{n=0}^{\infty} c_n\,\phi_n(\mathbf{R})\,e^{-(E_n - E_T)\tau} \tag{4.22} $$

Let us assume that our initial guess for the wavefunction is not orthogonal to the ground state (i.e., $c_0 \neq 0$). Then if we magically choose the constant $E_T$ to be the ground-state eigenvalue $E_0$ (or, in practice, keep very tight control of it through some type of feedback procedure), it is clear we should eventually get imaginary-time independence of the probability distribution, in the sense that as $\tau \to \infty$, our initial $\Psi(\mathbf{R},0)$ comes to look more and more like the stationary ground state $\phi_0(\mathbf{R})$ as the contribution of the excited-state eigenfunctions dies away:

$$ \Psi(\mathbf{R},\tau) = c_0\,\phi_0 + \sum_{n=1}^{\infty} c_n\,\phi_n(\mathbf{R})\,e^{-(E_n - E_0)\tau} \tag{4.23} $$
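The decay of the excited-state terms in Eq. (4.23) can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the spectrum $E_n = n + \frac{1}{2}$ and the equal starting coefficients are arbitrary illustrative choices, and the function name `imaginary_time_projection` is hypothetical.

```python
import numpy as np

def imaginary_time_projection(energies, coeffs, tau, e_ref):
    """Return the damped coefficients c_n * exp(-(E_n - E_T) * tau) of Eq. (4.22)."""
    energies = np.asarray(energies, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    return coeffs * np.exp(-(energies - e_ref) * tau)

# Illustrative spectrum E_n = n + 1/2 with equal initial weights (an assumption).
energies = np.arange(5) + 0.5
coeffs = np.ones(5)

for tau in (0.0, 1.0, 5.0, 20.0):
    # Choosing E_T = E_0 keeps the ground-state coefficient constant, Eq. (4.23).
    c = imaginary_time_projection(energies, coeffs, tau, e_ref=energies[0])
    ground_weight = c[0] ** 2 / np.sum(c ** 2)
    print(f"tau = {tau:5.1f}   ground-state weight = {ground_weight:.6f}")
```

With $E_T = E_0$ the ground-state coefficient is untouched while every other term is damped at a rate set by its excitation energy, so the slowest decay is governed by the gap $E_1 - E_0$.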

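So now we know why we do this propagation. How, in practice, do we find an expression for the propagator $K$? Consider now the imaginary-time Schrödinger equation in two parts:

$$ \frac{\partial \Psi(\mathbf{R},\tau)}{\partial\tau} = \frac{1}{2}\nabla^2\Psi(\mathbf{R},\tau) \tag{4.24} $$

$$ \frac{\partial \Psi(\mathbf{R},\tau)}{\partial\tau} = -\left(V(\mathbf{R}) - E_T\right)\Psi(\mathbf{R},\tau) \tag{4.25} $$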

These two formulas have the form of the usual diffusion equation and of a rate equation with a position-dependent rate constant, respectively. The appropriate propagator for the diffusion equation is well known; it is a $3N$-dimensional Gaussian with variance $\delta\tau$ in each dimension. The propagator for the rate equation is also known; it gives a branching factor which can be interpreted as a position-dependent weight or stochastic survival probability for a member of an ensemble.

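Multiplying the two together to get the following propagator for the imaginary-time Schrödinger equation is an approximation, the short-time approximation, valid only in the limit of small $\delta\tau$ (which is why we need to do the evolution as a sequence of short hops):

$$ K^{\mathrm{DMC}}(\mathbf{R},\mathbf{R}',\delta\tau) = \frac{1}{(2\pi\delta\tau)^{3N/2}} \exp\left(-\frac{|\mathbf{R}-\mathbf{R}'|^2}{2\delta\tau}\right) \exp\left[-\delta\tau\left(\frac{V(\mathbf{R}) + V(\mathbf{R}') - 2E_T}{2}\right)\right] \tag{4.26} $$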

Let us then summarize with a simple example how the DMC algorithm works. If we interpret $\Psi$ as a probability density, the diffusion equation $\partial\Psi/\partial\tau = \frac{1}{2}\nabla^2\Psi$ represents the movement of $N$ diffusing particles. If we turn this around, we may decide to represent $\Psi(x, \tau)$ by an ensemble of such sets of particles. Each member of such an ensemble will be called a configuration. We interpret the full propagator $K^{\mathrm{DMC}}(\mathbf{R},\mathbf{R}',\delta\tau)$ as the probability of a configuration moving from $\mathbf{R}'$ to $\mathbf{R}$ in a time $\delta\tau$. The branching factor in the propagator will generally be interpreted as a stochastic survival probability for a given configuration rather than as a simple weight, as the latter is prone to numerical instabilities. This means that the configuration population becomes dynamically variable; configurations that stray into regions of high $V$ have a good chance of being killed (removed from the calculation); in low-$V$ regions, configurations have a high probability of multiplying (i.e., they create copies of themselves, which then propagate independently). It is solely this branching or reweighting that "changes the shape of the wavefunction" as it evolves. So, as we have seen, after a sufficiently long period of imaginary-time evolution, all the excited states will decay away, leaving only the ground-state wavefunction, at which point the propagation may be continued to accumulate averages of interesting observables.

As a simple example, consider Fig. 4.3. Here we make a deliberately bad guess that the ground-state wavefunction for a single electron in a harmonic potential well is a constant in the vicinity of the well and zero everywhere else. We begin with seven copies of the system or configurations in our ensemble; the electrons in this ensemble are initially randomly distributed according to the uniform probability distribution in the region where the trial function is finite. The particle distribution is then evolved in imaginary time according to the scheme developed above. The electrons are subsequently seen to become distributed according to the proper Gaussian shape of the exact ground-state wavefunction. It is evident from the figure that the change in shape is produced by the branching factor occasionally eliminating configurations in high-$V$ regions and duplicating them in low-$V$ regions.

This "pure DMC" algorithm works very well in a single-particle system with a nicely behaved potential, as in the example. Unfortunately, it suffers from two very serious drawbacks which become evident in multiparticle systems with divergent Coulomb potentials.
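The harmonic-well example can be turned into a small simulation. What follows is a minimal sketch rather than the scheme used in any production code: the potential $V(x) = x^2/2$ in atomic units, the population size, the time step, the logarithmic feedback on $E_T$, and the function name `pure_dmc_harmonic` are all illustrative assumptions. The energy is estimated as the average of $V$ over the walkers, which is valid here because the unguided walkers sample the ground state itself and the kinetic term integrates to zero.

```python
import numpy as np

def pure_dmc_harmonic(n_target=500, n_steps=3000, n_equil=1000, dtau=0.01, seed=0):
    """Unguided ("pure") DMC for one particle in V(x) = x^2 / 2.

    Returns the estimated ground-state energy and the final walker positions.
    """
    rng = np.random.default_rng(seed)

    def potential(x):
        return 0.5 * x**2

    # Deliberately bad starting guess: walkers spread uniformly over a box.
    x = rng.uniform(-2.0, 2.0, size=n_target)
    e_ref = potential(x).mean()              # reference energy E_T
    energy_samples = []

    for step in range(n_steps):
        # Diffusion: Gaussian displacement of variance dtau (first factor of Eq. 4.26).
        x_new = x + rng.normal(0.0, np.sqrt(dtau), size=x.size)

        # Branching factor (second factor of Eq. 4.26).
        weight = np.exp(-dtau * (0.5 * (potential(x) + potential(x_new)) - e_ref))

        # Stochastic survival: each walker is replaced by floor(weight + u) copies.
        copies = np.floor(weight + rng.uniform(size=x.size)).astype(int)
        x = np.repeat(x_new, copies)

        # Energy estimate: average of V over walkers distributed as the ground state.
        e_est = potential(x).mean()

        # Illustrative population feedback: nudge E_T so the population
        # drifts back toward n_target.
        e_ref = e_est - np.log(x.size / n_target)

        if step >= n_equil:
            energy_samples.append(e_est)

    return float(np.mean(energy_samples)), x

energy, walkers = pure_dmc_harmonic()
print(f"estimated E0 = {energy:.3f} (exact value 0.5), population = {walkers.size}")
```

A histogram of the returned walker positions should approach the Gaussian shape $e^{-x^2/2}$ of the exact ground state, which is the behavior sketched in Fig. 4.3.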

Fig. 4.3 Schematic illustration of the DMC algorithm for a single electron in a harmonic potential well, showing the evolution of the shape of the wavefunction due to propagation in imaginary time. (From Ref. 5, with permission. Copyright © 2001 by The American Physical Society.)

The first problem arises due to our assumption that $\Psi$ is a probability distribution (necessarily positive everywhere), even though the antisymmetric nature of multiparticle fermionic wavefunctions means that it must have both positive and negative parts separated by a nodal surface, that is, a $(3N - 1)$-dimensional hypersurface on which it has the value zero. One might think that two separate populations of configurations with attached positive and negative weights might get around this problem (essentially, the well-known fermion sign problem), but in practice there is a severe signal-to-noise issue. It is possible to construct formally exact algorithms of this nature which overcome some of the worst practical problems,75 but to date all seem highly inefficient, with poor system-size scaling. The second problem is less fundamental but in practice very severe. The required rate of removing or duplicating configurations diverges when the

potential energy diverges (which occurs whenever two particles are coincident) due to the presence of $V$ in the branching factor of Eq. (4.26). This leads to stability problems and poor statistical behavior.

These problems may be dealt with at the cost of introducing the most important approximation in the DMC algorithm: the fixed-node approximation.76 We say, in effect, that particles may not cross the nodal surface of the trial wavefunction $\Psi_T$; that is, there is an infinite repulsive potential barrier on the nodes. This forces the DMC wavefunction to be zero on that hypersurface. If the nodes of the trial function coincide with the exact nodes, such an algorithm will give the exact ground-state energy (it is, of course, well known that the exact de Broglie–Bohm particle trajectories cannot pass through the nodal surface). If the trial function nodes do not coincide with the exact nodes, the DMC energy will be higher than the ground-state energy (but less than or equal to the VMC energy). The variational principle thus applies.

To make such an algorithm efficient we must introduce importance sampling, and this is done in the following way. We require that the imaginary-time evolution produces the mixed distribution $f = \Psi\Psi_T$ rather than the pure distribution. Substituting this into the imaginary-time Schrödinger equation, Eq. (4.20), we obtain

$$ -\frac{\partial f(\mathbf{R},\tau)}{\partial\tau} = -\frac{1}{2}\nabla^2 f(\mathbf{R},\tau) + \nabla\cdot\left[\mathbf{v}_D(\mathbf{R})\,f(\mathbf{R},\tau)\right] + \left(E_L(\mathbf{R}) - E_T\right)f(\mathbf{R},\tau) \tag{4.27} $$

where $\mathbf{v}_D(\mathbf{R})$ is the $3N$-dimensional drift velocity vector, defined by

$$ \mathbf{v}_D(\mathbf{R}) = \nabla \ln|\Psi_T(\mathbf{R})| = \frac{\nabla\Psi_T(\mathbf{R})}{\Psi_T(\mathbf{R})} \tag{4.28} $$

and

$$ E_L(\mathbf{R}) = \Psi_T^{-1}\left(-\frac{1}{2}\nabla^2 + V(\mathbf{R})\right)\Psi_T \tag{4.29} $$

is the usual local energy. The propagator from $\mathbf{R}'$ to $\mathbf{R}$ for the importance-sampled algorithm now looks like this:

$$ K^{\mathrm{DMC}}(\mathbf{R},\mathbf{R}',\delta\tau) = \frac{1}{(2\pi\delta\tau)^{3N/2}} \exp\left[-\frac{\left(\mathbf{R}-\mathbf{R}'-\delta\tau\,\mathbf{F}(\mathbf{R}')\right)^2}{2\delta\tau}\right] \exp\left[-\frac{\delta\tau}{2}\left(E_L(\mathbf{R}) + E_L(\mathbf{R}') - 2E_T\right)\right] \tag{4.30} $$
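To make the drift velocity and local energy of Eqs. (4.28) and (4.29) concrete, here is a short sketch for a case where both can be written down by hand. The one-dimensional harmonic well $V(x) = x^2/2$, the Gaussian trial function $\Psi_T = e^{-ax^2}$, and all function names are assumptions chosen for illustration; they do not come from the text.

```python
import numpy as np

def trial_wavefunction(x, a):
    """Hypothetical trial function Psi_T(x) = exp(-a x^2) for V(x) = x^2 / 2."""
    return np.exp(-a * x**2)

def drift_velocity(x, a):
    """v_D = d/dx ln|Psi_T| = -2 a x, the 1D analogue of Eq. (4.28)."""
    return -2.0 * a * x

def local_energy(x, a):
    """E_L = Psi_T^{-1} (-1/2 d^2/dx^2 + x^2/2) Psi_T = a + (1/2 - 2 a^2) x^2."""
    return a + (0.5 - 2.0 * a**2) * x**2

def local_energy_numeric(x, a, h=1e-4):
    """The same quantity from a central finite difference, as a consistency check."""
    psi = trial_wavefunction(x, a)
    second = (trial_wavefunction(x + h, a) - 2.0 * psi
              + trial_wavefunction(x - h, a)) / h**2
    return -0.5 * second / psi + 0.5 * x**2

x = np.linspace(-2.0, 2.0, 9)
print("max |analytic - numeric| :", np.max(np.abs(local_energy(x, 0.3)
                                                  - local_energy_numeric(x, 0.3))))
spread = local_energy(x, 0.5).max() - local_energy(x, 0.5).min()
print("spread of E_L at a = 0.5 :", spread)
```

At $a = \frac{1}{2}$ the trial function is the exact ground state and the local energy is the constant $\frac{1}{2}$; for any other $a$ it varies with position, which is the source of the statistical noise in the branching factor of Eq. (4.30).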

Because the nodal surface of $\Psi$ is constrained to be that of $\Psi_T$, their product $f$ is positive everywhere and can now be properly interpreted as a probability distribution. The time evolution generates the distribution $f = \Psi\Psi_T$, where $\Psi$ is now the lowest-energy wavefunction with the same nodes as $\Psi_T$. This solves

the first of our two problems. The second problem of the poor statistical behavior due to the divergences in the potential energy is also solved because the term $V(\mathbf{R}) - E_T$ in Eq. (4.20) has been replaced by $E_L(\mathbf{R}) - E_T$ in Eq. (4.27), which is much smoother. Indeed, if $\Psi_T$ were an exact eigenstate, $E_L(\mathbf{R}) - E_T$ would be independent of position in configuration space. Although we cannot in practice find the exact $\Psi_T$, it is possible to eliminate the local energy divergences due to coincident particles by choosing a trial function that has the correct cusplike behavior at the relevant points in the configuration space.56 Note that this is all reflected in the branching factor of the new propagator of Eq. (4.30).

The nodal surface partitions the configuration space into regions that we call nodal pockets. The fixed-node approximation implies that we are restricted to sampling only those nodal pockets that are occupied by the initial set of configurations, and this appears to introduce some kind of ergodicity concern, since at first sight it seems that we ought to sample every nodal pocket. This would be an impossible task in large systems. However, the tiling theorem for exact fermion ground states77,78 asserts that all nodal pockets are in fact equivalent and related by permutation symmetry; one need therefore only sample one of them. This theorem is intimately connected with the existence of a variational principle for the DMC ground-state energy.78 Other interesting investigations of properties of nodal surfaces have been published.79–81

A practical importance-sampled DMC simulation proceeds as follows. First we pick an ensemble of a few hundred configurations chosen from the distribution $|\Psi_T|^2$ using VMC and the standard Metropolis algorithm. This ensemble is then evolved according to the short-time approximation to the Green's function of the importance-sampled imaginary-time Schrödinger equation [Eq. (4.27)], which involves repeated steps of biased diffusion followed by the deletion and/or duplication of configurations. The bias in the diffusion is caused by the drift vector arising out of the importance sampling, which directs the sampling toward parts of configuration space where $|\Psi_T|$ is large (i.e., it plays the role of an Einsteinian osmotic velocity). This drift step is always directed away from the node, and $\nabla\Psi_T$ is in fact a normal vector of the nodal hypersurface. After a period of equilibration the excited-state contributions will have largely died out and the configurations start to trace out the probability distribution $f(\mathbf{R})/\int f(\mathbf{R})\,d\mathbf{R}$. We can then start to accumulate averages, in particular the DMC energy. Note that throughout this process the reference energy $E_T$ is varied to keep the configuration population under control through a specific feedback mechanism. The initial stages of a DMC simulation (for solid antiferromagnetic NiO crystal with 128 atoms per cell using unrestricted Hartree–Fock trial functions of the type discussed in Refs. 82 and 83) are shown in Fig. 4.4. The DMC energy is given by

$$ E_{\mathrm{DMC}} = \frac{\int f(\mathbf{R})\,E_L(\mathbf{R})\,d\mathbf{R}}{\int f(\mathbf{R})\,d\mathbf{R}} \approx \sum_i E_L(\mathbf{R}_i) \tag{4.31} $$
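The drift, diffusion, and branching cycle just described can be sketched for a toy problem. The following is an illustration under stated assumptions, not the CASINO algorithm: it uses a single particle in $V(x) = x^2/2$ with a nodeless trial function $\Psi_T = e^{-ax^2}$ (so no fixed-node constraint arises), it takes the drift in Eq. (4.30) to be $\mathbf{v}_D$ of Eq. (4.28), it samples $|\Psi_T|^2$ directly as a Gaussian instead of running Metropolis VMC, it omits any accept/reject step, and it uses a simple logarithmic feedback on $E_T$. The energy is taken as the average of the local energy over the configurations. The name `importance_sampled_dmc` is hypothetical.

```python
import numpy as np

def importance_sampled_dmc(a=0.4, n_target=500, n_steps=3000, n_equil=1000,
                           dtau=0.01, seed=0):
    """Importance-sampled DMC for V(x) = x^2 / 2 with Psi_T = exp(-a x^2)."""
    rng = np.random.default_rng(seed)

    def drift(x):                    # v_D = d/dx ln|Psi_T|, cf. Eq. (4.28)
        return -2.0 * a * x

    def local_energy(x):             # E_L of Eq. (4.29) for this Psi_T
        return a + (0.5 - 2.0 * a**2) * x**2

    # |Psi_T|^2 is a Gaussian of variance 1/(4a); sampled directly here
    # in place of a Metropolis VMC run.
    x = rng.normal(0.0, np.sqrt(1.0 / (4.0 * a)), size=n_target)
    e_ref = local_energy(x).mean()   # reference energy E_T
    energies, populations = [], []

    for step in range(n_steps):
        # Biased diffusion: drift plus Gaussian step (first factor of Eq. 4.30).
        x_new = x + dtau * drift(x) + rng.normal(0.0, np.sqrt(dtau), size=x.size)

        # Branching factor (second factor of Eq. 4.30).
        weight = np.exp(-dtau * (0.5 * (local_energy(x) + local_energy(x_new)) - e_ref))
        copies = np.floor(weight + rng.uniform(size=x.size)).astype(int)
        x = np.repeat(x_new, copies)

        # Energy estimate: average local energy over the current configurations.
        e_est = local_energy(x).mean()

        # Illustrative feedback keeping the population near n_target.
        e_ref = e_est - np.log(x.size / n_target)

        if step >= n_equil:
            energies.append(e_est)
            populations.append(x.size)

    return float(np.mean(energies)), np.array(populations)

e_dmc, pops = importance_sampled_dmc()
print(f"E_DMC = {e_dmc:.4f} (exact 0.5; VMC value for a = 0.4 is 0.5125)")
print(f"population range: {pops.min()} to {pops.max()}")
```

Compared with the unguided algorithm, the branching weights stay much closer to one because the local energy varies far less than the bare potential, so the population fluctuates less and the energy estimate is less noisy.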

[Figure 4.4: upper panel, population against number of moves; lower panel, local energy (Ha), reference energy, and best estimate against number of moves.]

Fig. 4.4 (color online) DMC simulation of solid antiferromagnetic NiO. In the lower panel, the noisy black line is the local energy after each move, the smoother green line is the current best estimate of the DMC energy, and the red line is $E_T$ in Eq. (4.27), which is varied to control the population of configurations through a feedback mechanism. As the simulation equilibrates, the best estimate of the energy, initially equal to the VMC energy, decreases significantly, then approaches a constant, which is the final DMC energy. The upper panel shows the variation in the population of the ensemble during the simulation as walkers are created or destroyed.

This energy expression would be exact if the nodal surface of $\Psi_T$ were exact, and the fixed-node error is second order in the error in the $\Psi_T$ nodal surface (when a variational theorem exists78). The accuracy of the fixed-node approximation can be tested on small systems and normally leads to very satisfactory results. The trial wavefunction thus limits the final accuracy that can be obtained and it also controls the statistical efficiency of the algorithm. Like VMC, the DMC algorithm satisfies a zero-variance principle (i.e., the variance of the energy goes to zero as the trial wavefunction goes to an exact eigenstate). For other expectation values of operators that do not commute with the Hamiltonian, the DMC mixed estimator is biased and other techniques are required in order to sample the pure distribution.84–86

A final point: The necessity of using the fixed-node approximation suggests that the best way of optimizing wavefunctions would be to do it in DMC directly. The nodal surface could then in principle be optimized to the shape that minimizes the DMC energy. The backflow technique discussed in Section 4.5.1 has some bearing on the problem, but the usual procedure involving optimization of the energy or variance in VMC will not usually lead to the optimal nodes in the sense that the fixed-node DMC energy is minimal. The large number of parameters (up to a few hundred) in your typical Slater–Jastrow(-backflow)

wavefunction means that direct variation of the parameters in DMC is too expensive (although this has been done, see, e.g., Refs. 87 and 88). Furthermore, we note that optimizing the energy in DMC is tricky for the nodal surface, as the contribution of the region near the nodes to the energy is small. More exotic ways of optimizing the nodes are still being actively developed.89,90

4.5 BITS AND PIECES

4.5.1 More About Wavefunctions, Orbitals, and Basis Sets

Single-determinant Slater–Jastrow wavefunctions often work very well in QMC calculations since the orbital part alone provides a pretty good description of the system. In the ground state of the carbon pseudoatom, for example, a single Hartree–Fock determinant retrieves about 98.2% of the total energy. The remaining 1.8%, which at the VMC level must be recovered by the Jastrow factor, is the correlation energy, and in this case it amounts to 2.7 eV—clearly important for an accurate description of chemical bonding. By definition a determinant of Hartree–Fock orbitals gives the lowest energy of all single-determinant wavefunctions, and DFT orbitals are often very similar to them. These orbitals are not optimal when a Jastrow factor is included, but it turns out that the Jastrow factor does not change the detailed structure of the optimal orbitals very much, and the changes are well described by a fairly smooth change to the orbitals. This can conveniently be included in the Jastrow factor itself. How, though, might we improve on the Hartree–Fock/DFT orbitals in the presence of the Jastrow factor? One might naturally consider optimizing the orbitals themselves. This has been done, for example, with the atomic orbitals of a neon atom by Drummond et al.,91 optimizing a parameterized function that is added to the self-consistent orbitals. This was found to be useful only in certain cases. In atoms one often sees an improvement in the VMC energy but not in DMC, indicating that the Hartree–Fock nodal surface is close to optimal even in the presence of a correlation function. Unfortunately, direct optimization of both the orbitals and the Jastrow factor cannot easily be done for large polyatomic systems because of the computational cost of optimizing large numbers of parameters, so it is difficult to know how far this observation extends to more complex systems. 
One technique that has been tried92,93 is to optimize the potential that generates the orbitals rather than the orbitals themselves. It was also suggested by Grossman and Mitas94 that another way to improve the orbitals over the Hartree–Fock form is to use a determinant of the natural orbitals, which diagonalize the one-electron density matrix. While the motivation here is that the convergence of configuration interaction expansions is improved by using natural orbitals instead of Hartree–Fock orbitals, it is not clear why this would work in QMC. The calculation of reasonably accurate natural orbitals costs a lot, and such an approach is therefore less attractive for large systems. It should be noted that all such techniques which move the nodal surface of the trial function (and hence potentially improve the DMC energy) make

wavefunction optimization with fixed configurations more difficult. The nodal surface deforms continuously as the parameters are changed, and in the course of this deformation the fixed set of electron positions of one of the configurations may end up being on the nodal surface. As the local energy $\hat{H}\Psi/\Psi$ diverges on the nodal surface, the unreweighted variance of the local energy of a fixed set of configurations also diverges, making it difficult to locate the global minimum of the variance. A discussion of what one might do about this can be found elsewhere.62

In some cases it is necessary to use multideterminant wavefunctions to preserve important symmetries of the true wavefunction. In other cases a single determinant may give the correct symmetry, but a significantly better wavefunction can be obtained by using a linear combination of a few determinants. Multideterminant wavefunctions have been used successfully in QMC studies of small molecules and even in periodic calculations such as the study of the neutral vacancy in diamond due to Hood et al.27 However, other studies have shown that although using multideterminant functions improves VMC, this sometimes does not extend to DMC, indicating that the nodal surface has not been improved.91 Of course, there is very little point in using methods that employ expansions of large numbers of determinants to generate QMC trial functions, not only because the use of methods that scale so badly as a preliminary calculation completely defeats the entire point of QMC, but because the medium- and short-range correlation which these expansions describe95,96 is dealt with directly and vastly more efficiently by the Jastrow factor.

By far the most useful way to go beyond the Slater–Jastrow form is the backflow technique, to which we have already alluded.
Backflow correlations were originally derived from a current conservation argument by Feynman97 and by Feynman and Cohen98 to provide a picture of the excitations in liquid $^4$He and the effective mass of a $^3$He impurity in $^4$He. In a modern context they can also be derived from an imaginary-time evolution argument.99,100 In the simplest form of backflow trial function the electron coordinates $\mathbf{r}_i$ appearing in the Slater determinants of Eq. (4.7) are replaced by quasiparticle coordinates,

$$ \bar{\mathbf{r}}_i = \mathbf{r}_i + \sum_{j \neq i}^{N} \eta(r_{ij})\,(\mathbf{r}_i - \mathbf{r}_j) \tag{4.32} $$

where $r_{ij} = |\mathbf{r}_i - \mathbf{r}_j|$. This is supposed to represent the characteristic flow pattern where the quantum fluid is "pushed out of the way" in front of a moving particle and fills in the space behind it. The optimal function $\eta(r_{ij})$ may be determined variationally, and in so doing the nodal surface is shifted. Backflow thus represents another practical possibility for relaxing the constraints of the fixed-node approximation in DMC. Kwon et al.99,101 found that the introduction of backflow significantly lowered the VMC and DMC energies of the two- and three-dimensional uniform electron gas at high densities. The use of backflow has also been investigated for metallic hydrogen.102 For real polyatomic systems, a much more complicated inhomogeneous backflow function is required; the one

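developed in our group and implemented in the CASINO program by López Ríos103 has the following functional form:

$$ \Psi_{BF}(\mathbf{R}) = e^{J(\mathbf{R})} \det\left[\psi_i\!\left(\mathbf{r}_i^{\uparrow} + \boldsymbol{\xi}_i(\mathbf{R})\right)\right] \det\left[\psi_i\!\left(\mathbf{r}_j^{\downarrow} + \boldsymbol{\xi}_j(\mathbf{R})\right)\right] \tag{4.33} $$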

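with the backflow displacement for electron $i$ in a system of $N$ electrons and $N_{\mathrm{ion}}$ nuclei given by

$$ \boldsymbol{\xi}_i = \sum_{j \neq i}^{N} \eta_{ij}\,\mathbf{r}_{ij} + \sum_{I}^{N_{\mathrm{ion}}} \mu_{iI}\,\mathbf{r}_{iI} + \sum_{j \neq i}^{N} \sum_{I}^{N_{\mathrm{ion}}} \left(\Phi_i^{jI}\,\mathbf{r}_{ij} + \Theta_i^{jI}\,\mathbf{r}_{iI}\right) \tag{4.34} $$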
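As a rough sketch of how the first two sums of Eq. (4.34) might be evaluated, consider the following. Several things here are assumptions made for illustration only: the vector conventions $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$ and $\mathbf{r}_{iI} = \mathbf{r}_i - \mathbf{R}_I$ (the first follows Eq. (4.32)), the Gaussian forms chosen for $\eta$ and $\mu$, and the function name `backflow_displacements`. The three-body terms are omitted, and this is not the parameterization used in CASINO.

```python
import numpy as np

def backflow_displacements(r_elec, r_ion, eta, mu):
    """Electron-electron and electron-ion backflow displacements xi_i.

    r_elec: (N, 3) electron positions; r_ion: (N_ion, 3) nuclear positions;
    eta, mu: callables acting elementwise on arrays of distances.
    """
    r_elec = np.asarray(r_elec, dtype=float)
    r_ion = np.asarray(r_ion, dtype=float).reshape(-1, 3)

    # Electron-electron term: sum over j != i of eta(r_ij) * (r_i - r_j).
    r_ij = r_elec[:, None, :] - r_elec[None, :, :]      # shape (N, N, 3)
    eta_ij = np.array(eta(np.linalg.norm(r_ij, axis=-1)), dtype=float)
    np.fill_diagonal(eta_ij, 0.0)                       # exclude j = i
    xi = np.einsum("ij,ijk->ik", eta_ij, r_ij)

    # Electron-ion term: sum over I of mu(r_iI) * (r_i - R_I).
    r_iI = r_elec[:, None, :] - r_ion[None, :, :]       # shape (N, N_ion, 3)
    mu_iI = np.array(mu(np.linalg.norm(r_iI, axis=-1)), dtype=float)
    xi += np.einsum("iI,iIk->ik", mu_iI, r_iI)
    return xi

# Hypothetical short-ranged backflow functions, for illustration only.
eta = lambda d: 0.1 * np.exp(-d**2)
mu = lambda d: 0.05 * np.exp(-d**2)

electrons = np.array([[0.5, 0.0, 0.0], [-0.5, 0.2, 0.0], [0.0, 0.0, 1.0]])
ions = np.array([[0.0, 0.0, 0.0]])
quasiparticle_coords = electrons + backflow_displacements(electrons, ions, eta, mu)
print(quasiparticle_coords)
```

Because every displacement depends on all electron positions, moving one electron changes every quasiparticle coordinate, which is why the orbitals in the determinants of Eq. (4.33) must all be re-evaluated after a single-electron move.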

Here $\eta_{ij} = \eta(r_{ij})$ is a function of electron–electron separation, $\mu_{iI} = \mu(r_{iI})$ is a function of electron–ion separation, and $\Phi_i^{jI} = \Phi(r_{iI}, r_{jI}, r_{ij})$ and $\Theta_i^{jI} = \Theta(r_{iI}, r_{jI}, r_{ij})$. The functions $\eta$, $\mu$, $\Phi$, and $\Theta$ are parameterized using power expansions with optimizable coefficients.103

Now, of course, the use of backflow wavefunctions can significantly increase the cost of a QMC calculation. This is largely because every element of the Slater determinant has to be recomputed each time an electron is moved, whereas only a single column of the Slater determinant has to be updated after each move when the basic Slater–Jastrow wavefunction is used. The basic scaling of the algorithm with backflow (assuming localized orbitals and basis set) is thus $N^3$ rather than $N^2$. Backflow functions also introduce more parameters into the trial wavefunction, making the optimization procedure more difficult and costly. However, the reduction in the variance normally observed with backflow greatly improves the statistical efficiency of QMC calculations in the sense that the number of moves required to obtain a fixed error in the energy is smaller. In our Ne-atom calculations,91 for example, it was observed that the computational cost per move in VMC and DMC increased by a factor of between 4 and 7, but overall the time taken to complete the calculation to a fixed error bar increased only by a factor of between 2 and 3. One interesting thing that we found is that energies obtained from VMC with backflow approached those of DMC without backflow. VMC with backflow may thus represent a useful level of theory since it is significantly less expensive than DMC (although the problem with obtaining accurate energy differences in VMC presumably remains). Finally, it should be noted that backflow is expected to improve the QMC estimates of all expectation values, not just the energy. We like it.

We now move on to consider the issue of basis sets.
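As an aside on the single-column update mentioned in the backflow cost discussion above, the standard linear-algebra identities behind it (the matrix determinant lemma and the Sherman–Morrison formula) can be sketched as follows. The layout convention `D[i, j] = psi_i(r_j)`, under which moving electron `j` changes column `j`, and the function name `update_inverse` are assumptions for illustration; the text does not say how any particular code organizes this.

```python
import numpy as np

def update_inverse(d_inv, new_col, j):
    """Replace column j of a Slater matrix D and update its inverse.

    Returns the new inverse and the ratio det(D_new) / det(D).
    The ratio costs O(N) and the inverse update costs O(N^2).
    """
    row_j = d_inv[j, :].copy()
    ratio = row_j @ new_col                  # matrix determinant lemma
    u = d_inv @ new_col                      # D^{-1} times the new column
    new_inv = d_inv - np.outer(u, row_j) / ratio
    new_inv[j, :] = row_j / ratio            # row j takes a simpler form
    return new_inv, ratio

# Check against a direct O(N^3) recomputation on a small random matrix.
rng = np.random.default_rng(0)
n = 6
d = rng.normal(size=(n, n)) + 3.0 * np.eye(n)
new_col = d[:, 2] + 0.1 * rng.normal(size=n)     # "move" electron 2 slightly

d_new = d.copy()
d_new[:, 2] = new_col
new_inv, ratio = update_inverse(np.linalg.inv(d), new_col, 2)

print("ratio agrees  :", np.isclose(ratio, np.linalg.det(d_new) / np.linalg.det(d)))
print("inverse agrees:", np.allclose(new_inv, np.linalg.inv(d_new)))
```

With backflow every column changes when one electron moves, so this shortcut no longer applies and the determinant must be rebuilt, consistent with the higher scaling quoted above.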
The importance of using good-quality single-particle orbitals in building up the Slater determinants in the trial wavefunction is clear. The determinant part accounts for by far the most significant fraction of the variational energy. However, the evaluation of single-particle orbitals and their first and second derivatives can sometimes take up more than half of the total computer time, and consideration must therefore be given to obtaining accurate orbitals that can be evaluated rapidly at arbitrary points in space. It is not difficult to see that the most critical thing is to expand

the single-particle orbitals in a basis set of localized functions. This ensures that beyond a certain system size, only a fixed number of the localized functions will give a significant contribution to a particular orbital at a particular point. The cost of evaluating the orbitals does not then increase rapidly with the size of the system. Note that localized basis functions can (1) be strictly zero beyond a certain radius, or (2) can decrease monotonically and be prescreened before the calculation starts, so that only those functions that could be significant in a particular region are considered for evaluation. An alternative procedure is to tabulate the orbitals and their derivatives on a grid, and this is feasible for small systems such as atoms, but for periodic solids or larger molecules the storage requirements quickly become enormous. This is an important consideration when using parallel computers, as it is much more efficient to store the single-particle orbitals on every node. Historically, a very large proportion of condensed matter electronic structure theorists have used plane-wave basis sets in their DFT calculations. However, in QMC, plane-wave expansions are normally extremely inefficient because they are not localized in real space; every basis function contributes at every point, and the number of functions required increases linearly with system size. Only if there is a short repeat length in the problem are plane waves not totally unreasonable. Note that this does not mean that all plane-wave DFT codes (such as CASTEP,104 ABINIT,105 and PWSCF106 ) are useless for generating trial wavefunctions for CASINO; a postprocessing utility can be used to reexpand a function expanded in plane waves in another localized basis before the wavefunction is read into CASINO. 
The usual thing here is to use some form of localized spline functions on a grid such as "blip" functions.107,108

Another reasonable way to do this is to expand the orbitals in a basis of Gaussian-type functions. These are localized, relatively quick to evaluate, and are available from a wide range of sophisticated software packages. So much expertise with Gaussians has been built up within the quantum chemistry community that there is significant resistance to using any other type of basis. A great many Gaussian-based packages have been developed by quantum chemists for treating molecules. The best known of these are probably the various versions of the GAUSSIAN software.3 In addition to the regular single-determinant methods, these codes implement various techniques involving multideterminant correlated wavefunctions and are flexible tools for developing accurate molecular trial wavefunctions. For systems with periodic boundary conditions, the Gaussian basis set program CRYSTAL109 turns out to be very useful; it can perform all-electron or pseudopotential Hartree–Fock and DFT calculations both for molecules and for systems periodic in one, two, or three dimensions. For some systems, Slater basis sets may be useful in QMC (since they provide a more compact representation than Gaussians, and hence more rapidly calculable orbitals).74 To this end, we have implemented an interface to the program ADF.110

There is one more issue we must consider that is relevant to all basis sets but is of particular concern for Gaussian-type functions. This has to do with cusp conditions. At a nucleus the exact wavefunction has a cusp so that the divergence

in the potential energy is canceled by an equal and opposite divergence in the kinetic energy. Therefore, if this cusp is represented accurately in the QMC trial wavefunction, the fluctuations in the local energy will be greatly reduced. It is relatively easy to produce an accurate representation of this cusp when using a grid-based numerical representation of the orbitals. However, as we have already remarked, such representations cannot really be used for large polyatomic systems because of the excessive storage requirements, and we would prefer to use a Gaussian basis set. But then there can be no cusp in the wavefunction since Gaussians have zero gradient at r = 0. The local energy thus diverges at the nucleus. In practice, one finds that the local energy has wild oscillations close to the nucleus, which can lead to numerical instabilities in DMC calculations. To solve this problem we can make small corrections to the single-particle orbitals close to the nuclei, which impose the correct cusp behavior; these need to be applied at each nucleus for every orbital which is larger than a given tolerance at that nucleus. The scheme we developed to correct for this is outlined elsewhere.73 Generalizations of this method have been developed for other basis set types. To see the cusp corrections in action, let us first look at a hydrogen atom where the basis set has been made to model the cusp very closely by using very sharp Gaussians with high exponents. Visually (top left in Fig. 4.5), the fact that the orbital does not obey the cusp condition is not immediately apparent. If we zoom in on the region close to the nucleus (top right), we see the problem; the black line is the orbital expanded in Gaussians and the red line is the cusp-corrected orbital. The effect on the gradient and local energy is clearly significant. 
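The divergence of the local energy for a cuspless orbital can be seen in closed form for the hydrogen atom. The sketch below is an illustration, not the cusp-correction scheme itself: it compares the exact orbital $e^{-r}$ with a single Gaussian $e^{-\alpha r^2}$, for which the local energy in atomic units works out to $3\alpha - 2\alpha^2 r^2 - 1/r$. The exponent $\alpha = 8/(9\pi)$ is the familiar variational optimum for a single Gaussian, used here only as a convenient value, and the function names are hypothetical.

```python
import numpy as np

def local_energy_exact_orbital(r):
    """E_L for psi = exp(-r): kinetic and potential divergences cancel, giving -1/2."""
    r = np.asarray(r, dtype=float)
    return np.full_like(r, -0.5)

def local_energy_gaussian(r, alpha):
    """E_L for psi = exp(-alpha r^2) in the hydrogen atom (atomic units).

    The Laplacian gives (4 alpha^2 r^2 - 6 alpha) psi, so
    E_L = 3 alpha - 2 alpha^2 r^2 - 1/r, which diverges as r -> 0.
    """
    r = np.asarray(r, dtype=float)
    return 3.0 * alpha - 2.0 * alpha**2 * r**2 - 1.0 / r

alpha = 8.0 / (9.0 * np.pi)          # illustrative single-Gaussian exponent
for r in (1.0, 0.1, 0.01, 0.001):
    e_exact = float(local_energy_exact_orbital(r))
    e_gauss = float(local_energy_gaussian(r, alpha))
    print(f"r = {r:6.3f}   exact orbital: {e_exact:7.3f}   Gaussian: {e_gauss:10.3f}")
```

The Gaussian has zero gradient at the origin, so nothing in the kinetic term can cancel the $-1/r$ of the potential, and a walker that lands close to the nucleus picks up an arbitrarily large negative local energy.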
This scheme has been implemented within the CASINO code for both finite and periodic systems, and produces a significant reduction in the computer time required to achieve a specified error bar, as one can appreciate from looking at the bottom two panels in Fig. 4.5, which show the local energy as a function of move number for a carbon monoxide molecule with and without cusp corrections. The problem with electron–nucleus cusps is clearly more significant for atoms of higher atomic number. To understand how this helps to do all-electron DMC calculations for heavier atoms, and to understand how the necessary computer time scales with atomic number, we performed calculations for various noble gas atoms.64 By ensuring that the electron–nucleus cusps were represented accurately, it proved perfectly possible to produce converged DMC energies with acceptably small error bars for atoms up to xenon (Z = 54).

4.5.2 Pseudopotentials

Well, “perfectly possible,” I said. Possible, maybe, but definitely somewhat tiresome. On trying to do all-electron calculations for heavier atoms than xenon, we were quickly forced to stop when smoke was observed coming out of the side of the computer.111 Might it therefore be better to do heavy atoms using pseudopotentials, as is commonly done with other methods, such as DFT? In electronic structure calculations pseudopotentials or effective core potentials are used to remove the inert core electrons from the problem and to improve the computational efficiency. Although QMC scales very favorably with system size

[Figure 4.5 panels. Top two rows: orbital, x-gradient, and local energy plotted against r (Å) close to the nucleus. Bottom row: local energy against number of moves (0 to 20,000), without (left) and with (right) cusp corrections.]

Fig. 4.5 (color online) The top two rows show the effect of Gaussian basis set cusp corrections in the hydrogen atom (red straight-line segments corrected; black lines not corrected). The bottom row shows local energy as a function of move number in a VMC calculation for a carbon monoxide molecule with a standard reasonably good Gaussian basis set. The cusp corrections are imposed only in the figure on the right. The reduction in the local energy fluctuations with the new scheme is clearly apparent.

in general, it has been estimated63 that the scaling of all-electron calculations with the atomic number Z is approximately $Z^{5.5}$, which in the relatively recent past was generally considered to rule out applications to atoms with Z greater than about 10. Our paper64 pushing all-electron QMC calculations to Z = 54 was therefore a significant step. The use of a pseudopotential then serves to reduce the effective value of Z and to improve the scaling to $Z^{3.5}$. Although errors are inevitably introduced, the gain in computational efficiency is easily sufficient to make pseudopotentials preferable in heavier atoms. They also offer a simple way to incorporate approximate relativistic corrections.
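To get a feel for what the two exponents mean in practice, here is a small arithmetic sketch. It assumes the cost is a pure power law in Z with the same prefactor in both cases, normalized to Z = 10; both assumptions are for illustration only, and the sketch ignores the further gain from the reduced effective Z that a pseudopotential provides.

```python
def relative_cost(z, exponent, z_ref=10.0):
    """Cost at atomic number z relative to z_ref, assuming cost ~ z**exponent."""
    return (z / z_ref) ** exponent

for z in (10, 18, 36, 54):
    all_electron = relative_cost(z, 5.5)   # all-electron estimate quoted in the text
    pseudo = relative_cost(z, 3.5)         # pseudopotential estimate quoted in the text
    print(f"Z = {z:2d}   Z^5.5: {all_electron:10.1f}   Z^3.5: {pseudo:8.1f}   "
          f"ratio: {all_electron / pseudo:6.2f}")
```

Under these assumptions the ratio of the two costs is simply (Z/10)^2, so the advantage of the gentler scaling grows quadratically with atomic number.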


Accurate pseudopotentials for single-particle theories such as DFT or Hartree–Fock theory are well developed, but pseudopotentials for correlated wavefunction techniques such as QMC present additional challenges. The presence of core electrons causes two related problems. The first is that the shorter length-scale variations in the wavefunction near a nucleus of large Z require the use of a small time step. In VMC this problem can, at least in principle, be somewhat reduced by the use of acceleration schemes.112,113 The second problem is that the fluctuations in the local energy tend to be large near the nucleus because both the kinetic and potential energies are large. The central idea of pseudopotential theory is to create an effective potential that reproduces the effects of both the nucleus and the core electrons on the valence electrons. This is done separately for each of the different angular momentum states, so the pseudopotential contains angular momentum projectors and is therefore a nonlocal operator. It is convenient to divide the pseudopotential for each atom into a local part $V^{\rm ps}_{\rm loc}(r)$ common to all angular momenta and a correction, $V^{\rm ps}_{{\rm nl},l}(r)$, for each angular momentum $l$. The electron–ion potential energy term in the full many-electron Hamiltonian of the atom then takes the form

$$V_{\rm loc} + \hat{V}_{\rm nl} = \sum_i V^{\rm ps}_{\rm loc}(r_i) + \sum_i \hat{V}^{\rm ps}_{{\rm nl},i} \qquad (4.35)$$

where $\hat{V}^{\rm ps}_{{\rm nl},i}$ is a nonlocal operator that acts on an arbitrary function $g(\mathbf{r}_i)$ as follows:

$$\hat{V}^{\rm ps}_{{\rm nl},i}\, g(\mathbf{r}_i) = \sum_l V^{\rm ps}_{{\rm nl},l}(r_i) \sum_{m=-l}^{l} Y_{lm}(\Omega_{\mathbf{r}_i}) \int Y^{*}_{lm}(\Omega_{\mathbf{r}'_i})\, g(\mathbf{r}'_i)\, d\Omega'_i \qquad (4.36)$$

where the angular integration is over the sphere passing through $\mathbf{r}_i$. This expression can be simplified by choosing the z-axis along $\mathbf{r}_i$, noting that $Y_{lm}(0, 0) = 0$ for $m \neq 0$, and using the definition of the spherical harmonics to give

$$\hat{V}^{\rm ps}_{{\rm nl},i}\, g(\mathbf{r}_i) = \sum_l V^{\rm ps}_{{\rm nl},l}(r_i)\, \frac{2l + 1}{4\pi} \int P_l[\cos(\theta'_i)]\, g(\mathbf{r}'_i)\, d\Omega'_i \qquad (4.37)$$

where $P_l$ denotes a Legendre polynomial. While the use of nonlocal pseudopotentials is relatively straightforward in a VMC calculation,115,116 there is an issue with DMC. The fixed-node boundary condition turns out not to be compatible with the nonlocality. This forces us to introduce an additional approximation (the locality approximation117) whereby the nonlocal pseudopotential operator $\hat{V}_{\rm nl}$ acts on the trial function rather than the DMC wavefunction; that is, we replace $\hat{V}_{\rm nl}$ by $\Psi_T^{-1}\hat{V}_{\rm nl}\Psi_T$. The leading-order error term is proportional to $(\Psi_T - \Psi_0)^2$, where $\Psi_0$ is the exact fixed-node ground-state wavefunction.117 Unfortunately, this error may be positive or negative, so the method is no longer strictly variational. An alternative to this approximation


is the semilocalization scheme for DMC nonlocal pseudopotentials introduced by Casula et al. in 2005;118,119 as well as restoring the variational property, this method appears to have better numerical stability than the older scheme. It is not currently possible to construct pseudopotentials for heavy atoms entirely within a QMC framework, although progress in this direction was made by Acioli and Ceperley.114 It is therefore currently necessary to use pseudopotentials generated within some other framework. Possible schemes include Hartree–Fock theory and local DFT, where there is a great deal of experience in generating accurate pseudopotentials. There is evidence to show that Hartree–Fock pseudopotentials give better results within QMC calculations than DFT pseudopotentials,120 although the latter work quite well in many cases. The problem with DFT pseudopotentials appears to be that they already include a (local) description of correlation which is quite different from the QMC description. Hartree–Fock theory, on the other hand, does not contain any effects of correlation. The QMC calculation puts back the valence–valence correlations but neglects core–core correlations (which have only an indirect and small effect on the valence electrons) and core–valence correlations. Core–valence correlations are significant when the core is highly polarizable, such as in alkali-metal atoms. The core–valence correlations may be approximately included by using a core polarization potential (CPP), which represents the polarization of the core due to the instantaneous positions of the surrounding electrons and ions. Another issue is that relativistic effects are important for heavy elements. It is still, however, possible to use a QMC method for solving the Schrödinger equation with the scalar relativistic effects obtained within the Dirac formalism incorporated within the pseudopotentials.
The combination of Dirac–Hartree–Fock pseudopotentials and CPPs appears to work well in many QMC calculations. CPPs have been generated for a wide range of elements (see, e.g., Ref. 121). Many Hartree–Fock pseudopotentials are available in the literature, mostly in the form of sets of parameters for fits to Gaussian basis sets. Unfortunately, many of them diverge at the origin, and it is well known that this can lead to significant time step errors in DMC calculations.120 It was thus apparent a few years ago that none of the available sets were ideal for QMC calculations, and it was decided that it would be helpful if we generated an online periodic table of smooth nondivergent Hartree–Fock pseudopotentials (with relativistic corrections) developed specifically for QMC. This project has now been completed and has been described in detail by Trail and Needs.122,123 The resulting pseudopotentials are available online;124 the repository includes both Dirac–Fock and Hartree–Fock potentials, and a choice of small or large core potentials (the latter being more amenable to plane-wave calculations). Burkatzki et al. have since developed another set of pseudopotentials, also intended for use in QMC calculations.125 Although data are limited, tests126,127 appear to show that the Trail–Needs pseudopotentials give essentially the same results as the Burkatzki pseudopotentials, although the smaller core radii of the former appear to lead to a slight increase in efficiency.
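The angular projection in Eq. (4.37) can be sketched numerically as follows. This is only an illustration under stated assumptions: it uses a Gauss–Legendre rule in cos(theta') and a uniform grid in the azimuthal angle, which is not necessarily the quadrature used in any production QMC code, and the names `project_l`, `p_like`, and `AXIS` are hypothetical. The test function is a pure l = 1 function, so the l = 1 projection should return the function value at the electron position and the l = 0 and l = 2 projections should vanish.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

def project_l(g, r_vec, l, n_theta=16, n_phi=32):
    """(2l+1)/(4 pi) * integral of P_l(cos theta') g(r') over the sphere |r'| = |r_vec|.

    The polar axis is taken along r_vec, as in Eq. (4.37).
    """
    r_vec = np.asarray(r_vec, dtype=float)
    radius = np.linalg.norm(r_vec)
    e3 = r_vec / radius
    # Build two unit vectors orthogonal to e3 to complete the local frame.
    helper = np.array([1.0, 0.0, 0.0]) if abs(e3[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(helper, e3)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(e3, e1)

    cos_t, w_t = leggauss(n_theta)                 # nodes and weights in cos(theta')
    phi = 2.0 * np.pi * np.arange(n_phi) / n_phi   # uniform grid in phi'
    w_phi = 2.0 * np.pi / n_phi
    p_l = Legendre.basis(l)(cos_t)                 # P_l at the quadrature nodes

    total = 0.0
    for c, w, p in zip(cos_t, w_t, p_l):
        s = np.sqrt(1.0 - c * c)
        for ph in phi:
            point = radius * (s * np.cos(ph) * e1 + s * np.sin(ph) * e2 + c * e3)
            total += w * w_phi * p * g(point)
    return (2 * l + 1) / (4.0 * np.pi) * total

AXIS = np.array([0.3, -0.5, 0.8])   # arbitrary direction for the test function

def p_like(point):
    """Illustrative pure l = 1 function: exp(-r) times a linear function of position."""
    return np.exp(-np.linalg.norm(point)) * np.dot(AXIS, point)

r_i = np.array([0.4, 0.7, -0.2])
for l in (0, 1, 2):
    print(f"l = {l}: projection = {project_l(p_like, r_i, l): .6f}")
print(f"g(r_i) = {p_like(r_i): .6f}")
```

In a full calculation each projection would be multiplied by the radial factor of that angular momentum channel and summed over l; the sketch leaves that step out because it needs an actual pseudopotential.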


4.5.3 Periodic Systems

As with other methods, QMC calculations for extended systems may be performed using finite clusters or infinitely large crystals with periodic boundary conditions. The latter are generally preferred because they approximate the desired large-size limit (i.e., the infinite system size without periodic boundary conditions) more closely. One can also use the standard supercell approach for aperiodic systems such as point defects. For such cases, cells containing a point defect and a small part of the host crystal are repeated periodically throughout space; the supercell must clearly be made large enough so the interactions between defects in different cells are negligible. In periodic DFT calculations the charge density and potentials are taken to have the periodicity of a suitably chosen lattice. The single-particle orbitals can then be made to obey Bloch’s theorem, and the results for the infinite system are obtained by summing quantities obtained from the different Bloch wave vectors within the first Brillouin zone. The situation with many-particle wavefunctions is rather different, since it is not possible to reduce the problem to solving within a primitive unit cell. Such a reduction is allowed in single-particle methods because the Hamiltonian is invariant under the translation of a single electronic coordinate by a translation vector of the primitive lattice, but this is not a symmetry of the many-body Hamiltonian.129,128 Consequently, QMC calculations must be performed at a single k-point. This normally gives a poor approximation to the result for the infinite system, unless one chooses a pretty large nonprimitive simulation cell. One may also average over the results of QMC calculations done at different single k-points.130 There are also a number of problems associated with the long-range Coulomb interaction in many-body techniques such as QMC. 
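The remark above about averaging over different single k-points can be illustrated with a toy model that involves no QMC at all: non-interacting spinless fermions on a one-dimensional ring, where a shift of the k-point appears as a twist in the boundary condition. Everything here (the model, the function names, the midpoint grid of twists) is an assumption made for illustration. In this particular model the average recovers the infinite-system energy almost exactly; real interacting systems retain the Coulomb finite-size errors discussed next.

```python
import numpy as np

def energy_per_particle(n_particles, length, twist, n_max=200):
    """Kinetic energy per particle of free spinless fermions on a ring.

    Allowed wave numbers are k = (2 pi n + twist) / length; the n_particles
    lowest-energy states are occupied (atomic units, hbar = m = 1).
    """
    n = np.arange(-n_max, n_max + 1)
    k = (2.0 * np.pi * n + twist) / length
    lowest = np.sort(0.5 * k**2)[:n_particles]
    return lowest.sum() / n_particles

def twist_averaged_energy(n_particles, length, n_twists=64):
    """Average the energy over a uniform midpoint grid of twists in [-pi, pi)."""
    twists = -np.pi + (np.arange(n_twists) + 0.5) * 2.0 * np.pi / n_twists
    return np.mean([energy_per_particle(n_particles, length, t) for t in twists])

density = 1.0
exact = (np.pi * density) ** 2 / 6.0   # infinite-system limit, with k_F = pi * density
for n_particles in (5, 11, 21):
    length = n_particles / density
    single = energy_per_particle(n_particles, length, twist=0.0)
    averaged = twist_averaged_energy(n_particles, length)
    print(f"N = {n_particles:2d}   single k-point error: {single - exact: .5f}   "
          f"averaged error: {averaged - exact: .6f}")
```

The single k-point error shrinks only as the cell grows, which mirrors the need for a large nonprimitive simulation cell when one k-point is used.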
It is well known that simply summing the 1/r interaction out over cells on the surface of an ever-expanding cluster never settles down because of the contribution from shape-dependent arrangements of surface charge. The usual solution to this problem is to employ the Ewald method.131 The Ewald interaction contains an effective depolarization field intended to cancel the field produced by the surface charges (and is thus equivalent to what you get if you put the large cluster in a medium of infinite dielectric constant). Long-range interactions also induce long-range exchange-correlation interactions, and if the simulation cell is not large enough, these effects are described incorrectly. Such effects are absent in local DFT calculations because the interaction energy is written in terms of the electronic charge density, but Hartree–Fock calculations show very strong effects of this kind, and various ways to accelerate the convergence have been developed. The finite-size effects arising from the long-range interaction can be divided into potential and kinetic energy contributions.132,133 The potential energy component can be removed from the calculations by replacing the Ewald interaction by the model periodic Coulomb (MPC) interaction.134–136 Recent work has added substantially to our understanding of finite-size effects, and theoretical expressions have been derived for them,132,133 but at the moment it seems that they cannot entirely


replace extrapolation procedures. An alternative approach to estimating finite-size errors in QMC calculations has been developed recently.137 DMC results for the three-dimensional homogeneous electron gas are used to obtain a system-size-dependent local-density approximation functional. The correction to the total energy is given by the difference between the DFT energies for finite-sized and infinite systems. This approach is interesting, although it does rely on the LDA giving a reasonable description of the system. As will be shown later, DMC calculations using periodic boundary conditions with thousands of atoms per cell have now been done, and the technology is clearly approaching maturity.

4.5.4 Differences, Derivatives, and Forces

Calculations in computational electronic structure theory almost always involve the evaluation of differences in energy, and all methods that work in complex systems rely for their accuracy on the cancellation of errors in such energy differences. Apart from the statistical errors, all known errors in DMC have the same sign and partially cancel out in the subtraction because the method is variational. That said, incomplete cancellation of nodal errors is the most important source of error in DMC results, even though DMC often retrieves 95% or more of the correlation energy. Correlated sampling138 is one way of improving computation of the energy difference between two similar systems with a smaller statistical error than those obtained for the individual energies. This is relatively straightforward in VMC, and a version of it was described briefly in Section 4.3 when discussing variance minimization. As well as simple differences, we would quite often like to calculate derivatives. Many quantities of physical interest can be formulated as an energy derivative, and thus an ability to calculate them accurately in QMC considerably enhances the scope of the method. Normally, of course, this sort of thing would be encountered in the calculation of forces on atoms, but if we expand the energy in a Taylor series in a perturbation such as the strength of an applied electric field, for example, the coefficients of the first- and second-order terms, respectively, give the dipole moment and the various elements of the dipole polarizability tensor:

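$$E(F_i) = E(0) + F_i \underbrace{\left[\frac{\partial E}{\partial F_i}\right]_{F_i=0}}_{\text{dipole moment}} + \frac{1}{2}\sum_{j=1}^{3} F_i F_j \underbrace{\left[\frac{\partial^2 E}{\partial F_i\,\partial F_j}\right]_{F_i=0,\,F_j=0}}_{\text{dipole polarizability tensor}} + \cdots \qquad (4.38)$$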
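As a hedged illustration of reading these coefficients off from energies alone, the sketch below uses a one-dimensional model rather than a QMC calculation: a unit-charge particle in a harmonic well centered at x0, coupled to a uniform field F through a term -F x. The ground-state energy is obtained by diagonalizing a finite-difference Hamiltonian, and the first and second derivatives at F = 0 are estimated by central differences. The model, grid parameters, and function name are all assumptions; for this model the exact energy is E(F) = 1/2 - x0 F - F^2/2.

```python
import numpy as np

def ground_state_energy(field, x0=0.3, half_width=8.0, n_points=401):
    """Lowest eigenvalue of H = -1/2 d^2/dx^2 + 1/2 (x - x0)^2 - field * x on a grid."""
    x = np.linspace(-half_width, half_width, n_points)
    dx = x[1] - x[0]
    # Three-point finite-difference kinetic energy plus the diagonal potential.
    off_diagonal = np.full(n_points - 1, -0.5 / dx**2)
    hamiltonian = (np.diag(1.0 / dx**2 + 0.5 * (x - x0) ** 2 - field * x)
                   + np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1))
    return np.linalg.eigvalsh(hamiltonian)[0]

step = 0.05
e_minus, e_zero, e_plus = (ground_state_energy(f) for f in (-step, 0.0, step))
first = (e_plus - e_minus) / (2.0 * step)                 # estimate of dE/dF at F = 0
second = (e_plus - 2.0 * e_zero + e_minus) / step**2      # estimate of d2E/dF2 at F = 0
print(f"dE/dF   = {first: .4f}")
print(f"d2E/dF2 = {second: .4f}")
```

With the coupling sign chosen here, the dipole moment and polarizability of the model are minus these two derivatives. In a stochastic method the same differences would carry statistical error bars, which is one reason small field steps are awkward in practice.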

One may also calculate the dipole moment (no surprise) by evaluating the expectation value of the dipole-moment operator. However, since the operator doesn’t commute with the Hamiltonian, there will be a significant error using the mixed distribution in DMC—you need to use the pure distribution using future walking84,85 or whatever. This is a significant extra complication, and by formulating the thing as a derivative, you avoid having to do that. As well as the electric field, the perturbation could be the displacement of nuclear positions


(giving forces, etc.) or a combination of both (e.g., the intensity of peaks in infrared spectra depends on changes in the dipole moment corresponding to changes in geometry). Such energy derivatives can, of course, be computed numerically (by finite differencing) or analytically (by differentiating the appropriate energy expressions), the latter being clearly preferable in this case. First, we focus on atomic forces. These are generally used in three main areas of computational electronic structure theory: structural optimization, the computation of vibrational properties, and in explicit molecular dynamics simulations of atomic behavior.139 Unfortunately, methods for calculating accurate forces in QMC in a reasonable amount of computer time have proved elusive, at least until relatively recently, due to the lack of readily calculable expressions with reasonable statistical properties. As usual, we begin with a discussion of the Hellmann–Feynman theorem (HFT), which in this context is the statement that the force is the expectation value of the gradient of the Hamiltonian $\hat{H}$:

$$\mathbf{F} = -\nabla E = -\frac{\int \Psi\, \nabla\hat{H}\, \Psi\, d\mathbf{R}}{\int \Psi\, \Psi\, d\mathbf{R}} \qquad (4.39)$$

The other terms in the expression for the gradient of the expectation value of the energy (the ones involving derivatives of the wavefunction itself) have disappeared only because we are assuming that the wavefunction is an exact eigenstate. Inevitably, then, the use of the HFT is an approximation in QMC because we have only an inexact trial function. The correct QMC expressions for the forces must contain additional (“Pulay”) terms, which depend on wavefunction derivatives. There is also an additional term which accounts for the action of the gradient operator on parameters which couple only indirectly with the nuclear positions (e.g., orbital coefficients), but this can be greatly reduced by optimizing the wavefunction through minimization of the energy rather than the variance. There is another type of Pulay term which arises in DMC.
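Before turning to the DMC case, the variational Pulay term described above can be seen in a deliberately simple model, sketched here under stated assumptions. A one-dimensional harmonic well centered at a "nuclear" position R stands in for the physical Hamiltonian, and the trial function is a Gaussian centered at gamma R, so gamma = 1 is exact and any other value mimics an inexact trial function. The model, the parameter gamma, and the function names are illustrative and are not taken from the references.

```python
import numpy as np

X = np.linspace(-12.0, 12.0, 2401)   # uniform integration grid

def trial(center):
    """Unnormalized Gaussian trial function exp(-(x - center)^2 / 2) on the grid."""
    return np.exp(-0.5 * (X - center) ** 2)

def variational_energy(nucleus, gamma):
    """<psi|H|psi>/<psi|psi> for H = -1/2 d^2/dx^2 + 1/2 (x - nucleus)^2."""
    center = gamma * nucleus
    psi = trial(center)
    laplacian = ((X - center) ** 2 - 1.0) * psi            # analytic second derivative
    h_psi = -0.5 * laplacian + 0.5 * (X - nucleus) ** 2 * psi
    return np.sum(psi * h_psi) / np.sum(psi * psi)

def hellmann_feynman_derivative(nucleus, gamma):
    """<psi| dH/d(nucleus) |psi>/<psi|psi>, where dH/d(nucleus) = -(x - nucleus)."""
    psi = trial(gamma * nucleus)
    return np.sum(psi * psi * (-(X - nucleus))) / np.sum(psi * psi)

def total_derivative(nucleus, gamma, step=1e-3):
    """Central finite difference of the variational energy with respect to nucleus."""
    return (variational_energy(nucleus + step, gamma)
            - variational_energy(nucleus - step, gamma)) / (2.0 * step)

for gamma in (1.0, 0.8):
    hf = hellmann_feynman_derivative(1.0, gamma)
    total = total_derivative(1.0, gamma)
    print(f"gamma = {gamma}:  HFT term = {hf: .4f}   dE/dR = {total: .4f}   "
          f"Pulay term = {total - hf: .4f}")
```

For the exact wavefunction both derivatives vanish and the HFT holds. For the inexact one the HFT term alone overestimates the energy derivative, and the difference is the contribution that comes from the dependence of the trial function on the nuclear position.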
The HFT is expected to be valid for the exact DMC algorithm since it solves for the ground state of the fixed-node Hamiltonian exactly. However, this Hamiltonian differs from the physical one due to the presence of the infinite potential barrier on the trial nodal surface, which constrains the DMC wavefunction $\phi_0$ to go to zero there. As we vary the nuclear position(s), the nodal surface moves, and hence the infinite potential barrier moves, giving a contribution to $\nabla\hat{H}$ that depends on both $\Psi_T$ and its first derivative.140–142 To calculate the Pulay terms arising from the derivative of the mixed estimator of Eq. (4.31), we need in principle to calculate a derivative of the DMC wavefunction $\phi_0$. Because we don’t have any kind of formula for $\phi_0$, this derivative cannot be readily evaluated, and what has been done in the past is to use the expression for the derivative of the trial function $\Psi_T$ in its place.142–150 The resulting errors are of first order in $(\Psi_T - \phi_0)$ and $(\Psi_T' - \phi_0')$; therefore, the accuracy of this approach depends sensitively on the quality of the trial function and its derivative.


In practice the results obtained from this procedure are not generally accurate enough. Instead of using the usual mixed DMC energy expression, one may calculate forces from the “pure DMC” energy given by $E_D = \int \phi_0 \hat{H} \phi_0\, d\mathbf{R} / \int \phi_0 \phi_0\, d\mathbf{R}$, which, by construction, is equal to the mixed DMC energy. It is more expensive to do things this way, but the benefits are now clear. Despite the fact that the derivative $E_D'$ contains the derivative of the DMC wavefunction, $\phi_0'$, Badinski et al.142 were able to show that $\phi_0'$ can be eliminated from the pure DMC formula to give the following exact expression (where dS is a nodal surface element):

$$E_D' = \frac{\int \phi_0 \phi_0\, \phi_0^{-1} \hat{H}' \phi_0\, d\mathbf{R}}{\int \phi_0 \phi_0\, d\mathbf{R}} - \frac{1}{2}\, \frac{\int \phi_0 \phi_0\, \Psi_T^{-2}\, |\nabla_{\mathbf{R}} \Psi_T|\, \Psi_T'\, dS}{\int \phi_0 \phi_0\, d\mathbf{R}} \qquad (4.40)$$

Of course it is not easy to compute integrals over the nodal surface, and luckily, the expression can be converted into a regular volume integral with no $\phi_0'$. The error in the required approximation is then of order $(\Psi_T - \phi_0)^2$, giving

$$E_D' = \frac{\int \phi_0 \phi_0 \left[\phi_0^{-1} \hat{H}' \phi_0 + \Psi_T^{-1} (\hat{H} - E_D) \Psi_T'\right] d\mathbf{R}}{\int \phi_0 \phi_0\, d\mathbf{R}} + \frac{\int \Psi_T \Psi_T\, (E_L - E_D)\, \Psi_T^{-1} \Psi_T'\, d\mathbf{R}}{\int \Psi_T \Psi_T\, d\mathbf{R}} + O[(\Psi_T - \phi_0)^2] \qquad (4.41)$$

One may readily evaluate this expression by generating configurations distributed according to the pure ($\phi_0^2$) and variational ($\Psi_T^2$) distributions. The approximation is in the Pulay terms, which are smaller in pure than in mixed DMC, and in addition, the approximation in equation (4.41) is second order, in contrast to the first-order error obtained by simply substituting $\Psi_T'$ for $\phi_0'$. This equation satisfies the zero-variance condition; if $\Psi_T$ and $\Psi_T'$ are exact, the variance of the force obtained from this formula is zero (the variance of the Hellmann–Feynman estimator is, strictly speaking, infinite!). Although it remains true that not many papers have been published with actual applications of these methods (some calculations of very accurate forces in small molecules can be found, e.g., in Refs. 150 and 151), one can certainly say that reasonable approximations for the difficult expressions have been found and that the outlook for QMC forces is very promising.

4.6 APPLICATIONS

Time and space preclude me from presenting a long list of applications. Here are two: (1) a somewhat unfair comparison of the worst DFT functional with VMC


and DMC for some cohesive energies of tetrahedrally bonded semiconductors, and (2) the equations of state of diamond and iron. Many other applications can be found, for example, in Ref. 5.

4.6.1 Cohesive Energies

A number of VMC and DMC studies have been performed on the cohesive energies of solids. This quantity is given by the difference between the summed energies of the appropriate isolated atoms and the energies of the same atoms in the bulk crystal. This is generally reckoned to be a severe test of QMC methods because the trial wavefunctions used in the two cases must be closely matched in quality to maximize the effective cancellation of errors. Data for Si, Ge, C, and BN have been collected in Table 4.1. The local spin density approximation (LSDA) density functional theory data show the standard overestimation of the cohesive energy, while the QMC data are in good agreement with experiment. Studies such as these have been important in establishing DMC as an accurate method for calculating the energies of crystalline solids.

4.6.2 Equations of State of Diamond and Iron

The equation of state is the equilibrium relationship between the pressure, volume, and temperature. Computed equations of state are of particular interest in regions where experimental data are difficult to obtain. Diamond anvil cells are

TABLE 4.1 Cohesive Energies of Tetrahedrally Bonded Semiconductors Calculated Within the LSDA, VMC, and DMC Methods and Compared with Experimental Values^a

Method    Si            Ge            C             BN
LSDA      5.28^b        4.59^b        8.61^b        15.07^c
VMC       4.38(4)^d     3.80(2)^e     7.27(7)^f     12.85(9)^c
VMC       4.82(7)^f     —             7.36(1)^g
VMC       4.48(1)^h
DMC       4.63(2)^h     3.85(2)^e     7.346(6)^g
Expt.     4.62(8)^b     3.85^b        7.37^b        12.9^i

a The energies for Si, Ge, and C are quoted in eV per atom, while those for BN are in eV per two atoms.
b From Ref. 152 and references therein.
c From Ref. 153.
d From Ref. 162.
e From Ref. 128.
f From Ref. 115. Zero-point energy corrections of 0.18 eV for C and 0.06 eV for Si have been added to the published values for consistency with the other data in the table.
g From Ref. 27.
h From Ref. 26.
i From Ref. 154, estimated from experimental results on hexagonal BN.


widely used in high-pressure research, and one of the important problems is the measurement of the pressure inside the cell. The most common approach is to place a small grain of ruby in the sample chamber and measure the frequency of a strong laser-stimulated fluorescence line. The resolution is, however, poor at pressures above about 100 GPa, and alternative methods are being investigated. One possibility is to measure the Raman frequency of diamond itself, assuming that the highest frequency derives from the diamond faces adjacent to the sample chamber. Calibrating such a scale requires an accurate equation of state and the corresponding pressure dependence of the Raman frequency. Maezono et al. performed VMC, DMC, and DFT calculations of the equation of state of diamond.12 The DMC and DFT data are shown in Fig. 4.6, along with equations of state derived from experimental data.155,156 The experimentally derived equations of state differ significantly at high pressures. It is now believed that the pressure calibration in the more modern experiment of Occelli et al.156 is inaccurate, and our DMC data support this view. As can be seen in Fig. 4.6, the equations of state calculated within DFT depend on the choice of exchange-correlation functional, undermining confidence in the DFT method. A recent QMC study of the equation of state and Raman frequency of cubic boron nitride has produced data that could be used to calibrate pressure measurements in diamond anvil cells.157 Another example of a DMC equation of state was produced by Sola et al.,158 who calculated the equation of state of hexagonal close-packed (hcp) iron under Earth’s core conditions. With up to 150 atoms or 2400 electrons per

[Figure 4.6 plot. Pressure (GPa, 200 to 800) against volume per atom (Å3, 3 to 4.5). Curves: Expt (McSkimin & Andreatch), Expt (Occelli et al.), DFT-LDA, DFT-PBE, DMC.]

Fig. 4.6 (color online) Equation of state of diamond at high pressures from measurements by McSkimin and Andreatch155 and Occelli et al.,156 and as calculated using DFT with two different functionals and DMC.12 The shaded areas indicate the uncertainty in the experimental equations of state. The zero-point phonon pressure calculated using DFT with the PBE functional is included in the theoretical curves.


Fig. 4.7 (color online) Pressure–volume curve in iron obtained from DMC calculations (solid line158 ). The small yellow error band above the DMC curve is due to the errors in the parameters of a fit to the Birch–Murnaghan equation of state. DFT-PW91 results (dotted line160 ) and experimental data (circles161 and open triangles159 ) are reported for comparison.

cell, these represent some of the largest systems studied with DMC to date and demonstrate the ability of QMC to treat heavier transition metal atoms. Figure 4.7 shows the calculated equation of state, which agrees closely with experiments and with previous DFT calculations. (DFT is expected to work well in this system and the DMC calculations appear to confirm this.) Notice the discontinuity due to the hcp–bcc (body-centered cubic) phase transition in the experimental values reported by Dewaele et al.159 At low pressures, the calculations and experiments differ because of the magnetism, which is not taken into account in these particular calculations (although it could be in principle).

4.7 CONCLUSIONS

Quite a lot of progress has been made in the theory and practical implementation of quantum Monte Carlo over the past few years, but certainly many interesting problems remain to be solved. For its most important purpose of calculating highly accurate total energies, the method works well and currently has no serious competitors for medium-sized and large systems. Our group has developed the software package CASINO,46–48 which has been designed to allow researchers to explore the potential of QMC in arbitrary molecules, polymers, slabs, and crystalline solids and in various model systems, including standard electron and electron–hole phases such as the homogeneous electron gas and Wigner crystals. Many young people also seem to believe that QMC is way cooler than boring old density functional theory, and they’re probably right. So that’s all right, then.


Acknowledgments

M.D.T. would like to thank the Royal Society for the award of a long-term university research fellowship. He also wishes to acknowledge the many contributions of R.J. Needs, N.D. Drummond, and P. López Ríos to the work described in this chapter, along with all the other members of the Cavendish Laboratory TCM Group, plus our many collaborators around the world. Computing facilities were provided largely by the Cambridge High Performance Computing Service.

REFERENCES

1. Cramer, C. J. Essentials of Computational Chemistry, Wiley, Hoboken, NJ, 2002, pp. 191–232.
2. Parr, R. G.; Yang, W. Density Functional Theory of Atoms and Molecules, Oxford University Press, New York, 1994.
3. Frisch, M. J.; et al. Gaussian 09, Gaussian Inc., Wallingford, CT, 2009.
4. Hammond, B. L.; Lester, W. A., Jr.; Reynolds, P. J. Monte Carlo Methods in Ab Initio Quantum Chemistry, World Scientific, Singapore, 1994.
5. Foulkes, W. M. C.; Mitas, L.; Needs, R. J.; Rajagopal, G. Rev. Mod. Phys. 2001, 73, 33.
6. Ceperley, D. M.; Alder, B. J. Phys. Rev. Lett. 1980, 45, 566.
7. Vosko, S. H.; Wilk, L.; Nusair, M. Can. J. Phys. 1980, 58, 1200.
8. Perdew, J. P.; Zunger, A. Phys. Rev. B 1981, 23, 5048.
9. Wu, Y. S. M.; Kuppermann, A.; Anderson, J. B. Phys. Chem. Chem. Phys. 1999, 1, 929.
10. Natoli, V.; Martin, R. M.; Ceperley, D. M. Phys. Rev. Lett. 1993, 70, 1952.
11. Delaney, K. T.; Pierleoni, C.; Ceperley, D. M. Phys. Rev. Lett. 2006, 97, 235702.
12. Maezono, R.; Ma, A.; Towler, M. D.; Needs, R. J. Phys. Rev. Lett. 2007, 98, 025701.
13. Pozzo, M.; Alfè, D. Phys. Rev. B 2008, 77, 104103.
14. Alfè, D.; Alfredsson, M.; Brodholt, J.; Gillan, M. J.; Towler, M. D.; Needs, R. J. Phys. Rev. B 2005, 72, 014114.
15. Manten, S.; Lüchow, A. J. Chem. Phys. 2001, 115, 5362.
16. Grossman, J. C. J. Chem. Phys. 2002, 117, 1434.
17. Aspuru-Guzik, A.; El Akramine, O.; Grossman, J. C.; Lester, W. A., Jr. J. Chem. Phys. 2004, 120, 3049.
18. Gurtubay, I. G.; Drummond, N. D.; Towler, M. D.; Needs, R. J. J. Chem. Phys. 2006, 124, 024318.
19. Gurtubay, I. G.; Needs, R. J. J. Chem. Phys. 2007, 127, 124306.
20. Hood, R. Q.; Chou, M.-Y.; Williamson, A. J.; Rajagopal, G.; Needs, R. J.; Foulkes, W. M. C. Phys. Rev. Lett. 1997, 78, 3350.
21. Hood, R. Q.; Chou, M.-Y.; Williamson, A. J.; Rajagopal, G.; Needs, R. J. Phys. Rev. B 1998, 57, 8972.
22. Nekovee, M.; Foulkes, W. M. C.; Needs, R. J. Phys. Rev. Lett. 2001, 87, 036401.
23. Nekovee, M.; Foulkes, W. M. C.; Needs, R. J. Phys. Rev. B 2003, 68, 235108.
24. Williamson, A. J.; Grossman, J. C.; Hood, R. Q.; Puzder, A.; Galli, G. Phys. Rev. Lett. 2002, 89, 196803.
25. Drummond, N. D.; Williamson, A. J.; Needs, R. J.; Galli, G. Phys. Rev. Lett. 2005, 95, 096801.
26. Leung, W.-K.; Needs, R. J.; Rajagopal, G.; Itoh, S.; Ihara, S. Phys. Rev. Lett. 1999, 83, 2351.
27. Hood, R. Q.; Kent, P. R. C.; Needs, R. J.; Briddon, P. R. Phys. Rev. Lett. 2003, 91, 076403.
28. Alfè, D.; Gillan, M. J. Phys. Rev. B 2005, 71, 220101.
29. Towler, M. D.; Needs, R. J. Int. J. Mod. Phys. B 2003, 17, 5425.
30. Wagner, L. K.; Mitas, L. Chem. Phys. Lett. 2003, 370, 412.
31. Wagner, L. K.; Mitas, L. J. Chem. Phys. 2007, 126, 034105.
32. Mitas, L.; Martin, R. M. Phys. Rev. Lett. 1994, 72, 2438.
33. Williamson, A. J.; Hood, R. Q.; Needs, R. J.; Rajagopal, G. Phys. Rev. B 1998, 57, 12140.
34. Towler, M. D.; Hood, R. Q.; Needs, R. J. Phys. Rev. B 2000, 62, 2330.
35. Ghosal, A.; Guclu, A. D.; Umrigar, C. J.; Ullmo, D.; Baranger, H. Nature Phys. 2006, 2, 336.
36. Healy, S. B.; Filippi, C.; Kratzer, P.; Penev, E.; Scheffler, M. Phys. Rev. Lett. 2001, 87, 016105.
37. Filippi, C.; Healy, S. B.; Kratzer, P.; Pehlke, E.; Scheffler, M. Phys. Rev. Lett. 2002, 89, 166102.
38. Kim, Y.-H.; Zhao, Y.; Williamson, A.; Heben, M. J.; Zhang, S. Phys. Rev. Lett. 2006, 96, 016102.
39. Carlson, J.; Chang, S.-Y.; Pandharipande, V. R.; Schmidt, K. E. Phys. Rev. Lett. 2003, 91, 050401.
40. Astrakharchik, G. E.; Boronat, J.; Casulleras, J.; Giorgini, S. Phys. Rev. Lett. 2004, 93, 200404.
41. Carlson, J.; Reddy, S. Phys. Rev. Lett. 2008, 100, 150403.
42. Schrödinger, E. Ann. Phys. 1926, 79, 361.
43. Ashcroft, N. W.; Mermin, N. D. Solid State Physics, W. B. Saunders, Philadelphia, 1976, p. 330.
44. Kent, P. R. C.; Towler, M. D.; Needs, R. J.; Rajagopal, G. Phys. Rev. B 2000, 62, 15394.
45. http://www.qmcwiki.org/index.php/Research_resources.
46. Needs, R. J.; Towler, M. D.; Drummond, N. D.; López Ríos, P. CASINO Version 2.5 User Manual, Cambridge University, Cambridge, UK, 2009.
47. CASINO Web site: http://www.tcm.phy.cam.ac.uk/~mdt26/casino2.html.
48. http://www.vallico.net/tti/tti.html. Click on “PUBLIC EVENTS.”
49. Trail, J. R. Phys. Rev. E 2008, 77, 016703.
50. Trail, J. R. Phys. Rev. E 2008, 77, 016704.
51. Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. M.; Teller, E. J. Chem. Phys. 1953, 21, 1087.
52. Towler, M. D. De Broglie-Bohm pilot-wave theory and the foundations of quantum mechanics. Graduate lecture course, available at http://www.tcm.phy.cam.ac.uk/~mdt26/pilot_waves.html, 2009.
53. Jastrow, R. J. Phys. Rev. 1955, 98, 1479.
54. Drummond, N. D.; Towler, M. D.; Needs, R. J. Phys. Rev. B 2004, 70, 235119.
55. Aragon, S. Density Functional Theory: A Primer, San Francisco State University teaching material, available at www.wag.caltech.edu/PASI/lectures/SFSUElectronicStructure-Lect-6.doc.
56. Kato, T. Commun. Pure Appl. Math. 1957, 10, 151.
57. de Palo, S.; Rapisarda, F.; Senatore, G. Phys. Rev. Lett. 2002, 88, 206401.
58. López Ríos, P.; Needs, R. J. Unpublished.
59. Dennis, J. E.; Gay, D. M.; Welsch, R. E. ACM Trans. Math. Software 1981, 7, 369.
60. Umrigar, C. J.; Wilson, K. G.; Wilkins, J. W. Phys. Rev. Lett. 1988, 60, 1719.
61. Kent, P. R. C.; Needs, R. J.; Rajagopal, G. Phys. Rev. B 1999, 59, 12344.
62. Drummond, N. D.; Needs, R. J. Phys. Rev. B 2005, 72, 085124.
63. Ceperley, D. M. J. Stat. Phys. 1986, 43, 815.
64. Ma, A.; Drummond, N. D.; Towler, M. D.; Needs, R. J. Phys. Rev. E 2005, 71, 066704.
65. Riley, K. E.; Anderson, J. B. Mol. Phys. 2003, 101, 3129.
66. Nightingale, M. P.; Melik-Alaverdian, V. Phys. Rev. Lett. 2001, 87, 043401.
67. Umrigar, C. J.; Toulouse, J.; Filippi, C.; Sorella, S.; Hennig, R. G. Phys. Rev. Lett. 2007, 98, 110201.
68. Toulouse, J.; Umrigar, C. J. J. Chem. Phys. 2007, 126, 084102.
69. Ceperley, D. M. Top-ten reasons why no-one uses quantum Monte Carlo, Ceperley group Web site, 1996; since removed.
70. Pople, J. A.; Head-Gordon, M.; Fox, D. J.; Raghavachari, K.; Curtiss, L. A. J. Chem. Phys. 1989, 90, 5622.
71. Curtiss, L. A.; Jones, C.; Trucks, G. W.; Raghavachari, K.; Pople, J. A. J. Chem. Phys. 1990, 93, 2537.
72. Curtiss, L. A.; Raghavachari, K.; Redfern, P. C.; Pople, J. A. J. Chem. Phys. 1997, 106, 1063.
73. Ma, A.; Drummond, N. D.; Towler, M. D.; Needs, R. J. J. Chem. Phys. 2005, 122, 224322.
74. Nemec, N.; Towler, M. D.; Needs, R. J. J. Chem. Phys. 2010, 132 , 034111. 75. Kalos, M. H.; Colletti, L.; Pederiva, F. J. Low Temp. Phys. 2005, 138 , 747. 76. Anderson, J. B. J. Chem. Phys. 1975, 63 , 1499; Ibid., 1976, 65 , 4121. 77. Ceperley, D. M. J. Stat. Phys. 1991, 63 , 1237. 78. Foulkes, W. M. C.; Hood, R. Q.; Needs, R. J. Phys. Rev. B 1999, 60 , 4558. 79. Glauser, W.; Brown, W.; Lester, W.; Bressanini, D.; Hammond, B. J. Chem. Phys. 1992, 97 , 9200. 80. Bressanini, B.; Reynolds, P. J. Phys. Rev. Lett. 2005, 95 , 110201. 81. Bajdich, M.; Mitas, L.; Drobn´y, G.; Wagner, L. K. Phys. Rev. B 1999, 60 , 4558. 82. Towler, M. D.; Allan, N. L.; Harrison, N. M.; Saunders, V. R.; Mackrodt, W. C.; Apr`a, E. Phys. Rev. B 1994, 50 , 5041.

164

83. 84. 85. 86. 87. 88. 89. 90. 91. 92.

93. 94. 95. 96. 97. 98. 99. 100. 101. 102. 103. 104. 105.

106. 107. 108. 109.

110. 111.

QUANTUM MONTE CARLO

Needs, R. J.; Towler, M. D. Int. J. Mod. Phys. B 2003, 17 , 5425. Liu, S. K.; Kalos, M. H.; Chester, G. V. Phys. Rev. A 1974, 10 , 303. Barnett, R. N.; Reynolds, P. J.; Lester, W. A., Jr. J. Comput. Phys. 1991, 96 , 258. Baroni, S.; Moroni, S. Phys. Rev. Lett. 1999, 82 , 4745. Drummond, N. D.; Radnai, Z.; Trail, J. R.; Towler, M. D.; Needs, R. J. Phys. Rev. B 2004, 69 , 085116. Drummond, N. D.; Needs, R. J. Phys. Rev. Lett. 2009, 102 , 126402. L¨uchow, A.; Petz, R.; Scott, T. C. J. Chem. Phys. 2007, 126 , 144110. Reboredo, F. A.; Hood, R. Q.; Kent, P. R. C. Phys. Rev. B 2009, 79 , 195117. Drummond, N. D.; L´opez R´ıos, P.; Ma, A.; Trail, J. R.; Spink, G.; Towler, M. D.; Needs, R. J. J. Chem. Phys. 2006, 124 , 224104. Fahy, S. In Quantum Monte Carlo Methods in Physics and Chemistry, Nato Science Series C: Mathematical and Physical Sciences, Vol. 525, Nightingale, P., Umrigar, C. J., Eds., Kluwer Academic, Dordrecht, The Netherlands, 1999, p. 101. Filippi, C.; Fahy, S. J. Chem. Phys. 2000, 112 , 3523. Grossman, J. C.; Mitas, L. Phys. Rev. Lett. 1995, 74 , 1323. Kutzlnigg, W.; Morgan, J. D., III. J. Phys. Chem. 1992, 96 , 4484. Prendergast, D.; Nolan, M.; Filippi, C.; Fahy, S.; Greer, J. C. J. Chem. Phys. 2001, 115 , 1626. Feynman, R. P. Phys. Rev . 1954, 94 , 262. Feynman, R. P.; Cohen, M. Phys. Rev . 1956, 102 , 1189. Kwon, Y.; Ceperley, D. M.; Martin, R. M. Phys. Rev. B 1993, 48 , 12037. Holzmann, M.; Ceperley, D. M.; Pierleoni, C.; Esler, K. Phys. Rev. E 2003, 68 , 046707. Kwon, Y.; Ceperley, D. M.; Martin, R. M. Phys. Rev. B 1998, 58 , 6800. Pierleoni, C.; Ceperley, D. M.; Holzmann, M. Phys. Rev. Lett. 2004, 93 , 146402. L´opez R´ıos, P.; Ma, A.; Drummond, N. D.; Towler, M. D.; Needs, R. J. Phys. Rev. E 2006, 74 , 066701. Segall, M. D.; Lindan, P. L. D.; Probert, M. J.; Pickard, C. J.; Hasnip, P. J.; Clark, S. J.; Payne, M. C. J. Phys. Condens. Matter 2002, 14 , 2717. 
Gonze, X.; Beuken, J.-M.; Caracas, R.; Detraux, F.; Fuchs, M.; Rignanese, G.-M.; Sindic, L.; Verstraete, M.; Zerah, G.; Jollet, F.; Torrent, M.; Roy, A.; Mikami, M.; Ghosez, Ph.; Raty, J.-Y.; Allan, D. C. Comput. Mater. Sci . 2002, 25 , 478. Baroni, S.; Dal Corso, A.; de Gironcoli, S.; Giannozzi, P. http://www.pwscf.org. Hernandez, E.; Gillan, M. J.; Goringe, C. M. Phys. Rev. B 1997, 55 , 13485. Alf`e, D.; Gillan, M. J. Phys. Rev. B 2004, 70 , 161101. Dovesi, R.; Saunders, V. R.; Roetti, C.; Orlando, R.; Zicovich-Wilson, C. M.; Pascale, F.; Civalleri, B.; Doll, K.; Harrison, N. M.; Bush, I. J.; D’Arco, Ph.; Llunell, M. CRYSTAL06 User’s Manual , University of Torino, Torino, Italy, 2006. te Velde, G.; Bickelhaupt, F. M.; Baerends, E. J.; Guerra, C. F.; van Gisbergen, S. J. A.; Snijders, J. G.; Ziegler, T. J. Comput. Chem. 2001, 22 , 931. This practice has recently been outlawed in our department by new university antismoking legislation. My thanks to an anonymous referee for supplying me with this joke.

REFERENCES

112. 113. 114. 115. 116. 117. 118. 119. 120. 121. 122. 123. 124. 125. 126. 127. 128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138.

139. 140. 141. 142. 143.

165

Umrigar, C. J. Phys. Rev. Lett. 1993, 71 , 408. Stedman, M. L.; Foulkes, W. M. C.; Nekovee, M. J. Chem. Phys. 1998, 109 , 2630. Acioli, P. H.; Ceperley, D. M. J. Chem. Phys. 1994, 100 , 8169. Fahy, S.; Wang, X. W.; Louie, S. G. Phys. Rev. B 1990, 42 , 3503. Fahy, S.; Wang, X. W.; Louie, S. G. Phys. Rev. Lett. 1998, 61 , 1631. Mitas, L.; Shirley, E. L.; Ceperley, D. M. J. Chem. Phys. 1991, 95 , 3467. Casula, M.; Filippi, C.; Sorella, S. Phys. Rev. Lett. 2005, 95 , 100201. Casula, M. Phys. Rev. B 2006, 74 , 161102. Greeff, C. W.; Lester, W. A., Jr. J. Chem. Phys. 1998, 109 , 1607. Shirley, E. L.; Martin, R. M. Phys. Rev. B 1993, 47 , 15413. Trail, J. R.; Needs, R. J. J. Chem. Phys. 2005, 122 , 174109. Trail, J. R.; Needs, R. J. J. Chem. Phys. 2005, 122 , 014112. http://www.tcm.phy.cam.ac.uk/∼mdt26/casino2_pseudopotentials.html. Burkatzki, M.; Filippi, C.; Dolg, M. J. Chem. Phys. 2007, 126 , 234105; ibid., 2008, 129 , 164115. Trail, J. R.; Needs, R. J. J. Chem. Phys. 2008, 128 , 204103. Santra, B.; Michaelides, A.; Fuchs, M.; Tkatchenko, A.; Filippi, C.; Scheffler, M. J. Chem. Phys. 2008, 129 , 194111. Rajagopal, G.; Needs, R. J.; James, A. J.; Kenny, S. D.; Foulkes, W. M. C. Phys. Rev. B 1995, 51 , 10591. Rajagopal, G.; Needs, R. J.; Kenny, S. D.; Foulkes, W. M. C.; James, A. J. Phys. Rev. Lett. 1994, 73 , 1959. Lin, C.; Zong, F. H.; Ceperley, D. M. Phys. Rev. E 2001, 64 , 016702. Ewald, P. P. Ann. Phys. 1921, 64 , 25. Chiesa, S.; Ceperley, D. M.; Martin, R. M.; Holzmann, M. Phys. Rev. Lett. 2006, 97 , 076404. Drummond, N. D.; Needs, R. J.; Sorouri, A.; Foulkes, W. M. C. Phys. Rev. B 2008, 78 , 125106. Fraser, L. M.; Foulkes, W. M. C.; Rajagopal, G.; Needs, R. J.; Kenny, S. D.; Williamson, A. J. Phys. Rev. B 1996, 53 , 1814. Williamson, A. J.; Rajagopal, G.; Needs, R. J.; Fraser, L. M.; Foulkes, W. M. C.; Wang, Y.; Chou, M.-Y. Phys. Rev. B 1997, 55 , R4851. Kent, P. R. C.; Hood, R. Q.; Williamson, A. J.; Needs, R. J.; Foulkes, W. M. C.; Rajagopal, G. Phys. Rev. 
B 1999, 59 , 1917. Kwee, H.; Zhang, S.; Krakauer, H. Phys. Rev. Lett. 2008, 100 , 126404. Dewing, M.; Ceperley, D. M. Methods for coupled electronic–ionic Monte Carlo. In Recent Advances in Quantum Monte Carlo Methods, Part II, Lester, W. A., Rothstein, S. M., and Tanaka, S., Eds., World Scientific, Singapore, 2002. Grossman, J. C.; Mitas, L. Phys. Rev. Lett. 2005, 94 , 056403. Huang, K. C.; Needs, R. J.; Rajagopal, G. J. Chem. Phys. 2000, 112 , 4419. Schautz, F.; Flad, H.-J. J. Chem. Phys. 2000, 112 , 4421. Badinski, A.; Haynes, P. D.; Needs, R. J. Phys. Rev. B 2008, 77 , 085111. Reynolds, P. J.; Barnett, R. N.; Hammond, B. L.; Grimes, R. M.; Lester, W. A., Jr. Int. J. Quantum Chem. 1986, 29 , 589.

166

144. 145. 146. 147. 148. 149. 150. 151. 152. 153. 154. 155. 156. 157. 158. 159. 160. 161. 162. 163.

QUANTUM MONTE CARLO

Assaraf, R.; Caffarel, M. Phys. Rev. Lett. 1999, 83 , 4682. Casalegno, M.; Mella, M.; Rappe, A. M. J. Chem. Phys. 2003, 118 , 7193. Assaraf, R.; Caffarel, M. J. Chem. Phys. 2003, 119 , 10536. Lee, M. W.; Mella, M.; Rappe, A. M. J. Chem. Phys. 2005, 122 , 244103. Badinski, A.; Needs, R. J. Phys. Rev. E 2007, 76 , 036707. Badinski, A.; Needs, R. J. Phys. Rev. B 2008, 78 , 035134. Badinski, A.; Trail, J. R.; Needs, R. J. J. Chem. Phys. 2008, 129 , 224101. Badinski, A.; Haynes, P. D.; Trail, J. R.; Needs, R. J. J. Phys. Condens. Matter 2010, 22 , 074202. Farid, B.; Needs, R. J. Phys. Rev. B 1992, 45 , 1067. Malatesta, A.; Fahy, S.; Bachelet, G. B. Phys. Rev. B 1997, 56 , 12201. Knittle, E.; Wentzcovitch, R.; Jeanloz, R.; Cohen, M. L. Nature 1989, 337 , 349. McSkimin, H. J.; Andreatch, P. J. Appl. Phys. 1972, 43 , 2944. Occelli, F.; Loubeyre, P.; LeToullec, R. Nature Mater. 2003, 2 , 151. Esler, K. P.; Cohen, R. E.; Militzer, B.; Kim, J.; Needs, R. J.; Towler, M. D. Phys. Rev. Lett. 2010, 104 , 185702. Sola, E.; Brodholt, J. P.; Alf`e, D. Phys. Rev. B 2009, 79 , 024107. Dewaele, A.; Loubeyre, P.; Occelli, F.; Mezouar, M.; Dorogokupets, P. I.; Torrent, M. Phys. Rev. Lett. 2006, 97 , 215504. S¨oderlind, P.; Moriarty, J. A.; Wills, J. M. Phys. Rev. B 1996, 53 , 14063. Mao, K.; Wu, Y.; Chen, L. C.; Shu, J. F. J. Geophys. Res. 1990, 95 , 21737. Li, X.-P.; Ceperley, D. M.; Martin, R. M. Phys. Rev. B 1991, 44 , 10929. Towler, M.D.; Russell, N.J.; Valentini, A. arXiv 2011, 1103.1589v1 [quant-ph].

5 Coupled-Cluster Calculations for Large Molecular and Extended Systems

KAROL KOWALSKI
William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

JEFF R. HAMMOND
The University of Chicago, Chicago, Illinois

WIBE A. de JONG, PENG-DONG FAN, MARAT VALIEV, DUNYOU WANG, and NIRANJAN GOVIND
William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

The ever-increasing power of modern computer systems is advancing many areas of computational chemistry and allowing significantly larger systems to be studied with highly accurate quantum chemistry methods. This has been made possible, in part, by the development of highly scalable implementations of core quantum chemistry methodologies. In particular, there has been significant progress in the parallel implementation of coupled-cluster (CC) methods, which have become the methods of choice for studying complex chemical processes that require an accurate treatment of electron correlation. In this chapter we outline the various CC formalisms available in NWChem and discuss the parallel implementation of these methods in our code. Performance issues, system-size limitations, and the accuracies that can be achieved with these calculations are also discussed. Representative examples from two key domains of CC theory (excited-state formalism and linear response studies) are reviewed, and the possibilities of coupling CC methods with different multiscale approaches are highlighted.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


5.1 INTRODUCTION

Many aspects of computational chemistry require accuracies that can be achieved only by methods that account appropriately for the instantaneous interactions, or correlations, between electrons in molecules [1]. Including these electron correlation effects is necessary to compare theory and experiment in a precise manner. Even though correlation effects contribute less than 1% of the total energy, they are fundamental to an understanding of the electronic structure of various systems and to the development of predictive models. For this reason correlated methods have become an integral part of many computational chemistry packages. Among the many methods that describe correlation effects systematically, the coupled-cluster (CC) formalism [2,3] has evolved into a widely used and very accurate method for solving the electronic Schrödinger equation. Compared with other formalisms, such as perturbative methods or approaches based on a linear expansion of the wavefunction (e.g., configuration interaction methods), the main advantage of CC methods is that correlation effects are elegantly captured in the exponential form of the wavefunction. A simple consequence of this ansatz is the size extensivity of the resulting energies or, equivalently, proper scaling of the energy with the number of electrons. Although the CC method was initially proposed in nuclear physics [4,5], it was quickly adopted by quantum chemists, and since the late 1960s there has been steady development that has spawned a variety of CC methodologies. In the last decade this formalism has been “rediscovered” by the nuclear physics community [6–8], which demonstrates the applicability of the method across a wide range of energy scales. Despite these successes, the inherent numerical cost of CC methods, which grows rapidly with system size, significantly hampers the wide applicability of this formalism. This difficulty may be overcome through the use of massively parallel computer systems and highly scalable CC implementations. The parallel implementations available in quantum chemistry programs such as ACES II MAB [9], ACES III [10,11], PQS [12–15], MOLPRO [16], GAMESS(US) [17–19], and NWChem [20–24] are excellent examples of recent developments. In this chapter we review the parallel CC implementation in NWChem and demonstrate its capabilities. We refer the reader to the papers cited above for discussions of other implementations.

The rest of this chapter is organized as follows. An overview of CC theory for ground and excited states and of CC linear response theory is given in Section 5.2. The details of our parallel CC implementation are described in Section 5.3. In Section 5.4 we present various ground- and excited-state examples and studies that couple CC methodologies with multiphysics approaches.

5.2 THEORY

The details of the CC formalism have been discussed in many review articles [1,25–27]. For the purpose of this chapter we present only the most important approaches within the single-reference formulation, where the CC ground-state wavefunction $|\Psi_0\rangle$ is represented in the form of the exponential ansatz,

$$|\Psi_0\rangle = e^{T}|\Phi\rangle \tag{5.1}$$

where the reference function $|\Phi\rangle$ is usually chosen as a Hartree–Fock (HF) determinant and the cluster operator $T$ is represented as

$$T = \sum_{i=1}^{N} T_i \tag{5.2}$$

where $N$ refers to the total number of correlated electrons. Each component $T_n$ takes the form

$$T_n = \sum_{i_1<\cdots<i_n;\; a_1<\cdots<a_n} t^{i_1\cdots i_n}_{a_1\cdots a_n}\, X^{+}_{a_1}\cdots X^{+}_{a_n} X_{i_n}\cdots X_{i_1} \tag{5.3}$$

One can find that the GEBF-HF energy differs from the conventional HF energy by less than 1 mHa (see Table 7.A2). It should be mentioned that other properties can be calculated similarly, as linear combinations of the corresponding properties of all subsystems.


TABLE 7.A3 NPA Charges of All Atoms Used in the GEBF Approach

| Atom | Element | Charge | Atom | Element | Charge | Atom | Element | Charge |
|---|---|---|---|---|---|---|---|---|
| 1 | C | −0.478580 | 34 | H | 0.152910 | 67 | H | 0.151950 |
| 2 | C | −0.306510 | 35 | H | 0.151950 | 68 | H | 0.151950 |
| 3 | C | −0.312180 | 36 | H | 0.151950 | 69 | H | 0.151950 |
| 4 | C | −0.307940 | 37 | H | 0.165180 | 70 | H | 0.151950 |
| 5 | C | −0.306800 | 38 | H | 0.159400 | 71 | H | 0.151950 |
| 6 | C | −0.306840 | 39 | H | 0.159400 | 72 | H | 0.151950 |
| 7 | C | −0.306660 | 40 | H | 0.154180 | 73 | H | 0.151950 |
| 8 | C | −0.306660 | 41 | H | 0.154180 | 74 | H | 0.151950 |
| 9 | C | −0.306660 | 42 | H | 0.152640 | 75 | H | 0.151950 |
| 10 | C | −0.306660 | 43 | H | 0.152640 | 76 | H | 0.151940 |
| 11 | C | −0.306670 | 44 | H | 0.152960 | 77 | H | 0.151940 |
| 12 | C | −0.306670 | 45 | H | 0.152960 | 78 | H | 0.151940 |
| 13 | C | −0.306670 | 46 | H | 0.152910 | 79 | H | 0.151940 |
| 14 | C | −0.306670 | 47 | H | 0.152910 | 80 | H | 0.151920 |
| 15 | C | −0.306670 | 48 | H | 0.152550 | 81 | H | 0.151920 |
| 16 | C | −0.306670 | 49 | H | 0.152550 | 82 | H | 0.151920 |
| 17 | C | −0.306670 | 50 | H | 0.151880 | 83 | H | 0.151920 |
| 18 | C | −0.306670 | 51 | H | 0.151880 | 84 | H | 0.151910 |
| 19 | C | −0.306670 | 52 | H | 0.151910 | 85 | H | 0.151910 |
| 20 | C | −0.306670 | 53 | H | 0.151910 | 86 | H | 0.151880 |
| 21 | C | −0.306670 | 54 | H | 0.151920 | 87 | H | 0.151880 |
| 22 | C | −0.306670 | 55 | H | 0.151920 | 88 | H | 0.152550 |
| 23 | C | −0.306660 | 56 | H | 0.151920 | 89 | H | 0.152550 |
| 24 | C | −0.306660 | 57 | H | 0.151920 | 90 | H | 0.152910 |
| 25 | C | −0.306660 | 58 | H | 0.151940 | 91 | H | 0.152960 |
| 26 | C | −0.306660 | 59 | H | 0.151940 | 92 | H | 0.152960 |
| 27 | C | −0.306840 | 60 | H | 0.151940 | 93 | H | 0.152640 |
| 28 | C | −0.306800 | 61 | H | 0.151940 | 94 | H | 0.152640 |
| 29 | C | −0.307940 | 62 | H | 0.151950 | 95 | H | 0.154180 |
| 30 | C | −0.312180 | 63 | H | 0.151950 | 96 | H | 0.154180 |
| 31 | C | −0.306510 | 64 | H | 0.151950 | 97 | H | 0.165180 |
| 32 | C | −0.478580 | 65 | H | 0.151950 | 98 | H | 0.159400 |
| 33 | H | 0.159400 | 66 | H | 0.151950 | | | |


8 MNDO-like Semiempirical Molecular Orbital Theory and Its Application to Large Systems

TIMOTHY CLARK
Computer-Chemie-Centrum, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

JAMES J. P. STEWART
Stewart Computational Chemistry, Colorado Springs, Colorado

In this chapter we describe modern MNDO-like semiempirical theory and its application either to very large molecules or to very large numbers of smaller ones. We use the term MNDO-like to describe methods that use variations of the original MNDO [1] and MNDO/d [2–6] techniques. This covers essentially all commonly used techniques, all of which use the original multipole formulation for the two-electron integrals and many of the original MNDO approximations. We first outline the theory of LCAO-SCF methods in general, followed by a more detailed discussion of the neglect of diatomic differential overlap (NDDO) approximation and the MNDO technique. We discuss individual Hamiltonians and their parameterization and describe the strengths of these remarkably powerful methods and their application to large systems.

8.1 BASIC THEORY

8.1.1 LCAO-SCF Theory

The two approximations, linear combination of atomic orbitals (LCAO) and self-consistent field (SCF), form the core of modern (MNDO-like) semiempirical molecular orbital theory. They have been described in many standard textbooks but are important for understanding MNDO-like techniques and so are outlined briefly here. We can write the Hamiltonian for a molecule that consists of $M$ nuclei and $N$ electrons, in atomic units, as

$$H = -\sum_{i=1}^{N}\frac{1}{2}\nabla_i^2 - \sum_{A=1}^{M}\frac{1}{2M_A}\nabla_A^2 - \sum_{i=1}^{N}\sum_{A=1}^{M}\frac{Z_A}{R_{Ai}} + \sum_{i=1}^{N}\sum_{j>i}^{N}\frac{1}{r_{ij}} + \sum_{A=1}^{M}\sum_{B>A}^{M}\frac{Z_A Z_B}{R_{AB}} \tag{8.1}$$

where the indices $i$ and $j$ run over the electrons and $A$ and $B$ over the nuclei. The individual terms that make up the Hamiltonian are defined in Table 8.1.

TABLE 8.1 Definitions of the Individual Terms in Eq. (8.1)

| Term | Definition | Variables |
|---|---|---|
| $-\sum_{i=1}^{N}\frac{1}{2}\nabla_i^2$ | Kinetic energy of the electrons | $\nabla_i^2$ is the Laplacian (the sum of second derivatives) with respect to the coordinates of electron $i$ |
| $-\sum_{A=1}^{M}\frac{1}{2M_A}\nabla_A^2$ | Kinetic energy of the nuclei (zero within the Born–Oppenheimer approximation) | $\nabla_A^2$ is the Laplacian with respect to the coordinates of nucleus $A$; $M_A$ is the mass of nucleus $A$ |
| $-\sum_{i=1}^{N}\sum_{A=1}^{M}\frac{Z_A}{R_{Ai}}$ | Nucleus–electron attraction | $Z_A$ is the nuclear charge of atom $A$ and $R_{Ai}$ is the distance between atom $A$ and electron $i$ |
| $\sum_{i=1}^{N}\sum_{j>i}^{N}\frac{1}{r_{ij}}$ | Electron–electron repulsion | $r_{ij}$ is the distance between electrons $i$ and $j$ |
| $\sum_{A=1}^{M}\sum_{B>A}^{M}\frac{Z_A Z_B}{R_{AB}}$ | Nucleus–nucleus repulsion (constant within the Born–Oppenheimer approximation) | $R_{AB}$ is the distance between atoms $A$ and $B$ |

We make use of the Born–Oppenheimer approximation [7], which relies on the fact that the nuclei move so much more slowly than the electrons that they can, in effect, be regarded as stationary. This reduces the kinetic energy of the nuclei to zero and makes the nucleus–nucleus repulsion term a constant, so that both can be left out of the electronic Hamiltonian:

$$H = H_{\text{nuclear}} + H_{\text{electronic}} = H_{\text{nuclear}} - \sum_{i=1}^{N}\frac{1}{2}\nabla_i^2 - \sum_{i=1}^{N}\sum_{A=1}^{M}\frac{Z_A}{R_{Ai}} + \sum_{i=1}^{N}\sum_{j>i}^{N}\frac{1}{r_{ij}} \tag{8.2}$$

where the total Hamiltonian $H$ has now been separated into nuclear and electronic components. This allows us to write the total energy as the sum of the nuclear repulsion energy and the electronic energy defined by the Hamiltonian $H_{\text{electronic}}$:

$$E_{\text{total}} = E_{\text{electronic}} + \sum_{A=1}^{M}\sum_{B>A}^{M}\frac{Z_A Z_B}{R_{AB}} \tag{8.3}$$
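As an illustration of the last term in Eq. (8.3), the following minimal Python sketch evaluates the nuclear repulsion energy for a fixed set of nuclei. It assumes atomic units (charges in units of e, distances in bohr, energies in hartree); the function name `nuclear_repulsion` and the H2 geometry with a 1.4 bohr bond length are illustrative choices, not taken from the text.

```python
import itertools
import math


def nuclear_repulsion(charges, coords):
    """Sum Z_A * Z_B / R_AB over all unique pairs B > A (atomic units)."""
    energy = 0.0
    for a, b in itertools.combinations(range(len(charges)), 2):
        r_ab = math.dist(coords[a], coords[b])  # Euclidean distance R_AB
        energy += charges[a] * charges[b] / r_ab
    return energy


# Illustrative geometry (an assumption): H2 with a 1.4 bohr bond length.
h2_charges = [1.0, 1.0]
h2_coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]
e_nuc = nuclear_repulsion(h2_charges, h2_coords)
print(f"E_nuc(H2) = {e_nuc:.6f} hartree")
```

Because the nuclei are fixed within the Born–Oppenheimer approximation, this sum is evaluated once per geometry and simply added to the electronic energy.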

Thus, we “only” need to calculate the electronic energy, which according to the Schrödinger equation [8] is obtained from the electronic wavefunction. The electronic wavefunction $\Psi_{\text{electronic}}$ in turn is a function of the positions and spins of the $N$ electrons of the system:

$$\Psi_{\text{electronic}} = \Psi(x_1, x_2, x_3, \ldots, x_N) \quad \text{where } x_i = \{r_i, \omega_i\} \tag{8.4}$$

Here $r_i$ denotes the (vector) position of electron $i$ and $\omega_i$ its spin. Thus, the wavefunction is a function of $4N$ variables (three spatial coordinates and one spin coordinate per electron). To cut a long story short, we can solve the Schrödinger equation exactly only for systems with a single electron, so we are forced to introduce approximations. The first of these is the SCF (also known as mean-field or Hartree–Fock) approximation [9,10]. Basically, rather than solving the Schrödinger equation for many interacting particles, we approximate the many-particle solution in terms of many one-electron wavefunctions, each of which can be solved for. This means that we make the approximation that

$$H_{\text{electronic}} \approx \sum_{i=1}^{N} h_i \tag{8.5}$$

where $h_i$ is the one-electron Hamiltonian for electron $i$. This leads to the Hartree product, $\Psi_{\text{HP}}$, which is an approximation for the many-electron wavefunction $\Psi_{\text{electronic}}$:

$$\Psi_{\text{HP}}(x_1, x_2, \ldots, x_N) = \chi_1(x_1)\,\chi_2(x_2)\cdots\chi_N(x_N) \tag{8.6}$$

In Eq. (8.6), the $\chi_i$ are the spin orbitals, which are one-electron wavefunctions. The Schrödinger equation based on the Hartree approximation becomes

$$H\,\Psi_{\text{HP}} = E\,\Psi_{\text{HP}} \tag{8.7}$$

so that the eigenvalues $\varepsilon_i$ of the one-electron wavefunctions $\chi_i$ can be summed to give the electronic energy:

$$E_{\text{electronic}} = \sum_{i=1}^{N} \varepsilon_i \tag{8.8}$$
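The additivity expressed by Eqs. (8.5) to (8.8) can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not part of any MNDO program: it represents two hypothetical one-electron Hamiltonians as 2×2 matrices with arbitrary entries, builds the non-interacting two-electron Hamiltonian with NumPy Kronecker products, and confirms that the product of the one-electron eigenvectors is an eigenvector whose eigenvalue is the sum of the one-electron eigenvalues.

```python
import numpy as np

# Two hypothetical one-electron Hamiltonians h_1 and h_2 (symmetric 2x2
# matrices with arbitrary illustrative entries, not taken from the text).
h1 = np.array([[-1.0, 0.2],
               [0.2, -0.5]])
h2 = np.array([[-0.8, 0.1],
               [0.1, -0.3]])

# Lowest eigenpair of each one-electron problem: h_i chi_i = eps_i chi_i.
eps1, vecs1 = np.linalg.eigh(h1)
eps2, vecs2 = np.linalg.eigh(h2)
chi1, chi2 = vecs1[:, 0], vecs2[:, 0]

# Non-interacting two-electron Hamiltonian, the analogue of Eq. (8.5):
# H = h_1 (x) 1 + 1 (x) h_2.
identity = np.eye(2)
H = np.kron(h1, identity) + np.kron(identity, h2)

# Hartree product of the two one-electron functions, the analogue of Eq. (8.6).
psi_hp = np.kron(chi1, chi2)

# H psi_HP = (eps_1 + eps_2) psi_HP, as in Eqs. (8.7) and (8.8).
e_sum = eps1[0] + eps2[0]
residual = np.linalg.norm(H @ psi_hp - e_sum * psi_hp)
print(f"eps_1 + eps_2 = {e_sum:.6f}, residual = {residual:.2e}")
```

The residual vanishes (to rounding error) only because this model Hamiltonian is an exact sum of one-electron terms.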

262

MNDO-LIKE SEMIEMPIRICAL MOLECULAR ORBITAL THEORY

This would all be fine except for one significant complication. Because electrons are fermions (i.e., they have half-integer spin), they must obey the Pauli exclusion principle,11 which can be formulated as the antisymmetry principle: the wavefunction must be antisymmetric with respect to the exchange of any two electrons. Fock's contribution was to point out that the Hartree product does not obey the antisymmetry principle. Slater12 later pointed out that the wavefunction suggested by Fock can be expressed as a determinant, now known as a Slater determinant, $\Psi^{\mathrm{Slater}}$:

$$\Psi^{\mathrm{Slater}} = \frac{1}{\sqrt{N!}} \begin{vmatrix} \chi_1(x_1) & \chi_2(x_1) & \cdots & \chi_N(x_1) \\ \chi_1(x_2) & \chi_2(x_2) & \cdots & \chi_N(x_2) \\ \vdots & \vdots & & \vdots \\ \chi_1(x_N) & \chi_2(x_N) & \cdots & \chi_N(x_N) \end{vmatrix} \tag{8.9}$$

The prefactor $1/\sqrt{N!}$ is simply a normalization constant. This is the Hartree–Fock (or SCF) wavefunction, but the question remains as to how we define the spin orbitals $\chi_i$. This is where the almost universal LCAO approximation, introduced by Erich Hückel,13 comes into play. Hückel's idea was that molecular orbitals (in our case the $\chi_i$ introduced above) can be represented as a linear combination of atomic orbitals appropriate for the constituent atoms. For a system constituted of $N_{\mathrm{AOs}}$ atomic orbitals (AOs),

$$\chi_i = \sum_{j=1}^{N_{\mathrm{AOs}}} c_j^i \varphi_j \tag{8.10}$$

where $c_j^i$ is the coefficient of atomic orbital $\varphi_j$ in molecular orbital $\chi_i$, and the coefficients are normalized so that $\sum_{j=1}^{N_{\mathrm{AOs}}} (c_j^i)^2 = 1$.

We still cannot solve for the wavefunction directly, even using the SCF and LCAO approximations. This is where the variational principle, which says that no trial wavefunction can give an energy lower than that of the correct wavefunction, comes into play. Solutions are generally found by starting with a set of guessed molecular orbitals $\chi_i$ and iterating until the energy converges to its minimum value and the electron density no longer changes. We discuss this algorithm in more detail below.

8.1.2 Implications of LCAO-SCF Theory

LCAO-SCF theory is remarkably successful but has two limitations that we need to discuss in order to understand MNDO-like theories better. The first is a consequence of the SCF approximation and is known as electron correlation. Physically, the introduction of the Hartree product [Eq. (8.6)] means that the electrons do not feel each other individually. Instead, each electron feels the electron density (but not the instantaneous positions) of the others. This means that the individual electrons are not given the opportunity to avoid each other


instantaneously, which they would obviously do because they are negatively charged. Thus, the SCF approximation means that the electron–electron repulsion is overestimated. This effect, which is purely a consequence of the SCF approximation, is known as dynamic correlation.14 A second type of correlation (nondynamic or static correlation) has also been defined. It is a consequence of using only a single Slater determinant to describe the wavefunction. Although most “normal” molecules can be described very well using a single Slater determinant, some (such as diradicals) cannot. This is essentially because the wavefunction cannot be described adequately by a single scheme in which a single set of molecular orbitals is occupied by zero, one, or two electrons. This second type of correlation is very different from the first and not as easily treated. However, the implicit treatment of dynamic correlation in MNDO-like theories is poorly appreciated and will be discussed below. The second implication of the LCAO-SCF approximations concerns the limitations placed on the wavefunction by the atomic orbitals used to form the MOs. Although the LCAO approximation is very instinctive and actually forms the basis of our qualitative understanding of bonding effects,15 it nevertheless has no physical basis. It is very convenient for calculations, but we can also describe MOs as combinations of non-atom-centered functions or simply as numerical grids. The LCAO approach, however, does bring some limitations. We can only describe wavefunctions that are linear combinations of the atomic orbitals [which are usually called the basis set in ab initio and density functional theory (DFT) calculations]. Current MNDO-like semiempirical techniques use single-valence basis sets. This means that each atomic orbital in the valence shell is represented by only one basis function. 
This, in turn, means that the size of the orbital is fixed, although in reality some valence orbitals are more or less diffuse than others. This is a serious limitation in ab initio and DFT calculations, but appears to be less serious in MNDO-like techniques. The one possible exception is hydrogen, for which a single valence 1s orbital is not ideal in some bonding situations.16

8.1.3 Neglect of Diatomic Differential Overlap

The NDDO approximation is perhaps the key simplification made in MNDO-like semiempirical MO theories. Interestingly, although some adverse effects of other approximations have been identified (see below), the NDDO approximation appears to be extremely robust and does not lead to identifiable systematic errors. In full (ab initio) Hartree–Fock theory, calculating the electron–electron repulsion requires that all integrals of the type $(\mu\nu|\lambda\sigma)$ (i.e., all integrals in which the indices μ, ν, λ, and σ vary from 1 to $N_{\mathrm{AOs}}$, the number of atomic orbitals) be calculated. This means that a very large number of integrals (formally $N_{\mathrm{AOs}}^4/4$, if we ignore symmetry) must be calculated and processed in every iteration of the SCF procedure. The NDDO approximation sets to zero all integrals $(\mu\nu|\lambda\sigma)$ in which either atomic orbitals μ and ν or λ and σ are on different atoms. The combinations μν and λσ are known as charge distributions, so that the NDDO approximation can also be expressed as meaning that we only consider integrals between charge distributions μν and λσ situated on single, but not necessarily


the same, atoms. Thus, the NDDO approximation reduces the problem of calculating and using the two-electron integrals (i.e., those needed for calculating the electron–electron repulsion) from one of four centers to one of only two; we calculate only one- and two-center two-electron integrals and ignore three- and four-center two-electron integrals.

Having reduced the number of integrals to be calculated, we need an efficient technique to calculate them. Ab initio and DFT calculations often use basis sets based on Gaussian functions because these are particularly suitable for calculating the integrals. Gaussian orbitals have the form

$$\varphi_{lm}(\mathbf{r}) = Y_{lm}\, e^{-\zeta r^2} \tag{8.11}$$

where $Y_{lm}$ is the angular part (a spherical harmonic function) of the orbital with angular momentum quantum number $l$ and magnetic quantum number $m$. The expression $e^{-\zeta r^2}$ describes the radial behavior of the wavefunction, where ζ is the exponent that governs how fast the wavefunction falls off with increasing distance $r$ from the nucleus. Despite their almost universal use as atom-centered basis sets in ab initio and DFT techniques, Gaussian functions are far from ideal. Because the distance from the nucleus is squared in the exponent, the wavefunction falls off far faster than it should and also does not describe the wavefunction at the nucleus correctly. A far better choice would be Slater orbitals, which have the form

$$\varphi_{lm}(\mathbf{r}) = Y_{lm}\, e^{-\zeta |\mathbf{r}|} \tag{8.12}$$
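The two radial forms in Eqs. (8.11) and (8.12) can be compared directly. The following is a minimal sketch, assuming a common exponent ζ = 1 (a value chosen purely for illustration) and ignoring the angular part and normalization. It prints both radial factors at a few distances and estimates the slope at the nucleus by a one-sided finite difference: the Slater form has a cusp with slope −ζ, whereas the Gaussian is flat there and decays much faster at large r.

```python
import math

ZETA = 1.0  # hypothetical exponent, chosen only for illustration


def gaussian_radial(r: float, zeta: float = ZETA) -> float:
    """Radial factor of Eq. (8.11): exp(-zeta * r**2)."""
    return math.exp(-zeta * r * r)


def slater_radial(r: float, zeta: float = ZETA) -> float:
    """Radial factor of Eq. (8.12): exp(-zeta * |r|)."""
    return math.exp(-zeta * abs(r))


def slope_at_nucleus(f, h: float = 1e-6) -> float:
    """One-sided finite-difference estimate of df/dr at r = 0."""
    return (f(h) - f(0.0)) / h


for r in (0.5, 1.0, 2.0, 4.0):
    print(f"r = {r:3.1f}  gaussian = {gaussian_radial(r):.3e}  "
          f"slater = {slater_radial(r):.3e}")

print("slope at r = 0, gaussian:", slope_at_nucleus(gaussian_radial))
print("slope at r = 0, slater  :", slope_at_nucleus(slater_radial))
```

With equal exponents the Gaussian lies above the Slater function for r < 1 and below it for r > 1; in practice the exponents of the two types of function are not chosen to be equal, so only the qualitative behavior at r = 0 and at large r should be read from this comparison.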

However, the two-electron integrals are very expensive to calculate for Slater orbitals, so that they are not used as often as Gaussians, despite their inherent advantages. MNDO-like techniques use Slater-type orbitals, but must therefore resort to a fast, approximate method for calculating the two-electron integrals. This is the multipole approach introduced with MNDO1 and extended to d-orbitals for MNDO/d.2 In this approximation, the interactions between Slater orbitals are approximated as interactions between electrostatic monopoles, dipoles, and quadrupoles, which allows the integrals to be calculated very effectively and with reasonable accuracy. The multipole model has been used to calculate the molecular electrostatic potential for MNDO-like wavefunctions, and the definitions for all the multipoles for the 45 charge distributions that arise with an s-, p-, d-basis set have been listed.17 An important approximation in standard MNDO-like theories is that the basis set (the atomic orbitals) is assumed to be orthogonal (i.e., the orbitals have zero overlap with each other). This saves an initial orthogonalization step in the SCF calculation, which would slow semiempirical calculations considerably. Jorgensen et al.18 reintroduced this orthogonalization into MNDO and found that the resulting method (NO-MNDO) performed as well as later, more highly parameterized, methods and gave improvements in two problem areas: the rotational


barriers about C–C single bonds and the relative stabilities of branched and unbranched hydrocarbons. NO-MNDO requires about twice the CPU time needed for a standard MNDO calculation. A better-known solution to the orthogonalization problem is to add an orthogonalization correction that mimics the effects of the orthogonalization step at less cost in CPU time. This is the basis of the OMn (n = 1 to 3) methods introduced by Thiel and co-workers.19–22 These methods are probably the most sophisticated MNDO-like techniques available.

One of the most difficult areas in MNDO-like theories is the treatment of the nucleus–nucleus repulsion. What appears initially in Eq. (8.1) and Table 8.1 to be a very simple Coulomb repulsion is, in fact, a fairly complex entity in MNDO-like theories. The problem arises from the fact that the Coulomb interactions in MNDO-like theories are not all treated equally well. Whereas we treat the nucleus–nucleus repulsion exactly in Eq. (8.1), introducing the NDDO approximation leads to some neglect of Coulomb terms involving the electrons. Specifically, the long-range behavior of the electron–electron and nucleus–electron integrals is not correct, so that the simple, physically correct nucleus–nucleus repulsion term in Eq. (8.1) would lead to a net repulsion between neutral atoms or molecules at distances outside their van der Waals radii. Thus, an artificial screening effect must be introduced. In MNDO, the nucleus–nucleus repulsion term $E_{AB}$ becomes

$$E_{AB}^{\mathrm{MNDO}} = Z_A Z_B\,(s_A s_A|s_B s_B)\left(1 + e^{-\alpha_A R_{AB}} + e^{-\alpha_B R_{AB}}\right) \tag{8.13}$$

where the integral is treated in the same way as the electron–electron integrals and the two constants αA and αB are parameters specific to the elements A and B. However, MNDO is not able to reproduce hydrogen bonds, an effect that was,23 probably erroneously,16 attributed to the nucleus–nucleus repulsion being too strong. Therefore, this term was modified by the addition of up to four Gaussian terms in MNDO/H.23 These Gaussian terms were later adopted for other methods (see below), but lead to some artifacts. The corresponding expression for EAB becomes

$$E_{AB} = E_{AB}^{\mathrm{MNDO}} + \frac{Z_A Z_B}{R_{AB}}\left(\sum_i a_{A,i}\, e^{-b_{A,i}(R_{AB}-c_{A,i})^2} + \sum_j a_{B,j}\, e^{-b_{B,j}(R_{AB}-c_{B,j})^2}\right) \tag{8.14}$$

where there are i Gaussian functions for atom A and j for atom B. The variables a, b, and c are parameterized for each element [A and B in Eq. (8.14)] and each individual Gaussian function [1 to i and 1 to j in Eq. (8.14)]. Use of these Gaussian functions is not without hazard because they can lead to spurious minima24 and is generally undesirable because the functions introduce a large number of additional parameters for each element. A solution that has been found more practical and yields very good results is to introduce two-center terms into the nucleus–nucleus repulsion, as suggested originally for AM1(d)


by Voityuk and Rösch.25 The nucleus–nucleus repulsion term then becomes

$$E_{AB} = E_{AB}^{\mathrm{MNDO}}\left(1 + \delta_{AB}\, e^{-\alpha_{AB} R_{AB}}\right) \tag{8.15}$$

where $\delta_{AB}$ and $\alpha_{AB}$ are parameters specific to the pair of elements AB. In addition, it is common to use distance-dependent expressions for metal–hydrogen nucleus–nucleus interactions. The problem with all these corrections is that they essentially represent fixes to a fundamental deficiency of current MNDO-like theories. In addition, they all represent modifications to a two-center potential and can adversely affect the parameterization of other such interactions because the effects of the two potentials are not independent of each other.

8.1.4 SCF Iterations and Pseudodiagonalization

Figure 8.1 is a standard flow diagram for a semiempirical MO SCF iteration algorithm. Given a set of Cartesian coordinates, the number of electrons, and the spin multiplicity, the program first assigns atomic orbitals (the basis set) to the atoms and calculates the one-electron matrix, which contains all the interactions except the electron–electron term. In order to proceed, an initial guess density matrix is required. In standard semiempirical MO programs, this initial guess consists of simply dividing the electrons evenly over the available atomic orbitals. More sophisticated initial guesses, such as extended Hückel MOs, could be envisaged but would involve an extra diagonalization. The two-electron contribution is then added to the one-electron matrix to give the Fock matrix. This two-electron contribution depends on the density matrix and the two-electron integrals, which are generally precalculated and stored in memory. The Fock matrix is then diagonalized to give a new set of MOs, from which a new density matrix can be generated. The total energy and the density matrix are then tested

Fig. 8.1 Standard semiempirical MO SCF flow diagram. Boxes: calculate one-electron matrix; calculate two-electron integrals; calculate initial guess density matrix; assemble Fock matrix; diagonalize (→ MOs); calculate density matrix; convergence test.


for convergence by comparison with the last cycle, and if they have not yet converged, another SCF cycle is started using the new density matrix. The energy improves from cycle to cycle and the density converges steadily until they are both static within predefined thresholds, after which the program exits the SCF cycles. In practice, additional features, such as interpolation schemes, damping, or level shifting, are often included to improve convergence, but Fig. 8.1 gives the basics of the algorithm. However, because the other steps of the calculation are so fast, the diagonalization of the Fock matrix typically takes up approximately 50% of the CPU time for an implementation such as that shown in Fig. 8.1. This is often not appreciated because the diagonalization is a relatively minor component of the calculation for ab initio or DFT calculations. Modern semiempirical programs therefore do not perform full diagonalizations in every SCF cycle but, rather, switch to pseudodiagonalization 26 as soon as the SCF converges far enough. This is shown in Fig. 8.2. The pseudodiagonalization procedure is key to the remaining discussion and therefore is described in detail. The principle of pseudodiagonalization is that the MO eigenvectors are updated but not their eigenvalues. However, as the differences between eigenvalues are needed for the pseudodiagonalization procedure, full diagonalizations must be performed until the eigenvalues have settled to more or less constant values. This is shown in Fig. 8.2. Full diagonalizations are performed until a given threshold (usually, convergence on the density matrix, although convergence of the eigenvalues would be more relevant), after which the pseudodiagonalization can be used until the SCF criteria are met. A final full diagonalization must be performed after convergence to obtain the final eigenvalues and eigenvectors. 
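The basic cycle of Fig. 8.1 can be sketched in a few lines. The example below is a toy model under loudly stated assumptions, not an NDDO implementation: the two-electron contribution is replaced by a hypothetical on-site (Hubbard-like) repulsion `u`, the one-electron matrix `h` is an arbitrary 4-orbital chain, the function name `scf` and its arguments are illustrative, and a full diagonalization is performed in every cycle (no pseudodiagonalization). It does follow the steps described above: an even-occupation initial guess, Fock matrix assembly, diagonalization, a new density matrix, and a convergence test on both energy and density, with simple damping.

```python
import numpy as np


def scf(h, u, n_electrons, damping=0.5, tol=1e-10, max_cycles=200):
    """Closed-shell SCF loop for a toy model with on-site repulsion u.

    The Fock matrix is F = h + diag(u * P_ii / 2). This Hubbard-like
    two-electron term is an assumption made for illustration only.
    """
    n_ao = h.shape[0]
    n_occ = n_electrons // 2
    # Initial guess: electrons divided evenly over the atomic orbitals.
    p = np.eye(n_ao) * (n_electrons / n_ao)
    e_old = np.inf
    for cycle in range(1, max_cycles + 1):
        f = h + np.diag(0.5 * u * np.diag(p))       # assemble Fock matrix
        eps, c = np.linalg.eigh(f)                  # full diagonalization
        c_occ = c[:, :n_occ]
        p_new = 2.0 * c_occ @ c_occ.T               # new density matrix
        # Energy evaluated consistently from the new density matrix.
        f_new = h + np.diag(0.5 * u * np.diag(p_new))
        energy = 0.5 * np.sum(p_new * (h + f_new))
        dp = np.max(np.abs(p_new - p))
        if dp < tol and abs(energy - e_old) < tol:  # convergence test
            return energy, p_new, eps, cycle, True
        p = damping * p + (1.0 - damping) * p_new   # simple damping
        e_old = energy
    return energy, p, eps, max_cycles, False


# Hypothetical one-electron matrix: a 4-orbital chain with unequal
# diagonal elements (arbitrary units, no chemical meaning).
h = np.array([[ 0.0, -1.0,  0.0,  0.0],
              [-1.0, -0.5, -1.0,  0.0],
              [ 0.0, -1.0,  0.3, -1.0],
              [ 0.0,  0.0, -1.0,  0.0]])

energy, p, eps, cycles, converged = scf(h, u=1.0, n_electrons=4)
print(f"converged = {converged} after {cycles} cycles, E = {energy:.8f}")
print("orbital populations:", np.round(np.diag(p), 6))
```

In this sketch the diagonalization is the only expensive step, which mirrors the observation above that it dominates the cost of a semiempirical SCF cycle.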
Using the pseudodiagonalization procedure rather than full diagonalizations at every cycle does not slow convergence and speeds up the calculation by approximately a factor of 2. Just as important, the

Fig. 8.2 Cyclic section of the SCF iteration algorithm with pseudodiagonalization. Boxes: assemble Fock matrix; convergence test; diagonalize (→ MO vectors and eigenvalues) while the convergence measure is > 10⁻¹; pseudodiagonalize (→ MO vectors only) once it is < 10⁻¹; calculate density matrix; final diagonalization.


pseudodiagonalization procedure has properties that can be exploited for alternative SCF iteration schemes, as outlined below. Note that separate calculation of the eigenvalues and pseudodiagonalization can be used to replace the full diagonalizations in Fig. 8.2. Alternatively, if the initial guess is close enough to the final solution, no initial full diagonalizations are needed.

The principle behind pseudodiagonalization is that improvements in the eigenvectors for the occupied MOs must come from mixing with virtual MOs. Essentially, there is nothing to gain by mixing two occupied MOs. Therefore, the first step is to calculate the occupied–virtual block of the Fock matrix, $\Delta$, in the current MO basis:

$$\Delta = c_o^{+} F c_v \tag{8.16}$$

where the subscripts o and v denote the occupied and virtual blocks, respectively, $c$ are the current eigenvector coefficients, and $F$ is the Fock matrix. Large elements of $\Delta$ indicate strong interactions between occupied and virtual MOs, which must be removed by mixing the two. The mixing is achieved by a Givens rotation. For an updated occupied eigenvector $\tilde{c}_o$,

$$\tilde{c}_o = x_{ov}\, c_o - (1 - x_{ov}^2)\, c_v \tag{8.17}$$

where $c_o$ and $c_v$ are the coefficients of the relevant occupied and virtual eigenvectors, respectively, and $x_{ov}$ is the rotation angle between the two eigenvectors. The expression for the corresponding updated virtual eigenvector is

$$\tilde{c}_v = x_{ov}\, c_v + (1 - x_{ov}^2)\, c_o \tag{8.18}$$

Thus, the Givens rotations simply mix an occupied MO with a virtual MO with which it interacts strongly. However, the rotation angle $x_{ov}$ must be determined before the rotation can be carried out. This is achieved using what is essentially a first-order perturbation theory expression:

$$x_{ov} = \frac{\Delta_{ov}}{\varepsilon_o - \varepsilon_v} \tag{8.19}$$

where $\Delta_{ov}$ is the element of the occupied–virtual Fock block $\Delta$ that connects the occupied and virtual orbitals o and v, and $\varepsilon_o$ and $\varepsilon_v$ are the eigenvalues of these two orbitals, respectively. This expression explains the need for relatively constant eigenvalues (or eigenvalues calculated explicitly from the eigenvectors) before using the pseudodiagonalization, as these determine the rotation angles. The importance of the pseudodiagonalization procedure is that it allows us to select which orbitals to mix in a very transparent way. This feature is used, for example, in the MOZYME algorithm (see below). For normal-sized molecules, one possible implementation is to calculate $\Delta$ and to select a certain proportion


of the largest elements (the details of this step vary from implementation to implementation) in order to carry out the rotations between the orbitals connected by these elements. After testing for convergence and calculating the new density and Fock matrices, the occupied–virtual block $\Delta$ is calculated for the new Fock matrix and the process is repeated until convergence.

8.1.5 Dispersion

MNDO-like semiempirical MO techniques exhibit the weakness also found for ab initio Hartree–Fock and DFT: that weak (van der Waals) interactions (dispersion) are not reproduced. This problem is more severe than might seem at first sight because, in addition to the obvious intermolecular interaction energies, the intramolecular dispersion energies, which become very significant for large molecules such as those now treated routinely by MNDO-like methods, are also affected. The solution that was introduced for ab initio Hartree–Fock27 and has also been used for DFT28–30 has been to add a classical two-center potential with a damping function for short distances to the DFT Hamiltonian. A similar correction has been added to SCC-DFTB calculations (see Chapter 9).31 Such corrections are very successful, but suffer from the inherent problem for MNDO-like methods that they represent an additional two-center potential that can lead to linear dependencies with the nucleus–nucleus potential function. This is not a problem if the dispersion term is added after parameterization, as in OMnD,32 although some methods have been reported in which a dispersion potential was parameterized together with the remaining parameters.33 A more consistent way to treat this problem is to modify the existing two-center potential (the nucleus–nucleus repulsion potential) to include the effects of dispersion. This is the approach used by PM6,34 for which the core–core term is given by

$$E_{AB}^{\mathrm{PM6}} = E_{AB}^{\mathrm{MNDO}}\left(1 + \delta_{AB}\, e^{-\alpha_{AB}\left(R_{AB} + 0.0003 R_{AB}^{6}\right)}\right) \tag{8.20}$$
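The effect of the extra $R_{AB}^6$ term in Eq. (8.20) relative to Eq. (8.15) is easy to tabulate. The sketch below is illustrative only: it evaluates just the bracketed scaling factors (not $E_{AB}^{\mathrm{MNDO}}$ itself), distances are taken to be in ångströms, and the values of `DELTA_AB` and `ALPHA_AB` are hypothetical placeholders rather than published parameters for any element pair.

```python
import math


def voityuk_roesch_factor(r_ab: float, delta_ab: float, alpha_ab: float) -> float:
    """Scaling factor of Eq. (8.15): 1 + delta_AB * exp(-alpha_AB * R_AB)."""
    return 1.0 + delta_ab * math.exp(-alpha_ab * r_ab)


def pm6_factor(r_ab: float, delta_ab: float, alpha_ab: float) -> float:
    """Scaling factor of Eq. (8.20), with the added 0.0003 * R_AB**6 term."""
    return 1.0 + delta_ab * math.exp(-alpha_ab * (r_ab + 0.0003 * r_ab**6))


DELTA_AB = 1.0  # hypothetical diatomic parameter (dimensionless)
ALPHA_AB = 1.5  # hypothetical diatomic parameter (per angstrom)

for r in (1.0, 2.0, 3.0, 4.0):
    f15 = voityuk_roesch_factor(r, DELTA_AB, ALPHA_AB)
    f20 = pm6_factor(r, DELTA_AB, ALPHA_AB)
    print(f"R = {r:.1f} A  Eq. (8.15): {f15:.6f}  Eq. (8.20): {f20:.6f}  "
          f"difference: {f15 - f20:.2e}")
```

At 1 Å the extra term adds only 0.0003 Å to the exponent's distance, so the two factors are nearly identical; at 3 Å it adds about 0.22 Å, and the correction above unity is visibly smaller.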

This modification of Voityuk and Rösch's formula [Eq. (8.15)] behaves very similarly at short distances, but at distances of 3 Å and larger gives a noticeably smaller repulsion. This, together with an additional correction to take account of the nonvalence electrons (which are neglected in MNDO-like methods), leads to better performance and behavior similar to that expected from a method that includes dispersion. Each of these modifications assumes that the dispersion interaction attributable to a given atom is isotropic. Even if we accept the hypothesis that dispersion interactions can be assigned on an atom–atom basis, this is probably not a good approximation, for example, for $sp^2$-hybridized carbon atoms or atoms with lone pairs. One Ansatz that takes this effect into account also has the advantage that the dispersion term can be separated from other two-center potentials because it is based on (and parameterized for) the polarizability. In the early 1970s, Rinaldi and Rivail introduced a variational treatment for calculating molecular electronic polarizabilities using MNDO-like methods.35 This approach leads to very fast calculations but is not very accurate. However, Schürer et al.36 were


able to show that parameterizing the atomic multipole integrals (three per nonhydrogen element), rather than using the analytical values, gave very accurate molecular electronic polarizabilities. Furthermore, this technique lends itself to (arbitrarily) partitioning the molecular electronic polarizability into atomic, or even atomic-orbital, contributions.37 The "atomic polarizability tensors" thus obtained can be used in conjunction with the London equation38 and a damping function at short distances to provide a dispersion correction to MNDO.39

8.1.6 Need for Linear-Scaling Methods

Using current, readily available computers, conventional semiempirical SCF methods are limited to systems of only a few hundred atoms; above that, the computational effort becomes prohibitive. This limit is a direct consequence of the use of matrix algebra for solving the SCF equations, for which several operations, such as inversion and diagonalization, scale as the third power of the size of the system. By using special methods, such as pseudodiagonalization, this effort can be minimized, but elimination of the $N^3$ dependency is impossible when matrix algebra is used. Before larger systems could be studied, alternatives to matrix algebra methods had to be developed; two of the more successful are the divide-and-conquer linear-scaling method and the localized molecular orbital method MOZYME.

8.1.7 Divide-and-Conquer Linear Scaling

Given that the $N^3$ dependency cannot be eliminated, the computational effort required to solve the SCF for a large system can be reduced by splitting the system into smaller ones, which can then be solved separately. Thus, if a system of $N$ atoms is split into $m$ equal parts, each of the $m$ parts will require a computational effort approximately proportional to $(N/m)^3$. The total effort is then proportional to $m(N/m)^3 = N^3/m^2$; that is, it is reduced by a factor of $m^2$. This is the basis for the divide-and-conquer (D&C) method.40 Once special care is taken to ensure that the joins between the various parts are handled correctly, the results are almost indistinguishable from those obtained using exact matrix algebra methods.41 The computational effort involved in the D&C method scales linearly with the size of the system, which makes it suitable for modeling phenomena in very large species, including protein–protein interactions.42

8.1.8 Localized Orbital SCF

For a self-consistent field to exist, it is a necessary and sufficient condition that all Fock integrals involving occupied and virtual molecular orbitals be zero. On the assumption that a rough approximation to the electronic structure of a molecule is provided by its Lewis structure, the conditions necessary for an SCF provide a guide for moving from the simple Lewis structure to the optimized electronic structure. This is the premise for MOZYME43 : Starting with a Lewis structure represented by localized molecular orbitals (LMOs) on one or at most two atoms,


in order to generate an SCF it is sufficient to eliminate the Fock terms between these LMOs and the nearby virtual LMOs. For each pair of LMOs, this operation is very fast and can be performed using a 2 × 2 Givens rotation. The operation is carried out on every occupied LMO and every nearby virtual LMO. A result of this operation is to move the system in the direction of the SCF. However, because each Givens rotation modifies the occupied and virtual LMOs, the result of one annihilation rotation is to cause some matrix elements that had been eliminated by earlier Givens rotations to become nonzero again. This means that the process of annihilating occupied–virtual LMO interactions must be repeated. Over the first few complete sweeps of Givens rotations, the size of the LMOs, represented by the number of atoms on which the LMO has significant intensity, increases rapidly, and then tapers off as the system converges toward self-consistency.

To the degree that each complete set of annihilation steps results in the system moving closer to the energy minimum, the MOZYME method is similar to the conventional matrix algebra procedure. Indeed, when an SCF is achieved, MOZYME and conventional matrix algebra give rise to identical electron density distributions. Surprisingly, the MOZYME method is intrinsically more arithmetically stable than the conventional method. Using conventional methods, an SCF sometimes fails to form: the charge distribution simply oscillates from iteration to iteration. This propensity increases as the HOMO–LUMO energy gap decreases. When the gap is very small, the polarizabilities of the HOMO and LUMO become very large, and autoregenerative charge fluctuations effectively prevent an SCF from forming. In conventional methods the MOs are eigenvectors; therefore, the HOMO–LUMO gap is irreducibly small. By contrast, when LMOs are used, the HOMO–LUMO gap is at or near its maximum possible value, and the polarizability of the HOMO is correspondingly small.
One practical consequence is that, in general, the MOZYME procedure requires fewer iterations to achieve an SCF. Using the MOZYME technique, the computational effort scales approximately as $N^{1.4}$, and much larger systems can be studied, with the upper limit now being on the order of 15,000 atoms.44 Because having a starting Lewis structure is a prerequisite, the MOZYME method is limited to systems for which a Lewis structure can be defined. At present, only closed-shell systems are allowed, so while ferrocene, $\mathrm{Fe^{II}(Cp)_2}$, and crystalline potassium chromate, $\mathrm{K_2Cr^{VI}O_4}$, can be modeled, no open-shell system (e.g., $\mathrm{[Cr^{III}(H_2O)_6]^{3+}}$) can be run. Similarly, systems with extended π-conjugation cannot be treated using the MOZYME or D&C techniques because individual orbitals are delocalized across the boundaries between subsystems or cannot be localized.

8.2 PARAMETERIZATION

Many of the equations used in semiempirical methods contain adjustable parameters. Within the broad family of NDDO45,46 methods, the main difference between the various methods lies in the values of these parameters. Provided that the set


of approximations is sufficiently flexible and physically realistic, the accuracy of a semiempirical method depends on precisely two quantities: the accuracy and range of the reference data used in determining the values of the parameters and the thoroughness of optimization of the parameters.

8.2.1 Data

The set of reference data used in parameterization must satisfy several criteria: It obviously must be as accurate as possible, it must represent a wide range of chemical systems and properties, and it must be manipulated easily by the parameter optimization program. Several useful collections of reference data are available, such as the NIST databases of atomic energy levels,47 reference heats of formation,48 and atomic47 and molecular ionization potentials,49 and the Cambridge Structural Database50 for molecular geometries. Despite the large amount of available experimental reference data, important gaps or deficiencies exist. For the organic elements C, H, N, and O, this is not a problem, but for less popular elements, particularly transition metals, such as Sc and Tc, there is a paucity of reliable reference data. Where data are missing or are incomplete, the few data that do exist can be augmented by using reference data generated from the results of high-level (i.e., highly accurate) theoretical calculations. Of course, since the objective of a semiempirical method is to model the real world, great care must be taken to maximize confidence in the accuracy of all calculated reference data. In the most recent parameterization, the training set consisted of over 10,000 individual data representing over 9000 separate species.

8.2.2 Parameterization Techniques

Although parameterization might initially appear to be a complicated process, in principle it is really very simple51: Given a set of reference data, $x^{\mathrm{ref}}$, and a set of adjustable parameters, $P_i$, the values of the parameters are modified so as to minimize the root-mean-square difference between the data predicted and the reference data. That is, given

$$S = \sum_i \left(x_i - x_i^{\mathrm{ref}}\right)^2 \tag{8.21}$$

parameters are considered optimized when $\partial S/\partial P_i = 0$ and $\partial^2 S/\partial P_i^2 > 0$ for all parameters. The first step is to take all the various reference data (dipole moments, bond lengths, heats of formation, etc.) and render them dimensionless, so that they can be manipulated using standard mathematical tools. Default weighting factors for this operation are shown in Table 8.2. In the early days of parameter optimization, making decisions regarding the initial values for the various parameters for the different elements was difficult52; in that groundbreaking work, there was no precedent to refer to. A real risk at that time was that an incorrect choice could result in the parameters converging


TABLE 8.2 Weighting Factors for Reference Data

Reference Data          Weight
ΔHf0                    1.0 mol · kcal⁻¹
Bond length             0.7 Å mol · kcal⁻¹
Angle                   0.7 Å mol · kcal⁻¹
Dipole                  20 debye⁻¹
Ionization potential    10 V⁻¹

on a false minimum. This risk was not hypothetical; computers available in the 1970s were much less powerful than now and only a small number of reference data could be used in a parameter optimization. This increased the probability that spurious minima might be encountered. Over time, and by dint of hard work, these issues were resolved, and now, more than 30 years later, there is a wealth of knowledge of suitable starting values for parameter optimization.

8.2.3 Methods and Hamiltonians

In ab initio work, different methods (e.g., Hartree–Fock and density functional) can be defined using quantum mechanical terms such as the one- and two-electron operators and instantaneous correlation. These terms are a natural consequence of the underlying quantum theory. Within a given method, a balance can be struck between computational effort and accuracy. In part, this is achieved by the choice of basis set—a small set would give rise to a faster but less accurate method, and vice versa. Ab initio methods are thus defined by two quantities: the method and the basis set. The NDDO-based semiempirical methods, on the other hand, use similar sets of approximations and are best distinguished by the values of the parameters. Minor differences do exist in the approximations, with most of these having to do with the core–core terms. Thus, the oldest NDDO method, MNDO,1 had the simplest core–core term; AM1,53 PM3,54,55 and RM156 had terms added to mimic the van der Waals attraction; and in PM634 diatomic parameters were used. These changes were the results of attempts to make the set of approximations more realistic. That the main difference between the methods lies in the values of the parameters can be readily shown. If the original MNDO set of approximations were used and the parameters for H, C, N, and O were reoptimized using modern reference data and modern optimization techniques, the accuracy of the resulting method would be significantly higher than that of the original MNDO method. This is not to disparage the quality of parameterization in MNDO (when it was first developed, it represented a large improvement over even older methods); rather, it demonstrates how the accuracy of methods can be increased as the quality of parameter optimization improves. NDDO methods are best defined by the set of approximations and the set of parameters. This definition is easily seen to be necessary: If the set of parameters

MNDO-LIKE SEMIEMPIRICAL MOLECULAR ORBITAL THEORY

is not specified, the three methods AM1, PM3, and RM1, methods of very different accuracies, would become indistinguishable.

8.2.3.1 MNDO First published in 1977, MNDO1,52 is the oldest of the NDDO methods. At that time it represented a large increase in accuracy over the then-popular MINDO/3.57 There were two reasons for this increase in accuracy: For the first time, a semiempirical method could represent the lone-pair/lone-pair interaction of the type found in hydrazine and in hydrogen peroxide (hitherto, such interactions had simply been ignored), and, also for the first time, reference data based on experimental results for molecular systems were used in the parameter optimization. Parameters for H, C, N, and O were optimized using data on 34 compounds. The much-increased accuracy of MNDO resulted in its becoming instantly popular. But as it was applied to more and more species, various systematic errors became apparent, the most serious of these being the almost complete absence of a hydrogen bond.

8.2.3.2 AM1 Hydrogen bonds are much weaker than covalent bonds and can best be represented by three terms: an electrostatic term, a covalent term, and a third term variously called the instantaneous correlation, dispersion, or van der Waals (VDW) interaction. MNDO included the electrostatic and covalent terms, but not the VDW term. To mimic the effect of the VDW term, during the development of AM1 the core–core interaction in MNDO was modified by the addition of simple Gaussian functions to provide a weak attractive force. This extra stabilization allowed hydrogen bonds to form. Parameters for H, C, N, and O were again optimized, now using a larger set of reference data, and the resulting AM1 method was published in 1985.53 Over the following few years, parameters were optimized for many more main-group elements. Each new element was parameterized without changing the parameters for the original AM1 elements.
This resulted in a piecemeal method: the values of the parameters depended on the sequence in which the parameterizations were done. At the time the parameters in the AM1 method were being optimized, two different philosophical approaches were explored. One, advocated by Michael Dewar, was to guide the progress of the optimization by using chemical knowledge. At the same time, by carefully selecting the reference data used in the parameterization, the size of the training data set could be kept to a minimum. The quality of such a method could then be determined by its accuracy and predictive power; that is, the ability of the method to predict the properties of systems not used in the training set. As Dewar had an encyclopedic knowledge of this field, this approach had obvious merit. The other approach, advocated by one of us (J.S.), was to provide the parameter optimization procedure with a wide range of reference data, in the hope that if enough data were provided, the rules of chemistry would be implicitly provided to the parameter optimization. In the development of AM1, the first of these two approaches was used.

8.2.3.3 PM3 In contrast to the approach used in AM1, a large amount of reference data was used in the training set for the development of PM3.54,55 In

PARAMETERIZATION

the initial parameter optimization, parameters for 12 elements, H, C, N, O, F, Al, Si, P, S, Cl, Br, and I, were optimized simultaneously. Also, in contrast to the development of AM1, no external constraints based on chemical experience were applied. When PM3 was completed, it was found that the average errors for common properties such as heats of formation were lower than those in AM1, but the troubling question of the predictive power of PM3 versus AM1 became more difficult to answer. Possibly because of this, although PM3 was widely used, it was never as widely used as AM1. PM3 was soon extended to include most,58 and ultimately all,59 of the main group. As with AM1, the later parameterizations were carried out using fixed values for the elements that had previously been parameterized.

In the initial PM3 work, parameters for all 12 elements were optimized simultaneously, thus eliminating any error due to undesired restrictions on the values of the parameters. At the same time, the training set increased in both size and quality. Each entry in it was checked for consistency with the other data. Errors due to incomplete parameterization and inconsistent reference data were minimized. Despite all this, the average unsigned error in the heat of formation remained stubbornly and unacceptably large.

8.2.3.4 PM6 In 2000, in an attempt to improve the accuracy of a method for modeling systems containing molybdenum, Voityuk and Rösch25 proposed using diatomic core–core parameters. This modification was tested using various pairs of elements in the first PM3 set. In every case, the average error decreased. The next step was obvious: to replace the original MNDO core–core term with a simple function that used diatomic parameters. A few other minor modifications were made to the core–core term, mainly to cater for highly specific interactions such as the acetylenic triple bond.
Parameters for the whole of the main group, plus Zn, Cd, and Hg (three elements that behave like main-group elements), 42 elements in all, were then optimized simultaneously. This was followed by the remaining 27 transition metals of periods 4, 5, and 6, and the fourteenth lanthanide, Lu. Two other approaches had been considered, but these were not completed (PM4) or not published (PM5), so the new method was named PM6.

A reasonable question to ask is: How does the accuracy of PM6, the most recent semiempirical method, compare with standard ab initio methods? This can best be answered by comparing standard quantities. In PM6, the accuracy of prediction of heats of formation of common organic compounds is somewhat better than that of B3LYP DFT calculations using the 6-31G(d) basis set,60 which in turn is significantly better than that of Hartree–Fock using the same basis set. Unfortunately, the heat of formation, ΔHf°, is the only property for which PM6 is superior to B3LYP; for geometries it is somewhat worse, and for ionization potentials and dipole moments (purely electronic properties) it is significantly worse.

There is a reason for this initially surprising high accuracy relative to standard ab initio HF and DFT methods, methods that require considerably more computational effort than PM6. Semiempirical methods are parameterized to reproduce experimental reference data, which by definition take into account

all possible phenomena. Many of these phenomena (e.g., instantaneous correlation) are extremely difficult to calculate ab initio, but in semiempirical methods their effects are simply absorbed into the values of the parameters, and, in turn, when the methods are used in modeling chemical systems, the effects are reproduced. This benefit comes at a price: In semiempirical methods, each atomic basis set is normally referred to by using the standard principal quantum number (PQN), but because the associated parameters are optimized using experimental data, the basis set cannot strictly be identified with a specific PQN. Instead, it represents the blend of atomic functions that most precisely reproduces the phenomena observed. A result of this is that the theoretical underpinnings of semiempirical methods cannot, and should not, be compared with those of ab initio methods.

8.2.3.5 AM1* AM1*61–66 provides an interesting contrast to PM6. In AM1*, d-orbitals were added to various elements that had previously been parameterized at the AM1 level, but the original AM1 parameterization was retained for the elements H, C, N, O, and F. Using the original AM1 parameters for these elements obviously limits its ultimate accuracy. Unlike other methods, where the objective was to increase accuracy, the motivation for the development of AM1* was an exploration of the role of the training data and development of a strategy for increasing the robustness or predictive power. To this end, training data calculated using DFT or ab initio techniques were used extensively to supplement the experimental data available. Also in contrast to PM6, the “chemical intuition” approach was used to provide a “reasonable” parameterization. The resulting method performs very similarly to PM6 in terms of its overall statistics. AM1* is usually statistically better than PM6 for its own training data, but usually not for the PM6 training data set.
This is expected for local parameterizations, especially so for cases in which it is impossible to use an independent validation data set because of the lack of experimental data. Together, PM6 and AM1* provide an opportunity to validate results by comparing the results of the two methods, which are essentially identical quantum mechanically but were parameterized using different data and philosophies.

8.2.3.6 Methods with Orthogonalization Corrections The desirability of either explicit orthogonalization of the atomic orbitals18 or a more computationally efficient orthogonalization correction was discussed above. The latter technique has been used by Thiel and co-workers in the OMn methods. The first such method, OM1,19 introduced orthogonalization corrections to the one-electron terms within the NDDO approximation. This work was extended to include two-center corrections and the use of effective core potentials in place of the frozen-core approximation in OM2.20 The faster OM3 method22 neglects some of the expensive, but less important, terms included in OM2. The benefits of orthogonalization corrections lie predominantly in improved performance in reproducing relative conformational energies in, for example, peptides.21 OM2 combined with a multireference configuration-interaction technique performs extremely well for excited states (see below).67

8.2.3.7 Other Hamiltonians Over the past 30 years, several avenues for improving semiempirical methods have been explored. In each instance there were good reasons to believe that the proposed change would be beneficial. Sometimes this was true; other times the proposed benefit did not materialize or there were competing factors that militated against the change being adopted. Some of the more important ideas that were examined will now be described.

MNDOC An increase in accuracy would be expected if correlation effects were included in semiempirical methods such as MNDO. This principle was examined by Thiel68 in 1982, when parameters for H, C, N, and O were optimized using a modification of MNDO in which a perturbational correction for electron correlation was included explicitly. Whereas the results obtained using the new method, MNDOC, were better than for stand-alone MNDO, the computational effort was significantly larger, and MNDOC was not widely used.

MNDO/d In its original form, MNDO was limited to an sp basis set. This obviously constrained its use to modeling normal-valent systems; the study of hypervalent species such as H2S(VI)O4 and P(V)Cl5, which occur frequently in normal chemistry, was precluded. During chemical reactions, many main-group elements expand their valency temporarily to form extra bonds with ligands; such phenomena could not be modeled using MNDO. In 1992, Thiel and Voityuk2 added d-orbitals to some elements, and in 1996 demonstrated6 that this resulted in a significant increase in accuracy, particularly in reducing the average unsigned errors (AUE) in ΔHf°. The new method involved optimizing parameters for several elements that could be hypervalent, but did not involve reoptimizing those for the other MNDO elements. As such, it was a piecemeal approach. Nevertheless, the demonstration was convincing, and all subsequent methods used Thiel and Voityuk’s multipole formalism for the integrals involving d-orbitals.
SAM1 While modifications to the core–core repulsion function have resulted in large improvements in accuracy, another function, the electron repulsion integral (ER), should also be regarded as a candidate for examination. Various forms of the ER were examined, and parameters for H, C, N, and O were optimized. When it was published in 1993, the new method, SAM1,69 was shown to be more accurate than the then-current methods AM1 and PM3. It is unfortunate that no further work has been reported on this topic: If the improvements resulting from modifying the ER approximation are real, and there is no reason to doubt that, there is a high probability that further work on modifying the ER term would result in significant improvements over current methods.

PDDG As just mentioned, a computationally inexpensive way to reduce error in NDDO-type methods is by modification of the core–core term. In MNDO itself, the analytic expression ZAZB/RAB had been replaced by an approximation that took into account the long-range electron–nuclear attraction and

electron–electron repulsion terms. The core–core term had been further modified in AM1 and PM3, and in PDDG, Jorgensen et al. explored the effects of using a pairwise distance-directed Gaussian modification.70 At the heart of the PDDG method is a modification of the core repulsion function, the modification being the addition of the following term:

\[
\mathrm{PDDG}(A,B) = \frac{1}{n_A + n_B} \left\{ \sum_{i=1}^{2} \sum_{j=1}^{2} \left( n_A P_{Ai} + n_B P_{Bj} \right) \exp\left[ -10 \left( R_{AB} - D_{Ai} - D_{Bj} \right)^2 \right] \right\} \tag{8.22}
\]

where $n_A$ is the number of valence electrons on atom A, and $P_{Ai}$ and $D_{Ai}$ are parameters. As with SAM1, the PDDG method resulted in an increase in accuracy over AM1 and PM3.
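
As an illustration only, the pairwise term of Eq. (8.22) can be evaluated in a few lines of Python. The function name `pddg_term`, its argument names, and the numbers in the usage note are hypothetical and are not taken from the PDDG publications. Distances are assumed to be in ångströms, so that the factor of 10 would carry units of Å⁻²; the text above does not state the units, so this is an assumption.

```python
import math


def pddg_term(r_ab, n_a, n_b, p_a, p_b, d_a, d_b):
    """Evaluate the pairwise PDDG term of Eq. (8.22) for one atom pair.

    r_ab     : interatomic distance R_AB (assumed to be in angstroms)
    n_a, n_b : numbers of valence electrons on atoms A and B
    p_a, p_b : the two P parameters of each atom, e.g. (P_A1, P_A2)
    d_a, d_b : the two D parameters of each atom, e.g. (D_A1, D_A2)
    """
    if len(p_a) != 2 or len(d_a) != 2 or len(p_b) != 2 or len(d_b) != 2:
        raise ValueError("each atom needs exactly two P and two D parameters")

    total = 0.0
    # Double sum over i = 1, 2 (atom A) and j = 1, 2 (atom B).
    for p_ai, d_ai in zip(p_a, d_a):
        for p_bj, d_bj in zip(p_b, d_b):
            weight = n_a * p_ai + n_b * p_bj
            gaussian = math.exp(-10.0 * (r_ab - d_ai - d_bj) ** 2)
            total += weight * gaussian

    # Normalize by the total number of valence electrons on the pair.
    return total / (n_a + n_b)
```

With the made-up values n_A = n_B = 4, P_A = (1, 0), P_B = (0, 0), all D parameters equal to 0.5, and R_AB = 1.0, every Gaussian equals 1 and the term evaluates to 1.0. Each Gaussian is centered at D_Ai + D_Bj, so the correction acts only over a narrow range of distances around those values.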

RM1 A convincing demonstration of the importance of training set and parameterization is provided by RM1.56 Starting with the AM1 method, and without making any change to the formalism, parameters for H, C, N, O, P, S, F, Cl, Br, and I were reoptimized. The AUE for heats of formation dropped to about half of that for AM1, and for dipole moments the accuracy exceeded that of PM6.
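
The average unsigned error (AUE) used to compare these methods is the mean of the absolute differences between calculated and reference values. A minimal sketch follows; the function name and the sample numbers in the usage note are hypothetical and are not data from any of the cited parameterizations.

```python
def average_unsigned_error(calculated, reference):
    """Return the mean absolute difference between calculated and reference values."""
    if len(calculated) != len(reference):
        raise ValueError("calculated and reference must have the same length")
    if len(calculated) == 0:
        raise ValueError("at least one data point is required")

    # Sum |calculated - reference| over all entries, then divide by the count.
    total = sum(abs(c - r) for c, r in zip(calculated, reference))
    return total / len(calculated)
```

For example, deviations of +1.0, -2.0, and +3.0 kcal/mol from the reference values give an AUE of 2.0 kcal/mol. Because the sign of each deviation is discarded, positive and negative errors cannot cancel.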

8.3 NATURAL HISTORY OR EVOLUTION OF MNDO-LIKE METHODS

The evolution of NDDO methods has followed a completely logical course. When it first appeared, MNDO represented a large improvement over the earlier purely atom-based method, MINDO/3. This improvement was due to the more sophisticated set of approximations and to the use of molecular reference data. Only after it had been used for a while did severe errors in MNDO become apparent, the most important of these being the almost complete lack of a hydrogen bond. This deficiency contained within it an indication of the direction for further improvement: to add a term to represent the hydrogen bond. Still using a small set of reference data, parameters for H, C, N, and O were reoptimized; this resulted in AM1. A consequence of piecemeal parameterization of AM1, in which the first elements parameterized were not reoptimized when more elements were added, was that the final set of parameters was by no means optimal. An obvious next step to correct this was to investigate the consequences of optimizing many elements simultaneously using large amounts of reference data. This gave rise to PM3. No further reduction in error could be achieved by better parameterization or better reference data, so the focus turned to the third and last possible cause of error: the set of approximations used. The core–core terms were modified

to include diatomic parameters, and a reparameterization involving the entire main group resulted in a dramatic drop in AUE for heats of formation. The new method was named PM6. Each modification addressed a definite fault in the earlier method and resulted in a significant improvement in accuracy. This sequence of incremental improvement is both clear and simple and the overall effect is a natural evolution in the direction of increased accuracy. As the accuracy improves, various faults in any given method that were hidden by much more severe errors in earlier methods become apparent, and these can then be addressed. There is every indication that this sequence will continue far into the future.

As just mentioned, the most recent method, PM6, represents a large improvement over PM3. Nevertheless, soon after it was released, errors that were masked by the relatively large errors in PM3 became apparent, the most important of these being a bias in favor of zwitterions instead of neutral biochemical species. It is likely that such errors had existed in earlier methods, but they only became obvious in PM6. In principle, correcting such an error is straightforward: appropriate reference data are added to the training set and the parameterization is rerun. In practice, such operations are time consuming, as checks have to be run to ensure that none of the previous gains made are compromised.

8.3.1 Strengths of MNDO-like Methods

The most recent methods developed from the MNDO line, PM6 and AM1*, are particularly useful, that is, accurate, in modeling the structural and thermochemical properties of a broad swath of ordinary chemistry, particularly biochemical systems. However, like the earlier methods, their accuracy is much reduced when they are used for modeling exotic systems, such as transition states, electronic excited states, high-energy systems such as radicals, and solids with low or zero bandgaps, such as metals. For such systems, ab initio methods still reign supreme. In part, this reflects the emphasis or bias imposed on the parameterization: Since one of the objectives of the development of PM6 was to focus on systems of biochemical interest, it is not surprising that it is particularly suitable for modeling such systems. This accuracy comes at a price: A direct consequence of the increased emphasis on ordinary chemistry is the inability to model exotic systems accurately. AM1* provides some contrast because of the conscious attempt to represent “more chemistry” in its parameterization. Once again, the dominant effect of the training data on determining the range of applicability of a semiempirical molecular orbital method cannot be overemphasized. Nevertheless, MNDO-like methods as a general class have important strengths that have tended to be forgotten since the rise of DFT techniques. We outline some of these below.

8.3.1.1 Correlation in MNDO-like Methods As outlined in Section 8.1, MNDO-like methods are based on the LCAO-SCF approximations. They do not, therefore, explicitly include electron correlation. However, in an analogy

to DFT that is often overlooked, dynamic correlation is included implicitly in MNDO-like techniques. This is achieved through parameterization (experimental results clearly include correlation) and through scaling of the two-electron integrals so that they are correct at the one-center limit (i.e., at RAB = 0). Perhaps the best known pre-MNDO scaling scheme is that of Klopman–Ohno.71,72 In MNDO1 this scaling is achieved by constructing the multipoles used to calculate the two-electron integrals so that they give the correct values at RAB = ∞ and at the one-center limit. The values at the one-center limit are determined by fitting to atomic spectra using Oleari’s method.73,74 This restriction was relaxed when PM3 was introduced54 and the one-center two-electron repulsion integrals were treated as variable parameters. The result of this integral scaling is similar to that of treating electron correlation using a functional of the density in DFT. Dynamic correlation can be treated quite effectively in this fashion, and the implicit consideration of dynamic correlation in MNDO-like methods has important consequences for configuration-interaction (CI) calculations on excited states, as discussed below.

8.3.2 One-Electron Properties

One-electron properties,75 in this case primarily the molecular electrostatic potential and field and electrostatic and transition moments, are generally reproduced very well by MNDO-like methods, almost independent of the particular Hamiltonian being used. As an example, we can think of the molecular electrostatic potential (MEP), which has been shown to be a dominant factor in determining intermolecular interactions.76 The MNDO formalism offers a convenient model for representing the electrostatics of molecules because we can derive an atom-centered multipole model77 (up to quadrupoles) directly from the MNDO multipole model for the two-electron integrals.1 Using the AM1* Hamiltonian,61–66 for a small test set of diverse molecules, standard deviations between AM1* multipole MEPs at points on the isodensity surfaces of the molecules and those calculated at the same points using MP2/6-31G(d) or B3LYP/6-31G(d) were only on the order of 2 kcal mol−1 if a simple linear scaling factor was used. This observation has significant consequences for many branches of chemistry. It means, for example, that we can happily use MNDO-like methods to calculate solvation energies using polarizable continuum methods because the electrostatics of the molecules are correct. Further examples are given below for the use of transition moments in ensemble models.

8.3.3 Excited States

Semiempirical molecular orbital techniques were used very early to investigate excited states and to predict spectra. The early π-only Pople–Pariser–Parr technique78 was quite successful in predicting ultraviolet/visible spectra.79 Later, the specially parameterized INDO/S technique,80 which used CI calculations limited to single excitations, became the method of choice for calculating spectra of organic and inorganic molecules.81 In the late 1990s, INDO/S

allowed calculation of the excited states of systems as large as a bacteriochlorophyll hexadecamer with 704 atoms, more than 2000 electrons, and a CI expansion of 4096 symmetry-selected configurations.82 Semiempirical CI calculations are not limited to INDO/S. Even “general purpose” methods such as AM1 give surprisingly good results for predicting absorption and fluorescence spectra and nonlinear optical (NLO) properties.83,84 It is probably fair to say that semiempirical CI calculations can give agreement with experimental excitation energies similar to that of current standard time-dependent DFT (TDDFT) methods, although the latter clearly have considerable potential for improvement. Multireference semiempirical techniques can provide remarkably accurate results when used with an orthogonalization correction and are eminently suitable for geometry optimizations on excited states.67 One major advantage of semiempirical CI calculations is that they are computationally very efficient, so that we can afford to perform tens of thousands of calculations on snapshots from classical molecular-dynamics simulations. This is the basis of the ensemble model, which has been used to simulate fluorescence resonant energy transfer (FRET) in proteins85 and field-dependent second-harmonic generation by a dye embedded in a biological membrane.86 Such applications demonstrate the real potential and one of the most promising areas of application for MNDO-like methods.

8.4 LARGE SYSTEMS

By large systems we mean both very large molecules and large databases of smaller molecules. Semiempirical molecular orbital methods are useful for the former because of their potential linear scaling. Their inherent speed makes them the ideal choice for both applications.

8.4.1 Databases

Because of their ability to deliver accurate geometries, energies, and one-electron properties, semiempirical MO methods are ideally suited for providing extra information about, for example, druglike molecules.87 It is important to emphasize that the all-important76 molecular electrostatic potential (MEP) is reproduced very poorly by the atomic monopoles commonly used in force fields. The MEP calculated from an atomic-monopole model may even be so much in error as to preclude important intermolecular bonding effects, such as halogen bonding.88 The MEP generated from common semiempirical methods is, however, in very good agreement with that calculated by DFT or ab initio methods.77 Furthermore, semiempirical MO techniques can be used to calculate an array of local properties that describe intermolecular interactions.89 It is therefore not surprising that a complete database of 53,000 compounds was treated (the geometries of all molecules optimized) with AM1 as early as 199890 and that the entire NCI database (250,000 compounds) was processed in 2005.91 Several in-house databases of companies in the pharmaceutical industry (1 to 2 million compounds) have been treated similarly.

8.4.2 Ensemble Models

Large databases are not the only area in which very many calculations are required. The two major challenges that face computational chemistry are to represent the potential energy hypersurface of the system correctly (the Hamiltonian) and, for large flexible systems, to sample the conformational space adequately to be able to calculate thermodynamic or spectral properties of the real system (sampling). Clearly, we cannot calculate Avogadro’s number of molecules in order to simulate a mole of substance. We can, however, use the ergodic hypothesis,92,93 which basically proposes that if we sample long enough, we will obtain a distribution of conformations for a single molecule that corresponds to that of an ensemble of very many molecules. This leads to the ensemble models94 for simulating macroscopic systems. In these models, very many snapshots (instantaneous geometries of the system) are taken from one or several molecular-dynamics simulations, their properties are calculated by a suitable method (in the examples below, semiempirical CI), and the properties of the real system are calculated as the average of those of the individual snapshots. Such models have been very successful in calculating the details of FRET in the tetracycline repressor protein85 and simulating the effects of an applied potential on an NLO dye embedded in a cell membrane.86 Semiempirical CI calculations are the only techniques that can provide the necessary accuracy and throughput for such applications.

8.4.3 Proteins

Linear scaling techniques have made the calculation of protein properties (structure, energetics, interactions) possible with quantum mechanical techniques. Previously, such calculations were impractical. In part, this was because the computational effort required in solving the SCF equations had limited the size of the systems to just a few hundred atoms; this meant that only the smaller proteins, such as crambin, could be studied. More important, weak interatomic interactions such as those found in hydrogen bonds and π–π stacking were poorly represented by the “fast” quantum mechanical techniques (semiempirical and DFT). As interactions of this type are important in proteins, this fault cast doubt on any predicted results. But now, with the development of linear scaling methods, the properties of proteins containing up to 15,000 atoms can be modeled; fewer than 13% of all entries in the Protein Data Bank95 are larger than that, and with the advent of PM6, weak interactions of the type found in proteins can also be reproduced with unprecedented accuracy using semiempirical MO theory. These developments have resulted in the ability to model protein chemistry with relative ease; using PM6 and the linear scaling function MOZYME, the properties of over 40 proteins were modeled using a simple desktop computer.96 Among these properties are structure (albeit starting from the PDB geometry), heat of formation, transition states for enzyme-catalyzed reactions, and elastic modulus for structural proteins. The more general problem of de novo predicting protein structure is still unsolved.

D&C methods were the first to be used for calculations on moderately sized proteins, both with97 and without98 solvent effects simulated using the Poisson–Boltzmann equation. Both AM1 and PM3 have proven to be useful in distinguishing between native and misfolded protein structures.99 The more recent PM6 technique in combination with the LMO linear scaling approach has proven to be very useful for studying proteins.96 Many phenomena in proteins can be modeled with good accuracy using PM6, but significant limitations remain. The long-standing fault of semiempirical methods, that predicted barrier heights for covalent reactions are of low accuracy, still exists in PM6. Another fault is that despite the improvements in modeling weak interactions, intermolecular interactions of the type that occur when a substrate binds to a protein are also poorly reproduced. Very recent work suggests that by making simple modifications to the core–core interactions, to include100 an explicit correction for hydrogen bonds involving oxygen or nitrogen, and adding in a correlation term,29 the accuracy of prediction of intermolecular interactions can be increased significantly. Thus, for the S22 data set,101 intermolecular interactions were reproduced with chemical accuracy (average unsigned error = 0.8 kcal mol−1), considerably less than the 3.4 kcal mol−1 found when PM6 was used.

REFERENCES

1. Dewar, M. J. S.; Thiel, W. J. Am. Chem. Soc. 1977, 99, 4899.
2. Thiel, W.; Voityuk, A. A. Theor. Chim. Acta 1992, 81, 391.
3. Thiel, W.; Voityuk, A. A. Theor. Chim. Acta 1996, 93, 315.
4. Thiel, W.; Voityuk, A. A. Int. J. Quantum Chem. 1994, 44, 807.
5. Thiel, W.; Voityuk, A. A. J. Mol. Struct. 1994, 313, 141.
6. Thiel, W.; Voityuk, A. A. J. Phys. Chem. 1996, 100, 616.
7. Born, M.; Oppenheimer, J. R. Ann. Phys. (Leipzig) 1927, 84, 457.
8. Schrödinger, E. Phys. Rev. 1926, 28, 1049.
9. Hartree, D. R. Proc. Cambridge Phil. Soc. 1928, 24, 89, 111, 426.
10. Fock, V. Z. Phys. 1930, 61, 126.
11. Pauli, W. Z. Phys. 1925, 31, 765.
12. Slater, J. C. Phys. Rev. 1929, 34, 1293; 1930, 35, 509.
13. Hückel, E. Z. Phys. 1931, 70, 204; 1931, 72, 310; 1932, 76, 628; 1933, 83, 632.
14. Sinanoglu, O.; Fu-Tai Tan, D. Chem. Phys. 1963, 38, 1740.
15. Clark, T.; Koch, R. The Chemist’s Electronic Book of Orbitals, Springer-Verlag, Berlin, 1999.
16. Winget, P.; Selçuki, C.; Horn, A. H. C.; Martin, B.; Clark, T. Theor. Chem. Acc. 2003, 110, 254.
17. Horn, A. H. C.; Lin, J.-H.; Clark, T. Theor. Chem. Acc. 2005, 114, 159–168; erratum: Theor. Chem. Acc. 2007, 117, 461–465.
18. Sattelmeyer, K. W.; Tubert-Brohmann, I.; Jørgensen, W. L. J. Chem. Theor. Comput. 2006, 2, 413.

19. Kolb, M.; Thiel, W. J. Comput. Chem. 1993, 14, 775.
20. Weber, W.; Thiel, W. Theor. Chem. Acc. 2000, 103, 495.
21. Mohle, K.; Hofmann, H.-J.; Thiel, W. J. Comput. Chem. 2001, 22, 509.
22. Scholten, M. Ph.D. dissertation, Heinrich-Heine-Universität, Düsseldorf, Germany, 2003.
23. Burstein, K. Y.; Isaev, A. N. Theor. Chim. Acta 1984, 64, 397.
24. Csonka, G. I.; Ángyán, J. G. J. Mol. Struct. (Theochem) 1997, 393, 31.
25. Voityuk, A. A.; Rösch, N. J. Phys. Chem. A 2000, 104, 4089.
26. Stewart, J. J. P.; Császár, P.; Pulay, P. J. Comput. Chem. 1982, 3, 227.
27. Ahlrichs, R.; Penco, R.; Scoles, G. Chem. Phys. 1977, 19, 119.
28. Grimme, S. J. Comput. Chem. 2004, 25, 1463.
29. Jurecka, J.; Cerny, J.; Hobza, P.; Salahub, D. J. Comput. Chem. 2007, 28, 555.
30. Cerny, J.; Jurecka, J.; Hobza, P.; Valdes, H. J. Phys. Chem. A 2007, 111, 1146.
31. Elstner, M.; Hobza, P.; Frauenheim, T.; Suhai, S.; Kaxiras, E. J. Chem. Phys. 2001, 114, 5149.
32. Tuttle, T.; Thiel, W. Phys. Chem. Chem. Phys. 2008, 10, 2159.
33. McNamara, J. P.; Hillier, I. H. Phys. Chem. Chem. Phys. 2007, 9, 2362.
34. Stewart, J. J. P. J. Mol. Model. 2007, 13, 1173.
35. Rinaldi, D.; Rivail, J.-L. Theor. Chim. Acta 1973, 32, 57; 1974, 32, 243.
36. Schürer, G.; Gedeck, P.; Gottschalk, M.; Clark, T. Int. J. Quantum Chem. 1999, 75, 17.
37. Martin, B.; Gedeck, P.; Clark, T. Int. J. Quantum Chem. 2000, 77, 473.
38. Eisenschitz, R.; London, F. Z. Phys. 1930, 60, 491.
39. Martin, B.; Clark, T. Int. J. Quantum Chem. 2006, 106, 1208.
40. Yang, W. Phys. Rev. Lett. 1991, 66, 1438.
41. Dixon, S. L.; Merz, K. M., Jr. J. Chem. Phys. 1997, 107, 879.
42. Ababoua, A.; van der Vaart, A.; Gogonea, V.; Merz, K. M., Jr. Biophys. Chem. 2007, 125, 221.
43. Stewart, J. J. P. Int. J. Quantum Chem. 1996, 58, 133.
44. Stewart, J. J. P. J. Mol. Model. 2009, 15, 765.
45. Pople, J. A.; Santry, D. P.; Segal, G. A. J. Chem. Phys. 1965, 43, S129.
46. Pople, J. A.; Beveridge, D. L.; Dobosh, P. A. J. Chem. Phys. 1967, 47, 2026.
47. Kramida, A. E.; Martin, W. C.; Musgrove, A.; Olsen, K.; Reader, J.; Saloman, E. B. http://physics.nist.gov/cgi-bin/ASBib1/Elevbib/search_form.cgi, 2009.
48. Afeefy, H. Y.; Liebman, J. F.; Stein, S. E. Neutral thermochemical data. In NIST Chemistry WebBook, Linstrom, P. J., and Mallard, W. G., Eds., NIST Standard Reference 69, National Institute of Standards and Technology, Gaithersburg, MD, 2003. Available at http://webbook.nist.gov/chemistry.
49. Levin, R. D.; Lias, S. G. Ionization Potentials and Appearance Potential Measurements, National Standards Reference Data Series, Vol. 71, National Bureau of Standards, Washington, DC, 1982.
50. Allen, F. H. Acta Crystallogr. B 2007, 58, 380.

REFERENCES

285

51. Stewart, J. J. P. Parameterization of semiempirical M.O. methods. In Encyclopedia of Computational Chemistry, Vol. 3, Schleyer, P. v. R., Allinger, N. L., Clark, T., Gasteiger, J., Kollman, P. A., Schaefer, H. F. S., III, and Schreiner, P. R., Eds., Wiley, Chichester, UK, 2000. 52. Dewar, M. J. S.; Thiel, W. J. Am. Chem. Soc. 1977, 99 , 4907. 53. Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. J. Am. Chem. Soc. 1985, 107 , 3902. 54. Stewart, J. J. P. J. Comput. Chem. 1989, 10 , 209. 55. Stewart, J. J. P. J. Comput. Chem. 1989, 10 , 221. 56. Rocha, G. B.; Freire, R. O.; Simas, A. M.; Stewart, J. J. P. J. Comput. Chem. 2006, 27 , 1101. 57. Bingham, R. C.; Dewar, M. J. S.; Lo, D. H. J. Am. Chem. Soc. 1975, 97 , 1285. 58. Stewart, J. J. P. J. Comput. Chem. 1991, 12 , 320. 59. Stewart, J. J. P. J. Mol. Model . 2004, 10 , 155. 60. (a) Ditchfield, R.; Hehre, W. J.; Pople, J. A. J. Chem. Phys. 1971, 54 , 724. (b) Hehre, W. J.; Ditchfield, R.; Pople, J. A. J. Chem. Phys. 1972, 56 , 2257. (c) Hariharan, P. C.; Pople, J. A. Mol. Phys. 1974, 27 , 209. (d) Gordon, M. S. Chem. Phys. Lett. 1980, 76 , 163. (e) Hariharan, P. C.; Pople, J. A. Theor. Chim. Acta 1973, 28 , 213. (f) Blaudeau, J. -P.; McGrath, M. P.; Curtiss, L. A.; Radom, L. J. Chem. Phys. 1997, 107 , 5016. (g) Francl, M. M.; Pietro, W. J.; Hehre, W. J.; Binkley, J. S.; DeFrees, D. J.; Pople, J. A.; Gordon, M. S. J. Chem. Phys. 1982, 77 , 3654. (h) Binning, R. C., Jr.; Curtiss, L. A. J. Comput. Chem. 1990, 11 , 1206. (i) Rassolov, V. A.; Pople, J. A.; Ratner, M. A.; Windus, T. L. J. Chem. Phys. 1998, 109 , 1223. (j) Rassolov, V. A.; Ratner, M. A.; Pople, J. A.; Redfern, P. C.; Curtiss, L. A. J. Comput. Chem. 2001, 22 , 976. (k) Frisch, M. J.; Pople, J. A.; Binkley, J. S. J. Chem. Phys. 1984, 80 , 3265. 61. Winget, P.; Horn, A. H. C.; Selc¸uki, C.; Martin, B.; Clark, T. J. Mol. Model . 2004, 9 , 408. 62. Winget, P.; Clark, T. J. Mol. Model . 2005, 11 , 439. 63. Kayi, H.; Clark, T. J. Mol. Model . 
2007, 13 , 965. 64. Kayi, H.; Clark, T. J. Mol. Model . 2009, 15 , 295. 65. Kayi, H.; Clark, T. J. Mol. Model . 2009, 15 , 1253. 66. Kayi, H.; Clark, T. J. Mol. Model . 2010, 16 , 29. 67. Koslowski, A.; Beck, M. E.; Thiel, W. J. Comput. Chem. 2003, 24 , 714–726. 68. Thiel, W. Quantum Chemistry Program Exchange, QCPE 438, University of Indiana, Bloomington, IN, 1982. 69. Dewar, M. J. S.; Jie, C.; Yu, J. Tetrahedron 1993, 49 , 5003. 70. Repasky, M. P.; Chandrasekhar, J.; Jørgensen, W. L. J. Comput. Chem. 2002, 23 , 1601. 71. Klopman, G. J. Am. Chem. Soc. 1964, 86 , 4550. 72. Ohno, K. Theor. Chim. Acta 1964, 3 , 219. 73. Oleari, L.; DiSipio, L.; DeMichelis, G. Mol. Phys. 1966, 10 , 97. 74. Dewar, M. J. S.; Lo, D. H. J. Am. Chem. Soc. 1972, 94 , 5296. 75. See Karplus, M.; Kuppermann, A.; Isaacson, L. M. J. Chem. Phys. 1958, 29 , 1240.

286

MNDO-LIKE SEMIEMPIRICAL MOLECULAR ORBITAL THEORY

76. Murray, J. S.; Politzer, P. J. Mol. Struct . (Theochem) 1998, 425 , 107; Murray, J. S.; Lane, P.; Brinck, T.; Paulsen, K.; Grince, M. E.; Politzer, P. J. Phys. Chem. 1993, 97 , 9369. 77. Horn, A. H. C.; Lin, J.-H.; Clark, T. Theor. Chem. Acc. 2005, 114 , 159; erratum: Theor. Chem. Acc. 2007, 117 , 461. 78. Pariser, R.; Parr, R. G. J. Chem. Phys. 1963, 21 , 466. 79. See, e.g., Griffiths, J. Dyes Pigments 1982, 3 , 211. 80. Ridley, J.; Zerner, M. C. Theor. Chim. Acta, 1973, 32 , 111. 81. Zerner, M. C. In Reviews of Computational Chemistry, Vol. 2, Lipkowitz, K. B., Ed., VCH, New York, 1991, p. 313. 82. Cory, M. G.; Zerner, M. C.; Hu X.; Schulten, K. J. Phys. Chem. B 1998, 102 , 7640. 83. Clark, T.; Chandrasekhar, J. Israel J. Chem. 1993, 33 , 435. 84. G¨oller, A.; Grummt, U. W. Int. J. Quantum Chem. 2000, 77 , 727. 85. Beierlein, F. R.; Othersen, O. G.; Lanig, H.; Schneider, S.; Clark, T. J. Am. Chem. Soc. 2006, 128 , 5142. 86. Rusu, C.; Lanig, H.; Clark, T.; Kryschi, C. J. Phys. Chem. B 2008, 112 , 2445. 87. Clark, T. In Molecular Informatics: Confronting Complexity, Hicks M. G., and Kettner C., Eds., Logos Verlag, Berlin, 2003, p. 193. 88. Politzer, P.; Murray, J. S.; Concha, M. J. Mol. Model . 2008, 14 , 659. 89. Clark, T.; Byler, K. G.; de Groot M. J. In Molecular Interactions: Bringing Chemistry to Life, Hicks M. G., and Kettner C., Eds., Logos Verlag, Berlin, 2008, p. 129. 90. Beck, B.; Horn, A. H. C.; Carpenter, J. E.; Clark, T. J. Chem. Inf. Comput. Sci . 1998, 38 , 1214. 91. Murray-Rust, P.; Rzepa, H. S.; Stewart J. J. P.; Zhang, Y. J. Mol. Model . 2005, 11 , 532. 92. Boltzmann, L. Einige allgemeine S¨atze u¨ ber das W¨armegleichgewicht , Vienna, Austria, 1871. 93. Boltzmann, L. Creeles J . 1884, 98 , 68. 94. Lee, M.; Tang, J.; Hochstrasser, R. M. Chem. Phys. Lett. 2001, 344 , 501. 95. http://www.pdb.org/, Research Collaboratory for Structural Bioinformatics, The San Diego Supercomputer Center, San Diego, CA, 2007. 96. Stewart, J. J. P. J. Mol. Model . 
2008, 15 , 765. 97. Gogonea, V.; Merz, K. M., Jr. J. Phys. Chem. A 1999, 103 , 5171. 98. For a review, see van der Vaart, A.; Gogonea, V.; Dixon, S. L.; Merz, K. M., Jr. J. Comput. Chem. 2000, 21 , 1494. 99. Wollacott, A. M.; Merz, K. M., Jr. J. Chem. Theor. Comput. 2007, 3 , 1609. 100. Rezac, J.; Fanfrlik, J.; Salahub, D.; Hobza, P. J. Chem. Theor. Comput . 2009, 5 , 1749. 101. Jurecka, P.; Sponer, J.; Cerny, J.; Hobza, P. Phys. Chem. Chem. Phys. 2006, 8 , 1985.

9

Self-Consistent-Charge Density Functional Tight-Binding Method: An Efficient Approximation of Density Functional Theory

MARCUS ELSTNER and MICHAEL GAUS
Institute of Physical Chemistry, Universität Karlsruhe, Karlsruhe, Germany; Institute for Physical and Theoretical Chemistry, Technische Universität Braunschweig, Braunschweig, Germany

In this chapter we describe the derivation of the approximate DFT method SCC-DFTB from DFT. The basic formalism of SCC-DFTB results from a second-order expansion of the DFT total energy, followed by appropriate approximations. The formal basis of SCC-DFTB is the non-self-consistent Harris functional. We discuss the performance of SCC-DFTB as well as recent extensions such as the inclusion of third-order terms and van der Waals corrections.

9.1 INTRODUCTION

Most semiempirical (SE) methods are derived from either Hartree–Fock (HF) theory or density functional theory (DFT) by applying two types of approximations. First, they are based primarily on a minimal atomic orbital-like basis set. Second, the numerous integrals that have to be evaluated in HF and DFT are partially neglected, and the remaining ones are either calculated using further approximations or replaced by parameters, which in turn are fitted to reproduce experimental data. As a result, no integrals have to be evaluated during the runtime of the program, and the dominant computational cost is given by the diagonalization of the Fock (Hamilton) matrix. Since this matrix is represented in a minimal atomic basis set, solution of the eigenvalue problem is much less expensive than for full DFT and HF methods, which usually

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


apply more extended basis sets. Typically, SE methods are about three orders of magnitude faster than HF/DFT methods using double-zeta basis sets. They exhibit $O(N^3)$ scaling; that is, the computing time increases cubically with the system size (which is roughly proportional to the number of atoms or, more correctly, to the number of electrons $N$). Since DFT also scales as $O(N^3)$, the factor of 1000 gained in computational speed with respect to DFT means that about 10-fold larger systems can be treated. For example, today about 100 atoms can be handled by DFT on standard desktop PCs, while roughly 1000 atoms can be treated using SE methods. The bottleneck here is the diagonalization of the Fock–Hamilton matrix, and methods that avoid this step, such as $O(N)$ scaling algorithms,1 help to increase the system size dramatically, as discussed in Chapters 2 and 8.

However, in many cases the system size is not the limiting issue. Chemistry often occurs in localized regions, and the “active site” of interest often contains only a few tens to 100 atoms; that is, a quantum mechanical (QM) treatment is needed only for this small subsystem, which is often the case in biological systems. The remainder of the system can be treated by empirical potentials [molecular mechanics (MM)]. A combination of QM methods with MM force fields in QM/MM methods can now be applied routinely (for recent comprehensive reviews, see, e.g., Refs. 2 and 3). A major issue, however, is the time scale that can be reached using molecular dynamics (MD) simulations. HF and DFT make it possible to follow the system dynamics (for several tens of atoms) in the picosecond regime. In this case, the factor of 1000 gained in computational speed by SE methods allows for 1000-fold longer MD simulations (i.e., the nanosecond time scale is easily accessible). In many applications, this helps to follow the relevant conformational changes or, more important, to compute free-energy changes along reaction pathways.4 This is probably the main reason why SE methods have been used increasingly in recent years, although they sacrifice accuracy compared to DFT in many cases (note that this can be reversed for specific applications).

In quantum chemistry, the classical route to deriving SE methods is to start from HF theory and fit the remaining parameters (integrals) to experimental data. This approach leads to a family of SE methods, with MNDO, AM1, and PM3 being the best known. The latest and most accurate members of this family are discussed by Clark and Stewart in Chapter 8. In solid-state physics, tight-binding (TB) approaches have been used extensively to study the properties of solids and clusters,5,6 directly paralleling the development of the Hückel model in chemistry; these methods are reviewed in Chapter 10. Standard tight-binding methods are usually based on the Harris functional approach7 (i.e., they diagonalize a suitable Hamiltonian once and use this non-self-consistent solution to derive further properties, such as forces and second derivatives). The relation between DFT and TB methods has been discussed in detail by Foulkes and Haydock.8 TB methods can be understood as a stationary approximation to DFT and tend to work well when the “guess” density, which is incorporated into the predetermined Hamilton matrix, is a good approximation to the DFT ground-state density.
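The scaling argument in the introduction can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming an idealized cost model in which computing time grows exactly as a power of the system size; the function names `size_gain` and `md_length_gain` are illustrative and do not come from any SE or DFT program.

```python
def size_gain(speedup: float, scaling_exponent: float = 3.0) -> float:
    """Factor by which the tractable system size grows at fixed wall time.

    Assumes cost ~ N**scaling_exponent, so a method that is `speedup` times
    faster can treat a system larger by speedup**(1/scaling_exponent).
    """
    if speedup <= 0 or scaling_exponent <= 0:
        raise ValueError("speedup and scaling_exponent must be positive")
    return speedup ** (1.0 / scaling_exponent)


def md_length_gain(speedup: float) -> float:
    """At fixed system size, the reachable MD time grows linearly with speed."""
    return speedup


if __name__ == "__main__":
    speedup = 1000.0   # SE versus DFT, the factor quoted in the text
    dft_atoms = 100    # rough DFT limit quoted in the text
    gain = size_gain(speedup)
    print(f"size gain for O(N^3) scaling: {gain:.1f}x")
    print(f"SE system size: about {dft_atoms * gain:.0f} atoms")
    print(f"MD length gain at fixed size: {md_length_gain(speedup):.0f}x")
```

Under this model the same speedup buys either a 10-fold larger system or a 1000-fold longer trajectory, which is why the text emphasizes the time scale rather than the system size.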


SCC-DFTB is an approximate quantum chemical method that is derived from DFT by a second-order expansion of the DFT total energy with respect to density fluctuations around a suitable reference density.9 On the other hand, SCC-DFTB can be viewed as an extension of a tight-binding method, which includes charge self-consistency and is parameterized using DFT. The energy in tight-binding methods consists of two parts: electronic and repulsive. The electronic part is described by a Hamiltonian, which is usually represented in a minimal basis of atom-centered basis functions. In DFTB, this Hamilton matrix is derived from DFT using as a reference density the superposition of neutral atomic densities and a minimal basis of atomic wavefunctions, which is calculated explicitly.10–14 The repulsive energy, which consists of the DFT double-counting contributions and the core–core repulsion, can be approximated as a sum of atomic pair repulsion functions. SCC-DFTB is parameterized using the generalized-gradient approximation (GGA). In the current version the electronic parameters are calculated using the PBE functional.15 This means, however, that the well-known DFT-GGA deficiencies are inherited by SCC-DFTB. Of particular relevance are the DFT-GGA tendency to overpolarize extended π-conjugated systems,16 the problems of ionic and charge-transfer excited states,17 and the missing dispersion interactions, which have been included by augmenting SCC-DFTB with an empirical extension.18 The performance and deficiencies of SCC-DFTB with respect to biological applications have been reviewed recently,19,20 and methodological developments have been described elsewhere.21

9.2 THEORY

The derivation of SCC-DFTB starts from the DFT total energy. In a first step, we discuss the Harris functional approximation as the basis for non-self-consistent TB methods. In a second step, second-order corrections to Harris functional theory are introduced, leading after further approximations to the SCC-DFTB formalism. Finally, the remaining approximations, the performance, and possible extensions of this methodology are discussed.

9.2.1 DFT and the Harris Functional

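The DFT total energy reads

\[
E[\rho] = T[\rho] + \int v^{\mathrm{ext}}(\mathbf{r})\,\rho(\mathbf{r})\,d\mathbf{r} + \frac{1}{2}\iint \frac{\rho(\mathbf{r})\,\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' + E^{\mathrm{xc}}[\rho] + \frac{1}{2}\sum_{\alpha\beta}\frac{Z_\alpha Z_\beta}{R_{\alpha\beta}} \tag{9.1}
\]

where $\rho(\mathbf{r})$ is the electron density, $T[\rho]$ the kinetic energy of the electrons, $v^{\mathrm{ext}}$ the external potential arising from the nuclei with charge $Z$, and $E^{\mathrm{xc}}[\rho]$ is the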


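exchange-correlation energy. Application of the variational principle leads to the well-known Kohn–Sham (KS) equations,

\[
\left[-\tfrac{1}{2}\nabla^2 + v^{\mathrm{eff}}[\rho]\right]\phi_i = \varepsilon_i\,\phi_i \tag{9.2}
\]

with $v^{\mathrm{eff}}[\rho]$ being the KS effective potential, which determines the KS eigenvalues (molecular orbital energies) $\varepsilon_i$ and KS (molecular) orbitals $\phi_i$. Since $v^{\mathrm{eff}}[\rho]$ already contains the electron density, which is calculated as

\[
\rho = \sum_i |\phi_i|^2 \tag{9.3}
\]

these equations have to be solved iteratively until self-consistency is achieved. Using the Kohn–Sham energies $\varepsilon_i$, the total energy can be written22

\[
E[\rho] = \sum_i^{\mathrm{occ}} \varepsilon_i - \frac{1}{2}\iint \frac{\rho(\mathbf{r})\,\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' + E^{\mathrm{xc}}[\rho] - \int v^{\mathrm{xc}}(\mathbf{r})\,\rho(\mathbf{r})\,d\mathbf{r} + \frac{1}{2}\sum_{\alpha\beta}\frac{Z_\alpha Z_\beta}{R_{\alpha\beta}} \tag{9.4}
\]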

In the Harris-functional approach,7 an initial density $\rho^0$ is constructed as a superposition of fragment densities $\rho^0_\alpha$,

\[
\rho^0 = \sum_\alpha \rho^0_\alpha \tag{9.5}
\]

and it can be shown that the total energy can be approximated in first order as

\[
E[\rho] = \sum_i^{\mathrm{occ}} \varepsilon_i^{H} - \frac{1}{2}\iint \frac{\rho^0(\mathbf{r})\,\rho^0(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' + E^{\mathrm{xc}}[\rho^0] - \int v^{\mathrm{xc}}(\mathbf{r})\,\rho^0(\mathbf{r})\,d\mathbf{r} + \frac{1}{2}\sum_{\alpha\beta}\frac{Z_\alpha Z_\beta}{R_{\alpha\beta}} \tag{9.6}
\]

where the $\varepsilon_i^{H}$ are determined from Eq. (9.2) using $\rho^0$ instead of the true density $\rho$, which would have to be determined self-consistently by iterating Eqs. (9.2) and (9.3). Any DFT method has to be initialized by choosing a proper initial density $\rho^0$, which is usually taken as a superposition of atomic densities. As pointed out by Harris,7 the KS equations (9.2) do not have to be solved iteratively if the starting density $\rho^0$ is close to the ground-state density $\rho_G$ (introducing an error of second order in the difference density $\delta\rho = \rho - \rho^0$). This non-self-consistent solution of the KS equations is the basis of the Harris functional approach, and proper implementation boils down to the question of how to find a good starting density $\rho^0$, which has been elaborated in particular in TB theory.


9.2.2 Non-Self-Consistent TB Methods

To get started, consider a case where one already knows the ground-state density $\rho^0$ to sufficient accuracy. In this case, one can omit the self-consistent solution of the KS equations and get the orbitals immediately through

\[
\left[-\tfrac{1}{2}\nabla^2 + v^{\mathrm{eff}}[\rho^0]\right]\phi_i = \varepsilon_i\,\phi_i \tag{9.7}
\]

($\rho^0$ stands for a properly chosen input density in the following). This already saves a factor of 5 to 10; more important, it is the starting point for further approximations. Consider a minimal basis set consisting of atomic orbitals: that is, $\eta_\mu = 2s, 2p_x, 2p_y$, and $2p_z$ for first-row elements (core orbitals are usually omitted) and $\eta_\mu = 1s$ for H. With the basis set expansion

\[
\phi_i = \sum_\mu c_{\mu i}\,\eta_\mu
\]

and the Hamiltonian $\hat{H}[\rho^0] = \hat{T} + v^{\mathrm{eff}}[\rho^0]$ we find that

\[
\sum_\mu c_{\mu i}\,\hat{H}[\rho^0]\,|\eta_\mu\rangle = \varepsilon_i \sum_\mu c_{\mu i}\,|\eta_\mu\rangle \tag{9.8}
\]

Multiplication with $\langle\eta_\nu|$ leads to

\[
\sum_\mu c_{\mu i}\,\langle\eta_\nu|\hat{H}[\rho^0]|\eta_\mu\rangle = \varepsilon_i \sum_\mu c_{\mu i}\,\langle\eta_\nu|\eta_\mu\rangle \tag{9.9}
\]

or equivalently, in matrix notation,

\[
H^0 C = S C \varepsilon \tag{9.10}
\]

This means that we just have to solve the eigenvalue equation once; that is, we have to diagonalize the Hamilton matrix $H^0_{\mu\nu} = \langle\eta_\nu|\hat{H}[\rho^0]|\eta_\mu\rangle$. The superscript zero indicates that the matrix elements are evaluated using the reference density $\rho^0$. Diagonalization leads to the one-particle energies $\varepsilon_i$, that is, to the electronic energy:

\[
E^{\mathrm{elec}} = \sum_i \varepsilon_i \tag{9.11}
\]
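To make the one-shot diagonalization of Eq. (9.10) concrete, the following minimal sketch solves the generalized eigenvalue problem for the smallest nontrivial case, two basis functions with overlap. The matrix elements are hypothetical numbers chosen for illustration, not DFTB parameters, and the helper names `solve_2x2_generalized` and `closed_shell_energy` are assumptions of this sketch. Double occupation of the lowest orbital is likewise assumed when summing the eigenvalues as in Eq. (9.11).

```python
import math


def solve_2x2_generalized(h11, h22, h12, s12):
    """Eigenvalues of H C = S C eps for symmetric 2x2 H and S = [[1, s12], [s12, 1]].

    det(H - eps*S) = 0 gives the quadratic
        (1 - s12**2) eps**2 - (h11 + h22 - 2*h12*s12) eps + (h11*h22 - h12**2) = 0.
    """
    if abs(s12) >= 1.0:
        raise ValueError("overlap must satisfy |s12| < 1")
    a = 1.0 - s12 ** 2
    b = -(h11 + h22 - 2.0 * h12 * s12)
    c = h11 * h22 - h12 ** 2
    disc = math.sqrt(max(0.0, b * b - 4.0 * a * c))  # guard against rounding
    return sorted(((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)))


def closed_shell_energy(eigenvalues, n_occupied):
    """Sum of orbital energies with two electrons in each occupied orbital."""
    return 2.0 * sum(sorted(eigenvalues)[:n_occupied])


if __name__ == "__main__":
    # Hypothetical homonuclear dimer: on-site energy, hopping element, overlap.
    eps = solve_2x2_generalized(h11=-0.5, h22=-0.5, h12=-0.4, s12=0.5)
    print("orbital energies:", eps)
    print("electronic energy (2 electrons):", closed_shell_energy(eps, 1))
```

For equal on-site energies the two roots reduce to $(h_{11} + h_{12})/(1 + s_{12})$ and $(h_{11} - h_{12})/(1 - s_{12})$, which shows directly how the overlap enters the eigenvalues of a nonorthogonal scheme.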

Note that the basis set is nonorthogonal; that is, the overlap matrix $S_{\mu\nu} = \langle\eta_\nu|\eta_\mu\rangle$ appears in the eigenvalue equations. In such a scheme, the Hamilton and overlap matrix elements have to be determined. Effectively, the Hamilton matrix


elements can be fitted to reproduce properties of well-chosen benchmark systems. Goringe et al.5 and Colombo6 discuss several examples. Since the general form of the Hamilton operator is always known, fitting determines implicitly a proper starting density, as pointed out by Foulkes and Haydock.8 The overlap matrix, however, is difficult to obtain if matrix elements are not computed from first principles but are fitted to experimental data. Therefore, orthogonal TB methods are usually employed.

9.2.2.1 Orthogonal Empirical Tight Binding (ETB) or Hückel Theory

In empirical schemes, the basis functions are taken to be orthogonal (i.e., $S_{\mu\nu} = \delta_{\mu\nu}$). The background is the Löwdin orthogonalization, where we get orthonormal orbitals through

\[
\eta' = S^{1/2}\,\eta
\]

Introducing orthonormal orbitals means multiplying with $S^{-1/2}$ and inserting a “1”:

\[
S^{-1/2} H S^{-1/2}\, S^{1/2} C = S^{-1/2} S^{1/2}\, S^{1/2} C \varepsilon
\]

to get the orthonormal equations ($C' = S^{1/2} C$, $H' = S^{-1/2} H S^{-1/2}$):

\[
H' C' = C' \varepsilon
\]

Introducing orthonormal orbitals means effectively changing the Hamiltonian. This is convenient, since in empirical schemes the Hamilton matrix is completely fitted to empirical data: for example, for carbon to the solid-state band structures of several crystal structures (e.g., diamond, graphite, body-centered cubic) or, in Hückel theory, to properties of hydrocarbons.5,6

9.2.2.2 Density Functional Tight Binding (DFTB)

The derivation of parameters via fitting is a quite involved process. If one could derive the parameters from DFT calculations, one would gain much more flexibility and a simplified parameterization scheme. In a first step, one has to choose a basis set. In TB theory, basis functions are atomic orbitals $\eta_\mu$, and these can be calculated from the atomic KS equations:

\[
\left[-\tfrac{1}{2}\nabla^2 + v^{\mathrm{eff}}[\rho^{\mathrm{atom}}]\right]\eta_\mu = \varepsilon_\mu\,\eta_\mu \tag{9.12}
\]

The choice of a basis is to a large degree arbitrary, and several functional forms have been applied in quantum chemistry. Atomic orbitals have the disadvantage that they are very diffuse compared to the bonding situation in solids, molecules, or clusters, where atomic-like orbitals would be “compressed” due to interaction with the neighbors. Therefore, it is wise to use orbitals that anticipate this interaction/compression to some degree. One way to enforce this is to add


an additional (harmonic) potential to the atomic Kohn–Sham equations, which leads to compressed atomic orbitals or optimized atomic orbitals (O-LCAO), as introduced by Eschrig23:

\[
\left[-\tfrac{1}{2}\nabla^2 + v^{\mathrm{eff}}[\rho^{\mathrm{atom}}] + \left(\frac{r}{r_0}\right)^{2}\right]\eta_\mu = \varepsilon_\mu\,\eta_\mu \tag{9.13}
\]

A measure of the distance between neighbors is given by the covalent radius $r_0$ and is determined for all atoms empirically. This parameter enters the evaluation of the matrix elements and is, of course, of empirical nature. As a result of the atomic calculations, we get the orbitals $\eta_\mu$, the electron density at (the charge neutral) atom $\alpha$,

\[
\rho^0_\alpha = \sum_\mu |\eta_\mu|^2 \tag{9.14}
\]

and the overlap matrix $S_{\mu\nu} = \langle\eta_\nu|\eta_\mu\rangle$. To solve the eigenvalue problem in Eq. (9.9) or (9.10), we only need the Hamiltonian matrix. This leads to further approximations, since although we have the complete input density $\rho^0 = \sum_\alpha \rho^0_\alpha$, the Hamiltonian evaluation would be very complicated:

\[
H_{\mu\nu} = \langle\eta_\nu|\hat{H}[\rho^0]|\eta_\mu\rangle = \Big\langle\eta_\nu\Big|\hat{H}\Big[\sum_\alpha \rho^0_\alpha\Big]\Big|\eta_\mu\Big\rangle
\]

We therefore usually make the two-center approximation for $\mu \neq \nu$:

\[
H_{\mu\nu} = \langle\eta_\nu|\hat{H}[\rho^0]|\eta_\mu\rangle = \langle\eta_\nu|\hat{H}[\rho^0_\alpha + \rho^0_\beta]|\eta_\mu\rangle \tag{9.15}
\]

where the orbital $\nu$ is located on atom $\alpha$ and the orbital $\mu$ is located on atom $\beta$. The diagonal Hamiltonian elements $H_{\mu\mu} = \varepsilon_\mu$ are taken from Eq. (9.13). The two-center approximation neglects two types of integrals which contain contributions of density $\rho_\gamma$. The terms that would enter the diagonal $H_{\mu\mu}$ are crystal field terms, while the terms missing in the off-diagonal elements $H_{\mu\nu}$ are three-center terms. These approximations are discussed in detail elsewhere.24,25 As can be shown, the neglect of crystal field terms becomes more severe for short interatomic distances, which, however, may be compensated for by a properly chosen repulsive potential.25 The missing crystal field terms may also be responsible for errors in the cohesive energies for highly coordinated systems, as has been described for some bulk silicon systems.26 In the context of semiempirical MO theory, the neglect of three-center terms has been discussed as being responsible for an underestimation of rotational barriers. In DFTB, this may have a similar consequence. Rotational barriers are slightly underestimated, which manifests itself in an underestimation of vibrational frequencies of the low-lying vibrational modes.

In DFTB,10–13 $H_{\mu\nu}$ and $S_{\mu\nu}$ are tabulated for various distances between atom pairs up to 10 a.u., where they vanish (also due to compression!). For any molecular geometry, these matrix elements are read in based on the distance between two atoms and then oriented in space using the Slater–Koster sin/cos


combination rules27 (see, e.g., Ref. 6). Then the generalized eigenvalue problem (9.10) is solved and the electronic part of the energy, $E^{\mathrm{elec}}$, from Eq. (9.11) can be calculated. It should be emphasized that this is a nonorthogonal TB scheme, which is more transferable, due to the appearance of the overlap matrix.

9.2.3 Repulsive Energy $E^{\mathrm{rep}}$

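Up to now, we have only discussed the first part of the total energy in DFT in Eq. (9.6), the sum over the Kohn–Sham energies $\varepsilon_i^{H}$ as calculated in Eq. (9.11):

\[
E[\rho] = \sum_i^{\mathrm{occ}} \varepsilon_i^{H} - \frac{1}{2}\iint \frac{\rho^0(\mathbf{r})\,\rho^0(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' + E^{\mathrm{xc}}[\rho^0] - \int v^{\mathrm{xc}}(\mathbf{r})\,\rho^0(\mathbf{r})\,d\mathbf{r} + \frac{1}{2}\sum_{\alpha\beta}\frac{Z_\alpha Z_\beta}{R_{\alpha\beta}} \tag{9.16}
\]

In TB theory, the remaining terms, the DFT double-counting and core–core repulsion terms, are put together into an energy term called the repulsive energy, $E^{\mathrm{rep}}$, so that the TB total energy reads

\[
E^{\mathrm{TB}}[\rho] = \sum_i^{\mathrm{occ}} \varepsilon_i^{H} + E^{\mathrm{rep}}[\rho] \tag{9.17}
\]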

First, it is interesting to note that the double-counting terms depend on the input/reference density $\rho^0$ only. If we introduce the atomic density decomposition, $\rho^0 = \sum_\alpha \rho^0_\alpha$, where the atomic densities are computed according to Eq. (9.14), the Coulomb contributions

\[
\frac{1}{2}\sum_{\alpha\beta}\left[\frac{Z_\alpha Z_\beta}{R_{\alpha\beta}} - \iint \frac{\rho^0_\alpha(\mathbf{r})\,\rho^0_\beta(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}'\right]
\]

decay exponentially with distance $R_{\alpha\beta}$, since the overlap of the atomic densities decays exponentially. The Coulomb terms therefore can be regarded as a sum of two-body interactions, which is not the case for the exchange–correlation part in Eq. (9.4). Foulkes and Haydock8 suggested applying a cluster expansion,

\[
E^{\mathrm{xc}}[\rho^0] = \sum_\alpha E^{\mathrm{xc}}[\rho^0_\alpha] + \frac{1}{2}\sum_{\alpha\beta}\left(E^{\mathrm{xc}}[\rho^0_\alpha + \rho^0_\beta] - E^{\mathrm{xc}}[\rho^0_\alpha] - E^{\mathrm{xc}}[\rho^0_\beta]\right) + \cdots \tag{9.18}
\]

The three-center terms are assumed to be small and are neglected. Therefore, the repulsive potential $E^{\mathrm{rep}}$ is approximated as the sum of a set of pairwise atom–atom potentials. Because $\rho^0_\alpha$ corresponds to the charge density of a neutral atom, the electron–electron and nucleus–nucleus repulsions cancel for


large interatomic distances. Therefore, $E^{\mathrm{rep}}$ can be assumed to be short-ranged. However, due to the first term on the right-hand side of Eq. (9.18), the repulsive potential does not approach zero for large interatomic distances $R$.28 Because in DFTB $E^{\mathrm{rep}}$ is assumed to be short-ranged anyhow, an additive constant has to be taken into account for some applications (e.g., when computing proton affinities). Early ETB models had the form

\[
E_{\mathrm{tot}} = \sum_i \varepsilon_i + \frac{1}{2}\sum_{\alpha\beta} U_{\alpha\beta}
\]

with the two-body terms $U_{\alpha\beta}$ being exponentials fitted to reproduce, for example, geometries, vibrational frequencies, and reaction energies of suitable systems. There are various approaches in the literature to treating this repulsive part, including attempts to account for the many-body nature of $E^{\mathrm{rep}}$. In DFTB,

\[
E^{\mathrm{rep}}[\rho^0] = \frac{1}{2}\sum_{\alpha\beta} U_{\alpha\beta}
\]

is calculated pointwise as follows: To get the repulsive potential for carbon, for example, one could take the carbon dimer C2, stretch its bond, and for each distance calculate the total energy with DFT and the electronic TB part $\sum_i \varepsilon_i$. $U_{\mathrm{CC}}(R_{\mathrm{C-C}})$ is given pointwise for every $R_{\mathrm{C-C}}$ by

\[
U_{\mathrm{CC}}(R_{\mathrm{C-C}}) = E^{\mathrm{DFT}}_{\mathrm{tot}}(R_{\mathrm{C-C}}) - \sum_i \varepsilon_i \tag{9.19}
\]
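The pointwise construction of Eq. (9.19) amounts to a subtraction on a grid of distances, followed by the constant shift that makes the potential vanish at the cutoff. The sketch below illustrates this under stated assumptions: the energies are invented numbers in atomic units, not DFT or DFTB results, and the function names are hypothetical. A real parameterization would additionally fit the points to a polynomial or spline.

```python
def repulsive_points(distances, e_dft, e_elec):
    """Return [(R, U(R))] with U(R) = E_DFT(R) - E_elec(R), cf. Eq. (9.19)."""
    if not (len(distances) == len(e_dft) == len(e_elec)):
        raise ValueError("all inputs must have the same length")
    points = [(r, ed - ee) for r, ed, ee in zip(distances, e_dft, e_elec)]
    return sorted(points)  # ordered by increasing distance


def shift_to_zero_at_cutoff(points):
    """Subtract U at the largest tabulated distance so that U vanishes there."""
    u_cut = max(points, key=lambda p: p[0])[1]
    return [(r, u - u_cut) for r, u in points]


if __name__ == "__main__":
    # Hypothetical bond-stretch scan (distances in Angstrom, energies in a.u.).
    distances = [1.1, 1.3, 1.5, 1.7, 1.9]
    e_dft = [0.05, -0.10, -0.12, -0.08, -0.03]    # shifted DFT total energies
    e_elec = [-0.30, -0.28, -0.22, -0.13, -0.05]  # sum of TB orbital energies
    raw = repulsive_points(distances, e_dft, e_elec)
    for r, u in shift_to_zero_at_cutoff(raw):
        print(f"R = {r:.1f}  U = {u:+.3f}")
```

Treating the last grid point as the cutoff is a simplification of this sketch; in practice the cutoff distance is a separate choice.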

Since many state crossings appear in DFT calculations of the carbon dimer as $R_{\mathrm{C-C}}$ is varied, this example becomes more complex in practice. Another possibility is to include information on C-C single, double, and triple bonds.20 Here, DFT calculations are performed for various carbon–carbon distances $R_{\mathrm{C-C}}$ in ethyne, ethene, and ethane, and the resulting curves are connected. This example is illustrated in Fig. 9.1. The repulsive potential is shifted so that it goes to zero at the cutoff distance. This shift makes the construction of repulsive potentials the most time-consuming part of a new parameterization. The shift affects the atomization energy and, consequently, the heat of formation of a molecule. More important, reaction energies are controlled by the relative shifts of two potentials. Additionally, a potential cannot be shifted arbitrarily, due to restrictions at the cutoff radius. Further restrictions apply to the slope and the curvature of a potential, which are directly connected to the description of bond lengths and harmonic vibrational frequencies. With this conventional approach, every repulsive potential was individually hand-constructed. For illustration, we take the example of the C-H bond. In practice, one C-H bond of methane is stretched and compressed, and the DFT total energy and DFTB electronic energy are recorded pointwise for a sufficient number of geometries. Then the difference in the energies according to Eq. (9.19)

Fig. 9.1 $E^{\mathrm{DFT}}$ shows the (shifted) total energy versus C-C distance for HC≡CH, H2C=CH2, and H3C-CH3; $E^{\mathrm{el}}$ represents $\sum_i \varepsilon_i$ + shift for the same structures [the second term on the right-hand side of Eq. (9.19)]; and $E^{\mathrm{rep}}$ is the difference of these two curves. [Plot: energy (a.u.) versus distance (Å) from 1.1 to 1.9 Å; curves EDFT, Eel, Erep.]

is fitted to a polynomial (or a spline), yielding the desired repulsive potential. At the end, the repulsive potential is shifted in order to match the atomization energy of methane. In practice, the potentials could not be shifted upward sufficiently; therefore, the potentials were constructed to yield a consistent overbinding for every bond type, as noted recently.29 Recent work has been carried out to find an automated approach. Knaup et al. use a genetic algorithm to reproduce reference forces and reaction barriers.30 Gaus et al. solve a linear equation system containing parameters for the repulsive potentials as unknowns in order to fit them to reference geometries, atomization energies, reaction energies, and vibrational frequencies.31

The resulting DFTB method works very well for homonuclear systems, where charge transfer between the atoms in the system does not occur or is very small. As soon as charge flows between atoms because of an electronegativity difference, the resulting density is no longer well approximated by the superposition of the atomic densities $\rho^0 = \sum_\alpha \rho^0_\alpha$. As examples of the breakdown of the standard non-self-consistent method, the molecules CO2 and formamide have been discussed.9 However, the formalism works very well when the charge flow is small; therefore, an extension should start from the non-self-consistent scheme and augment the Hamiltonian with appropriate additional terms.

9.2.4 Second-Order Approximation of the DFT Total Energy: Self-Consistent-Charge Density Functional Tight-Binding Method

The problem with the charge transfer is that the effective Kohn–Sham potentials contain only the neutral reference density ρ0 , which does not account for charge


transfer between atoms. Let us try a Taylor series expansion (functional expansion) of the potential with the ground-state density $\rho$ around the reference density $\rho^0$:

\[
v^{\mathrm{eff}}[\rho] = v^{\mathrm{eff}}[\rho^0] + \int \frac{\delta v^{\mathrm{eff}}[\rho]}{\delta\rho}\,\delta\rho\,d\mathbf{r} \tag{9.20}
\]

This potential could be inserted into Eqs. (9.9) and (9.10). The first term on the right-hand side of Eq. (9.20) would lead to the zero-order terms in Eqs. (9.9) and (9.10), $H_{\mu\nu}[\rho^0]$, depending on the reference density, while the second term on the right-hand side of Eq. (9.20) would lead to corrections for charge transfer. In a second step, one would have to find approximations for the functional derivatives. Since we need the total energy and not only the KS equations, it is better to start the functional expansion with the DFT total energy. The SCC-DFTB method is derived from density functional theory (DFT) by a second-order expansion of the DFT total energy functional with respect to the charge-density fluctuations $\delta\rho$ (written $\Delta\rho$ in the following) around a given reference density $\rho^0$ [$\rho^{0\prime} = \rho^0(\mathbf{r}')$, $\int' = \int d\mathbf{r}'$]:

\[
E = \sum_i^{\mathrm{occ}} \langle\Psi_i|\hat{H}^0|\Psi_i\rangle + \frac{1}{2}\iint' \left(\frac{1}{|\mathbf{r}-\mathbf{r}'|} + \left.\frac{\delta^2 E^{\mathrm{xc}}}{\delta\rho\,\delta\rho'}\right|_{\rho^0}\right)\Delta\rho\,\Delta\rho' - \frac{1}{2}\iint' \frac{\rho^{0\prime}\rho^0}{|\mathbf{r}-\mathbf{r}'|} + E^{\mathrm{xc}}[\rho^0] - \int V^{\mathrm{xc}}[\rho^0]\,\rho^0 + E^{\mathrm{cc}} \tag{9.21}
\]

After introducing an LCAO ansatz $\Psi_i = \sum_\mu c_{\mu i}\,\eta_\mu$, the first term becomes

\[
\sum_i^{\mathrm{occ}} \langle\Psi_i|\hat{H}^0|\Psi_i\rangle = \sum_i^{\mathrm{occ}} \sum_{\mu\nu} c_{\mu i}\,c_{\nu i}\,H^0_{\mu\nu}
\]

and can be evaluated as discussed above. The last four terms in Eq. (9.21) depend only on the reference density $\rho^0$ and represent the repulsive energy contribution $E^{\mathrm{rep}}$, as discussed above. Therefore, we only have to deal with the second-order terms. Going from DFTB to SCC-DFTB, the second-order term $E^{\mathrm{2nd}}$ in the charge density fluctuations $\Delta\rho$ [second term in Eq. (9.21)] is approximated by writing $\Delta\rho$ as a superposition of atomic contributions:

\[
\Delta\rho = \sum_\alpha \Delta\rho_\alpha
\]

To further simplify $E^{\mathrm{2nd}}$, we apply a monopole approximation

\[
\Delta\rho_\alpha \approx \Delta q_\alpha\,F^{00}_\alpha\,Y^{00} \tag{9.22}
\]

Basically, $\Delta\rho_\alpha$ is assumed to look like a 1s orbital. $F^{00}_\alpha$ denotes the normalized radial dependence of the density fluctuation on atom $\alpha$, which is constrained (approximated) to be spherical ($Y^{00}$) (i.e., the angular deformation of the charge


density change in second order is neglected): 1 δ2 E xc 1 2nd Fα00 Fβ00 (Y 00 )2 dr dr E ≈ + qα qβ 2 |r − r | δρ δρ n0 αβ

(9.23) This formula looks complicated but has a quite simple curve shape:

•

For large distances, Rαβ = |r − r | → ∞, the XC terms vanish and the integral describes the Coulomb interaction of two spherical normalized charge densities, which reduces basically to 1/Rαβ ; that is, we get E 2nd ≈

1 qα qβ 2 Rαβ αβ

•

For vanishing interatomic distance, Rαβ = |r − r | → 0, the integral describes the electron–electron interaction on atom α. We can approximate the integral as E 2nd ≈

1 2 ∂ 2 Eα 1 qα 2 = qα2 Uα 2 ∂ qα 2

$U_\alpha$, known as the Hubbard parameter (which is twice the chemical hardness), describes how much the energy of a system changes upon adding or removing electrons. Now we need a function γ that interpolates between these two cases. A very similar situation appears in semiempirical quantum chemical methods such as MNDO, AM1, or PM3, where γ has a simple form, as given, for example, by the Klopman–Ohno approximation,

$$\gamma_{\alpha\beta} = \frac{1}{\sqrt{R_{\alpha\beta}^2 + 0.25\,(1/U_\alpha + 1/U_\beta)^2}} \qquad (9.24)$$
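The two limits that Eq. (9.24) interpolates between are easy to check numerically. The following is a minimal Python sketch of the Klopman–Ohno expression in atomic units. The function name `klopman_ohno_gamma` is our own choice, and the Hubbard parameters are the values quoted in the caption of Fig. 9.2. Note that this is the semiempirical interpolation formula, not the γ function that SCC-DFTB itself uses.

```python
import math

U_C = 0.3647  # Hubbard parameter of carbon (a.u.), as quoted in the Fig. 9.2 caption
U_H = 0.4195  # Hubbard parameter of hydrogen (a.u.), as quoted in the Fig. 9.2 caption


def klopman_ohno_gamma(r_ab, u_a, u_b):
    """Klopman-Ohno interpolation of Eq. (9.24); all quantities in atomic units."""
    return 1.0 / math.sqrt(r_ab**2 + 0.25 * (1.0 / u_a + 1.0 / u_b) ** 2)


if __name__ == "__main__":
    for r in (0.0, 2.0, 5.0, 10.0, 100.0):
        coulomb = float("inf") if r == 0.0 else 1.0 / r
        print(f"R = {r:6.1f}  gamma_CC = {klopman_ohno_gamma(r, U_C, U_C):.4f}  "
              f"gamma_CH = {klopman_ohno_gamma(r, U_C, U_H):.4f}  1/R = {coulomb:.4f}")
```

For two identical atoms the expression returns U at R = 0 and approaches 1/R at large separation, which are the two limiting cases discussed above.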

To derive an expression analytically, we approximate the charge density fluctuations with spherical charge densities. Slater-like distributions

$$F^{\alpha}_{00} = \frac{\tau_\alpha^3}{8\pi} \exp(-\tau_\alpha |\mathbf{r} - \mathbf{R}_\alpha|) \qquad (9.25)$$

located at $\mathbf{R}_\alpha$ allow for an analytical evaluation of the Hartree contribution of two spherical charge distributions. This leads to a function $\gamma_{\alpha\beta}$ that depends on the parameters $\tau_\alpha$ and $\tau_\beta$, which determine the extension of the charge densities of atoms α and β. This function has a $1/R_{\alpha\beta}$ dependence for large $R_{\alpha\beta}$ and approaches a finite value for $R_{\alpha\beta} \to 0$. For zero interatomic distance (i.e., α = β) one finds that

$$\tau_\alpha = \frac{16}{5}\, \gamma_{\alpha\alpha} \qquad (9.26)$$

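The function $\gamma_{\alpha\beta}$ is shown schematically in Fig. 9.2. After integration, $E^{\mathrm{2nd}}$ becomes a simple two-body expression depending on atomic-like charges:

$$E^{\mathrm{2nd}} = \frac{1}{2} \sum_{\alpha\beta} \Delta q_\alpha\, \Delta q_\beta\, \gamma_{\alpha\beta} \qquad (9.27)$$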
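As an illustration of how simple the two-body expression of Eq. (9.27) is to evaluate, here is a minimal Python sketch. The function name, the charge fluctuations, and the off-diagonal γ value are hypothetical and chosen only for the example; the diagonal elements are the Hubbard parameters quoted in the caption of Fig. 9.2.

```python
def second_order_energy(dq, gamma):
    """E_2nd = 1/2 * sum_ab dq_a * dq_b * gamma_ab, as in Eq. (9.27) (atomic units)."""
    n = len(dq)
    return 0.5 * sum(dq[a] * dq[b] * gamma[a][b] for a in range(n) for b in range(n))


# Hypothetical C-H pair: the diagonal elements are the Hubbard parameters from the
# Fig. 9.2 caption; the off-diagonal value 0.30 a.u. is an illustrative guess.
GAMMA_CH = [[0.3647, 0.30],
            [0.30, 0.4195]]
DQ = [-0.10, +0.10]  # hypothetical charge fluctuations; they sum to zero

if __name__ == "__main__":
    print(f"E_2nd (C-H pair)      = {second_order_energy(DQ, GAMMA_CH):.6f} a.u.")
    print(f"E_2nd (isolated atom) = {second_order_energy([0.10], [[0.3647]]):.6f} a.u.")
```

For a single atom the double sum collapses to one term, (1/2) Δq² U, which is the on-site limit used to define the diagonal of γ.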

The diagonal terms $\gamma_{\alpha\alpha}$ model the dependence of the total energy on charge density fluctuations (decomposed into atomic contributions) in second order. The monopole approximation restricts the change of the electron density considered, and no spatial deformations are included; only the change of energy with respect to the change of charge on atom α is considered. By neglecting the effect of the chemical environment on atom α, the diagonal part of γ can be approximated by twice the chemical hardness η of the atom,

$$\gamma_{\alpha\alpha} = 2\eta_\alpha = U_\alpha = \frac{\partial^2 E_\alpha}{\partial q_\alpha^2} \qquad (9.28)$$

where $E_\alpha$ is the energy of the isolated atom α. $U_\alpha$, the Hubbard parameter, is twice the chemical hardness of atom α, which can be estimated from the difference in the ionization potential and the electron affinity of atom α. For SCC-DFTB, it is calculated using Janak's theorem, by taking the first derivative of the energy of the highest-occupied molecular orbital with respect to the occupation number. Therefore, Eq. (9.26) implies that the extension of the charge distribution is inversely proportional to the chemical hardness of the respective atom (i.e., the size of an atom is inversely related to its chemical hardness). This is an important finding which is discussed in more detail below.

[Figure 9.2: plot of γ (a.u., 0 to 0.4) versus interatomic distance r (a.u., 0 to 10), with curves for C-C, H-H, and C-H.]

Fig. 9.2 Function $\gamma_{\mathrm{CC}}$ for two carbon atoms with the Hubbard parameter $U_{\mathrm{C}}$ = 0.3647 a.u. and $\gamma_{\mathrm{HH}}$ for two hydrogen atoms with $U_{\mathrm{H}}$ = 0.4195 a.u. over the interatomic distance. The function $\gamma_{\mathrm{CH}}$ differs from $\gamma_{\mathrm{CC}}$ and $\gamma_{\mathrm{HH}}$ for short interatomic distances. Clearly, the case $R_{\mathrm{C-H}}$ = 0 a.u. will not appear in a calculation.

The total SCC-DFTB energy finally reads

$$E^{\mathrm{SCC\text{-}DFTB}} = \sum_{i\mu\nu} c_{\mu i}\, c_{\nu i}\, H^0_{\mu\nu} + E^{\mathrm{2nd}} + E^{\mathrm{rep}} \qquad (9.29)$$
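The finite-difference estimate of the Hubbard parameter mentioned above (ionization potential minus electron affinity) and the relation of Eq. (9.26) between U and the Slater exponent τ can be illustrated with a few lines of Python. This is only a sketch: the function names are ours, and the experimental ionization potential and electron affinity of atomic hydrogen are values we supply for illustration; they are not taken from the text, which obtains U from DFT via Janak's theorem.

```python
HARTREE_IN_EV = 27.211386  # conversion factor, eV per hartree


def hubbard_from_ip_ea(ip_ev, ea_ev):
    """Finite-difference estimate U ~ E(N+1) + E(N-1) - 2 E(N) = IP - EA, in hartree."""
    return (ip_ev - ea_ev) / HARTREE_IN_EV


def slater_exponent(hubbard):
    """Exponent tau of the Slater-like density from Eq. (9.26), tau = (16/5) U."""
    return 16.0 / 5.0 * hubbard


if __name__ == "__main__":
    # Experimental values for atomic hydrogen (eV); supplied here, not from the text.
    u_h_estimate = hubbard_from_ip_ea(13.598, 0.754)
    print(f"U_H from IP - EA         : {u_h_estimate:.4f} a.u.")
    print("U_H quoted with Fig. 9.2 : 0.4195 a.u.")
    for label, u in (("C", 0.3647), ("H", 0.4195)):
        tau = slater_exponent(u)
        print(f"tau_{label} = {tau:.4f} 1/a.u.   1/tau_{label} = {1.0 / tau:.4f} a.u.")
```

A larger Hubbard parameter gives a larger exponent τ and hence a more compact charge distribution, which is the inverse relation between hardness and atomic size stated above.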

9.3 PERFORMANCE OF STANDARD SCC-DFTB

9.3.1 Timings

The substantial advantage of using SCC-DFTB is its computational efficiency. Before discussing its performance for several properties in the following subsections, we show in Table 9.1 benchmark CPU times for a single-point energy calculation on C60, polyalanine, and some water clusters. All calculations were carried out on a single processor of a standard desktop PC. For SCC-DFTB the DFTB+ code [32] was used. The DFT values were obtained using the TURBOMOLE program package [33]. For the PBE functional calculations the resolution of the identity (RI) integral evaluation has been used [34]. As a basis set for the DFT methods we chose 6-31G(d), which is a rather small basis set for practical use. Table 9.1 shows that SCC-DFTB is roughly 250 times or more faster than RI-PBE and more than 1000 times faster than B3LYP. This acceleration is due primarily to two issues: (1) the use of a minimal basis set within SCC-DFTB, and (2) the tabulation and neglect of integrals. For the water cluster (H2O)48, for example, N = 288 basis functions are needed for a minimal basis set and N = 864 basis functions for the 6-31G(d) basis set. The time-limiting step for obtaining the energy with all methods discussed here is a matrix diagonalization, which scales with $N^3$. Thus, the use of the minimal basis alone yields an acceleration by a factor of $(864/288)^3 = 27$. The remaining factor is due to the tabulation and neglect of integrals; in this example this factor is roughly 10 and 40 for the comparison with RI-PBE and B3LYP, respectively.

TABLE 9.1 Calculation Time (s) for Various Molecules with DFT and SCC-DFTB

Molecule         n(a)    SCC-DFTB    RI-PBE(b)    B3LYP(b,c)
C60 (d)            60        1          1,112        9,398
(Ala)10 (e)       112        4            966        6,655
(Ala)20 (e)       212       12          3,418       27,605
(H2O)48 (f)       144        3            769        3,466
(H2O)123 (f)      369       15          5,488       30,822

(a) Number of atoms.
(b) Basis set 6-31G(d).
(c) B3LYP_Gaussian keyword in TURBOMOLE.
(d) Buckminsterfullerene C60.
(e) Polyalanine in α-helical form and including capping groups.
(f) Water cluster.

9.3.2 Small Organic Molecules

SCC-DFTB has been tested for various properties of small organic molecules, such as heats of formation, geometries, vibrational frequencies, and dipole moments, as documented in several recent publications. It should be noted that all these test sets contain a large number of molecules, representative of many chemical bonding situations. In general, SCC-DFTB is excellent in reproducing geometries. Also, reaction energies are reproduced reasonably well on average [9,35], while heats of formation are overestimated, owing to the overbinding tendency of SCC-DFTB. Recently, the SCC-DFTB heats of formation have been tested systematically. It turned out that reparametrization of atomic contributions can improve the performance for heats of formation significantly; however, refined NDDO methods such as OM2 [36] or PDDG/PM3 [37] are still superior to SCC-DFTB in this respect [29,38]. For a set of 622 neutral molecules containing the elements C, H, N, and O, Sattelmeyer et al. found a mean absolute error (MAE) in heats of formation of 3.2 kcal mol−1 for PDDG/PM3 and 5.8 kcal mol−1 for SCC-DFTB [38]. Similarly, for a set of 140 CHNO-containing molecules, the respective mean absolute errors for OM2 and SCC-DFTB are 3.1 and 7.7 kcal mol−1 [29].

The performance of SCC-DFTB for vibrational frequencies, although reasonable on average, is less satisfactory than for geometries. However, vibrational frequencies could also be improved significantly after reparametrization [39]. The MAE for harmonic vibrational frequencies of 14 hydrocarbons drops from 59 cm−1 for the standard parameterization to 33 cm−1 for the reparameterized version. The MAE for the GGA functional BLYP with the Dunning-type basis set cc-pVTZ is 25 cm−1. Currently, parameters are available for O, N, C, H [9], S [40], Zn [28], Mg [41], and many transition metals [42].

9.3.3 Peptides

A good performance for small molecules does not guarantee a good description of larger molecules. A good example is provided by the structures and relative energies of peptides, which pose significant problems for semiempirical models such as AM1 [43] and PM3 [44] but are well described at the SCC-DFTB level [45,46] or by more elaborate NDDO methods such as OM1 [47] or OM2 [36,48]. Therefore, the performance for small organic molecules does not necessarily tell much about the performance for larger complexes, and semiempirical (SE) methods should be benchmarked carefully before applying them to new classes of molecules.


9.3.4 Hydrogen-Bonded Systems

Standard SCC-DFTB slightly underestimates the dipole moments of polar molecules, as discussed, for example, for peptides [45,46,49]. This leads to a slight underestimation of the binding energies of weak hydrogen-bonded complexes [18,49] by 1 to 2 kcal mol−1 (e.g., the binding energy of the water dimer is found to be 3.3 kcal mol−1, in contrast to 5 kcal mol−1 at a high computational level). Also, relative energies of peptide conformations are underestimated due to this error. It should be noted that this underestimation is quite systematic (i.e., the relative stability of different conformers is preserved).

9.4 EXTENSIONS OF STANDARD SCC-DFTB

9.4.1 Inclusion of Dispersion Forces

SCC-DFTB is derived from DFT and therefore inherits the well-known failures of the gradient-corrected (GGA) DFT functionals. This concerns the problem of overpolarizability [16], the problem of charge transfer and ionic excited states [50], and deficiencies in describing van der Waals interactions. These problems have been reviewed briefly by Elstner [20]. Dispersion interactions become important for larger molecules, since they stabilize more complex structures. Therefore, we proposed to include them empirically on top of DFT and implemented this for SCC-DFTB [18]. This approach was adapted to DFT later [51,52] and has become increasingly available in many DFT codes. We have shown that DFT would fail to describe the stacking interaction between DNA bases without proper inclusion of dispersion interactions [18]: DNA would not be stable. Surprisingly, dispersion interactions are also vital for stable peptide and protein structures. If dispersion forces are neglected, many peptide and protein conformations are not stable; that is, standard DFT and SCC-DFTB are not able to describe the structure and dynamics of complex biological matter (and other materials where dispersion forces are important).

To include dispersion forces, simple two-body potentials with $1/R^6$ dependence are added to the DFTB total energy. However, they have to be damped for short distances using a properly chosen damping function $f(R_{\alpha\beta})$ [18]:

$$E^{\mathrm{SCC\text{-}DFTB\text{-}D}} = E^{\mathrm{SCC\text{-}DFTB}} - \sum_{\alpha \neq \beta} f(R_{\alpha\beta})\, \frac{C^6_{\alpha\beta}}{R^6_{\alpha\beta}} \qquad (9.30)$$

with $C^6_{\alpha\beta}$ being properly chosen van der Waals parameters. Note that including such an extension to DFT leads to very different results, depending on the DFT functional used for exchange and dispersion [51]. Only a properly chosen scaling function leads to quantitatively satisfying results [52]. More details may be found elsewhere [20].
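The structure of Eq. (9.30) can be sketched in a few lines of Python. Several things here are assumptions rather than content of the text: the damping function form, f(R) = [1 − exp(−d (R/R0)^n)]^m with d = 3, n = 7, m = 4, is one form used for this kind of correction and is adopted only for illustration; the C6 and R0 values are hypothetical; and the sum is taken once over each atom pair, which is a convention that can be absorbed into the C6 coefficients.

```python
import math


def damping(r, r0, d=3.0, n=7, m=4):
    """Assumed damping function f(R) = [1 - exp(-d (R/R0)^n)]^m; goes from 0 to 1."""
    return (1.0 - math.exp(-d * (r / r0) ** n)) ** m


def dispersion_energy(coords, c6, r0):
    """Damped -C6/R^6 correction, summed once over each atom pair.

    coords: list of (x, y, z) positions; c6[a][b] and r0[a][b] are pair
    parameters (hypothetical values are used in the example below).
    """
    energy = 0.0
    for a in range(len(coords)):
        for b in range(a + 1, len(coords)):
            r = math.dist(coords[a], coords[b])
            energy -= damping(r, r0[a][b]) * c6[a][b] / r**6
    return energy


if __name__ == "__main__":
    c6 = [[0.0, 30.0], [30.0, 0.0]]  # hypothetical C6 coefficient (arbitrary units)
    r0 = [[0.0, 7.0], [7.0, 0.0]]    # hypothetical damping radius
    for r in (3.0, 5.0, 7.0, 10.0, 20.0):
        e = dispersion_energy([(0.0, 0.0, 0.0), (r, 0.0, 0.0)], c6, r0)
        print(f"R = {r:5.1f}  f(R) = {damping(r, 7.0):.4f}  E_disp = {e:.3e}")
```

The damping switches the correction off at short range, where the electronic structure method already describes the interaction, and leaves the pure −C6/R^6 tail untouched at long range.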


9.4.2 Beyond Standard Second-Order DFTB

Approximating the second derivatives of the total DFT energy by the γ function in order to model charge-transfer effects involves several approximations. As we have discussed in detail, the use of the γ function implicitly assumes that the size of an atom is represented by the inverse of the Hubbard (chemical hardness) parameter $U_\alpha$, which enters the γ function [20,53]. This relation holds quite well for many main-group elements but is completely wrong for the hydrogen atom [53]. Therefore, the function γ has been modified to account for this irregularity. This leads to a significant improvement in hydrogen-bonding energies. The large error of 1 to 2 kcal mol−1 per hydrogen bond in the standard SCC-DFTB scheme can be reduced to about 0.5 kcal mol−1 using the modified γ function.

Whereas for the description of hydrogen bonds a second-order expansion of the total energy seems to be adequate, the calculation of proton affinities has been shown to be largely in error. This property is crucial, however, for an appropriate description of proton transfer reactions, and semiempirical methods in general have problems predicting this value accurately [54]. The second-order approximation of DFTB works well for many systems, including charged systems in which the charge is delocalized over extended molecular fragments. For charged molecules in which the charge is localized, however, this approximation breaks down. It has been shown that for these cases the total energy [Eq. (9.21)] has to be expanded up to third order in the density fluctuations [20,53,55]. This is crucial in particular for the calculation of deprotonation energies, where the inclusion of third-order terms leads to significant improvement. For example, the deprotonation energy of water is in error by nearly 30 kcal mol−1 in standard SCC-DFTB, whereas it has an error of a few kcal mol−1 in the third-order formulation.
Formally, the expansion of the DFT total energy is carried out up to third order, and similar approximations are made as in the second-order case [53]. In third order, the Hubbard parameter $U_\alpha$ becomes charge dependent. Since $1/U_\alpha$ reflects the atom size, the charge dependence of $U_\alpha$ can account for the larger size of anions compared to neutral atoms or cations. In third-order DFTB, a new parameter occurs, the derivative of the Hubbard parameter, which can be calculated from DFT [53] or fitted to minimize the error in the deprotonation energies of a suitably chosen reference set of molecules [55].

9.4.3 Excited States via Time-Dependent DFT

The core of SCC-DFTB is an efficient approximation of the second derivatives of the total energy by the function $\gamma_{\alpha\beta}$. Such a second derivative also appears in the TD-DFT linear response formalism, which makes it possible to compute excited-state energies within the DFT framework. We have implemented this formalism for SCC-DFTB [40], finding surprisingly good results for singlet excitations at very low computational cost, while the problems of TD-DFT for higher excitations, charge transfer, and ionic excited states are retained [50]. More details are available in a recent review by Niehaus [56].

9.4.4 QM/MM Methods

To represent the coupling between the environment and the quantum region effectively, quantum mechanical methods have been coupled to empirical force-field methods in the QM/MM methods. Although introduced as early as 1976 [57], it was not until the early 1990s that QM/MM methods became widely used in the study of biological systems (a recent comprehensive review can be found in Ref. 2). Several QM/MM implementations with SCC-DFTB as the QM part have been realized up to now, incorporating it into various empirical force-field packages [58–62]. But even for QM/MM approaches using SE methods as the QM part, the collective reorganization in the environment can become a computational bottleneck. Therefore, much effort is invested in developing multiscale methods, which combine QM/MM with continuum electrostatic methods (CM) for an integrated treatment of large systems. DFTB QM/MM coupling to CHARMM has been combined with a continuum approach [63,64], the generalized solvent-boundary potential developed originally by Roux and co-workers [65] for classical simulations. The SCC-DFTB/MM methodology [19,20] as well as the SCC-DFTB/MM/CM methodology [63,66] has recently been reviewed.

9.5 CONCLUSIONS

SCC-DFTB is a semiempirical method derived from DFT-GGA. This means that all deficiencies of DFT-GGA are inherited directly. Note that SCC-DFTB applies pure GGA functionals (PBE); that is, no hybrid variant, which could ameliorate these failures to some degree, is available. On the other hand, SCC-DFTB also inherits the merits of DFT, its conceptual simplicity in incorporating correlation effects, and its good performance for many molecular properties of interest. As a result, SCC-DFTB predicts molecular geometries surprisingly well; vibrational frequencies are also satisfactory. Reproduction of heats of formation for small organic molecules is comparable to the performance of modern semiempirical methods, although new variants such as PDDG/PM3 or OM2 are still slightly superior in this respect. It should be noted that approximate methods should be carefully benchmarked for classes of molecules and not applied blindly.†

† This also applies to DFT methods (although to a lesser degree), since their approximate nature leads to a variety of problems and failures.

REFERENCES

1. Bowler, D. R.; Aoki, M.; Goringe, C. M.; Horsfield, A. P.; Pettifor, D. G. Model. Simul. Mater. Sci. Eng. 1997, 5, 199.


2. Senn, H. M.; Thiel, W. Curr. Opin. Chem. Biol. 2007, 11, 182.
3. Senn, H. M.; Thiel, W. Angew. Chem. Int. Ed. 2009, 48, 1198.
4. Elstner, M.; Cui, Q. Multi-scale Methods for the Description of Chemical Events in Biological Systems, Multiscale Simulation Methods in Molecular Sciences, NIC-Serie, Publikationsreihe des John von Neumann-Instituts für Computing, Jülich, Germany, 2009.
5. Goringe, C. M.; Bowler, D. R.; Hernandez, E. Rep. Prog. Phys. 1997, 60, 1447.
6. Colombo, L. Riv. Nuovo Cimento Soc. Ital. Fis. 2005, 28, 1.
7. Harris, J. Phys. Rev. B 1985, 31, 1770.
8. Foulkes, W. M. C.; Haydock, R. Phys. Rev. B 1989, 39, 12520.
9. Elstner, M.; Porezag, D.; Jungnickel, G.; Elsner, J.; Haugk, M.; Frauenheim, T.; Suhai, S.; Seifert, G. Phys. Rev. B 1998, 58, 7260.
10. Porezag, D.; Frauenheim, T.; Köhler, T.; Seifert, G.; Kaschner, R. Phys. Rev. B 1995, 51, 12947.
11. Seifert, G.; Eschrig, H.; Bieger, W. Z. Phys. Chem. (Leipzig) 1986, 267, 529.
12. Widany, J.; Frauenheim, T.; Köhler, T.; Sternberg, M.; Porezag, D.; Jungnickel, G.; Seifert, G. Phys. Rev. B 1996, 53, 4443.
13. Seifert, G. J. Phys. Chem. A 2007, 111, 5609.
14. Witek, H. A.; Köhler, C.; Frauenheim, T.; Morokuma, K.; Elstner, M. J. Phys. Chem. A 2007, 111, 5712.
15. Perdew, J. P.; Burke, K.; Ernzerhof, M. Phys. Rev. Lett. 1996, 77, 3865.
16. Wanko, M.; Hoffmann, M.; Frauenheim, T.; Elstner, M. J. Comput. Aided Mol. Des. 2006, 20, 511.
17. Wanko, M.; Hoffmann, M.; Strodel, P.; Koslowski, A.; Thiel, W.; Neese, F.; Frauenheim, T.; Elstner, M. J. Phys. Chem. B 2005, 109, 3606.
18. Elstner, M.; Hobza, P.; Frauenheim, T.; Suhai, S.; Kaxiras, E. J. Chem. Phys. 2001, 114, 5149.
19. Elstner, M.; Frauenheim, T.; Suhai, S. J. Mol. Struct. (Theochem) 2003, 632, 29.
20. Elstner, M. Theor. Chem. Acc. 2006, 116, 316.
21. Frauenheim, T.; Seifert, G.; Elstner, M.; Niehaus, T.; Köhler, C.; Amkreutz, M.; Sternberg, M.; Hajnal, Z.; Di Carlo, A.; Suhai, S. J. Phys. Condens. Matter 2002, 14, 3015.
22. Parr, R. G.; Yang, W. Density-Functional Theory of Atoms and Molecules, Oxford University Press, New York, 1989.
23. Eschrig, H. Optimized LCAO Method and Electronic Structure of Extended Systems, Springer-Verlag, Berlin, 1989.
24. Seifert, G. J. Phys. Chem. A 2007, 111, 5609.
25. Seifert, G.; Porezag, D.; Frauenheim, T. Int. J. Quantum Chem. 1996, 58, 185.
26. Frauenheim, T.; Weich, F.; Köhler, T.; Uhlmann, S.; Porezag, D.; Seifert, G. Phys. Rev. B 1995, 52, 11492.
27. Slater, J. C.; Koster, G. F. Phys. Rev. 1954, 94, 1498.
28. Elstner, M.; Cui, Q.; Munih, P.; Kaxiras, E.; Frauenheim, T.; Karplus, M. J. Comput. Chem. 2003, 24, 565.
29. Otte, N.; Scholten, M.; Thiel, W. J. Phys. Chem. A 2007, 111, 5751.


30. Knaup, J. M.; Hourahine, B.; Frauenheim, T. J. Phys. Chem. A 2007, 111, 5637.
31. Gaus, M.; Chou, C.; Witek, H.; Elstner, M. J. Phys. Chem. A 2009, 113, 11866.
32. DFTB+, a development of Bremen Center of Computational Material Science (Prof. Frauenheim), available at http://www.dftb.org.
33. TURBOMOLE V6.1 2009, a development of University of Karlsruhe and Forschungszentrum Karlsruhe GmbH, 1989–2007, TURBOMOLE GmbH, since 2007; available at http://www.turbomole.com.
34. Ahlrichs, R. Phys. Chem. Chem. Phys. 2004, 6, 5119.
35. Krüger, T.; Elstner, M.; Schiffels, P.; Frauenheim, T. J. Chem. Phys. 2005, 122, 114110.
36. Weber, W.; Thiel, W. Theor. Chem. Acc. 2000, 103, 495.
37. Repasky, M. P.; Chandrasekhar, J.; Jorgensen, W. L. J. Comput. Chem. 2002, 23, 1601.
38. Sattelmeyer, K. W.; Tirado-Rives, J.; Jorgensen, W. L. J. Phys. Chem. A 2006, 110, 13551.
39. Małolepsza, E.; Witek, H. A.; Morokuma, K. Chem. Phys. Lett. 2005, 412, 237.
40. Niehaus, T. A.; Suhai, S.; Della Sala, F.; Lugli, P.; Elstner, M.; Seifert, G.; Frauenheim, T. Phys. Rev. B 2001, 63, 085108.
41. Cai, Z.; Lopez, P.; Reimers, J. R.; Cui, Q.; Elstner, M. J. Phys. Chem. A 2007, 111, 5743.
42. Zheng, G.; Witek, H. A.; Bobadova-Parvanova, P.; Irle, S.; Musaev, D. G.; Prabhakar, R.; Morokuma, K.; Lundberg, M.; Elstner, M.; Köhler, C.; Frauenheim, T. J. Chem. Theory Comput. 2007, 3, 1349.
43. Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. J. Am. Chem. Soc. 1985, 107, 3902.
44. Stewart, J. J. P. J. Comput. Chem. 1989, 10, 209.
45. Elstner, M.; Jalkanen, K.; Knapp-Mohammady, M.; Frauenheim, T.; Suhai, S. Chem. Phys. 2000, 256, 15.
46. Elstner, M.; Jalkanen, K.; Knapp-Mohammadi, M.; Frauenheim, T.; Suhai, S. Chem. Phys. 2001, 263, 203.
47. Kolb, M.; Thiel, W. J. Comput. Chem. 1993, 14, 775.
48. Möhle, K.; Hofmann, H.-J.; Thiel, W. J. Comput. Chem. 2001, 22, 509.
49. Elstner, M.; Frauenheim, T.; Kaxiras, E.; Seifert, G.; Suhai, S. Phys. Status Solidi B 2000, 217, 357.
50. Wanko, M.; Garavelli, M.; Bernardi, F.; Niehaus, T. A.; Frauenheim, T.; Elstner, M. J. Chem. Phys. 2004, 120, 1674.
51. Wu, Q.; Yang, W. J. Chem. Phys. 2002, 116, 515.
52. Grimme, S. J. Comput. Chem. 2004, 25, 1463.
53. Elstner, M. J. Phys. Chem. A 2007, 111, 5614.
54. Range, K.; Riccardi, D.; Elstner, M.; Cui, Q.; York, D. Phys. Chem. Chem. Phys. 2005, 7, 3070.
55. Yang, Y.; Yu, H.; York, D.; Cui, Q.; Elstner, M. J. Phys. Chem. A 2007, 111, 10861.
56. Niehaus, T. A. J. Mol. Struct. (Theochem) 2009, 914, 38.
57. Warshel, A.; Levitt, M. J. Mol. Biol. 1976, 103, 227.


58. Han, W.; Elstner, M.; Jalkanen, K. J.; Frauenheim, T.; Suhai, S. Int. J. Quantum Chem. 2000, 78, 459.
59. Cui, Q.; Elstner, M.; Kaxiras, E.; Frauenheim, T.; Karplus, M. J. Phys. Chem. B 2001, 105, 569.
60. Seabra, G. D. M.; Walker, R. C.; Elstner, M.; Case, D. A.; Roitberg, A. E. J. Phys. Chem. A 2007, 111, 5655.
61. Hu, H.; Elstner, M.; Hermans, J. Proteins Struct. Funct. Genet. 2003, 50, 451.
62. Liu, H.; Elstner, M.; Kaxiras, E.; Frauenheim, T.; Hermans, J.; Yang, W. Proteins Struct. Funct. Genet. 2001, 44, 484.
63. Riccardi, D.; Schaefer, P.; Yang, Y.; Yu, H.; Ghosh, N.; Prat-Resina, X.; König, P.; Li, G.; Xu, D.; Guo, H.; Elstner, M.; Cui, Q. J. Phys. Chem. B 2006, 110, 6458.
64. König, P. H.; Ghosh, N.; Hoffmann, M.; Elstner, M.; Tajkhorshid, E.; Frauenheim, T.; Cui, Q. J. Phys. Chem. A 2006, 110, 548.
65. Im, W.; Berneche, S.; Roux, B. J. Chem. Phys. 2001, 114, 2924.
66. Cui, Q. Theor. Chem. Acc. 2006, 116, 51.

10

Introduction to Effective Low-Energy Hamiltonians in Condensed Matter Physics and Chemistry

BEN J. POWELL
Centre for Organic Photonics and Electronics, School of Mathematics and Physics, The University of Queensland, Queensland, Australia

In this chapter I discuss some simple effective Hamiltonians that have widespread applications to solid-state and molecular systems. Although it is meant to be an introduction for a beginning graduate student, I hope that it may also help to break down the divide between the physics and chemistry literatures. After a brief introduction to second quantization notation (Section 10.1), which is used extensively, I focus on the “four H’s”: the Hückel (or tight binding; Section 10.2), Hubbard (Section 10.3), Heisenberg (Section 10.4), and Holstein (Section 10.6) models. These models play central roles in our understanding of condensed matter physics, particularly for materials where electronic correlations are important, but are less well known to the chemistry community. Some related models, such as the Pariser–Parr–Pople model, the extended Hubbard model, multiorbital models, and the ionic Hubbard model, are also discussed in Section 10.5. As well as having practical applications, these models allow us to investigate electronic correlations systematically by “turning on” various interactions in the Hamiltonian one at a time. Finally, in Section 10.7, I discuss the epistemological basis of effective Hamiltonians and compare and contrast this approach with ab initio methods before discussing the problem of the parameterization of effective Hamiltonians.

As this chapter is intended to be introductory, I do not attempt to make frequent comparisons to the latest research problems; rather, I compare the predictions of model Hamiltonians with simple systems chosen for pedagogical reasons. Similarly, references have been chosen for their pedagogical and historical value rather than on the basis of scientific priority.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


Given the similarity in the problems addressed by theoretical chemistry and theoretical condensed matter physics, surprisingly few advanced texts discuss the interface of the two subjects. Unfortunately, this leads to many cultural differences between the fields. Nevertheless, some textbooks do try to bridge the gap, and the reader in search of more than the introductory material presented here is referred to a book by Fulde [1] and several other chapters in this book: Chapter 6 describes the state of the art in using density functional theory and ab initio Hartree–Fock-based approaches to the a priori evaluation of properties of systems involving strongly correlated electrons, and Chapter 4 describes ab initio approaches based on quantum Monte Carlo.

10.1 BRIEF INTRODUCTION TO SECOND QUANTIZATION NOTATION

The models discussed in this chapter are easiest to understand if one employs the second quantization formalism. In this section we introduce its basic formalism briefly and informally. More details may be found in many textbooks (e.g., Schatz and Ratner [2] or Mahan [3]). Readers already familiar with this notation may wish to skip this section, although the last two paragraphs do define some nomenclature that is used throughout the chapter.

10.1.1 Simple Harmonic Oscillator

Let us begin by considering a particle of mass m moving in a one-dimensional harmonic potential:

$$V(x) = \frac{1}{2} k x^2 \qquad (10.1)$$

This may be familiar as the potential experienced by an ideal spring displaced from its equilibrium position by a distance x, in which context k is known as the spring constant [4]. Equation (10.1) is also the potential felt by an atom as it is displaced (by a small amount) from its equilibrium position in a molecule [5]. Classically, this problem is straightforward to solve [4], and as well as the trivial solution, one finds that the particle may oscillate with a resonant frequency $\omega = \sqrt{k/m}$. The time-independent Schrödinger equation for a simple harmonic oscillator is therefore

$$\hat{H}_{\mathrm{sho}} \psi_n \equiv \left( \frac{\hat{p}^2}{2m} + \frac{1}{2} m \omega^2 \hat{x}^2 \right) \psi_n = E_n \psi_n \qquad (10.2)$$

where $\hat{p} = (\hbar/i)(\partial/\partial x)$ is the particle’s momentum and $\psi_n$ is the nth wavefunction or eigenfunction, which has energy, or eigenvalue, $E_n$. This problem is solved in many introductory texts on quantum mechanics [6] using the standard methods of “first quantized” quantum mechanics. However, a more elegant way to solve this problem is to introduce the ladder operator:

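$$\hat{a} \equiv \sqrt{\frac{m\omega}{2\hbar}}\, \hat{x} + i\, \frac{\hat{p}}{\sqrt{2 m \hbar \omega}} \qquad (10.3)$$

and its hermitian conjugate:

$$\hat{a}^\dagger \equiv \sqrt{\frac{m\omega}{2\hbar}}\, \hat{x} - i\, \frac{\hat{p}}{\sqrt{2 m \hbar \omega}} \qquad (10.4)$$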

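One of the most important features of quantum mechanics is that momentum and position do not commute [6] (i.e., $[\hat{p}, \hat{x}] \equiv \hat{p}\hat{x} - \hat{x}\hat{p} = -i\hbar$). From this commutation relation it is straightforward to show that

$$\hat{H}_{\mathrm{sho}} = \hbar\omega \left( \hat{a}^\dagger \hat{a} + \frac{1}{2} \right) \qquad (10.5)$$

and

$$[\hat{a}, \hat{a}^\dagger] \equiv \hat{a}\hat{a}^\dagger - \hat{a}^\dagger \hat{a} = 1 \qquad (10.6)$$

One can also show that $[\hat{H}_{\mathrm{sho}}, \hat{a}] = \hbar\omega\, [\hat{a}^\dagger \hat{a} + \tfrac{1}{2}, \hat{a}] = \hbar\omega\, [\hat{a}^\dagger, \hat{a}]\, \hat{a} = -\hbar\omega\, \hat{a}$ in a similar manner. Therefore, $[\hat{H}_{\mathrm{sho}}, \hat{a}] \psi_n = -\hbar\omega\, \hat{a} \psi_n$, and hence

$$\hat{H}_{\mathrm{sho}}\, \hat{a} \psi_n = (E_n - \hbar\omega)\, \hat{a} \psi_n \qquad (10.7)$$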

Equation (10.7) tells us that $\hat{a}\psi_n$ is an eigenstate of $\hat{H}_{\mathrm{sho}}$ with energy $E_n - \hbar\omega$, provided that $\hat{a}\psi_n \neq 0$. That is, the operator $\hat{a}$ moves the system from one eigenstate to another whose energy is lower by $\hbar\omega$; thus, $\hat{a}$ is known as the lowering or destruction operator. Note that for any wavefunction φ, $\langle\phi|\hat{p}^2|\phi\rangle \geq 0$ and $\langle\phi|\hat{x}^2|\phi\rangle \geq 0$. Therefore, it follows from Eq. (10.2) that $E_n \geq 0$ for all n. Hence, there is a lowest energy state, or ground state, which we will denote as $\psi_0$. Therefore, there is a limit to how often we can keep lowering the energy of the state (i.e., $\hat{a}\psi_0 = 0$). We can now calculate the ground-state energy of the harmonic oscillator,

$$\hat{H}_{\mathrm{sho}} \psi_0 = \hbar\omega \left( \hat{a}^\dagger \hat{a} + \frac{1}{2} \right) \psi_0 = \frac{1}{2} \hbar\omega\, \psi_0 \qquad (10.8)$$

In the same way as we derived Eq. (10.7), one can easily show that $\hat{H}_{\mathrm{sho}}\, \hat{a}^\dagger \psi_n = (E_n + \hbar\omega)\, \hat{a}^\dagger \psi_n$. Therefore, $\hat{a}^\dagger$ moves us up the ladder of states that $\hat{a}$ moved us down. Hence $\hat{a}^\dagger$ is known as a raising or creation operator. Thus, we have

$$\hat{a}^\dagger \psi_n = \sqrt{n+1}\, \psi_{n+1} \qquad (10.9)$$

and

$$\hat{a} \psi_n = \sqrt{n}\, \psi_{n-1} \qquad (10.10)$$

where the terms inside the radicals are required for the correct normalization of the wavefunctions [7]. Therefore, $\psi_n = (1/\sqrt{n!})\, (\hat{a}^\dagger)^n \psi_0$ and

$$E_n = \hbar\omega \left( n + \frac{1}{2} \right) \qquad (10.11)$$
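The ladder-operator algebra above can be checked with small matrices. The following Python sketch builds the matrix of the lowering operator in a truncated number basis from Eq. (10.10), forms the Hamiltonian of Eq. (10.5) in units of ħω, and evaluates the diagonal of the commutator of Eq. (10.6). The helper names and the truncation size are our own choices for illustration.

```python
import math


def matmul(left, right):
    """Product of two square matrices stored as nested lists."""
    n = len(left)
    return [[sum(left[i][k] * right[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def lowering_operator(dim):
    """Matrix of a in the basis psi_0 ... psi_{dim-1}: <n-1| a |n> = sqrt(n), Eq. (10.10)."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)
    return a


DIM = 6  # basis truncation; an arbitrary choice for the illustration
a = lowering_operator(DIM)
a_dag = [list(col) for col in zip(*a)]  # real matrix, so the adjoint is the transpose

number = matmul(a_dag, a)                            # a† a
energies = [number[n][n] + 0.5 for n in range(DIM)]  # Eq. (10.5), in units of hbar*omega
a_a_dag = matmul(a, a_dag)
commutator_diag = [a_a_dag[n][n] - number[n][n] for n in range(DIM)]

if __name__ == "__main__":
    print("E_n / (hbar omega):", [round(e, 10) for e in energies])
    print("diag of [a, a†]   :", [round(c, 10) for c in commutator_diag])
```

The energies come out as n + 1/2, as in Eq. (10.11). The commutator equals 1 on every diagonal element except the last one, which is an artifact of truncating the infinite ladder of states to a finite matrix.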

Notice that above we solved the simple harmonic oscillator (i.e., calculated the energies of all of the eigenstates) without needing to find explicit expressions for any of the first quantized eigenfunctions, $\psi_n$. This general feature of the second quantized approach is extremely advantageous when we are dealing with the complex many-body wavefunctions typical in condensed matter physics and chemistry.

10.1.2 Second Quantization for Light and Matter

We can extend the second quantization formalism to light and matter. Let us first consider bosons, which are not subject to the Pauli exclusion principle (e.g., phonons, photons, deuterium nuclei, $^4$He atoms). We define the bosonic field operator $\hat{b}^\dagger(\mathbf{r})$ as creating a boson at position r; similarly, $\hat{b}(\mathbf{r})$ annihilates a boson at position r. The bosonic field operators obey the commutation relations $[\hat{b}(\mathbf{r}), \hat{b}(\mathbf{r}')] = 0$, $[\hat{b}^\dagger(\mathbf{r}), \hat{b}^\dagger(\mathbf{r}')] = 0$, and

$$[\hat{b}(\mathbf{r}), \hat{b}^\dagger(\mathbf{r}')] = \delta(\mathbf{r} - \mathbf{r}') \qquad (10.12)$$

This is just the generalization of Eq. (10.6) for the field operators. We can create any state by acting with products, or sums of products, of the $\hat{b}^\dagger(\mathbf{r})$ on the vacuum state (i.e., the state that does not contain any bosons), which is usually denoted as $|0\rangle$.

Many-body wavefunctions for fermions (e.g., electrons, protons, neutrons, $^3$He atoms) are complicated by the need for the antisymmetrization of the wavefunction (i.e., the wavefunction must change sign under the exchange of any two fermions). Therefore, if we introduce the fermionic field operators $\hat{\psi}^\dagger(\mathbf{r})$ and $\hat{\psi}(\mathbf{r})$, which, respectively, create and annihilate fermions at position r, we must make sure that any wavefunction that we can make by acting with some set of these operators on the vacuum state is properly antisymmetrized. This is ensured [8] if one insists that the field operators anticommute, that is, if

$$\{\hat{\psi}(\mathbf{r}), \hat{\psi}^\dagger(\mathbf{r}')\} \equiv \hat{\psi}(\mathbf{r}) \hat{\psi}^\dagger(\mathbf{r}') + \hat{\psi}^\dagger(\mathbf{r}') \hat{\psi}(\mathbf{r}) = \delta(\mathbf{r} - \mathbf{r}') \qquad (10.13)$$

$$\{\hat{\psi}(\mathbf{r}), \hat{\psi}(\mathbf{r}')\} = 0 \qquad (10.14)$$

$$\{\hat{\psi}^\dagger(\mathbf{r}), \hat{\psi}^\dagger(\mathbf{r}')\} = 0 \qquad (10.15)$$

This guarantee of an antisymmetrized wavefunction is one of the most obvious advantages of the second quantization formalism, as it is much easier than having to deal with the Slater determinants that are typically used to ensure the antisymmetrization of the many-body wavefunction in the first quantized formalism [2]. For any practical calculation one needs to work with a particular basis set, $\{\phi_i(\mathbf{r})\}$. The field operators can be expanded in an arbitrary basis set as

$$\hat{\psi}(\mathbf{r}) = \sum_i \hat{c}_i\, \phi_i(\mathbf{r}) \qquad (10.16)$$

$$\hat{\psi}^\dagger(\mathbf{r}) = \sum_i \hat{c}_i^\dagger\, \phi_i^*(\mathbf{r}) \qquad (10.17)$$

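Thus, $\hat{c}_i^{(\dagger)}$ annihilates (creates) a fermion in the state $\phi_i(\mathbf{r})$. These operators also obey fermionic anticommutation relations,

$$\{\hat{c}_i, \hat{c}_j^\dagger\} = \delta_{ij} \qquad (10.18)$$

$$\{\hat{c}_i, \hat{c}_j\} = 0 \qquad (10.19)$$

$$\{\hat{c}_i^\dagger, \hat{c}_j^\dagger\} = 0 \qquad (10.20)$$

As fermions obey the Pauli exclusion principle, there can be at most one fermion in a given state. We denote a state in which the ith basis function contains zero (one) particles by $|0_i\rangle$ ($|1_i\rangle$). Therefore,

$$\hat{c}_i |1_i\rangle = |0_i\rangle, \qquad \hat{c}_i |0_i\rangle = 0, \qquad \hat{c}_i^\dagger |0_i\rangle = |1_i\rangle, \qquad \hat{c}_i^\dagger |1_i\rangle = 0 \qquad (10.21)$$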
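The relations (10.18) to (10.21) can be verified by brute force on a small Fock space. In the Python sketch below, a basis state is a tuple of occupation numbers, and each operator returns a sign together with the new state, or `None` for the zero vector. The sign rule (a factor of −1 for every occupied mode that precedes mode i) is one common ordering convention and is an assumption of this sketch, as are all function names.

```python
from itertools import product


def annihilate(i, state):
    """Apply c_i to an occupation-number tuple; returns (sign, new_state) or None for zero."""
    if state[i] == 0:
        return None
    sign = (-1) ** sum(state[:i])  # anticommute past the occupied modes before i
    return sign, state[:i] + (0,) + state[i + 1:]


def create(i, state):
    """Apply c_i† to an occupation-number tuple; returns (sign, new_state) or None for zero."""
    if state[i] == 1:
        return None  # Pauli exclusion: the mode is already occupied
    sign = (-1) ** sum(state[:i])
    return sign, state[:i] + (1,) + state[i + 1:]


def apply(op, i, vector):
    """Apply an operator to a vector stored as {state: amplitude}."""
    out = {}
    for state, amp in vector.items():
        result = op(i, state)
        if result is not None:
            sign, new_state = result
            out[new_state] = out.get(new_state, 0) + sign * amp
    return {s: amp for s, amp in out.items() if amp != 0}


def anticommutator(op_a, i, op_b, j, vector):
    """Return {A_i, B_j}|v> = A_i B_j|v> + B_j A_i|v>."""
    first = apply(op_a, i, apply(op_b, j, vector))
    second = apply(op_b, j, apply(op_a, i, vector))
    total = {s: first.get(s, 0) + second.get(s, 0) for s in set(first) | set(second)}
    return {s: amp for s, amp in total.items() if amp != 0}


N_MODES = 3
BASIS = list(product((0, 1), repeat=N_MODES))

if __name__ == "__main__":
    ok = all(
        anticommutator(annihilate, i, create, j, {s: 1}) == ({s: 1} if i == j else {})
        and anticommutator(annihilate, i, annihilate, j, {s: 1}) == {}
        and anticommutator(create, i, create, j, {s: 1}) == {}
        for s in BASIS for i in range(N_MODES) for j in range(N_MODES)
    )
    print("Eqs. (10.18)-(10.20) hold on every basis state:", ok)
```

Because the sign bookkeeping lives entirely in the operators, antisymmetry never has to be imposed on the states by hand, which is the practical advantage over Slater determinants noted above.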

It is important to realize that the number 0 is very different from the state $|0_i\rangle$. Any operator acting on a system of fermions can be expressed in terms of the $\hat{c}$ operators. A particularly important example is the number operator, $\hat{n}_i \equiv \hat{c}_i^\dagger \hat{c}_i$, which simply counts the number of particles in the state i, as can be confirmed by explicit calculation from Eqs. (10.21). The total number of particles in the system is therefore simply the expectation value of the operator $\hat{N} = \sum_i \hat{n}_i = \sum_i \hat{c}_i^\dagger \hat{c}_i$. Importantly, because we can write any operator in terms of the $\hat{c}$ operators, we can calculate any observable from the expectation value of some set of $\hat{c}$ operators. Thus we have access to a complete description of the system from the second quantization formalism. Further, we can always write the wavefunction in terms of the $\hat{c}$ operators if an explicit description of the wavefunction is required. For example, the sum of Slater determinants,

$$\Psi(\mathbf{r}_1, \mathbf{r}_2) = \alpha \begin{vmatrix} \phi_1(\mathbf{r}_1) & \phi_2(\mathbf{r}_1) \\ \phi_1(\mathbf{r}_2) & \phi_2(\mathbf{r}_2) \end{vmatrix} + \beta \begin{vmatrix} \phi_3(\mathbf{r}_1) & \phi_4(\mathbf{r}_1) \\ \phi_3(\mathbf{r}_2) & \phi_4(\mathbf{r}_2) \end{vmatrix} \qquad (10.22)$$

describes the same state as

$$|\Psi\rangle = (\alpha\, \hat{c}_1^\dagger \hat{c}_2^\dagger + \beta\, \hat{c}_3^\dagger \hat{c}_4^\dagger)\, |0\rangle \qquad (10.23)$$

where $|0\rangle = |0_1, 0_2, 0_3, 0_4, \ldots\rangle$ is the vacuum state, as $\Psi(\mathbf{r}_1, \mathbf{r}_2) = \langle \mathbf{r}_1, \mathbf{r}_2 | \Psi \rangle$ (cf., e.g., Ref. 7). Often, in order to describe solid-state and chemical systems, one needs to describe a set of N electrons whose behavior is governed by a Hamiltonian of the form

$$H = \sum_{n=1}^{N} \left[ -\frac{\hbar^2 \nabla_n^2}{2m} + U(\mathbf{r}_n) + \frac{1}{2} \sum_{m \neq n} V(\mathbf{r}_n - \mathbf{r}_m) \right] \tag{10.24}$$

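where $V(\mathbf{r}_n - \mathbf{r}_m)$ is the potential describing the interactions between electrons and $U(\mathbf{r}_n)$ is an external potential (including interactions with ions or nuclei, which may often be considered to be stationary on the time scales relevant to electronic processes, although we discuss effects due to the displacement of the nuclei in Section 10.6). In terms of our second quantization operators, this Hamiltonian may be written

$$\hat{H} = -\sum_{ij} t_{ij} \hat{c}_i^\dagger \hat{c}_j + \frac{1}{2} \sum_{ijkl} V_{ijkl} \hat{c}_i^\dagger \hat{c}_k^\dagger \hat{c}_l \hat{c}_j \tag{10.25}$$

where

$$t_{ij} = -\int d^3\mathbf{r}\, \phi_i^*(\mathbf{r}) \left[ -\frac{\hbar^2 \nabla^2}{2m} + U(\mathbf{r}) \right] \phi_j(\mathbf{r}) \tag{10.26}$$

$$V_{ijkl} = \int d^3\mathbf{r}_1 \int d^3\mathbf{r}_2\, \phi_i^*(\mathbf{r}_1) \phi_j(\mathbf{r}_1) V(\mathbf{r}_1 - \mathbf{r}_2) \phi_k^*(\mathbf{r}_2) \phi_l(\mathbf{r}_2) \tag{10.27}$$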

and the labels i, j, k, and l are taken to define the spin as well as the basis function. This is exact, provided that we have an infinite complete basis. But practical calculations require the use of finite basis sets and often use incomplete basis sets. The simplest approach is simply to ignore this problem and calculate tij and Vijkl directly from the finite basis set. However, this is often not the best approach. We delay until Section 10.7 a detailed discussion of why this is and of the deep philosophical issues that it raises. We also delay until Section 10.7 discussion of how to calculate these parameters. Until then we simply assume that tij, Vijkl, and similar parameters required are known and focus instead on how to perform practical calculations using models of the form of Eq. (10.25) and closely related Hamiltonians. In what follows we assume that the states created by the $\hat{c}_i^\dagger$ operators form an orthonormal basis. This greatly simplifies the mathematics but differs from the approach usually taken in introductory chemistry textbooks, as most quantum chemical calculations are performed in nonorthogonal bases for reasons of computational expedience.

10.2 HÜCKEL OR TIGHT-BINDING MODEL

The simplest model with the form of Eq. (10.25) is usually called the Hückel model in the context of molecular systems9 and the tight-binding model in the context of crystals.10 In these models one makes the approximation that Vijkl = 0 for all i, j, k, and l. Therefore, these models explicitly neglect interactions between electrons. The models are identical, but slightly different notation is standard in the different traditions.

We assume that our basis set consists of orbitals centered on particular sites, as we will for all of the models considered in this chapter. These sites might, for example, be atoms in a molecule or solid, chemical groups within a molecule, p-d hybrid states in a transition metal oxide, entire molecules in a molecular crystal, or even larger structures. Below we will often use a nomenclature motivated by the case where the sites are atoms; however, this does not mean that the mathematics is applicable only to that case. In the simplest case of only one orbital per spin state on each site,

$$\hat{H}_{\mathrm{tb}} = -\sum_{ij\sigma} t_{ij} \hat{c}_{i\sigma}^\dagger \hat{c}_{j\sigma} \tag{10.28}$$

where $\hat{c}_{i\sigma}^{(\dagger)}$ annihilates (creates) an electron with spin σ in an orbital centered on site i.

10.2.1 Molecules (the Hückel Model)

The standard notation in this context is tii = −αi, tij = −βij if sites i and j are connected by a chemical bond, and tij = 0 otherwise. Note that the subscripts on α and β are also often dropped, but they are usually implicit; if the molecule contains more than one species of atom, the α's will clearly be different on the different species and the β's will depend on the species of each of the atoms between which the electron is hopping. Therefore,

$$\hat{H}_{\mathrm{H\ddot{u}ckel}} = \sum_{i\sigma} \alpha_i \hat{c}_{i\sigma}^\dagger \hat{c}_{i\sigma} + \sum_{\langle ij \rangle \sigma} \beta_{ij} \hat{c}_{i\sigma}^\dagger \hat{c}_{j\sigma} \tag{10.29}$$

where $\langle ij \rangle$ serves to remind us that the sum is only over those pairs of atoms joined by a chemical bond. Note that βij is typically negative.

10.2.1.1 Molecular Hydrogen

Clearly, in H2 there is only a single atomic species. In this case one can set αi = α for all i without loss of generality. Further, as there is also only a single bond, we may choose βij = β, giving

$$\hat{H}_{\mathrm{H\ddot{u}ckel}} = \alpha \sum_{\sigma} (\hat{n}_{1\sigma} + \hat{n}_{2\sigma}) + \beta \sum_{\sigma} (\hat{c}_{1\sigma}^\dagger \hat{c}_{2\sigma} + \hat{c}_{2\sigma}^\dagger \hat{c}_{1\sigma}) \tag{10.30}$$

where we have labeled the two atomic sites 1 and 2. This Hamiltonian has two eigenstates: one is known as the bonding orbital,

$$|\psi_{b\sigma}\rangle = \frac{1}{\sqrt{2}} (\hat{c}_{1\sigma}^\dagger + \hat{c}_{2\sigma}^\dagger)|0\rangle \tag{10.31}$$

and the other is known as the antibonding orbital,

$$|\psi_{a\sigma}\rangle = \frac{1}{\sqrt{2}} (\hat{c}_{1\sigma}^\dagger - \hat{c}_{2\sigma}^\dagger)|0\rangle \tag{10.32}$$

Fig. 10.1 (color online) Energy levels of the atomic and molecular orbitals in the Hückel description of H2. The bonding orbital is |β| lower in energy than the atomic orbital, whereas the antibonding orbital is |β| higher in energy than the atomic orbital. Therefore, neutral H2 is stabilized by 2|β| relative to 2H.
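The eigenstates in Eqs. (10.31) and (10.32) can be checked numerically. The following is a minimal sketch, assuming illustrative values α = 0 and β = −1 in arbitrary units (these numbers are not taken from the text), that diagonalizes the 2 × 2 single-electron Hückel matrix and then fills the resulting levels with one to four electrons.

```python
import numpy as np

alpha = 0.0     # on-site energy (illustrative assumption)
beta = -1.0     # hopping integral, negative as in the text (illustrative assumption)

# Single-electron Hueckel matrix of Eq. (10.30) in the basis of the two atomic orbitals.
h = np.array([[alpha, beta],
              [beta, alpha]])
energies, orbitals = np.linalg.eigh(h)        # eigenvalues in ascending order
e_bonding, e_antibonding = energies

def ground_state_energy(n_electrons):
    """Fill the levels from the bottom, two electrons (spin up and down) per orbital."""
    spin_levels = np.repeat(energies, 2)
    return spin_levels[:n_electrons].sum()

def binding_energy(n_electrons):
    """Energy of the separated atoms (N * alpha) minus the molecular ground-state energy."""
    return n_electrons * alpha - ground_state_energy(n_electrons)

binding_energies = [binding_energy(n) for n in (1, 2, 3, 4)]
print(e_bonding, e_antibonding, binding_energies)
```

With these assumed parameters the binding energies come out in units of |β|, which makes the comparison between the charge states direct.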

The bonding orbital has energy α + β, whereas the antibonding orbital has energy α − β. Recall that β < 0; therefore, every electron in the bonding state stabilizes the molecule by an amount |β|, whereas electrons in the antibonding state destabilize the molecule by an amount |β|, hence the nomenclature.† This is sketched in Fig. 10.1.

Because Vijkl = 0, the electrons are noninteracting, so the molecular orbitals are not dependent on the occupation of other orbitals. Therefore, to calculate the total energy of the ground state of the molecule, one simply fills up the states, starting with the lowest-energy states and respecting the Pauli exclusion principle. If the two protons are infinitely separated, β = 0 and the system has total energy Nα, where N is the total number of electrons. $\mathrm{H}_2^{+}$ has only one electron, which, in the ground state, will occupy the bonding orbital, so $\mathrm{H}_2^{+}$ has a binding energy of |β|. H2 has two electrons; in the ground state these electrons have opposite spin and therefore can both occupy the bonding orbital. Thus, H2 has a binding energy of 2|β|. $\mathrm{H}_2^{-}$ has three electrons, so while two can occupy the bonding state, one must be in the antibonding state; therefore, the binding energy is only |β|. Finally, $\mathrm{H}_2^{2-}$ has four electrons, so one finds two in each molecular orbital. Therefore, the binding energy is zero: the molecule is predicted to be unstable.

Thus, the Hückel model makes several predictions: neutral H2 is predicted to be significantly more stable than any of the ionic states; the two singly ionic species are predicted to be equally stable; and the doubly anionic species is predicted to be unstable. Further, the lowest optical absorption is expected to correspond to the transition between the bonding orbital and the antibonding orbital. The energy gap for this transition is 2|β|. Therefore, the lowest optical absorption is predicted to be the same in the neutral species as in the singly cationic species. Further, this absorption is predicted to occur at an energy equal to the heat of formation of the neutral species. Although these predictions do capture qualitatively what is observed experimentally, they are certainly not within chemical accuracy (i.e., within kB T ∼ 1 kcal mol−1 ∼ 0.03 eV for T = 300 K). For example, the experimentally determined binding energies9 are 2.27 eV for $\mathrm{H}_2^{+}$, 4.74 eV for H2, and 1.7 eV for $\mathrm{H}_2^{-}$, while $\mathrm{H}_2^{2-}$ is indeed unstable.

† Note that in a nonorthogonal basis, the antibonding orbital may be destabilized by a greater amount than the bonding orbital is stabilized.

10.2.1.2 π-Hückel Theory of Benzene

For many organic molecules a model known as π-Hückel theory is very useful. In π-Hückel theory one considers only the π-electrons. A simple example is a benzene molecule. The hydrogen atoms have no π-electrons and therefore are not represented in the model. This leaves only the carbon atoms, so again we can set αi = α and βij = β. Because of the ring geometry of benzene (and assuming that the molecule is planar), the Hamiltonian becomes

$$\hat{H}_{\mathrm{H\ddot{u}ckel}} = \alpha \sum_{i\sigma} \hat{n}_{i\sigma} + \beta \sum_{i\sigma} (\hat{c}_{i\sigma}^\dagger \hat{c}_{i+1\sigma} + \hat{c}_{i+1\sigma}^\dagger \hat{c}_{i\sigma}) \tag{10.33}$$

where the addition in the site index is defined modulo six (i.e., site seven is site one). For benzene we have six solutions per spin state:

$$\begin{aligned}
|\psi_{A_{2u}}\rangle &= \frac{1}{\sqrt{6}} (\hat{c}_{1\sigma}^\dagger + \hat{c}_{2\sigma}^\dagger + \hat{c}_{3\sigma}^\dagger + \hat{c}_{4\sigma}^\dagger + \hat{c}_{5\sigma}^\dagger + \hat{c}_{6\sigma}^\dagger)|0\rangle \\
|\psi_{E_{1g}}\rangle &= \frac{1}{\sqrt{6}} (\hat{c}_{1\sigma}^\dagger + \varepsilon \hat{c}_{2\sigma}^\dagger + \varepsilon^2 \hat{c}_{3\sigma}^\dagger - \hat{c}_{4\sigma}^\dagger - \varepsilon \hat{c}_{5\sigma}^\dagger - \varepsilon^2 \hat{c}_{6\sigma}^\dagger)|0\rangle \\
|\psi_{E_{1g}}'\rangle &= \frac{1}{\sqrt{6}} (\hat{c}_{1\sigma}^\dagger - \varepsilon^2 \hat{c}_{2\sigma}^\dagger - \varepsilon \hat{c}_{3\sigma}^\dagger - \hat{c}_{4\sigma}^\dagger + \varepsilon^2 \hat{c}_{5\sigma}^\dagger + \varepsilon \hat{c}_{6\sigma}^\dagger)|0\rangle \\
|\psi_{E_{2u}}\rangle &= \frac{1}{\sqrt{6}} (\hat{c}_{1\sigma}^\dagger + \varepsilon^2 \hat{c}_{2\sigma}^\dagger - \varepsilon \hat{c}_{3\sigma}^\dagger + \hat{c}_{4\sigma}^\dagger + \varepsilon^2 \hat{c}_{5\sigma}^\dagger - \varepsilon \hat{c}_{6\sigma}^\dagger)|0\rangle \\
|\psi_{E_{2u}}'\rangle &= \frac{1}{\sqrt{6}} (\hat{c}_{1\sigma}^\dagger - \varepsilon \hat{c}_{2\sigma}^\dagger + \varepsilon^2 \hat{c}_{3\sigma}^\dagger + \hat{c}_{4\sigma}^\dagger - \varepsilon \hat{c}_{5\sigma}^\dagger + \varepsilon^2 \hat{c}_{6\sigma}^\dagger)|0\rangle \\
|\psi_{B_{2g}}\rangle &= \frac{1}{\sqrt{6}} (\hat{c}_{1\sigma}^\dagger - \hat{c}_{2\sigma}^\dagger + \hat{c}_{3\sigma}^\dagger - \hat{c}_{4\sigma}^\dagger + \hat{c}_{5\sigma}^\dagger - \hat{c}_{6\sigma}^\dagger)|0\rangle
\end{aligned}$$

where $\varepsilon = e^{i\pi/3}$. These wavefunctions are sketched in Fig. 10.2. The energies of these states are $E_{A_{2u}} = \alpha - 2|\beta|$, $E_{E_{1g}} = E_{E_{1g}}' = \alpha - |\beta|$, $E_{E_{2u}} = E_{E_{2u}}' = \alpha + |\beta|$, and $E_{B_{2g}} = \alpha + 2|\beta|$. The subscripts are symmetry labels11,12 for the group D6h; one should recall that because we are dealing with π-orbitals, all of the orbitals sketched here are antisymmetric under reflection through the plane of the page. The degenerate (E1g and E2u) orbitals are typically written or drawn rather differently (see Lowe and Peterson9). However, any linear combination of degenerate eigenstates is also an eigenstate; this representation was chosen as it highlights the symmetry of the problem. For a more detailed discussion of this problem, see Coulson's Valence.13

Fig. 10.2 (color online) Molecular orbitals for benzene from π-Hückel theory. Different colors indicate a change in sign of the wavefunction. In the neutral molecule the A2u and both E1g states are occupied, while the B2g and E2u states are virtual. Note that we have taken real superpositions9 of the twofold degenerate states in these plots.

10.2.1.3 Electronic Interactions and Parameterization of the Hückel Model

As noted above, the Hückel model does not explicitly include interactions between electrons. This leads to serious qualitative and quantitative failures of the model, some of which we have seen above and discuss further below. However, given the (mathematical and conceptual) simplicity and the computational economy of the method, one would like to improve the method as far as possible. So far we have treated the theory as parameter free. However, if we treat the model as a semiempirical method instead, we can include some of the effects due to electron–electron interactions without greatly increasing the computational cost of the method. For example, one can make α dependent on the charge on the atom. This is reasonable, as the more electrons we put on an atom, the more difficult it is to add another, due to the additional Coulomb repulsion from the extra electrons. The simplest way to account for this is by use of the ω technique,9 where one replaces

$$\alpha_i \rightarrow \alpha_i' = \alpha_i + \omega (q_0 - q_i) \beta \tag{10.34}$$

where qi is the charge on atom i, q0 is a (fixed) reference charge, and ω is a parameter. The ω technique suppresses the unphysical fluctuations of the electron density, which are often predicted by the Hückel model (cf. the discussion of H2 above). Similar techniques can also be applied to β. These parameterizations only slightly complicate the model and do not lead to a major inflation of the computational cost, but can significantly improve the accuracy of the predictions of the Hückel model.14

10.2.2 Crystals (the Tight-Binding Model)

For infinite systems it is necessary to work with a fixed chemical potential rather than a fixed particle number. Therefore, before we discuss the tight-binding model, we briefly review the chemical potential (see also the discussion by Atkins and de Paula5 of the chemical potential in a chemical context).

10.2.2.1 Chemical Potential

When one is dealing with a large system, keeping track of the number of particles can become difficult. This is particularly true in the thermodynamic limit, where the number of electrons $N_e \equiv \langle \hat{N} \rangle \to \infty$ and the volume of the system $V \to \infty$ in such a way as to ensure that the electronic density, $n_e = N_e/V$, remains constant. Lagrange multipliers15 are a powerful and general method for imposing constraints on differential equations (such as the Schrödinger equation) without requiring the solution of integrodifferential equations. Briefly, consider a function, f(x, y, z, ...), that we wish to extremize (minimize or maximize) subject to a constraint which means that x, y, z, ... are no longer independent. In general, we may write the constraint in the form φ(x, y, z, ...) = 0. This allows us to define the function g(x, y, z, ..., λ) ≡ f(x, y, z, ...) + λφ(x, y, z, ...), where λ is known as a Lagrange multiplier. One may show15 that the extremum of g(x, y, z, ..., λ) with respect to x, y, z, ... and λ is the extremum of f(x, y, z, ...) with respect to x, y, z, ... subject to the constraint that φ(x, y, z, ...) = 0. Typically, the problem we wish to solve in chemistry and condensed matter physics is to minimize the free energy, F (which reduces to the energy, E, at T = 0), subject to the constraint of having a fixed number of electrons (determined by the chemistry of the material in question). This suggests that one should simply introduce a Lagrange multiplier to resolve the difficulty of constraining the number of electrons in the thermodynamic limit.
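The Lagrange multiplier construction can be illustrated on a toy problem. The following is a minimal sketch, assuming a hypothetical function f(x, y) = x² + 2y² and constraint φ(x, y) = x + y − 1 = 0 (neither appears in the text); because the stationarity conditions of g = f + λφ are linear here, they can be solved directly and compared with a brute-force scan along the constraint.

```python
import numpy as np

# Stationarity conditions of g = f + lambda * phi for the assumed toy problem
#   f(x, y) = x**2 + 2*y**2,   phi(x, y) = x + y - 1:
#   dg/dx      = 2x + lambda = 0
#   dg/dy      = 4y + lambda = 0
#   dg/dlambda = x + y - 1   = 0
matrix = np.array([[2.0, 0.0, 1.0],
                   [0.0, 4.0, 1.0],
                   [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x_star, y_star, lam = np.linalg.solve(matrix, rhs)

# Brute-force check: scan f along the constraint line y = 1 - x.
xs = np.linspace(-2.0, 2.0, 400001)
f_on_constraint = xs**2 + 2.0 * (1.0 - xs)**2
x_scan = xs[np.argmin(f_on_constraint)]
print(x_star, y_star, lam, x_scan)
```

The multiplier is treated as one more unknown, so the constrained minimum is found without ever eliminating a variable by hand.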
A suitable constraint could be introduced by adding the term $\lambda(N_0 - \hat{N})$ to the Hamiltonian, where N0 is the chemically required number of electrons, and requiring that the free energy is an extremum with respect to λ. However, one can also impose the same constraint and achieve additional physical insight by subtracting the term $\mu\hat{N}$ from the Hamiltonian and requiring that

$$N_0 = -\frac{\partial F}{\partial \mu} \tag{10.35}$$

The chemical potential (for electrons), μ, is then given by

$$\mu = -\frac{\partial F}{\partial N_e} \tag{10.36}$$

Therefore, specifying a system's chemical potential is equivalent to specifying the number of electrons, but provides a far more powerful approach for bulk systems. Physically, this approach is equivalent to thinking of the system as being attached to an infinite bath of electrons (i.e., one is working in the grand canonical ensemble).16 Thus, the Fermi distribution for the system is given by

$$f(E, T) = \frac{1}{1 + e^{(E - \mu)/k_B T}} \tag{10.37}$$
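Equation (10.37), together with the requirement that the occupations add up to the required number of electrons, determines μ. The following is a minimal sketch, assuming a hypothetical set of spin-degenerate levels and a hypothetical temperature (both chosen only for illustration), that evaluates the Fermi distribution and finds μ by bisection. The function names are not from the text.

```python
import numpy as np

def fermi(energy, mu, kT):
    """Eq. (10.37), written as 0.5 * (1 - tanh(x / 2)) to avoid overflow in exp."""
    x = (np.asarray(energy, dtype=float) - mu) / kT
    return 0.5 * (1.0 - np.tanh(0.5 * x))

def chemical_potential(levels, n_electrons, kT, tol=1e-12):
    """Bisection for the mu at which the summed occupations equal n_electrons."""
    levels = np.asarray(levels, dtype=float)
    lo, hi = levels.min() - 50.0 * kT, levels.max() + 50.0 * kT
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fermi(levels, mid, kT).sum() < n_electrons:
            lo = mid          # too few electrons: raise mu
        else:
            hi = mid          # too many electrons: lower mu
    return 0.5 * (lo + hi)

# Hypothetical single-particle levels, each listed once per spin.
levels = np.repeat([-2.0, -1.0, -1.0, 1.0, 1.0, 2.0], 2)
kT = 0.2
mu = chemical_potential(levels, n_electrons=6, kT=kT)
print(mu, fermi(mu, mu, kT), fermi(levels, mu, kT).sum())
```

Bisection is used because the total occupation increases monotonically with μ, so the root is unique.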

Therefore, at T = 0 all of the states with energies lower than the chemical potential are occupied, and all of the states with energies greater than the chemical potential are unoccupied. Therefore, the Fermi energy is EF = μ(T = 0). Note that as F is temperature dependent, Eq. (10.36) shows that, in general, μ will also be temperature dependent.† Nevertheless, Eq. (10.37) gives a clear interpretation of the chemical potential at any nonzero temperature: μ(T) is the energy of a state with a 50% probability of occupation at temperature T.

10.2.2.2 Tight-Binding Model

For periodic systems (crystals) one usually refers to the Hückel model as the tight-binding model. Often, one only considers models with nearest-neighbor terms; that is, one takes tii = −εi, tij = t if i and j are at nearest-neighbor sites, and tij = 0 otherwise. Thus, for nearest-neighbor hopping only,

$$\hat{H}_{\mathrm{tb}} - \mu\hat{N} = -t \sum_{\langle ij \rangle \sigma} \hat{c}_{i\sigma}^\dagger \hat{c}_{j\sigma} + \sum_{i\sigma} (\varepsilon_i - \mu) \hat{c}_{i\sigma}^\dagger \hat{c}_{i\sigma} \tag{10.38}$$

where μ is the chemical potential and $\langle ij \rangle$ indicates that the sum is over nearest neighbors only. Further, if we consider materials with only a single atomic species, we can set εi = 0, yielding

$$\hat{H}_{\mathrm{tb}} - \mu\hat{N} = -t \sum_{\langle ij \rangle \sigma} \hat{c}_{i\sigma}^\dagger \hat{c}_{j\sigma} - \mu \sum_{i\sigma} \hat{c}_{i\sigma}^\dagger \hat{c}_{i\sigma} \tag{10.39}$$

† In contrast, as EF is only defined at T = 0, it is not temperature dependent.

10.2.2.3 One-Dimensional Chain

The simplest infinite system is a chain with nearest-neighbor hopping only. As we are on a chain, the sites have a natural ordering and the Hamiltonian may be written as

$$\hat{H}_{\mathrm{tb}} - \mu\hat{N} = -t \sum_{i\sigma} (\hat{c}_{i\sigma}^\dagger \hat{c}_{i+1\sigma} + \hat{c}_{i+1\sigma}^\dagger \hat{c}_{i\sigma}) - \mu \sum_{i\sigma} \hat{c}_{i\sigma}^\dagger \hat{c}_{i\sigma} \tag{10.40}$$

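We can solve this model exactly by performing a lattice Fourier transform. We begin by introducing the reciprocal space creation and annihilation operators:

$$\hat{c}_{i\sigma} = \frac{1}{\sqrt{N}} \sum_{k} \hat{c}_{k\sigma} e^{ikR_i} \tag{10.41}$$

and

$$\hat{c}_{i\sigma}^\dagger = \frac{1}{\sqrt{N}} \sum_{k} \hat{c}_{k\sigma}^\dagger e^{-ikR_i} \tag{10.42}$$

where k is the lattice wavenumber or crystal momentum and Ri is the position of the ith lattice site. Therefore,

$$\hat{H}_{\mathrm{tb}} - \mu\hat{N} = \frac{1}{N} \sum_{ikk'\sigma} \hat{c}_{k\sigma}^\dagger \hat{c}_{k'\sigma} e^{i(k'-k)R_i} \left[ -t (e^{ik'a} + e^{-ika}) - \mu \right] \tag{10.43}$$

where a is the lattice constant (i.e., the distance between neighboring sites Ri and Ri+1). As $(1/N) \sum_i e^{i(k'-k)R_i} = \delta(k' - k)$,17

$$\begin{aligned}
\hat{H}_{\mathrm{tb}} - \mu\hat{N} &= \sum_{k\sigma} \left[ -2t \cos(ka)\, \hat{c}_{k\sigma}^\dagger \hat{c}_{k\sigma} - \mu\, \hat{c}_{k\sigma}^\dagger \hat{c}_{k\sigma} \right] \\
&= \sum_{k\sigma} (\varepsilon_k - \mu)\, \hat{c}_{k\sigma}^\dagger \hat{c}_{k\sigma}
\end{aligned} \tag{10.44}$$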

where εk = −2t cos(ka) is known as the dispersion relation. Notice that Eq. (10.44) is diagonal (i.e., it depends only on number operator terms, $\hat{n}_{k\sigma} = \hat{c}_{k\sigma}^\dagger \hat{c}_{k\sigma}$). Therefore, the energy is just the sum of εk for the states kσ that are occupied, and we have solved the problem. We plot the dispersion relation in Fig. 10.3a. For a tight-binding model, calculating the dispersion relation is equivalent to solving the problem.

The chemical potential, μ, must be chosen to ensure that there is the physically required number of electrons. Changing the chemical potential has the effect of moving the Fermi energy up or down the band and hence changing the number of electrons in the system. For example (cf. Fig. 10.3b to d), in the problem above, the half-filled band corresponds to μ = 0, the quarter-filled band corresponds to μ = −√2 t (the states with |ka| < π/4 are filled, so μ = −2t cos(π/4)), and the three-quarter-filled band corresponds to μ = √2 t.

10.2.2.4 Square, Cubic, and Hypercubic Lattices

In more than one dimension the notation becomes slightly more complicated, but the mathematics does not necessarily become any more difficult. The simplest generalization of the chain we have solved above is the two-dimensional square lattice, where

$$\hat{H}_{\mathrm{tb}} - \mu\hat{N} = -t \sum_{\langle ij \rangle \sigma} \hat{c}_{i\sigma}^\dagger \hat{c}_{j\sigma} - \mu \sum_{i\sigma} \hat{c}_{i\sigma}^\dagger \hat{c}_{i\sigma} \tag{10.45}$$

Fig. 10.3 (color online) (a) The dispersion relation, εk = −2t cos(ka), of the one-dimensional tight-binding chain with nearest-neighbor hopping only, plotted as εk (from −2t to 2t) against ka (from −3 to 3). (b)-(d) Shaded areas show the filled states for three different values of the chemical potential, μ.
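The one-dimensional dispersion relation plotted in Fig. 10.3 can be checked against a direct numerical diagonalization. The following is a minimal sketch, assuming a finite ring of N sites with periodic boundary conditions (a stand-in for the infinite chain) and units in which t = a = 1; the helper names are hypothetical. For N = 6 the same matrix is the π-Hückel Hamiltonian of benzene, Eq. (10.33), with α = 0 and β = −t.

```python
import numpy as np

t = 1.0      # hopping amplitude (illustrative units)
a = 1.0      # lattice constant (illustrative units)

def ring_hamiltonian(n_sites, t):
    """Single-electron nearest-neighbor hopping matrix on a ring (needs n_sites >= 3)."""
    h = np.zeros((n_sites, n_sites))
    for i in range(n_sites):
        j = (i + 1) % n_sites
        h[i, j] -= t
        h[j, i] -= t
    return h

def dispersion(k, t=t, a=a):
    """epsilon_k = -2 t cos(k a)."""
    return -2.0 * t * np.cos(k * a)

def filling_fraction(mu, t=t):
    """Fraction of the band filled at T = 0 in the infinite chain: k_F a / pi."""
    x = np.clip(-mu / (2.0 * t), -1.0, 1.0)
    return np.arccos(x) / np.pi

n_sites = 50
k = 2.0 * np.pi * np.arange(n_sites) / (n_sites * a)    # wavenumbers allowed on the ring
numeric = np.linalg.eigvalsh(ring_hamiltonian(n_sites, t))
analytic = np.sort(dispersion(k))
max_deviation = np.max(np.abs(numeric - analytic))
print(max_deviation, filling_fraction(0.0))
```

The function `filling_fraction` inverts the dispersion relation at the Fermi level, so it can be used to read off the chemical potential that corresponds to any desired band filling.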

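Recall that $\langle ij \rangle$ indicates that the sum is over nearest neighbors only. To solve this problem we simply generalize our reciprocal lattice operators to

$$\hat{c}_{i\sigma} = \frac{1}{\sqrt{N}} \sum_{\mathbf{k}} \hat{c}_{\mathbf{k}\sigma} e^{i\mathbf{k}\cdot\mathbf{R}_i} \tag{10.46}$$

$$\hat{c}_{i\sigma}^\dagger = \frac{1}{\sqrt{N}} \sum_{\mathbf{k}} \hat{c}_{\mathbf{k}\sigma}^\dagger e^{-i\mathbf{k}\cdot\mathbf{R}_i} \tag{10.47}$$

where k = (kx, ky) is the lattice wavevector or crystal momentum and Ri = (xi, yi) is the position of the ith lattice site. We then simply repeat the process we used to solve the one-dimensional chain. As the lattice only contains bonds in perpendicular directions, the calculations for the x and y directions go through independently and one finds that

$$\hat{H}_{\mathrm{tb}} - \mu\hat{N} = \sum_{\mathbf{k}\sigma} (\varepsilon_{\mathbf{k}} - \mu)\, \hat{c}_{\mathbf{k}\sigma}^\dagger \hat{c}_{\mathbf{k}\sigma} \tag{10.48}$$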

where the dispersion relation is now εk = −2t (cos kx ax + cos ky ay ) and aν represents the lattice constants in the ν direction. A three-dimensional cubic lattice is not any more difficult. In this case, k = (kx , ky , kz ) and the solution is of the form of Eq. (10.48) but with εk = −2t (cos kx ax + cos ky ay + cos kz az ). Indeed, as long as we keep all the bonds mutually perpendicular, we can keep generalizing this solution to higher dimensions. This may sound somewhat academic, as no materials live in more than three dimensions, but the infinite-dimensional hypercubic lattice has become important in recent years because many models that include interactions can be solved exactly in infinite dimensions, as we discuss in Section 10.3.4.2.
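The square-lattice result can be verified in the same way as the chain. The following is a minimal sketch, assuming a small periodic lattice with equal lattice constants set to 1 and t = 1 (illustrative choices); `square_lattice_hamiltonian` and `hypercubic_dispersion` are hypothetical helper names. The second function simply sums one cosine per perpendicular direction, so it applies to any dimension.

```python
import numpy as np

def square_lattice_hamiltonian(n_side, t):
    """Nearest-neighbor hopping on an n_side x n_side periodic lattice (needs n_side >= 3)."""
    n_sites = n_side * n_side
    h = np.zeros((n_sites, n_sites))
    for x in range(n_side):
        for y in range(n_side):
            i = x * n_side + y
            for dx, dy in ((1, 0), (0, 1)):       # one bond per direction avoids double counting
                j = ((x + dx) % n_side) * n_side + (y + dy) % n_side
                h[i, j] -= t
                h[j, i] -= t
    return h

def hypercubic_dispersion(k, t=1.0, a=1.0):
    """epsilon_k = -2 t sum_nu cos(k_nu a) for mutually perpendicular bonds."""
    return -2.0 * t * np.sum(np.cos(np.asarray(k, dtype=float) * a), axis=-1)

t, n_side = 1.0, 6
k_values = 2.0 * np.pi * np.arange(n_side) / n_side          # lattice constant a = 1
kx, ky = np.meshgrid(k_values, k_values, indexing="ij")
analytic = np.sort(hypercubic_dispersion(np.stack([kx, ky], axis=-1), t).ravel())
numeric = np.linalg.eigvalsh(square_lattice_hamiltonian(n_side, t))
max_deviation = np.max(np.abs(numeric - analytic))
print(max_deviation)
```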

Fig. 10.4 (color online) (a) Hexagonal (triangular), (b) anisotropic triangular, (c) honeycomb, and (d) kagome lattices. The honeycomb lattice contains two inequivalent types of lattice site, some of which are labeled A and B. The sets of equivalent sites are referred to as sublattices.

10.2.2.5 Hexagonal and Honeycomb Lattices

Even if the bonds are not all mutually perpendicular, the solution to the tight-binding model can still be found by Fourier-transforming the Hamiltonian. Three important examples of such lattices are the hexagonal lattice (which is often referred to as the triangular lattice, although this is formally incorrect), the anisotropic triangular lattice, and the honeycomb lattice, which are sketched in Fig. 10.4. For each lattice the solution is of the form of Eq. (10.48). For the hexagonal lattice,

$$\varepsilon_{\mathbf{k}} = -2t \cos(k_x a_x) - 4t \cos\left(\frac{k_x a_x}{2}\right) \cos\left(\frac{\sqrt{3}\, k_y a_y}{2}\right) \tag{10.49}$$

For the anisotropic triangular lattice,

$$\varepsilon_{\mathbf{k}} = -2t (\cos k_x a_x + \cos k_y a_y) - 2t' \cos(k_x a_x + k_y a_y) \tag{10.50}$$

where t′ is the hopping amplitude along the diagonal bonds.
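Both dispersion relations follow from summing a phase factor over the vectors that connect a site to its neighbors. The following is a minimal sketch, assuming a single lattice constant a = 1 for both lattices and giving the diagonal bond of the anisotropic triangular lattice its own amplitude, here called `t_diag` (an assumption of this sketch; setting `t_diag = t` makes all bonds equal). The neighbor lists and function names are illustrative, not taken from the text.

```python
import numpy as np

def dispersion_from_neighbors(k, neighbors, amplitudes):
    """epsilon_k = -sum_delta t_delta exp(i k . delta), for one site per unit cell."""
    phases = np.exp(1j * (np.asarray(neighbors) @ np.asarray(k, dtype=float)))
    return float(np.real(-np.sum(np.asarray(amplitudes) * phases)))

t, t_diag, a = 1.0, 0.7, 1.0          # illustrative values (assumptions)
s3 = np.sqrt(3.0)

# Six nearest neighbors of the hexagonal (triangular) lattice.
hex_neighbors = a * np.array([[1.0, 0.0], [-1.0, 0.0],
                              [0.5, s3 / 2], [-0.5, -s3 / 2],
                              [0.5, -s3 / 2], [-0.5, s3 / 2]])
hex_amplitudes = np.full(6, t)

# Square-lattice bonds plus one diagonal for the anisotropic triangular lattice.
aniso_neighbors = a * np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0],
                                [0.0, -1.0], [1.0, 1.0], [-1.0, -1.0]])
aniso_amplitudes = np.array([t, t, t, t, t_diag, t_diag])

def eq_10_49(kx, ky):
    return -2 * t * np.cos(kx * a) - 4 * t * np.cos(kx * a / 2) * np.cos(s3 * ky * a / 2)

def eq_10_50(kx, ky):
    return -2 * t * (np.cos(kx * a) + np.cos(ky * a)) - 2 * t_diag * np.cos(kx * a + ky * a)

k_test = (0.3, -1.1)                   # arbitrary test wavevector
print(dispersion_from_neighbors(k_test, hex_neighbors, hex_amplitudes), eq_10_49(*k_test))
print(dispersion_from_neighbors(k_test, aniso_neighbors, aniso_amplitudes), eq_10_50(*k_test))
```

Because the neighbor vectors come in ± pairs, the imaginary parts cancel and each pair contributes a cosine.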

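The honeycomb lattice has an important additional subtlety, which it is worthwhile to work through: there are two inequivalent types of lattice site (see Fig. 10.4c). We begin by introducing new operators, $\hat{c}_{i\nu\sigma}$, which annihilate an electron with spin σ on the νth sublattice in the ith unit cell, where ν = A or B. Therefore, we can rewrite Eq. (10.45) as

$$\begin{aligned}
\hat{H}_{\mathrm{tb}} &= -t \sum_{\langle ij \rangle \sigma} \left( \hat{c}_{iA\sigma}^\dagger \hat{c}_{jB\sigma} + \hat{c}_{jB\sigma}^\dagger \hat{c}_{iA\sigma} \right) \\
&= -t \sum_{\langle ij \rangle \sigma} \begin{pmatrix} \hat{c}_{iA\sigma}^\dagger & \hat{c}_{jB\sigma}^\dagger \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \hat{c}_{iA\sigma} \\ \hat{c}_{jB\sigma} \end{pmatrix} \\
&= -t \sum_{\mathbf{k}\sigma} \begin{pmatrix} \hat{c}_{\mathbf{k}A\sigma}^\dagger & \hat{c}_{\mathbf{k}B\sigma}^\dagger \end{pmatrix} \begin{pmatrix} 0 & h_{\mathbf{k}} \\ h_{\mathbf{k}}^* & 0 \end{pmatrix} \begin{pmatrix} \hat{c}_{\mathbf{k}A\sigma} \\ \hat{c}_{\mathbf{k}B\sigma} \end{pmatrix}
\end{aligned} \tag{10.51}$$

where $h_{\mathbf{k}} = e^{ik_x a} + e^{-i(k_x + \sqrt{3} k_y)a/2} + e^{-i(k_x - \sqrt{3} k_y)a/2}$. Therefore,

$$\varepsilon_{\mathbf{k}} = \pm t |h_{\mathbf{k}}| = \pm t \sqrt{3 + 2 \cos(\sqrt{3}\, k_y a) + 4 \cos\left(\frac{\sqrt{3}\, k_y a}{2}\right) \cos\left(\frac{3 k_x a}{2}\right)} \tag{10.52}$$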
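Equation (10.52) can be checked by diagonalizing the 2 × 2 matrix of Eq. (10.51) at individual wavevectors. The following is a minimal sketch, assuming t = a = 1 (illustrative units) and hypothetical function names; it also evaluates the bands at the point (2π/3a, 2π/3√3a), where the two bands touch, and estimates the slope of the upper band nearby by a finite difference.

```python
import numpy as np

t, a = 1.0, 1.0                      # illustrative units (assumptions)
s3 = np.sqrt(3.0)

def h_k(kx, ky, a=a):
    """Off-diagonal element of the 2 x 2 matrix in Eq. (10.51)."""
    return (np.exp(1j * kx * a)
            + np.exp(-1j * (kx + s3 * ky) * a / 2)
            + np.exp(-1j * (kx - s3 * ky) * a / 2))

def bands(kx, ky, t=t, a=a):
    """Both band energies at one wavevector, in ascending order."""
    h = h_k(kx, ky, a)
    bloch = -t * np.array([[0.0, h], [np.conj(h), 0.0]])
    return np.linalg.eigvalsh(bloch)

def eq_10_52(kx, ky, t=t, a=a):
    """Upper band from the closed form of Eq. (10.52)."""
    arg = 3 + 2 * np.cos(s3 * ky * a) + 4 * np.cos(s3 * ky * a / 2) * np.cos(3 * kx * a / 2)
    return t * np.sqrt(max(arg, 0.0))    # guard against tiny negative rounding errors

K = np.array([2 * np.pi / (3 * a), 2 * np.pi / (3 * s3 * a)])
gap_at_K = bands(*K)[1]                  # upper band at K
q = 1e-4
slope = bands(K[0] + q, K[1])[1] / q     # finite-difference slope of the upper band near K
print(gap_at_K, slope, eq_10_52(0.4, 0.9), bands(0.4, 0.9)[1])
```

The finite-difference slope is only a numerical estimate; it can be compared with 3ta/2, the coefficient of |q| obtained by expanding h_k to first order about K.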

We plot this dispersion relation in Fig. 10.5. The most interesting features of this band structure are the Dirac points. The Dirac points are located at K and K′, and at the points related to them by reciprocal lattice vectors, where $\mathbf{K} = (2\pi/3a, 2\pi/3\sqrt{3}a)$ and $\mathbf{K}' = (2\pi/3a, -2\pi/3\sqrt{3}a)$. To see why these points are interesting, consider a point K + q in the neighborhood of K. Recalling that $\cos(K + q) = \cos K - q \sin K - \frac{1}{2} q^2 \cos K + \cdots$, one finds that for small |q|,

$$\varepsilon_{\mathbf{K}+\mathbf{q}} = \pm\hbar v_F |\mathbf{q}| + \cdots \tag{10.53}$$

where $v_F = 3ta/2\hbar$ is known as the Fermi velocity. This result should be compared with the relativistic result

$$E_k^2 = m^2 c^4 + \hbar^2 c^2 |\mathbf{k}|^2 \tag{10.54}$$

where m is a particle's rest mass and c is the speed of light. This reduces to the famous E = mc² for k = 0, but for massless particles such as photons, one finds that $E_k = \hbar c |\mathbf{k}|$. Thus, the low-energy electronic excitations on a honeycomb lattice behave as if they are massless relativistic particles, with the Fermi velocity playing the role of the speed of light in the theory. Therefore, much excitement18 has been caused by the recent synthesis of atomically thick sheets of graphene,19 in which carbon atoms form a honeycomb lattice. In graphene $v_F \simeq 1 \times 10^6$ m s−1, two orders of magnitude smaller than the speed of light in the vacuum. This has opened the possibility of exploring and controlling "relativistic" effects in a solid-state system.18

Fig. 10.5 Dirac dispersion of the honeycomb lattice, plotted as εk/t (from −3 to 3) against kx and ky (each from −3 to 3).

10.3 HUBBARD MODEL

So far we have neglected electron–electron interactions. In real materials the electrons repel each other, due to the Coulomb interaction between them. The most obvious extension to the tight-binding model that describes some of the electron–electron interactions is to allow only on-site interactions (i.e., $V_{ijkl} \neq 0$ if and only if i, j, k, and l all refer to the same orbital). For one orbital per site we then have the Hubbard model,

$$\hat{H}_{\mathrm{Hubbard}} = -t \sum_{\langle ij \rangle \sigma} \hat{c}_{i\sigma}^\dagger \hat{c}_{j\sigma} + U \sum_{i} \hat{c}_{i\uparrow}^\dagger \hat{c}_{i\uparrow} \hat{c}_{i\downarrow}^\dagger \hat{c}_{i\downarrow} \tag{10.55}$$

where we have assumed nearest-neighbor hopping only. It follows from Eq. (10.27) that U > 0 (i.e., electrons repel one another).

10.3.1 Two-Site Hubbard Model: Molecular Hydrogen H2

The two-site Hubbard model is a nice context in which to consider some of the basic properties of the chemical bond. The two-body term in the Hubbard model greatly complicates the problem relative to the tight-binding model. Therefore, the Hubbard model also presents a nice context in which to introduce one of the most important tools in theoretical physics and chemistry: mean-field theory.

10.3.1.1 Mean-Field Theory, the Hartree–Fock Approximation, and Molecular Orbital Theory

To construct a mean-field theory of any two as-yet-unspecified physical quantities, $m = \langle m \rangle + \delta m$ and $n = \langle n \rangle + \delta n$, where $\langle n \rangle$ ($\langle m \rangle$) is the mean value of n (m) and δn (δm) are the fluctuations about the mean, which are assumed to be small, one notes that

$$\begin{aligned}
mn &= (\langle m \rangle + \delta m)(\langle n \rangle + \delta n) \\
&= \langle m \rangle \langle n \rangle + \langle m \rangle \delta n + \delta m \langle n \rangle + \delta m\, \delta n \\
&\approx \langle m \rangle \langle n \rangle + \langle m \rangle \delta n + \delta m \langle n \rangle
\end{aligned} \tag{10.56}$$

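Thus, mean-field approximations neglect terms that are quadratic in the fluctuations. Hartree theory is a mean field in the electron density; that is,

$$\begin{aligned}
\hat{c}_\alpha^\dagger \hat{c}_\beta \hat{c}_\gamma^\dagger \hat{c}_\delta &= \left[ \langle \hat{c}_\alpha^\dagger \hat{c}_\beta \rangle + (\hat{c}_\alpha^\dagger \hat{c}_\beta - \langle \hat{c}_\alpha^\dagger \hat{c}_\beta \rangle) \right] \left[ \langle \hat{c}_\gamma^\dagger \hat{c}_\delta \rangle + (\hat{c}_\gamma^\dagger \hat{c}_\delta - \langle \hat{c}_\gamma^\dagger \hat{c}_\delta \rangle) \right] \\
&\approx \langle \hat{c}_\alpha^\dagger \hat{c}_\beta \rangle \hat{c}_\gamma^\dagger \hat{c}_\delta + \hat{c}_\alpha^\dagger \hat{c}_\beta \langle \hat{c}_\gamma^\dagger \hat{c}_\delta \rangle - \langle \hat{c}_\alpha^\dagger \hat{c}_\beta \rangle \langle \hat{c}_\gamma^\dagger \hat{c}_\delta \rangle
\end{aligned} \tag{10.57}$$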

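However, it was quickly realized that this does not allow for electron exchange; that is, one should also include averages such as $\langle \hat{c}_\alpha^\dagger \hat{c}_\delta \rangle$. Therefore, a better mean-field theory is Hartree–Fock theory, which includes these terms. However, because of the limited interactions included in the Hubbard model, Hartree theory is identical to Hartree–Fock theory if one assumes that spin-flip terms are negligible (i.e., that $\langle \hat{c}_{i\uparrow}^\dagger \hat{c}_{i\downarrow} \rangle = 0$), which we will.

The Hartree–Fock approximation to the Hubbard Hamiltonian is therefore

$$\begin{aligned}
\hat{H}_{\mathrm{HF}} &= -t \sum_{\langle ij \rangle \sigma} \hat{c}_{i\sigma}^\dagger \hat{c}_{j\sigma} + U \sum_{i} \left[ \langle \hat{c}_{i\uparrow}^\dagger \hat{c}_{i\uparrow} \rangle \hat{c}_{i\downarrow}^\dagger \hat{c}_{i\downarrow} + \hat{c}_{i\uparrow}^\dagger \hat{c}_{i\uparrow} \langle \hat{c}_{i\downarrow}^\dagger \hat{c}_{i\downarrow} \rangle - \langle \hat{c}_{i\uparrow}^\dagger \hat{c}_{i\uparrow} \rangle \langle \hat{c}_{i\downarrow}^\dagger \hat{c}_{i\downarrow} \rangle \right] \\
&= -t \sum_{\langle ij \rangle \sigma} \hat{c}_{i\sigma}^\dagger \hat{c}_{j\sigma} + U \sum_{i} \left[ \langle n_{i\uparrow} \rangle \hat{c}_{i\downarrow}^\dagger \hat{c}_{i\downarrow} + \langle n_{i\downarrow} \rangle \hat{c}_{i\uparrow}^\dagger \hat{c}_{i\uparrow} - \langle n_{i\uparrow} \rangle \langle n_{i\downarrow} \rangle \right]
\end{aligned} \tag{10.58}$$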

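where $\langle n_{i\sigma} \rangle = \langle \hat{c}_{i\sigma}^\dagger \hat{c}_{i\sigma} \rangle$. Thus, we have a Hamiltonian for a single electron moving in the mean field of the other electrons. Note that this Hamiltonian is equivalent to the ω-method parameterization of the Hückel model [see Section 10.2.1.3, particularly Eq. (10.34)] if we set ω = U/β. Thus, the ω method is just a parameterization of the Hubbard model solved in the Hartree–Fock approximation.

The Hubbard model with two sites and two electrons can be taken as a model for molecular hydrogen. In the Hartree–Fock ground state, $|\Psi_{\mathrm{HF}}^0\rangle$, the two electrons have opposite spin and each occupies the bonding state, which we found to be the ground state of the Hückel model in Section 10.2.1.1:

$$|\Psi_{\mathrm{HF}}^0\rangle = |\psi_{b\downarrow}\rangle \otimes |\psi_{b\uparrow}\rangle = \tfrac{1}{2} (\hat{c}_{1\uparrow}^\dagger + \hat{c}_{2\uparrow}^\dagger)(\hat{c}_{1\downarrow}^\dagger + \hat{c}_{2\downarrow}^\dagger)|0\rangle \tag{10.59}$$

$$\phantom{|\Psi_{\mathrm{HF}}^0\rangle} = \tfrac{1}{2} (\hat{c}_{1\uparrow}^\dagger \hat{c}_{1\downarrow}^\dagger + \hat{c}_{1\uparrow}^\dagger \hat{c}_{2\downarrow}^\dagger - \hat{c}_{1\downarrow}^\dagger \hat{c}_{2\uparrow}^\dagger + \hat{c}_{2\uparrow}^\dagger \hat{c}_{2\downarrow}^\dagger)|0\rangle \tag{10.60}$$

Notice that $|\Psi_{\mathrm{HF}}^0\rangle$ is just a product of two single-particle wavefunctions [one for the spin-up electron and another for the spin-down electron; cf. Eq. (10.59)].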


Thus, we say that the wavefunction is uncorrelated and that the two electrons are unentangled. An important prediction of the Hartree–Fock theory is that if we pull the protons apart, we are equally likely to get two hydrogen atoms (H + H) or two hydrogen ions (H+ + H−). This is not what is observed experimentally. In reality the former is far more likely.

10.3.1.2 Heitler–London Wavefunction and Valence-Bond Theory

Just a year after the appearance of Schrödinger's wave equation,20 Heitler and London21 proposed a theory of the chemical bond based on the new quantum mechanics. Explaining the nature of the chemical bond remains one of the greatest achievements of quantum mechanics. Heitler and London's theory led to the valence-bond theory of the chemical bond.22 The two-site Hubbard model of H2 is the simplest context in which to study this theory. The Heitler–London wavefunction is

$$|\Psi_{\mathrm{HL}}^0\rangle = \frac{1}{\sqrt{2}} (\hat{c}_{1\uparrow}^\dagger \hat{c}_{2\downarrow}^\dagger - \hat{c}_{1\downarrow}^\dagger \hat{c}_{2\uparrow}^\dagger)|0\rangle \tag{10.61}$$

Notice that the wavefunction is correlated, as it cannot be written as a product of a wavefunction for each of the particles. Equivalently, one can say that the two electrons are entangled. The Heitler–London wavefunction overcorrects the physical errors in the Hartree–Fock molecular orbital wavefunction, as it predicts zero probability of H2 dissociating to an ionic state but is, nevertheless, a significant improvement on molecular orbital theory.

10.3.1.3 Exact Solution of the Two-Site Hubbard Model

The Hilbert space of the two-site, two-electron Hubbard model is sufficiently small that we can solve it analytically; nevertheless, this problem can be greatly simplified by using the symmetry properties of the Hamiltonian. First, note that the total spin operator commutes with the Hamiltonian, Eq. (10.55), as none of the terms in the Hamiltonian cause spin flips. Therefore, the energy eigenstates must also be spin eigenstates. For two electrons this means that all of the eigenstates will be either singlets (S = 0) or triplets (S = 1).

Let us begin with the triplet states, $|\Psi_1^m\rangle$. Consider a state with two spin-up electrons, $|\Psi_1^1\rangle$. Because there is only one orbital per site, the Pauli exclusion principle ensures that there will be exactly one electron per site (i.e., $|\Psi_1^1\rangle = \hat{c}_{1\uparrow}^\dagger \hat{c}_{2\uparrow}^\dagger |0\rangle$). The electrons cannot hop between sites, as the presence of the other electron and the Pauli principle forbid it. Therefore, $\langle \Psi_1^1|(-t \hat{c}_{1\sigma}^\dagger \hat{c}_{2\sigma})|\Psi_1^1\rangle = \langle \Psi_1^1|(-t \hat{c}_{2\sigma}^\dagger \hat{c}_{1\sigma})|\Psi_1^1\rangle = 0$ for σ = ↑ or ↓. There is exactly one electron on each site, so $\langle \Psi_1^1| U \sum_i \hat{c}_{i\uparrow}^\dagger \hat{c}_{i\uparrow} \hat{c}_{i\downarrow}^\dagger \hat{c}_{i\downarrow} |\Psi_1^1\rangle = 0$. Thus, the total energy of this state is $E_1^1 = 0$. The same chain of reasoning shows that $|\Psi_1^{-1}\rangle = \hat{c}_{1\downarrow}^\dagger \hat{c}_{2\downarrow}^\dagger |0\rangle$ and $E_1^{-1} = 0$. It then follows from spin rotation symmetry that $|\Psi_1^0\rangle = (1/\sqrt{2})(\hat{c}_{1\uparrow}^\dagger \hat{c}_{2\downarrow}^\dagger + \hat{c}_{1\downarrow}^\dagger \hat{c}_{2\uparrow}^\dagger)|0\rangle$ and $E_1^0 = 0$.

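As the Hilbert space contains six states, this leaves three singlet states. A convenient basis for these is formed by the Heitler–London state and the two charge-transfer states: $|\Psi_{\mathrm{HL}}\rangle = (1/\sqrt{2})(\hat{c}_{1\uparrow}^\dagger \hat{c}_{2\downarrow}^\dagger - \hat{c}_{1\downarrow}^\dagger \hat{c}_{2\uparrow}^\dagger)|0\rangle$, $|\Psi_{\mathrm{ct}+}\rangle = (1/\sqrt{2})(\hat{c}_{1\uparrow}^\dagger \hat{c}_{1\downarrow}^\dagger + \hat{c}_{2\uparrow}^\dagger \hat{c}_{2\downarrow}^\dagger)|0\rangle$, and $|\Psi_{\mathrm{ct}-}\rangle = (1/\sqrt{2})(\hat{c}_{1\uparrow}^\dagger \hat{c}_{1\downarrow}^\dagger - \hat{c}_{2\uparrow}^\dagger \hat{c}_{2\downarrow}^\dagger)|0\rangle$. Note that $|\Psi_{\mathrm{HL}}\rangle$ and $|\Psi_{\mathrm{ct}+}\rangle$ are even under "inversion" symmetry,† which swaps the site labels 1 ↔ 2, whereas $|\Psi_{\mathrm{ct}-}\rangle$ is odd under inversion symmetry. As the Hamiltonian is symmetric under inversion, the eigenstates will have a definite parity, so $|\Psi_{\mathrm{ct}-}\rangle$ is an eigenstate, with energy $E_{\mathrm{ct}-} = U$. The other two singlet states are not distinguished by any symmetry of the Hamiltonian, so they do couple, yielding the Hamiltonian matrix

$$H = \begin{pmatrix} \langle \Psi_{\mathrm{HL}}|\hat{H}_{\mathrm{Hubbard}}|\Psi_{\mathrm{HL}}\rangle & \langle \Psi_{\mathrm{HL}}|\hat{H}_{\mathrm{Hubbard}}|\Psi_{\mathrm{ct}+}\rangle \\ \langle \Psi_{\mathrm{ct}+}|\hat{H}_{\mathrm{Hubbard}}|\Psi_{\mathrm{HL}}\rangle & \langle \Psi_{\mathrm{ct}+}|\hat{H}_{\mathrm{Hubbard}}|\Psi_{\mathrm{ct}+}\rangle \end{pmatrix} = \begin{pmatrix} 0 & -2t \\ -2t & U \end{pmatrix} \tag{10.62}$$

This has eigenvalues $E_{\mathrm{CF}} = \frac{1}{2}(U - \sqrt{U^2 + 16t^2})$ and $E_{S2} = \frac{1}{2}(U + \sqrt{U^2 + 16t^2})$. The corresponding eigenstates are

$$|\Psi_{\mathrm{CF}}\rangle = \cos\theta\, |\Psi_{\mathrm{HL}}\rangle - \sin\theta\, |\Psi_{\mathrm{ct}+}\rangle = \left[ \frac{\cos\theta}{\sqrt{2}} (\hat{c}_{1\uparrow}^\dagger \hat{c}_{2\downarrow}^\dagger - \hat{c}_{1\downarrow}^\dagger \hat{c}_{2\uparrow}^\dagger) + \frac{\sin\theta}{\sqrt{2}} (\hat{c}_{1\downarrow}^\dagger \hat{c}_{1\uparrow}^\dagger + \hat{c}_{2\downarrow}^\dagger \hat{c}_{2\uparrow}^\dagger) \right] |0\rangle \tag{10.63}$$

and

$$|\Psi_{S2}\rangle = \sin\theta\, |\Psi_{\mathrm{HL}}\rangle + \cos\theta\, |\Psi_{\mathrm{ct}+}\rangle = \left[ \frac{\sin\theta}{\sqrt{2}} (\hat{c}_{1\uparrow}^\dagger \hat{c}_{2\downarrow}^\dagger - \hat{c}_{1\downarrow}^\dagger \hat{c}_{2\uparrow}^\dagger) - \frac{\cos\theta}{\sqrt{2}} (\hat{c}_{1\downarrow}^\dagger \hat{c}_{1\uparrow}^\dagger + \hat{c}_{2\downarrow}^\dagger \hat{c}_{2\uparrow}^\dagger) \right] |0\rangle \tag{10.64}$$

where $\tan\theta = (U - \sqrt{U^2 + 16t^2})/4t$. For U > 0, as is required physically, the state $|\Psi_{\mathrm{CF}}\rangle$ is the ground state for all values of U/t. $|\Psi_{\mathrm{CF}}\rangle$ is often called the Coulson–Fischer wavefunction. Inspection of Eq. (10.63) reveals that for U/t → ∞, the Coulson–Fischer state tends to the Heitler–London wavefunction, while for U/t → 0 we regain the molecular orbital picture (Hartree–Fock wavefunction).

† It may not be immediately obvious that $|\Psi_{\mathrm{HL}}\rangle$ is even under inversion symmetry, but this is easily confirmed as $\hat{I}|\Psi_{\mathrm{HL}}\rangle = (1/\sqrt{2})(\hat{c}_{2\uparrow}^\dagger \hat{c}_{1\downarrow}^\dagger - \hat{c}_{2\downarrow}^\dagger \hat{c}_{1\uparrow}^\dagger)|0\rangle = (1/\sqrt{2})(-\hat{c}_{1\downarrow}^\dagger \hat{c}_{2\uparrow}^\dagger + \hat{c}_{1\uparrow}^\dagger \hat{c}_{2\downarrow}^\dagger)|0\rangle = |\Psi_{\mathrm{HL}}\rangle$, where $\hat{I}$ is the inversion operator, which swaps the labels 1 and 2.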
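The analytic solution of the two-site model can be cross-checked by brute-force diagonalization. The following is a minimal sketch, assuming a Jordan–Wigner matrix representation of the four spin orbitals (ordered 1↑, 1↓, 2↑, 2↓, a choice made here for convenience) and illustrative parameters t = 1 and U = 8; the function names are hypothetical. It builds the Hubbard Hamiltonian of Eq. (10.55) for two sites, restricts it to the six two-electron states, and reports the overlaps of the exact ground state with the Heitler–London and Hartree–Fock wavefunctions.

```python
import numpy as np

def fermion_operators(n_modes):
    """Matrices for annihilation operators c_0 ... c_{n-1} (Jordan-Wigner construction)."""
    identity = np.eye(2)
    parity = np.diag([1.0, -1.0])
    lower = np.array([[0.0, 1.0], [0.0, 0.0]])
    operators = []
    for i in range(n_modes):
        factors = [parity] * i + [lower] + [identity] * (n_modes - i - 1)
        matrix = np.eye(1)
        for factor in factors:
            matrix = np.kron(matrix, factor)
        operators.append(matrix)
    return operators

def two_electron_spectrum(t, U):
    c = fermion_operators(4)                  # modes: 0 = 1 up, 1 = 1 down, 2 = 2 up, 3 = 2 down
    cd = [op.T for op in c]                   # real matrices: dagger = transpose
    n = [cd[m] @ c[m] for m in range(4)]
    hopping = sum(cd[m] @ c[m + 2] + cd[m + 2] @ c[m] for m in (0, 1))
    hamiltonian = -t * hopping + U * (n[0] @ n[1] + n[2] @ n[3])
    sector = np.flatnonzero(np.isclose(np.diag(sum(n)), 2.0))   # two-electron basis states
    energies, vectors = np.linalg.eigh(hamiltonian[np.ix_(sector, sector)])
    vacuum = np.zeros(16)
    vacuum[0] = 1.0
    heitler_london = (cd[0] @ cd[3] - cd[1] @ cd[2]) @ vacuum / np.sqrt(2.0)
    hartree_fock = 0.5 * (cd[0] + cd[2]) @ (cd[1] + cd[3]) @ vacuum
    ground = vectors[:, 0]
    overlap_hl = abs(heitler_london[sector] @ ground)
    overlap_hf = abs(hartree_fock[sector] @ ground)
    return energies, overlap_hl, overlap_hf

energies, overlap_hl, overlap_hf = two_electron_spectrum(t=1.0, U=8.0)
print(energies, overlap_hl, overlap_hf)
```

Varying `U` in the call shows how the exact ground state moves between the two approximate wavefunctions as the interaction strength changes.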


10.3.2 Mott Insulators and the Mott–Hubbard Metal–Insulator Transition

In 1949, Mott23 asked an apparently simple question with a profound and surprising answer. As we have seen above, for the two-site Hubbard model both the molecular orbital (Hartree–Fock) and valence-bond (Heitler–London) wavefunctions are just approximations of the exact (Coulson–Fischer) wavefunction. Mott asked whether the equivalent statement is true in an infinite solid and, surprisingly, found that the answer is no. Further, Mott showed that the Hartree–Fock and Heitler–London wavefunctions predict very different properties for crystals. One of the most important properties of a crystal is its conductivity. In a metal the conductivity is high and increases as the temperature is lowered, whereas in a semiconductor or an insulator the conductivity is low and decreases as the temperature is lowered. These behaviors arise because of fundamental differences between the electronic structures of metals and semiconductors/insulators.10 In metals there are excited states at arbitrarily low energies above the Fermi energy. This means that even at the lowest temperatures, electrons can move in response to an applied electric field. In semiconductors and insulators there is an energy gap between the highest occupied electronic state and the lowest unoccupied electronic state at zero temperature. This means that a thermal activation energy must be provided if electrons are to move in response to an applied field. The difference between semiconductors and insulators is simply the size of the gap; therefore, we will not distinguish between the two below and will refer to any material with a gap as an insulator. Consider a Hubbard model at half-filling, that is, with the same number of electrons as lattice sites. For a macroscopic current to flow, an electron must move from one lattice site (leaving an empty site with a net positive charge) to a distant site (creating a doubly occupied site with a net negative charge). 
The net charges may move through the collective motions of the electrons. One could keep track of this by describing the movement of all the electrons, but it is easier to introduce an equivalent description where we treat the net charges as particles moving in a neutral background. Therefore, we refer to the positive charge as a holon and the negative charge as a doublon. In the ground state of valence-bond theory, all of the sites are neutral and there are no holons or doublons [cf. Eq. (10.61)]. However, it is reasonable to postulate that there are low-lying charge-transfer excited states and hence thermal states that contain a few doublons and holons. These doublons and holons interact via the Coulomb potential, $V(r) = -e^2/\kappa r$, where $\kappa$ is the dielectric constant of the crystal. We know from the theory of the hydrogen atom (or, better, positronium; see Gasiorowicz7) that this potential gives rise to bound states. Therefore, one expects that in valence-bond theory, holons and doublons are bound and that separating holon–doublon pairs costs a significant amount of energy. Thus, one expects the number of distant holon–doublon pairs to decrease as the temperature is lowered. Therefore, valence-bond theory predicts that a half-filled Hubbard model is an insulator. In contrast, molecular orbital theory has large numbers of holons and doublons [cf. Eq. (10.60), which suggests that for an $N$-site model there will be $N/2$ neutral sites, $N/4$ empty sites, and $N/4$ doubly occupied sites]. Mott reasoned


that if there are many holon–doublon pairs "it no longer follows that work must necessarily be done to form some more." This is because the holon and doublon now interact via a screened potential, $V(r) = -(e^2/\kappa r)\exp(-qr)$, where $q$ is the Thomas–Fermi wavevector (see Ashcroft and Mermin10). For sufficiently large $q$ there will be no bound states, hence molecular orbital theory predicts that the half-filled Hubbard model is metallic. Thus, Mott argued that there are two (local) minima of the free energy in a crystal (see Fig. 10.6). One of the minima corresponds to a state with no holon–doublon pairs that is well approximated by a valence-bond wavefunction and is now known as the Mott insulating state. The second minimum corresponds to a state with many doublon–holon pairs that is well approximated by a molecular orbital wavefunction and is metallic. As we saw above, valence-bond theory works well for $U \gg t$ and molecular orbital theory works well for $U \ll t$. Therefore, in the half-filled Hubbard model we expect a Mott insulator for large $U/t$ and a metal for small $U/t$. Further, the "double-well" structure of the energy predicted by Mott's argument (Fig. 10.6) suggests that there is a first-order metal–insulator phase transition, known as the Mott transition. Mott predicted that this metal–insulator transition can be driven by applying pressure to a Mott insulator. This has now been observed in a number of systems; perhaps the purest examples are the organic charge-transfer salts (BEDT-TTF)$_2$X.24 It is interesting to note that this infusion of chemical ideas into condensed matter physics has remained important in studies of the Mott transition. Of particular note is Anderson's resonating valence-bond theory of superconductivity in high-temperature superconductors,26,27 which describes superconductivity in a doped Mott insulator in terms of a generalization of the valence-bond theory discussed above.
This theory can also be modified to describe superconductivity on the metallic side of the Mott transition for a half-filled lattice. This theory then

Fig. 10.6 (color online) Mott’s proposal for the energy of the Hubbard model as a function of the number of holon–doublon pairs, np , at low (zero) temperature(s) for large and small U /t.


provides a good description of the superconductivity observed in (BEDT-TTF)$_2$X salts.28 Note that theories such as Hartree–Fock and density functional25 that do not include the strong electronic correlations present in the Hubbard model do not predict a Mott insulating state. Thus, weakly correlated theories make the qualitatively incorrect prediction that materials such as NiO, V$_2$O$_3$, La$_2$CuO$_4$, and κ-(BEDT-TTF)$_2$Cu[N(CN)$_2$]Cl are metals, whereas experimentally, all are insulators. We will discuss a quantitative theory of the Mott transition in Section 10.3.3.2.

10.3.3 Mean-Field Theories for Crystals

10.3.3.1 Hartree–Fock Theory of the Hubbard Model: Stoner Ferromagnetism

In a manner similar to that in which we constructed the Hartree–Fock mean-field theory for the two-site Hubbard model in Section 10.3.1.1, we can also construct a Hartree–Fock theory of the infinite lattice Hubbard model. Again, we simply replace the number operators in the two-body term by their mean values, $n_{i\sigma} \equiv \langle\hat{c}^\dagger_{i\sigma}\hat{c}_{i\sigma}\rangle$, plus the fluctuations about the mean, $(\hat{c}^\dagger_{i\sigma}\hat{c}_{i\sigma} - n_{i\sigma})$, and neglect terms that are quadratic in the fluctuations:

$$U\sum_i \hat{c}^\dagger_{i\uparrow}\hat{c}_{i\uparrow}\hat{c}^\dagger_{i\downarrow}\hat{c}_{i\downarrow} = U\sum_i \left[n_{i\uparrow} + (\hat{c}^\dagger_{i\uparrow}\hat{c}_{i\uparrow} - n_{i\uparrow})\right]\left[n_{i\downarrow} + (\hat{c}^\dagger_{i\downarrow}\hat{c}_{i\downarrow} - n_{i\downarrow})\right] \approx U\sum_i \left[n_{i\downarrow}\hat{c}^\dagger_{i\uparrow}\hat{c}_{i\uparrow} + n_{i\uparrow}\hat{c}^\dagger_{i\downarrow}\hat{c}_{i\downarrow} - n_{i\uparrow}n_{i\downarrow}\right] \tag{10.65}$$

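If we make the additional approximation that $n_{i\sigma} = n_\sigma$ for all $i$ (i.e., that the system is homogeneous and does not spontaneously break translational symmetry), we find that the Hartree–Fock Hamiltonian for the Hubbard model is

$$\hat{H}_{\mathrm{HF}} - \mu\hat{N} = -t\sum_{\langle ij\rangle\sigma}\hat{c}^\dagger_{i\sigma}\hat{c}_{j\sigma} + \sum_{i\sigma}(U n_{\bar\sigma} - \mu)\,\hat{c}^\dagger_{i\sigma}\hat{c}_{i\sigma} - UN n_\uparrow n_\downarrow \tag{10.66}$$

where $N$ is the number of lattice sites and $\bar\sigma$ is the opposite spin to $\sigma$. It is convenient to write this Hamiltonian in terms of the total electron density, $n = n_\uparrow + n_\downarrow$, and the magnetization density, $m = n_\uparrow - n_\downarrow$, which gives

$$\begin{aligned}\hat{H}_{\mathrm{HF}} - \mu\hat{N} &= -t\sum_{\langle ij\rangle\sigma}\hat{c}^\dagger_{i\sigma}\hat{c}_{j\sigma} - \mu\sum_{i\sigma}\hat{c}^\dagger_{i\sigma}\hat{c}_{i\sigma} + U\sum_i\left[\tfrac{1}{2}(n - m)\,\hat{c}^\dagger_{i\uparrow}\hat{c}_{i\uparrow} + \tfrac{1}{2}(n + m)\,\hat{c}^\dagger_{i\downarrow}\hat{c}_{i\downarrow} - \tfrac{1}{4}(n + m)(n - m)\right] \\ &= \sum_{k\sigma}\left(\varepsilon^0_k - \sigma\frac{Um}{2}\right)\hat{n}_{k\sigma} - \left(\mu - \frac{Un}{2}\right)\sum_{k\sigma}\hat{n}_{k\sigma} - \frac{NU}{4}(n^2 - m^2)\end{aligned} \tag{10.67}$$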


where $\varepsilon^0_k$ is the dispersion relation for $U = 0$ and $\sigma = \pm 1$ for $\uparrow, \downarrow$. The last term is just a constant and will not concern us greatly. The penultimate term is the "renormalized" chemical potential; that is, the chemical potential, $\mu$, of the system with $U = 0$ is decreased by $Un/2$ due to the interactions. The first term is just the renormalized dispersion relation; in particular, we find that if the magnetization density is nonzero the dispersion relation for spin-up electrons is different from that for spin-down electrons (see Fig. 10.7). It is important to note that the Hartree–Fock approximation has reduced the problem to a single-particle (single-determinant) theory. Thus, we can write

$$\hat{H}_{\mathrm{HF}} - \mu\hat{N} = \sum_{k\sigma}(\varepsilon^*_{k\sigma} - \mu^*)\,\hat{n}_{k\sigma} - \frac{NU}{4}(n^2 - m^2) \tag{10.68}$$

where $\varepsilon^*_{k\sigma} = \varepsilon^0_k - \tfrac{1}{2}\sigma U m$ and $\mu^* = \mu - \tfrac{1}{2}Un$. We can now calculate the magnetization density (magnetic moment):

$$\begin{aligned} m &= n_\uparrow - n_\downarrow \\ &= \int_{-\infty}^{0} d\varepsilon\,\left[D_\uparrow(\varepsilon - \mu^*) - D_\downarrow(\varepsilon - \mu^*)\right] \\ &= \int_{-\infty}^{0} d\varepsilon\,\left[D_0\!\left(\varepsilon - \tfrac{1}{2}Um + \tfrac{1}{2}Un - \mu\right) - D_0\!\left(\varepsilon + \tfrac{1}{2}Um + \tfrac{1}{2}Un - \mu\right)\right] \\ &\equiv f(m) = D_0(0)\,Um + O(m^2) \end{aligned} \tag{10.69}$$

where $D_0(\varepsilon) = \partial N_0(\varepsilon)/\partial\varepsilon|_\varepsilon$ is the density of states (DOS; see Ashcroft and Mermin10) per spin for $U = 0$, $N_0(\varepsilon)$ is the number of electrons (per spin species)

Fig. 10.7 (color online) Dispersion relations for spin-up and spin-down electrons in the Hartree–Fock theory of the Hubbard chain (Stoner model of ferromagnetism) with m = 0.8t/U .


Fig. 10.8 How to find the self-consistent solution of Eq. (10.69). If the convergence works well, one can take α = 1, but for some problems convergence can be reached more reliably with a small value of α (often a value as small as ∼ 0.05 is used).

for which $\varepsilon^0_k \le \varepsilon$ for $U = 0$, $D_\sigma(\varepsilon) = \partial N_\sigma(\varepsilon)/\partial\varepsilon|_\varepsilon$ is the full interacting DOS for spin $\sigma$ electrons, and $N_\sigma(\varepsilon)$ is the number of electrons with spin $\sigma$ for which $\varepsilon_{k\sigma} \le \varepsilon$.

The standard way to solve mean-field theories, known as the method of self-consistent solution, is illustrated in Fig. 10.8. The major difficulty with self-consistent solutions is that it is not possible to establish whether or not one has found all of the self-consistent solutions, and therefore it is not possible to establish whether or not one has found the global minimum. Therefore, it is prudent to try a wide range of initial guesses for $m$ (or whatever variable the initial guess is made in).

Clearly, $m = 0$ is always a solution of Eq. (10.69), and for $U D_0(0) < 1$ this turns out to be the only solution. But for $U D_0(0) > 1$ there are additional solutions with $m \neq 0$. This is easily understood from the sketch in Fig. 10.9. Furthermore, the $m \neq 0$ solutions typically have lower energy than the $m = 0$ solution, and therefore for $U D_0(0) > 1$ the ground state is ferromagnetic. $U D_0(0) \ge 1$ is known as the Stoner condition for ferromagnetism. For the Stoner condition to be satisfied, a system must have narrow bands [small $t$, and hence large $D_0(0)$] and strong interactions (large $U$).

There are three elemental ferromagnets, Fe, Co, and Ni, each of which is also metallic. As the Hartree–Fock theory of the Hubbard model predicts metallic magnetism if the Stoner criterion is satisfied and these materials have narrow bands of strongly interacting electrons, it is natural to ask whether this is a good description of these materials. However, if one extends the treatment above to finite temperatures,29 one finds that the Hartree–Fock theory of the Hubbard model does not provide a good theory of the three elemental magnets. The Curie temperatures, $T_C$ (i.e., the temperature below which the material becomes ferromagnetic), of Fe, Co, and Ni are ∼1000 K (see, e.g., Table 33.1 of Ashcroft and Mermin10).
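The self-consistency loop of Fig. 10.8, applied to Eq. (10.69), can be sketched in a few lines. This is only an illustration under stated assumptions, not the calculation behind the figures: it assumes a half-filled band with a semicircular noninteracting DOS of half-width $W$ (chosen because $f(m)$ then has a closed form and $D_0(0) = 2/\pi W$), zero temperature, and hypothetical function names (`f_of_m`, `solve_stoner`). The mixing parameter `alpha` plays the role of α in Fig. 10.8.

```python
import numpy as np

def f_of_m(m, U, W=1.0):
    """Right-hand side of m = f(m) for an assumed semicircular DOS of
    half-width W at half filling and T = 0 (so that D_0(0) = 2/(pi W))."""
    x = np.clip(U * m / (2.0 * W), -1.0, 1.0)  # band shift in units of W
    return (2.0 / np.pi) * (x * np.sqrt(1.0 - x * x) + np.arcsin(x))

def solve_stoner(U, W=1.0, m_guess=0.5, alpha=0.5, tol=1e-10, max_iter=10000):
    """Iterate m -> (1 - alpha) m + alpha f(m) until the update is below tol."""
    m = m_guess
    for _ in range(max_iter):
        m_new = (1.0 - alpha) * m + alpha * f_of_m(m, U, W)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    raise RuntimeError("self-consistency loop did not converge")

W = 1.0
for U in (1.0, 1.8):
    stoner_parameter = U * 2.0 / (np.pi * W)   # U D_0(0)
    # Try several initial guesses, as the text recommends.
    solutions = [solve_stoner(U, W, m_guess=g) for g in (0.0, 0.1, 0.5, 1.0)]
    print(f"U D0(0) = {stoner_parameter:.3f}, converged m =", np.round(solutions, 6))
```

Starting exactly from `m_guess = 0` always returns the paramagnetic solution, which is why a range of initial guesses is needed to find the ferromagnetic one when $U D_0(0) > 1$. A smaller `alpha` slows the iteration but damps oscillations in less well-behaved problems.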
Hartree–Fock theory predicts

Fig. 10.9 (color online) Graphical solution of the self-consistency equation [Eq. (10.69)] for the Stoner model of ferromagnetism. Axes: $f(m)$ versus $m$.

that $T_c \sim U m_0$, where $m_0$ is the magnetization at $T = 0$. If the parameters in the Hubbard model are chosen so that Hartree–Fock theory reproduces the observed $m_0$, the predicted critical temperature is ∼10,000 K. This order-of-magnitude disagreement with experiment results from the failure of the mean-field Hartree–Fock approximation to account properly for the fluctuations in the local magnetization. This is closely related to the (incorrect) prediction of the Hartree–Fock approximation that there are no local moments above $T_c$. (Experimentally, local moments are observed above $T_c$.) However, for weak ferromagnets, such as ZrZn$_2$ ($T_c \sim 30$ K), the Hartree–Fock theory of the Hubbard model provides an excellent description of the behavior observed.30

The effects missed by Hartree–Fock theory are referred to as electronic correlations. The dramatic failure of Hartree–Fock theory in Fe, Co, and Ni shows that electron correlations are very important in these materials, as do other comparisons of theory and experiment.31 However, it is important to note that mean-field theory is not limited to Hartree–Fock theory (although the terms are often, but imprecisely, used synonymously). Rather, Hartree–Fock theory is the mean-field theory of the electronic density. By constructing mean-field theories of other properties it is possible to construct mean-field theories that capture (some) electronic correlations. We now consider an example of a rather different mean-field theory.

10.3.3.2 Gutzwiller Approximation, Slave Bosons, and the Brinkman–Rice Metal–Insulator Transition

In 1963, Gutzwiller32 proposed a variational wavefunction for the Hubbard model:

$$|G\rangle = \prod_i\left(1 - \alpha\,\hat{n}_{i\uparrow}\hat{n}_{i\downarrow}\right)|0\rangle = \exp\!\left(-g\sum_i \hat{n}_{i\uparrow}\hat{n}_{i\downarrow}\right)|0\rangle \tag{10.70}$$


where $g = -\ln(1 - \alpha)$ is a variational parameter and $|0\rangle$ is the ground state for uncorrelated electrons. One should note that the Gutzwiller wavefunction is closely related to the coupled cluster ansatz,1 which is widely used in both physics and chemistry. Gutzwiller used this ansatz to study the problem of itinerant ferromagnetism. This leads to an improvement over the Hartree–Fock theory discussed above. However, in 1970, Brinkman and Rice33 showed that this wavefunction also describes a metal–insulator transition, now referred to as a Brinkman–Rice transition. Rather than studying this wavefunction in detail, we use an equivalent technique known as slave bosons. This has the advantage of making it clear that the Brinkman–Rice theory is just a mean-field description of the Mott transition.

The $i$th site in a Hubbard model has four possible states: the site can be empty, $|e\rangle_i$; can contain a single spin $\sigma$ ($= \uparrow$ or $\downarrow$) electron, $|\sigma\rangle_i$; or can contain two electrons, $|d\rangle_i$. The Kotliar–Ruckenstein slave boson technique introduces an overcomplete description of these states:

$$|e\rangle_i = \hat{e}^\dagger_i|0\rangle_i \tag{10.71}$$

$$|\sigma\rangle_i = \hat{p}^\dagger_{i\sigma}\hat{c}^\dagger_{i\sigma}|0\rangle_i \tag{10.72}$$

$$|d\rangle_i = \hat{d}^\dagger_i\hat{c}^\dagger_{i\uparrow}\hat{c}^\dagger_{i\downarrow}|0\rangle_i \tag{10.73}$$

where $\hat{e}^\dagger_i$, $\hat{p}^\dagger_{i\sigma}$, and $\hat{d}^\dagger_i$ are bosonic creation operators which correspond to empty, partially filled, and doubly occupied sites. $|0\rangle_i$ is a state with no fermions and no bosons on site $i$; note that this is not a physically realizable state. This transformation is not only legitimate, but also exact, as long as we also introduce the constraints

$$\hat{e}^\dagger_i\hat{e}_i + \sum_\sigma \hat{p}^\dagger_{i\sigma}\hat{p}_{i\sigma} + \hat{d}^\dagger_i\hat{d}_i = 1 \tag{10.74}$$

which ensures that there is exactly one boson per site and therefore that each site is either empty, partially occupied, or doubly occupied, and

$$\hat{c}^\dagger_{i\sigma}\hat{c}_{i\sigma} - \hat{p}^\dagger_{i\sigma}\hat{p}_{i\sigma} - \hat{d}^\dagger_i\hat{d}_i = 0 \tag{10.75}$$

which ensures that if a site contains a spin $\sigma$ electron, it is either singly occupied (with spin $\sigma$) or doubly occupied. Writing the Hubbard Hamiltonian in terms of the slave bosons yields

$$\hat{H}_{\mathrm{Hubbard}} = -t\sum_{\langle ij\rangle\sigma}\hat{z}^\dagger_{i\sigma}\hat{c}^\dagger_{i\sigma}\hat{c}_{j\sigma}\hat{z}_{j\sigma} + U\sum_i \hat{d}^\dagger_i\hat{d}_i \tag{10.76}$$

where $\hat{z}_{j\sigma} = \hat{e}^\dagger_j\hat{p}_{j\sigma} + \hat{p}^\dagger_{j\bar\sigma}\hat{d}_j$ and $\bar\sigma$ is the opposite spin to $\sigma$. We now make a mean-field approximation and replace the bosonic operators by their expectation values: $\langle\hat{e}_i\rangle = e$, $\langle\hat{p}_{i\uparrow}\rangle = \langle\hat{p}_{i\downarrow}\rangle = p$, $\langle\hat{d}_i\rangle = d$. Note that we have


additionally assumed that the system is homogeneous (the expectation values do not depend on $i$) and paramagnetic ($\langle\hat{p}_{i\uparrow}\rangle = \langle\hat{p}_{i\downarrow}\rangle$). Therefore, the constraints reduce to

$$|e|^2 + 2|p|^2 + |d|^2 = 1 \tag{10.77}$$

and

$$|p|^2 + |d|^2 = \langle\hat{c}^\dagger_{i\sigma}\hat{c}_{i\sigma}\rangle = \frac{n}{2} \tag{10.78}$$

where $n$ is the average number of electrons per site. This amounts to enforcing the constraints only on average. This theory does not reproduce the correct result for $U = 0$. However, this deficiency can be fixed if $\hat{z}_{j\sigma}$ is replaced by the "renormalized" quantity, $\tilde{z}_{j\sigma}$, defined such that

$$\langle\tilde{z}^\dagger_{j\sigma}\tilde{z}_{j\sigma}\rangle = \frac{\left[(n/2) - |d|^2\right]\left[|d| + \sqrt{1 - n + |d|^2}\right]^2}{(1 - n/2)\,(n/2)} \tag{10.79}$$

Let us specialize to a half-filled band, $n = 1$. The constraints now allow us to eliminate $|p|^2 = \tfrac{1}{2} - |d|^2$ and $|e|^2 = |d|^2$, so that Eq. (10.79) reduces to $8(|d|^2 - 2|d|^4)$, which equals 1 at the uncorrelated value $|d|^2 = \tfrac{1}{4}$. Thus, we find that

$$\hat{H}_{\mathrm{Hubbard}} \approx -t\sum_{\langle ij\rangle\sigma} 8\left(|d|^2 - 2|d|^4\right)\hat{c}^\dagger_{i\sigma}\hat{c}_{j\sigma} + UN_0|d|^2 = 8\left(|d|^2 - 2|d|^4\right)\sum_{k\sigma}\varepsilon^0_k\,\hat{n}_{k\sigma} + UN_0|d|^2 \tag{10.80}$$

where $\varepsilon^0_k$ is the dispersion for $U = 0$ and $N_0$ is the number of lattice sites. Recall that $|d|^2 = \langle\hat{d}^\dagger_i\hat{d}_i\rangle$ (i.e., $|d|^2$ is the probability of a site being doubly occupied). We construct a variational theory by ensuring that the energy is minimized with respect to $|d|$, which yields

$$\frac{\partial E}{\partial|d|} = 16\left(|d| - 4|d|^3\right)\sum_{k\sigma}\varepsilon^0_k\,\langle\hat{n}_{k\sigma}\rangle + 2UN_0|d| = 0 \tag{10.81}$$

Equation (10.81) allows one to solve the problem self-consistently (see Fig. 10.8). For small $U$ this equation has more than one solution and the lowest-energy state has $|d|^2 > 0$, which corresponds to a correlated metallic state (the details of this minimum depend on $\varepsilon^0_k$). But above some critical $U$ the ground-state solution has $|d|^2 = 0$, which corresponds to no doubly occupied states (i.e., the Mott insulator). Thus, the dependence of the energy on the number of holon–doublon pairs ($n_p = |d|^2$) calculated from the mean-field slave boson theory is exactly as Mott predicted on rather general grounds (shown in Fig. 10.6).
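The variational step can be illustrated numerically. The sketch below is hedged: it assumes the half-filled band-narrowing factor $q = 8|d|^2(1 - 2|d|^2)$ obtained by setting $n = 1$ in Eq. (10.79), treats the uncorrelated kinetic energy per site, $\bar\varepsilon = N_0^{-1}\sum_{k\sigma}\varepsilon^0_k\langle\hat{n}_{k\sigma}\rangle < 0$, as a given input (`eps_bar`, an illustrative number rather than one taken from the text), and uses hypothetical function names. It minimizes the energy per site on a grid and compares with the closed-form minimum of this quadratic.

```python
import numpy as np

def band_narrowing(x):
    """q(x) = 8 x (1 - 2 x), with x = |d|^2, at half filling (n = 1)."""
    return 8.0 * x * (1.0 - 2.0 * x)

def energy_per_site(x, U, eps_bar):
    """E/N_0 = q(x) * eps_bar + U * x, with eps_bar < 0 the U = 0 kinetic energy per site."""
    return band_narrowing(x) * eps_bar + U * x

def optimal_double_occupancy(U, eps_bar, n_grid=200001):
    """Grid minimization over the allowed range 0 <= |d|^2 <= 1/2."""
    x = np.linspace(0.0, 0.5, n_grid)
    return x[np.argmin(energy_per_site(x, U, eps_bar))]

eps_bar = -1.0                 # assumed value, sets the energy unit
U_c = 8.0 * abs(eps_bar)       # interaction at which |d|^2 reaches zero
for U in (0.0, 2.0, 4.0, 6.0, 8.0, 10.0):
    x_num = optimal_double_occupancy(U, eps_bar)
    x_exact = max(0.0, 0.25 * (1.0 - U / U_c))
    print(f"U = {U:5.1f}  |d|^2 (grid) = {x_num:.5f}  |d|^2 (closed form) = {x_exact:.5f}  "
          f"q = {band_narrowing(x_num):.4f}")
```

In this sketch $|d|^2$ falls linearly from $\tfrac{1}{4}$ at $U = 0$ (where $q = 1$, the uncorrelated result) to zero at $U_c = 8|\bar\varepsilon|$, beyond which the band-narrowing factor vanishes and no doubly occupied sites remain.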


10.3.4 Exact Solutions of the Hubbard Model

10.3.4.1 One Dimension

Lieb and Wu34 famously solved the Hubbard chain at $T = 0$ using the Bethe ansatz.35,36 Lieb and Wu found that the half-filled Hubbard chain is a Mott insulator for any nonzero $U$. Nevertheless, the Bethe ansatz solution is not straightforward to understand, and weighty textbooks have been written on the subject.35,36

10.3.4.2 Infinite Dimensions: Dynamical Mean-Field Theory

As one increases the dimension of a lattice, the coordination number (the number of nearest neighbors for each lattice site) also increases. In infinite dimensions each lattice site has infinitely many nearest neighbors. For a classical model, mean-field theory becomes exact in infinite dimensions, as the environment (the infinite number of nearest neighbors) seen by each site is exactly the same as the mean field. However, quantum mechanically, things are complicated by the internal dynamics of the site. In the Hubbard model each site can contain zero, one, or two electrons, and a dynamic equilibrium between the different charge and spin states is maintained. However, the environment is still described by a mean field, even though the dynamics are not. Therefore, although the Hartree–Fock theory of the Hubbard model does not become exact in infinite dimensions, it is possible to construct a theory that treats the on-site dynamics exactly and the spatial correlations at the mean-field level; this theory is known as dynamical mean-field theory (DMFT).37

The importance of DMFT is not in the somewhat academic limit of infinite dimensions. Rather, DMFT has become an important approximate theory in the finite numbers of dimensions relevant to real materials.37 It has been found that DMFT captures a great deal of the physics of strongly correlated electrons. Typically, the most important correlations are on-site and therefore are described correctly by DMFT. These include the correlations that are important in metallic magnetism38 and many other strongly correlated materials.24,37 Cluster extensions to DMFT, such as cellular dynamical mean-field theory (CDMFT) and the dynamical cluster approximation (DCA), which capture some of the nonlocal correlations, have led to further insights into strongly correlated materials.39 Considerable success has also been achieved by combining DMFT with density functional theory.40

10.3.4.3 Nagaoka Point

The Nagaoka point in the phase diagram of the Hubbard model is the $U \to \infty$ limit when we add one hole to a half-filled system. Nagaoka rigorously proved41,42 that at this point the state that maximizes the total spin of the system [i.e., the state with $S^z = (N - 1)/2$, for an $N$-site lattice] is an extremum in energy (i.e., either the ground state or the highest-lying excited state). On most bipartite lattices (cf. Fig. 10.11a) one finds that this "Nagaoka state" is indeed the ground state.42 However, on frustrated lattices (Fig. 10.11b) the Nagaoka state is typically only the ground state for one sign of $t$.43 It is quite straightforward to understand why the Nagaoka state is often the ground state. As we are considering the $U \to \infty$ limit there will strictly be no


double occupation of any sites. One therefore need only consider the subspace of states with no double occupation. As none of these states contain any potential energy (i.e., terms proportional to $U$), the ground state will be the state that minimizes the kinetic energy (the term proportional to $t$). Thus, the ground state is the state that maximizes the magnitude of the kinetic energy with a negative sign. In the Nagaoka state all of the electrons align, which means that the holon can hop unimpeded by the Pauli exclusion principle, thus maximizing the magnitude of the kinetic energy. It is a simple matter to check whether this is the ground state or the highest-lying excited state, as we just compare the energy of the Nagaoka state with that of any other state satisfying the constraint of no double occupation. Nagaoka's rigorous treatment has not been extended to doping by more than one hole and it remains an outstanding problem to further understand this interesting phenomenon, which shares important features with the magnetism observed in the elemental magnets38 and many strongly correlated materials.43

10.4 HEISENBERG MODEL

Like the Stoner ferromagnetism we discussed above in the context of the Hartree–Fock solution for the Hubbard model (Section 10.3.3.1) and Hund's rules (which we discuss in Section 10.5.2), the Heisenberg model is an important paradigm for understanding magnetism. The Heisenberg model does not provide a realistic description of the three elemental ferromagnets (Fe, Co, and Ni) as they are metals, whereas the Heisenberg model only describes insulators. However, as we will see in Section 10.4.3, the Heisenberg model is a good description of Mott insulators such as La$_2$CuO$_4$ (the parent compound of the high-temperature superconductors) and κ-(BEDT-TTF)$_2$Cu[N(CN)$_2$]Cl (the parent compound for the organic superconductors). The Heisenberg model also plays an important role in the valence-bond theory of the chemical bond.44

In the Heisenberg model one assumes that there is a single (unpaired) electron localized at each site and that the charge cannot move. Therefore, the only degrees of freedom in the Heisenberg model are the spins of each site (the model can also be generalized to spin $> \tfrac{1}{2}$). The Hamiltonian for the Heisenberg model is

$$\hat{H}_{\mathrm{Heisenberg}} = \sum_{ij} J_{ij}\,\hat{\mathbf{S}}_i\cdot\hat{\mathbf{S}}_j \tag{10.82}$$

where $\hat{\mathbf{S}}_i = (\hat{S}^x_i, \hat{S}^y_i, \hat{S}^z_i) = \tfrac{1}{2}\sum_{\alpha\beta}\hat{c}^\dagger_{i\alpha}\boldsymbol{\sigma}_{\alpha\beta}\hat{c}_{i\beta}$ is the spin operator on site $i$, $\boldsymbol{\sigma} = (\sigma_x, \sigma_y, \sigma_z)$ is the vector of Pauli matrices, and $J_{ij}$ is the exchange energy between sites $i$ and $j$.

10.4.1 Two-Site Model: Classical Solution

In the classical Heisenberg model one replaces the spin operator, $\hat{\mathbf{S}}_i$, with a classical spin (i.e., a real vector, $\mathbf{S}_i$). Thus, on two sites, with $J_{12} = J$, the energy of the model is

$$E^{(2)}_{\mathrm{Heisenberg}} = J\,\mathbf{S}_1\cdot\mathbf{S}_2 = J|\mathbf{S}_1||\mathbf{S}_2|\cos\phi \tag{10.83}$$

where $\phi$ is the angle between the two spins (vectors). The classical energy is minimized by $\phi = \pi$ for $J > 0$ and $\phi = 0$ for $J < 0$. Thus, for $J > 0$ the lowest-energy solution is for the two spins to point antiparallel (i.e., in opposite directions to one another); we refer to this as the antiferromagnetic solution. For $J < 0$ the lowest-energy solution is for the two spins to point parallel to one another; we refer to this as the ferromagnetic solution.

[...]

For $J > 0$, we cannot optimize the energy of each bond individually. When this is the case one says that the lattice is frustrated. For a frustrated lattice with $S = \tfrac{1}{2}$, we expect the solution for $J > 0$ to have energy $E^{(3)}_{\mathrm{Heisenberg}} > -3J/4$, and thus one expects the difference in energy between this state and the ferromagnetic state to be $< JNz/4$. The concept of frustration can also be generalized to itinerant systems where a similar reduction in the bandwidth of the itinerant electrons is found.43 Having outlined our expectations, let us now consider the three-site Heisenberg model more carefully. The energy is given by

$$E^{(3)}_{\mathrm{Heisenberg}} = J\sum_{\langle ij\rangle}\mathbf{S}_i\cdot\mathbf{S}_j \tag{10.97}$$


Without loss of generality we can choose $\mathbf{S}_1 = S_1(1, 0, 0)$, $\mathbf{S}_2 = S_2(\cos\phi_2, \sin\phi_2, 0)$, and $\mathbf{S}_3 = S_3(\cos\theta_3\cos\phi_3, \cos\theta_3\sin\phi_3, \sin\theta_3)$. Thus, for $S_1 = S_2 = S_3 = \tfrac{1}{2}$,

$$E^{(3)}_{\mathrm{Heisenberg}} = \frac{J}{4}\left[\cos\phi_2 + \cos\theta_3\cos(\phi_2 - \phi_3) + \cos\theta_3\cos\phi_3\right] \tag{10.98}$$

Physically, we seek the minimum energy, which yields the conditions

$$\frac{\partial E^{(3)}_{\mathrm{Heisenberg}}}{\partial\theta_3} = -\frac{J}{4}\sin\theta_3\left[\cos(\phi_2 - \phi_3) + \cos\phi_3\right] = 0$$

$$\frac{\partial E^{(3)}_{\mathrm{Heisenberg}}}{\partial\phi_3} = \frac{J}{4}\cos\theta_3\left[\sin(\phi_2 - \phi_3) - \sin\phi_3\right] = 0$$

$$\frac{\partial E^{(3)}_{\mathrm{Heisenberg}}}{\partial\phi_2} = -\frac{J}{4}\left[\cos\theta_3\sin(\phi_2 - \phi_3) + \sin\phi_2\right] = 0$$

For $J < 0$ the global minimum is, unsurprisingly, $\theta_3 = \phi_2 = \phi_3 = 0$ (i.e., ferromagnetism). The energy of the ferromagnetic state is $3J/4$. For $J > 0$ there are several degenerate minima, which all show the same physics. For simplicity we will just consider the minimum $\theta_3 = 0$, $\phi_2 = 2\pi/3$, and $\phi_3 = 4\pi/3$. In this solution each of the spins points 120° away from each of the other spins; hence, this is known as the 120° state. It is left as an exercise to the reader to identify the other solutions, to show that there are none with lower energy than those discussed above, and to show that all of the degenerate solutions are physically equivalent. The energy of the 120° state is $-3J/8$ and hence the energy difference between the ferromagnetic state and the 120° state is just $9J/8$, less than we would expect ($JNz/4 = 3J/2$ for $N = 3$, $z = 2$) for a bipartite lattice.

10.4.5 Three-Site Model: Exact Quantum Mechanical Solution

Group theory, the mathematics of symmetry, allows one to solve the quantum spin-$\tfrac{1}{2}$ three-site Heisenberg model straightforwardly. Unfortunately, space does not permit an introduction to the relevant group theory. Therefore, the reader who is not familiar with the mathematics is advised either to refer to one of the many excellent textbooks on the subject (e.g., Tinkham11 or Lax12) or, failing that, simply to check that the wavefunctions derived by the group-theoretic arguments below are indeed eigenstates. The Hamiltonian is

$$\hat{H}^{(3)}_{\mathrm{Heisenberg}} = J\sum_{\langle ij\rangle}\hat{\mathbf{S}}_i\cdot\hat{\mathbf{S}}_j = J\sum_{\langle ij\rangle}\left[\tfrac{1}{2}\left(\hat{S}^+_i\hat{S}^-_j + \hat{S}^-_i\hat{S}^+_j\right) + \hat{S}^z_i\hat{S}^z_j\right] \tag{10.99}$$


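We begin by noting that $2 \otimes 2 \otimes 2 = 2 \oplus 2 \oplus 4$;† that is, a system formed from three spin-$\tfrac{1}{2}$ particles will have two doublets (with twofold-degenerate spin-$\tfrac{1}{2}$ eigenstates) and one quadruplet (with fourfold-degenerate spin-$\tfrac{3}{2}$ eigenstates). There are only four possible quadruplet states consistent with the $C_3$ point-group symmetry‡ of the model. Each of these belongs to the $A$ irreducible representation of $C_3$. They are

$$|\psi^{3/2}_{3/2}\rangle = |{\uparrow\uparrow\uparrow}\rangle$$

$$|\psi^{1/2}_{3/2}\rangle = \frac{1}{\sqrt{3}}\left(|{\downarrow\uparrow\uparrow}\rangle + |{\uparrow\downarrow\uparrow}\rangle + |{\uparrow\uparrow\downarrow}\rangle\right)$$

$$|\psi^{-1/2}_{3/2}\rangle = \frac{1}{\sqrt{3}}\left(|{\uparrow\downarrow\downarrow}\rangle + |{\downarrow\uparrow\downarrow}\rangle + |{\downarrow\downarrow\uparrow}\rangle\right)$$

$$|\psi^{-3/2}_{3/2}\rangle = |{\downarrow\downarrow\downarrow}\rangle$$

where $|\alpha\beta\gamma\rangle = |S^z_1, S^z_2, S^z_3\rangle$ with $\alpha$, $\beta$, and $\gamma$ = ↑ or ↓. Each of these states has energy $E = 3J/4$, and they are the (degenerate) ground states for $J < 0$.

We are left with the four doublet states. These belong to the two-dimensional $E$ irreducible representation of $C_3$, and as the Hamiltonian is time-reversal symmetric, all four doublet states are degenerate. Explicitly the states are

$$|\psi^{1/2}_{1/2}\rangle = \frac{1}{\sqrt{3}}\left(|{\downarrow\uparrow\uparrow}\rangle + e^{i2\pi/3}|{\uparrow\downarrow\uparrow}\rangle + e^{-i2\pi/3}|{\uparrow\uparrow\downarrow}\rangle\right)$$

$$|\psi^{-1/2}_{1/2}\rangle = \frac{1}{\sqrt{3}}\left(|{\uparrow\downarrow\downarrow}\rangle + e^{i2\pi/3}|{\downarrow\uparrow\downarrow}\rangle + e^{-i2\pi/3}|{\downarrow\downarrow\uparrow}\rangle\right)$$

$$|\tilde\psi^{1/2}_{1/2}\rangle = \frac{1}{\sqrt{3}}\left(|{\downarrow\uparrow\uparrow}\rangle + e^{-i2\pi/3}|{\uparrow\downarrow\uparrow}\rangle + e^{i2\pi/3}|{\uparrow\uparrow\downarrow}\rangle\right)$$

$$|\tilde\psi^{-1/2}_{1/2}\rangle = \frac{1}{\sqrt{3}}\left(|{\uparrow\downarrow\downarrow}\rangle + e^{-i2\pi/3}|{\downarrow\uparrow\downarrow}\rangle + e^{i2\pi/3}|{\downarrow\downarrow\uparrow}\rangle\right)$$

Each of these states has energy $E = -3J/4$, as follows from $\sum_{\langle ij\rangle}\hat{\mathbf{S}}_i\cdot\hat{\mathbf{S}}_j = \tfrac{1}{2}\left[S(S+1) - \tfrac{9}{4}\right]$ with total spin $S = \tfrac{1}{2}$, and they are the (degenerate) ground states for $J > 0$. Thus, the energy difference between the highest spin state and the lowest spin state is $3J/2$. From the solution to the two-site model (Section 10.4.2), we expected each of the three bonds to yield an energy difference of $J$ between the lowest and highest spin states (i.e., $3J$ in total). Thus, the frustration has a similar effect on both the quantum and classical models (i.e., frustration lowers the energy difference between the highest spin and lowest spin states).

† In this notation the integers are the degeneracy of the state.

‡ One might, reasonably, take the view that the model has either $D_{3h}$ or $C_{3v}$ symmetry. In fact, the arguments in this section go through almost identically for either of these symmetries (with appropriate changes in notation), due to the homomorphisms from these groups to $C_3$. We use $C_3$ notation for simplicity.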
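For readers who prefer to check the three-site results by brute force rather than by group theory, the following sketch builds the 8×8 matrix of the three-site Hamiltonian, $J\sum_{\langle ij\rangle}\hat{\mathbf{S}}_i\cdot\hat{\mathbf{S}}_j$ over the three bonds of the triangle, from Kronecker products and diagonalizes it. It also evaluates the classical energy of Eq. (10.98) for the collinear and 120° configurations. It assumes NumPy; the helper names (`site_operator`, `heisenberg_triangle`, `classical_energy`) are illustrative and not from the text.

```python
import numpy as np

# Spin-1/2 operators in the basis (up, down), with hbar = 1.
sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
sy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
identity = np.eye(2, dtype=complex)

def site_operator(op, site, n_sites=3):
    """Embed a single-site operator at position `site` of an n_sites system."""
    result = np.eye(1, dtype=complex)
    for k in range(n_sites):
        result = np.kron(result, op if k == site else identity)
    return result

def heisenberg_triangle(J):
    """H = J * sum over the three bonds of S_i . S_j."""
    H = np.zeros((8, 8), dtype=complex)
    for i, j in [(0, 1), (1, 2), (0, 2)]:
        for op in (sx, sy, sz):
            H += J * (site_operator(op, i) @ site_operator(op, j))
    return H

def classical_energy(J, phi2, phi3, theta3=0.0, S=0.5):
    """Classical energy of Eq. (10.98) for spins of length S."""
    return J * S**2 * (np.cos(phi2)
                       + np.cos(theta3) * np.cos(phi2 - phi3)
                       + np.cos(theta3) * np.cos(phi3))

J = 1.0
levels = np.linalg.eigvalsh(heisenberg_triangle(J))   # real, ascending
print("quantum levels / J :", np.round(levels, 6))
print("classical ferromagnetic energy / J:", classical_energy(J, 0.0, 0.0))
print("classical 120-degree energy / J   :",
      classical_energy(J, 2 * np.pi / 3, 4 * np.pi / 3))
```

For $J = 1$ the spectrum consists of four levels at $-3J/4$ (the two doublets) and four at $+3J/4$ (the quadruplet), while the classical energies are $3J/4$ and $-3J/8$.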


10.4.6 Heisenberg Model on Infinite Lattices

The Heisenberg model can be solved exactly in one dimension, and we discuss this further below, but not in any other finite dimension. However, in more than one dimension, the physics of the Heisenberg model is typically very different from that in one dimension, so we will begin by discussing, qualitatively, the semiclassical spin-wave approximation for the Heisenberg model, which captures many important aspects of magnetism. A quantitative formulation of this theory can be found in many textbooks (e.g., Ashcroft and Mermin10 or Rössler29).

In inelastic neutron scattering experiments a neutron may have its spin flipped by its interaction with the magnet; this causes a spin-1 excitation in the material. The conceptually simplest spin-1 excitation would be to flip one (spin-$\tfrac{1}{2}$) spin; in a one-dimensional ferromagnetic Heisenberg model, this state has energy $2|J|$ greater than the ground state. However, a much lower energy excitation is a "spin wave," where each spin is rotated a small amount from its nearest neighbors (see Fig. 10.12). In a one-dimensional ferromagnetic Heisenberg model, spin waves have excitation energies of $\omega_k = 2|J|(1 - \cos ka)$, where $a$ is the lattice constant.29 Note, in particular, that the excitation energy vanishes for long-wavelength (small-$k$) spin waves. This spin-wave spectrum can indeed be observed directly in neutron-scattering experiments from suitable materials,47 and the spectrum is found to be in good agreement with the predictions of the semiclassical theory in many materials. One can also quantize the semiclassical theory by making a Holstein–Primakoff transformation.29 This yields a description of the low-energy physics of the Heisenberg model in terms of noninteracting bosons, known as magnons, which have the same dispersion relation as the classical spin waves.
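The statement that the spin-wave energy vanishes at long wavelength can be made concrete with a short numerical check. The sketch below simply evaluates the quoted dispersion, $\omega_k = 2|J|(1 - \cos ka)$, and compares it with its small-$k$ expansion, $|J|(ka)^2$; the function name and the parameter values are illustrative assumptions.

```python
import numpy as np

def spin_wave_energy(k, J, a=1.0):
    """Ferromagnetic-chain spin-wave energy, omega_k = 2|J|(1 - cos(k a))."""
    return 2.0 * abs(J) * (1.0 - np.cos(k * a))

J, a = -1.0, 1.0   # ferromagnetic coupling in the convention of Eq. (10.82)
for k in (1.0, 0.1, 0.01):
    exact = spin_wave_energy(k, J, a)
    quadratic = abs(J) * (k * a) ** 2      # leading term of the expansion
    print(f"k a = {k * a:5.2f}  omega_k = {exact:.6e}  |J|(ka)^2 = {quadratic:.6e}")

# The top of the band, at k = pi/a, is 4|J|.
print("band maximum:", spin_wave_energy(np.pi / a, J, a))
```

The quadratic form shows why these excitations cost far less than the $2|J|$ needed to flip a single spin once the wavelength is many lattice spacings.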
Similar spin-wave and magnon descriptions can be constructed straightforwardly for the antiferromagnetic Heisenberg model.29

The effective low-energy physics of the one-dimensional Heisenberg model is, as noted above, rather different from the semiclassical approximation. To understand this, it is helpful to think of the Heisenberg model as a special case of the XXZ model:

Fig. 10.12 (color online) (a) Classical ground state of a ferromagnetic Heisenberg chain; (b) spin-wave excitation with wavelength $\lambda = 2\pi/k$ in the same model.


$$H_{XXZ} = J_{xy}\sum_i\left(S^x_i S^x_{i+1} + S^y_i S^y_{i+1}\right) + J_z\sum_i S^z_i S^z_{i+1} \tag{10.100}$$

which reduces to the Heisenberg model for $J_{xy} = J_z = J$. For $J_z < J_{xy} < 0$, the model displays an exotic quantum phase known as a Luttinger liquid. (At $J_{xy} = J_z$ the model undergoes a quantum phase transition from the Luttinger liquid to an ordered phase.48)

On the energy scales relevant to chemistry, one does not need to worry about the fact that protons and neutrons are made up of smaller particles (quarks). This is because the quarks are confined within the proton or neutron.49 Similarly, in a normal magnet it does not matter that the material is made up of spin-$\tfrac{1}{2}$ particles (electrons). As described above, on the energy scales relevant to magnets, the spins are confined into spin-1 particles, magnons. However, magnons can be described in terms of two spin-$\tfrac{1}{2}$ spinons, which are confined inside the magnon. In the Luttinger liquid the spinons are deconfined; that is, the spinons can move independently of one another (see Fig. 10.13). As the magnon is a composite particle made from two spinons, this is often referred to as fractionalization. A key prediction of this theory is that the spinons display a continuum of excitations in neutron-scattering experiments (as opposed to the sharp dispersion predicted for magnons). The two-spinon continuum has indeed been observed in a number of quasi-one-dimensional materials.50
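The relation between the XXZ and Heisenberg models in Eq. (10.100) can be checked on a small chain. The sketch below is a minimal illustration, assuming NumPy, an open four-site spin-$\tfrac{1}{2}$ chain, and hypothetical helper names. It builds $H_{XXZ}$ from Kronecker products and tests which spin symmetries survive: the total $S^z$ commutes with $H_{XXZ}$ for any couplings, whereas the total spin $\mathbf{S}^2_{\mathrm{tot}}$ commutes with it only at the Heisenberg point $J_{xy} = J_z$.

```python
import numpy as np

# Spin-1/2 operators in the basis (up, down), with hbar = 1.
sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
sy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
identity = np.eye(2, dtype=complex)

def site_operator(op, site, n_sites):
    """Embed a single-site operator at position `site` of an n_sites chain."""
    result = np.eye(1, dtype=complex)
    for k in range(n_sites):
        result = np.kron(result, op if k == site else identity)
    return result

def xxz_chain(n_sites, Jxy, Jz):
    """Open-chain XXZ Hamiltonian of Eq. (10.100)."""
    dim = 2 ** n_sites
    H = np.zeros((dim, dim), dtype=complex)
    for i in range(n_sites - 1):
        for op, coupling in ((sx, Jxy), (sy, Jxy), (sz, Jz)):
            H += coupling * (site_operator(op, i, n_sites)
                             @ site_operator(op, i + 1, n_sites))
    return H

def total_component(op, n_sites):
    """Total spin component, sum_i S_i^op."""
    return sum(site_operator(op, i, n_sites) for i in range(n_sites))

def total_spin_squared(n_sites):
    """S_tot^2 = (S_tot^x)^2 + (S_tot^y)^2 + (S_tot^z)^2."""
    return sum(total_component(op, n_sites) @ total_component(op, n_sites)
               for op in (sx, sy, sz))

def commutator(A, B):
    return A @ B - B @ A

n = 4
S2, Sz_tot = total_spin_squared(n), total_component(sz, n)
for Jxy, Jz in ((1.0, 1.0), (1.0, 0.5)):
    H = xxz_chain(n, Jxy, Jz)
    print(f"Jxy = {Jxy}, Jz = {Jz}: "
          f"|[H, S^2]| = {np.linalg.norm(commutator(H, S2)):.3e}, "
          f"|[H, Sz_tot]| = {np.linalg.norm(commutator(H, Sz_tot)):.3e}")
```

Full spin-rotation symmetry is thus special to the Heisenberg point; away from it only rotations about the $z$ axis remain a symmetry of the model.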

Fig. 10.13 (color online) Spinons in a one-dimensional spin chain. (a) Local antiferromagnetic correlations. (b) A neutron scattering off the chain causes one spin (circled) to flip. (c,d) Spontaneous flips of adjacent pairs of spins due to quantum fluctuations allow the spinons (circled) to propagate independently. A key open question is: Can this free propagation occur in two dimensions, or do interactions confine the spinons? (Modified from Ref. 81.)


An open research question is: Does fractionalization occur in higher dimensions? Because of the success of spin-wave theory (implying confined spinons) in describing magnetically ordered materials, one does not expect fractionalization in materials with magnetic order. Therefore, one would like to investigate quasi-two- or three-dimensional materials whose low-energy physics is described by spin Hamiltonians (such as the Heisenberg model) but that do not order magnetically even at the lowest temperatures. Such materials are collectively referred to as spin liquids. There is a long history of theoretical contemplation of spin liquids, which suggests that frustrated magnets and insulating systems near the Mott transition are strong candidates to display spin-liquid physics. Evidence for real materials with spin-liquid ground states was scarce until very recently,51 but there is now evidence for spin liquids in the triangular lattice compound κ-(BEDT-TTF)2Cu2(CN)3,24,52 the kagome lattice (see Fig. 10.4) compound ZnCu3(OH)6Cl2,53 and the hyperkagome lattice compound Na4Ir3O8.54 It remains to be seen whether any of these materials support fractionalized excitations.

10.5 OTHER EFFECTIVE LOW-ENERGY HAMILTONIANS FOR CORRELATED ELECTRONS

10.5.1 Complete Neglect of Differential Overlap, the Pariser–Parr–Pople Model, and Extended Hubbard Models

We now consider another model for which the quantum chemistry and condensed matter physics communities have different names. These models belong to a class of models known as complete neglect of differential overlap (CNDO). For a pair of orthogonal states, $\phi(x)$ and $\psi(x)$, the integral over all space of the overlap of the two wavefunctions vanishes [i.e., $\int_{-\infty}^{\infty} \phi(x)\psi(x)\,dx = 0$]. If the differential overlap vanishes, the overlap of the two wavefunctions vanishes at every point in space [i.e., $\lim_{\delta\to 0} \int_{x_0}^{x_0+\delta} \phi(x)\psi(x)\,dx = 0$ for all $x_0$]. The CNDO approximation is simply to assume that the differential overlap between all basis states is negligible. Thus CNDO implies that $V_{ijkl} = V_{iikk}\delta_{ij}\delta_{kl}$ (cf. Section 10.1.2) and the general CNDO Hamiltonian is

$$\hat H_{CNDO} = -\sum_{ij\sigma} t_{ij}\, \hat c^\dagger_{i\sigma} \hat c_{j\sigma} + \sum_{ij\sigma\sigma'} V_{ij}\, \hat n_{i\sigma} \hat n_{j\sigma'} \tag{10.101}$$

where $V_{ij} \equiv V_{iijj}$ and the number operator $\hat n_{i\sigma} \equiv \hat c^\dagger_{i\sigma} \hat c_{i\sigma}$. The Pariser–Parr–Pople (PPP) model is the CNDO approximation in a basis that includes only the π electrons. Often, a Hückel-like notation is used with $V_{ij} = \gamma_{ij}$; thus,

$$\hat H_{PPP} = \sum_{i\sigma} \alpha_i\, \hat c^\dagger_{i\sigma} \hat c_{i\sigma} + \sum_{ij\sigma} \beta_{ij}\, \hat c^\dagger_{i\sigma} \hat c_{j\sigma} + \sum_{ij\sigma\sigma'} \gamma_{ij}\, \hat n_{i\sigma} \hat n_{j\sigma'} \tag{10.102}$$
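The distinction drawn above between vanishing overlap and vanishing differential overlap can be checked numerically. The following is a minimal sketch, assuming two illustrative real functions (sin and cos on one period, which are not taken from the text) and a simple midpoint rule; the local integral is divided by δ so that the pointwise product is exposed rather than trivially shrinking with the window.

```python
import math

def integrate(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Illustrative (assumed) pair of orthogonal real functions on [0, 2*pi].
phi = math.sin
psi = math.cos

def product(x):
    return phi(x) * psi(x)

# Ordinary overlap: integral of the product over the whole interval.
total_overlap = integrate(product, 0.0, 2.0 * math.pi)

# Differential overlap near x0: the product integrated over a tiny window,
# divided by the window width, approximates phi(x0) * psi(x0).
x0, delta = math.pi / 4.0, 1e-3
local_density = integrate(product, x0, x0 + delta, n=100) / delta

print(f"overlap over all space : {total_overlap:.2e}")
print(f"local product near x0  : {local_density:.4f}")
```

The two functions are orthogonal, yet their product is far from zero near x0, so orthogonality alone does not justify CNDO; the approximation is a separate, stronger assumption about the basis.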


The extended Hubbard model, as with the plain Hubbard model, is typically studied in a basis with one orbital per site. Further, one often makes the approximation that $V_{ii} = U$, $V_{ij} = V$ if i and j are nearest neighbors, and $V_{ij} = 0$ otherwise. This yields

$$\hat H_{eH} = -\sum_{ij\sigma} t_{ij}\, \hat c^\dagger_{i\sigma} \hat c_{j\sigma} + U \sum_i \hat n_{i\uparrow} \hat n_{i\downarrow} + V \sum_{\langle ij\rangle\sigma\sigma'} \hat n_{i\sigma} \hat n_{j\sigma'} \tag{10.103}$$

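One can, of course, go beyond CNDO. The most general possible model for two identical sites with a single orbital per site is

$$\hat H_{eH2} = -\sum_\sigma \left[ t - X(\hat n_{1\bar\sigma} + \hat n_{2\bar\sigma}) \right] (\hat c^\dagger_{1\sigma} \hat c_{2\sigma} + \hat c^\dagger_{2\sigma} \hat c_{1\sigma}) + U \sum_i \hat n_{i\uparrow} \hat n_{i\downarrow} + V \hat n_1 \hat n_2 + J\, \hat{\mathbf S}_1 \cdot \hat{\mathbf S}_2 + P (\hat c^\dagger_{1\uparrow} \hat c^\dagger_{1\downarrow} \hat c_{2\uparrow} \hat c_{2\downarrow} + \hat c^\dagger_{2\uparrow} \hat c^\dagger_{2\downarrow} \hat c_{1\uparrow} \hat c_{1\downarrow}) \tag{10.104}$$

where $\hat n_i = \sum_\sigma \hat n_{i\sigma}$, $\hat{\mathbf S}_i = \sum_{\alpha\beta} \hat c^\dagger_{i\alpha} \boldsymbol\sigma_{\alpha\beta} \hat c_{i\beta}$, $\boldsymbol\sigma_{\alpha\beta}$ is the vector of Pauli matrices, $\bar\sigma$ denotes the spin opposite to $\sigma$, J is the direct exchange interaction, X is the correlated hopping amplitude, and P is the pair hopping amplitude.

10.5.2 Larger Basis Sets and Hund’s Rules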

Thus far we have focused mainly on models with one orbital per site. Often, this is not appropriate: for example, if one were interested in chemical bonding or materials containing transition metals. Many of the models discussed in this chapter can be extended straightforwardly to include more than one orbital per site. However, while writing down models with more than one orbital per site is not difficult, these models do contain significant additional physics. Some of the most important effects are known as Hund’s rules.1 These rules have important experimental consequences, from atomic physics to biology. To examine Hund’s rules, let us consider the atomic limit (t = 0) of an extended Hubbard model with two electrons in two orbitals per site:

$$\hat H_{eH1s2o} = U \sum_\mu \hat n_{\mu\uparrow} \hat n_{\mu\downarrow} + V_0\, \hat n_1 \hat n_2 + J_H\, \hat{\mathbf S}_1 \cdot \hat{\mathbf S}_2 \tag{10.105}$$

where μ = 1 or 2 labels the orbitals, $\hat n_{\mu\sigma} = \hat c^\dagger_{\mu\sigma} \hat c_{\mu\sigma}$, $\hat n_\mu = \sum_\sigma \hat n_{\mu\sigma}$, $\hat{\mathbf S}_\mu = \sum_{\alpha\beta} \hat c^\dagger_{\mu\alpha} \boldsymbol\sigma_{\alpha\beta} \hat c_{\mu\beta}$, U is the Coulomb repulsion between two electrons in the same orbital, $V_0$ is the Coulomb repulsion between two electrons in different orbitals, and $J_H$ is the Hund’s rule coupling between electrons in different orbitals. Notice that the Hund’s rule coupling is an exchange interaction between orbitals.


Further, if we compare the Hamiltonian with the definition given in Eq. (10.28), we find that

$$-J_H = \int d^3 r_1 \int d^3 r_2\, \phi_1^*(\mathbf r_1) \phi_2(\mathbf r_1) V(\mathbf r_1 - \mathbf r_2) \phi_2^*(\mathbf r_2) \phi_1(\mathbf r_2) \sim \int d^3 r_1 \int d^3 r_2\, |\phi_1(\mathbf r_1)|^2 V(\mathbf r_1 - \mathbf r_2) |\phi_2(\mathbf r_2)|^2 \ge 0 \tag{10.106}$$

as $V(\mathbf r_1 - \mathbf r_2)$ is positive semidefinite. Therefore, typically, $J_H < 0$; that is, the Hund’s rule coupling favors the parallel alignment of the spins in a half-filled system. U is the largest energy scale in the problem, so, for simplicity, let us consider the case $U \to \infty$. For $J_H = 0$ there are four degenerate ground states: a singlet, $(1/\sqrt{2})(|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle)$ (where the first arrow refers to the spin of the electron in orbital 1 and the second arrow refers to the spin in orbital 2), and a triplet: $|\uparrow\uparrow\rangle$, $|\downarrow\downarrow\rangle$, and $(1/\sqrt{2})(|\uparrow\downarrow\rangle + |\downarrow\uparrow\rangle)$. But for $J_H < 0$ the energy of the triplet states is $|J_H|$ lower than that of the singlet state. Indeed, even if we relax the condition $U \to \infty$, the triplet state remains lower in energy than the singlet state, as physically we require that $U > |J_H|$. One can repeat this argument for any number of electrons in any number of orbitals, and one always finds that the highest spin state has the lowest energy. However, if one studies models with more than one site and moves away from the atomic limit (t = 0), one finds that there is a subtle competition between the kinetic (hopping) term and the Hund’s rule coupling, which means that the high spin state is not always the lowest-energy state. Many such interesting effects can be understood on the basis of a two-site generalization of this two-orbital model.55

10.5.3 Ionic Hubbard Model

Thus far we have assumed that all sites are identical. Of course, this is not always true in real materials. In a compound, more than one species of atom may contribute to the low-energy physics,56 or different atoms of the same species may be found at crystallographically distinct sites.43,57 A simple model that describes this situation is the ionic Hubbard model:

$$\hat H_{iH} = -t \sum_{\langle ij\rangle\sigma} \hat c^\dagger_{i\sigma} \hat c_{j\sigma} + U \sum_i \hat n_{i\uparrow} \hat n_{i\downarrow} + \sum_{i\sigma} \varepsilon_i\, \hat n_{i\sigma} \tag{10.107}$$

where $\varepsilon_i = t_{ii}$ is the site energy, which will be taken to be different on different sites. Note that in the standard form of the ionic Hubbard model, all sites are assumed to have the same U. An important application of the ionic Hubbard model is in describing transition metal oxides.56 Typically, $\varepsilon_i$ is larger on the transition metal site than on the oxygen site; therefore, the oxygen orbitals are nearly filled. This means that there is a low hole density in the oxygen orbitals and hence that electronic correlations are less important for the electrons in the oxygen orbitals than for electrons in transition metal orbitals. If the difference between $\varepsilon_i$ on the oxygen sites and $\varepsilon_i$ on the transition metal sites is large enough, the oxygen orbitals are completely filled in all low-energy states and therefore need not feature in the low-energy description of the material. However, just because the oxygen orbitals do not appear explicitly in the effective low-energy Hamiltonian of the material does not mean that the oxygen does not have a profound effect on low-energy physics. To see this, consider a toy model with two metal sites (labeled 1 and 2) and one oxygen site (labeled O), whose Hamiltonian is

$$\hat H_{iH3} = -t \sum_\sigma (\hat c^\dagger_{1\sigma} \hat c_{O\sigma} + \hat c^\dagger_{O\sigma} \hat c_{1\sigma} + \hat c^\dagger_{2\sigma} \hat c_{O\sigma} + \hat c^\dagger_{O\sigma} \hat c_{2\sigma}) + \frac{\Delta}{2} \sum_\sigma (\hat n_{1\sigma} + \hat n_{2\sigma} - \hat n_{O\sigma}) \tag{10.108}$$

as sketched in Fig. 10.14, which is just the ionic Hubbard model with U = 0 and $\Delta = \varepsilon_1 - \varepsilon_O = \varepsilon_2 - \varepsilon_O > 0$. With three electrons in the system and t = 0, the ground state is fourfold degenerate: the ground states have two electrons on the O atom and the other electron on one of the metal atoms. If we now consider finite, but small t, we can construct a perturbation theory in $t/\Delta$. One finds that there is a splitting between the bonding, $(1/\sqrt{2})(\hat c^\dagger_{1\sigma} + \hat c^\dagger_{2\sigma}) \hat c^\dagger_{O\uparrow} \hat c^\dagger_{O\downarrow} |0\rangle$, and antibonding, $(1/\sqrt{2})(\hat c^\dagger_{1\sigma} - \hat c^\dagger_{2\sigma}) \hat c^\dagger_{O\uparrow} \hat c^\dagger_{O\downarrow} |0\rangle$, states. The processes that lead to this splitting are sketched in Fig. 10.15. Therefore, our effective low-energy Hamiltonian is a tight-binding model involving just the metal atoms:

$$\hat H_{eff} = -t^* \sum_\sigma (\hat c^\dagger_{1\sigma} \hat c_{2\sigma} + \hat c^\dagger_{2\sigma} \hat c_{1\sigma}) \tag{10.109}$$

where, to second order in $t/\Delta$, the effective metal-to-metal hopping integral is given by

$$t^* = -\frac{t^2}{\Delta} \tag{10.110}$$

Fig. 10.14 (color online) Toy model for a transition metal oxide, Hamiltonian equation (10.108), with two transition metal sites (1 and 2) and a single oxygen site (O).

Fig. 10.15 (color online) Processes described by Hamiltonian equation (10.108) that give rise to the effective hopping integral between the two transition metal atom sites. [Panels: initial state, E = E0; intermediate state reached by one hop t, E − E0 = Δ; final state after a second hop t, E = E0.]

Note that even though t is positive, t ∗ < 0 (or, equivalently, β∗ > 0), in contrast to our naive expectation that hopping integrals are positive (β < 0; cf. Section 10.2).
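The sign and size of the effective hopping can be checked without perturbation theory, because the toy model of Eq. (10.108) has U = 0 and is therefore solved by filling single-particle levels. The sketch below is an illustration under that assumption: it uses the mirror symmetry between sites 1 and 2 to reduce the 3 × 3 single-particle problem to closed-form levels, and the function name and parameter values are hypothetical choices, not taken from the text.

```python
import math

def effective_hopping(t, delta):
    """Exact t* for the U = 0 three-site model of Eq. (10.108).

    Single-particle basis: |1>, |O>, |2> with site energies +delta/2, -delta/2,
    +delta/2 and hopping -t between O and each metal site.
    The odd combination (|1> - |2>)/sqrt(2) does not couple to O.
    The even combination (|1> + |2>)/sqrt(2) couples to O with -sqrt(2)*t,
    giving a 2x2 block with eigenvalues +/- sqrt(delta^2/4 + 2 t^2).
    """
    e_odd = delta / 2.0
    e_even = math.sqrt(delta**2 / 4.0 + 2.0 * t**2)  # metal-like even level
    # With three electrons the oxygen-like level is doubly occupied in both
    # low-energy states, so their splitting is e_even - e_odd.
    # In H_eff = -t* (c1^dag c2 + h.c.) the even state sits at -t* and the
    # odd state at +t*, hence t* = (e_odd - e_even) / 2.
    return 0.5 * (e_odd - e_even)

t, delta = 0.1, 2.0                      # assumed illustrative values, t << delta
t_star_exact = effective_hopping(t, delta)
t_star_second_order = -t**2 / delta      # Eq. (10.110)

print(f"exact t*        = {t_star_exact:.6f}")
print(f"second-order t* = {t_star_second_order:.6f}")
```

For these parameters the exact value is negative and agrees with Eq. (10.110) to within about half a percent; the residual difference is of higher order in t/Δ.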

10.6 HOLSTEIN MODEL

So far we have assumed that the nuclei or ions form a passive background through which the electrons move. However, in many situations this is not the case. Atoms move and these lattice/molecular vibrations interact with the electrons via the electron–phonon/vibronic interaction. One of the simplest models of such effects is the Holstein model, which we discuss below. Electron–vibration interactions play important roles across science. In physics, electron–phonon interactions can give rise to superconductivity,58 spin and charge density waves,59 polaron formation,60 and piezoelectricity.58 In chemistry, vibronic interactions affect electron-transfer processes,61 Jahn–Teller effects, spectroscopy, stereochemistry, activation of chemical reactions, and catalysis.62 In biology the vibronic interactions play important roles in photoprotection,63 photosynthesis,64 and vision.65 It is therefore clear that one of the central tasks for condensed matter theory and theoretical chemistry is to describe electron–vibration interactions.


In general, one may write the Hamiltonian of a system of electrons and nuclei as

$$\hat H = \hat H_e + \hat H_n + \hat H_{en} \tag{10.111}$$

where $\hat H_e$ contains those terms that affect only the electrons, $\hat H_n$ contains those terms that affect only the nuclei, and $\hat H_{en}$ describes the interactions between the electrons and the nuclei. $\hat H_e$ might be any of the Hamiltonians we have discussed above. However, for the Holstein model one assumes a tight-binding form for $\hat H_e$. In the normal-mode approximation,62 which we will make, one treats molecular and lattice vibrations as harmonic oscillators (cf. Section 10.1.1). As the ions carry a charge, any displacement of the ions from their equilibrium positions will change the potential felt by the electrons. The Holstein model assumes that each vibrational mode is localized on a single site. For this to be the case, the site must have some internal structure (i.e., the site cannot correspond to a single atom). Therefore, the Holstein model is more appropriate for molecular solids than for simple crystals. For small displacements, $x_{i\mu}$, of the μth mode of the ith lattice site, we can perform a Taylor expansion in the dimensionless normal coordinate of the vibration, $Q_{i\mu} = x_{i\mu} \sqrt{m_{i\mu} \omega_{i\mu} / \hbar}$, where $m_{i\mu}$ and $\omega_{i\mu}$ are, respectively, the mass and the frequency of the μth mode on the ith site, and we find that

$$\hat H_{en} = \sum_{ij\sigma\mu} \frac{\partial t_{ij}}{\partial Q_{i\mu}}\, Q_{i\mu}\, (\hat c^\dagger_{i\sigma} \hat c_{j\sigma} + \hat c^\dagger_{j\sigma} \hat c_{i\sigma}) + \cdots \tag{10.112}$$

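In the Holstein model one assumes that the derivative vanishes for $i \neq j$. We may quantize the vibrations in the usual way (cf. Section 10.1.1), which yields

$$\hat H_{en} = \sum_{i\sigma\mu} g_{i\mu}\, (\hat a^\dagger_{i\mu} + \hat a_{i\mu})\, \hat c^\dagger_{i\sigma} \hat c_{i\sigma} \tag{10.113}$$

where $\hat a^{(\dagger)}_{i\mu}$ destroys (creates) a quantized vibration in the μth mode on the ith site, $g_{i\mu} = 2^{-1/2}\, \partial t_{ii} / \partial Q_{i\mu}$, and $\hat H_n = \sum_{i\mu} \hbar\omega_{i\mu}\, \hat a^\dagger_{i\mu} \hat a_{i\mu}$. Thus,

$$\hat H_{Holstein} = -t \sum_{\langle ij\rangle\sigma} \hat c^\dagger_{i\sigma} \hat c_{j\sigma} + \sum_{i\mu} \hbar\omega_{i\mu}\, \hat a^\dagger_{i\mu} \hat a_{i\mu} + \sum_{i\sigma\mu} g_{i\mu}\, (\hat a^\dagger_{i\mu} + \hat a_{i\mu})\, \hat c^\dagger_{i\sigma} \hat c_{i\sigma} \tag{10.114}$$

10.6.1 Two-Site Holstein Model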

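If we assume that there is only one electron and one mode per site, the Holstein model simplifies to

$$\hat H_{Holstein} = -t \sum_\sigma (\hat c^\dagger_{1\sigma} \hat c_{2\sigma} + \hat c^\dagger_{2\sigma} \hat c_{1\sigma}) + \hbar\omega \sum_i \hat a^\dagger_i \hat a_i + g \sum_i (\hat a^\dagger_i + \hat a_i)\, \hat n_i \tag{10.115}$$

on two symmetric sites, where $\hat n_i = \sum_\sigma \hat n_{i\sigma} = \sum_\sigma \hat c^\dagger_{i\sigma} \hat c_{i\sigma}$. It is useful to change the basis in which we consider the phonons to that of in-phase (symmetric), $\hat s = (\hat a_1 + \hat a_2)/\sqrt{2}$, and out-of-phase (antisymmetric), $\hat b = (\hat a_1 - \hat a_2)/\sqrt{2}$, vibrations. In this basis one finds that

$$\hat H_{Holstein} = \hat H_s + \hat H_{be} \tag{10.116}$$

where

$$\hat H_s = \hbar\omega\, \hat s^\dagger \hat s + \frac{g}{\sqrt{2}} (\hat s^\dagger + \hat s)(\hat n_1 + \hat n_2) \tag{10.117}$$

and

$$\hat H_{be} = -t \sum_\sigma (\hat c^\dagger_{1\sigma} \hat c_{2\sigma} + \hat c^\dagger_{2\sigma} \hat c_{1\sigma}) + \hbar\omega\, \hat b^\dagger \hat b + \frac{g}{\sqrt{2}} (\hat b^\dagger + \hat b)(\hat n_1 - \hat n_2) \tag{10.118}$$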
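The step from Eq. (10.115) to Eqs. (10.116) to (10.118) can be made explicit by inverting the definitions of the symmetric and antisymmetric modes. The following is a sketch of that algebra, using only the definitions given above.

```latex
\begin{aligned}
\hat a_1 &= \tfrac{1}{\sqrt{2}}(\hat s + \hat b), \qquad
\hat a_2 = \tfrac{1}{\sqrt{2}}(\hat s - \hat b), \\
\hat a_1^\dagger \hat a_1 + \hat a_2^\dagger \hat a_2
  &= \hat s^\dagger \hat s + \hat b^\dagger \hat b, \\
(\hat a_1^\dagger + \hat a_1)\hat n_1 + (\hat a_2^\dagger + \hat a_2)\hat n_2
  &= \tfrac{1}{\sqrt{2}}(\hat s^\dagger + \hat s)(\hat n_1 + \hat n_2)
   + \tfrac{1}{\sqrt{2}}(\hat b^\dagger + \hat b)(\hat n_1 - \hat n_2).
\end{aligned}
```

The cross terms $\hat s^\dagger \hat b + \hat b^\dagger \hat s$ cancel in the second line, which is why the symmetric and antisymmetric modes decouple.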

Note that $\hat n_1 + \hat n_2 = N$, the total number of electrons in the problem. As N is a constant of the motion, the dynamics of the electrons cannot affect the symmetric vibrations, and vice versa. Hence all of the interesting effects are contained in $\hat H_{be}$ and we need only study this Hamiltonian below.

10.6.1.1 Diabatic Limit, $\hbar\omega \gg t$

In the diabatic limit the vibrational modes are assumed to adapt themselves instantaneously to the particle’s position. Thus,

$$\hbar\omega\, \hat b^\dagger \hat b + \frac{g}{\sqrt{2}} (\hat b^\dagger + \hat b)(\hat n_1 - \hat n_2) = \hbar\omega\, \hat b^\dagger \hat b \pm \frac{g}{\sqrt{2}} (\hat b^\dagger + \hat b) \tag{10.119}$$

The plus sign is relevant when the electron is located on site 1 and the minus sign is relevant when the electron is on site 2. We now introduce the displaced oscillator transformation,

$$\hat b^\dagger_\pm = \hat b^\dagger \pm \frac{1}{\sqrt{2}} \frac{g}{\hbar\omega} \tag{10.120}$$

Therefore, we find that

$$\hat H_{be} = -t \sum_\sigma (\hat c^\dagger_{1\sigma} \hat c_{2\sigma} + \hat c^\dagger_{2\sigma} \hat c_{1\sigma}) + \hbar\omega\, (\hat b^\dagger_+ \hat b_+ + \hat b^\dagger_- \hat b_-) - \frac{g^2}{2\hbar\omega} \tag{10.121}$$
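The displaced oscillator transformation amounts to completing the square in the vibrational part of Eq. (10.119). As a short check of the algebra, with the electron fixed on site 1 (upper sign) or site 2 (lower sign):

```latex
\hbar\omega\, \hat b^\dagger \hat b \pm \frac{g}{\sqrt{2}}\,(\hat b^\dagger + \hat b)
 = \hbar\omega \left(\hat b^\dagger \pm \frac{g}{\sqrt{2}\,\hbar\omega}\right)
               \left(\hat b \pm \frac{g}{\sqrt{2}\,\hbar\omega}\right)
   - \frac{g^2}{2\hbar\omega}
 = \hbar\omega\, \hat b_\pm^\dagger \hat b_\pm - \frac{g^2}{2\hbar\omega}.
```

The constant $-g^2/(2\hbar\omega)$ is the same for both electron positions, so it shifts the zero of energy without affecting the hopping between the two sites.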

It is important to note that the operators $\hat b_+$ and $\hat b_-$ satisfy the same commutation relations as the $\hat b$ operator; therefore, they describe bosonic excitations. We define the ground states of the displaced oscillators by $\hat b_- |0_-\rangle = 0$ and $\hat b_+ |0_+\rangle = 0$. Therefore,

$$\hat b\, |0_+\rangle = -\frac{1}{\sqrt{2}} \frac{g}{\hbar\omega}\, |0_+\rangle \tag{10.122}$$

and hence

$$\hat b_-\, |0_+\rangle = -\frac{\sqrt{2}\, g}{\hbar\omega}\, |0_+\rangle \tag{10.123}$$

Similarly,

$$\hat b_+\, |0_-\rangle = \frac{\sqrt{2}\, g}{\hbar\omega}\, |0_-\rangle \tag{10.124}$$

that is, $|0_\pm\rangle$ is an eigenstate of $\hat b_\mp$ with eigenvalue $\mp\sqrt{2}\, g/\hbar\omega$. The eigenstates of bosonic annihilation operators are known as coherent states.66 Equations (10.122) to (10.124) therefore show that the ground state of one of the $\hat b_\pm$ operators may be written as a coherent state of the other operator67:

$$|0_\pm\rangle = \exp\left[ -\frac{\sqrt{2}\, g}{\hbar\omega} \left( \frac{g}{\sqrt{2}\, \hbar\omega} \pm \hat b^\dagger_\mp \right) \right] |0_\mp\rangle \tag{10.125}$$

Therefore,

$$\langle 0_+ | 0_- \rangle = \exp\left( -\frac{g^2}{\hbar^2 \omega^2} \right) \tag{10.126}$$

which is known as the Franck–Condon factor. The Franck–Condon factor describes the fact that in the diabatic limit, the bosons cause a “drag” on the electronic hopping. That is, we can describe the solution of the diabatic limit in terms of an effective two-site tight-binding model if we replace t by

$$t^* = t\, \langle 0_+ | 0_- \rangle = t \exp\left( -\frac{g^2}{\hbar^2 \omega^2} \right) \tag{10.127}$$
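The Franck–Condon factor of Eq. (10.126) can be verified numerically by expanding both displaced ground states in the number basis of the undisplaced operator. The sketch below assumes real coupling, a truncated Fock space of 60 states, and illustrative parameter values; the function names are hypothetical.

```python
import math

def coherent_amplitudes(beta, n_max=60):
    """Fock-basis amplitudes <n|beta> = exp(-beta^2/2) beta^n / sqrt(n!) for real beta."""
    amps = []
    term = math.exp(-0.5 * beta * beta)  # n = 0 amplitude
    for n in range(n_max):
        amps.append(term)
        term *= beta / math.sqrt(n + 1)  # recursion to the n + 1 amplitude
    return amps

def franck_condon_overlap(g, hbar_omega, n_max=60):
    """Overlap <0_+|0_-> of the two displaced-oscillator ground states."""
    alpha = g / (math.sqrt(2.0) * hbar_omega)
    plus = coherent_amplitudes(-alpha, n_max)   # b|0_+> = -alpha |0_+>, Eq. (10.122)
    minus = coherent_amplitudes(+alpha, n_max)  # b|0_-> = +alpha |0_->
    return sum(p * m for p, m in zip(plus, minus))

g, hbar_omega, t = 1.0, 1.0, 0.2                # assumed illustrative values
overlap = franck_condon_overlap(g, hbar_omega)
closed_form = math.exp(-(g / hbar_omega) ** 2)  # Eq. (10.126)

print(f"numerical overlap = {overlap:.12f}")
print(f"closed form       = {closed_form:.12f}")
print(f"renormalized t*   = {t * overlap:.6f}")
```

Increasing g at fixed ħω suppresses the overlap exponentially, which is the band narrowing described by Eq. (10.127).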

Thus, the hopping integral is renormalized by the interactions of the electron with the vibrational modes (cf. Section 10.7). This renormalization is also found in the solution for an electron moving on a lattice in the diabatic limit. In this context the exponential factor is known as polaronic band narrowing.60 The exponential factor results from the small overlap of the two displaced oscillator ground states and may be thought of as an increase in the effective mass of the electron.

10.6.1.2 Adiabatic Limit, $\hbar\omega \ll t$

We begin by noting that as there is only one electron, the spin of the electron only leads to a trivial twofold degeneracy and therefore can be neglected without loss of generality. A useful notational change is to introduce a pseudospin notation where we define $\hat\sigma_z = \hat c^\dagger_{1\sigma} \hat c_{1\sigma} - \hat c^\dagger_{2\sigma} \hat c_{2\sigma}$ and $\hat\sigma_x = \hat c^\dagger_{1\sigma} \hat c_{2\sigma} + \hat c^\dagger_{2\sigma} \hat c_{1\sigma}$. Therefore, the one-electron two-site Holstein model Hamiltonian becomes

$$\hat H_{sb} = -t\, \hat\sigma_x + \hbar\omega\, \hat b^\dagger \hat b + \frac{g}{\sqrt{2}} (\hat b^\dagger + \hat b)\, \hat\sigma_z \tag{10.128}$$

which is often referred to as the spin-boson model. Let us now replace the bosonic operators by position and momentum operators for the harmonic oscillator defined as

$$\hat x = \sqrt{\frac{\hbar}{2m\omega}}\, (\hat b^\dagger + \hat b) \tag{10.129}$$

and

$$\hat p = i \sqrt{\frac{\hbar m\omega}{2}}\, (\hat b^\dagger - \hat b) \tag{10.130}$$

Therefore,

$$\hat H_{sb} = -t\, \hat\sigma_x + \frac{\hat p^2}{2m} + \frac{1}{2} m\omega^2 \hat x^2 + g \sqrt{\frac{m\omega}{\hbar}}\, \hat x\, \hat\sigma_z \tag{10.131}$$

The adiabatic limit is characterized by a sluggish bosonic bath that responds only very slowly to the motion of the electron (i.e., $\hat p^2/2m \to 0$), which it is often helpful to think of as the $m \to \infty$ limit. Further, in the adiabatic limit the Born–Oppenheimer approximation2,67 holds, which implies that the total wavefunction of the system, $|\Psi\rangle$, is a product of an electronic (pseudospin) wavefunction, $|\phi_e\rangle$, and a vibrational (bosonic) wavefunction, $|\psi_v\rangle$ (i.e., $|\Psi\rangle = |\phi_e\rangle \otimes |\psi_v\rangle$). Therefore, the harmonic oscillator will be in a position eigenstate and we may replace the position operator, $\hat x$, by a classical position x, yielding

$$\hat H_{sb} = -t\, \hat\sigma_x + g \sqrt{\frac{m\omega}{\hbar}}\, x\, \hat\sigma_z + \frac{1}{2} m\omega^2 x^2 \tag{10.132}$$

$$= \begin{pmatrix} g \sqrt{m\omega/\hbar}\, x & -t \\ -t & -g \sqrt{m\omega/\hbar}\, x \end{pmatrix} + \frac{1}{2} m\omega^2 x^2 \tag{10.133}$$

where in the second line we have simply switched to the matrix representation of the Pauli matrices. This is easily solved and one finds that the eigenvalues are

$$E_\pm = \frac{1}{2} m\omega^2 x^2 \pm \sqrt{t^2 + \frac{m\omega}{\hbar} g^2 x^2} \tag{10.134}$$

$$\approx \frac{1}{2} m\omega^2 x^2 \pm t \pm \frac{m\omega g^2 x^2}{2\hbar t} \tag{10.135}$$


Fig. 10.16 (color online) Energies of the ground and excited states for a single electron in the two-site Holstein model in the adiabatic weak coupling limit ($t \gg g \gg \hbar\omega$), calculated from Eq. (10.134). x is the position of the harmonic oscillator describing out-of-phase vibrations.

where Eq. (10.135) holds in the weak-coupling limit, $g x \sqrt{m\omega/\hbar} \ll t$. We plot the variation of these eigenvalues with x in this limit in Fig. 10.16. Notice that for the electronic ground state, $E_-$, the lowest-energy states have $x \neq 0$. This is an example of spontaneous symmetry breaking,68 as the ground state of a system has a lower symmetry than the Hamiltonian of the system. Thus, the system must “choose” either the left well or the right well (but not both) in order to minimize its energy.
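The double-well shape of the adiabatic ground-state surface can be reproduced with a few lines of code. The sketch below is illustrative only: it assumes units with ħ = m = 1, takes the elastic term as ½mω²x², evaluates the exact square-root form of the eigenvalues rather than the weak-coupling expansion, and uses hypothetical parameter values and function names. Setting dE₋/dx = 0 gives a nonzero minimum only when g² > ħωt, which is the condition coded in `analytic_minimum`.

```python
import math

def adiabatic_energies(x, t, g, omega, m=1.0, hbar=1.0):
    """Return (E_minus, E_plus) at classical displacement x."""
    elastic = 0.5 * m * omega**2 * x**2
    gap = math.sqrt(t**2 + (m * omega / hbar) * g**2 * x**2)
    return elastic - gap, elastic + gap

def scan_minimum(t, g, omega, x_max, steps=30001, m=1.0, hbar=1.0):
    """Brute-force search for the x >= 0 minimum of E_minus on a uniform grid."""
    best_x = 0.0
    best_e = adiabatic_energies(0.0, t, g, omega, m, hbar)[0]
    for k in range(steps):
        x = x_max * k / (steps - 1)
        e = adiabatic_energies(x, t, g, omega, m, hbar)[0]
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

def analytic_minimum(t, g, omega, m=1.0, hbar=1.0):
    """Positive stationary point of E_minus; it exists only if g^2 > hbar*omega*t."""
    lam = g**2 / (hbar * omega)
    if lam <= t:
        return 0.0  # single well centred on x = 0
    return math.sqrt((lam**2 - t**2) * hbar / (m * omega * g**2))

t, g, omega = 1.0, 0.5, 0.1   # assumed illustrative values with t > g > hbar*omega
x_scan, e_scan = scan_minimum(t, g, omega, x_max=30.0)
x_exact = analytic_minimum(t, g, omega)

print(f"grid minimum     x = {x_scan:.3f}, E_minus = {e_scan:.4f}")
print(f"analytic minimum x = {x_exact:.3f}")
print(f"E_minus at x = 0   = {adiabatic_energies(0.0, t, g, omega)[0]:.4f}")
```

By symmetry there is an equivalent minimum at −x, so the ground-state surface has two wells separated by a barrier at x = 0.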

10.7 EFFECTIVE HAMILTONIAN OR SEMIEMPIRICAL MODEL?

The models discussed in this chapter are generally known as either empirical or semiempirical models in a chemical context and as effective Hamiltonians in the physics community. Here the difference is not just nomenclature but is also indicative of an important difference in the epistemological status awarded to these models by the two communities. In this section I describe two different attitudes toward semiempirical models and effective Hamiltonians and discuss the epistemological views embodied in the work of two of the greatest physicists of the twentieth century.

10.7.1 Diracian Worldview

Paul Dirac famously wrote69 that “the fundamental laws necessary for the mathematical treatment of a large part of physics and the whole of chemistry are thus completely known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved.” There is clearly a great deal of truth in the statement. In solid-state physics and chemistry we know that the Schrödinger equation provides an extraordinarily accurate description of the phenomena observed. Gravity, the weak and strong nuclear forces, and relativistic corrections are typically unimportant; thus, all of the interactions boil down to nonrelativistic electromagnetic effects. Dirac’s worldview is realized in the ab initio approach to electronic structure, wherein one starts from the Hartree–Fock solution to the full Schrödinger equation in some small basis set. One then adds in correlations via increasingly complex approximation schemes and increases the size of the basis set, in the hope that with a sufficiently large computer one will find an answer that is “sufficiently close” to the exact solution (full CI in an infinite complete basis set). In the last few decades rapid progress has been made in ab initio methods due to an exponential improvement in computing technology, methodological progress, and the widespread availability of implementations of these methods.70 However, this progress is unsustainable: The complexity recognized by Dirac eventually limits the accuracy possible from ab initio calculations. Indeed, solving the Hamiltonian given in Eq. (10.24) is known to be computationally difficult.
Feynman proposed building a computer that uses the full power of quantum mechanics to carry out quantum simulations.71 Indeed, the simplest of all quantum chemical problems, the H2 molecule in a minimal basis set, has been solved on a prototype quantum computer.72 But while even a rather small scale quantum computer (containing just a few hundred qubits72) would provide a speed-up over classical computation, it is believed that the solution of Hamiltonian (10.24) remains difficult even on a quantum computer [i.e., it is believed that even a quantum computer could not solve Hamiltonian equation (10.24) in a time that grows only polynomially with the size of the system73]. Further, simple extensions of these arguments provide strong reasons to believe that there is no efficiently computable approximation to the exact functional in density functional theory.73 Therefore, it appears that the equations will always remain “too complex to be solved” directly. This suggests that semiempirical models will always be required for large systems.

10.7.2 Wilsonian Project

Typically, one is only interested in a few low-energy states of a system, perhaps the ground state and the first few excited states. Therefore, as long as our model gives the correct energies for these low-energy states, we should regard it as successful. This apparently simple realization, particularly as embodied by Wilson’s renormalization group,74 has had profound implications throughout modern physics from high-energy particle physics to condensed matter physics. The basic idea of renormalization is remarkably simple. Imagine starting with some system that has a large number of degrees of freedom. As we have noted, for practical purposes we care only about the lowest-energy states. Therefore, one might be tempted to simplify the description of the system by discarding the highest-energy states. However, simply discarding such states will cause a shift in the low-energy spectrum. Therefore, one must remove the high-energy states that complicate the description and render the problem computationally intractable in such a way as to preserve the low-energy spectrum. This is often referred to as “integrating out” the high-energy degrees of freedom (because of the way this process is carried out in the path-integral formulation of quantum mechanics75). Typically, integrating out the high-energy degrees of freedom causes the parameters of the Hamiltonian to “flow” or “run” (i.e., change their values). When this happens, one says that the parameters are renormalized. A simple example is the Coulomb interaction between the two electrons in a neutral helium atom. For simplicity, let’s imagine trying to calculate just the ground-state energy. We begin by analyzing the problem in the absence of a Coulomb interaction between the two electrons. In the ground state both electrons occupy the 1s orbital. We would like to work in as small a basis set as possible. The simplest approach is just to work in the minimal basis set, which in this case is just the two 1s spin-orbitals, $\phi_{1s\sigma}(\mathbf r)$. The total energy of a He atom neglecting the interelectron Coulomb interaction is −108.8 eV (relative to the completely ionized state). Now we restore the Coulomb repulsion between electrons. A simple question is: How much does this change the total energy of the He atom? In the minimal basis set the solution seems straightforward:

$$\langle 1s^2 | V | 1s^2 \rangle = \int_{-\infty}^{\infty} d^3 r_1 \int_{-\infty}^{\infty} d^3 r_2\, \frac{e^2\, |\phi_{1s\uparrow}(\mathbf r_1)|^2\, |\phi_{1s\downarrow}(\mathbf r_2)|^2}{4\pi\varepsilon_0\, |\mathbf r_1 - \mathbf r_2|} \simeq 34.0\ \text{eV} \tag{10.136}$$

Therefore, it is tempting to conclude that we can model the He atom by a one-site Hubbard model with $U = \langle 1s^2 | V | 1s^2 \rangle$. However, this yields a total energy for the He atom of −74.8 eV, which is not particularly close to the experimental value of −78.975 eV.7 Let us then continue to consider the problem in the basis set of the hydrogenic atom, which is complete due to the spherical symmetry of the Hamiltonian. One can now straightforwardly carry out a perturbation theory around the noninteracting electron solution, where we take

$$H_0 = \sum_{i=1}^{2} \left( -\frac{\hbar^2 \nabla_i^2}{2m} - \frac{e^2}{2\pi\varepsilon_0\, |\mathbf r_i|} \right) \tag{10.137}$$

and

$$H_1 = \frac{e^2}{4\pi\varepsilon_0\, |\mathbf r_1 - \mathbf r_2|} \tag{10.138}$$
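The numbers quoted above for helium follow from standard hydrogenic results, and the arithmetic is easy to reproduce. The sketch below assumes the textbook expressions for two electrons in hydrogenic 1s orbitals with nuclear charge Z (an energy of −Z² Ry per electron and a first-order repulsion of (5/4)Z Ry); the final line is an illustrative subtraction showing what value of a one-site Hubbard U would be needed to match the experimental energy quoted in the text, not a result from the chapter.

```python
RYDBERG_EV = 13.6057       # hydrogen ground-state binding energy in eV
Z = 2                      # nuclear charge of helium
E_EXPERIMENT_EV = -78.975  # experimental total energy quoted in the text

# Two noninteracting electrons in hydrogenic 1s orbitals: -Z^2 Ry each.
e_noninteracting = -2 * Z**2 * RYDBERG_EV

# First-order Coulomb repulsion <1s^2|V|1s^2> = (5/8) Z hartree = (5/4) Z Ry.
u_first_order = 1.25 * Z * RYDBERG_EV

# Minimal-basis estimate of the total energy.
e_minimal_basis = e_noninteracting + u_first_order

# Illustrative only: the U that would reproduce the experimental energy
# if the one-electron part were kept fixed.
u_effective = E_EXPERIMENT_EV - e_noninteracting

print(f"E (no repulsion)  = {e_noninteracting:8.1f} eV")
print(f"<1s^2|V|1s^2>     = {u_first_order:8.1f} eV")
print(f"E (minimal basis) = {e_minimal_basis:8.1f} eV")
print(f"U matching expt   = {u_effective:8.1f} eV")
```

The value of U needed to match experiment is roughly 4 eV smaller than the first-order matrix element, which gives a sense of the size of the higher-order corrections.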


A detailed description of this perturbation theory is given in Chapter 18 of Gasiorowicz.7 However, for our discussion, the key point is that in this perturbation theory, the term $\langle 1s^2 | V | 1s^2 \rangle$ is simply the first-order correction to the ground-state energy. It is therefore clear why the minimal basis set gives such a poor result: It ignores all the higher-order corrections to the total energy. The failure of the simple minimal basis set calculation does not, however, mean that the effective Hamiltonian approach also fails, despite the fact that the effective Hamiltonian is also in an extremely small basis set. Rather, one must realize that as well as the first-order contributions, U also contains contributions from higher orders in perturbation theory. It is therefore possible, although extremely computationally demanding, to calculate the parameters for effective Hamiltonians from this type of perturbation theory.76 A more promising approach, which has been applied to a number of molecular crystals,77,78 is to use atomistic calculations to parameterize an effective Hamiltonian. For example, density functional theory gives quite reasonable values for the total energy of the ground state of many molecules. Therefore, one approach to calculating the Hubbard U is to calculate the ionization energy, $I = E_0(N-1) - E_0(N)$, and the electron affinity, $A = E_0(N) - E_0(N+1)$, of the molecule, where $E_0(n)$ is the ground-state energy of the molecule when it contains n electrons and N is the filling corresponding to a half-filled band. One finds that $U = I - A = E_0(N+1) + E_0(N-1) - 2E_0(N)$. A simple way to see this is that if we assume the molecule is neutral when it contains N electrons, then U corresponds to the energy difference in the charge disproportionation reaction 2M → M+ + M− for two well-separated molecules, M.
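The relation U = I − A can be written as a small helper that takes three total energies. The sketch below is a minimal illustration: the function names are hypothetical, and the three energies come from a single Hubbard site with an assumed site energy and interaction rather than from any real calculation, so that the expected answer is known in advance.

```python
def hubbard_u(e_cation, e_neutral, e_anion):
    """Return U = I - A from total energies E0(N-1), E0(N), E0(N+1)."""
    ionization_energy = e_cation - e_neutral   # I = E0(N-1) - E0(N)
    electron_affinity = e_neutral - e_anion    # A = E0(N) - E0(N+1)
    return ionization_energy - electron_affinity

def one_site_hubbard_energy(n, eps, u):
    """Ground-state energy of one Hubbard site holding n = 0, 1, or 2 electrons."""
    return {0: 0.0, 1: eps, 2: 2.0 * eps + u}[n]

# Assumed illustrative parameters (in eV); half filling corresponds to N = 1.
eps, u_input = -3.0, 4.0
energies = [one_site_hubbard_energy(n, eps, u_input) for n in (0, 1, 2)]
u_recovered = hubbard_u(*energies)

print(f"E0(N-1), E0(N), E0(N+1) = {energies}")
print(f"U = I - A = {u_recovered:.2f} eV")
```

The site energy cancels in the combination E0(N+1) + E0(N−1) − 2E0(N), so the helper recovers the input U regardless of the value chosen for eps.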
A more extensive discussion of this approach is given by Scriven et al.77 It is worth noting that we have actually carried out this program of parameterizing effective Hamiltonians three times in the discussion above. In Section 10.4.3 we showed that the Heisenberg model is an effective low-energy model for the half-filled Hubbard model in the limit $t/U \to 0$. In Section 10.5.3 we derived an effective tight-binding model that involved only the metal sites from an ionic Hubbard model of a transition metal oxide. Finally, in Section 10.6.1.1 we showed that vibronic interactions lead to an effective tight-binding model describing the low-energy physics of the Holstein model in the diabatic limit, and that in this model the quasiparticles (electron-like excitations) are polarons, a bound state of electrons and vibrational excitations with a mass enhanced over that of the bare electron. However, to date, the most important method for parameterizing effective Hamiltonians has been to fit the parameters to a range of experimental data, whence the name semiempirical. Of course, experimental data contain all corrections to all orders; therefore, this is indeed an extremely sensible thing to do. But it is important to understand that empiricism is not a dirty word. Indeed, empiricism is what distinguishes science from other belief systems. Further, this empirical approach is exactly the approach that the mathematics tells one to take. It is also important to know that no quantum chemical or solid-state calculation is truly ab initio: the nuclear and electronic masses and the charge on the electron are all measured rather than calculated. Indeed, the modern view of the “standard model” of particle physics is that it, too, is an effective low-energy model.49 For example, in quantum electrodynamics (QED), the quantum field theory of light and matter, the bare charge on the electron is, for all practical purposes, infinite. But the charge is renormalized to the value seen experimentally in a manner analogous to the renormalization of the Hubbard U of He discussed above. Therefore, as we do not at the time of writing know the correct mathematical description of processes at higher energies, all of theoretical science should, perhaps, be viewed as the study of semiempirical effective low-energy Hamiltonians.79 Finally, the most important point about effective Hamiltonians is that they promote understanding. Ultimately, the point of science is to understand the phenomena we observe in the world around us. Although the ability to perform accurate numerical calculations is important, we should not allow this to become our main goal. The models discussed above provide important insights into the chemical bond, magnetism, polarons, the Mott transition, electronic correlations, the failure of mean-field theories, and so on. All of these effects are much more difficult to understand simply on the basis of atomistic calculations. Further, many important effects seen in crystals, such as the Mott insulator phase, are not found in methods such as density functional theory or Hartree–Fock theory, while post-Hartree–Fock methods are not practical in infinite systems. Thus effective Hamiltonians have a vital role to play in developing the new concepts that are required if we are to understand the emergent phenomena found in molecules and solids.80

Acknowledgments

I would like to thank Balazs Györffy, who taught me that “you can’t not know” many of the things discussed above. I also thank James Annett, Greg Freebairn, Noel Hush, Anthony Jacko, Bernie Mostert, Seth Olsen, Jeff Reimers, Edan Scriven, Mike Smith, Eddy Yusuf, and particularly, Ross McKenzie, for many enlightening conversations about the topics discussed and for showing me that chemistry is a beautiful and rich subject with many simplifying principles. I would also like to thank Bernd Braunecker, Karl Chan, Sergio Di Matteo, Anthony Jacko, Ross McKenzie, Seth Olsen, Eddie Ross, and Kristian Weegink for their insightful comments on an early draft of the chapter. I am supported by a Queen Elizabeth II fellowship from the Australian Research Council (project DP0878523).

REFERENCES

1. Fulde, P. Electron Correlations in Molecules and Solids, Springer-Verlag, Berlin, 1995.
2. Schatz, G. C.; Ratner, M. A. Quantum Mechanics in Chemistry, Prentice Hall, Englewood Cliffs, NJ, 1993.
3. Mahan, G. D. Many-Particle Physics, Kluwer Academic, New York, 2000.
4. Goldstein, H.; Poole, C.; Safko, J. Classical Mechanics, Addison-Wesley, Reading, MA, 2002.
5. Atkins, P.; de Paula, J. Atkins’ Physical Chemistry, Oxford University Press, Oxford, UK, 2006.
6. See, e.g., Rae, A. I. M. Quantum Mechanics, Institute of Physics Publishing, Bristol, UK, 1996.
7. See, e.g., Gasiorowicz, S. Quantum Physics, Wiley, Hoboken, NJ, 2003.
8. Jordan, P.; Wigner, E. Z. Phys. 1928, 47, 631–651.
9. Lowe, J. P.; Peterson, K. A. Quantum Chemistry, Elsevier, Amsterdam, 2006.
10. Ashcroft, N. W.; Mermin, N. D. Solid State Physics, Holt, Rinehart and Winston, New York, 1976.
11. Tinkham, M. Group Theory and Quantum Mechanics, McGraw-Hill, New York, 1964.
12. Lax, M. Symmetry Principles in Solid State and Molecular Physics, Wiley, New York, 1974.
13. McWeeny, R. Coulson’s Valence, Oxford University Press, Oxford, UK, 1979.
14. Brogli, F.; Heilbronner, E. Theor. Chim. Acta 1972, 26, 289–299.
15. See, e.g., Arfken, G. Mathematical Methods for Physicists, 3rd ed., Academic Press, Orlando, FL, 1985.
16. Mandl, F. Statistical Physics, Wiley, Chichester, UK, 1998.
17. See pp. 799–800 in Ref. 15.
18. (a) Castro Neto, A. H.; Guinea, F.; Peres, N. M. R.; Novoselov, K. S.; Geim, A. K. Rev. Mod. Phys. 2009, 81, 109–162. (b) Castro Neto, A. H.; Guinea, F.; Peres, N. M. R. Phys. World 2006, 19, 33–37.
19. (a) Novoselov, K. S.; Geim, A. K.; Morozov, S. V.; Jiang, D.; Zhang, Y.; Dubonos, S. V.; Grigorieva, I. V.; Firsov, A. A. Science 2004, 306, 666–669. (b) Choucair, M.; Thordarson, P.; Stride, J. A. Nature Nanotechnol. 2009, 4, 30–33.
20. Schrödinger, E. Ann. Phys. 1926, 79, 361–428.
21. Heitler, W.; London, F. Z. Phys. 1927, 44, 455–472.
22. Pauling, L. The Nature of the Chemical Bond and the Structure of Molecules and Crystals, Cornell University Press, Ithaca, NY, 1960.
23. Mott, N. F. Proc. R. Soc. A 1949, 62, 416–422.
24. Powell, B. J.; McKenzie, R. H. J. Phys. Condens. Matter 2006, 18, R827–R865.
25. Cohen, A. J.; Mori-Sanchez, P.; Yang, W. T. Science 2008, 321, 792–794.
26. (a) Anderson, P. W. Science 1987, 235, 1196–1198. (b) Zhang, F. C.; Gross, C.; Rice, T. M.; Shiba, H. Supercond. Sci. Technol. 1988, 1, 36–46.
27. Anderson, P. W. Phys. Today 2008, 61 (4), 8–9.
28. Powell, B. J.; McKenzie, R. H. Phys. Rev. Lett. 2005, 94, 047004; Gan, J. Y.; Chen, Y.; Su, Z. B.; Zhang, F. C. Phys. Rev. Lett. 2005, 94, 067005; Liu, J.; Schmalian, J.; Trivedi, N. Phys. Rev. Lett. 2005, 94, 127003.
29. Rössler, U. Solid State Theory, Springer-Verlag, Berlin, 2004.
30. Mohn, P.; Wohlfarth, E. P. J. Magn. Magn. Mater. 1987, 68, L283–L285.
31. Jacko, A. C.; Fjærestad, J. O.; Powell, B. J. Nature Phys. 2009, 5, 422–425.
32. Gutzwiller, M. C. Phys. Rev. Lett. 1963, 10, 159–162.
33. Brinkmann, W. F.; Rice, T. M. Phys. Rev. B 1970, 2, 4302–4304.
34. Lieb, E. H.; Wu, F. Y. Phys. Rev. Lett. 1968, 20, 1445–1448.
35. Essler, F. H. L.; Frahm, H.; Göhmann, F.; Klümper, A.; Korepin, V. E. The One-Dimensional Hubbard Model, Cambridge University Press, Cambridge, UK, 2005.
36. Tsvelik, A. M. Quantum Field Theory in Condensed Matter Physics, Cambridge University Press, Cambridge, UK, 1996.
37. Kotliar, G.; Vollhardt, D. Phys. Today 2004, 57 (3), 53–59.
38. Kollar, M.; Strack, R.; Vollhardt, D. Phys. Rev. B 1996, 53, 9225–9231.
39. Maier, T.; Jarrell, M.; Pruschke, T.; Hettler, M. H. Rev. Mod. Phys. 2005, 77, 1027–1080.
40. Kotliar, G.; Savrasov, S. Y.; Haule, K.; Oudovenko, V. S.; Parcollet, O.; Marianetti, C. A. Rev. Mod. Phys. 2006, 78, 865–951.
41. Nagaoka, Y. Phys. Rev. 1966, 145, 392–405.
42. Tian, G. J. Phys. A 1990, 23, 2231–2236.
43. Merino, J.; Powell, B. J.; McKenzie, R. H. Phys. Rev. B 2006, 73, 235107.
44. Shaik, S.; Hiberty, P. C. Valence bond theory: its history, fundamentals, and applications—a primer. In Reviews in Computational Chemistry, Lipkowitz, K. B., Larter, R., and Cundari, T. R., Eds., Wiley-VCH, Hoboken, NJ, 2004, pp. 1–100.
45. Sakurai, J. J. Modern Quantum Mechanics, Addison-Wesley, Reading, MA, 1994.
46. Chao, K. A.; Spałek, J.; Oleś, A. M. J. Phys. C 1977, 10, L271–L276.
47. Brockhouse, B. N. Slow neutron spectroscopy and the grand atlas of the physical world. In Nobel Lectures in Physics, 1991–1995, Ekspong, G., Ed.; World Scientific, Singapore, 1997. Also available at http://nobelprize.org/nobel_prizes/physics/laureates/1994/brockhouse-lecture.html.
48. Zaliznyak, I. A. Nature Mater. 2005, 4, 273–275.
49. Griffiths, D. Introduction to Elementary Particles, Wiley-VCH, Weinheim, Germany, 2008.
50. (a) Coldea, R.; Tennant, D. A.; Tylczynski, Z. Phys. Rev. B 2003, 68, 134424. (b) Lake, B.; Tennant, D. A.; Frost, C. D.; Nagler, S. E. Nature Mater. 2005, 4, 329–334.
51. Lee, P. A. Science 2008, 321, 1306–1307.
52. Shimizu, Y.; et al. Phys. Rev. Lett. 2003, 91, 107001.
53. Helton, J.; et al. Phys. Rev. Lett. 2007, 98, 107204.
54. Okamoto, Y.; et al. Phys. Rev. Lett. 2007, 99, 137207.
55. Raczkowski, M.; Frésard, R.; Oleś, A. M. J. Phys. Condens. Matter 2006, 18, 7449–7469.
56. Sarma, D. D. J. Solid State Chem. 1990, 88, 45–52.
57. (a) Merino, J.; Powell, B. J.; McKenzie, R. H. Phys. Rev. B 2009, 79, 161103(R). (b) Merino, J.; McKenzie, R. H.; Powell, B. J. Phys. Rev. B 2009, 80, 045116. (c) Powell, B. J.; Merino, J.; McKenzie, R. H. Phys. Rev. B 2009, 80, 085113.
58. See, e.g., Ziman, J. M. Electrons and Phonons, Oxford University Press, Oxford, UK, 1960.
59. For a review, see Grüner, G. Density Waves in Solids, Perseus Publishing, Cambridge, UK, 1994.
60. See, e.g., Alexandrov, A. S.; Mott, N. F. Polarons and Bipolarons, World Scientific, Singapore, 1995.
61. For a review, see Marcus, R. A. Rev. Mod. Phys. 1993, 65, 599–610.
62. See, e.g., Bersuker, I. B. The Jahn–Teller Effect and Vibronic Interactions in Modern Chemistry, Plenum Press, New York, 1984.
63. (a) Olsen, S.; Riesz, J.; Mahadevan, I.; Coutts, A.; Bothma, J. P.; Powell, B. J.; McKenzie, R. H.; Smith, S. C.; Meredith, P. J. Am. Chem. Soc. 2007, 129, 6672–6673. (b) Meredith, P.; Powell, B. J.; Riesz, J.; Nighswander-Rempel, S.; Pederson, M. R.; Moore, E. Soft Matter 2006, 2, 37–44.
64. Reimers, J. R.; Hush, N. S. J. Am. Chem. Soc. 2004, 126, 4132–4144.
65. Hahn, S.; Stock, G. J. Phys. Chem. B 2000, 104, 1146–1149.
66. Walls, D. F.; Milburn, G. J. Quantum Optics, Springer-Verlag, Berlin, 2006.
67. Weiss, U. Quantum Dissipative Systems, World Scientific, Singapore, 2008.
68. For an introductory discussion of broken symmetry, see, e.g., Blundell, S. J. Magnetism in Condensed Matter, Oxford University Press, Oxford, UK, 2001. For a more advanced discussion, see, e.g., Anderson, P. W. Basic Notions of Condensed Matter Physics, Benjamin-Cummings, Menlo Park, CA, 1984.
69. Dirac, P. Proc. R. Soc. A 1929, 123, 714–733.
70. (a) Pople, J. A. Rev. Mod. Phys. 1999, 71, 1267–1274. (b) Truhlar, D. G. J. Am. Chem. Soc. 2008, 130, 16824–16827.
71. Feynman, R. P. Int. J. Theor. Phys. 1982, 21, 467–488.
72. Lanyon, B. P.; Whitfield, J. D.; Gillet, G. G.; Goggin, M. E.; Almeida, M. P.; Kassal, I.; Biamonte, J. D.; Mohseni, M.; Powell, B. J.; Barbieri, M.; Aspuru-Guzik, A.; White, A. G. Nature Chem. 2010, 2, 106–111.
73. Schuch, N.; Verstraete, F. Nature Phys. 2009, 5, 732–735.
74. Goldenfeld, N. D. Lectures on Phase Transitions and the Renormalisation Group, Addison-Wesley, Reading, MA, 1992.
75. See, e.g., Wen, X.-G. Quantum Field Theory of Many-Body Systems, Oxford University Press, Oxford, UK, 2004.
76. (a) Freed, K. F. Acc. Chem. Res. 1983, 16, 137–144. (b) Gunnarsson, O. Phys. Rev. B 1990, 41, 514–518. (c) Iwata, S.; Freed, K. F. J. Chem. Phys. 1976, 65, 1071–1088. (d) Graham, R. L.; Freed, K. F. J. Chem. Phys. 1992, 96, 1304–1316. (e) Martin, C. M.; Freed, K. F. J. Chem. Phys. 1994, 100, 7454–7470. (f) Stevens, J. E.; Freed, K. F.; Arendt, F.; Graham, R. L. J. Chem. Phys. 1994, 101, 4832–4841. (g) Finley, J. P.; Freed, K. F. J. Chem. Phys. 1995, 102, 1306–1333. (h) Stevens, J. E.; Chaudhuri, R. K.; Freed, K. F. J. Chem. Phys. 1996, 105, 8754–8768. (i) Chaudhuri, R. K.; Freed, K. F. J. Chem. Phys. 2003, 119, 5995–6002. (j) Chaudhuri, R. K.; Freed, K. F. J. Chem. Phys. 2005, 122, 204111.
77. (a) Scriven, E.; Powell, B. J. J. Chem. Phys. 2009, 130, 104508. (b) Phys. Rev. B 2009, 80, 205107.
78. (a) Martin, R. L.; Ritchie, J. P. Phys. Rev. B 1993, 48, 4845–4849. (b) Antropov, V. P.; Gunnarsson, O.; Jepsen, O. Phys. Rev. B 1992, 46, 13647–13650. (c) Pederson, M. R.; Quong, A. A. Phys. Rev. B 1992, 46, 13584–13591. (d) Brocks, G.; van den Brink, J.; Morpurgo, A. F. Phys. Rev. Lett. 2004, 93, 146405. (e) Cano-Cortés, L.; Dolfen, A.; Merino, J.; Behler, J.; Delley, B.; Reuter, K.; Koch, E. Eur. Phys. J. B 2007, 56, 173–176.
79. For an accessible and highly outspoken discussion of these ideas, see Laughlin, R. B.; Pines, D. Proc. Natl. Acad. Sci. USA 2000, 97, 28–31; Laughlin, R. B. A Different Universe, Basic Books, New York, 2005.
80. Anderson, P. W. Science 1972, 177, 393–396.
81. Powell, B. J. Chem. Aust. 2009, 76, 18–21.

PART D Advanced Applications

11

SIESTA: Properties and Applications

MICHAEL J. FORD
School of Physics and Advanced Materials, University of Technology, Sydney, NSW, Australia

SIESTA provides access to the usual set of properties common to most DFT implementations:

• Total energy, charge densities, and potentials
• Atomic forces and unit cell stresses
• Geometry specification in Cartesian and/or internal z-matrix coordinates
• Geometry optimization using the conjugate gradient, modified Broyden and FIRE algorithms, and simulated annealing
• Total and partial densities of states
• Band dispersions
• Constant energy, temperature, or pressure molecular dynamics
• Simulation of scanning tunneling microscope images according to the Tersoff–Hamann approximation
• Electron transport properties using the nonequilibrium Green’s function approach
• Optical properties and the frequency-dependent dielectric function within the random phase approximation and using first-order time-dependent perturbation theory
• Phonon spectrum and vibrational frequencies
• Mulliken population analysis
• Born charges

In this chapter a number of these properties are discussed through examples relevant to nanoscience and technology. The SIESTA methodology is described in detail in Chapter 2; the present chapter is intended as an accompaniment. The first three examples illustrate the general capabilities of the SIESTA code for problems containing relatively small numbers of atoms and that are amenable to standard diagonalization to solve the self-consistent problem. The last example illustrates the divide-and-conquer linear-scaling capabilities to tackle problems containing large numbers of atoms.

11.1 ETHYNYLBENZENE ADSORPTION ON AU(111)

There has been considerable interest for some time in self-assembled monolayers (SAMs) in nanotechnology. They are relatively easy to prepare on a variety of surfaces, gold being the most common, with a wide range of molecules forming ordered molecular layers.1–3 They are a useful platform for controlling surface properties and providing functionality with applications in, for example, molecular electronics.4,5 The alkynyl group as a method of anchoring SAMs to gold surfaces is a promising candidate to study. It should provide an unbroken conjugated pathway to the gold surface, unlike thiol linkers, and a wide range of terminal alkynes can be synthesized.6 Ethynylbenzene is a simple representative example of this class of molecule; there is some experimental evidence that it binds to gold surfaces and nanoparticles, although these studies are inconclusive about the nature of the bond.7,8 The calculations described below attempt to answer the question of whether this molecule is likely to form SAMs and the likely adsorption geometries and energetics.9,10

The computational conditions first have to be established and an appropriate representation of the semi-infinite surface in terms of a multilayer slab needs to be determined. The slab needs to contain sufficient layers that the center of the slab is relatively bulklike, or in this particular case so that a molecule adsorbed on one side of the slab is not influenced by the other surface. Conversely, the slab should not be so big that the calculations become prohibitively large. Figure 11.1 shows the convergence of surface charge density above the slab layer and convergence of the workfunction for an Au(111) slab as a function of the number of layers. Convergence of the workfunction with two computational parameters, the reciprocal space grid (k-grid) and the orbital confinement (energy shift), is also shown in Fig. 11.1B.
The workfunction is calculated as the difference between the electrostatic potential in vacuum (i.e., at a position in the unit cell far above the surface) and the Fermi level. The charge density and density difference are extracted from the density matrix (saved to file at each SCF step) using the DENCHAR utility at the points of a user-specified plane, or volume. Charge densities can then be visualized using standard plotting packages. Alternatively, the charge densities and potentials evaluated over the real space grid used to represent the density matrix can be written to file directly from SIESTA by setting the appropriate input flags. These are written unformatted and need to be processed for plotting. The GRID2CUBE utility will generate formatted output from these files in the format of a GAUSSIAN cube file.

Fig. 11.1 (A) Charge density 1 Å above the Au(111) slab surface. Values are maximum and the RMS difference is with respect to a 13-layer slab. (B) Convergence of workfunction with number of slab layers, energy shift parameter (mRy), and k-point grid. [From Ref. 13 and R. C. Hoft, N. Armstrong, M. J. Ford, and M. B. Cortie, J. Phys. Condens. Matter, 19 215206 (2007), with permission. Copyright © IOP Publishing.]

The calculations in Fig. 11.1 are for a 1 × 1 unit cell in the plane of the surface, that is, one atom per layer. The equivalent of a double-zeta plus polarization
(DZP) basis set is used. A generalized-gradient approximation to the exchange-correlation functional according to Perdew–Burke–Ernzerhof (GGA-PBE)11 and a real-space integration grid with a 300-Ry cutoff are employed (1 Ry = 0.5 atomic unit of energy = ca. 13.6 eV). It is often advisable to use a fine real-space grid to avoid numerical errors; the time penalty for such a grid is not generally a limiting factor. A cutoff of 300 Ry is well converged. A Troullier–Martins pseudopotential12 with scalar relativistic corrections is used to represent the core Au electrons, with a valence of 5d¹⁰6s¹. Cutoff radii for each of the angular momentum channels of the pseudopotential are 2.32 Å for s and p, 1.48 Å for d and f. The quality of these pseudopotentials has been checked in the usual way by comparing against all-electron calculations for the atom; they reproduce well the bulk properties of gold (lattice parameter, cohesive energy, and bulk modulus).13 It is interesting to note that values for the total and cohesive energies of bulk gold do not vary much between a single-zeta plus polarization (SZP) and a DZP basis set, while DZ is considerably worse. Where computational cost is a limiting factor, an SZP basis may be acceptable, although for adsorption energies DZP is probably necessary.

The Au(111) surface is unusual in that it reconstructs to form a √3 × 22 structure with a period of about 63 Å,14 although there is evidence that this reconstruction is lifted in the presence of adsorbed molecules.14,15 More recently, experimental measurements and calculations suggest that thiolate adsorption drives an alternative gold adatom structure and that these adatoms are an integral part of the adsorption motif.16–18 A detailed analysis of these points is beyond the scope of the present chapter, where we are more interested in demonstrating the utility of the SIESTA methodology. Accordingly, a bulk terminated Au(111) surface is assumed.
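The workfunction definition used for Fig. 11.1 (vacuum electrostatic potential minus Fermi level) can be illustrated with a minimal sketch. It assumes that the potential has already been averaged over planes parallel to the surface and that both quantities share the same energy reference; the function name, its arguments, and the synthetic step-like profile are hypothetical and are not part of SIESTA or its utilities.

```python
def workfunction(z, v_planar, e_fermi, vacuum_window):
    """Return V_vac - E_F (eV).

    z             : positions along the surface normal
    v_planar      : plane-averaged electrostatic potential at each z (eV)
    e_fermi       : Fermi level (eV), same energy reference as v_planar
    vacuum_window : (z_min, z_max), a region assumed to be far from the slab
    """
    z_min, z_max = vacuum_window
    samples = [v for zi, v in zip(z, v_planar) if z_min <= zi <= z_max]
    if not samples:
        raise ValueError("vacuum window contains no grid points")
    v_vac = sum(samples) / len(samples)  # plateau value of the potential
    return v_vac - e_fermi


# Synthetic profile for illustration only: 'slab' below z = 10, vacuum above.
z_grid = [i / 10 for i in range(301)]
potential = [-8.0 if zi < 10.0 else 1.0 for zi in z_grid]
phi = workfunction(z_grid, potential, e_fermi=-4.0, vacuum_window=(20.0, 25.0))
print(f"workfunction = {phi:.2f} eV")
```

In practice the vacuum window should be checked for a flat plateau; a sloping potential there would signal insufficient vacuum or an uncompensated dipole.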
Temperature smearing of the electron occupation is employed in these calculations to assist convergence of the SCF steps. Both the standard Fermi–Dirac function and the function proposed by Methfessel and Paxton19 are implemented in SIESTA. In this case it is the free energy F(T) that is minimized during self-consistency. The total energy in the athermal limit is then approximated by the expression

$$E_{\mathrm{tot}}(T = 0) = \tfrac{1}{2}\left[E_{\mathrm{tot}}(T) + F(T)\right] \qquad (11.1)$$
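As a small worked illustration of Eq. (11.1), the sketch below averages the smeared total energy and the free energy. The function name and the numerical values are hypothetical and chosen only to show the arithmetic; they are not results from the calculations in this chapter.

```python
def athermal_energy(e_total_t, free_energy_t):
    """Estimate E_tot(T = 0) from Eq. (11.1): the mean of E_tot(T) and F(T)."""
    return 0.5 * (e_total_t + free_energy_t)


# Hypothetical values (eV) for illustration only.
e_tot = -100.000   # total energy at the smearing temperature
f_tot = -100.010   # free energy at the same temperature
e_zero = athermal_energy(e_tot, f_tot)
print(f"E_tot(T=0) is approximately {e_zero:.3f} eV")
```

With these inputs the estimate lies halfway between the two energies, at about -100.005 eV.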

The degree of smearing is determined by specifying a fictitious temperature to the electron distribution; in this case, a temperature corresponding to 25 meV is used.

Charge density close to the slab surface has converged by four layers and thereafter oscillates slightly. The charge density should be a reasonable indicator of how the adsorption properties will converge. The workfunction is less sensitive to the number of slab layers and the k-grid. Again four layers and a 15 × 15 k-grid are reasonably converged. Only one k-point is required perpendicular to the surface because there is no periodicity in this direction. The workfunction is very sensitive to the energy shift, with values as small as 0.1 mRy required for good
convergence. This level is impractical for realistic surface adsorption calculations, as it is extremely time intensive. It is worth noting that the converged value of the workfunction calculated here is 5.13 eV, compared with an experimental value of 5.31 eV.20 The conclusion from the data in Fig. 11.1 is that a four-layer slab is the minimum for obtaining reasonably converged results. Calculations of ethynylbenzene adsorption support this conclusion: the binding energy is converged to within about 0.05 eV for four layers and is essentially fully converged at seven layers.

Two additional factors need to be considered when assessing adsorption calculations: basis set superposition errors (BSSE) and dipole corrections. BSSE is inherent in the use of atom-centered basis sets. The binding energy, $E_B$, is determined from calculations of the total energies of slab + adsorbate, $E_T$, slab alone, $E_S$, and adsorbate alone, $E_A$, according to

$$E_B = E_T - (E_S + E_A) \qquad (11.2)$$

The numbers of basis functions used to describe the two fragments, slab and adsorbate, are smaller than for the total system, leading to fewer variational degrees of freedom and hence overestimates of the total energies. Although this error is small for the total energies, it can amount to about 10% of the binding energy calculated from the difference of total energies according to Eq. (11.2). Here the established method of counterpoise correction is used to remove this effect.21 The same set of basis functions is used in the two fragment calculations, with zero charge assigned to those basis functions associated with the missing atoms, a procedure commonly referred to as ghosting. This is implemented in SIESTA by assigning the corresponding negative atomic number to ghosted atoms. The efficacy of counterpoise corrections has been debated in the literature and demonstrated to “correct” the binding energy in the wrong direction in certain circumstances22; it is, however, a well-established and widely used technique.

Dipole corrections are an artifact of periodic boundary conditions and arise in situations where an asymmetric geometry is used.23 Periodicity perpendicular to the slab surface imposes the condition that the potential must be identical at the cell boundary above and below the slab. However, if the slab is asymmetric, as is the case where adsorption occurs on only one slab surface, physically the potential is not identical and approaches different asymptotic values above and below. This leads to the presence of an additional unphysical potential that can distort optimized geometries and binding energies. One solution to this problem is to introduce a fictitious dipole charge layer in the vacuum portion of the unit cell parallel to the slab surface that can be included in the self-consistent field. This is not implemented in SIESTA. The problem can obviously be avoided by always using symmetric geometries, at the expense of requiring more atoms.
In the present application this dipole layer is neglected, having little effect on optimized geometries and contributing less than 1% to binding energies. For more polar bonds between surface and adsorbate, one might expect the situation to be considerably worse.
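Returning to the counterpoise correction discussed above, the bookkeeping behind Eq. (11.2) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the energies are invented numbers chosen so that the BSSE is about 10% of the uncorrected binding energy, the magnitude quoted in the text. The "ghost" energies are assumed to come from fragment calculations run in the full slab + adsorbate basis.

```python
def binding_energy(e_total, e_slab, e_adsorbate):
    """Eq. (11.2): E_B = E_T - (E_S + E_A), each fragment in its own basis."""
    return e_total - (e_slab + e_adsorbate)


def counterpoise_binding_energy(e_total, e_slab_ghosted, e_adsorbate_ghosted):
    """Same expression, with fragment energies computed in the full basis.

    e_slab_ghosted      : slab energy with ghost basis functions on adsorbate sites
    e_adsorbate_ghosted : adsorbate energy with ghost basis functions on slab sites
    """
    return e_total - (e_slab_ghosted + e_adsorbate_ghosted)


# Hypothetical total energies (eV), for illustration only.
e_t = -1000.0
raw = binding_energy(e_t, e_slab=-900.0, e_adsorbate=-97.0)
corrected = counterpoise_binding_energy(
    e_t, e_slab_ghosted=-900.1, e_adsorbate_ghosted=-97.2
)
bsse = raw - corrected  # negative here: the uncorrected value overbinds
print(f"raw = {raw:.2f} eV, corrected = {corrected:.2f} eV, BSSE = {bsse:.2f} eV")
```

Because the ghosted fragments have more variational freedom, their energies are lower, and the corrected binding energy is weaker than the raw one.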

Figure 11.2 shows the convergence of binding energy for ethynylbenzene on Au(111) against the number of k-points and energy shift. An energy shift of 5 mRy and 15 k-points gives well-converged values with binding energies reliable to better than 0.05 eV. The number of k-points corresponds to a 5 × 5 grid giving 15 symmetry unique points. SIESTA uses inversion symmetry in the reciprocal cell to generate the k-grid. Fewer k-points (by a factor of 3) are needed here compared with the previous analysis because the unit cell is now a 3 × 3 supercell in order to accommodate the adsorbate and reduce interactions between periodic images. The use of strictly localized orbitals is an advantage in this regard because multipole interactions between periodic images of the molecule tend to zero quite rapidly with increasing unit cell size. The interaction here is essentially zero.

Fig. 11.2 Convergence of binding energy with (A) the number of k-points and (B) the energy shift. Binding energies are relative to the value at the largest k-point grid and smallest energy shift.

The likely adsorption motifs for ethynylbenzene on the gold surface are shown in Fig. 11.3. For the ethynylbenzene radical (Fig. 11.3A) the terminal C—H bond has been cleaved and the H atom removed. One might expect this to be the
most promising candidate for SAM formation. Two additional configurations are also possible, one where a 1,2 hydrogen shift has occurred to give vinylidene (Fig. 11.3B) and a second where the C—C triple bond opens up to give the flat configuration (Fig. 11.3C). The latter two configurations are potential intermediates to the final state of the strongly bound radical by removal of the hydrogen atom. Reactions of metals with ethynylbenzene are known to proceed via a 1,2 hydrogen shift to form metal vinylidenes.24

Fig. 11.3 Potential configurations of surface-bound ethynylbenzene molecule: (A) ethynylbenzene radical with terminal H atom removed; (B) vinylidene; (C) flat configuration. (From Ref. 10.)

The likely adsorption sites are first identified by scanning the adsorbate across the surface with the adsorbate geometry held rigid. This involves a large number of single-point energy calculations and is therefore carried out at a low computational level. Once the potential energy surface has been mapped out roughly in this way, full geometry optimizations are carried out at a higher level using a four-layer slab, a 3 × 3 × 1 k-grid, and a 5-mRy energy shift. Both adsorbate and the first layer of Au surface atoms are optimized to 0.04 eV/Å. Although this is a relatively weak force tolerance, binding energies do not change appreciably when the tolerance is improved to 0.01 eV/Å. Final binding energies are calculated using optimum geometries from the previous step, calculated at a higher level (seven slab layers, 5 × 5 × 1 k-grid) and are converged to better than 0.05 eV. Further relaxation at the final step is not necessary, as it does not affect the binding energies or geometries appreciably. Table 11.1 gives the final binding energies and adsorption sites for the three motifs in Fig. 11.3. All three motifs form strong covalent bonds to the surface, in contrast to thiol molecules where the interaction is weaker if the terminal hydrogen is not removed.
Mulliken overlap populations give an indication of the character of the bond, and for both the ethynylbenzene radical and vinylidene there is considerable overlap (greater than 0.12) between three of the surface Au atoms and the nearest C atom. Adsorption heights, optimum adsorption sites, and binding energies are also nearly the same for these two motifs, suggesting they both interact with the surface in a similar manner. The flat geometry is bound through two C atoms, each forming a single bond with a surface Au atom. Again, Mulliken overlap populations suggest a covalent bond. Overall energies in going from the gas-phase molecule in its relaxed geometry to the surface-bound species are exothermic for vinylidene and energy neutral for the flat geometry. The latter value is below the reliability of the calculations.

TABLE 11.1 Binding Energies and Adsorption Sites

                                Energy (eV)
Motif             Site      Binding     Overall(a)
Vinylidene        fcc       −2.45       −0.24
Flat geometry     atop      −1.84        0.03
Ethynylbenzene    fcc       −2.99        2.54

(a) Overall energies are energies of the surface-bound species relative to the relaxed, isolated molecule and slab.

This is despite a relatively large geometry change upon adsorption. These two configurations are therefore likely intermediates to the formation of a SAM. Indeed, previous surface-enhanced Raman (SERS) experiments suggest the possibility that ethynylbenzene can adsorb onto a gold surface in the flat geometry.7 For ethynylbenzene, C—H bond cleavage is calculated for the gas-phase molecule and leads to a very endothermic overall energy upon adsorption.

Reaction energies for formation of a SAM can be estimated from the calculations described above.

$$\mathrm{C_6H_5C_2H} + \mathrm{Au}_n \rightarrow \mathrm{C_6H_5C_2{-}Au}_n + \tfrac{1}{2}\,\mathrm{H_2}$$

$$\mathrm{C_6H_5C_2H} + \mathrm{Au}_n \rightarrow [\mathrm{C_6H_5C_2{-}Au}_n]^{-} + \mathrm{H^{+}}$$

As well as C—H bond cleavage (first reaction), deprotonation (second reaction) also needs to be considered. Either of the two reactions can proceed directly or through the vinylidene or flat intermediates. Thus, calculating reaction energies for all three pathways gives a check on the reliability of the estimates since they should all give the same value. The first reaction is slightly endothermic, with an energy of about 0.5 eV; the range of values for the three pathways is 0.4 eV. Using a value for the proton solvation energy of 11.4 eV25 gives a more endothermic reaction in the second case, with a value of 1.7 eV, but with more consistent values for the three pathways varying only by 0.1 eV.

These calculations demonstrate that the ethynylbenzene moiety is indeed a promising alternative to thiols for formation of SAMs on Au(111). It is strongly bound to the surface, yet has a small diffusion barrier, less than 0.2 eV,9 between hollows, a site that will allow ordering of the molecules. This linkage scheme may be more oxidatively stable than sulfur, and preparation of monolayers with double-ended molecules should be possible without the problem of forming multilayers.
The vinylidene intermediate is a candidate pathway, although from these calculations it is difficult to determine whether subsequent C—H bond cleavage or deprotonation will lead to the surface-bound radical. The latter is known to be the case in the synthesis of metal complexes of ethynylbenzene.24

11.2 DIMERIZATION OF THIOLS ON AU(111)

This example serves to illustrate the advantage of internal coordinates in surface adsorption studies. Geometries can be specified in the z-matrix format in SIESTA,26 where one atom is specified in Cartesian coordinates and the remaining molecule is specified in terms of bond lengths, bond angles, and torsion angles relative to this atom. The objective in this example is to map out the potential energy surface (PES) for adsorption of methanethiolate and benzenethiolate on the Au(111) surface in detail and to estimate the dissociation barrier of the dimer, dimethyldisulfide, on this surface.27 Previous computational studies have already reported the energetics28,29 of dimerization, but not the dynamics. They find that
dissociation of the surface-bound disulfide is favored, although agreement with available experimental data is limited. Even for these relatively simple molecules there are sufficient degrees of freedom that mapping out the complete PES is not trivial. Generally, PES maps have been limited to a small subset of degrees of freedom and have been created by scanning rigid molecules across the surface.30,31 Using internal coordinates to describe the molecule, it is possible to perform constrained optimizations at each point on the PES and hence map this surface more completely. Figure 11.4A shows the two thiolate molecules calculated here; note that the terminal hydrogen has been removed, and as a consequence, the sulfur is strongly chemisorbed to the surface. It has been pointed out in the literature that the term thiolate is misleading, as it implies an ionic bond to the surface, whereas it is actually closer to a covalently bound “thiyl.”31 Here we use the nomenclature prevalent in the literature. Mixed coordinates are used, with a z-matrix to specify the adsorbate and Cartesian coordinates for the Au slab. For each adsorbate the PES is mapped along the atop–bridge–atop path shown in Fig. 11.4B. At each step in the PES a constrained optimization is performed with the position of the sulfur atom fixed relative to the Au surface while its height above the surface is allowed to vary. The rest of the molecule and the surface layer of Au atoms are fully relaxed. Mapping the PES in this much detail using Cartesian coordinates is not practicable.

Fig. 11.4 (A) Adsorption of benzenethiolate (left) and methanethiolate onto the Au(111) surface; (B) path for the PES scan relative to surface Au atoms. Second and third layers of gold atoms are depicted by successively smaller spheres. (From Ref. 27.)

It is also possible to decouple optimization of the bond lengths and bond angles with the z-matrix approach and to specify different force tolerances for each. This is particularly advantageous where the PES is very flat in one coordinate compared to the other. This is the case for many molecular adsorption problems, where the PES is quite flat with respect to tilting of the molecular axis relative to the surface. With Cartesian coordinates it can be difficult to find the minimum of such a surface. Provided that there is little or no coupling between coordinates, of the kind that occurs in cyclic molecules, internal coordinates can also lead to efficiency gains in the optimization process, as they lead to better preconditioning of the optimization algorithm. Table 11.2 compares geometry optimizations using z-matrix and Cartesian coordinates within the SIESTA code for some simple molecules.26 The conjugate gradient algorithm is used in all cases, with the optimization being performed to three levels of force convergence and with different numbers of degrees of freedom. In the z-matrix optimization for N atoms, an unconstrained optimization can be achieved with 3N − 6 variables whereas 3N − 3 are required for Cartesian coordinates. This is because in addition to fixing the coordinates of one atom (the reference atom), in the z-matrix approach it is also possible to fix the three rotational degrees of freedom for the entire molecule.
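The variable counts quoted above, and the idea of building a molecule from internal coordinates, can be illustrated with a short sketch. It is not SIESTA's z-matrix input format: the function names, the choice of placing the oxygen atom at the origin, and the water geometry (0.96 Å, 104.5°) are assumptions made for illustration. For water, the three internal variables are the two O–H bond lengths and the H–O–H angle, consistent with 3N − 6 = 3.

```python
import math


def n_variables(n_atoms, coordinates):
    """Variables for an unconstrained optimization of a nonlinear molecule."""
    if coordinates == "zmatrix":
        return 3 * n_atoms - 6   # overall translation and rotation removed
    if coordinates == "cartesian":
        return 3 * n_atoms - 3   # only the reference atom position is fixed
    raise ValueError("coordinates must be 'zmatrix' or 'cartesian'")


def water_from_internal(r_oh1, r_oh2, angle_deg):
    """Cartesian positions (O, H1, H2) from two bond lengths and one angle."""
    theta = math.radians(angle_deg)
    oxygen = (0.0, 0.0, 0.0)                  # reference atom at the origin
    h1 = (0.0, 0.0, r_oh1)                    # first bond along z
    h2 = (r_oh2 * math.sin(theta), 0.0, r_oh2 * math.cos(theta))  # in xz-plane
    return [oxygen, h1, h2]


for name, n in [("water", 3), ("benzene", 12), ("hexanedithiol", 22)]:
    print(name, n_variables(n, "zmatrix"), n_variables(n, "cartesian"))

atoms = water_from_internal(0.96, 0.96, 104.5)
hh_distance = math.dist(atoms[1], atoms[2])
print(f"H-H distance = {hh_distance:.3f} Angstrom")
```

For equal bond lengths r, the H–H distance should equal 2r sin(θ/2), which gives a quick check that the angle has been applied correctly.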
The z-matrix approach performs better for both the simple water molecule and the acyclic hexanedithiol molecule. In the latter case, fixing either three or six degrees of freedom reduces the number of CG steps for z-matrix optimization very considerably. Conversely, fixing degrees of freedom in Cartesian coordinates increases the number of steps. This is because the method used (there is no Hessian matrix) is not sensitive to the translational invariance. For the cyclic benzene molecule, Cartesian coordinates improve optimization because internal coordinates are coupled to each other. The same final geometries are obtained irrespective of the coordinates used and number of degrees of freedom in the optimization. The computational conditions used here are essentially the same as those used for the geometry optimizations of ethynylbenzene described above. The force tolerances are set to 0.04 eV/Å for bond lengths and 0.0009 eV/deg for angles. Optimizations are performed using the conjugate gradient (CG) method. The forces are calculated by direct differentiation of the energy and are generated in the same section of code within SIESTA. The CG method is a variant of steepest descent but avoids its pitfall of successive steps being perpendicular to each other. Instead, they are constructed to be conjugate to the previous gradient and as far as possible from all previous steps. In this method it is only necessary to store information from the last CG step rather than building up the full


SIESTA: PROPERTIES AND APPLICATIONS

TABLE 11.2 Number of Conjugate Gradient Steps Required to Optimize the Geometry of Three Molecules in Z-Matrix and Cartesian Coordinates

                                                               No. of CG Steps (a)
Molecule        No. of Atoms   Coordinates   No. of Variables     I      II     III
Water                 3        Cartesian             6           15      15      15
                                                     9           35      37      40
                               z-matrix              2            6       8       8
                                                     3            3       6       9
                                                     6            3       6       9
                                                     9            4      19      21
Benzene              12        Cartesian            33           25      33      36
                                                    36            7       9       9
                               z-matrix              2            7      11      18
                                                    11           12      14      20
                                                    30           47      57      69
                                                    33           45      58      63
                                                    36           44      55      66
Hexanedithiol        22        Cartesian            63           76     108     171
                                                    66           44      46      81
                               z-matrix             60           20      33      44
                                                    63           24      39     115
                                                    66           32     397

Source: Ref. 26.
(a) Columns I, II, and III represent progressively stricter convergence criteria for lengths and angles: namely, I, (0.04 eV/Å, 0.0009 eV/deg); II, (0.02 eV/Å, 0.0004 eV/deg); and III, (0.01 eV/Å, 0.0002 eV/deg). For the Cartesian coordinate optimizations the angle tolerance is to be ignored.

Hessian matrix for the entire optimization. SIESTA writes the previous step to disk at every CG step, allowing for easy restarts of optimizations. In principle, for M nuclei, the CG method should converge in fewer than 3M steps. However, due to numerical errors and the fact that the potential energy surface does not necessarily have the assumed quadratic form, more steps are often required. Both Fletcher–Reeves and Polak–Ribière CG algorithms are implemented in SIESTA, although the latter is the default and preferred option, as it reportedly performs better where the minimum is not quadratic (details of the implementations are given elsewhere32). The modified Broyden33 method is also available in SIESTA. In principle, the modified Broyden method, a quasi-Newton–Raphson method, would be extremely efficient if the Jacobian were known and could easily be inverted. However, this is not the case in practice; rather, the Jacobian is updated over successive steps. It is also possible to find optimum geometries using molecular dynamics (MD), and SIESTA has implemented both simulated annealing, where the temperature of the MD simulation is gradually reduced to a target temperature, and quenching, where the velocity components of the nuclei are set to zero if they are opposite the corresponding force. Although relatively easy to implement, these MD-based schemes are often not competitive compared


with the sophisticated line search–based algorithms mentioned previously. More recently, FIRE34 (fast inertial relaxation engine), a new MD-based optimization method, has been reported that is competitive and can be used easily for systems containing millions of degrees of freedom.

The PESs for the two monomers are shown in Fig. 11.5. It is interesting to note that with the current z-matrix constrained optimization, the hexagonal close-packed (hcp) and face-centered cubic (fcc) hollow adsorption sites are local maxima for both PESs. By contrast, a Cartesian coordinate–based scan yields local minima at these two sites, as found in previous studies.28 Bilić et al.31 also find the hollow sites to be saddle points for two-layer slab calculations, but minima for a four-layer calculation. There is also no barrier to diffusion at the bridge site, in contrast to some previous calculations where the PES is mapped by scanning a rigid molecule.28,30 The PES in this region is sensitive to the tilt angle of the molecule and also its orientation. The minimum on both sides of the bridge site is with the tail group tilted back over the bridge (i.e., as the bridge is traversed from one side to the other, the tail of the molecule swings around rather than remaining fixed in orientation). Adsorption energies of 1.85 and 1.43 eV are calculated for the optimum sites for methanethiolate and benzenethiolate, respectively. These values are in good agreement with previous calculations.28,31

Optimum geometries for adsorption of the dimers are shown in Fig. 11.6; here the entire dimer and surface layer are relaxed. The SIESTA implementation of z-matrix coordinates is particularly convenient for this example. Multiple z-matrix blocks can be defined, making it possible to have separate sets of internal

Fig. 11.5 PES for methanethiolate and benzenethiolate along the atop–bridge–atop path on the Au(111) surface. [Plot of relative energy (eV) against coordinate relative to the bridge site (Å), with the atop, fcc, hcp, and bridge sites marked.]


Fig. 11.6 Relaxed geometries for the two thiol dimers diphenyldisulfide (left) and dimethyldisulfide (right). Two different perspectives for each are shown in (A) and (B). (From Ref. 27.)

coordinates centered around each S atom. Adsorption occurs through the sulfur atoms, with each S atom in the dimer adsorbed near the atop site and displaced slightly toward the bridge site. The two S atoms are at similar heights above the surface. Previous studies using Cartesian coordinates find a different optimum geometry with S atoms nearer the bridge sites and at different heights above the surface.28,35,36 If the calculations here are repeated using Cartesian coordinates, this previously reported minimum appears to become a local minimum. This result further illustrates the robustness of internal coordinate descriptions for molecular adsorption. Both dimers are energetically unfavorable on the surface relative to two isolated monomers, by 0.41 and 0.62 eV for dimethyldisulfide and diphenyldisulfide, respectively. This is despite the fact that geometry optimizations find a local minimum and do not dissociate the dimer. This would suggest that there is an activation barrier to dissociation. To explore this point the PES for dissociation of dimethyldisulfide was mapped and is shown in Fig. 11.7A. One S atom is fixed at its optimum site while the other is scanned over the surface with a constrained optimization of the molecule performed at each point. The PES in Fig. 11.7A


Fig. 11.7 (A) Spin-restricted PES for dissociation of dimethyldisulfide. Contours are in 0.05-eV intervals relative to the energy minimum; positions of surface atop and bridge sites are shown; one S atom is fixed at x = 1.05 and y = 2.27 Å. (B) Spin-unrestricted PES along the dissociation path shown in (A). Units of spin are number of electrons. (From Ref. 27.)


was mapped using spin-restricted calculations for computational efficiency. This will give a reasonable idea of the PES shape and help identify the dissociation path. A spin-unrestricted scan along this path is then performed, with the results shown in Fig. 11.7B. As expected, DFT does not describe the region where the bond is dissociating very well; Fig. 11.7B shows that there is significant spin contamination around the saddle point. Away from this point, where the spin is zero, the DFT energies are presumably quite reliable and allow us to estimate the height of the dissociation barrier to lie between 0.3 and 0.35 eV. The barrier for formation of the dimer from two surface-bound isolated monomers is estimated to lie between 0.71 and 0.76 eV.
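The formation barrier quoted above follows from the dissociation barrier and the relative stability of the adsorbed dimer. The short sketch below, with hypothetical variable names, simply makes that arithmetic explicit: because the dimethyldisulfide dimer lies 0.41 eV above two isolated monomers, the reverse (formation) barrier is the forward (dissociation) barrier plus 0.41 eV.

```python
# Values taken from the text; the variable names are illustrative only.
dimer_above_monomers_ev = 0.41           # adsorbed dimer relative to two adsorbed monomers
dissociation_barrier_ev = (0.30, 0.35)   # estimated range for dimer -> two monomers

# The formation barrier is measured from the lower-lying monomer state,
# so it exceeds the dissociation barrier by the energy difference.
formation_barrier_ev = tuple(
    round(barrier + dimer_above_monomers_ev, 2)
    for barrier in dissociation_barrier_ev
)
print(formation_barrier_ev)
```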

11.3 MOLECULAR DYNAMICS OF NANOPARTICLES

So far, only ground-state properties at 0 K have been discussed. Molecular dynamics (MD) is the standard method of introducing the motion of the atomic nuclei into the problem and hence simulating various temperature-dependent properties, such as phonon spectra or melting behavior. The MD capabilities implemented in SIESTA will be illustrated in this section, where the melting behavior of the 20-atom gold cluster is examined.37,38 This particular size cluster is interesting because its optimum geometry is an ordered tetrahedral pyramid and is isolated by about 1 eV from its nearest-lying isomer, at least as determined in 0 K DFT calculations.39–42 There is experimental evidence that this structure is indeed the optimum.43 The standard Verlet algorithm44,45 is implemented in SIESTA to propagate the MD trajectory in time. A detailed description of this algorithm and other established components of MD is given in many textbooks.46 Here the initial velocities are chosen from the Maxwell–Boltzmann distribution corresponding to a specified temperature. The total energy of the system is then kept constant throughout the trajectory: the microcanonical ensemble. Motion of the center of mass of the system is frozen out initially, although rotational motion currently is not. Nonperiodic systems such as clusters and molecules can pick up slight center-of-mass kinetic energy over a long trajectory due to numerical errors. Rotational motion is generally very small to start but can become appreciable over a long trajectory. Specifying a fine integration grid can help prevent these problems. In this example, thermal behavior in the canonical ensemble is calculated using the Nosé–Hoover47,48 thermostat to maintain constant temperature. Briefly, in this method the system is connected to a heat bath that can transfer energy into or out of the system to attempt to maintain constant temperature.
The heat bath is realized by coupling a fictitious degree of freedom to the system. The degree of coupling is determined by the Nosé mass, which controls quite sensitively the dynamics of the simulation. Constant-pressure simulations are also implemented in SIESTA using the Parrinello–Rahman method,49–51 where again an effective mass must be set in order to carefully thermostat the trajectory


correctly. Constant-temperature and constant-pressure methods can be combined into a single simulation. The critical parameter to optimize is the time step; this must be small enough to capture the atomic motion but not so small that only short total times can be sampled. The MD time step is traditionally determined according to the following rule of thumb:

\[ dt = \frac{1}{10}\,\frac{1}{c\,\omega_{\max}} \tag{11.3} \]

where c is the speed of light and ωmax is the highest vibrational frequency (expressed as a wavenumber). The vibrational frequencies are determined by calculating the force matrix in SIESTA and then finding the eigenvalues of this matrix using the VIBRA utility supplied with SIESTA. The energy-shift parameter needs to be set to a small value, typically smaller than 5 mRy, to avoid negative frequencies for the optimized structure. For the present 20-atom gold cluster the maximum frequency is 221 cm−1, corresponding to a time step of 15 fs.52 The time step can be analyzed more rigorously by monitoring the conservation of total energy of the extended system (i.e., the 20-atom cluster plus the Nosé thermostat). In the present example time steps up to about 3 fs conserve this total energy well during the MD trajectory, but significant variations occur above this value. The time step is set to 2.5 fs for all the simulations presented here.

A large value of the Nosé mass results in low coupling to the reservoir and leads to large temperature fluctuations and relatively constant total energy; thermostating is ineffective in this case. A low value, on the other hand, restrains the temperature oscillations and can lead to poor equilibration and overdamping of the dynamics. One way to assess the appropriate Nosé mass value is to observe temperature fluctuations over a number of MD steps and decide on a suitable level of temperature fluctuation. Alternatively, the statistical convergence of the trajectory can be examined, where the average values of the temperature (or equivalently, the kinetic energy of the ions) and higher moments of these quantities are observed. While the average is a good indicator that the ensemble is converging to the correct temperature, higher moments are a more sensitive indicator of the temperature fluctuations and statistical quality.53 The average kinetic energy of the ions ⟨KE_ion⟩ and the second moment ⟨(KE_ion − ⟨KE_ion⟩)²⟩ are shown in Fig. 11.8 for a thermostat temperature of 900 K over 45,000 MD steps (112.5 ps) and a Nosé mass of 50 Ry·fs². The energy shift is set to 20 mRy, the real-space grid cutoff is 100 Ry, and the LDA exchange-correlation functional is used. Both quantities converge reasonably well over the entire trajectory but require about 10,000 steps to equilibrate. The average kinetic energy and its second moment converge to values corresponding to temperatures of 900 and 821 K, respectively.53 The second moment gives a slightly different ensemble average temperature because it is more sensitive to temperature fluctuations. Higher moments can be calculated to give an indication of statistical quality. These results indicate that the current number of MD steps is sufficient to provide a good statistical ensemble and that
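The rule of thumb in Eq. (11.3) and the energy-conservation check described above can be tried on a toy problem. The sketch below is not SIESTA code: it assumes a single unit-mass harmonic oscillator at the quoted maximum frequency of 221 cm−1, integrates it in the microcanonical ensemble with the velocity form of the Verlet algorithm (a common variant, chosen here for convenience), and reports the largest relative deviation of the total energy for two time steps. All function names are hypothetical.

```python
import math

C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

def rule_of_thumb_timestep_fs(wavenumber_cm1):
    """Eq. (11.3): one tenth of the period of the fastest vibration, in fs."""
    period_s = 1.0 / (C_CM_PER_S * wavenumber_cm1)
    return 0.1 * period_s * 1e15

def max_energy_deviation(wavenumber_cm1, dt_fs, total_fs=5000.0):
    """Largest relative total-energy error for a unit-mass harmonic oscillator."""
    omega = 2.0 * math.pi * C_CM_PER_S * wavenumber_cm1 * 1e-15  # rad/fs
    x, v = 1.0, 0.0                       # start at maximum displacement, at rest
    a = -omega ** 2 * x
    e0 = 0.5 * v * v + 0.5 * omega ** 2 * x * x
    worst = 0.0
    for _ in range(int(total_fs / dt_fs)):
        v += 0.5 * dt_fs * a              # half-step velocity update
        x += dt_fs * v                    # full-step position update
        a = -omega ** 2 * x               # force at the new position
        v += 0.5 * dt_fs * a              # second half-step velocity update
        e = 0.5 * v * v + 0.5 * omega ** 2 * x * x
        worst = max(worst, abs(e - e0) / e0)
    return worst

dt_rule = rule_of_thumb_timestep_fs(221.0)
print(f"rule-of-thumb time step: {dt_rule:.1f} fs")
for dt in (2.5, dt_rule):
    dev = max_energy_deviation(221.0, dt)
    print(f"dt = {dt:5.1f} fs -> max relative energy deviation {dev:.2e}")
```

In this toy model the smaller time step conserves energy markedly better than the rule-of-thumb value, which is qualitatively in line with the choice of 2.5 fs reported in the text; the real cluster, with its thermostat and anharmonic forces, will of course behave differently in detail.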


300 cm−1), the contribution of these motions to the overall partition function is negligible at room temperature (i.e., Qvib,i ≈ 1), and thus the error incurred in treating these modes as harmonic oscillators is not significant. However, for the low-frequency torsional modes, these errors can be significant and a more rigorous treatment is often necessary; this is especially the case for the reactions of relevance to free-radical polymerization.8,11,12b,c,l Ideally, one should solve the Schrödinger equation for the full multidimensional potential energy surface representing all active modes of a molecule, and use the resulting energy levels in Eq. (13.7) to obtain the partition functions; however, this is impractical for larger molecules. Instead, the approach that is usually adopted is to apply the harmonic oscillator approximation to all 3N − 6 internal modes of a molecule (as in the standard formulas above), but then multiply the resulting vibrational partition function by a correction factor for each internal hindered rotor partition function. This factor is calculated as the ratio of the 1D-HR partition function to the corresponding “pure” vibrational partition function, as calculated from the second derivative of the rotational potential at the minimum-energy structure.
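The correction factor just described can be sketched numerically under some loudly stated assumptions. The potential is taken to be a single hypothetical cosine term, V(θ) = (V0/2)(1 − cos nθ), rather than a fitted Fourier series; the rotational constant B = ħ²/(2I_r) and the barrier V0 are supplied in cm−1 and are illustrative values; the energy levels are obtained by diagonalizing the torsional Hamiltonian in a free-rotor basis exp(imθ), which is a substitute for the Hill-equation route used by the authors; and the hindered-rotor sum is divided by a symmetry number (defaulting to n), a convention the text does not spell out. Both partition functions are referenced to the bottom of the potential well so that their ratio is meaningful.

```python
import numpy as np

KB_CM1 = 0.6950348  # Boltzmann constant in cm^-1 per K

def hindered_rotor_levels(b_rot, v0, n_fold, m_max=60):
    """Eigenvalues (cm^-1, relative to the potential minimum) of
    H = -B d^2/dtheta^2 + (V0/2)(1 - cos(n*theta)) in the basis exp(i*m*theta)."""
    m = np.arange(-m_max, m_max + 1).astype(float)
    h = np.diag(b_rot * m ** 2 + 0.5 * v0)          # kinetic term plus constant V0/2
    off = -0.25 * v0 * np.ones(len(m) - n_fold)     # cosine couples m and m +/- n
    h += np.diag(off, k=n_fold) + np.diag(off, k=-n_fold)
    return np.linalg.eigvalsh(h)

def hr_correction_factor(b_rot, v0, n_fold, temperature, sigma=None):
    """Ratio of the 1D hindered-rotor to the harmonic-oscillator partition function."""
    kt = KB_CM1 * temperature
    sigma = n_fold if sigma is None else sigma      # assumed symmetry number
    levels = hindered_rotor_levels(b_rot, v0, n_fold)
    q_hr = np.exp(-levels / kt).sum() / sigma
    # Harmonic quantum from the second derivative of V at the minimum:
    # k = V0 n^2 / 2, so hbar*omega = sqrt(2 B k) = n sqrt(B V0).
    hw = n_fold * np.sqrt(b_rot * v0)
    q_ho = np.exp(-0.5 * hw / kt) / (1.0 - np.exp(-hw / kt))
    return q_hr / q_ho

# Hypothetical methyl-like rotor: B = 5 cm^-1, threefold barrier of 1000 cm^-1.
print(hr_correction_factor(5.0, 1000.0, 3, 298.15))
```

Two limits give a quick sanity check on a sketch like this: for a very high barrier the factor should approach 1 (the rotor behaves as an oscillator), and for V0 = 0 the level sum should approach the classical free-rotor value sqrt(π kT/B).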
Using approximations such as this, the 1D-HR model has been shown to provide reasonable results in situations where testing against more sophisticated treatments is possible.61 To obtain the 1D-HR partition function for any given low-frequency torsional mode, we first need to compile the full rotational potential V(θ) for the mode in question; studies have shown that a resolution of 60° is sufficient for accurate results.62 The potential should be compiled as a relaxed scan (i.e., at each step the dihedral angle is frozen but the rest of the molecule is fully optimized) and, as in ordinary geometry optimizations, low levels of theory, such as B3LYP/6-31G(d), are usually sufficiently accurate. Having obtained the potential, this is then used to solve the one-dimensional Schrödinger equation for a rigid rotor:

\[ -\frac{\hbar^{2}}{2I_r}\,\frac{d^{2}\Psi}{d\theta^{2}} + V(\theta)\,\Psi = \varepsilon_i\,\Psi \tag{13.33} \]

In this equation Ψ is the wavefunction, ε_i is the corresponding energy, I_r is the reduced moment of inertia, and V(θ) is the rotational potential, which for this purpose should be supplied at a high resolution. To this end, the 60° resolution potential is fitted with a Fourier series of up to 18 terms and then reevaluated at a resolution of

CALCULATION OF KINETICS AND THERMODYNAMICS


1.2°. The reduced moment of inertia (I_r) is assumed to be independent of θ and is calculated from the optimized geometry using the equation for I^(2,3), as defined by East and Radom.63 There is no analytical solution to this Schrödinger equation; however, it can be solved numerically for the eigenvalues, ε, by converting it into the Hill differential equation. Having obtained the energy levels, these are then summed in order to obtain the partition function via Eq. (13.7), in the usual manner. A program called T-CHEM for performing these calculations is freely available at http://rsc.anu.edu.au/~cylin/scripts.html. Finally, in addition to the approach described above, there are a number of lower-cost methods available for calculating hindered-rotor partition functions; some of these (such as the Pitzer tables64) are applicable only for potentials that can be described by a pure cosine function; others are approximations designed for use with any type of partition function. It is beyond the scope of this chapter to detail these here, but a description and evaluation of these methods is found in the literature.62,65,66

13.4.5 Solvent Effects

The methodology described thus far is designed to reproduce chemically accurate values of the rate and equilibrium constants for gas-phase systems, and the vast majority of computational studies of radical polymerization in the literature have indeed been performed in the gas phase. In many situations, the effects of solvents on radical reactions are relatively minor and the gas-phase calculations are indicative of solution-phase behavior. For example, gas-phase calculations of the propagation rate coefficients of vinyl chloride and acrylonitrile were able to reproduce the experimental (solution-phase) rate coefficients for these monomers to within a factor of 2, and solvation effects (as calculated using simple continuum models) were minor.8 Gas-phase studies of the equilibrium constants in certain RAFT polymerizations have also reproduced experimental data to within chemical accuracy, for both small model reactions16a and polymeric systems.11e,18 Nonetheless, there are free-radical polymerizations, such as those of monomers that are capable of undergoing hydrogen bonding or other specific interactions with the solvent, where strong solvent effects have been well documented experimentally.6c,67 Not unexpectedly, in such cases there can be very large differences between the calculated gas-phase rate coefficients and the corresponding solution-phase values. For example, in a recent computational study68 of the propagation rate coefficient of ethyl-α-hydroxymethacrylate (EHMA), the calculated gas-phase rate coefficient differed from the corresponding solution-phase experimental values69 by more than five orders of magnitude. In such cases, the correct treatment of solvent effects is therefore crucial. Unfortunately, the development of cost-effective methods for treating the solvent in chemical reactions is an ongoing area of research and there have been relatively few benchmarking studies for the specific case of radical polymerization.
Nonetheless, it is worth making a few general comments on the main strategies that are available for modeling solvation effects.


FREE-RADICAL POLYMERIZATION

The simplest and most computationally efficient methods are continuum models, in which each solute molecule is embedded in a cavity surrounded by a dielectric continuum of permittivity ε.70 Most models, of which the ab initio conductor-like solvation model (COSMO)71 and the polarizable continuum model (PCM)72 are prominent examples, also include terms for the nonelectrostatic contributions of the solvent, such as dispersion, repulsion, and cavitation. Some of the more recent models also incorporate more sophisticated treatments of the solvent itself. For example, COSMO-RS73 is a variant of the COSMO model that describes the interactions in a fluid as local interaction of molecular surfaces, the interaction energies being quantified by the values of the two screening charge densities that form a molecule contact. SM674 (Solvent Model 6) is based on a generalized Born approach, which uses a long-range dielectric continuum to treat bulk electrostatics effects combined with short-range atomic surface tensions to account for first-shell solvent effects. Continuum solvation models can be invoked in most of the leading computational chemistry software packages, and the reader is referred to their respective manuals for specific implementation details. However, the following general points should be noted. First, continuum solvation models rely upon empirically optimized parameters, and it is important to choose radii and levels of theory that are optimized for the specific method in use. As always, the choice of solvation method for any particular system should be determined through assessment studies. Second, the specification of a particular solvent depends on several parameters in addition to the dielectric constant, including the volume, density, and solvent radius. If using a nondefault solvent model, care must be taken to set all of these parameters appropriately. 
Third, since the levels of theory used for solvation energy calculations, typically small basis set HF or B3LYP calculations, are not usually sufficiently accurate for gas-phase energetics, the total free energies in solution should be calculated via a simple thermodynamic cycle as follows:

\[ \Delta G_{\mathrm{soln}} = \Delta G_{\mathrm{gas}} + \Delta G_{\mathrm{solv}} + \Delta G_{1\,\mathrm{atm}\rightarrow 1\,\mathrm{M}} \tag{13.34} \]

In this equation, ΔG_gas is the gas-phase free energy of reaction, which is calculated separately at a high level of theory, and ΔG_solv, the free energy of solvation, should not be confused with the total free energy of reaction in solution. In some software packages, additional keyword(s) are required for the solvation free energy (the difference of the gas- and solution-phase free energies at the same level of theory) to be calculated. In GAUSSIAN, the SCFVAC keyword is used for this purpose. The final term in Eq. (13.34), ΔG_{1 atm→1 M}, is required for converting from the gas-phase standard state for an ideal gas (typically, 1 atm) to 1 M in solution, and is given by

\[ \Delta G_{1\,\mathrm{atm}\rightarrow 1\,\mathrm{M}} = \Delta n\,RT \ln(V) = \Delta n\,RT \ln\!\left(\frac{RT}{P}\right) \tag{13.35} \]


where Δn is the change in the number of moles of gas from reactants to products. As an example, at room temperature (298.15 K) and standard pressure (1 atm), this term has a value of 7.9 kJ mol−1 for Δn = 1. Finally, having made the correction for the change in state, ΔG_{1 atm→1 M}, the standard unit of concentration in the rate and equilibrium constant expressions [Eqs. (13.5) and (13.6)] becomes c° = 1 mol L−1, rather than its value for an ideal gas (e.g., 0.0408 mol L−1 at room temperature and standard pressure).

Continuum models are designed to reproduce bulk or macroscopic behavior and can fare extremely well in certain applications, not least the prediction of solvation energies of stable organic molecules.74,75 Continuum models have been applied to radical polymerization processes with mixed results. In an early study, Thickett and Gilbert12g used a simple PCM model to study the effect of solvent on acrylic acid propagation, confirming experimental observations76 that aqueous solvation substantially lowers the reaction barrier. However, it was noted in this work that the levels of theory used in the gas- and solution-phase calculations were not accurate enough for quantitative predictions of the reaction rate. As noted above, in our study of vinyl chloride and acrylonitrile propagation, we found that continuum models slightly improved the agreement between theory and experiment; however, in those systems the solvation effects were very small and well within the uncertainty of the experimental and theoretical data.8 More encouragingly, we have found that the combination of high-level ab initio calculations with continuum solvation models can reproduce one- and two-electron redox potentials of a wide range of open- and closed-shell systems,77 including systems directly relevant to atom transfer radical polymerization.9e,f In such systems, the solvation effects are very large, due to the presence of charged species.
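The standard-state correction of Eq. (13.35) and the thermodynamic cycle of Eq. (13.34) are easy to check numerically. The sketch below is illustrative only: the function names are hypothetical, the molar volume is evaluated in litres per mole with R in L atm mol−1 K−1 (the unit choice implied by a 1 M reference state), and the reaction free energies passed to the cycle are made-up numbers rather than values from the chapter.

```python
import math

R_J = 8.314462618        # gas constant, J mol^-1 K^-1
R_L_ATM = 0.082057366    # gas constant, L atm mol^-1 K^-1

def standard_state_correction_kj(delta_n, temperature=298.15, pressure_atm=1.0):
    """Eq. (13.35): correction from the 1 atm gas standard state to 1 M, in kJ/mol."""
    molar_volume = R_L_ATM * temperature / pressure_atm   # L/mol for an ideal gas
    return delta_n * R_J * temperature * math.log(molar_volume) / 1000.0

def ideal_gas_concentration(temperature=298.15, pressure_atm=1.0):
    """Concentration of an ideal gas (mol/L) at the given temperature and pressure."""
    return pressure_atm / (R_L_ATM * temperature)

def free_energy_in_solution_kj(dg_gas, dg_solv, delta_n, temperature=298.15):
    """Eq. (13.34): combine gas-phase, solvation, and standard-state terms (kJ/mol)."""
    return dg_gas + dg_solv + standard_state_correction_kj(delta_n, temperature)

print(round(standard_state_correction_kj(1), 2))   # correction for delta_n = 1
print(round(ideal_gas_concentration(), 4))         # ideal-gas concentration at 1 atm
# Hypothetical addition reaction (two species -> one, so delta_n = -1):
print(round(free_energy_in_solution_kj(-40.0, 5.0, -1), 1))
```

For a bimolecular addition such as propagation, Δn = −1, so the correction lowers the reaction free energy in solution relative to the 1 atm gas-phase value.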
Nonetheless, in other systems, the continuum solvation models have failed to redress the deviations of theory and experiment. For the problematic EHMA system described above, the use of PCM solvation energies actually increased the deviation between theory and experiment from five orders of magnitude to as much as eight orders of magnitude, depending on the solvent.68 This is presumably because continuum models do not take into account the hydrogen-bonding interactions, expected to be important in this system. Indeed, similar failures have been noted in other (non-polymer-related) systems where hydrogen bonding is important.78 Moreover, even where explicit solute–solvent interactions can be neglected, the use of continuum models to study polymerization kinetics is likely to be problematic. This is because the results obtained using continuum models are highly sensitive to the choice of cavities, and these are typically parameterized to reproduce the free energies of solvation for a set of small stable organic molecules. As a result, the choice of appropriate cavities for weakly bound species such as transition structures can be difficult.75 For problematic systems where strong explicit solute–solvent interactions are important, the inclusion of explicit solvent molecules in the ab initio calculation is necessary. Ideally, one should include many explicit solvent molecules in the calculation and try to reproduce bulk behavior via molecular dynamics or Monte Carlo simulations, combined with the imposition of periodic boundary


conditions.79 However, such calculations are hampered by problems such as the lack of potentials that can adequately describe both cluster and bulk behavior and the rapid increase in the conformational possibilities as the number of individual components increases. As a result, such approaches are not currently practical for polymerization systems. A less computationally demanding approach, known as a cluster-continuum model,80 is to include a small number of explicit solvent molecules in the calculation (effectively treating them as additional reactants), while modeling the remaining solvation effects via a continuum model. However, choosing an appropriate number of explicit solvent molecules and their location, without testing all possibilities exhaustively, is always problematic, particularly for larger molecules. Further work is required to design practical guidelines for applying these methods to polymerization systems. In the meantime, it is worth noting that very promising results have recently been obtained without the need for explicit solvent molecules using COSMO-RS solvation energies in conjunction with the standard high-level gas-phase methodology.8b To date, this approach has been evaluated only for the propagation kinetics of methyl acrylate and vinyl acetate, two systems where simple continuum models fail.8b If its excellent performance can be maintained for other problematic systems, this methodology will further expand the scope of computational radical polymerization.

13.5 CONCLUSIONS

Computational quantum chemistry has much to offer the experimental polymer chemist. At the microscopic level, it can be used to clarify the reaction mechanism and explain the effects of substituents on the individual reactions, thereby facilitating the rational design of optimal control agents. At the macroscopic level, it can be used to build accurate kinetic models for simulating the outcome of polymerization processes as a function of the reaction conditions, for use in process optimization and control. However, the success of computational chemistry is crucially dependent on choosing realistic model reactions and applying accurate computational procedures; simultaneously satisfying these competing demands has, until recently, been difficult. Nonetheless, in recent years the development of new cost-effective computational methods, along with concurrent increases in computing power, has at last brought chemical accuracy within reach. Although the treatment of solvent effects remains problematic, even here, computational quantum chemistry has now proven itself a reliable and useful tool and an important complement to experiment.

REFERENCES

1. For more information on the chemistry and kinetics of free-radical polymerization, see, e.g., (a) Matyjaszewski, K.; Davis, T. P. Handbook of Radical Polymerization, Wiley, Hoboken, NJ, 2002. (b) Moad, G.; Solomon, D. H. The Chemistry of Free-Radical Polymerization, Pergamon Press, Oxford, UK, 1995. (c) Odian, G. Principles of Polymerization, Wiley-Interscience, New York, 1991.

2. Kamigaito, M.; Satoh, K. Macromolecules 2008, 41, 269–276.
3. Moad, G.; Rizzardo, E.; Thang, S. H. Aust. J. Chem. 2005, 58, 379–410.
4. Matyjaszewski, K. Prog. Polym. Sci. 2005, 30, 858–875.
5. Hawker, C. J.; Bosman, A. W.; Harth, E. Chem. Rev. 2001, 101, 3661–3688.
6. (a) Coote, M. L.; Zammit, M. D.; Davis, T. P. Trends Polym. Sci. 1996, 4, 189–196. (b) van Herk, A. M. Macromol. Theory Simul. 2000, 9, 433–441. (c) Beuermann, S.; Buback, M. Prog. Polym. Sci. 2002, 27, 191–254. (d) Barner-Kowollik, C.; Buback, M.; Egorov, M.; Fukuda, T.; Goto, A.; Olaj, O. F.; Russell, G. T.; Vana, P.; Yamada, B.; Zetterlund, P. B. Prog. Polym. Sci. 2005, 30, 605–643.
7. Barner-Kowollik, C.; Buback, M.; Charleux, B.; Coote, M. L.; Drache, M.; Fukuda, T.; Goto, A.; Klumperman, B.; Lowe, A. B.; McLeary, J. B.; Moad, G.; Monteiro, M. J.; Sanderson, R. D.; Tonge, M. P.; Vana, P. J. Polym. Sci. A 2006, 44, 5809–5831.
8. See, e.g., (a) Izgorodina, E. I.; Coote, M. L. Chem. Phys. 2006, 324, 96–110. (b) Lin, C. Y.; Izgorodina, E. I.; Coote, M. L. Macromolecules 2010, 43, 533–560.
9. (a) Gillies, M. B.; Matyjaszewski, K.; Norrby, P.-O.; Pintauer, T.; Poli, R.; Richard, P. Macromolecules 2003, 36, 8551–8559. (b) Singleton, D. A.; Nowlan, D. T., III; Jahed, N.; Matyjaszewski, K. Macromolecules 2003, 36, 8609–8616. (c) Matyjaszewski, K.; Poli, R. Macromolecules 2005, 38, 8093–8100. (d) Lin, C. Y.; Coote, M. L.; Petit, A.; Richard, P.; Poli, R.; Matyjaszewski, K. Macromolecules 2007, 40, 5985–5994. (e) Tang, W.; Kwak, Y.; Braunecker, W.; Tsarevsky, N. V.; Coote, M. L.; Matyjaszewski, K. J. Am. Chem. Soc. 2008, 130, 10702–10713. (f) Lin, C. Y.; Coote, M. L.; Gennaro, A.; Matyjaszewski, K. J. Am. Chem. Soc. 2008, 130, 12762–12774.
10. (a) Marsal, P.; Roche, M.; Tordo, P.; de Sainte Claire, P. J. Phys. Chem. A 1999, 103, 2899–2905. (b) Gigmes, D.; Gaudel-Siri, A.; Marque, S. R. A.; Bertin, D.; Tordo, P.; Astolfi, P.; Greci, L.; Rizzoli, C. Helv. Chim. Acta 2006, 89, 2312–2326. (c) Kaim, A.; Megiel, E. J. Polym. Sci. A 2005, 44, 914–927. (d) Kaim, A. J. Polym. Sci. A 2006, 45, 232–241. (e) Megiel, E.; Kaim, A. J. Polym. Sci. A 2008, 46, 1165–1177.
11. (a) Farmer, S. C.; Patten, T. E. J. Polym. Sci. A 2002, A40, 555–563. (b) Coote, M. L.; Radom, L. J. Am. Chem. Soc. 2003, 125, 1490–1491. (c) Coote, M. L.; Radom, L. Macromolecules 2004, 37, 590–596. (d) Coote, M. L. Macromolecules 2004, 37, 5023–5031. (e) Feldermann, A.; Coote, M. L.; Stenzel, M. H.; Davis, T. P.; Barner-Kowollik, C. J. Am. Chem. Soc. 2004, 126, 15915–15923. (f) Coote, M. L.; Henry, D. J. Macromolecules 2005, 38, 1415–1433. (g) Coote, M. L. J. Phys. Chem. A 2005, 109, 1230–1239. (h) Coote, M. L.; Krenske, E. H.; Izgorodina, E. I. Macromol. Rapid Commun. 2006, 27, 473–497. (i) Izgorodina, E. I.; Coote, M. L. Macromol. Theory Simul. 2006, 15, 394–403. (j) Lin, C. Y.; Coote, M. L. Aust. J. Chem. 2009, 62, 1479–1483.
12. (a) Leroy, G.; Dewispelaere, J.-P.; Benkadour, H.; Wilante, C. Macromol. Theory Simul. 1996, 5, 269–289. (b) Heuts, J. P. A.; Gilbert, R. G.; Radom, L. J. Phys. Chem. 1996, 100, 18997–19006. (c) Huang, D. M.; Monteiro, M. J.; Gilbert, R. G. Macromolecules 1998, 31, 5175–5187. (d) Toh, J. S.-S.; Huang, D. M.; Lovell, P. A.; Gilbert, R. G. Polymer 2001, 42, 1915–1920. (e) Filley, J.; McKinnon, J. T.; Wu, D. T.; Ko, G. H. Macromolecules 2002, 35, 3731–3738. (f) Zhan, C.-G.; Dixon, D. A. J. Phys. Chem. A 2002, 106, 10311–10325. (g) Thickett, S. C.; Gilbert, R. G. Polymer 2004, 45, 6993–6999. (h) Van Cauter, K.; Hemelsoet, K.; Van Speybroeck, V.; Reyniers, M. F.; Waroquier, M. Int. J. Quantum Chem. 2004, 102, 454–460. (i) Salman, S.; Albayrak, A. Z.; Avci, D.; Aviyente, V. J. Polym. Sci. A 2005, 43, 2574–2583. (j) Günaydin, H.; Salman, S.; Tüzün, N. S.; Avci, D.; Aviyente, V. Int. J. Quantum Chem. 2005, 103, 176–189. (k) Van Cauter, K.; Van Speybroeck, V.; Vansteenkiste, P.; Reyniers, M.-F.; Waroquier, M. ChemPhysChem 2006, 7, 131–140. (l) Degirmenci, I.; Avci, D.; Aviyente, V.; Van Cauter, K.; Van Speybroeck, V.; Waroquier, M. Macromolecules 2007, 40, 9590–9602.
13. (a) Purmova, J.; Pauwels, K. F. D.; van Zoelen, W.; Vorenkamp, E. J.; Schouten, A. J.; Coote, M. L. Macromolecules 2005, 38, 6352–6366. (b) Van Cauter, K.; Van Den Bossche, B. J.; Van Speybroeck, V.; Waroquier, M. Macromolecules 2007, 40, 1321–1331. (c) Purmová, J.; Pauwels, K. F. D.; Agostini, M.; Bruinsma, M.; Vorenkamp, E. J.; Schouten, A. J.; Coote, M. L. Macromolecules 2008, 41, 5527–5539.
14. (a) Heuts, J. P. A.; Sudarko; Gilbert, R. G. Macromol. Symp. 1996, 111, 147–157. (b) Heuts, J. P. A.; Gilbert, R. G.; Maxwell, I. A. Macromolecules 1997, 30, 726–736. (c) Coote, M. L.; Davis, T. P.; Radom, L. Theochem 1999, 461–462, 91–96. (d) Coote, M. L.; Davis, T. P.; Radom, L. Macromolecules 1999, 32, 5270–5276. (e) Coote, M. L.; Davis, T. P.; Radom, L. Macromolecules 1999, 32, 2935–2940. (f) Cieplak, P.; Kaim, A. J. Polym. Sci. A 2004, 42, 1557–1565.
15. Barner-Kowollik, C. W.; Coote, M. L.; Davis, T. P.; Stenzel, M. H.; Theis, A. Polymerization agent, International Patent WO2006122344 A1, 2006. http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=WO2006122344&F=0.
16. (a) Ah Toy, A.; Chaffey-Millar, H.; Davis, T. P.; Stenzel, M. H.; Izgorodina, E. I.; Coote, M. L.; Barner-Kowollik, C. Chem. Commun. 2006, 835–837. (b) Chaffey-Millar, H.; Izgorodina, E. I.; Barner-Kowollik, C.; Coote, M. L. J. Chem. Theory Comput. 2006, 2, 1632–1645.
17. (a) Hodgson, J. L.; Coote, M. L. Macromolecules 2005, 38, 8902. (b) Coote, M. L.; Hodgson, J. L.; Krenske, E. H.; Namazian, M.; Wild, S. B. Aust. J. Chem. 2007, 60, 744–753.

18. 19. 20. 21. 22. 23.
Coote, M. L.; Izgorodina, E. I.; Krenske, E. H.; Busch, M.; Barner-Kowollik, C. Macromol. Rapid Commun. 2006, 27 , 1015–1022. McLeary, J. B.; Calitz, F. M.; McKenzie, J. M.; Tonge, M. P.; Sanderson, R. D.; Klumperman, B. Macromolecules 2004, 37 , 2382–2394. Coote, M. L. Macromol. Theory Simul . 2009, 18 , 388–400. See, e.g., Heuts, J. P. A.; Russell, G. T. Eur. Polym. J . 2006, 42 , 3–20. Coote, M. L.; Davis, T. P. Prog. Polym. Sci . 1999, 24 , 1217–1251. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, J. A., Jr.; Vreven, T.; Kudin, K. N.; Burant, J. C.; Millam, J. M.; Iyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N. P.; Petersson, G. A.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M. N.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B. A.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.; Morokuma, K.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Zakrzewski, V. G.; Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas, O.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui, Q.; Baboul, A. G.; Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith,

REFERENCES

24.

25.

26.

27.

28. 29.

471

T.; Al-Laham, M. A.; Peng, C. Y.; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Gonzalez, C.; Pople, J. A. Gaussian 03, Revision B.03 , Gaussian Inc., Pittsburgh, PA, 2003. Werner, H.-J.; Knowles, P. J.; Lindh, R.; Manby, F. R.; Sch¨utz, M.; Celani, P.; Korona, T.; Rauhut, G.; Amos, R. D.; Bernhardsson, A.; Berning, A.; Cooper, D. L.; Deegan, M. J. O.; Dobbyn, A. J.; Eckert, F.; Hampel, C.; Hetzer, G.; Lloyd, A. W.; McNicholas, S. J.; Meyer, W.; Mura, M. E.; Nicklass, A.; Palmieri, P.; Pitzer, R.; Schumann, U.; Stoll, H.; Stone, A. J.; Tarroni, R.; Thorsteinsson, T. MOLPRO, Version 2006.1 , a package of ab initio programs, http://www.molpro.net. Schmidt, M. W.; Baldridge, K. K.; Boatz, J. A.; Elbert, S. T.; Gordon, M. S.; Jensen, J. H.; Koseki, S.; Matsunaga, N.; Nguyen, K. A.; Su, S. J.; Windus, T. L.; Dupuis, M.; Montgomery, J. A. J. Comput. Chem. 1993, 14 , 1347. Shao, Y.; Molnar, L. F.; Jung, Y.; Kussmann, J.; Ochsenfeld, C.; Brown, S. T.; Gilbert, A. T. B.; Slipchenko, L. V.; Levchenko, S. V.; O’Neill, D. P.; DiStasio, R. A.; Lochan, R. C.; Wang, T.; Beran, G. J. O.; Besley, N. A.; Herbert, J. M.; Lin, C. Y.; Van Voorhis, T.; Chien, S. H.; Sodt, A.; Steele, R. P.; Rassolov, V. A.; Maslen, P. E.; Korambath, P. P.; Adamson, R. D.; Austin, B.; Baker, J.; Byrd, E. F. C.; Dachsel, H.; Doerksen, R. J.; Dreuw, A.; Dunietz, B. D.; Dutoi, A. D.; Furlani, T. R.; Gwaltney, S. R.; Heyden, A.; Hirata, S.; Hsu, C. P.; Kedziora, G.; Khalliulin, R. Z.; Klunzinger, P.; Lee, A. M.; Lee, M. S.; Liang, W.; Lotan, I.; Nair, N.; Peters, B.; Proynov, E. I.; Pieniazek, P. A.; Rhee, Y. M.; Ritchie, J.; Rosta, E.; Sherrill, C. D.; Simmonett, A. C.; Subotnik, J. E.; Woodcock, H. L.; Zhang, W.; Bell, A. T.; Chakraborty, A. K.; Chipman, D. M.; Keil, F. J.; Warshel, A.; Hehre, W. J.; Schaefer, H. F.; Kong, J.; Krylov, A. I.; Gill, P. M. W.; Head-Gordon, M. Phys. Chem. Chem. Phys. 2006, 8 , 3172. Bylaska, E. J.; de Jong, W. 
A.; Govind, N.; Kowalski, K.; Straatsma, T. P.; Valiev, M.; Wang, D.; Apra, E.; Windus, T. L.; Hammond, J.; Nichols, P.; Hirata, S.; Hackler, M. T.; Zhao, Y.; Fan, P.-D.; Harrison, R. J.; Dupuis, M.; Smith, D. M. A.; Nieplocha, J.; Tipparaju, V.; Krishnan, M.; Wu, Q.; Voorhis, T. V.; Auer, A. A.; Nooijen, M.; Brown, E.; Cisneros, G.; Fann, G. I.; Fruchtl, H.; Garza, J.; Hirao, K.; Kendall, R.; Nichols, J. A.; Tsemekhman, K.; Wolinski, K.; Anchell, J.; Bernholdt, D.; Borowski, P.; Clark, T.; Clerc, D.; Dachsel, H.; Deegan, M.; Dyall, K.; Elwood, D.; Glendening, E.; Gutowski, M.; Hess, A.; Jaffe, J.; Johnson, B.; Ju, J.; Kobayashi, R.; Kutteh, R.; Lin, Z.; Littlefield, R.; Long, X.; Meng, B.; Nakajima, T.; Niu, S.; Pollack, L.; Rosing, M.; Sandrone, G.; Stave, M.; Taylor, H.; Thomas, G.; von Lenthe, J.; Wong, A.; Zhang, Z. NWChem: A Computational Chemistry Package for Parallel Computers, Version 5.1 , Pacific Northwest National Laboratory, Richland, WA, 2007. Velde, G. T.; Bickelhaupt, F. M.; Baerends, E. J.; Guerra, C. F.; Van Gisbergen, S. J. A.; Snijders, J. G.; Ziegler, T. J. Comput. Chem. 2001, 22 , 931. Stanton, J. F.; Gauss, J.; Perera, S. A.; Watts, J. D.; Yau, A. D.; Nooijen, M.; Oliphant, N.; Szalay, P. G.; Lauderdale, W. J.; Gwaltney, S. R.; Beck, S.; Balkov´a, A.; Bernholdt, D. E.; Baeck, K. K.; Rozyczko, P.; Sekino, H.; Huber, C.; Pittner, J.; Cencek, W.; Taylor, D.; Bartlett, R. J. ACES II is a program product of the Quantum Theory Project, University of Florida. Integral packages included are VMOL (J. Alml¨of and P. R. Taylor); VPROPS (P. Taylor); ABA-CUS (T. Helgaker, H. J. Aa. Jensen, P. Jørgensen, J. Olsen, and P. R. Taylor); HONDO/GAMESS (M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. J. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su, T. L. Windus, M. Dupuis, J. A. Montgomery).

472

FREE-RADICAL POLYMERIZATION

30. (a) Choi, C. C.; Kertesz, M.; Karpfen, A. Chem. Phys. Lett. 1997, 276 , 266. (b) Curtiss, L. A.; Raghavachari, K.; Redfern, P. C.; Pople, J. A. J. Chem. Phys. 2000, 112 , 7374. (c) Woodcock, H. L.; Schaefer, H. F., III; Schreiner, P. R. J. Phys. Chem. A 2002, 106 , 11923. (d) Izgorodina, E. I.; Coote, M. L.; Radom, L. J. Phys. Chem. A 2005, 109 , 7558. (e) Check C. E.; Gilbert, T. M. J. Org. Chem. 2005, 70 , 9828. (f) Izgorodina, E. I.; Coote, M. L. J. Phys. Chem. A 2006, 110 , 2486. (g) Grimme, S. Angew. Chem. Int. Ed . 2006, 45 , 4460. (h) Schreiner, P. R.; Fokin, A. A.; Pascal, R. A., Jr.; de Meijere, A. Org. Lett. 2006, 8 , 3635. (i) Wodrich, M. D.; Corminbæf, C.; von Ragu´e Schleyer, P. Org. Lett. 2006, 8 , 3631. (j) Wodrich, M. D.; Corminbæf, C.; Schreiner, P. R.; Fokin, A. A.; von Ragu´e Schleyer, P. Org. Lett. 2007, 9 , 1851. (k) Grimme, S.; Steinmetz, M.; Korth, M. J. Chem. Theory Comput . 2007, 3 , 42. (l) Schreiner, P. R. Angew. Chem. Int. Ed . 2007, 46 , 4217. (m) Izgorodina, E. I.; Brittain, D. R. B.; Hodgson, J. L.; Krenske, E. H.; Lin, C. Y.; Namazian, M.; Coote, M. L. J. Phys. Chem. A 2007, 111 , 10754. (n) Brittain, D. R. B; Lin, C. Y.; Gilbert, A. T. B.; Izgorodina, E. I.; Gill, P. M. W.; Coote, M. L. Phys. Chem. Chem. Phys. 2009, 11 , 1138–1142. 31. Buback, M.; Hippler, H.; Schweer, J.; Vogele, H.-P. Makromol. Chem. Rapid Commun. 1986, 7 , 261–265. 32. (a) Kajiwara, A.; Kamachi, M. Macromol. Chem. Phys. 2000, 201 , 2165–2169. (b) Burnett, G. M.; Wright, W. W. Proc. R. Soc. (Lond .) A 1954, 211 , 41. 33. For a review of the early work in this field, see Fischer, H.; Radom, L. Angew. Chem. Int. Ed . 2001, 40 , 1340–1371. 34. For more recent studies, see, e.g., (a) Henry, D. J.; Parkinson, C. J.; Mayer, P. M.; Radom, L. J. Phys. Chem. A 2001, 105 , 6750. (b) Coote, M. L.; Wood, G. P. F.; Radom, L. J. Phys. Chem. A 2002, 106 , 12124–12138. (c) Coote, M. L. J. Phys. Chem. A 2004, 108 , 3865–3872. (d) G´omez-Balderas, R.; Coote, M. L.; Henry, D. 
J.; Radom, L. J. Phys. Chem. A 2004, 108 , 2874–2883. (e) Lin, C. Y.; Hodgson, J. L.; Namazian, M.; Coote, M. L. J. Phys. Chem. A 2009, 113 , 3690–3697. 35. Malick, D. K.; Petersson, G. A.; Montgomery, J. A. J. Chem. Phys. 1998, 108 , 5704. 36. Scott, A. P.; Radom, L. J. Phys. Chem. 1996, 100 , 16502. 37. (a) Pople, J. A.; Head-Gordon, M.; Fox, D. J.; Raghavachari, K.; Curtiss, L. A. J. Chem. Phys. 1989, 90 , 5622. (b) Curtiss, L. A.; Jones, C.; Trucks, G. W.; Raghavachari, K.; Pople, J. A. J. Chem. Phys. 1990, 93 , 2537. (c) Curtiss, L. A.; Raghavachari, K.; Trucks, G. W.; Pople, J. A. J. Chem. Phys. 1991, 94 , 7221. (d) Curtiss, L. A.; Raghavachari, K.; Redfern, P. C.; Rassolov, V.; Pople, J. A. J. Chem. Phys. 1998, 109 , 7764. (e) Curtiss, L. A.; Redfern, P. C.; Raghavachari, K.; Pople, J. A. J. Chem. Phys. 2001, 114 , 108. (f) Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. J. Chem. Phys. 2007, 126 , 084108. 38. Henry, D. J.; Sullivan, M. B.; Radom, L. J. Chem. Phys. 2003, 118 , 4849. 39. Montgomery, J. A.; Frisch, M. J.; Ochterski, J. W.; Petersson, G. A. J. Chem. Phys. 1999, 110 , 2822. 40. Martin, J. M. L.; Parthiban, S. In Quantum Mechanical Prediction of Thermochemical Data, Cioslowski, J., Ed. Kluwer-Academic, Dordrecht, The Netherlands, 2001, pp. 31–65. 41. (a) Vreven, T.; Morokuma, K. J. Chem. Phys. 1999, 111 , 8799–8803. (b) Vreven, T.; Morokuma, K. J. Comput. Chem. 2000, 21 , 1419–1432. 42. Lipton, M.; Still, W. C. J. Comput. Chem. 1988, 9 , 343–355.

REFERENCES

473

43. Izgorodina, E. I.; Lin, C. Y.; Coote, M. L. Phys. Chem. Chem. Phys. 2007, 9 , 2507–2516. 44. (a) Kirkpatrick, S.; Gelatt, C. D., Jr.; Vecchi, M. P. Science 1983, 220 , 671. (b) Wilson, S. R.; Cui, W.; Moskowitz, J. W.; Schmidt, K. E. Tetrahedron Lett. 1988, 4343. 45. (a) Gibson, K. D.; Scheraga, H. A. J. Comput. Chem. 1987, 8 , 826. (b) Pincus, M. R.; Klausner, R. D.; Scheraga, H. A. Proc. Natl. Acad. Sci. USA 1982, 79 , 5107. (c) Hingerty, B. E.; Figueroa, S.; Hayden, T. L.; Broyde, S. Biopolymers 1989, 28 , 1195. 46. Malick, D. K.; Petersson, G. A.; Montgomery, J. A. J. Chem. Phys. 1998, 108 , 5704–5713. 47. (a) Knyazev, V. D.; Slagle, I. R. J. Phys. Chem. 1996, 100 , 16899–16911. (b) Knyazev, V. D.; Bencsura, A.; Stoliarov, S. I.; Slagle, I. R. J. Phys. Chem. 1996, 100 , 11346–11354. 48. Schwartz, M.; Marshall, P.; Berry, R. J.; Ehlers, C. J.; Petersson, G. A. J. Phys. Chem. A 1998, 102 , 10074–10081. 49. Coote, M. L.; Collins, M. A.; Radom, L. Mol. Phys. 2003, 101 , 1329–1338. 50. Coote, M. L. In Encyclopaedia of Polymer Science and Technology, 3rd ed., Vol. 9, Kroschwitz, J. I., Ed., Wiley, Hoboken, NJ, 2004, pp. 319–371. 51. See, e.g., (a) Benson, S. W. Thermochemical Kinetics, Wiley, New York, 1976. (b) McQuarrie, D. A. Statistical Mechanics, Harper & Row, New York, 1976. (c) Gilbert, R. G.; Smith, S. C. Theory of Unimolecular and Recombination Reactions, Blackwell Scientific, Oxford, UK, 1990. (d) Steinfeld, J. I.; Francisco, J. S.; Hase, W. L. Chemical Kinetics and Dynamics, 2nd ed., Prentice Hall, Englewood Cliffs, NJ, 1999. (e) Atkins, P. W. Physical Chemistry, 6th ed., W.H. Freeman, San Francisco, 2000. 52. Eyring, H. J. Chem. Phys. 1935, 3 , 107. 53. For a more detailed definition of this term, see, e.g., Karas, A. J.; Gilbert, R. G.; Collins, M. A. Chem. Phys. Lett. 1992, 193 , 181–184. 54. Skodje, R. T.; Truhlar, D. G.; Garrett, B. C. J. Phys. Chem., 1981, 85 , 3019. 55. Garrett, B. C.; Truhlar, D. G.; Wagner, A. F.; Dunning, T. H., Jr. J. Chem. 
Phys. 1983 78 , 4400. 56. Liu, Y. P.; Lu, D. H.; Gonzalez-Lafont, A.; Truhlar, D. G.; Garrett, B. C. J. Am. Chem. Soc. 1993, 115 , 7806. 57. Corchado, J. C.; Chuang, Y.-Y.; Fast, P. L.; Vill`a, J.; Hu, W.-P.; Liu, Y.-P.; Lynch, G. C.; Nguyen, K. A.; Jackels, C. F.; Melissas, V. S.; Lynch, B. J.; Rossi, I.; Coiti˜no, E. L.; Fernandez-Ramos, A.; Pu, J.; Albu, T. V.; Steckler, R.; Garrett, B. C.; Isaacson, A. D.; Truhlar, D. G. POLYRATE 9.1 , University of Minnesota, Minneapolis, MN, 2002, http://comp.chem.umn.edu/polyrate/. 58. (a) Kuppermann, A.; Truhlar, D. G. J. Am. Chem. Soc. 1971, 93 , 1840. (b) Garrett, B. C.; Truhlar, D. G.; Grev, R. S.; Magnuson, A. W. J. Phys. Chem. 1980, 84 , 1730. 59. Bell, R. P. The Tunnel Effect in Chemistry, Chapman & Hall, New York, 1980. 60. Eckart, C. Phys. Rev . 1930, 35 , 1303. 61. See, e.g., Vansteenkiste, P.; Van Neck, D.; Van Speybroeck, V.; Waroquier, M. J. Chem. Phys. 2006, 124 , 044314. 62. Lin, C. Y.; Izgorodina, E. I.; Coote, M. L. J. Phys. Chem. A 2008, 112 , 1956–1964. 63. East, A. L. L.; Radom, L. J. Chem. Phys. 1997, 106 , 6655.

474

FREE-RADICAL POLYMERIZATION

64. (a) Pitzer, K. S.; Gwinn, W. D. J. Chem. Phys. 1942, 10 , 428–440. (b) Pitzer, K. S. J. Chem. Phys. 1946, 14 , 239–243. (c) Li, J. C. M.; Pitzer, K. S. J. Phys. Chem. 1956, 60 , 466–474. (d) Kilpatrick, K. E.; Pitzer, K. S. J. Chem. Phys. 1949, 17 , 1064–1075. 65. Ellingson, B. A.; Lynch, V. A.; Mielke, S. L.; Truhlar, D. G. J. Chem. Phys. 2006, 125 , 084305. 66. Ayala, P. Y.; Schlegel, H. B. J. Chem. Phys. 1998, 108 , 7560. 67. Coote, M. L.; Davis, T. P.; Klumperman, B.; Monteiro, M. J. J. Macromol. Sci. Rev. Macromol. Chem. Phys. 1998, C38, 567–593. 68. Degirmenci, I.; Aviyente, V.; Van Speybroeck, V.; Waroquier, M. Macromolecules 2009, 42 , 3033–3041. 69. Morrison, D. A.; Davis, T. P. Macromol. Chem. Phys. 2000, 201 , 2128–2137. 70. Tomasi, J. Theor. Chem. Acc. 2004, 112 , 184. 71. (a) Klamt, A.; Schueuermann, G. J. Chem. Soc. Perkin Trans. 2 1993, 799. (b) Cossi, M.; Rega, N.; Scalmani, G.; Barone, V. J. Comput. Chem. 2003, 24 , 669. 72. Miertus, S.; Scrocco, E.; Tomasi, J. J. Chem. Phys. 1981, 55 , 117. 73. (a) Klamt, A. J. Phys. Chem. 1995, 99 , 2224. (b) Klamt, A. COSMO-RS: From Quantum Chemistry to Fluid Phase Thermodynamics and Drug Design, Elsevier Science, Amsterdam, 2005. (c) Klamt, A.; Jonas, V.; Burger, T.; Lohrenz, J. C. W. J. Phys. Chem. A 1998, 102 , 5074. 74. Kelly, C. P.; Cramer, C. J.; Truhlar, D. G. J. Chem. Theory Comput. 2005, 1 , 1133. 75. See, e.g., Takano, Y.; Houk, K. N. J. Chem. Theory Comput. 2005, 1 , 70–77. 76. Beuermann, S.; Buback, M.; Hesse, P.; Kuchta, F.-D.; Lacik, I.; Van Herk, A. M. Pure Appl. Chem. 2007, 79 (8), 1463–1469. 77. See, e.g., (a) Namazian, M.; Coote, M. L. J. Phys. Chem. A 2007, 111 , 7227–7232. (b) Hodgson, J. L.; Namazian, M.; Bottle, S. E.; Coote, M. L. J. Phys. Chem. A 2007, 111 , 13595–13605. (c) Namazian, M.; Zare, H. R.; Coote, M. L. Biophys. Chem. 2008, 132 , 64–68. (d) Namazian, M.; Siahrostami, S.; Coote, M. L. J. Fluorine Chem. 2008, 129 , 222–225. (e) Blinco, J. P.; Hodgson, J. L.; Morrow, B. 
J.; Walker, J. R.; Will, G. D.; Coote, M. L.; Bottle, S. E. J. Org. Chem. 2008, 73 , 6763–6771. (f) Zare, H.; Eslami, M.; Namazian, M.; Coote, M. L. J. Phys. Chem. B 2009, 113 , 8080–8085. 78. See, e.g., Ho, J.; Coote, M. L. J. Chem. Theory Comput. 2009, 5 , 295–306. 79. Levy, R. M.; Kitchen, D. B.; Blair, J. T.; Krogh-Jespersen, K. J. Phys. Chem. 1990, 94 , 4470–4476. 80. Pliego, J. R., Jr.; Riveros, J. M. J. Phys. Chem. A 2001, 105 , 7241–7247.

14

Evaluation of Nonlinear Optical Properties of Large Conjugated Molecular Systems by Long-Range-Corrected Density Functional Theory

HIDEO SEKINO and AKIHIDE MIYAZAKI
Toyohashi University of Technology, Toyohashi, Japan

JONG-WON SONG and KIMIHIKO HIRAO
Advanced Science Institute, RIKEN, Saitama, Japan

Advantages and problems of quantum chemical methods for the evaluation of nonlinear optical (NLO) properties are discussed. Density functional theory (DFT) is the best quantum chemical tool for quantitative evaluation of the properties of NLO materials that have no absorption in the response frequency region. We introduce a practical DFT method with long-range correction (LC) for this purpose. We discuss a strategy for realistic evaluation of large conjugated systems and find that the classical hypothesis, that only the π-electron system needs to be considered in conjugated molecules, is sufficient. The errors arising from this approximation are much smaller than those caused by the deficiencies of traditional DFT functionals. We examine the LC-DFT method further by comparing the length dependence of the properties of polyyne and polyene. From a comparison with rigorous ab initio correlated methods, we conclude that the LC-DFT method can calculate NLO properties successfully, without the catastrophic overestimation produced by conventional DFT functionals, and can provide basic information for the systematic fabrication of new organic NLO materials.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


14.1 INTRODUCTION

The nonlinear optical (NLO) response of materials under an intense optical electromagnetic field can arise through a variety of processes and is an important yet challenging subject in theoretical and computational materials science. However, it is evident that the nonlinear electronic response plays the most significant role, which makes rigorous quantum chemical calculations essential for molecular systems. While the importance of contributions from vibrational processes in determining the hyperpolarizabilities of small conjugated molecules has recently been highlighted,¹ the pure high-order electronic response is paramount, especially for the evaluation of hyperpolarizabilities of large conjugated systems.

Quantum chemical methods are the most reliable means of describing the electronic response in molecules quantitatively. Although no exact analytical solution is available for the Schrödinger equation of many-electron systems, advances in quantum chemical theories and computational technologies have pushed the methods to the stage where near-equilibrium molecular electronic states can be described to chemical precision. When the intensity of the incident light field is low, the electronic response arises from states whose parity difference corresponds to that of a single photon. For high-intensity incident light fields, however, two or more photons arrive at the material within a short time and interact simultaneously with the electron. The process that describes such situations must therefore involve states whose parity differences correspond to those of multiple photons, and many higher-lying states are accessed. This nonlinear optical response involves a complex analytical formalism, and several practical methodologies have been developed based on the energy or dipole response properties. We can further adapt these methods to consider the system as being initially in a single molecular state, typically the ground state. However, to describe the electronic response of extended materials quantitatively, we need knowledge of this initial state in the presence of the light field. Therefore, care must be taken to introduce extra flexibility into calculations to allow for this effect.

Large delocalized electronic systems are good candidates for NLO materials because they contain many low-lying states that can temporarily be occupied by electrons, perhaps introducing charge-transfer character to the ground state. The nonlinear response of electrons to external fields is often described using such states as intermediates. Therefore, the computational requirements for describing NLO processes are much more demanding than those for computing just the total energy of the system. Although ab initio correlated methods have been quite successful in providing chemical descriptions of molecules, they are not feasible at present for the evaluation of nonlinear response properties of large systems. Density functional theory (DFT) methods have been shown to be capable of reproducing and predicting a variety of chemical properties, such as atomization energies, bond lengths, and vibrational frequencies, while requiring much less computational effort than rigorous ab initio correlated methods.² Despite its manifold successes in predicting a wide range of chemical properties, DFT has been found to give poor results in some cases, including weakly bound systems and charge-transfer systems, as well as for the electronic response in large conjugated systems.³ The latter aspect is the subject of this chapter, in which we demonstrate how these problems can be overcome to yield effective and practical computational methods for the NLO properties of materials.

Traditionally, DFT catastrophically overestimates the rate of increase in the polarizability of a long molecule as its length increases.⁴,⁵ This well-known deficiency comes from inadequacies in the conventional exchange functionals used in DFT. Conventional exchange-correlation functionals are local and cannot represent correctly the response of the electrons at long distance. The effects are modest in small molecular systems but become nonnegligible in large molecules. Conventional exchange functionals thus fail to evaluate correctly such properties as the polarizability and hyperpolarizability of large molecules. The gradient correction for nonlocality that is commonly applied through the generalized-gradient approximation (GGA) is ineffective in relieving the problem, which instead needs to be solved as a many-body interaction involving different energy levels. Conventional hybrid methods such as B3LYP⁶ do not improve the situation either, making the search for new functionals a key focus.

A variety of approaches have been developed. The optimized effective potential (OEP) method has been advanced as a solution that seems to provide useful results,⁷ at least when it is implemented appropriately.⁸ Unfortunately, the OEP method is rather complicated in that an extra equation must be solved to obtain the optimized potential,⁹ and this equation is also technically difficult to solve. Care must be taken in the choice of auxiliary basis functions to represent the extra equation properly; in particular, the use of large basis sets can lead to a deterioration of the solution. Other methods for large-molecule applications include the Krieger–Li–Iafrate (KLI) approximation¹⁰ and the common-energy-denominator approximation (CEDA).¹¹ Unfortunately, these approximations adversely influence calculated response properties even when the ground-state energy is well represented.¹² Current density functional theory (CDFT)¹³,¹⁴ provides another alternative for the evaluation of NLO response properties. It predicts reasonable polarizabilities and hyperpolarizabilities¹⁵,¹⁶ for long molecules (except for hydrogen chains). There has also been a study of the optical properties of molecules using a many-body $f_{xc}$ kernel that yielded good polarizabilities and optical spectra.¹⁷ Although such approaches provide deep insights into the origin and evaluation of NLO properties, their implementation is also rather complicated. Their heavier computational demand also makes these methods less accessible for the large molecules that appear in nano- or biosystems.

Recently, we introduced a simple hybrid method with long-range correction (LC) that uses an Ewald partitioning technique on the electron repulsion operator to account for the nonlocal effect of long-distance interactions.¹⁸,¹⁹ The use of this method to evaluate the hyperpolarizabilities of long conjugated systems has been successful.²⁰–²⁴ In this chapter we explain briefly the basic theory for the evaluation of molecular hyperpolarizabilities and describe the LC-DFT method. We also discuss the classic π-electron-only hypothesis and its validity for conjugated systems in the context of NLO property evaluation. Finally, we discuss the effects of different types of conjugation.

14.2 NONLINEAR OPTICAL RESPONSE THEORY

The response of the electrons in matter to applied optical fields can be measured in terms of the energy $W(E)$ of the electronic system as a function of the electric field $E$ caused by the incident light:

$$W(E) = W_0 + W^{(1)}E + W^{(2)}E^2 + W^{(3)}E^3 + \cdots \tag{14.1}$$

Here, $W^{(n)}$ is the nth-order energy in the expansion with respect to the applied field. $W^{(1)}$ is related to the permanent dipole moment, and $W^{(2)}$, $W^{(3)}$, ... are related to the linear and nonlinear polarizabilities, respectively. The total energy of the electronic state in equilibrium with the optical field is well defined and can be computed by solving the time-independent Schrödinger equation. In most approximation schemes employed in quantum chemistry, the solutions are upper bounded and well behaved as the level of theory and basis set are improved. An alternative finite-field expression based on the analogous expansion of the induced dipole,

$$\mu(E) = \mu_0 + \alpha E + \beta\,\frac{E^2}{2} + \gamma\,\frac{E^3}{2\cdot 3} + \cdots \tag{14.2}$$

is also in common use. Here, the zeroth-order term, $\mu_0$, is the permanent dipole, while α is the linear polarizability and β, γ, ... are the hyperpolarizabilities. The advantage of the latter approach is that the dipole moment is a physical observable that can be compared directly with experimental results. However, evaluation of the induced dipole generally involves more elaborate computations.

The direction of the induced moment does not necessarily coincide with that of the applied field, and therefore the expansion coefficients (α, β, γ, ...) are, in fact, tensors. The key observables, the macroscopic polarization projected against the molecular orientation vectors, are obtained from the ensemble average of the microscopic polarization tensors over the time scale of resolution of the experiments. The shape of the mobile electron cloud is intimately related to the polarization tensor. While all the tensor components are needed, in principle, to evaluate the macroscopic polarization, many NLO materials consist of molecules that are elongated in one direction, so that the corresponding components of the tensors dominate. Since we focus on such a case, that of linearly extended conjugated systems, we are concerned primarily with the absolute values of the longitudinal components of α, β, γ, ... in the expansion above.

To achieve intense electric fields, optical laser beams of a specific frequency ω are used. This is modeled using the frequency-dependent Hamiltonian

$$H_{\mathrm{int}}(\omega) = \mu \cdot \tfrac{1}{2}\left(e^{+i\omega t} + e^{-i\omega t}\right)E \tag{14.3}$$

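The induced moment is observed at the frequency of the corresponding NLO process. For example, the induced moment from second-harmonic generation (SHG) is observed at the doubled frequency 2ω, that from third-harmonic generation (THG) is observed at the tripled frequency 3ω, and so on. Therefore, expressions containing only a static electric field $E$, such as Eqs. (14.1) and (14.2), are inappropriate for specific NLO processes and need to be extended as

$$
\begin{aligned}
\mu(E) ={}& \mu_0 + \alpha_0 E_0 + \alpha(-\omega;\omega)E_\omega e^{i\omega t} \\
&+ \beta_0\,\frac{E_0^2}{2} + \beta(-2\omega;\omega,\omega)\,\frac{E_\omega^2}{2}\,e^{2i\omega t} + \beta(-\omega;\omega,0)E_0E_\omega e^{i\omega t} + \cdots \\
&+ \gamma_0\,\frac{E_0^3}{2\cdot 3} + \gamma(-3\omega;\omega,\omega,\omega)\,\frac{E_\omega^3}{2\cdot 3}\,e^{3i\omega t} + \gamma(-2\omega;\omega,\omega,0)\,\frac{E_\omega^2}{2}\,E_0e^{2i\omega t} \\
&+ \gamma(-\omega;\omega,\omega,-\omega)\,\frac{E_\omega^2E_{-\omega}}{2}\,e^{i\omega t} + \cdots
\end{aligned}
\tag{14.4}
$$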

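Typically, the frequency-dependent expansion coefficients α(−ω; ω), β(−2ω; ω, ω), β(−ω; ω, 0), γ(−3ω; ω, ω, ω), γ(−2ω; ω, ω, 0), γ(−ω; ω, ω, −ω), ... are formulated in the sum-over-states (SOS) representation using time-dependent perturbation theory as

$$\alpha(-\omega;\omega) = 2P_{-\omega,\omega}\,{\sum_{k}}'\,\frac{\langle n|\mu|k\rangle\langle k|H_{\mathrm{int}}(\omega)|n\rangle}{\Omega_{kn}-\omega} \tag{14.5a}$$

$$\beta(-\omega_\sigma;\omega_1,\omega_2) = 3K(-\omega_\sigma;\omega_1,\omega_2)\,P_{-\sigma,1,2}\,{\sum_{k,l}}'\,\frac{\langle n|\mu|l\rangle\langle l|\bar{H}_{\mathrm{int}}(\omega_2)|k\rangle\langle k|H_{\mathrm{int}}(\omega_1)|n\rangle}{(\Omega_{ln}-\omega_\sigma)(\Omega_{kn}-\omega_1)} \tag{14.5b}$$

$$
\begin{aligned}
\gamma(-\omega_\sigma;\omega_1,\omega_2,\omega_3) ={}& 4K(-\omega_\sigma;\omega_1,\omega_2,\omega_3)\,P_{-\sigma,1,2,3} \\
&\cdot\Biggl[\,{\sum_{k,l,m}}'\,\frac{\langle n|\mu|m\rangle\langle m|\bar{H}_{\mathrm{int}}(\omega_3)|l\rangle\langle l|\bar{H}_{\mathrm{int}}(\omega_2)|k\rangle\langle k|H_{\mathrm{int}}(\omega_1)|n\rangle}{(\Omega_{mn}-\omega_\sigma)(\Omega_{ln}-\omega_1-\omega_2)(\Omega_{kn}-\omega_1)} \\
&\quad - {\sum_{k,l}}'\,\frac{\langle n|\mu|l\rangle\langle l|H_{\mathrm{int}}(\omega_3)|n\rangle\langle n|H_{\mathrm{int}}(\omega_2)|k\rangle\langle k|H_{\mathrm{int}}(\omega_1)|n\rangle}{(\Omega_{ln}-\omega_\sigma)(\Omega_{ln}-\omega_1)(\Omega_{kn}+\omega_2)}\Biggr]
\end{aligned}
\tag{14.5c}
$$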

Here, $P_{-\sigma,1,2,3,\ldots}$ denotes the average of all terms generated by simultaneous permutations of the frequencies $\omega_\sigma, \omega_1, \omega_2, \omega_3, \ldots$ and the corresponding operators μ, $H_{\mathrm{int}}(\omega_1)$, $H_{\mathrm{int}}(\omega_2)$, $H_{\mathrm{int}}(\omega_3)$, ..., while the primed summation $\sum'$ means a summation over all states except the initial state n. The quantity $\Omega_{kl} = \omega_k - \omega_l - \tfrac{1}{2}i\Gamma_{kl}$ is defined by the energy difference of states k and l corrected by a radiative damping factor; it is a complex number that plays an important role in resonant situations. Also, $K(-\omega_\sigma;\omega_1,\omega_2)$, $K(-\omega_\sigma;\omega_1,\omega_2,\omega_3)$, ... are numerical prefactors that depend on the NLO process of interest. The prefactors are typically chosen so that the expressions corresponding to different NLO processes give an identical hyperpolarizability value in the zero-frequency limit. However, care must be taken when the theoretical values thus evaluated are compared with experimental values, since ensemble averaging of the microscopic tensor components contributes to the observable differently for each experimental setting.

As is seen in the equations above, the properties are expressed as a summation of products of transition moments between ground and intermediate excited states, divided by energy denominators. In the nonresonant situation, the latter are the energy differences between the electronic states shifted by multiples of the applied frequency, ω, 2ω, 3ω, and so on. In the resonant-frequency region, the damping factors $\Gamma_{kl}$ play an important role in determining the lifetime and lineshape, but in this chapter we are concerned primarily with NLO properties in the nonresonant region, where no significant absorption occurs. However, it should be noted that the effects of dispersion on nonlinear processes are enhanced compared with those in a linear process because of the multipliers in the denominators. When the energy difference between the electronic states (the excitation energy) approaches the doubled or tripled frequency of the applied field, the dispersion becomes nontrivial.

While the SOS representation provides a wealth of information concerning the NLO process of interest, it involves an infinite number of intermediate states, whose evaluation is impossible in practice. Unfortunately, truncation of the intermediate states is not a successful strategy because the expansion is poorly convergent.²⁵ Of course, it is possible to compute the dynamic properties by directly solving the perturbed equation of the appropriate order for the corresponding NLO process at a given frequency, and frequency-dependent NLO properties have been evaluated by the time-dependent coupled Hartree–Fock (TDCHF) method.²⁶ An LC-DFT implementation of such an algorithm for NLO property evaluation is in progress.²⁰ Here, we compute the hyperpolarizabilities of long conjugated molecules at zero frequency in order to evaluate their dependence on the length of the molecule.
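At zero frequency, the coefficients of Eq. (14.2) can be recovered by numerical differentiation of the dipole computed at a few static field strengths. The following Python sketch illustrates the idea on a hypothetical two-state model rather than on a real molecule; the function names, the model parameters (`mu01`, `gap`), and the field step `h` are assumptions chosen for illustration and are not taken from this chapter. For this model the dipole is known in closed form, and its series expansion gives α = 2μ01²/Δ (the static two-state SOS result) and γ = −24μ01⁴/Δ³, which provides a check on the finite-difference formulas.

```python
import math


def two_state_dipole(field, mu01=1.0, gap=0.5):
    """Induced dipole of a hypothetical two-state model (atomic units).

    The ground-state energy is W(E) = gap/2 - sqrt(gap**2/4 + (mu01*E)**2),
    and the dipole is mu(E) = -dW/dE.
    """
    return mu01 ** 2 * field / math.sqrt(0.25 * gap ** 2 + (mu01 * field) ** 2)


def finite_field_coefficients(dipole, h=1e-3):
    """Estimate mu0, alpha, beta, gamma of Eq. (14.2) by central differences.

    From Eq. (14.2): alpha = mu'(0), beta = mu''(0), gamma = mu'''(0).
    `dipole` is any callable returning the dipole at a given static field.
    """
    m0 = dipole(0.0)
    mp1, mm1 = dipole(h), dipole(-h)
    mp2, mm2 = dipole(2.0 * h), dipole(-2.0 * h)

    mu0 = m0
    # Five-point first derivative (error of order h**4).
    alpha = (8.0 * (mp1 - mm1) - (mp2 - mm2)) / (12.0 * h)
    # Three-point second derivative (error of order h**2).
    beta = (mp1 - 2.0 * m0 + mm1) / h ** 2
    # Five-point third derivative (error of order h**2).
    gamma = (mp2 - 2.0 * mp1 + 2.0 * mm1 - mm2) / (2.0 * h ** 3)
    return mu0, alpha, beta, gamma


if __name__ == "__main__":
    mu01, gap = 1.0, 0.5
    mu0, alpha, beta, gamma = finite_field_coefficients(two_state_dipole)
    print("alpha:", alpha, "analytic:", 2.0 * mu01 ** 2 / gap)
    print("beta: ", beta, "analytic: 0.0 (by symmetry)")
    print("gamma:", gamma, "analytic:", -24.0 * mu01 ** 4 / gap ** 3)
```

In a real application the callable `dipole` would be replaced by a quantum chemical calculation at each field strength, and the step `h` has to balance truncation error (step too large) against numerical noise in the computed dipoles (step too small).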
In the zero-frequency limit, we can use finite-field techniques based on Eq. (14.1), and therefore almost all quantum chemical methods can be employed. While a property evaluated in the zero-frequency limit may be quite different from that observed at a specific frequency in a particular experimental setting, this approach provides much information concerning NLO materials. Below we explain our recently developed hybrid DFT method, which introduces a range-dependent partition of the Coulomb interaction known as the range-separated hybrid (RSH) scheme.

14.3 LONG-RANGE-CORRECTED DENSITY FUNCTIONAL THEORY

As explained in the introduction, DFT is the most appropriate quantum chemical method for large molecular systems such as long conjugated molecules but suffers from a few pertinent problems. To correct for the long-range deficiencies of traditional exchange functionals, a partitioning technique is introduced. Following the original idea of Savin,27 we partition the Coulomb interaction into short- and long-range parts using the error function

$$\frac{1}{r_{12}} = \frac{1 - \operatorname{erf}(\mu r_{12})}{r_{12}} + \frac{\operatorname{erf}(\mu r_{12})}{r_{12}} \tag{14.6}$$

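where μ is a parameter that determines the ratio of the partition. The short-range exchange energy Exsr is computed by modifying the usual exchange energy expression from

$$E_x = -\frac{1}{2}\sum_\sigma \int \rho_\sigma^{4/3} K_\sigma \, d^3R \tag{14.7}$$

into

$$E_x^{\mathrm{sr}} = -\frac{1}{2}\sum_\sigma \int \rho_\sigma^{4/3} K_\sigma \left\{ 1 - \frac{8}{3} a_\sigma \left[ \sqrt{\pi}\, \operatorname{erf}\!\left(\frac{1}{2a_\sigma}\right) + 2a_\sigma (b_\sigma - c_\sigma) \right] \right\} d^3R \tag{14.8}$$

where aσ, bσ, and cσ are

$$a_\sigma = \frac{\mu K_\sigma^{1/2}}{6\sqrt{\pi}\,\rho_\sigma^{1/3}} \tag{14.9}$$

$$b_\sigma = \exp\!\left(-\frac{1}{4a_\sigma^2}\right) - 1 \tag{14.10}$$

$$c_\sigma = 2a_\sigma^2 b_\sigma + \frac{1}{2} \tag{14.11}$$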

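and Kσ is called the enhancement factor. The use of Kσ allows the modification of GGA functionals. The long-range part of the exchange energy Exlr is evaluated using Hartree–Fock (HF) exchange integrals as

$$E_x^{\mathrm{lr}} = -\sum_i^{\mathrm{occ}} \sum_j^{\mathrm{occ}} (ij|ji)_{\mathrm{lr}} \tag{14.12}$$

and

$$(pq|rs)_{\mathrm{lr}} = \left( \psi_p \psi_q \left| \frac{\operatorname{erf}(\mu r_{12})}{r_{12}} \right| \psi_r \psi_s \right) \tag{14.13}$$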

where ψiσ is the ith molecular orbital (MO). In contrast to density partitioning schemes such as B3LYP, the proportion of the nonlocal HF contribution varies according to the range of the interaction in the present LC scheme. The ratio of the nonlocal HF part to the local DFT part becomes larger at greater distances, thus including the nonlocal effect more efficiently. In all the DFT calculations using the LC scheme, the Becke exchange and one-parameter progressive correlation functional (BOP) is used with a parameter of μ = 0.47 (ref. 28), except for one example discussed in Section 14.4.1, and all the calculations are performed using the development version of GAUSSIAN03.29
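The partition of Eq. (14.6) and the short-range attenuation built from Eqs. (14.8) to (14.11) can be checked numerically. The following is a minimal sketch under stated assumptions, not the implementation used in this work: the function names are hypothetical, the attenuation factor is the expression in braces in Eq. (14.8) as read from the definitions of aσ, bσ, and cσ, and the density and enhancement-factor values in the usage lines are illustrative inputs only.

```python
import math


def coulomb_partition(r12, mu):
    """Split 1/r12 into short- and long-range parts as in Eq. (14.6)."""
    long_range = math.erf(mu * r12) / r12
    short_range = (1.0 - math.erf(mu * r12)) / r12
    return short_range, long_range


def a_sigma(mu, k_sigma, rho_sigma):
    """Dimensionless parameter a_sigma of Eq. (14.9)."""
    return mu * math.sqrt(k_sigma) / (
        6.0 * math.sqrt(math.pi) * rho_sigma ** (1.0 / 3.0)
    )


def sr_attenuation(a):
    """Factor multiplying rho^(4/3) K in the short-range exchange, Eq. (14.8)."""
    b = math.exp(-1.0 / (4.0 * a * a)) - 1.0   # Eq. (14.10)
    c = 2.0 * a * a * b + 0.5                  # Eq. (14.11)
    bracket = math.sqrt(math.pi) * math.erf(1.0 / (2.0 * a)) + 2.0 * a * (b - c)
    return 1.0 - (8.0 / 3.0) * a * bracket


# Usage with illustrative (assumed) inputs: mu = 0.47, K = 1.86, rho = 0.05 a.u.
sr_part, lr_part = coulomb_partition(2.0, 0.47)
total = sr_part + lr_part                      # recovers 1/r12 = 0.5
a_example = a_sigma(0.47, 1.86, 0.05)
f_example = sr_attenuation(a_example)          # between 0 and 1
```

The factor tends to 1 as aσ goes to zero (μ → 0, pure DFT exchange) and decays toward zero for large aσ, which is the behavior expected when the long-range part is handed over to HF exchange.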


14.4 EVALUATION OF HYPERPOLARIZABILITY FOR LONG CONJUGATED SYSTEMS

14.4.1 Examination of the Classical Hypothesis: The Role of π-Conjugation in Determining NLO Properties

Chemists have long treated conjugated molecules as a class apart from other hydrocarbons because of their distinctive reactivities and spectroscopic properties; these molecules clearly show enhanced responses to light irradiation. This sensitivity has been attributed to their mobile π-electrons, which can move freely along the conjugation pathway in the system. Since the early theoretical development of spectroscopic quantum chemistry, it has been recognized that it is a good approximation to ignore the effects of the many σ-electrons in a large molecule and to focus purely on the contribution from the far fewer π-electrons, which simplifies the computation enormously. With modern software and hardware, such ideas may seem to have become obsolete. However, as is evident from the SOS representation of NLO properties given above, π-electrons do play a role of paramount importance in the nonlinear optical process. An important question is therefore the quantitative reliability of the π-electron approximation in practical NLO applications.

We show in Table 14.1 the longitudinal polarizabilities of polyenes of different lengths evaluated using the π-electron approximation together with those obtained using all electrons. All properties are evaluated by a finite-field method using energies computed by the HF, BLYP, B3LYP, and LC-BLYP (μ = 0.33)19 methods as a function of the applied field; for the π-electron approximation, the finite field is applied only to the π-space of the Hamiltonian. There is a systematic difference in the absolute values of the polarizabilities: the π-electron approximation significantly underestimates the property. This comes from the omission of the σ-electron response, with the error increasing as the size of the system increases. However, the neglected contribution does not increase with length as much as the contribution from the π-electron part. Consequently, for the longer polyenes, the relative error of the approximation becomes more acceptable. Indeed, for the longer molecules, the variation in the computed value among computational methods significantly exceeds the error introduced by the π-electron approximation. It is interesting to note that, for C20H22, even the error caused by a crude representation of the space using STO-3G (921 for total and 858 for π-only compared with 1253 and 1147 of 6-31G LC-BLYP) seems to be similar to or even less than that arising from the deficiency of a conventional DFT functional (1633 for total and 1603 for π-only compared with 2046 and 1995 of 6-31G BLYP).

TABLE 14.1 Longitudinal Polarizabilities α (a.u.) of Polyenes Computed by the HF, BLYP, B3LYP, and LC-BLYP Methods Using the 6-31G Basis Set

                        HF       BLYP     B3LYP    LC-BLYP
Ethylene    Total      33.66    30.90    26.84     31.06
            π only     21.94    17.31    18.27     17.96
Butadiene   Total      80.91    78.09    70.32     75.08
            π only     63.33    58.81    59.63     55.15
C20H22      Total      1328     2046     1609      1253
            π only     1225     1995     1548      1147

In Table 14.2 we summarize the longitudinal second hyperpolarizabilities of C20H22. For this molecule, the π-electron approximation results in an overestimation, indicating a more complicated mechanism for this NLO process than for the linear response process, even in the interplay between σ- and π-electrons responding to the applied field. The error in the π-electron approximation remains less than the variation among computational methods, however.

14.4.2 Double- and Triple-Bonded Systems

We calculate the second hyperpolarizabilities (γ) of polyynes and polyenes to examine the NLO properties of different conjugated systems using DFT, HF, and ab initio electron correlation methods such as Møller–Plesset MP2, MP3, and MP4(SDQ) theory30,31 and coupled-cluster CCSD and CCSD(T) theory.32,33 For the geometries of the polyynes H—(C≡C)n—H, a single (C—C) bond length of 1.3650 Å and a triple (C≡C) bond length of 1.2050 Å are used, taken from the averaged experimental values obtained from x-ray diffraction data of the i-Pr3Si—(C≡C)n—Si-i-Pr3 (n = 4, 5, 6, and 8) molecules.34 For the polyenes H—(HC=CH)n—H, we used the geometries obtained from B3LYP/6-311G geometry optimizations.4 In all calculations, the cc-pVDZ basis set35 is used. Hyperpolarizabilities γ are computed by the finite-field (FF) method using Eq. (14.1) with numerical Romberg iteration.36

Figure 14.1 and Table 14.3 show, respectively, the γ-values of polyynes obtained using DFT and several wavefunction methods. As reported by other researchers,3 the pure functional [BOP (B88 exchange37 and the one-parameter progressive correlation functional38)] and the hybrid functional, B3LYP,39,40 which do not have long-range correction, overestimate γ-values. The tendency becomes more pronounced as the chain length, n, increases. The LC-DFT (LC-BOP) functional provides γ-values reasonably close to those from CCSD and CCSD(T). On the other hand, HF shows the lowest value and MP2 shows the highest value among the wavefunction methods.

TABLE 14.2 Longitudinal Second Hyperpolarizabilities γ (10^7 a.u.) of C20H22 Computed Using the 6-31G Basis Set^a

            HF          BLYP        B3LYP       LC-BLYP
Total       2.0 (2.0)   5.8 (5.6)   5.6 (5.5)   2.8 (3.1)
π only      2.3         6.6         6.4         3.2

^a The values in parentheses were obtained using cc-pVDZ.

Fig. 14.1 (color online) Longitudinal second hyperpolarizabilities (γ) of the polyynes, H—(C≡C)n—H.

TABLE 14.3 Calculated Longitudinal Second Hyperpolarizabilities (γ/a.u.) of the Polyynes, H—(C≡C)n—H^a

n            1         2         3         4         5         6         7         8
          (×10^2)   (×10^3)   (×10^4)   (×10^5)   (×10^5)   (×10^6)   (×10^6)   (×10^6)
BOP         6.04      6.84      4.86      2.14      7.13      2.04      4.56      9.91
B3LYP       5.11      7.11      5.03      2.14      6.79      1.76      3.91      7.74
LC-BOP      4.96      8.32      5.42      1.96      5.40      1.20      2.28      3.86
HF          2.42      6.63      4.20      1.50      3.87      0.81      1.42      2.31
MP2         6.10     10.8       6.86      2.56      6.97      1.53      2.84      4.81
MP3         5.84      9.16      5.50      1.94      5.00      1.04      1.87      3.03
MP4         5.32      9.44      5.77      2.07      5.39      1.14      2.05      3.36
CCSD        5.74      9.91      5.82      2.03      5.27      1.10      1.95      3.16
CCSD(T)     5.11     10.2       6.45      2.32      6.29      1.36      2.51      4.17

^a The numbers in the first row are the unit number n.

Figure 14.2 and Table 14.4 show, respectively, the γ-values of polyenes obtained with the DFT and wavefunction methods. Although the complete set of γ-values as a function of chain length n is not presented, key features can be identified. Similar to the results obtained for the polyynes, MP2 predicts the highest and HF the lowest values among the wavefunction methods. The conventional functionals (BOP and B3LYP) also predict large values, while the LC-DFT (LC-BOP) functional again predicts γ-values surprisingly close to those from CCSD and CCSD(T). Over the entire range of the polyynes and the polyenes, MP2 predicts the largest γ-values of all the methods except the conventional DFT methods, whose hyperpolarizabilities diverge gradually as the chain length increases.
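The finite-field evaluation of γ with Romberg iteration used for these results can be illustrated with a short sketch. It assumes the usual field expansion E(F) = E0 − μF − (1/2)αF² − (1/6)βF³ − (1/24)γF⁴ − ... for Eq. (14.1), which is not reproduced in this section, and it replaces the quantum chemical energy evaluation by a hypothetical model function; all names and numerical values are illustrative.

```python
def gamma_finite_field(energy, field):
    """Central-difference estimate of gamma = -d^4 E / dF^4 at F = 0."""
    e0 = energy(0.0)
    e1 = energy(field) + energy(-field)
    e2 = energy(2.0 * field) + energy(-2.0 * field)
    return -(e2 - 4.0 * e1 + 6.0 * e0) / field ** 4


def romberg_gamma(energy, base_field, levels=3):
    """Romberg (Richardson) refinement using field strengths base_field * 2**k."""
    table = [[gamma_finite_field(energy, base_field * 2 ** k)]
             for k in range(levels)]
    for p in range(1, levels):
        for k in range(levels - p):
            # Combine estimates at h and 2h to cancel the h^(2p) error term.
            refined = (4 ** p * table[k][p - 1] - table[k + 1][p - 1]) / (4 ** p - 1)
            table[k].append(refined)
    return table[0][levels - 1]


def model_energy(field, e0=-1.0, alpha=100.0, gamma=1.0e5, sixth=1.0e9):
    """Hypothetical E(F) for a centrosymmetric molecule (odd terms vanish)."""
    return (e0 - alpha * field ** 2 / 2.0
            - gamma * field ** 4 / 24.0
            - sixth * field ** 6 / 720.0)


raw = gamma_finite_field(model_energy, 0.001)          # contaminated by the F^6 term
refined = romberg_gamma(model_energy, 0.001, levels=3)  # contamination removed
```

In a real calculation each call to `energy` is a full electronic structure calculation in the presence of the field, so the number of field strengths is kept small and the base field is chosen to balance truncation error against numerical noise in the energies.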

Fig. 14.2 (color online) Longitudinal second hyperpolarizabilities (γ) of the polyenes, H—(HC=CH)n—H.

TABLE 14.4 Longitudinal Second Hyperpolarizabilities (γ/a.u.) of the Polyenes, H—(HC=CH)n—H^a

n            2         3         4         5         6         7         8         9         10
          (×10^4)   (×10^5)   (×10^5)   (×10^6)   (×10^6)   (×10^6)   (×10^7)   (×10^7)   (×10^7)
BOP         0.31      0.77      3.43      1.12      3.06      7.27      1.58      3.11      5.75
B3LYP       1.07      0.83      3.71      1.21      3.25      7.54      1.58      3.04      5.50
LC-BOP      1.09      1.05      4.01      1.16      2.67      5.34      0.93      1.50      2.28
HF          0.75      0.68      2.82      0.85      2.04      4.14      0.75      1.27      1.97
MP2         4.40      1.69      7.88      1.82      4.30      8.84      1.55      2.55      3.93
MP3         2.19      1.62      6.16      1.82      3.95      7.12      1.27
MP4         1.81      1.49      5.74      1.72      3.73      6.86      1.21
CCSD        1.52      1.25      4.54      1.38      2.88      5.47      0.90
CCSD(T)     1.35      1.16      4.16      1.32      2.80      5.45      0.93

^a The numbers in the first row are the unit number n.
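The chain-length dependence of tabulated γ-values such as these is commonly summarized by a power law γ = b n^c. A minimal sketch of such a fit follows; it assumes a linear regression of ln γ on ln n, which is only one possible choice, and it is checked on synthetic data with hypothetical parameters rather than on the tabulated values.

```python
import math


def fit_power_law(ns, gammas):
    """Least-squares fit of ln(gamma) = ln(b) + c * ln(n); returns (b, c)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(g) for g in gammas]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    c = sxy / sxx
    b = math.exp(y_mean - c * x_mean)
    return b, c


# Synthetic data generated from hypothetical parameters b = 1800, c = 4.1.
chain_lengths = list(range(2, 11))
synthetic_gammas = [1800.0 * n ** 4.1 for n in chain_lengths]
b_fit, c_fit = fit_power_law(chain_lengths, synthetic_gammas)
```

A logarithmic fit weights short and long chains equally, whereas a direct least-squares fit of γ itself is dominated by the longest chains, so the two procedures need not give the same b and c for real data.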

On moving from CCSD to CCSD(T), the γ-values of the polyynes change significantly, suggesting that even the CCSD(T) hyperpolarizabilities are not converged with respect to the inclusion of correlation effects (see Fig. 14.3). The calculation of γ for polyynes appears to be a challenging problem for conventional correlated methods.41–44 On the other hand, the differences between the hyperpolarizabilities calculated for the polyenes by CCSD and CCSD(T) are small, perhaps suggesting that the values for the polyenes are nearly converged.

Although direct comparison of the theoretically evaluated absolute values with experiment should be the final goal for theorists, it is well known that the absolute value of third-order hyperpolarizabilities in the condensed phase is strongly affected by intermolecular interactions.43 Some of those effects can be included conveniently through a local field correction, which assumes a continuous medium, but the large deviation of the absolute molecular hyperpolarizability values computed by rigorous ab initio methods from the third-order NLO coefficients observed for those systems suggests that the intermolecular interaction effect in those systems is paramount under the experimental settings, and much more sophisticated and/or computationally demanding methods must clearly be used to account for these effects.

Fig. 14.3 (color online) Variation in the longitudinal second hyperpolarizabilities (γ) of the polyenes, H—(HC=CH)n—H (n = 6, 7, and 8), and the polyynes, H—(C≡C)n—H (n = 6, 7, and 8) with the calculation method used: HF, MP2, MP3, MP4, CCSD, and CCSD(T).

We now use a power-law function to quantify the effect of chain length on the molecular property; simple power-law functions provide a useful tool for identifying such length dependence. We fit the calculated γ-values for polyynes n = 1 to 8 and polyenes n = 2 to 10 to the power-law function γ = b n^c, and the results are given in Table 14.5 [only n = 1 to 8 are used for MP2, MP3, MP4, CCSD, and CCSD(T)]. For both the polyynes and polyenes, the exponents c calculated by the pure and hybrid functionals exceed 5 and are very much larger than those obtained using wavefunction methods. This is consistent with the fact that the γ-values for large molecules calculated using conventional DFT are overestimated4,5,21; hence, these methods cannot provide reliable information on the length dependence of NLO properties. On the other hand, the exponents evaluated using LC-DFT are rather close to those from CCSD(T) for the polyynes. It is notable that the HF exponent for the polyenes is larger than that from other wavefunction methods, whereas that for the polyynes is smaller. The hyperpolarizability exponent c observed for the polyynes, 4.3 (ref. 34), is higher than that for the polyenes, 2.3 to 3.6 (refs. 45 and 46). Contrary to the experimental findings, all values computed for the polyenes exceed those for the polyynes.

It is well known that for a reliable comparison with experiment, vibrational NLO effects should be considered. To estimate these contributions for the polyenes and polyynes, we use RHF/6-31G calculated values44 for the ratio of vibrational γ-values (γvib) and the electronic γ-values. In Table 14.5 we also present the exponents c and prefactors b for the polyynes and polyenes calculated using this correction. While more sophisticated vibrational corrections are typically required,1,41 their explicit determination remains impractical for large molecular systems. The vibrational corrections used here do not appear to change the exponents much, but it is noticeable that in the CCSD(T) and LC-DFT methods, which are thought to predict hyperpolarizability values more reliably than other methods do, the exponents for polyynes become slightly larger than those of polyenes. This suggests that vibrational corrections may be a key to explaining why the hyperpolarizability exponent observed for the polyynes is higher than that for the polyenes. We expect that more sophisticated vibrational corrections will be able to address this problem.

TABLE 14.5 Values of b and c in the γ Power Law (γ = b n^c) for the Polyynes [H—(C≡C)n—H] and the Polyenes [H—(HC=CH)n—H]^a

Polyyne
                      γ                                 γvib^b
            b              c                  b              c
BOP         79 (±11)       5.64 (±0.066)      145 (±20)      5.29 (±0.071)
B3LYP       176 (±4)       5.14 (±0.011)      171 (±6)       5.13 (±0.018)
LC-BOP      778 (±79)      4.09 (±0.050)      620 (±49)      4.19 (±0.042)
HF          971 (±99)      3.74 (±0.050)      882 (±123)     3.78 (±0.070)
MP2         1086 (±93)     4.04 (±0.042)      937 (±89)      4.09 (±0.050)
MP3         1196 (±125)    3.77 (±0.052)      968 (±85)      3.85 (±0.051)
MP4         1186 (±117)    3.82 (±0.049)      980 (±93)      3.90 (±0.051)
CCSD        1314 (±140)    3.75 (±0.053)      1132 (±148)    3.84 (±0.052)
CCSD(T)     1142 (±119)    3.95 (±0.052)      917 (±82)      4.04 (±0.047)

Polyene
                      γ                                 γvib
            b              c                  b              c
BOP         92 (±3)        5.80 (±0.013)      102 (±12)      5.89 (±0.052)
B3LYP       142 (±4)       5.59 (±0.013)      163 (±18)      5.66 (±0.049)
LC-BOP      1812 (±67)     4.10 (±0.041)      2271 (±109)    4.14 (±0.021)
HF          881 (±86)      4.35 (±0.043)      1129 (±40)     4.38 (±0.015)
MP2         2397 (±198)    4.22 (±0.037)      2994 (±214)    4.25 (±0.032)
MP3         2158^c (±282)  4.17^c (±0.065)    3711^c (±658)  4.06^c (±0.088)
MP4         2052^c (±201)  4.17^c (±0.048)    3655^c (±468)  4.04^c (±0.063)
CCSD        2345^c (±361)  3.97^c (±0.076)    4081^c (±569)  3.85^c (±0.069)
CCSD(T)     1697^c (±225)  4.14^c (±0.066)    2950^c (±382)  4.02^c (±0.064)

^a The values in parentheses are estimates of the fitting error in each method. The cc-pVDZ basis set is used in all calculations.
^b For polyyne, we included data only for n = 1 to 7 as Ref. 44 does not give values for n = 8.
^c For polyene, MP3, MP4, CCSD, and CCSD(T) data are used only for n = 1 to 8.

Besides the vibrational effect, the geometries of the polyenes also deserve attention: the molecules used in the hyperpolarizability measurements have varying conformations,46,47 whereas all-trans —C=C— conformations are used in the calculations.48 Further, the geometries of the polyenes used in the calculations were optimized using B3LYP, a method that underestimates bond-length alternation and hence is expected to overestimate hyperpolarizabilities.6,21 For the polyynes, only one configuration is possible, but various end-group cappings were used in the experiments.45 Another possible explanation for the difference between the observed and calculated length dependences of the hyperpolarizabilities is that the longitudinal second hyperpolarizability, γzzzz, is calculated, whereas the experimental values refer to the isotropic second hyperpolarizability γ.33 Finally, we must keep in mind that the experimental γ-values are also affected by solvent effects that can significantly alter the energies of excited charge-transfer states, effects absent in the calculations.49

14.5 CONCLUSIONS

We revisited the basic response theory for NLO property evaluation of materials using time-dependent perturbation theory to present a basic strategy for the theoretical investigation of NLO materials. Although the SOS representation is intuitive and may be useful for predicting the behavior of NLO properties in the vicinity of a resonance, it is not practical for the nonresonant situations important for the NLO materials of interest. Direct evaluation of dynamic NLO properties by solving the perturbed equation at the frequency of the applied field also involves considerable computational effort. Finite-field studies of the static hyperpolarizability can provide reliable information about the NLO materials far from resonance; they are limited, however, in that they cannot provide information relating to the specific NLO process with the frequency of the applied oscillating field. Because of the deficiencies in conventional DFT functionals, these methods are not applicable to NLO studies of large conjugated molecules. We introduced a practical method that incorporates long-range corrections into conventional DFT methods. It is based on the simple idea of range-dependent partitioning of the Coulomb interaction. We find that this method provides a qualitatively correct description of the NLO properties of large molecules without requiring prohibitive computational effort.

We further investigated the validity of the π-electron approximation. This approach is found to be inadequate for evaluating the response properties of small molecules, but for larger systems the dominant terms are properly included so that the error diminishes in relative magnitude. Indeed, the error from this approximation becomes much smaller than the variation in the results associated with the choice of the computational method. These results provide an optimistic perspective for the theoretical prediction of the properties of NLO materials, since this approximation considerably reduces the computational resources required.

We also investigated the influence of different types of π-conjugation on NLO properties by contrasting polyynes with polyenes. For both systems, LC-DFT gives γ-values close to those predicted by CCSD and CCSD(T), whereas conventional DFT methods such as BOP, as well as hybrid DFT methods such as B3LYP, considerably overestimate the response. MP2 predicts the highest and HF predicts the lowest γ-values among all the wavefunction methods tested. The CCSD and CCSD(T) methods predict similar hyperpolarizabilities for the polyenes but not for the polyynes, indicating that electron correlation may not be described properly in the dense π-electron polyynes. For the power-law exponent c (from the fit γ = b n^c), LC-DFT also predicts results similar to those of CCSD(T). The theoretical prediction that hyperpolarizabilities increase much faster with increasing length for polyenes than for polyynes is inconsistent with experimental observations, however. This could arise from the differences in the chemical structures considered, solvent effects, or the approximation that the diagonal hyperpolarizability component dominates the values observed. Even though the vibrational effect considered here has only a small influence on the γ-value and the γ scaling exponent, more sophisticated vibrational treatments may resolve the inconsistency between theory and experiment.

Acknowledgments

J.-W.S. is indebted to the postdoctoral fellowship for a foreign researcher of the Japan Society for the Promotion of Science (JSPS). H.S. is grateful for support from the Next Generation Supercomputer Project, Nanoscience Program, MEXT, Japan.

REFERENCES

1. Torrent-Sucarrat, M.; Sola, M.; Duran, M.; Luis, M. J.; Kirtman, B. J. Chem. Phys. 2004, 120, 6346.
2. Koch, W.; Holthausen, M. C. A Chemist's Guide to Density Functional Theory, Wiley-VCH, New York, 2000.
3. (a) Reimers, J. R.; Cai, Z.-L.; Bilić, A.; Hush, N. S. Ann. N.Y. Acad. Sci. 2003, 1006, 235. (b) Cai, Z.-L.; Sendt, K.; Reimers, J. R. J. Chem. Phys. 2002, 117, 5543.
4. Champagne, B.; Perpète, É. A.; Jaquemin, D.; van Gisbergen, S. J. A.; Baerends, E.-J.; Soubra-Ghaoui, C.; Robins, K. A.; Kirtman, B. J. Phys. Chem. A 2000, 104, 4755.
5. Champagne, B.; Perpète, É. A.; van Gisbergen, S. J. A.; Baerends, E.-J.; Snijders, J. G.; Soubra-Ghaoui, C.; Robins, K. A.; Kirtman, B. J. Phys. Chem. A 1998, 109, 10489.
6. Stevens, P. J.; Devlin, J. F.; Chabalowski, C. F.; Frisch, M. J. J. Phys. Chem. 1994, 98, 11623.
7. Sahni, V.; Gruenebaum, J.; Perdew, J. P. Phys. Rev. 1982, B26, 4371.
8. Mori-Sánchez, P.; Wu, Q.; Yang, W. J. Chem. Phys. 2003, 119, 11001.
9. (a) Kümmel, S.; Perdew, J. P. Phys. Rev. B 2003, 68, 035103. (b) Kümmel, S.; Perdew, J. P. Phys. Rev. Lett. 2003, 90, 043004.
10. Krieger, J. B.; Li, Y.; Iafrate, G. J. Phys. Rev. 1992, A46, 5453.
11. Gritsenko, O. V.; Baerends, E. J. Phys. Rev. 2001, A64, 042506.
12. Kümmel, S.; Kronik, L.; Perdew, J. P. Phys. Rev. Lett. 2004, 93, 213002.
13. van Faassen, M.; de Boeij, P. L.; van Leeuwen, R.; Berger, J. A.; Snijders, J. G. J. Chem. Phys. 2003, 118, 1044.
14. van Faassen, M.; Jensen, L.; Berger, J. A.; de Boeij, P. L. Chem. Phys. Lett. 2004, 395, 274.
15. van Faassen, M.; de Boeij, P. L.; van Leeuwen, R.; Berger, J. A.; Snijders, J. G. Phys. Rev. Lett. 2002, 88, 186401.
16. van Faassen, M. Int. J. Mod. Phys. 2006, B20, 3419.
17. Marini, A.; Del Sole, R.; Rubio, A. In Time-Dependent Density Functional Theory, Lecture Notes in Physics, Vol. 706, Marques, M. A. L., Ullrich, C. A., Nogueira, F., Rubio, A., Burke, K., and Gross, E. K. U., Eds., Springer-Verlag, Berlin, 2006, Chap. 20.
18. Iikura, H.; Tsuneda, T.; Yanai, T.; Hirao, K. J. Chem. Phys. 2001, 115, 3540.
19. Tawada, Y.; Tsuneda, T.; Yanagisawa, S.; Yanai, T.; Hirao, K. J. Chem. Phys. 2004, 120, 8425.
20. Kamiya, M.; Sekino, H.; Tsuneda, T.; Hirao, K. J. Chem. Phys. 2005, 122, 234111.
21. Sekino, H.; Maeda, Y.; Kamiya, M.; Hirao, K. J. Chem. Phys. 2007, 126, 014107.
22. Kirtman, B.; Bonness, S.; Ramirez-Solis, A.; Champagne, B.; Matsumoto, H.; Sekino, H. J. Chem. Phys. 2008, 128, 114108.
23. Song, J.-W.; Watson, M. A.; Sekino, H.; Hirao, K. J. Chem. Phys. 2008, 129, 024117.
24. Song, J.-W.; Watson, M. A.; Sekino, H.; Hirao, K. Int. J. Quantum Chem. 2009, 109, 2012.
25. Sekino, H.; Bartlett, R. J. Theoretical and Computational Modeling of NLO and Electronic Materials, Karna, S. P., and Yeates, A. T., Eds., ACS Symposium Series, 1994, pp. 79–101.
26. Sekino, H.; Bartlett, R. J. J. Chem. Phys. 1986, 85, 976.
27. Savin, A. In Recent Developments and Applications of Modern Density Functional Theory, Seminario, J. J., Ed., Elsevier, Amsterdam, 1996, Chap. 9.
28. Song, J.-W.; Hirosawa, T.; Tsuneda, T.; Hirao, K. J. Chem. Phys. 2007, 126, 154105.
29. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; et al. Gaussian 03, Revision D.02, Gaussian Inc., Wallingford CT, 2004.
30. Sekino, H.; Maeda, Y.; Kamiya, M. Mol. Phys. (Bartlett Special Issue) 2005, 103, 2183.
31. Møller, C.; Plesset, M. S. Phys. Rev. 1934, 46, 618.
32. Bartlett, R. J.; Purvis, G. D., III. Int. J. Quantum Chem. 1978, 14, 561.
33. Pople, J. A.; Krishnan, R.; Schlegel, H. B.; Binkley, J. S. Int. J. Quantum Chem. 1978, 14, 545.
34. Eisler, S.; Slepkov, A. D.; Elliott, E.; Luu, T.; McDonald, R.; Hegmann, F. A.; Tykwinski, R. R. J. Am. Chem. Soc. 2005, 127, 2666.
35. Dunning, T. H., Jr. J. Chem. Phys. 1989, 90, 1007.
36. Jaquemin, D.; Champagne, B.; André, J.-M. Int. J. Quantum Chem. 1997, 65, 679.
37. Becke, A. D. Phys. Rev. A 1988, 38, 3098.
38. Tsuneda, T.; Suzumura, T.; Hirao, K. J. Chem. Phys. 1999, 110, 10664.
39. Lee, C.; Yang, W.; Parr, R. G. Phys. Rev. B 1988, 37, 785.
40. Becke, A. D. J. Chem. Phys. 1993, 98, 5648.
41. Torrent-Sucarrat, M.; Solá, M.; Duran, M.; Luis, J. M.; Kirtman, B. J. Chem. Phys. 2003, 118, 711.
42. Toto, J. L.; Toto, T. T.; de Melo, C. P. Chem. Phys. Lett. 1996, 104, 8586.
43. Bredas, J. L.; Adant, C.; Tackx, P.; Persoons, A.; Pierce, B. M. Chem. Rev. 1994, 94, 243.
44. Kirtman, B.; Champagne, B. Int. Rev. Phys. Chem. 1997, 16, 389.
45. Luu, T.; Elliott, E.; Slepkov, A. D.; Eisler, S.; McDonald, R.; Hegmann, F. A.; Tykwinski, R. R. Org. Lett. 2005, 7, 51.
46. Samuel, I. D. W.; Ledoux, I.; Dhenaut, C.; Zyss, J.; Fox, H. H.; Schrock, R. R.; Silbey, R. J. Science 1994, 265, 1070.
47. Craig, G. S. W.; Cohen, R. E.; Schrock, R. R.; Silbey, R. J.; Puccetti, G.; Ledoux, I.; Zyss, J. J. Am. Chem. Soc. 1993, 115, 860.
48. Rossi, G.; Chance, R. R.; Silbey, R. J. Chem. Phys. 1989, 90, 7594.
49. Ray, P. C. Chem. Phys. Lett. 2004, 395, 269.

15

Calculating the Raman and HyperRaman Spectra of Large Molecules and Molecules Interacting with Nanoparticles

NICHOLAS VALLEY
Northwestern University, Evanston, Illinois

LASSE JENSEN
Pennsylvania State University, University Park, Pennsylvania

JOCHEN AUTSCHBACH
University at Buffalo–SUNY, Buffalo, New York

GEORGE C. SCHATZ
Northwestern University, Evanston, Illinois

This chapter describes calculations of the Raman and hyperRaman spectra of large molecules and molecules interacting with nanoparticles using time-dependent density functional theory with the Amsterdam density functional (ADF) program package. The ADF code uses Slater basis functions, which provides a very efficient basis set for optical property calculations using density functional theory (DFT). In addition, ADF has special capabilities for determining resonant Raman spectra, which is enabled by the inclusion of excited-state lifetimes in the calculations, and therefore polarizabilities and polarizability derivatives for wavelengths close to resonance can be determined. Specific details of the theory are described, and examples of applications to pyridine (for nonresonant properties) and uracil (for resonant properties) are provided.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


15.1 INTRODUCTION

Raman spectroscopy is an inelastic linear light-scattering method that provides a vibrational fingerprint of a molecule. This fingerprint can be used to identify molecules, so there has been increasing interest in using Raman in analytical chemistry applications and medical diagnostics,1–6 particularly with the development of lasers and detectors that allow Raman measurements to be made over a wide range of wavelengths from the near-infrared to the ultraviolet (UV). HyperRaman spectroscopy is an analogous optical technique that involves inelastic light scattering relative to the second harmonic of the incident light, so this nonlinear optics technique also provides a vibrational fingerprint, but for an incident frequency that is only half of the frequency needed to produce the same scattered photon as in Raman scattering.7–10 In addition, the selection rules for hyperRaman are different from those for Raman, as the latter involves two photons (incident and scattered) while the former involves three photons (two incident plus one scattered). This means that vibrations that are silent in Raman can become active in hyperRaman. Both techniques are intrinsically weak processes; however, both can be amplified by placing the molecule next to a silver or gold nanoparticle, as plasmon excitation in the particle can produce enhanced electromagnetic fields near the particle surface, leading to surface-enhanced Raman and hyperRaman spectroscopies (SERS and SEHRS, respectively).11,12 In addition, enhancement can also arise if the molecule has a resonant electronic state at the excitation wavelength, leading to resonance Raman and resonance hyperRaman spectroscopy. Under favorable conditions it is possible to combine resonance and surface enhancement effects, leading to surface-enhanced resonance Raman spectroscopy (SERRS) and surface-enhanced resonance hyperRaman spectroscopy (SERHRS).13,14

Raman intensities are proportional to the square of the derivative of the polarizability of the molecule with respect to vibrational normal coordinates,15 so the calculation of Raman intensities requires a determination of the frequency-dependent polarizabilities, usually by determining the first-order response of the molecule to the applied electromagnetic field. Many electronic structure codes have the ability to produce Raman spectra in the static limit (low frequency) through analytical determination of the polarizability derivative. This works well for small molecules that do not have important electronic transitions in the visible. However, for larger systems, especially for molecules with transitions at optical frequencies, or for molecules interacting with metal particles (as in SERS), this approximation is not appropriate. In this chapter we describe calculations of Raman intensities based on the Amsterdam density functional (ADF) code,16–18 a code specifically developed to determine response properties using time-dependent density functional theory (TDDFT). The basics of density functional theory (DFT) and TDDFT are described in detail in Chapter 1. ADF and a recently developed local version of ADF have some unique features for calculating Raman, resonance Raman, and SERS intensities at finite frequencies.19–21 ADF can also determine hyperRaman intensities, but in an automated fashion only in the static limit at this point. The capability of calculating dynamic hyperpolarizabilities is available22–24 and will soon be combined with near-resonance damping functionality. In either case, ADF provides an efficient approach to studying large molecular systems due to the use of Slater orbital basis functions in the calculations. These functions mimic the slow fall-off of atomic orbitals, a property that is especially important for response properties, much better than do Gaussian orbitals. Hence, they provide a more efficient representation of the change in density that arises in response to an applied electromagnetic field. As such, ADF enables the determination of Raman intensities for a number of challenging problems,25 including studies of the resonance Raman scattering for molecules with multiple excited states,20 and the study of SERS intensities for molecules interacting with silver and gold metal clusters.26,27 In all these SERS calculations, the atoms in the molecule and in the metal cluster are described using basis sets of comparable quality and the same density functional [the same combination of exchange–correlation (XC) potential and XC response kernel]. This has the advantage of providing a completely balanced electronic structure description of the entire system, but a limitation with this approach is that the calculations are restricted to a total system size on the order of 100 to 200 atoms. To go beyond this requires methods that partition between components of the system that are described with quantum mechanics and components described using classical electrodynamics. The formal theory of such calculations was recently developed28 but has not yet been implemented.
The Raman intensity calculation begins with a determination of the harmonic frequencies and normal coordinates of vibration for the molecule of interest by using density functional theory to calculate the Hessian matrix (second derivative of the energy with respect to the nuclear positions). Diagonalization of the mass-weighted Hessian determines the vibrational frequencies, and the eigenvectors define the normal coordinates. Subsequently, the polarizabilities (second derivative of energy with respect to applied finite field) are determined from TDDFT. For the Raman intensity, the polarizability calculations are performed for geometries that are displaced from equilibrium so that the derivatives of the polarizability with respect to each normal mode vibration can be calculated by finite differencing. Both normal Raman differential cross sections and relative surface-enhanced Raman intensities can be calculated from combinations of the polarizability derivatives. This approach can also be expanded to allow for the calculation of resonance Raman spectra. HyperRaman and surface-enhanced Raman spectra can also be calculated using ADF. While the use of finite differencing may seem to be inefficient relative to the analytical evaluation of the polarizability derivatives, for large molecules one often does not want or need derivatives with respect to all the modes. Indeed, for applications in SERS, where the system of interest is a molecule plus a large metal cluster, only a small fraction of the possible modes, those referring to vibrations of the molecule, is of interest, and in any case the finite-difference procedure is trivially parallelized.


CALCULATING THE RAMAN AND HYPERRAMAN SPECTRA

In the following sections we describe the underlying theory of Raman/hyperRaman intensity calculations, and specific details of these calculations based on the ADF code.

15.2 DISPLACEMENT OF COORDINATES ALONG NORMAL MODES

To construct the Raman or hyperRaman spectrum of a computationally large molecule, it is first necessary to calculate the vibrational normal modes of an optimized local minimum structure. Details of these calculations, which involve diagonalization of the mass-weighted Hessian matrix (second derivative of energy with respect to atomic coordinates), are described in standard textbooks, so we omit the steps here. The Raman intensity is easily calculated by making the double harmonic approximation. This approximation is composed of two parts.29 First, each vibration is assumed to be described by a harmonic potential (i.e., a linear expression for the intramolecular forces). Second, the dipole moment function μ(r) is assumed to vary linearly with the normal mode coordinate r in the region where r is close to the equilibrium structure denoted by re . In ADF it is possible to calculate the energies (in wavenumbers, cm−1 ) and Cartesian displacements (in bohr) of the vibrational normal modes of a molecule by using the FREQUENCIES keyword under the GEOMETRY block. To calculate the polarizability derivatives, the components of the polarizability tensor are calculated at two structures that have been displaced in different directions along a vibrational mode. Starting with the equilibrium geometry, the coordinates Req,i of each atom are changed by a small amount ±sR Rk,i where Rk,i is the Cartesian displacement of the ith coordinate in the kth vibrational normal mode and sR is the step size. Ideally, sR should be mode specific, such that a more shallow potential (low harmonic frequency) should be treated with a somewhat larger displacement.30 The sR should be chosen so that the norm (root of the sum of squares) of sR Rk,i for each k is on the order of a few hundredths of a bohr.31 If the sR is too large, the double harmonic approximation breaks down, while if it is too small, there will not be an appreciable change in the polarizability tensor. 
Both cases will lead to errors in the polarizability derivatives and thus the calculated Raman intensities. Once a suitable sR has been chosen, the equilibrium coordinates are displaced to obtain two sets of coordinates. The set created by using Req,i − sR Rk,i will be denoted as the minus structure, and the set created by using Req,i + sR Rk,i will be denoted as the plus structure. Polarizability derivatives are then calculated by finite differencing.

15.3 CALCULATION OF POLARIZABILITIES USING TDDFT

Polarizabilities can be calculated using time-dependent DFT (TDDFT) response theory. In the ADF program, this functionality can be reached by specifying


the input “block” keyword RESPONSE or AORESPONSE [conveniently, also via the graphical user interface (GUI)]. ADF input files consist of a list of keywords (e.g., BASIS, ATOMS, GEOMETRY) which provide the program with specifics of the chemical system (e.g., charge, atomic positions), type of calculation desired (e.g., geometry optimization, Hessian matrix diagonalization), and specifics of the calculation (e.g., basis set, level of theory). Many keywords have specific options that can be enumerated on lines following the keyword forming a block which is ended with the line END. For more details on using ADF, refer to the documentation, including a user guide and input examples, available at http://www.scm.com. The RESPONSE keyword triggers the original implementation of TDDFT response theory by van Gisbergen et al.,16,22 which is capable of using symmetry. AORESPONSE triggers a more recently developed code32,33 that offers additional functionality, such as the near-resonance dynamic response capability,19,34 or enhanced analysis features,23,24 but lacks symmetry. Both blocks allow calculation of frequency-dependent polarizabilities, but the AORESPONSE block is needed to calculate the resonance Raman spectra. For the examples in this chapter the RESPONSE key was used to calculate hyperpolarizabilities from which hyperRaman spectra can be predicted. In our explanations of how to calculate Raman spectra, use of the AORESPONSE block will be assumed. Any specifics for calculating hyperRaman spectra will assume use of the RESPONSE block. In an upcoming version of the program the hyperRaman and resonance hyperRaman functionality will be combined with the AORESPONSE functionality. 
An example of an input (more example inputs can be found in the supporting information) to calculate the static polarizability tensor of a displaced structure of pyridine using the AORESPONSE block is as follows:

    BASIS
      C /share/apps/adf2007.01/atomicdata/ET/DIFFUSE/ET-QZ3P-polar/C
      H /share/apps/adf2007.01/atomicdata/ET/DIFFUSE/ET-QZ3P-polar/H
      N /share/apps/adf2007.01/atomicdata/ET/DIFFUSE/ET-QZ3P-polar/N
    END

    DEPENDENCY

    XC
      model SAOP
    END

    ATOMS
      N    0.000000    0.000002    0.043787
      C    0.000000   -0.000002    2.855245
      C    0.000000    1.197479    2.143113
      C    0.000000   -1.197480    2.143111
      C    0.000000    1.141525    0.748759
      C    0.000000   -1.141523    0.748756
      H    0.000000    2.064563    0.165373
      H    0.000000   -2.064560    0.165368
      H    0.000000    2.159648    2.654667
      H    0.000000   -2.159650    2.654661
      H    0.000000   -0.000003    3.944999
    END

    SYMMETRY NOSYM

    AORESPONSE
      ALDA
    END

    END INPUT

The real parts of the nine Cartesian components of the frequency-dependent polarizability tensor of an input structure are calculated if the AORESPONSE block is included in the input. When an external potential $\nu_{\mathrm{ext},i}(\mathbf{r}, t) = E r_i \cos \omega t$ is applied to a molecule, the components of the polarizability, $\alpha_{ij}(\omega)$, can be determined from the change in the electron density using

$$\alpha_{ij}(\omega) = -\int d^3r\, \rho_i^{(1)}(\mathbf{r}, \omega)\, r_j \tag{15.1}$$

where i and j are the Cartesian directions and $\rho_i^{(1)}(\mathbf{r}, \omega)$ is the linear change in density (linear response) due to the external potential.35 In TDDFT, the density change is found using the linear response function of the noninteracting Kohn–Sham system, $\chi_s(\mathbf{r}, \mathbf{r}', \omega)$, and the linear change in the effective potential, $\nu^{(1)}_{\mathrm{eff}}(\mathbf{r}, \omega)$, with the relation35

$$\rho^{(1)}(\mathbf{r}, \omega) = \int d\mathbf{r}'\, \chi_s(\mathbf{r}, \mathbf{r}', \omega)\, \nu^{(1)}_{\mathrm{eff}}(\mathbf{r}', \omega) \tag{15.2}$$

where for the potential given above, the external field part of the linear perturbation operator ($\nu^{(1)}_{\mathrm{ext}}$ below) is obtained through division by the field amplitude. In the absence of finite-lifetime or other damping terms, the expression for the Kohn–Sham response function, constructed from the occupied and virtual Kohn–Sham orbitals ($\varphi$), energies ($\varepsilon$), and occupation numbers ($n$), is35

$$\chi_s(\mathbf{r}, \mathbf{r}', \omega) = \sum_i^{\mathrm{occ.}} \sum_m^{\mathrm{virt.}} n_i\, \varphi_i(\mathbf{r}) \varphi_m(\mathbf{r}) \varphi_m(\mathbf{r}') \varphi_i(\mathbf{r}') \times \left[ \frac{1}{(\varepsilon_i - \varepsilon_m) + \omega} + \frac{1}{(\varepsilon_i - \varepsilon_m) - \omega} \right] \tag{15.3}$$

When adopting the finite lifetime damping technique, the frequencies are formally replaced by ω → ω + iγ, where γ is a common damping parameter, and thus the response function as well as the linear density response become complex. This allows calculation of both the real and imaginary parts of the polarizability. The change in effective potential is35

$$\nu^{(1)}_{\mathrm{eff}}(\mathbf{r}, \omega) = \nu^{(1)}_{\mathrm{ext}}(\mathbf{r}, \omega) + \int d\mathbf{r}'\, \frac{\rho^{(1)}(\mathbf{r}', \omega)}{|\mathbf{r} - \mathbf{r}'|} + \int d\mathbf{r}'\, f_{xc}(\mathbf{r}, \mathbf{r}', \omega)\, \rho^{(1)}(\mathbf{r}', \omega) \tag{15.4}$$

and contains terms for the external field as noted above, the linear response of the Coulomb potential, and the linear response of the exchange-correlation potential; $f_{xc}$ is called the exchange-correlation kernel. The change in the effective potential is constructed in such a way that it will result in the correct change in density for the fully interacting system even though the noninteracting response function is being used, assuming that one would know the exact expression for the XC kernel. Of course, in practice, this is the term that gets approximated. In most cases an adiabatic approximation is used (i.e., one uses a frequency-independent $f_{xc}$, which neglects all memory effects). With the adiabatic approximation, XC kernels can be obtained simply by taking functional derivatives of the XC potential used for the ground-state calculation, based on popular functionals such as VWN, LYP, BP86, B3LYP, and PBE0. It is particularly efficient to use an XC kernel based on a local-density approximation (LDA) such as the VWN or Xalpha functional (ALDA keyword in AORESPONSE block, default in RESPONSE). Used in the examples, the adiabatic LDA (ALDA) exchange-correlation kernel $f_{xc}$ is local in space and time.35 With a hybrid functional the kernel contains some nonlocal Hartree–Fock exchange. An implementation based on ADF’s Slater-type basis and density-fitting approach has been reported by Ye et al.23

The last two terms in the expression for the change in effective potential are dependent on the change in the density. Calculation of the density change must therefore be done in a self-consistent manner. The initial density change is calculated using $\nu^{(1)}_{\mathrm{eff}} = \nu^{(1)}_{\mathrm{ext}}$. Then the new effective field is determined using the updated density change. A new density change is calculated using the new effective field and the cycle continues until the change in the density change is below a set threshold.
As in other self-consistent field codes, the iterations incorporate procedures to accelerate and stabilize the solution such that convergence is virtually guaranteed.36,37 The number of iterations and the convergence threshold can be set in the SCF block. With the change in density converged, the polarizability components are calculated. Similar procedures are adopted for calculating electric hyperpolarizabilities; see articles by van Gisbergen et al.16,22 and Ye et al.23,24 for further details regarding implementations in the ADF package and benchmark data. To calculate Raman spectra for nonresonant molecules where the frequency dependence of the polarizability derivatives is weak, it is often sufficient to calculate the static polarizabilities (polarizabilities at zero frequency: ω = 0) and


include a nonzero frequency only when calculating cross sections.38,39 (There are prefactors in the cross sections that cause the cross sections to be zero at zero frequency, so a finite frequency estimate of the cross sections requires inputting a frequency other than zero.) If the frequency-dependent polarizabilities are desired, the FREQUENCY keyword can be put into AORESPONSE, followed by a list of frequencies and the units (EV, HARTREE, or ANGSTROM) used for the frequencies. The necessary components of the static hyperpolarizability are obtained by adding the ALLCOMPONENTS and HYPERPOL keywords to the RESPONSE block. To obtain frequency-dependent hyperpolarizabilities, the DYNAHYP keyword must also be added to the “block” and the frequency in hartrees must be specified after the HYPERPOL keyword. Use of the DEPENDENCY keyword in the input as well as specifying SYMMETRY NOSYM is suggested for both types of calculations.
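The self-consistent cycle and the damping substitution ω → ω + iγ described in this section can be illustrated with a deliberately small toy model. The sketch below is not ADF code: it replaces the spatial functions by single numbers, uses one occupied–virtual pair, and all names (`ks_frequency_factor`, `self_consistent_response`, the `mix` parameter, and the numerical values) are hypothetical choices made for illustration.

```python
def ks_frequency_factor(omega, gap, gamma=0.0):
    """Bracketed frequency term of the Kohn-Sham response function
    for one occupied-virtual pair, with gap = eps_m - eps_i > 0.

    A nonzero gamma applies the substitution omega -> omega + i*gamma.
    """
    w = complex(omega, gamma)
    return 1.0 / (-gap + w) + 1.0 / (-gap - w)


def self_consistent_response(chi, v_ext, kernel, tol=1e-10, max_iter=200, mix=0.5):
    """Iterate rho = chi * (v_ext + kernel * rho) for a scalar toy model.

    chi    : scalar stand-in for the noninteracting response function
    kernel : scalar stand-in for the Coulomb plus XC kernel
    mix    : fraction of the new density response used in each update
             (an assumed, simple stabilization scheme)
    """
    rho = chi * v_ext                          # first guess: nu_eff = nu_ext
    for iteration in range(1, max_iter + 1):
        v_eff = v_ext + kernel * rho           # updated effective potential
        rho_new = chi * v_eff                  # updated density response
        if abs(rho_new - rho) < tol:           # change in the density change
            return rho_new, iteration
        rho = (1.0 - mix) * rho + mix * rho_new
    raise RuntimeError("density response did not converge")


# Illustrative numbers only (atomic units assumed).
gap, strength, kernel = 0.2, 0.05, 1.0
chi_static = strength * ks_frequency_factor(0.0, gap)
rho, n_iter = self_consistent_response(chi_static, 1.0, kernel)
print("converged response:", rho, "after", n_iter, "iterations")
print("closed form       :", chi_static / (1.0 - chi_static * kernel))

# On resonance the undamped factor divides by zero; with damping it is
# finite and acquires a large imaginary part.
print("damped factor at resonance:", ks_frequency_factor(gap, gap, gamma=0.004))
```

For this scalar model the converged value can be checked against the closed form χ ν_ext / (1 − χK); in a real calculation the same loop runs over fitted densities and no closed form is available.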

15.4 DERIVATIVES OF THE POLARIZABILITIES WITH RESPECT TO NORMAL MODES

With either the static or frequency-dependent polarizabilities at hand for both the plus and minus structures for a normal mode, the polarizability derivatives can be calculated using the quotient of the change in polarizability and twice the normal-mode step size. This step size, sQk , is different from the step size sR used earlier to make the displaced structures, and must be calculated separately for each normal mode. Note that this is not contained in ADF. The two step sizes are related by the equation

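$$s_{Q_k} = \frac{s_R}{R^{\mathrm{norm}} / Q^{\mathrm{norm}}}, \qquad R^{\mathrm{norm}} = \sqrt{\sum_i^{3N} R_i^2}, \qquad Q^{\mathrm{norm}} = \sqrt{\sum_i^{3N} \left( R_i \sqrt{m_i} \right)^2} \tag{15.5}$$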

where $m_i$ is the mass of the atom being displaced by $R_i$, and $Q^{\mathrm{norm}}$ is the square root of the sum of the squares of the mass-weighted displacements.31 The coordinates were displaced both backward and forward along the vibration, so the change in the polarizabilities must be divided by twice the $s_{Q_k}$ step size. The polarizability derivatives, $\alpha'_{ij}$, are therefore given as

$$\alpha'_{ij} = \frac{\alpha_{ij}(\mathrm{plus}) - \alpha_{ij}(\mathrm{minus})}{2 s_{Q_k}} \tag{15.6}$$

Polarizabilities in ADF are reported as polarizability volumes in atomic units and so have units of cubic bohr. By calculating sQk using the displacements in bohr and the masses in atomic mass units, the polarizability derivatives will have units of square bohr per square root of amu. Hyperpolarizabilities are also given in atomic units (quintic bohr per electron charge), which can be converted to quintic angstroms per electrostatic unit and then to quartic angstroms per statvolt.40 The


components of the polarizability are also given with respect to a molecule fixed coordinate frame. Results for the specific components will therefore vary if the molecule coordinates are transformed with respect to this frame. Although there are times when the molecular orientation is important, most manipulations to produce spectra are invariant to orientation as they involve orientation averaging.
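The displacement and finite-difference steps of Sections 15.2 and 15.4 can be sketched in a few lines. This is a minimal illustration rather than the procedure used by any particular script distributed with ADF. It assumes that the Cartesian mode vector is normalized to unit length before use, so that the step `step` is the Cartesian norm of the displacement, and that `masses` holds one entry per Cartesian coordinate (each atomic mass repeated three times). All function names are hypothetical.

```python
import math


def unit_mode(mode):
    """Normalize a Cartesian mode vector (length 3N) to unit length."""
    norm = math.sqrt(sum(d * d for d in mode))
    return [d / norm for d in mode]


def displaced_geometries(coords, mode, step):
    """Return the (minus, plus) structures R_eq -/+ step * unit mode (bohr)."""
    u = unit_mode(mode)
    minus = [x - step * d for x, d in zip(coords, u)]
    plus = [x + step * d for x, d in zip(coords, u)]
    return minus, plus


def normal_coordinate_step(mode, masses, step):
    """Step in the mass-weighted normal coordinate, in bohr * sqrt(amu).

    Equal to the norm of the mass-weighted Cartesian displacement.
    """
    u = unit_mode(mode)
    return step * math.sqrt(sum(m * d * d for m, d in zip(masses, u)))


def polarizability_derivative(alpha_plus, alpha_minus, s_q):
    """Central finite difference of two 3x3 polarizability tensors."""
    return [[(alpha_plus[i][j] - alpha_minus[i][j]) / (2.0 * s_q)
             for j in range(3)] for i in range(3)]


# Toy usage: a stretch of a homonuclear diatomic along z (values illustrative).
coords = [0.0, 0.0, 0.0, 0.0, 0.0, 1.4]
mode = [0.0, 0.0, 1.0, 0.0, 0.0, -1.0]
masses = [1.008] * 6
minus, plus = displaced_geometries(coords, mode, 0.02)
s_q = normal_coordinate_step(mode, masses, 0.02)
print(minus, plus, s_q)
```

With polarizabilities in cubic bohr and masses in atomic mass units, the derivative returned by `polarizability_derivative` carries units of square bohr per square root of amu, as stated in the text.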

15.5 ORIENTATION AVERAGING

Certain combinations of the polarizability derivatives will give values that accurately predict the relative Raman peak intensities. When trying to reproduce spectra of systems that sample over all orientations of the molecule, the intensity of Raman scattered light will be

$$I_{\mathrm{Raman}} = \frac{\omega^4}{c^4} I_0 \sum_{ij} \left\langle \tilde{\alpha}'^{2}_{ij}(\omega, Q) \right\rangle \tag{15.7}$$

where ω is the frequency of the scattered light. The tilde denotes that the components of the polarizability derivatives are defined relative to a space-fixed coordinate system, and the brackets denote that the value within is orientation averaged. For hyperRaman scattering, the expression for the intensity is

$$I_{\mathrm{hyperRaman}} = \frac{8\pi\omega^4}{c^4} I_0 \left\langle \tilde{\beta}'^{2}_{ijj}(\omega, Q) \right\rangle \tag{15.8}$$

If a common experimental setup is assumed where the scattering is observed at 90° relative to the direction of the incident light and the scattered beam polarization is not resolved, the expression for the Raman intensity for a normal mode k becomes41

$$I_k^{\mathrm{Raman}} = \frac{\omega^4}{c^4} I_0 \left( a_k^2 + \frac{7}{45} \gamma_k^2 \right) \tag{15.9}$$

The value $45a_k^2 + 7\gamma_k^2$ is called the Raman scattering factor, $S_k$, and is dependent on the polarizability derivatives through $a_k$, the trace, and $\gamma_k$, the anisotropy, of the polarizability derivatives. The trace and anisotropy in terms of the polarizability derivatives in the molecule-fixed coordinate system are31

$$a_k = \tfrac{1}{3} \left[ (\alpha'_{xx})_k + (\alpha'_{yy})_k + (\alpha'_{zz})_k \right]$$

$$\gamma_k^2 = \tfrac{1}{2} \left\{ \left[ (\alpha'_{xx})_k - (\alpha'_{yy})_k \right]^2 + \left[ (\alpha'_{yy})_k - (\alpha'_{zz})_k \right]^2 + \left[ (\alpha'_{zz})_k - (\alpha'_{xx})_k \right]^2 + 6 \left[ (\alpha'_{xy})_k^2 + (\alpha'_{yz})_k^2 + (\alpha'_{zx})_k^2 \right] \right\} \tag{15.10}$$

Raman scattering factors are generally reported in quartic angstroms per atomic mass unit.
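As a small worked illustration of the trace, anisotropy, and scattering factor defined above, the following sketch evaluates them for a 3×3 tensor of polarizability derivatives. The function names and the example tensors are hypothetical; the conversion constant is the standard bohr-to-angstrom factor, applied on the assumption that the derivatives are supplied in square bohr per square root of amu.

```python
BOHR_TO_ANGSTROM = 0.529177210903


def raman_invariants(d):
    """Return (a, gamma_squared) for a 3x3 polarizability-derivative tensor."""
    a = (d[0][0] + d[1][1] + d[2][2]) / 3.0
    gamma2 = 0.5 * ((d[0][0] - d[1][1]) ** 2
                    + (d[1][1] - d[2][2]) ** 2
                    + (d[2][2] - d[0][0]) ** 2
                    + 6.0 * (d[0][1] ** 2 + d[1][2] ** 2 + d[2][0] ** 2))
    return a, gamma2


def scattering_factor(d):
    """Raman scattering factor S = 45 a^2 + 7 gamma^2 (same units as d squared)."""
    a, gamma2 = raman_invariants(d)
    return 45.0 * a * a + 7.0 * gamma2


def scattering_factor_angstrom(d_atomic_units):
    """S in angstrom^4/amu for derivatives given in bohr^2/sqrt(amu)."""
    return scattering_factor(d_atomic_units) * BOHR_TO_ANGSTROM ** 4


isotropic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shear = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(scattering_factor(isotropic))   # purely isotropic: only the trace term
print(scattering_factor(shear))       # traceless: only the anisotropy term
```

Because both invariants are rotationally invariant, the result does not depend on how the molecule is oriented in the input, which is why this quantity is suited to orientation-averaged spectra.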


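Under the same experimental conditions as those outlined for Raman scattering, the hyperRaman intensity expression becomes

$$I_{\mathrm{hyperRaman}} = \frac{8\pi\omega^4}{c^5} I_0^2 \left( \left\langle \tilde{\beta}'^{2}_{iii} \right\rangle + \left\langle \tilde{\beta}'^{2}_{ijj} \right\rangle \right) \tag{15.11}$$

In terms of the molecule-fixed hyperpolarizability derivatives, $\langle \tilde{\beta}'^{2}_{iii} \rangle$ and $\langle \tilde{\beta}'^{2}_{ijj} \rangle$ are9

$$\begin{aligned}
\left\langle \tilde{\beta}'^{2}_{iii} \right\rangle = {} & \frac{1}{7} \sum_i \beta'^{2}_{iii} + \frac{4}{35} \sum_{i \neq j} \beta'^{2}_{iij} + \frac{2}{35} \sum_{i \neq j} \beta'_{iii} \beta'_{ijj} + \frac{4}{35} \sum_{i \neq j} \beta'_{jii} \beta'_{iij} \\
& + \frac{4}{35} \sum_{i \neq j} \beta'_{iii} \beta'_{jji} + \frac{1}{35} \sum_{i \neq j} \beta'^{2}_{jii} + \frac{4}{105} \sum_{i \neq j \neq k} \beta'_{iij} \beta'_{jkk} \\
& + \frac{1}{105} \sum_{i \neq j \neq k} \beta'_{jii} \beta'_{jkk} + \frac{4}{105} \sum_{i \neq j \neq k} \beta'_{iij} \beta'_{kkj} \\
& + \frac{2}{105} \sum_{i \neq j \neq k} \beta'^{2}_{ijk} + \frac{4}{105} \sum_{i \neq j \neq k} \beta'_{ijk} \beta'_{jik}
\end{aligned} \tag{15.12}$$

$$\begin{aligned}
\left\langle \tilde{\beta}'^{2}_{ijj} \right\rangle = {} & \frac{1}{35} \sum_i \beta'^{2}_{iii} + \frac{4}{105} \sum_{i \neq j} \beta'_{iii} \beta'_{iij} - \frac{4}{70} \sum_{i \neq j} \beta'_{iii} \beta'_{jji} + \frac{8}{105} \sum_{i \neq j} \beta'^{2}_{iij} \\
& + \frac{3}{35} \sum_{i \neq j} \beta'^{2}_{ijj} - \frac{4}{70} \sum_{i \neq j} \beta'_{iij} \beta'_{jii} + \frac{1}{35} \sum_{i \neq j \neq k} \beta'_{ijj} \beta'_{ikk} \\
& - \frac{4}{210} \sum_{i \neq j \neq k} \beta'_{iik} \beta'_{jjk} - \frac{4}{210} \sum_{i \neq j \neq k} \beta'_{iij} \beta'_{jkk} \\
& + \frac{2}{35} \sum_{i \neq j \neq k} \beta'^{2}_{ijk} - \frac{4}{210} \sum_{i \neq j \neq k} \beta'_{ijk} \beta'_{jik}
\end{aligned} \tag{15.13}$$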

15.6 DIFFERENTIAL CROSS SECTIONS

Although Raman scattering factors will give a good idea of relative intensities, it is the differential cross sections that are directly comparable to experimental measurements. The frequency of the incident light is part of the expression of the differential cross section which allows normal Raman spectra for a specific wavelength of incident light to be calculated even while using static polarizabilities.39 This approach should give reasonable estimates for any off-resonance situation as long as the dispersion of the polarizability is relatively small. The computational effort to calculate the scattering factors using dynamic polarizabilities is higher, but is recommended for improved accuracy.


For the Q branch in an experiment where the scattering angle is 90° and the incident light is perpendicularly plane polarized with respect to the scattering plane, the differential cross section is31,39

$$\frac{d\sigma}{d\Omega} = \frac{h}{8 \varepsilon_0^2 c\, \tilde{\nu}_k} \left( \tilde{\nu}_{\mathrm{in}} - \tilde{\nu}_k \right)^4 \frac{S_k}{45}\, \frac{1}{1 - \exp(-hc\tilde{\nu}_k / k_B T)} \tag{15.14}$$

where $\tilde{\nu}_{\mathrm{in}}$ is the frequency of the incident light and $\tilde{\nu}_k$ is the frequency of the kth normal mode, both in wavenumbers. If the Raman scattering factors in quartic angstroms per atomic mass unit are converted to C$^2$ m$^2$ V$^{-2}$ kg$^{-1}$ using a factor of $1/4\pi\varepsilon_0$ along with the appropriate length and mass conversions, the differential cross section can be made to have units of cm$^2$/sr (sr is the abbreviation for steradians). These are the standard units for reporting Raman scattering differential cross sections.

Example 1: Raman Spectra of Pyridine and Pyridine on a Silver Cluster

As an example of the results that can be expected using the method described above, simulated Raman spectra for pyridine and pyridine on the surface of a tetrahedral 20-silver-atom cluster will be shown. The orientationally averaged off-resonance spectra calculated are referred to as normal or bulk Raman spectra, and are comparable to those obtained in experiments performed on solutions of the species modeled. Geometry optimization and normal-mode frequency calculations were performed using the PW91 functional and a polarized triple-zeta Slater-type basis (TZP) for all atoms. Relativistic effects, which have been shown to be important in the modeling of optical properties of silver clusters,42 are included with the use of the zeroth-order regular approximation (ZORA)43,44 in its spin-free (scalar relativistic) version. An extension of AORESPONSE to include spin-orbit coupling has also been developed recently,45 but for an Ag cluster, such effects can be considered negligible. The normal-mode frequencies calculated were compared to those from experiment to ensure decent agreement. Normal-mode frequencies and atomic coordinates for the optimized geometries are available in the supporting information.
Polarizability calculations used an asymptotically correct XC potential, SAOP,46 and the larger ET-QZ3P-polar basis set for the carbon, hydrogen, and nitrogen atoms (still using TZP for the silver atoms). Use of the SAOP model potential gives the correct long-distance behavior, which is important for obtaining accurate polarizabilities (although for the systems at hand, BP86 and TZP give similar results) and even more so for hyperpolarizabilities.47 The normal Raman spectrum for pyridine, calculated from static polarizabilities and using an incident wavelength of 514.5 nm in the equation for the cross section, is shown in Fig. 15.1 (the differential cross section is given in units of 10−30 cm2 sr−1 and wavenumbers are given in cm−1 ). The stick spectrum (note: it has been scaled) obtained from calculation of intensities at each normal-mode frequency is overlaid by the spectrum where each peak has been convoluted with a Lorentzian with a width of 20 cm−1 . Peaks and intensities seen in the experimental spectrum48,49 are reproduced well by the calculations. The minor

Fig. 15.1 (color online) Simulated normal Raman spectrum of pyridine at an incident wavelength of 514.5 nm using static (top) and frequency-dependent (bottom) polarizability derivatives. The spectrum is broadened by a Lorentzian with a full width at half-maximum of 20 cm−1 . The scale is for a broadened spectrum. Inset: Experimental spectrum from Golab et al.48 [Axes: dσ/dΩ (10−30 cm2/sr) versus wavenumber (cm−1), 1500 to 300 cm−1. Labeled peaks: 1581, 1472, 1209, 1146, 1026, 983, 651, and 599 cm−1 (top); 1580, 1472, 1208, 1146, 1026, 982, 651, and 599 cm−1 (bottom).]

peaks, however, are relatively more intense and the intensity ordering of the peaks at 983 and 1026 cm−1 is the opposite of what is seen experimentally. Adding a tetrahedral 20-silver-atom cluster allows the investigation of phenomena such as the chemical enhancements observed in SERS.39 Though the pyridine–Ag20 system has a large number of normal modes, only those in the range 300 to 1600 cm−1 , which correspond primarily to motions of the atoms in pyridine, are of interest. Figure 15.2 shows the optimized pyridine–Ag20 complex geometry (where the pyridine is perpendicular to a face of the cluster and binds through the N atom to the Ag atom at the center of the face) and the calculated normal Raman spectrum for the structure with the cross section


Fig. 15.2 (color online) Optimized geometry and simulated normal Raman spectrum of the surface pyridine–Ag20 complex at an incident wavelength of 514.5 nm using static polarizability derivatives. The spectrum is broadened by a Lorentzian with a full width at half-maximum of 20 cm−1 . The scale is for a broadened spectrum.

once again assuming an excitation wavelength of 514.5 nm. Comparing the intensities of the peaks in this spectrum to those in the pyridine spectrum, the chemical enhancement is approximately one order of magnitude. These results are comparable to results presented by Zhao et al.,39 where it was also found that the corresponding spectra at wavelengths that are on-resonance for the Ag20 are enhanced by a factor of $10^5$ or greater. This provides a model for understanding SERS.
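The conversion from a Raman scattering factor to a differential cross section, and the Lorentzian broadening used for the simulated spectra, can be sketched as follows. This is an illustrative calculation, not the script used to produce the figures. It assumes that the scattering factor is given as a polarizability-volume quantity in angstrom^4/amu (so the SI polarizability is obtained by multiplying by 4πε0), a temperature of 298.15 K (not stated in the text), and an area-normalized Lorentzian; the function names and the example value S = 20 angstrom^4/amu are hypothetical.

```python
import math

H = 6.62607015e-34        # Planck constant, J s
C = 2.99792458e8          # speed of light, m/s
KB = 1.380649e-23         # Boltzmann constant, J/K
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
AMU = 1.66053906660e-27   # atomic mass unit, kg
ANGSTROM = 1.0e-10        # m


def scattering_factor_to_si(s_k):
    """Convert S_k from angstrom^4/amu to C^2 m^2 V^-2 kg^-1."""
    return s_k * (4.0 * math.pi * EPS0) ** 2 * ANGSTROM ** 4 / AMU


def differential_cross_section(s_k, nu_k_cm, nu_in_cm, temperature=298.15):
    """Differential Raman cross section in cm^2/sr.

    s_k      : scattering factor in angstrom^4/amu
    nu_k_cm  : vibrational wavenumber in cm^-1
    nu_in_cm : incident-light wavenumber in cm^-1
    """
    nu_k = nu_k_cm * 100.0            # cm^-1 -> m^-1
    nu_in = nu_in_cm * 100.0
    boltzmann = 1.0 / (1.0 - math.exp(-H * C * nu_k / (KB * temperature)))
    sigma_m2 = ((nu_in - nu_k) ** 4 * H / (8.0 * EPS0 ** 2 * C * nu_k)
                * scattering_factor_to_si(s_k) / 45.0 * boltzmann)
    return sigma_m2 * 1.0e4           # m^2 -> cm^2


def lorentzian_spectrum(grid_cm, peaks, fwhm=20.0):
    """Broaden stick peaks [(position, intensity), ...] on a wavenumber grid."""
    half = fwhm / 2.0
    return [sum(inten * (half / math.pi) / ((x - x0) ** 2 + half ** 2)
                for x0, inten in peaks) for x in grid_cm]


nu_in = 1.0e7 / 514.5                 # 514.5 nm expressed in cm^-1
sigma = differential_cross_section(20.0, 1000.0, nu_in)
print("d(sigma)/d(Omega) =", sigma, "cm^2/sr")
grid = [900.0 + i for i in range(201)]
broadened = lorentzian_spectrum(grid, [(1000.0, sigma)])
print("broadened maximum =", max(broadened))
```

Note that the broadened maximum depends on the normalization chosen for the Lorentzian, which is why the figure captions state that the scale refers to the broadened spectrum.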

Example 2: HyperRaman Spectrum of Pyridine

Using the same geometry and frequencies for pyridine as in the normal Raman example, the hyperRaman spectra can also be simulated. The hyperpolarizability calculations at the displaced geometries were run with the SAOP model potential and an ET-QZ3P-polar basis set for all atoms. The orientationally averaged hyperRaman spectrum is shown in Fig. 15.3 [intensities are given in angstrom$^6$/(amu · statvolt$^2$)]. The differential cross section is not calculated because the equation outlined is only applicable to Raman spectroscopy with a specific experimental setup.31,39 Although an effective excitation wavelength cannot be included in the spectrum, the relative intensities of the peaks can still be compared to experimental spectra. In general, experimental hyperRaman spectra are rarely determined, because the hyperRaman signal is even weaker than the already weak Raman signal. Fortunately, for pyridine there are experimental measurements, which are matched rather well by the calculated spectrum.49 Not all the calculated peaks can be verified, due to noise in the experiment, but the relative intensities of those that are observed match well.


Fig. 15.3 (color online) Simulated normal hyperRaman spectrum of pyridine using static hyperpolarizability derivatives. The spectrum is broadened by a Lorentzian with a full width at half-maximum of 20 cm−1 . The scale is for a broadened spectrum. Inset: Experimental spectrum from Neddersen et al.49

15.7 SURFACE-ENHANCED RAMAN AND HYPERRAMAN SPECTRA

The previous discussion is applicable to the Raman and hyperRaman spectra of any system where orientation averaging applies. The intensities are generally small, but they can be greatly enhanced by placing the molecules on a surface. Molecules adsorbed to a surface are generally restricted to a finite set of orientations relative to the surface, so the expressions based on orientation averaging no longer apply. If a specific orientation to the surface is assumed, the Raman intensities are proportional to the polarizability component perpendicular to the surface, as the plasmon-enhanced electromagnetic field near the surface is dominated by this component. For calculations which assume that the z-direction is normal to the surface, $\alpha'^{2}_{zz}$ ($\beta'^{2}_{zzz}$ for hyperRaman intensities) will give the relative peak intensities. As the interest is only in one of the components of the polarizability tensor of the molecule, the orientation of the molecule in the input becomes important. For example, to calculate the surface-enhanced Raman spectrum of a molecule standing straight up on a surface, the molecule should be appropriately oriented along the z-axis (as determined by its adsorption behavior) in all of the inputs. Also, the frequency calculation and polarizability calculations at the displaced coordinates would be performed as for a normal Raman calculation. The difference is that it is only necessary to calculate the polarizability derivative for the $\alpha_{zz}$ component.


15.8 APPLICATION OF TENSOR ROTATIONS TO RAMAN SPECTRA FOR SPECIFIC SURFACE ORIENTATIONS

In cases where the molecular orientation is uncertain, the comparison of simulated spectra with experiment can be used to infer the correct orientation. In this case the complete polarizability tensor needs to be determined for an arbitrary orientation, and then the polarizability is rotated to the desired orientation. A second-order tensor (the polarizability tensor $[\alpha_{lm}]$) or third-order tensor (the hyperpolarizability tensor $[\beta_{ijk}]$) can be rotated into a new coordinate frame by applying a rotation matrix $[R]$ and its inverse. The tensor in the new coordinate frame is given by

$$[\alpha^*_{ij}] = [R][\alpha_{lm}][R]^{-1} \tag{15.15}$$

Here R is an orthogonal matrix ($[R]^{-1} = [R]^T$) whose components $r_{il}$ are the cosines of the angle between the ith axis of the original coordinate frame and the lth axis of the target coordinate frame:

$$r_{il} = \cos(i, l) \tag{15.16}$$

For surface-enhanced Raman, only the perpendicular component of the polarizability tensor is of interest. This can easily be calculated using the formula

$$\alpha^*_{ij} = \sum_{lm} r_{il}\, r_{jm}\, \alpha_{lm} \tag{15.17}$$

for polarizabilities, and

$$\beta^*_{ijk} = \sum_{lmn} r_{il}\, r_{jm}\, r_{kn}\, \beta_{lmn} \tag{15.18}$$

for hyperpolarizabilities. Of course, this work can be avoided completely if the molecular structure is defined in coordinates where one axis is along the surface normal.

Example 3: Surface-Enhanced Raman Spectrum of Pyridine

If a normal Raman spectrum has already been calculated for the molecule of interest, it takes only minor modifications to obtain a surface-enhanced Raman spectrum. For the example molecule pyridine, the results of the polarizability calculations from the pyridine normal Raman example will be used. To model the surface-enhanced spectrum using only the polarizability derivatives of the molecule (so that plasmon enhancement effects are left out), an orientation relative to a fictional surface must be assumed. For pyridine, it will be assumed that the nitrogen atom binds to the surface and that the molecule stands straight up. This orientation places the C2 -axis of pyridine along the surface normal.


Fig. 15.4 (color online) SERS spectrum of pyridine standing straight up on a fictional surface. The spectrum is broadened by a Lorentzian with a full width at half-maximum of 20 cm−1 . The scale is for a broadened spectrum. Inset: Experimental spectrum from Golab et al.48

The equilibrium structure of pyridine used for the polarizability calculations has its C2 -axis along the z-axis. This means that the surface normal is along the z-axis in the calculations and that the squares of the derivatives of the αzz components of the polarizabilities will be proportional to the experimental SERS intensities. The SERS spectrum for pyridine obtained in this manner is shown in Fig. 15.4 (intensities are given in angstrom4 /amu). Once again, the differential cross-section equation does not apply to what is being modeled. The surface is fictional, so only the relative intensities have any real significance. The calculated spectrum compares well with experimental data,48 except for the peak at 1026 cm−1 , which should be only slightly more intense than the peaks at 1581 and 1209 cm−1 . While the correct intensity ordering is not observed, the peak at 983 cm−1 does increase in intensity relative to the peak at 1026 cm−1 going from the nonresonant Raman spectrum to the SERS spectrum, which is seen experimentally.48 It may be possible that the differences observed are occurring because the orientation of pyridine relative to the surface is not what has been assumed in the calculations. A rigorous study would consider other orientations and possibly average over a range of orientations to see if better agreement can be achieved.
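Exploring other orientations, as suggested above, only requires rotating the already computed derivative tensors and reading off the component along the surface normal. The sketch below applies the component formulas for second- and third-order tensors and evaluates the relative SERS intensity $\alpha'^{2}_{zz}$ for a molecule tilted about the x-axis. The choice of tilt axis, the function names, and the example tensor are hypothetical and serve only as an illustration.

```python
import math


def rotation_about_x(tilt_deg):
    """Orthogonal matrix for a rotation by tilt_deg about the x-axis."""
    t = math.radians(tilt_deg)
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]


def rotate_rank2(r, alpha):
    """alpha*_ij = sum_lm r_il r_jm alpha_lm."""
    rng = range(3)
    return [[sum(r[i][l] * r[j][m] * alpha[l][m] for l in rng for m in rng)
             for j in rng] for i in rng]


def rotate_rank3(r, beta):
    """beta*_ijk = sum_lmn r_il r_jm r_kn beta_lmn."""
    rng = range(3)
    return [[[sum(r[i][l] * r[j][m] * r[k][n] * beta[l][m][n]
                  for l in rng for m in rng for n in rng)
              for k in rng] for j in rng] for i in rng]


def sers_relative_intensity(dalpha, tilt_deg):
    """Square of the zz polarizability derivative after tilting the molecule."""
    rotated = rotate_rank2(rotation_about_x(tilt_deg), dalpha)
    return rotated[2][2] ** 2


# Toy derivative tensor for one mode; scan a few tilt angles.
dalpha = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
for tilt in (0.0, 30.0, 60.0, 90.0):
    print(tilt, sers_relative_intensity(dalpha, tilt))
```

Repeating such a scan for every mode, and comparing the resulting relative intensities with experiment, is one simple way to test an assumed adsorption geometry; averaging over a range of tilt angles follows the same pattern.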

15.9 RESONANCE RAMAN

Another phenomenon used to increase Raman intensities in experiments is the resonance Raman effect. Resonance Raman involves using incident light with an energy that matches the energy needed to put the molecule in an electronically


excited state.50 In the expression for the Kohn–Sham response function, this would mean ω = εm − εi , which leads to division by zero in the response function described above.19 The zero occurs because it was assumed that the excited state has an infinite lifetime. However, the excited states of molecules in a condensed phase always have a significant width, due to dephasing of the excited state through interaction with the environment. The AORESPONSE functionality in ADF allows calculation of polarizabilities at resonant wavelengths by adding in an effective lifetime by way of a damping parameter in the response function. This is not a perfect fix, though, because it assumes that all excited states have the same lifetime, which is generally not true. Damping parameters are best obtained by fitting experimental absorption data for the molecule of interest.19 If there are no available data, it is possible to use the value for a similar molecule if the short-time approximation is valid. A value of 0.004 atomic unit (0.1 eV) has been found to be reasonable for many large organics, as well as pyridine interacting with silver clusters.39 In the AORESPONSE block, the keyword LIFETIME followed by the lifetime in atomic units will tell the program to account for the excited-state lifetime provided. With a lifetime specified, ADF will be able to calculate both the real and imaginary parts of the polarizability. The imaginary polarizabilities should be treated like their real counterparts until the scattering factors are calculated. At that point, the real and imaginary scattering factors can be summed to give the total scattering factor.

15.10 DETERMINATION OF RESONANT WAVELENGTH

Using the AORESPONSE lifetime functionality, it is possible to calculate polarizabilities for the displaced structures at resonant wavelengths, but it is important to have an idea of where the resonance is located before doing the calculations. Experimental resonance Raman literature or absorption maximum data for the system provide a good place to start. Using the optimized geometry for the system, polarizability calculations should then be run for a range of incident light frequencies close to where the resonant frequency is believed to be. The polarizability calculations should also use the finite lifetime that was found to be appropriate for the system. The absorption maximum for the system occurs where the imaginary polarizability has its maximum and is an appropriate frequency to choose for the resonance Raman calculations.39 Of course, another way to determine the excitation energies of the system for a given combination of basis set, XC potential, and XC kernel would simply be to run a calculation of the excitation spectrum using TDDFT. This can be accomplished using the EXCITATIONS keyword in ADF. The equivalence of the two approaches, Im[α] versus TDDFT excitation spectra, was demonstrated explicitly by Jensen et al.,19 Devarajan et al.,45 and Krykunov et al.51 for the closely related case of optical rotatory dispersion versus TDDFT circular dichroism spectra.

Example 4: Resonance Raman Spectrum of Uracil

To detail the steps necessary to calculate a resonance Raman spectrum, the molecule uracil will be used as an example. An excitation in uracil that can be used to study RRS corresponds to the lowest-energy π → π∗ transition.20 This excitation is found experimentally at 5.08 eV (244 nm) in the gas phase and 4.77 eV (260 nm) in the aqueous phase.52 To discern what excitation energy to use in the calculation of resonance polarizabilities, the real and imaginary polarizabilities of the equilibrium geometry were calculated at discrete points between incident light wavelengths of 240 and 280 nm. For all the calculations in this example, a value of 0.004 a.u. was chosen for the damping parameter, the BP86 functional was used, and all atoms were treated with a TZP basis set. The real and imaginary polarizabilities as a function of the wavelength of the incident light are shown in Fig. 15.5. A maximum is seen in the imaginary polarizability of the system at 263 nm. For the polarizability derivative calculations, it is reasonable to use 263 nm for the incident light wavelength in the input to the displaced geometry calculations, or a nearby wavelength that was used in experiments.

Fig. 15.5 Real (squares) and imaginary (circles) polarizabilities of uracil as a function of the wavelength of incident light between 240 and 280 nm.

Using an incident light wavelength of 263 nm, the spectrum displayed in Fig. 15.6 can be obtained. The spectrum assumes an average over all molecular orientations, and the stick spectrum has been broadened by a Lorentzian as in the pyridine nonresonant Raman example. Close agreement with experiment is seen except for the peak at 1737 cm−1, which is much too intense in the calculations, and the peaks at 1448 and 1353 cm−1, which are seen as a single peak at 1401 cm−1 in experiments. The second issue appears to be due to solvent effects, since adding two water molecules to the calculations shifts the two peaks together around 1400 cm−1.20 This does not, however, correct the peak at 1737 cm−1. This error probably arises due to Fermi resonance (not included in the calculations) between the C–O and N–H bending modes.53,54 Fermi resonances and overtones are not accounted for in the harmonic approximation that has been made in the calculation of the vibrations.20 Raman spectra calculated for molecules in which these processes play a visible role will not accurately reproduce all the peak intensities.

Fig. 15.6 (color online) Simulated resonance Raman spectrum of uracil at an incident wavelength of 263 nm. The spectrum is broadened by a Lorentzian with a full width at half-maximum of 20 cm−1. The scale is for a broadened spectrum. Inset: Experimental spectrum from Jensen et al.20
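The wavelength scan described above can be illustrated with a toy model. The sketch below is not the AORESPONSE implementation; it assumes a single excited state with a hypothetical excitation energy placed at 263 nm, a damped two-term response, and the 0.004 a.u. damping parameter. The function names are illustrative only. It shows that the damping keeps the response finite on resonance and that the maximum of the imaginary part marks the resonant wavelength.

```python
# Toy single-pole model of a damped polarizability (illustration only;
# this is not the AORESPONSE implementation in ADF).

HARTREE_TO_EV = 27.2114   # 1 hartree expressed in eV
NM_TO_HARTREE = 45.5634   # photon energy [hartree] = 45.5634 / wavelength [nm]


def photon_energy(wavelength_nm):
    """Photon energy in hartree for a wavelength given in nm."""
    return NM_TO_HARTREE / wavelength_nm


def model_polarizability(omega, omega0, gamma, strength=1.0):
    """Complex polarizability of one damped excitation at energy omega0."""
    resonant = 1.0 / complex(omega0 - omega, -gamma)
    antiresonant = 1.0 / complex(omega0 + omega, gamma)
    return strength * (resonant + antiresonant)


def find_resonance(wavelengths_nm, omega0, gamma):
    """Return the scanned wavelength at which Im[alpha] is largest."""
    return max(
        wavelengths_nm,
        key=lambda nm: model_polarizability(photon_energy(nm), omega0, gamma).imag,
    )


gamma = 0.004                            # damping parameter in a.u.
omega0 = photon_energy(263.0)            # hypothetical excitation energy
scan = [240.0 + i for i in range(41)]    # 240, 241, ..., 280 nm
resonant_nm = find_resonance(scan, omega0, gamma)
gamma_ev = gamma * HARTREE_TO_EV         # damping parameter in eV

print(f"damping: {gamma_ev:.3f} eV, Im[alpha] peaks at {resonant_nm:.0f} nm")
```

In this model, setting gamma to zero and evaluating at omega = omega0 makes the resonant term divide by zero, which mirrors the infinite-lifetime problem discussed in Section 15.9.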

15.11 SUMMARY

In this chapter we have provided a detailed discussion of the calculation of Raman and hyperRaman spectra for large molecules and molecules interacting with metal clusters using the ADF computer program and time-dependent density functional theory. Both static- and frequency-dependent Raman spectra are considered, and the frequency-dependent spectra include the possibility of excitation on resonance through the input of an empirical width factor in the resonant optical response. In addition, we describe the calculation of spectra for specific molecular orientations and an average over orientations. Specific examples are presented for pyridine in vacuum, for pyridine interacting with a silver cluster, and for pyridine oriented on a fictitious surface to mimic orientation effects that can occur in SERS. In addition, we examined the resonance polarizability and resonance Raman spectrum of uracil as an example of a resonance Raman calculation.

Although these examples reveal important capabilities that are now available using TDDFT, there remain important limitations in the use of this method for large systems. The current technology can handle up to 100 to 200 atoms but becomes impractical for much larger systems. Even for 100 to 200 atoms, it can be quite challenging to calculate spectra for a large number of normal modes. In addition, the excited-state widths are purely empirical factors in the current version of the code and are assumed not to depend on the nature of the excited state. Finally, we note that the models of SERS which replace the metal particles by silver clusters that have fewer than 100 atoms make important approximations whose validity is still uncertain. Plasmon resonances are size dependent for small clusters, so the resonance wavelengths do not match the observations, and the cluster-size dependence of the widths is unknown. In addition, the behavior of the electromagnetic fields around the cluster is unlikely to match the fields associated with large particles, so the field enhancements that lead to SERS are not likely to be described accurately.

Supporting Information

Supporting information including atomic coordinates and vibrational frequencies for all example species may be found on the book Web site.

Acknowledgments

This research was supported by AFOSR/DARPA project BAA07-61 (FA955008-1-0221) and the National Science Foundation Network for Computational Nanotechnology. We thank our many collaborators, including Stephen Gray, Richard Van Duyne, Chad Mirkin, and Teri Odom.

REFERENCES

1. Camden, J. P.; Dieringer, J. A.; Zhao, J.; Van Duyne, R. P. Acc. Chem. Res. 2008, 41, 1653.
2. LaFratta, C. N.; Walt, D. R. Chem. Rev. 2008, 108, 614.
3. Jain, P. K.; Huang, X.; El-Sayed, I. H.; El-Sayed, M. A. Plasmonics 2007, 2, 107.
4. Lal, S.; Link, S.; Halas, N. J. Nat. Photon. 2007, 1, 641.
5. Murphy, C. J.; Gole, A. M.; Hunyadi, S. E.; Stone, J. W.; Sisco, P. N.; Alkilany, A.; Kinard, B. E.; Hankins, P. Chem. Commun. 2008, 544.
6. Willets, K. A.; Van Duyne, R. P. Annu. Rev. Phys. Chem. 2007, 58, 267.
7. Kneipp, J.; Kneipp, H.; Kneipp, K. Chem. Soc. Rev. 2008, 37, 1052.
8. Kelley, A. M. J. Phys. Chem. A 2008, 112, 11975.
9. Yang, W. H.; Schatz, G. C. J. Chem. Phys. 1992, 97, 3831.
10. Yang, W.-H.; Hulteen, J.; Schatz, G. C.; Van Duyne, R. P. J. Chem. Phys. 1996, 104, 4313.
11. Jeanmaire, D. L.; Van Duyne, R. P. J. Electroanal. Chem. 1977, 84, 1.
12. Hulteen, J. C.; Young, M. A.; Van Duyne, R. P. Langmuir 2006, 22, 10354.
13. Schatz, G. C. Acc. Chem. Res. 1984, 17, 370.
14. Moskovits, M. Rev. Mod. Phys. 1985, 57, 783.
15. Schatz, G. C.; Ratner, M. A. Quantum Mechanics in Chemistry, Dover, Mineola, NY, 2002.
16. van Gisbergen, S. J. A.; Snijders, J. G.; Baerends, E. J. Comput. Phys. Commun. 1999, 118, 119.
17. Velde, G. T.; Bickelhaupt, F. M.; Baerends, E. J.; Guerra, C. F.; Van Gisbergen, S. J. A.; Snijders, J. G.; Ziegler, T. J. Comput. Chem. 2001, 22, 931.
18. ADF2008.01; SCM: Theoretical Chemistry, Vrije Universiteit, Amsterdam, http://www.scm.com, click on “Theoretical Chemistry.”
19. Jensen, L.; Autschbach, J.; Schatz, G. C. J. Chem. Phys. 2005, 122, 224115/1.
20. Jensen, L.; Zhao, L. L.; Autschbach, J.; Schatz, G. C. J. Chem. Phys. 2005, 123, 174110/1.
21. Mort, B. C.; Autschbach, J. J. Phys. Chem. A 2006, 110, 11381.
22. van Gisbergen, S.; Snijders, J. G.; Baerends, E. J. J. Chem. Phys. 1998, 109, 10644.
23. Ye, A.; Patchkovskii, S.; Autschbach, J. J. Chem. Phys. 2007, 127, 074104.
24. Ye, A.; Autschbach, J. J. Chem. Phys. 2006, 125, 234101.
25. Jensen, L.; Aikens, C. M.; Schatz, G. C. Chem. Soc. Rev. 2008, 37, 1061.
26. Jensen, L.; Zhao, L. L.; Schatz, G. C. J. Phys. Chem. C 2007, 111, 4756.
27. Aikens, C. M.; Schatz, G. C. J. Phys. Chem. A 2006, 110, 13317.
28. Masiello, D. J.; Schatz, G. C. Phys. Rev. A 2008, 78, 042505/1.
29. Bernath, P. F. Spectra of Atoms and Molecules, 2nd ed., Oxford University Press, New York, 2005.
30. Mort, B. C.; Autschbach, J. J. Phys. Chem. A 2005, 109, 8617.
31. Reiher, M.; Neugebauer, J.; Hess, B. A. Z. Phys. Chem. 2003, 217, 91.
32. Krykunov, M.; Autschbach, J. J. Chem. Phys. 2005, 123, 114103.
33. Krykunov, M.; Autschbach, J. J. Chem. Phys. 2007, 126, 024101.
34. Autschbach, J.; Jensen, L.; Schatz, G. C.; Tse, Y. C. E.; Krykunov, M. J. Phys. Chem. A 2006, 110, 2461.
35. van Gisbergen, S. J. A.; Snijders, J. G.; Baerends, E. J. J. Chem. Phys. 1995, 103, 9347.
36. Pulay, P. Analytical derivative techniques and the calculation of vibrational spectra. In Modern Electronic Structure Theory, Part II, Vol. 2, Yarkony, D. R., Ed., World Scientific, Singapore, 1995, p. 1191.
37. Pople, J. A.; Raghavachari, K.; Schlegel, H. B.; Binkley, J. S. Int. J. Quantum Chem. 1979, S13, 225.
38. Neugebauer, J.; Reiher, M.; Kind, C.; Hess, B. A. J. Comput. Chem. 2002, 23, 895.
39. Zhao, L.; Jensen, L.; Schatz, G. C. J. Am. Chem. Soc. 2006, 128, 2911.
40. Kanis, D. R.; Ratner, M. A.; Marks, T. J. Chem. Rev. 1994, 94, 195.
41. Califano, S. Vibrational States, Wiley, New York, 1976.
42. Aikens, C. M.; Li, S. Z.; Schatz, G. C. J. Phys. Chem. C 2008, 112, 11272.
43. van Lenthe, E.; Baerends, E. J.; Snijders, J. G. J. Chem. Phys. 1993, 99, 4597.
44. van Lenthe, E.; Baerends, E. J.; Snijders, J. G. J. Chem. Phys. 1994, 101, 9783.
45. Devarajan, A.; Gaenko, A.; Autschbach, J. J. Chem. Phys. 2009, 130, 194102.
46. Gritsenko, O. V.; Schipper, P. R. T.; Baerends, E. J. Chem. Phys. Lett. 1999, 302, 199.
47. Schipper, P. R. T.; Gritsenko, O. V.; van Gisbergen, S. J. A.; Baerends, E. J. J. Chem. Phys. 2000, 112, 1344.
48. Golab, J. T.; Sprague, J. R.; Carron, K. T.; Schatz, G. C.; Van Duyne, R. P. J. Chem. Phys. 1988, 88, 7942.
49. Neddersen, J. P.; Mounter, S. A.; Bostick, J. M.; Johnson, C. K. J. Chem. Phys. 1989, 90, 4719.
50. Albrecht, A. C. J. Chem. Phys. 1961, 34, 1476.
51. Krykunov, M.; Kundrat, M. D.; Autschbach, J. J. Chem. Phys. 2006, 125, 194110.
52. Clark, L. B.; Peschel, G. G.; Tinoco, I. J. Phys. Chem. 1965, 69, 3615.
53. Peticolas, W. L.; Rush, T. J. Comput. Chem. 1995, 16, 1261.
54. Szczesniak, M.; Nowak, M. J.; Rostkowska, H.; Szczepaniak, K.; Person, W. B.; Shugar, D. J. Am. Chem. Soc. 1983, 105, 5969.

16

Metal Surfaces and Interfaces: Properties from Density Functional Theory

IRENE YAROVSKY, MICHELLE J. S. SPENCER, and IAN K. SNOOK
Applied Physics, School of Applied Sciences, RMIT University, Victoria, Australia

In this chapter we describe comprehensive theoretical studies of metallic surfaces and interfaces using density functional theory (DFT) calculations. First, we provide a general introduction and background, then describe the methodology used and validation studies performed. Calculations performed on Fe(100), Fe(110), and Fe(111) surfaces to investigate their structure, energetics, electronic, magnetic, and adsorption properties are then discussed. Interfaces between these surfaces and, specifically, adhesion and the associated electronic and magnetic properties are then presented. Adhesion is studied between the surfaces in match (in registry) and mismatch (out of registry), ideal and relaxed, and clean and sulfur-contaminated states. Finally, we provide summaries, conclusions, and suggestions for future work.

16.1 BACKGROUND, GOALS, AND OUTLINE

Iron surfaces have been of interest to both pure and applied sciences since the Iron Age. Despite their crucial importance for many industries,1–3 from crude heavy industry to refined electronics, there is a gap in the fundamental understanding of many important properties of iron surfaces, such as magnetic properties and adhesion, which may slow their application in new and innovative technologies. This gap in understanding arises partly because of the inherent difficulty of studying the material both experimentally, due to its high susceptibility to corrosion,4 and theoretically, due to its transition metal nature and hence complex electronic properties.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.

Specifically, adhesion between metallic iron surfaces plays an important role in many industrial processes.5,6 For example, in the extraction of metallic iron (Fe) via the fluidized-bed iron ore reduction process of powdered ores, the process often suffers from the buildup of deposits, known as accretions, in various parts of the reactors, and component particles may strongly adhere, forming large clumps resulting in defluidization of the bed.7 As iron forms a major constituent of the accretions, a fundamental understanding of the mechanism by which the metal particles adhere, as well as identification of the species capable of preventing severe adhesion, is of vital importance.

Previous investigations of Fe surfaces and interfaces have looked at a number of their properties, including structural and magnetic features. However, generally speaking, previous studies have not provided a systematic fundamental description, particularly of the dynamic properties associated with thermal and impurity-induced transformations and the effects of the material which are crucial for the ability to design and manipulate its properties at the macro- and nanoscale.

Here we present an account of our theoretical work on Fe, which includes new results and those obtained previously. Specifically, after describing the methodology in Section 16.2, in Section 16.3.1 we review results on the computed relaxations and energies of the three low-index surfaces, (100), (110), and (111), of body-centered cubic (bcc) Fe and compare the computational results with experimental observations. In Section 16.3.2 we describe new results on the magnetic properties of the Fe(100), Fe(110), and Fe(111) surfaces, such as changes in the magnetic and electronic properties after relaxation and the layer-resolved magnetic moment values, as well as up- and down-spin-resolved density of states. In Sections 16.3.3 and 16.3.4 we present results on the adsorption of atomic S on the atop, bridge, and hollow sites of Fe(100) and Fe(110) surfaces at 1/2 and 1/4 monolayer (ML) coverages. The most stable site, the effects of S adsorption on surface reconstruction, and magnetic and electronic properties are considered. A summary of the effect of higher S coverages on these properties is also presented. In Sections 16.3.5 and 16.3.6 we discuss our calculations on the dynamic behavior of S and H2S adsorbed on Fe(100) and Fe(110), including ab initio molecular dynamics (AIMD) simulations to examine the effect of elevated temperatures.

In Section 16.4.1 we review our studies on adhesion between clean, bulk-terminated bcc Fe(100), Fe(110), and Fe(111) matched and mismatched interfaces. The parameters obtained from this work allowed the behavior of the work of separation (Wsep) to be determined and examined. In Section 16.4.2 we examine newly obtained results on the relationship between magnetic and electronic properties and adhesion of the Fe(100), Fe(110), and Fe(111) surfaces in match and mismatch. In Section 16.4.3 we discuss the avalanche effect in adhesion between Fe(100) surfaces, in match and mismatch, focusing specifically on the role of model constraints. In Section 16.4.4 we give a brief summary of our study of the effect of adsorbed S on the adhesion of Fe(100) and Fe(110) surfaces in the atop, bridge, and hollow sites at 1/2 and 1/4 ML coverages in match and mismatch interfaces. The effect of adsorbed S on the charge-density distribution and magnetic properties of the interface is also examined and related to the interfacial geometry. Also discussed is the effect of relaxation of the interfaces and different coverages of S at the interface. We conclude this chapter with a summary and outline of future work in Section 16.5.

16.2 METHODOLOGY

Density functional theory (see Chapter 1) is a technique that can provide fundamental understanding of the structural, electronic, magnetic, and adhesion properties of materials and their surfaces and interfaces at the electronic level.8–16 Theoretically, it is possible to construct a model interfacial system of any two surfaces with any degree of lattice match or mismatch and to make arbitrary alterations to the surfaces: for example, to introduce atomic and molecular impurities and then systematically investigate their effects on the system’s properties. A range of atomic simulation methods, including DFT, have already been applied successfully to the investigation of various metallic and ceramic interfaces (e.g., MgO/Ag,17 Mo/MoSi2,18 NiAl/Cr,19 and Fe20,21) and on the effects of impurities (S, C, N, O, P, etc.) on adhesion between surfaces.18,22–29 A fairly comprehensive review of applications of various theoretical simulation techniques to study material interfaces can be found in the literature by Finnis30 and is beyond the scope of this publication. We have developed a number of methods using classical empirical potentials based on the embedded-atom method (EAM) to study Fe surfaces and interfaces31–34; the advantage of this approach is that it is significantly less computationally expensive and it is possible to estimate the system free energies for much larger models and hence simulate a wider variety of surface structures and defects. However, in this chapter we describe our investigations of the surface and interfaces using the DFT approach.

16.2.1 Choice and Validation of the Computational Method: Bulk Iron Studies

All calculations were performed using the Vienna ab initio simulation package (VASP),35 – 37 which performs fully self-consistent DFT calculations to solve the Kohn–Sham equations38 within the local spin density approximation (LSDA) using the functional of Perdew and Zunger39 (PZ) or the generalized-gradient spin approximation (GGSA), using the functional of Perdew and Wang40 (PW91). The electronic wavefunctions are expanded as linear combinations of plane waves (see Chapter 3), truncated to include only plane waves with kinetic energies below a prescribed cutoff energy, Ecut . Due to the delocalized nature of conduction electrons in metals, a delocalized plane-wave basis provides a good representation of metallic systems. The core electrons are replaced by ultrasoft pseudopotentials by Vanderbilt,41 and k -space sampling was performed using the scheme of Monkhorst and Pack.42
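As an illustration of the k-space sampling mentioned above, the following sketch generates an unreduced Monkhorst–Pack grid in fractional reciprocal coordinates from the standard formula u_r = (2r − q − 1)/(2q), r = 1, ..., q. It is a minimal stand-alone example, not the VASP implementation; the function name is an assumption, and symmetry reduction of the grid is ignored.

```python
from itertools import product


def monkhorst_pack(q1, q2, q3):
    """Unreduced Monkhorst-Pack k-points in fractional coordinates."""

    def axis(q):
        # u_r = (2r - q - 1) / (2q) for r = 1..q
        return [(2 * r - q - 1) / (2 * q) for r in range(1, q + 1)]

    return list(product(axis(q1), axis(q2), axis(q3)))


bulk_grid = monkhorst_pack(12, 12, 12)   # a dense mesh for a bulk cell
slab_grid = monkhorst_pack(12, 12, 1)    # a single k-point along z for a slab

print(len(bulk_grid), len(slab_grid))
```

Production codes reduce such grids by symmetry, so the number of k-points actually evaluated is usually much smaller than the full count generated here.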

TABLE 16.1 Calculated Structure and Properties of Bulk Fe, Using Both LSDA and GGSA Functionals^a

Property                      LSDA             GGSA              Experimental
Lattice parameter, a0 (Å)     2.767 (−3.5%)    2.869 (+0.11%)    2.866
Bulk modulus, B (GPa)         195 (+16%)       140 (−16%)        168
Magnetic moment/atom (μB)     1.98 (−11%)      2.37 (+6.8%)      2.22

Source: Ref. 20.
^a The percent deviation from known experimental values43 is shown in parentheses.
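The percent deviations in Table 16.1 follow from 100 × (calculated − experimental)/experimental. The short sketch below recomputes them from the rounded table entries; the dictionary keys are illustrative, and because the inputs are rounded, the last digit may differ slightly from the tabulated percentages.

```python
def percent_deviation(calculated, experimental):
    """Signed deviation of a calculated value from experiment, in percent."""
    return 100.0 * (calculated - experimental) / experimental


# Values copied from Table 16.1 (a0 in angstrom, B in GPa, moment in Bohr magnetons)
experimental = {"a0": 2.866, "B": 168.0, "moment": 2.22}
calculated = {
    "LSDA": {"a0": 2.767, "B": 195.0, "moment": 1.98},
    "GGSA": {"a0": 2.869, "B": 140.0, "moment": 2.37},
}

deviations = {
    functional: {
        name: percent_deviation(value, experimental[name])
        for name, value in values.items()
    }
    for functional, values in calculated.items()
}

for functional, values in deviations.items():
    print(functional, {name: round(dev, 2) for name, dev in values.items()})
```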

The bulk, surfaces, and interfaces of Fe are modeled using the supercell approach, where periodic boundary conditions are applied to the central supercell so that it is reproduced periodically throughout space. Tests were performed on the bulk bcc phase of Fe, using both LSDA and GGSA functionals as well as different Ecut and k-space sampling values, to ensure that the bulk properties were converged.20 The optimized bulk structure was then used to create different surface and interface models. The total energy, Etot, and lattice parameter, a0, of bulk bcc Fe were calculated using different plane-wave cutoff energy values and k-point sampling sets to ensure the reliability of the calculations. It was found that an Ecut of 300 eV and k-point mesh of 12 × 12 × 12 gave convergence of Etot and a0 to 10−4 eV/atom and 0.001 Å, respectively. The lattice parameter, bulk modulus, and magnetic moment values calculated using these converged parameters with both LSDA and GGSA functionals are presented in Table 16.1, along with the experimental values.43 The values calculated using GGSA were found to give better agreement with the known experimental values than those calculated with LSDA. In particular, the LSDA functional was shown to predict the face-centered-cubic (fcc) Fe phase to be more energetically stable at 0 K than the bcc phase, while the GGSA functional correctly predicted the order of stability, consistent with previous findings (see, e.g., Jansen and Peng44).

16.2.2 Surface and Interface Models

The relaxed-bulk bcc Fe cell (with the lattice parameter determined using the GGSA PW91 functional) was cut along the (100), (110), and (111) Miller planes to form the three low-index Fe surface models (see Fig. 16.1). These models also served as our interface models. Using the supercell approach, the interfacial separation distance was defined by the vacuum layer thickness between image cells adjacent to each other in the z-direction (Fig. 16.1). Interfaces were modeled in two different orientations corresponding to a perfect lattice match between the two surfaces (i.e., epitaxial interfaces) and maximum lattice mismatch (i.e., where surface atoms of the two surfaces share the same coordinates in the x,y-plane within the supercell). An even number of atomic layers was used to model the match interfaces, while an odd number of layers was used to model the mismatch interfaces (Fig. 16.1).

Fig. 16.1 (color online) Surface/interface models: (a) (100), (110), and (111) match interfaces; (b) (100), (110), and (111) mismatch interfaces. Profiles of the surface unit cells are displayed below each surface supercell model. The interfacial separation, d, is indicated.

To determine the number of layers in the surface model required for convergence, the surface energies of the unrelaxed surfaces (Esurf) were calculated as a function of slab thickness, using a vacuum layer separation of 10 Å. Esurf was calculated using the expression:

$$E_{\mathrm{surf}} = \frac{E_{\mathrm{tot}}(\mathrm{slab}) - n\,E_{\mathrm{tot}}(\mathrm{bulk})}{2A} \tag{16.1}$$

where Etot(slab) and Etot(bulk) are the total energies of the slab and bulk, respectively; n is the number of bcc Fe unit cells present in the slab; and A is the cross-sectional surface area of the slab.
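Equation (16.1) can be evaluated with a few lines of code. The sketch below assumes total energies in eV and areas in Å², and converts the result to J m−2 using 1 eV/Å² = 16.0218 J m−2. The slab and bulk energies are hypothetical placeholders chosen only to exercise the formula; they are not results from this chapter.

```python
EV_PER_A2_TO_J_PER_M2 = 16.0218   # 1 eV/angstrom^2 expressed in J/m^2


def surface_energy(e_slab, e_bulk_cell, n_cells, area):
    """Eq. (16.1): surface energy in eV/angstrom^2 for a two-sided slab."""
    return (e_slab - n_cells * e_bulk_cell) / (2.0 * area)


def to_joule_per_m2(e_surf):
    """Convert a surface energy from eV/angstrom^2 to J/m^2."""
    return e_surf * EV_PER_A2_TO_J_PER_M2


a0 = 2.869                 # bulk lattice parameter in angstrom (Table 16.1, GGSA)
area_100 = a0 ** 2         # [1 x 1] surface cell area of bcc (100)

# Hypothetical total energies in eV, for illustration only
e_bulk_cell = -16.0        # energy of one bcc unit cell
e_slab = -77.6             # energy of a slab containing 5 unit cells

e_surf = surface_energy(e_slab, e_bulk_cell, 5, area_100)
e_surf_si = to_joule_per_m2(e_surf)
print(f"E_surf = {e_surf:.4f} eV/A^2 = {e_surf_si:.2f} J/m^2")
```

The factor of 2 in the denominator accounts for the two surfaces exposed by the slab.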

All surface and interface calculations also used the PW91 functional and GGSA approach. Further specific computational details for each case are given in relevant sections, as appropriate.

16.2.3 Interfacial Adhesion: Work of Separation and UBER

Most of the calculations we report here (except those described in Section 16.4.3) have been performed for interfaces between ideal Fe surfaces; namely, we calculate the work of separation (Wsep). The concept of the work of separation versus the work of adhesion has been introduced by Finnis30 and was discussed by us previously in detail.20 In terms of the surface and interfacial excess free energies of the materials, the ideal Wsep is given by the Dupré equation45:

$$W_{\mathrm{sep}} = \sigma_1 + \sigma_2 - \sigma_{12} \tag{16.2}$$

where σ1 and σ2 are ideal surface free energies of materials 1 and 2, and σ12 is the interfacial free energy. This quantity should be distinguished from the work of adhesion, which is defined as the energy required to separate two surfaces from the equilibrium separation to infinity, taking full account of all relaxation and diffusion processes. Wsep can be calculated directly from the molecular simulation of isolated surfaces and of these surfaces when brought into close contact to form an interface.30 By calculating the single-point energy at discrete separation distances, d, one can obtain an interaction energy curve Ead(d):

$$E_{\mathrm{ad}}(d) = \frac{E(d) - E(\infty)}{A} \tag{16.3}$$

where E(d) is the total computed energy at separation distance d, E(∞) is the total energy at infinite separation, and A is the cross-sectional area of interaction. The well depth of this curve, E0, is equivalent to the Wsep. The adhesion curves calculated can be fitted to the universal binding-energy relation (UBER),46 which is given by a Rydberg-type function adapted for the case of interfacial adhesion and is considered to give a valid representation of binding in situations where bonding results mainly from overlap of the tails of wavefunctions47:

$$E_{\mathrm{ad}}(d) = -E_0\,(1 + d^{*})\,e^{-d^{*}} \tag{16.4}$$

where Ead(d) is the fitted adhesion interaction energy, d∗ = (d − d0)/l is the scaled distance, E0 is the depth of the adhesion energy well at equilibrium interfacial separation (equivalent to the work of separation Wsep), d0 is the interfacial separation at the adhesion energy minimum, and l is the scale factor, which for transition metals may be interpreted as the surface scaling length, and sets the approximate scale for the distance over which electronic forces can act. The value of E0 represents the work of separation for a particular interface.
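To make the fitting step concrete, the sketch below evaluates Eq. (16.4) and recovers E0, d0, and l from a sampled adhesion curve. The fitting procedure is an assumption made for illustration (E0 and d0 are read from the lowest sampled point and l is found by a one-dimensional scan); it is not the procedure used in the work described here, and a nonlinear least-squares routine would normally be preferred. The data are synthetic, generated from hypothetical parameters.

```python
import math


def uber(d, e0, d0, l):
    """Universal binding-energy relation, Eq. (16.4)."""
    d_star = (d - d0) / l
    return -e0 * (1.0 + d_star) * math.exp(-d_star)


def fit_uber(separations, energies, l_min=0.1, l_max=2.0, steps=1900):
    """Crude fit: E0 and d0 from the lowest sampled point, l from a 1D scan."""
    i_min = min(range(len(energies)), key=lambda i: energies[i])
    e0, d0 = -energies[i_min], separations[i_min]
    best_l, best_err = l_min, float("inf")
    for k in range(steps + 1):
        l = l_min + (l_max - l_min) * k / steps
        err = sum((uber(d, e0, d0, l) - e) ** 2
                  for d, e in zip(separations, energies))
        if err < best_err:
            best_l, best_err = l, err
    return e0, d0, best_l


# Synthetic adhesion curve from hypothetical parameters (E0 in J/m^2, lengths in angstrom)
true_e0, true_d0, true_l = 4.0, 1.4, 0.6
separations = [0.8 + 0.2 * i for i in range(20)]
energies = [uber(d, true_e0, true_d0, true_l) for d in separations]

e0_fit, d0_fit, l_fit = fit_uber(separations, energies)
print(f"E0 = {e0_fit:.3f}, d0 = {d0_fit:.3f}, l = {l_fit:.3f}")
```

Reading E0 and d0 from the sampled minimum is only adequate when the separation grid includes a point close to the true minimum; otherwise all three parameters should be optimized together.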

16.2.4 Calculation of Binding Energies for Surface Impurities

We have computed the binding energies of a sulfur impurity adsorbed in various adsorption sites by the equation

$$\mathrm{S(g)} + \mathrm{Fe(s)} \rightarrow \mathrm{S \cdot Fe(s)} \tag{16.5}$$

The binding energy is the difference in total energy of the products minus the reactants:

$$\mathrm{BE} = E_{\mathrm{tot}}(\mathrm{products}) - E_{\mathrm{tot}}(\mathrm{reactants}) = E_{\mathrm{tot}}(\mathrm{S \cdot Fe}) - [E_{\mathrm{tot}}(\mathrm{S}) + E_{\mathrm{tot}}(\mathrm{Fe})] \tag{16.6}$$

where Etot(S) is the total energy of an isolated S atom, and Etot(Fe) and Etot(S · Fe) are the total energies of the relaxed clean Fe surface and S-adsorbed Fe(110) surface, respectively.

16.3 STRUCTURE AND PROPERTIES OF IRON SURFACES

16.3.1 Structural Relaxation and Stability of Fe(100), Fe(110), and Fe(111) Surfaces

16.3.1.1 Introduction and Previous Studies

Relaxation of metal surfaces after cleavage from the bulk is a well-known phenomenon. The reduction in atomic interactions perpendicular to the surface can cause the topmost surface layers to contract toward the bulk or expand away from it. In addition, movements of the surface atoms within the plane of the surface can lead to surface reconstructions. Some previous theoretical findings have differed from those obtained experimentally.48 For the low-index surfaces of Fe in particular, there is also some conflict; however, it has been shown that the surfaces do not reconstruct, while they do relax.49–51 We have already reviewed the findings of the experimental studies,52 which used low-energy electron diffraction (LEED)49–51,53 and medium-energy ion scattering (MEIS)54–56 to examine relaxation of the Fe(100), Fe(110), and Fe(111) surfaces, and found that there is some conflict between the reported surface relaxations. The relaxations that occur from cleavage of a bulk structure to yield a surface result from the drive to minimize the energy of the surface. The measurement of surface energy values experimentally, however, can be very difficult to perform for a number of reasons, one being the difficulty of controlling the presence or absence of contaminants. In calculations, the state of the surface and level of impurities can be examined systematically.
Theoretical studies that have determined surface energy values of the low-index Fe surfaces have included mainly molecular mechanics (MM) techniques,34,57–68 with fewer studies using a quantum mechanical (QM) approach.20,69,70 In particular, the latter studies did not take into account the effect of surface relaxation on the calculated surface energy values. Furthermore, there were conflicting trends obtained for the stability of the three low-index surfaces. Hence, we performed DFT calculations52 to model these properties and to try to clarify the situation.

16.3.1.2 Surface Models

The Fe surfaces were modeled using the supercell approximation as described in Section 16.2.2. All models used a [1 × 1] crystal unit cell; however, a number of [2 × 2] unit cell slab calculations were performed as well in order to test for convergence. k-Space sampling was performed using the scheme of Monkhorst and Pack.42 A k-point mesh of 12 × 12 × 1 for the [1 × 1] unit cells and a 6 × 6 × 1 mesh for the [2 × 2] unit cell calculations were employed. A lattice constant value of 2.869 Å was used, as this was the optimized value obtained in our previous study20 of bulk bcc Fe using the same computational parameters. Models with different numbers of layers (ranging from 7 to 17 layers) were constructed to determine the size of slab needed to converge the surface geometry and energy values. Either one middle layer (for an odd-number layered model) or two middle layers (for an even-number layered model) were fixed, to provide a reference point for comparing the relaxed Fe positions, while all other atoms were allowed to relax in the x-, y-, and z-directions. The models selected are described in Section 16.3.1.3.

16.3.1.3 Surface Relaxation

Our calculations of the relaxed surface models52 indicated that only relaxations perpendicular to the surface (in the z-direction) occurred and that these surfaces do not reconstruct (showing no atomic displacements in the x,y-directions), in agreement with experimental studies.49–51 For each layer in our model we calculated the values of δzn, which is a measure of the distance the nth layer of the surface moves as a percentage of the interlayer spacing. A positive value indicates an expansion or upwards movement (towards the surface), whereas a negative value indicates a contraction or downwards displacement. The relaxation values for the (100), (110), and (111) surfaces were found to be converged by 0.01, 0.0005, and 0.005 Å for a 13, 7, and 12 layer model, respectively. The [2 × 2] surface models showed close agreement with the [1 × 1] slabs. These models are employed in our further work. The relaxation values obtained (Table 16.2) showed good agreement with experiment, with the more open surfaces relaxing more, in the order (110) < (100) < (111). The magnitude of the relaxations was found to be smaller as the bulk layers were approached. For all surfaces, the topmost layer contracted toward the bulk, with the (111) surface showing the largest relaxation, followed by the (100), then the (110) surface. The relaxation of the (110) surface layer was essentially zero, indicating that it is basically bulk cleaved. The second layer was found to relax outward for the (100) and (110) surfaces, while it contracted toward the bulk for the (111) surface. Again, the relaxations were largest for the (111) surface and smallest for the (110) surface.
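The definition of δzn given above can be sketched as follows. The example assumes that z increases toward the vacuum, so that a negative δzn corresponds to a contraction toward the bulk, and uses the bulk bcc interlayer spacings a0/2, a0/√2, and a0/(2√3) for the (100), (110), and (111) planes. The layer positions are hypothetical and serve only to illustrate the arithmetic.

```python
import math

# Bulk interlayer spacing of bcc planes in units of the lattice parameter a0
BCC_SPACING_FACTOR = {
    "100": 0.5,
    "110": 1.0 / math.sqrt(2.0),
    "111": 1.0 / (2.0 * math.sqrt(3.0)),
}


def bcc_interlayer_spacing(a0, surface):
    """Bulk spacing between adjacent atomic layers for a bcc surface."""
    return a0 * BCC_SPACING_FACTOR[surface]


def relaxation_percent(z_ideal, z_relaxed, d_bulk):
    """Layer displacements along z as a percentage of the bulk interlayer spacing."""
    return [100.0 * (zr - zi) / d_bulk for zi, zr in zip(z_ideal, z_relaxed)]


a0 = 2.869                                   # lattice parameter in angstrom
d_100 = bcc_interlayer_spacing(a0, "100")    # 1.4345 angstrom

# Hypothetical z-coordinates (angstrom) of the top two layers, vacuum at +z
z_ideal = [0.0, -d_100]
z_relaxed = [-0.030, -d_100 + 0.035]

delta_z = relaxation_percent(z_ideal, z_relaxed, d_100)
print([f"{dz:+.2f}%" for dz in delta_z])
```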

STRUCTURE AND PROPERTIES OF IRON SURFACES


TABLE 16.2 Calculated Relaxation Measurements, δzn (n = 1, 2, . . .) as a Percentage of the Bulk Interlayer Spacing for the First Five Layers of Fe(100), Fe(110), and Fe(111)a

           Surface Relaxation (%)                          Surface Energy (J m−2)
Surface    δz1      δz2       δz3      δz4      δz5        Relaxed    Unrelaxed
(100)      −1.89    +2.59     +0.21    −0.56    −0.14      2.29       2.32
(110)      −0.13    +0.197    −0.06    —        —          2.27       2.27
(111)      −13.3    −3.6      +13.3    −1.2     +0.35      2.52       2.62

Source: Ref. 52.
a Calculated surface energy values.

For the (100) and (110) surfaces, the magnitude of our calculated surface relaxations agreed well with the experimentally determined values50,51,55 and fell within the error of these measurements. For the (111) surface, there was a discrepancy between the relaxation values measured experimentally using MEIS54,56 and LEED.49,53 The MEIS measurements54,56 indicated that the first layer contracted and the second expanded, whereas the LEED study53 indicated that the first two layers contracted and the third expanded. Our calculations agreed with the LEED measurements. The magnitude of the surface relaxations can be related to the openness of the surface, with the more open (111) surface showing larger relaxation and the most close-packed (110) surface being almost bulk cleaved.

16.3.1.4 Surface Energy The calculated surface energy values (Table 16.2) for all three surfaces were found to be converged to at least 0.01 J m−2 by nine layers, with the unrelaxed models having slightly higher or the same surface energy values. Experimentally, the surface energy of Fe has been determined using liquid surface tension measurements by extrapolating the data to 0 K, giving a value of 2.41 J m−2 for the solid.71 As this value does not represent a particular surface of Fe, we cannot make a direct comparison; however, our values were generally in line with this value, especially if the average for all three surfaces was calculated. It was also found that the results obtained from previous MM calculations are dependent on the quality of the potentials employed, while the QM calculations, including our work, all give values that are close to the experiment. The calculated surface energy values increased in the order (110) < (100) < (111), before and after relaxation, so that the (110) surface is the most stable and the (111) surface the least stable. This relative order could be explained in terms of bond cutting arguments as well as the openness of the surface.52 In summary, our models provide a good approximation of the surface energy values and are therefore used in subsequent studies; the extent of the decrease in surface energy after relaxation is related to the magnitude of the relaxation. Our calculations described above provided the first fully converged study of the relaxation and surface energies of the three low-index Fe surfaces.
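The comparison with the extrapolated experimental value can be made concrete with a few lines of arithmetic. The sketch below uses only the surface energies listed in Table 16.2 and the 2.41 J m−2 value quoted above; the unweighted three-surface average and the helper names are assumptions made for illustration, since the text does not say how the average was formed.

```python
# Surface energies in J m^-2, transcribed from Table 16.2.
SURFACE_ENERGY = {
    "(100)": {"relaxed": 2.29, "unrelaxed": 2.32},
    "(110)": {"relaxed": 2.27, "unrelaxed": 2.27},
    "(111)": {"relaxed": 2.52, "unrelaxed": 2.62},
}
EXPERIMENT_0K = 2.41  # extrapolated liquid surface tension value from the text


def mean_surface_energy(table, state="relaxed"):
    """Unweighted mean over the three low-index surfaces (an assumption)."""
    return sum(entry[state] for entry in table.values()) / len(table)


def relaxation_lowering(table):
    """Decrease in surface energy on relaxation, per surface."""
    return {s: round(e["unrelaxed"] - e["relaxed"], 2) for s, e in table.items()}


def stability_order(table, state="relaxed"):
    """Surfaces sorted from lowest surface energy (most stable) upward."""
    return sorted(table, key=lambda s: table[s][state])


if __name__ == "__main__":
    mean = mean_surface_energy(SURFACE_ENERGY)
    print(f"mean relaxed value: {mean:.2f} J m^-2 (experiment {EXPERIMENT_0K})")
    print("lowering on relaxation:", relaxation_lowering(SURFACE_ENERGY))
    print("most to least stable:", stability_order(SURFACE_ENERGY))
```

The relaxed average comes out at about 2.36 J m−2, and the largest lowering on relaxation belongs to the (111) surface, which also relaxes the most.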


METAL SURFACES AND INTERFACES

16.3.2 Electronic and Magnetic Properties of Fe(100), Fe(110), and Fe(111) Surfaces 16.3.2.1 Introduction and Previous Studies It is well known that the magnetic properties of metals at a surface are different from those in the bulk and the magnetic moments of Fe surfaces have been studied both theoretically69,72 – 80 and experimentally.81 Table 16.3 summarizes available computational results. It is well established that the magnetic moment (μB ) at the surface is enhanced compared to the bulk, due to loss of coordination upon formation of the surface. However, only a few such studies that have investigated this effect theoretically consider surface relaxations,72,73,75 with most only examining bulk-terminated surfaces.69,73,74,76 – 80 Despite the number of studies that have investigated magnetic properties of surfaces, we are unaware of any published computational studies of how the magnetic properties of Fe surfaces are related to Fe adhesion and interface formation. At an interface, the magnetic properties can differ from those of the surface or the bulk. Understanding this is particularly important for magnetic device technology.82

TABLE 16.3 Computed Magnetic Moments (μB ) of the Relaxed and (Unrelaxed) Fe(100), Fe(110), and Fe(111) Surfaces of Fe, Along with the Values Determined Previouslya Magnetic Moment, μB Surface

Year

[Ref]

(100)

[this work]

(110)

199673 199574 199469 199276 199277 198778 198379 198180 [this work]

(111)

200272 199469 /199277 198778 [this work] 199375

S

S-1

S-2

S-3

S-4

3.03 (3.06) 2.74 (3.01) (2.97) (2.87) (2.97) (2.98) (2.98) (3.01) 2.75 (2.75) 2.47 (2.57) (2.65) 2.96 (3.01) 2.62

2.47 (2.50) 2.62 (2.36)

2.59 (2.55) — (2.42)

2.48 (2.47) — —

2.45 (2.46) — —

(2.34) (2.30)

(2.33) (2.37)

(2.25)

(2.24)

(2.35) (1.68) 2.53 (2.53) 2.29 (2.35) (2.37) 2.50 (2.57) 2.25

(2.39) (2.13) 2.40 (2.48) 2.32 (2.25) (2.28) 2.66 (2.66) 2.34

— — 2.43 (2.44) 2.26 (2.24) (2.25) 2.56 (2.54) 2.15

— — 2.41 (2.41) — (2.24) — 2.55 (2.56) 2.17

C 2.42 (2.43) 2.60 (2.32)

(2.25) (1.84) 2.39 (2.39) 2.22 (2.22) 2.56 (2.53) 2.11/2.00b

a S is the surface layer, S-n (n = 1 to 4) are the second to fifth layers, and C is the center of the slab. b The calculation also included an S-5 value; hence, the values indicated are S-5/C.


16.3.2.2 Magnetic Moments and Density of States of Fe Surfaces To relate magnetic and electronic properties to adhesion we first examined the properties of the isolated relaxed and unrelaxed Fe(100), Fe(110), and Fe(111) surfaces. The layer-resolved magnetic moment values obtained from our calculations for the three low-index surfaces before and after relaxation are shown in Table 16.3 together with a summary of previously determined values for the same Fe surfaces. It can be seen that the magnetic moment values are enhanced at the surface, due to the loss in coordination at the surface resulting in localized surface states (see, e.g., Alden et al.77 and Freeman and Fu78 ). For the surfaces studied, the enhancement is 25%, 15%, and 16% for the (100), (110), and (111) surfaces, respectively, using the relaxed surface models. The difference in surface layer magnetic moment enhancement can be attributed to the coordination of Fe atoms at each surface, where the (110) surface atoms have a higher surface coordination number and hence the lowest surface enhancement. However, the difference between the (100) and (111) surfaces, which both have a surface Fe coordination number of 4, indicates that additional features of the surface atomic arrangement, such as packing, affect the magnetism of Fe, as seen previously.75 The enhancement of the surface magnetic moment value observed for all Fe(100), Fe(110), and Fe(111) surfaces has been attributed by Wu and Freeman83 to the difference in density of surface layer up- and down-spin states at the Fermi level (EF ) as compared to the bulk. They showed that for the bulk (or center layer) density of states (DOS), the Fermi level lies on an up-spin peak and in the valley of the down-spin DOS. At the surface layer, however, the DOS are significantly narrowed due to a loss in coordination. As a result, there is a decrease in up-spin states at EF and an increase of down-spin states due to surface states and resonances. 
It is this increased number of down-spin states relative to the up-spin states that gives rise to the surface magnetic moment enhancement. The total DOS resolved to up- and down-spin states of each of the unrelaxed (and relaxed) surfaces is shown in Fig. 16.2 (dashed line). The bulk DOS are shown in Fig. 16.3. As can be seen from Fig. 16.2, there is an increased density of down-spin states compared to up-spin states present at EF for all three surfaces, leading to the enhanced surface magnetic moment. Comparison of the DOS for the three surfaces with those obtained previously shows good agreement. The atoms in the lower layers of our surface models show magnetic moment values that generally decrease and are identical within 1.2% for the S-4 and C layers, indicating that the surface models are large enough to achieve convergence. The magnetic moment values for the center layer of the (100) and (110) surfaces are similar, within 1.6%, and are converged to less than 1.25% compared to the bulk value, calculated to be 2.40 μB using the same computational parameters. They are, however, up to 7% different when compared to the (111) surface value where the central layer μB is 5% larger than the bulk value, indicating that the surface model may not be large enough for convergence of this property. It is important to note, though, that other properties, including surface energy and relaxation, do converge for models of the same size.52 We therefore consider the models to be appropriate for this study and for comparison with previous work.
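The quoted enhancement percentages can be checked against Table 16.3. The sketch below is a hedged reconstruction: the moments are the relaxed "this work" values as we read them from the flattened table, and taking the centre layer of each slab as the reference is an assumption, adopted because it reproduces the 25%, 15%, and 16% figures given in the text. The helper name is hypothetical.

```python
# Relaxed "this work" moments (mu_B) as read from Table 16.3 (assumed reading).
MOMENTS = {
    "(100)": {"surface": 3.03, "center": 2.42},
    "(110)": {"surface": 2.75, "center": 2.39},
    "(111)": {"surface": 2.96, "center": 2.56},
}
BULK_MOMENT = 2.40  # bulk value quoted in the text (mu_B)


def percent_difference(value, reference):
    """Relative difference of `value` with respect to `reference`, in %."""
    return 100.0 * (value - reference) / reference


if __name__ == "__main__":
    for surface, m in MOMENTS.items():
        enhancement = percent_difference(m["surface"], m["center"])
        center_vs_bulk = percent_difference(m["center"], BULK_MOMENT)
        print(f"{surface}: surface enhancement {enhancement:.1f}%, "
              f"center layer vs bulk {center_vs_bulk:+.1f}%")
```

The same helper shows the centre layers of the (100) and (110) slabs lying within 1% of the bulk moment, while the (111) centre layer is several percent larger, in line with the convergence caveat raised for that surface.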

[Figure 16.2: three panels, Fe(100), Fe(110), and Fe(111); vertical axis n(E) (states/eV atom), up- and down-spin; horizontal axis energy (eV) with EF marked.]

Fig. 16.2 Total density of states (DOS) resolved to up- and down-spin for the surface/top layer of the unrelaxed (solid line) and relaxed (dashed line) Fe (100), (110), and (111) surfaces. The DOS values have not been smoothed.

[Figure 16.3: curves labeled Bulk and Eq. (1.39 Å); vertical axis n(E) (states/eV atom), up- and down-spin; horizontal axis energy (eV) with EF marked.]

Fig. 16.3 Total density of states (resolved to up- and down-spins) for the surface layer of the (100) matching interface at equilibrium separation compared to the bulk. The DOS values have not been smoothed.

Even though our calculated layer-resolved magnetic moment values decrease toward the bulk (i.e., away from the surface), the (100) and (111) surfaces show some small oscillations. As can be seen from Table 16.3, most previous studies show an oscillation as well, which has been explained by rearrangements in the electron density (i.e., Friedel oscillations). The two exceptions are given by Kishi and Itoh,73 whose surface model is not large enough to observe oscillations, and Eriksson et al.,76 who incorporate spin-orbit coupling into their calculations. We do not observe such oscillations for the (110) surface, similar to previous calculations by Freeman and Fu,78 Alden et al.,69,77 and Braun et al.72 We do see a 1.2% increase in the magnetic moment value at the S-3 layer, similar to the results of Braun et al.72 ; however, this change is probably within computational uncertainty. Comparison of the magnetic moment values after relaxation shows that the μB of the surface atom decreases for the (100) and (111) surfaces, while it remains the same for the (110) surface. This appears to be related directly to the magnitude of the surface relaxation of the outermost layer.52 The DOS of the outermost layer shows little change after relaxation (Fig. 16.2, dashed line). Thus, surface relaxation does not affect the surface magnetic moments or DOS to a significant extent, and therefore the “frozen surface” adhesion model we employ in Section 16.4.2 is justified.


16.3.3 Sulfur Adsorption on Fe(110) 16.3.3.1 Introduction and Previous Studies The presence of S on Fe surfaces has been shown to affect adhesion, corrosion, and catalysis and is thus of importance in industrial processes. Impurities, in general, can either increase or decrease the strength of adhesion, depending on conditions. Prior to studying the effect of impurities on adhesion we needed to examine the adsorption of these impurities on the clean Fe surfaces. The experimental S adsorption data on Fe(110)84 – 86 has concentrated primarily on the 1/4 ML coverage with S adsorbed in a p(2 × 2) arrangement. Below we summarize our findings on adsorption of S on Fe(110) in three different high-symmetry adsorption sites: atop, bridge, and four-fold hollow at 1/4 ML coverage, followed by the effect of different S coverages on the foregoing properties of the Fe(110) surface87,88 (Section 16.3.3.8). 16.3.3.2 Adsorption Models and Computational Details The Fe surfaces were modeled using the supercell approach (Section 16.2.2). S adsorption at the experimentally observed coverage of 1/4 ML and p(2 × 2) arrangement84,85 was modeled by placing an S atom on one side of the slab (see Fig. 16.4). S was adsorbed in atop, bridge, or four-fold hollow sites. The S atom and only the three top Fe layers were allowed to relax. A k -point mesh of 6 × 6 × 1 was employed, as this gives a good description of FeS2 89,90 and clean Fe(110).52
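The choice of a 6 × 6 × 1 mesh for the p(2 × 2) cell mirrors the earlier use of 12 × 12 × 1 for the [1 × 1] cell: doubling the cell halves the reciprocal lattice vectors, so half as many divisions give the same sampling density. The sketch below generates an unreduced Monkhorst–Pack grid from the usual formula u_r = (2r − q − 1)/(2q) and compares the two spacings. It is an illustration only: the function names are hypothetical, no symmetry reduction is applied, and it is not the k-point generator of the code used in the chapter.

```python
from itertools import product


def monkhorst_pack(q1, q2, q3):
    """Fractional k-points u_r = (2r - q - 1) / (2q), r = 1..q, on each axis."""
    def axis(q):
        return [(2 * r - q - 1) / (2.0 * q) for r in range(1, q + 1)]
    return list(product(axis(q1), axis(q2), axis(q3)))


def in_plane_spacing(q, cell_multiple):
    """k-point spacing in units of the [1 x 1] reciprocal lattice vector.

    A surface cell enlarged by `cell_multiple` along one direction has a
    reciprocal vector that is shorter by the same factor.
    """
    return 1.0 / (q * cell_multiple)


if __name__ == "__main__":
    grid = monkhorst_pack(6, 6, 1)
    print(len(grid), "k-points before symmetry reduction")
    print("[1 x 1], 12 divisions:", in_plane_spacing(12, 1))
    print("p(2 x 2), 6 divisions:", in_plane_spacing(6, 2))
```

Both settings give the same in-plane spacing, which is why results for the two cell sizes can be compared on an equal footing.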

[Figure 16.4: supercell diagrams for panels (a), (b), and (c); the side view labels the vacuum spacing (~10 Å), the S atom, and the Fe layers Fe1 to Fe5.]

Fig. 16.4 (color online) Top and side views of the supercells used to model sulfur adsorbed in a p(2 × 2) arrangement (1/4 ML coverage) in (a) atop, (b) bridge, and (c) four-fold hollow sites.


To determine the workfunction (defined as the energy required to remove an electron from the Fermi level, EF, to the vacuum) of the foregoing systems, a dipole correction was added in the direction perpendicular to the surface. As we have an asymmetric slab with the adsorbate placed on only one side of the slab, the electrostatic potential in the vacuum region will show a clear distinction between the two sides of the slab, representing the adsorbed surface and the clean surface. The workfunction value, Φ, is represented as

$$\Phi = E_{\mathrm{vac}} - E_{F} \tag{16.7}$$

where Evac is the electrostatic potential in the vacuum region of the supercell on the adsorbate side of the supercell and EF is the energy of the Fermi level. The change in workfunction value, ΔΦ, is calculated by subtracting the workfunction of the clean surface from that of the adsorbed surface.

16.3.3.3 Binding Energy and Workfunction Measurements The calculated binding energy values (Table 16.4) indicated that the hollow site is the most favored, in agreement with experimental data,84 followed by the bridge and then the atop sites. The calculated workfunction values and workfunction changes for S/Fe(110) in the three adsorption sites are shown in Table 16.4; our calculated values compared well to the experimental segregation energy value of 5.2 eV,91 as does the calculated clean surface value with the experimental value of 5.12 ± 0.06 eV.92 The sign of the workfunction change after S adsorption was similar to that of other atomic adsorbates, such as oxygen, which also show a positive workfunction change,93 indicating a negatively charged surface species. As the magnitude of the workfunction change was only very small, it suggested that there is little transfer of charge from the Fe to the S. The change in workfunction after S adsorption was largest for the atop site, followed by the bridge and then the four-fold hollow site.

TABLE 16.4 Parameters Calculated for S Adsorbed on Fe(110) in the Atop, Bridge, and Four-fold Hollow Sitesa

              Adsorption Site
Parameter     Atop     Bridge    Hollow
BE (eV)       4.52     5.32      5.82
Φ (eV)        5.08     4.999     4.98
ΔΦ (eV)       0.24     0.15      0.14

Source: Ref. 87.
a BE, binding energy; Φ, workfunction; ΔΦ, change in workfunction after S adsorption. The workfunction for the clean Fe(110) surface was calculated to be 4.84 eV, using a five-layer slab.

TABLE 16.5 Calculated Distances for S Adsorbed on Fe(110) in a p(2 × 2) Arrangement in Atop, Bridge, and Four-Fold Hollow Sitesa

                 Adsorption Site
Distance (Å)     Atop87          Bridge87    Four-fold Hollow87    Four-fold Hollow84
d⊥(S–FeS)        2.06 (1.797)    1.70        1.49                  1.43
d(S–Fe)          2.06            2.15        2.19                  2.17

Source: Ref. 87.
a Included are the corresponding values determined from LEED measurements84 for the four-fold hollow site: the perpendicular height of S above the highest atom in the topmost Fe layer, d⊥(S–FeS), and the shortest S–Fe distance, d(S–Fe).

16.3.3.4 Adsorption Geometry After adsorption of S, the calculations showed that both relaxation and surface reconstruction occurred.87 Table 16.5 shows the calculated distances between the adsorbed S and closest Fe atom, the height of S above the surface, and the experimental LEED84 values for the four-fold hollow site. The perpendicular height of the adsorbed S above the top Fe layer (Table 16.5) increases going from the four-fold hollow to the bridge and atop sites, as the S lies closer to the surface for the more highly coordinated adsorption sites. The shortest S–Fe bond distances were again related to the coordination number of the adsorption site; the S–Fe bond distance is shorter for the atop site, where it is bonding directly to one atom, but is longer for the bridge and four-fold hollow sites, where the bonding is distributed over more atoms. Interestingly, some buckling of the surface layers was observed after S adsorption. For the four-fold hollow site all Fe atoms in the top layer relax upward slightly, opposite to the clean Fe(110) surface. In addition, the two Fe atoms lying farther from the S moved upward, while the two atoms closest to the S only moved upward, which resulted in the S–Fe distances to these four surface atoms being equalized, maximizing the S–Fe coordination. The second-layer Fe atoms were less buckled and the third-layer Fe atoms were bulklike, in good agreement with experimental data.84 For the bridge site, there was also some buckling of the surface layer, similar to the four-fold hollow site; for the second layer there was some small buckling, while the third layer was bulklike. For the atop site, all surface layer Fe atoms relaxed upward slightly, except for the atom directly below the adsorbed S, which moved downward. The atoms next closest to the S in the top layer relaxed upward, with the farthest ones also relaxing upward, but only slightly.
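The trend in Table 16.5 (lower S height but longer S–Fe bonds at higher coordination) follows from simple geometry. As a hedged illustration, the sketch below estimates the lateral offset between S and its nearest Fe atom from the two tabulated distances using Pythagoras. This treats the tabulated height as if it were measured from the nearest Fe atom, which is an assumption: the table defines d⊥ relative to the highest atom of the buckled top layer, so the offsets are approximate. The helper name is hypothetical.

```python
import math

# Distances in angstrom transcribed from Table 16.5.
DISTANCES = {
    "atop": {"d_perp": 2.06, "d_bond": 2.06},
    "bridge": {"d_perp": 1.70, "d_bond": 2.15},
    "hollow": {"d_perp": 1.49, "d_bond": 2.19},
    "hollow (LEED)": {"d_perp": 1.43, "d_bond": 2.17},
}


def lateral_offset(d_perp, d_bond):
    """In-plane S-Fe separation implied by a bond length and a height.

    Assumes the height is measured from the plane of the bonded Fe atom.
    """
    return math.sqrt(max(d_bond ** 2 - d_perp ** 2, 0.0))


if __name__ == "__main__":
    for site, d in DISTANCES.items():
        offset = lateral_offset(d["d_perp"], d["d_bond"])
        print(f"{site}: lateral S-Fe offset ~ {offset:.2f} A")
```

For the atop site the offset vanishes, as expected for S sitting directly above one Fe atom, while the bridge and hollow sites give progressively larger offsets.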
The small displacement in the x- or y-direction indicated that the four-fold hollow site reconstructs the most and the atop site the least.52 For the four-fold hollow site, the second-layer Fe atoms showed no reconstruction, while those in the third layer reconstructed slightly but the movement was negligible ˚ (>(111). This order is the same as that calculated for the isolated surfaces in Section 16.3.1.4. As a result of the relative stability of the surfaces, despite the (111) matching interface having the largest Wsep of all the low-index interfaces, the lower stability of the surface indicates that it is less likely to exist as the clean bulk-terminated face. The d0 values calculated (Table 16.7) were found to be smaller for the matching interfaces than for the mismatching interfaces. In fact, the d0 values for the matching interfaces indicate that the interface forms the bulk structure at the equilibrium separation. For the mismatching interfaces, the d0 values were found to be approximately equal to the Fe–Fe bond distance of 2.482 Å,133 as the topmost Fe atoms on each surface forming the interface directly face each other. The l values (Table 16.7) calculated for the matching and mismatching interfaces were all close to each other and agreed with the empirically estimated average screening length value of 0.56 Å for several Fe surfaces,46 except for the (111) mismatching interface, suggesting again that this interface is unlikely to form. The l values were slightly larger for the matching interfaces, indicating that the electronic interactions between the approaching surfaces forming the interface begin at a larger separation. The ideal peak interfacial stress values (Table 16.7), which give a measure of the maximum tensile stress that the interfaces can withstand without spontaneous cleavage, were shown to be in the same order as the Wsep values.

16.4.2 Relationship Between Adhesion and Electronic and Magnetic Properties

In this section we present new results investigating the relationship between adhesive energy and interfacial separation for the body-centered cubic (bcc) Fe(100), Fe(110), and Fe(111) interfaces. Both ideally matching and mismatching interfaces were considered in order to cover the endpoints of the range of adhesion of real surfaces.

16.4.2.1 Magnetic Properties and Adhesion of Fe Interfaces The computed layer-by-layer local atomic magnetic moments for the Fe(100), Fe(110), and Fe(111) interfaces in match and mismatch at three interfacial separation distances (d)20,21: 10 Å separation, the interfaces at approximately infinite separation; 4 Å, the approximate distance at which metallic interactions begin to dominate; and the equilibrium separation (Eq.), are shown in Fig. 16.9. Figure 16.9 shows that for the (100) match interface, the top surface layer μB changes considerably as the surfaces approach, while the second and third layers change only slightly and the lower layers, hardly at all. At the equilibrium interfacial separation, the μB values differ very little from layer to layer, consistent with the fact that at this separation the system is essentially bulk Fe. For the mismatch interface, it is again the surface μB that is most changed upon


STRUCTURE AND PROPERTIES OF IRON INTERFACES

[Figure 16.9: six panels, Fe(100), Fe(110), and Fe(111), each in match and mismatch; vertical axis magnetic moment (μB), horizontal axis layer number; one curve per interfacial separation (10 Å, 4 Å, and the equilibrium separation, with intermediate separations in some panels).]

Fig. 16.9 (color online) Calculated layer-by-layer magnetic moment values (μB) for the match and mismatch Fe(100), (110), and (111) interfaces at the interfacial separations indicated; Eq. is the equilibrium separation.

formation of the interface, while the lower layers stay almost constant. At the equilibrium interfacial separation the surface μB is still enhanced, as the bulk crystal is not formed when the surfaces are out of epitaxy. The (110) match and mismatch interfaces display similar trends to the (100) interfaces where the second- and third-layer μB values stay almost the same as those of the lower layers. The third layers of both (110) interfaces, however, appear to be less affected than they are on the (100) interface. This surface is more closely packed than the (100) surface, and hence it would be expected that the lower layers would be less affected by changes occurring at the surface layer. The (111) match and mismatch interfaces also show a surface layer magnetic moment enhancement; however, in addition to the surface layer, the second- and third-layer μB values are clearly altered as the interfacial separation is decreased. For this less close-packed surface, the second and third atomic layers are more exposed. It can therefore be suggested that there are surface states localized on

[Figure 16.10: three panels, (100), (110), and (111) from left to right; vertical axis Ead (kJ/mol), horizontal axis ΔμB (−0.1 to 0.7); match and mismatch series, with the equilibrium (Eq.) points marked.]

Fig. 16.10 (color online) Adhesion energy values, Ead,20,21 plotted against surface layer magnetic moment enhancements, ΔμB = μBsurface − μBbulk, corresponding to the same interfacial separations for the (100), (110), and (111) interfaces (from left to right) in match and mismatch (triangles).

these “lower-layer” atoms, and as the surfaces are brought together, the lower-layer surface states also begin to interact, resulting in changes in their computed magnetic moments. This is in contrast to the (100) and (110) surfaces, where atoms below the topmost layer are fully (i.e., bulk) coordinated; their magnetic moment values are therefore close to those computed for the bulk, and changes in interfacial separations have negligible influence. This observation is consistent with this surface being more open. The relation between the surface μB changes and the adhesion energy can be seen from Fig. 16.10, where the values for the surface μB enhancement, ΔμB (the difference between the surface atomic layer μBsurface and the computed bulk μBbulk), and the adhesion energy, Ead, for the interfaces have been plotted. For all three matching interfaces the adhesion energy decreases with decreasing ΔμB until the adhesion energy reaches a minimum when the interface is most stable (bulklike), and the enhancement is essentially zero. For the mismatching interfaces, the adhesion energy decreases as ΔμB decreases, but ΔμB does not reach zero at the minimum adhesion energy because the bulk crystal structure is not formed.

16.4.2.2 Density of States

DOS of Matching Interfaces The surface layer density of states (S-DOS), resolved to up- and down-spin states, was calculated for all interfaces at four interfacial separations: 10 Å, 4 Å, Eq., and a separation between 4 Å and Eq. As the difference in magnitude of the up- and down-spin states at the Fermi level affects the surface μB enhancement, we examine how these states change as a function of interfacial separation. The S-DOS for the matching (100) interface are shown in Fig. 16.11. At 10 Å separation, the S-DOS are identical to those seen earlier for the unrelaxed surface (Fig. 16.2), as this separation represents the isolated surfaces.20 The values calculated for the up- and down-spin DOS at EF (Table 16.8) show the presence of more down-spin states at EF, which gives rise to the surface μB enhancement.


TABLE 16.8 Number of Up- and Down-Spin States at the Fermi Energy (in States/eV Atom) for Match and Mismatch Interfaces at Interfacial Separations of 10 Å and Equilibrium (Eq.)

             Interfacial     Match             Mismatch
Interface    Separation      Up      Down      Up      Down
(100)        10 Å            0.09    1.02      0.10    0.81
             Eq.             0.79    0.23      0.13    1.24
(110)        10 Å            0.16    0.86      0.14    0.87
             Eq.             1.00    0.52      0.20    0.45
(111)        10 Å            0.08    1.45      0.07    1.33
             Eq.             0.45    0.33      0.15    2.03
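The pattern in Table 16.8 can be summarized by asking which spin channel dominates at EF for each interface and separation. The sketch below does this with the tabulated values; the dictionary layout and the helper name are illustrative assumptions, and the comparison is only the qualitative up-versus-down reading used in the text, not a calculation of the magnetic moment.

```python
# (up, down) states at E_F in states/eV atom, transcribed from Table 16.8.
STATES_AT_EF = {
    ("(100)", "match"): {"10 A": (0.09, 1.02), "Eq.": (0.79, 0.23)},
    ("(110)", "match"): {"10 A": (0.16, 0.86), "Eq.": (1.00, 0.52)},
    ("(111)", "match"): {"10 A": (0.08, 1.45), "Eq.": (0.45, 0.33)},
    ("(100)", "mismatch"): {"10 A": (0.10, 0.81), "Eq.": (0.13, 1.24)},
    ("(110)", "mismatch"): {"10 A": (0.14, 0.87), "Eq.": (0.20, 0.45)},
    ("(111)", "mismatch"): {"10 A": (0.07, 1.33), "Eq.": (0.15, 2.03)},
}
SURFACES = ("(100)", "(110)", "(111)")


def dominant_spin(up, down):
    """Return which spin channel has more states at the Fermi level."""
    return "up" if up > down else "down"


if __name__ == "__main__":
    for (surface, kind), by_sep in STATES_AT_EF.items():
        summary = ", ".join(
            f"{sep}: {dominant_spin(*counts)}" for sep, counts in by_sep.items()
        )
        print(f"{surface} {kind:8s} -> {summary}")
```

Read this way, every interface is down-spin dominated at 10 Å, the matching interfaces switch to up-spin dominance at equilibrium, and the mismatching interfaces stay down-spin dominated.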

That there is little chemical interaction of the surfaces for d > 4 Å is shown by the similarity in the S-DOS, consistent with the similarity of the adhesion energy curves20 and the values of the surface μB enhancements. At the equilibrium interfacial separation, the number of down-spin states at EF has decreased significantly (see Table 16.8), the overall features of the S-DOS are those of bulk Fe (Fig. 16.3), and the up-spin S-DOS change significantly at EF, with an increased number of states at EF. As a result of these changes, there is a larger number of up-spin states at EF, as compared to larger separation distances, leading to a significant decrease in the surface μB at this separation. For the (110) matching interface, similar behavior is observed as the interfacial separation is decreased, but at the equilibrium separation there is a decrease in the down-spin states, whereas there is an increase in the up-spin states at EF, and the DOS resemble those of the bulk crystal structure (Fig. 16.3). The up- and down-spin S-DOS of the matching (111) interface (Fig. 16.11) show behavior similar to that of the other two interfaces, with the down-spin states dominating at larger interfacial separations. At the equilibrium separation the up-spin states dominate at EF and the S-DOS resemble those of the bulk. This is consistent with the very small value computed for the μB enhancement.

DOS of Mismatching Interfaces The resolved surface layer DOS values for the three mismatching interfaces were calculated and the up- and down-spin states at EF are shown in Table 16.8. For the (100) interface at 10 Å separation, the S-DOS represents the isolated noninteracting surface. As the interfacial separation is decreased, the down-spin states present near EF vary slightly in number, but unlike the matching interface, they are still present at the equilibrium separation, still having an increased number of down-spin states, indicating an enhanced surface μB value. Similar behavior is seen for the DOS of the (110) and (111) mismatching interfaces.

16.4.2.3 Charge Density The charge-density distribution of the (100), (110), and (111) matching and mismatching interfaces was examined at two different interfacial separations: equilibrium separation and a separation greater than

[Figure 16.11: three panels, Fe(100), Fe(110), and Fe(111); vertical axis n(E) (states/eV atom), up- and down-spin; horizontal axis energy (eV) with EF marked; curves for separations of 10 Å, 3.95 Å, 2 Å, and 1.39 Å (Eq.) for Fe(100); 10 Å, 4 Å, and 1.99 Å (Eq.) for Fe(110); and 10 Å, 4 Å, 1.5 Å, and 0.8 Å (Eq.) for Fe(111).]

Fig. 16.11 Surface layer density of states (resolved to up- and down-spin states) for the (100), (110), and (111) matching interfaces at the interfacial separation indicated, including equilibrium (Eq.). The DOS values have not been smoothed.

[Figure 16.12: charge-density slices on a low-to-high scale; (a) match interface at d = 2 Å and 1.39 Å (equil.); (b) mismatch interface at d = 4 Å and 2.43 Å (equil.).]

Fig. 16.12 (color online) Charge-density plots of (a) matching and (b) mismatching Fe(100) interfaces at the interfacial separation d indicated.

equilibrium. The plots shown in Fig. 16.12 correspond to a slice taken perpendicular to the (100) match and mismatch interfaces. For the (100) matching interface (Fig. 16.12a) at a separation of 2 Å (greater than equilibrium), the plot shows a region of low charge density between the two surfaces forming the interface, indicating that negligible metallic bond formation occurs. At the equilibrium separation (1.39 Å) there is a uniform distribution of the charge density between the atoms at the interface and the bulk, signifying that bond formation has occurred and the bulk material has formed. The (110) and (111) matching interfaces (not shown) show identical behavior at the corresponding interfacial separations. Hence, irrespective of the crystal face forming the interface in epitaxy, the interface is most stable when the charge density is evenly distributed between the atoms at the interface and those within the bulk. The charge-density plot for the corresponding (100) mismatching interface (Fig. 16.12b) shows that at an interfacial separation greater than equilibrium separation (4 Å), there is a region of very low charge density at the interface, similar to the matching interface. At the equilibrium separation (2.43 Å), an increase in the charge density between the closest surface atoms forming the interface indicates that some bonding occurs. However, there are large areas of low charge density between the directional bonds, which result in a much weaker interfacial energy than that in the epitaxial arrangement.20,21 The mismatching (110) and (111) interfaces show similar behavior.

16.4.2.4 Conclusions For all three surfaces studied, there is an enhanced magnetic moment at the surface due to an increased number of down-spin states as opposed to up-spin states at the Fermi level in the DOS, consistent with previous studies. The inclusion of surface relaxation in the calculations had little effect on the magnetic moment values and DOS. The magnetic moments calculated for the interfaces at a number of special interfacial separation distances were found to be related and were consistent with


the adhesion properties obtained previously. The surface layer magnetic moment is most affected upon formation of the interface, with lower layers being less affected but most altered for more open surfaces. For the matching interfaces the surface layer magnetic moment enhancement decreases as the interfacial separation is reduced, until it reaches zero at the equilibrium separation. In contrast, for mismatching interfaces an enhanced surface magnetic moment is still present at the equilibrium separation, as manifested by the increased number of down-spin states at EF . The charge-density plots for different interfacial separations show rearrangement of the electron density as the surfaces are brought into contact in and out of epitaxy. There is little interaction between the surfaces at large interfacial separations, in agreement with the DOS and magnetic moment enhancement values, but for shorter separations they indicate bond formation. 16.4.3 Effect of Relaxation on Adhesion of Fe(100) Surfaces: Avalanche 16.4.3.1 Introduction and Previous Studies Avalanche is a process whereby the mutual attraction between two surfaces, at a critical interfacial separation, causes the surface atoms to displace toward the opposing surface, resulting in a collapse of the two slabs to form a single slab. A number of studies have examined this effect using a range of computational methods.134 – 138 Good and Banerjea139 performed Monte Carlo simulations at room temperature on bcc Fe and W140,141 and found that avalanche still occurred for Fe(110) interfaces that were out of registry; however, it was inhibited when the surfaces were far out of registry and when only a few layers near the surface were allowed to relax. Also, the energy released in the avalanche decreased as the loss of registry increased. A study of the avalanche effect for silicon (111) surfaces142 showed covalent bond effects, indicating the importance of using quantum mechanical methods. 
None of these studies, however, employed quantum mechanical techniques to examine avalanche in adhesion between metallic surfaces. Furthermore, no lateral displacements were allowed during the simulations, preventing the study of avalanche formation, or avalanche of a mismatching interface into a matching one.

16.4.3.2 Interface Models

The Fe interfaces were modeled using the supercell approximation, described in Section 16.2.2. Surfaces were cleaved from a crystal structure of bcc Fe, corresponding to the (100) Miller plane; the specific details of the individual models and their graphical representations have been explained by Spencer et al.131 In model I131 the sandwich approach was used to represent the match and mismatch interfaces, which means that only one vacuum spacer was positioned between the surfaces, comprising six layers each for the match interface and six and five layers for the mismatch interface. The three-dimensional periodic boundary conditions (PBCs) were then applied to the cell. For the match interface, the two middle-layer atom positions were fixed; for the mismatch interface the
middle layer of atoms was fixed. All other atoms were allowed to relax. We defined the initial and final interfacial separations as the distance between the boundary layers of the original and relaxed separated surfaces, respectively. The total energies were calculated for separations from approximately 1 to 10 Å.

Model II131 was identical to model I except that no surface layers were fixed and an additional vacuum spacer of >30 Å was added in the z-direction to allow the entire slab to move in the z-direction during relaxation. The initial interfacial separation was approximately 3 Å for both match and mismatch interfaces. The systems were then subject to the full geometry optimization, keeping the total volume of the supercell fixed. The energy at the final interfacial separation was calculated.

In model III,131 vacuum spacers of approximately 8 Å were introduced in the x-, y-, and z-directions, creating a periodic cluster-type model. The number of layers was similar to those of models I and II, but only a mismatch initial configuration was used for the geometry optimization. One surface (i.e., cluster) was fixed during the geometry optimization, while another one was free to move in all three directions. The initial interfacial separation was 4.8 Å, and the final geometry was examined.

16.4.3.3 Summary of Findings

In model I, the relaxation resulted in increasing the interlayer spacing throughout the surfaces. For the relaxed system, the interlayer spacing was approximately 1.58 Å and for the unrelaxed surface it was 1.4345 Å, making the relaxed interlayer spacing approximately 10% larger than the unrelaxed spacing.
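The layer bookkeeping of the model I sandwich cells described in Section 16.4.3.2 can be sketched in a few lines of Python. This is only an illustrative reading of the description, not the authors' setup: it assumes that the two surfaces are in bulk contact through the periodic boundary on one side and face each other across the single vacuum spacer on the other, and that bcc(100) layers alternate between corner and body-center lateral sites. The names `SandwichCell` and `build_sandwich` and the 3 Å separation are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List

BULK_SPACING = 1.4345  # Å; unrelaxed Fe(100) interlayer spacing quoted in the text


@dataclass
class SandwichCell:
    layer_z: List[float]     # height of each atomic layer along z (Å)
    layer_site: List[str]    # lateral site of each layer: "corner" or "center"
    cell_height: float       # periodic repeat length along z (Å)
    fixed_layers: List[int]  # 0-based indices of layers held fixed
    registry: str            # "match" or "mismatch" across the vacuum gap


def build_sandwich(n_layers: int, separation: float,
                   spacing: float = BULK_SPACING) -> SandwichCell:
    """Stack n_layers bcc(100) layers and leave one gap of width `separation`."""
    # Assumption: bcc(100) layers alternate between corner and body-center sites.
    sites = ["corner" if i % 2 == 0 else "center" for i in range(n_layers)]
    z = [i * spacing for i in range(n_layers)]
    # The periodic image of layer 0 sits `separation` above the top layer.
    cell_height = z[-1] + separation
    # Even layer count: two middle layers are fixed; odd count: one middle layer.
    mid = n_layers // 2
    fixed = [mid - 1, mid] if n_layers % 2 == 0 else [mid]
    # Bulk stacking continues across the gap only if the facing layers differ.
    registry = "match" if sites[-1] != sites[0] else "mismatch"
    return SandwichCell(z, sites, cell_height, fixed, registry)


match_cell = build_sandwich(n_layers=12, separation=3.0)     # 6 + 6 layers
mismatch_cell = build_sandwich(n_layers=11, separation=3.0)  # 6 + 5 layers
print(match_cell.registry, match_cell.fixed_layers)
print(mismatch_cell.registry, mismatch_cell.fixed_layers)
```

Under these assumptions the 6 + 6 layer cell closes into bulk stacking across the gap, whereas the 6 + 5 layer cell leaves like layers facing each other, which is one way to see why the two layer counts give match and mismatch interfaces.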
Further detailed analyses131 indicated that in such a system setup, a proper avalanche effect cannot occur because of the additional constraint on the fixed layers of the slabs as well as the periodic boundary conditions in all three dimensions, which cause unrealistic stretching of the interlayer spacing and formation of a highly strained crystal region.

In model II, relaxation of the periodic boundary condition in one (z-) dimension resulted in the two surfaces jumping together. Equilibrium interfacial separations of 1.437 and 2.4996 Å were achieved for the match and mismatch interfaces, respectively. The match interface value was approximately equal to the bulk interlayer spacing (1.4345 Å), as was expected. Similarly, the mismatch interface value was close to the bulk Fe–Fe distance of 2.47 Å. The overall geometry at the center of the interface formed upon avalanche was bulklike, as opposed to the strained model I. The adhesion energy for the match interface after relaxation compared well with that obtained for the minimum-energy structure with the same interfacial separation using model I, but as the outer layers of model II were allowed to move, this resulted in surface relaxation and hence in slightly lower energy.

In our model III, the two clusters were found to approach each other, forming a nearly matching interface with some minor structural imperfections due to a limited simulation time. However, the calculation clearly illustrated that if no constraints are imposed on the system, it will undergo avalanche and relax toward perfect registry.
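The spacings quoted for models I and II can be compared with their bulk reference values by simple arithmetic. The short Python sketch below uses only numbers stated in the text; the helper name `percent_difference` is a hypothetical label introduced for the example.

```python
def percent_difference(value: float, reference: float) -> float:
    """Signed deviation of `value` from `reference`, in percent."""
    return 100.0 * (value - reference) / reference


BULK_INTERLAYER = 1.4345  # Å; bulk (unrelaxed) Fe(100) interlayer spacing
BULK_FE_FE = 2.47         # Å; bulk Fe-Fe distance quoted in the text

# (quoted value, reference value), both in Å
comparisons = {
    "model I relaxed interlayer spacing": (1.58, BULK_INTERLAYER),
    "model II match separation": (1.437, BULK_INTERLAYER),
    "model II mismatch separation": (2.4996, BULK_FE_FE),
}

results = {name: percent_difference(value, ref)
           for name, (value, ref) in comparisons.items()}

for name, deviation in results.items():
    print(f"{name}: {deviation:+.2f}% relative to bulk")
```

The model I spacing is stretched by roughly 10%, whereas both model II separations lie within about 1 to 2% of their bulk references, consistent with the bulklike geometry reported after avalanche.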

16.4.4 Effect of Sulfur Impurity on Fe(110) Adhesion

16.4.4.1 Introduction and Previous Studies

In Section 16.3.3 we discussed the effects that S impurity can have on the properties of Fe surfaces. Experimentally, the presence of S contamination affects the adhesive strength of the interface compared to the clean surfaces143–145 but there are some conflicting findings. Also, the effect that S has on the structural, electronic, and magnetic properties has not been examined. Below we summarize our findings on the effect of the experimentally observed 1/4 ML coverage of S adsorbed in atop, bridge, and four-fold hollow sites on the adhesion properties of Fe(110) surfaces132 and how they compared to the clean interfaces. We also provide a brief summary of the effect of different S coverages on the properties of Fe(110) in Section 16.4.4.3.

16.4.4.2 Interface Models and Computational Parameters

Adhesion between a relaxed S/Fe(110) surface and an unrelaxed clean Fe(110) surface was investigated in order to make a comparison with our previous study of adhesion between unrelaxed clean Fe(110) surfaces.20 Our S/Fe(110) surface models obtained previously87 and described in Section 16.4.4.1 were used to model the S-contaminated interfaces. The relaxed five-layer model with a S atom adsorbed in either an atop, bridge, or four-fold hollow site on one side of the slab in a p(2 × 2) arrangement represented a mismatch interface, where insertion of the vacuum spacer in the z-direction resulted in formation of the interface. An additional layer was added to the relaxed five-layer model to form the match interfaces. The definitions of match and mismatch are described according to the geometry of the interface formed when the S is removed. By adjusting the size or thickness of the vacuum spacer, different interfacial separations were modeled.
The two surfaces forming the interfaces were defined as surface A [the relaxed S/Fe(110) surface] and surface B [the unrelaxed clean Fe(110) surface]. The interfacial separation was defined as the distance between the topmost Fe atoms on each surface. A diagram of the models employed can be found elsewhere.132 For all three matching interfaces the S atom lies between two different adsorption sites, one on surface A and the other on surface B. On surface A, the S atom lies above an atop, bridge, or four-fold hollow site, whereas on surface B, the S atom lies above a four-fold hollow, bridge, and atop site, respectively. For the bridge-site interface, the two Fe atoms forming the bridge site on surface B are oriented at right angles to those forming the bridge site on surface A. As the topmost Fe atoms and S atoms on surface A were relaxed, they showed some buckling (described previously by Spencer et al.87). The Fe atoms on surface B represented a clean bulk-terminated surface which did not show any buckling. The interfaces were described as atop, bridge, or hollow, depending on the site to which the S atom was adsorbed on surface A.

As the work of separation, by definition, disregards the effect of plastic or diffusional processes, we performed further calculations to remove some of the constraints applied to our interface models and to examine the effect of relaxation of the interface at equilibrium. These calculations were performed on the
interfaces at the equilibrium separation and allowed all S and Fe atoms to relax while also allowing the cell volume to change.

16.4.4.3 Results

Adhesion Energetics

The adhesion energy values calculated for each interface132 are presented in Fig. 16.13, along with the fitted UBER parameters in Table 16.9.132 In all adsorption sites and for both match and mismatch interfaces, the UBER provides a good description of the adhesion values. The S was found to decrease the adhesion energy compared to the clean interface20 in all adsorption sites and alignments of match and mismatch. The strongest interface was with S adsorbed in atop sites in a matching orientation. For all interfaces, except the hollow interface, the match interfaces were stronger than the corresponding mismatching interfaces. Relaxation of the interfaces at the equilibrium separation led to an increase in the adhesion energy, but the interfaces were still weaker than the corresponding clean ones.

For all interfaces, the S was found to increase the equilibrium interfacial separation, with the S–Fe distances to different adsorption sites on the two surfaces being consistent with the distances on the same sites on the isolated surface. The shortest S–Fe distances to surfaces A and B were found to be smaller than on the isolated surface, due to the attraction between the Fe atoms across the interface, bringing the two surfaces closer together. The relaxation introduced surface buckling of the clean surface due to the presence of S, as it did on the isolated surface, but of larger magnitude. A comparison of the S–Fe distances at the interface with those found in naturally occurring iron sulfide minerals indicated the presence of chemical bonds across the interface.

Similar to the Wsep values, the screening length, l (Table 16.9), for each interface was reduced by the presence of S-contamination, showing that the attraction

[Figure 16.13 image, panels (a) and (b): adhesion energy, Ead (J m−2), plotted against interfacial separation; series: hollow, bridge, atop, and clean data, each with its UBER fit.]

Fig. 16.13 (color online) Adhesion energy data calculated and fitted UBER curves for the 1/4-ML S-contaminated Fe(110) match (a) and mismatch (b) interfaces with S adsorbed in atop, bridge, and hollow sites. The clean Fe(110) interface data20 are shown for comparison. (From Ref. 132.)
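The fitted curves in Fig. 16.13 can be regenerated from the parameters of Table 16.9. The Python sketch below assumes the standard Rose–Smith–Ferrante form of the UBER,46 E(d) = −E0 (1 + a) exp(−a) with a = (d − d0)/l, which is taken here to match the definition given earlier in the chapter. The function name `uber_energy` is hypothetical, and the assignment of the unrelaxed parameter sets to the atop match and clean match interfaces follows our reading of the flattened table, so it should be treated as an assumption.

```python
import math


def uber_energy(d: float, e0: float, d0: float, l: float) -> float:
    """Adhesion energy (J m^-2) at separation d (Å) from the assumed UBER form."""
    a = (d - d0) / l  # scaled separation
    return -e0 * (1.0 + a) * math.exp(-a)


# Unrelaxed UBER parameters read from Table 16.9 (column assignment assumed).
CLEAN_MATCH = {"e0": 4.494, "d0": 1.991, "l": 0.590}
ATOP_MATCH = {"e0": 1.79, "d0": 3.30, "l": 0.47}

for d in (3.0, 3.5, 4.0, 5.0, 6.0):
    e_s = uber_energy(d, **ATOP_MATCH)
    e_clean = uber_energy(d, **CLEAN_MATCH)
    stronger = "S-contaminated" if e_s < e_clean else "clean"
    print(f"d = {d:.1f} Å: atop {e_s:+.3f}, clean {e_clean:+.3f} "
          f"-> stronger attraction: {stronger}")
```

With these parameters the S-contaminated atop curve lies below the clean curve at 4 and 5 Å but above it at 3 Å, which is in line with the separation range over which the text reports a stronger attraction for the contaminated interface.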

TABLE 16.9 UBER Parameters Calculated for the S-Contaminated Match and Mismatch Interfaces132 and Values for Clean Interfaces(a)

                                  Adsorption Site
                                  Atop           Bridge         Hollow         Clean
Match Interface
  E0 = Wsep (Ead) (J m−2)         1.79 (2.41)    1.30 (1.95)    0.88 (1.50)    4.494
  d0 (Å)                          3.30 (2.30)    3.30 (2.25)    3.55 (2.29)    1.991
  l (Å)                           0.47           0.43           0.37           0.590
  R2                              0.998          0.998          1.000          0.99
Mismatch Interface
  E0 = Wsep (Ead) (J m−2)         1.02 (1.16)    1.19 (1.42)    1.32 (1.72)    2.795
  d0 (Å)                          3.86 (3.10)    3.33 (2.78)    3.03 (2.60)    2.427
  l (Å)                           0.37           0.43           0.45           0.588
  R2                              0.999          1.000          0.995          0.99

Source: Ref. 20.
(a) The adhesion energy, Ead, and d0 values calculated for the relaxed S-contaminated interfaces are shown in parentheses.
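The qualitative statements made about Table 16.9 can be checked directly against its entries. The following Python sketch does this for the work of separation; the assignment of each value to an adsorption site and alignment is our reading of the flattened table and is an assumption, and the variable names are hypothetical.

```python
# Work of separation for the clean interfaces (J m^-2).
CLEAN = {"match": 4.494, "mismatch": 2.795}

# (unrelaxed Wsep, relaxed Ead) in J m^-2; site assignment is an assumption.
WSEP = {
    ("match", "atop"): (1.79, 2.41),
    ("match", "bridge"): (1.30, 1.95),
    ("match", "hollow"): (0.88, 1.50),
    ("mismatch", "atop"): (1.02, 1.16),
    ("mismatch", "bridge"): (1.19, 1.42),
    ("mismatch", "hollow"): (1.32, 1.72),
}

# Strongest unrelaxed S-contaminated interface.
strongest = max(WSEP, key=lambda key: WSEP[key][0])

# Sites for which the match interface is NOT stronger than the mismatch one.
exceptions = [site for site in ("atop", "bridge", "hollow")
              if WSEP[("match", site)][0] <= WSEP[("mismatch", site)][0]]

# Percentage reduction of Wsep relative to the corresponding clean interface.
reduction = {key: 100.0 * (1.0 - w / CLEAN[key[0]])
             for key, (w, _) in WSEP.items()}

# Relaxation strengthens every interface, but none reaches the clean value.
relaxed_between = all(w < relaxed < CLEAN[key[0]]
                      for key, (w, relaxed) in WSEP.items())

print("strongest:", strongest)
print("match weaker than mismatch for:", exceptions)
print("relaxed values between unrelaxed and clean:", relaxed_between)
for key, value in reduction.items():
    print(key, f"{value:.0f}% below clean")
```

Under this assignment the atop match interface is the strongest, the hollow site is the only one for which the mismatch interface is stronger, and every relaxed value lies between the unrelaxed and clean values, as stated in the text.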

between the contaminated surfaces occurs over a shorter separation distance than with a clean interface. The relative order of the l values is correlated to the distance of the S atom from the underlying surface. In particular, from 6 to ∼3.5 Å the attraction was greater than between the clean surfaces at the same separation, indicating that the contaminated interface is more likely to adhere.

Charge Density

Charge-density plots taken along the directions that cut the shortest S–Fe bonds across the interface were examined and compared for each interface (see Ref. 132). For both match and mismatch interfaces at the equilibrium separation, they showed that the S bonds to both surfaces A and B, bonding to the same atoms as on the isolated surface as well as the closest Fe atoms on the other surface. They also further supported the chemical as opposed to physical nature of the bonds formed at the interface. Bonding across the interface was in line with the interfacial geometry, being symmetrical for the mismatching interfaces. For each interface, however, there were regions of low charge density between adjacent S atoms which were not seen for the clean interfaces, as the S atom prevents the Fe atoms from getting close enough to interact as strongly across the interfacial boundary. After relaxation of these interfaces, these large regions of low charge density were reduced due to the structural changes that lead to a more even distribution of charge at the interface.

Magnetic Moments

The magnetic moment enhancements, μB, for the Fe atoms most strongly bonded to the S atom on surfaces A and B were calculated as a function of interfacial separation. At an interfacial separation of 12 Å for both match and mismatch interfaces, the magnetic moment enhancements of Fe atoms on surfaces A and B were the same as seen on the isolated S-contaminated
surfaces87 and clean surface (see Section 16.3.2.2), respectively, in line with the adhesion energy curves. Hence, for the clean surface B, the enhancements were positive, as seen on the clean isolated surface, whereas they were negative for the S-contaminated surface A, as S quenches the enhancement seen on the clean surface. At smaller separations, the enhancements were found to stay the same until the separation where the surfaces began being attracted to each other. The values then generally decreased significantly by the equilibrium separation, with the values for surface A being largest for the hollow site, and smaller for the bridge and then atop sites. For surface B they were in the opposite order. After relaxation, the enhancements for all interfaces were found to decrease, becoming more negative as a result of the stronger interaction between the surfaces, giving rise to more spin pairing. Also, the magnetic moment enhancements for S bonding to the same sites on the different surfaces became identical, in line with the changes in geometry and charge density.

Effect of Sulfur Coverage on Adhesion

To determine how other coverages of S affect the interfacial properties of Fe, we performed density functional theory calculations of S adsorbed in three adsorption sites (atop, bridge, and four-fold hollow) at two different arrangements, c(2 × 2) and p(1 × 1), corresponding to coverages of 1/2 and 1 ML, respectively. We examined the same parameters as calculated for the 1/4 ML coverage for interfaces, both in and out of epitaxy. Experimental studies of the effect of S coverage on the adhesion of different Fe surfaces143–145 disagree as to whether S increases or decreases Fe adhesion. Buckley144 found that S appreciably decreased the adhesive strength of the Fe(110) interface formed through S segregation at 1/4 ML coverage and c(2 × 4) arrangement.
In contrast, later studies by Hartweck and Grabke143,145 found that segregated S increased the strength of adhesion of polycrystalline surfaces at submonolayer coverages, showing a maximum in the adhesive force at an estimated S coverage of 0.6 ML. S reduced the strength of adhesion compared to that of the clean surfaces at coverages greater than 1 ML. The differences have been suggested to be due to grain boundary effects.

The adhesion energy curves and UBER parameters calculated from the fitted curve146 indicate that S reduces the adhesive strength of Fe(110) surfaces in match and mismatch orientations at all coverages examined (1/4, 1/2, and 1 ML). The largest work of separation was for the matching atop interface with 1/2 ML S coverage. For the mismatching configuration, the bridge 1/2 ML interface has the largest work of separation; however, it is still weaker than the strongest matching interface. The mismatching four-fold hollow 1 ML interface has such a low work of separation that it is unlikely to form. Charge-density slices of the strongest match and mismatch interfaces examined are presented in Fig. 16.14. The magnetic moment enhancement values, μB, calculated for the Fe atoms closest to the S atoms on either side of the interface are also indicated.
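The coverages quoted for the different overlayer arrangements follow from the size of the overlayer cell. The Python sketch below assumes one adsorbate atom per primitive p(m × n) cell and two per centered c(m × n) cell, each cell spanning m·n surface Fe atoms; the function name `coverage_ml` is hypothetical.

```python
from fractions import Fraction


def coverage_ml(prefix: str, m: int, n: int) -> Fraction:
    """Coverage in monolayers for a p(m x n) or c(m x n) overlayer.

    Assumes one adsorbate per primitive cell ("p") and two per centered
    cell ("c"), with the cell covering m * n substrate surface atoms.
    """
    if prefix == "p":
        adsorbates = 1
    elif prefix == "c":
        adsorbates = 2
    else:
        raise ValueError("prefix must be 'p' or 'c'")
    return Fraction(adsorbates, m * n)


arrangements = [("p", 2, 2), ("c", 2, 2), ("p", 1, 1), ("c", 2, 4)]
for prefix, m, n in arrangements:
    print(f"{prefix}({m} x {n}): {coverage_ml(prefix, m, n)} ML")
```

This reproduces the 1/4, 1/2, and 1 ML coverages for the p(2 × 2), c(2 × 2), and p(1 × 1) arrangements, and gives 1/4 ML for the c(2 × 4) arrangement reported by Buckley.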

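[Figure 16.14 image: charge-density slices labeled Surface A, Surface B, S, Fe1–Fe6, and d0; the annotated magnetic moment enhancement values (−0.41, 0.02, 0.19, 0.03) cannot be assigned to individual atoms from the extracted text.]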

Fig. 16.14 (color online) Charge-density plots of the atop match and bridge mismatch interfaces with 1/2-ML S coverage. Slices are taken through the azimuths indicated. The calculated magnetic moment enhancement values, μB , of the Fe atoms closest to the S atoms on either side of the interface are also indicated.

Overall, compared to the results for the clean interface, we found that the interfacial separation was increased by the presence of S. The distance of S from the two surfaces was also found to be related directly to the type of adsorption site in which S sits at the two surfaces.

16.4.5 Effect of Sulfur Impurity on Fe(100) Adhesion: A Brief Summary

We have performed a detailed study of the effects of S on the adhesion of the (100) surface of Fe using methodology similar to that employed for Fe(110), described in Section 16.4.4 and in the literature.119 Adhesion energy calculations show that at 1/2 ML coverage, S decreases the adhesive energy between the Fe(100) surfaces in both match and mismatch orientations, as was also seen for the Fe(110) match and mismatch interfaces with 1/4 ML coverage of adsorbed S. The strongest S-contaminated Fe(100) interface was found to be the atop match interface. The difference between the Wsep values calculated for the clean and S-contaminated atop and bridge mismatch interfaces, however, was only 6.5%, which is smaller than the difference for the corresponding Fe(110) interfaces. In particular, for these two interfaces (as well as for their matching counterparts), the adhesive attraction was found to be stronger at larger interfacial separations than it was for the corresponding clean interface. Hence,
this indicates that the S-contaminated interfaces can be more prone to adhesion. A complete report of the effects of 1/2 ML coverage of S on the adhesion properties of Fe(100) surfaces has been published elsewhere.119

16.5 SUMMARY, CONCLUSIONS, AND FUTURE WORK

The results above show that the (100) and (110) surfaces have almost identical surface energies, with the (110) being slightly lower, while the (111) surface has the highest energy. The surface relaxation results demonstrate that for the (100) surface a contraction of the outer layer is observed while the second and third layers expand perpendicular to the surface plane; for the (110) surface, little relaxation occurs, indicating that it is essentially bulk cleaved; and for the (111) surface, the first two layers contract while the third expands, with the magnitude of the relaxations being much larger than for the other surfaces. The layer-resolved magnetic moment values, as well as the up- and down-spin-resolved density of states, indicate the presence of an enhanced magnetic moment at the surface which is only slightly affected by relaxation, with the more open (111) surface showing larger changes and the most closely packed (110) surface showing little change. The adsorption of atomic S on the Fe(100) and Fe(110) surfaces at different adsorbate surface densities at the atop, bridge, and hollow sites shows that for both surfaces, the hollow site is the most stable, followed by the bridge and atop sites. At all three sites, S adsorption results in minor surface reconstruction, the most significant being for the hollow site. All three adsorption configurations affect the underlying surface geometry, with S causing a buckling of the top Fe layer when adsorbed in an atop site. Comparisons between S-adsorbed and clean Fe surfaces revealed a reduction in the magnetic moments of surface layer Fe atoms in the vicinity of the S. At the hollow site, the presence of S causes an increase in the surface Fe d-orbital density of states but has no significant effect on the structure and magnetic properties of lower substrate layers.
We have also modeled adhesion energy as a function of surface separation between clean, bulk-terminated Fe(100), Fe(110), and Fe(111) matched and mismatched surfaces. The values of the adhesion parameters obtained suggested that the (110) interface was slightly more stable than the (100) interface. However, the order of stability is reversed if the effects of both matching and mismatching interfaces are taken into consideration, in agreement with experimental findings. The (111) interface in epitaxy is much stronger than the mismatch interface. Compared to the (100) and (110) interfaces, the (111) match interface is strongest, whereas the (111) mismatch interface is the weakest. In addition, we have examined the relationship between magnetic and electronic properties and adhesion of the Fe(100), Fe(110), and Fe(111) surfaces and found that for matching interfaces, the surface layer magnetic moment is enhanced for larger interfacial separations and decreases to the bulk value as the surfaces are brought together. The enhancement approaches zero at the minimum
adhesion energy, where the bulk solid is formed. The lower layers show smaller enhancements with little or no enhancement at the center of the slab. The mismatch interfaces show similar behavior, but the enhancement does not reach zero at the equilibrium separation, as the bulk structure is not formed. To consider the dynamics of the interface formation, we have studied the avalanche effect between Fe(100) surfaces, in match and mismatch, and the role of model constraints on the results. When the central layers of the two surfaces are constrained, the surface layers are attracted toward each other, forming a strained crystal region at intermediate interfacial separations, but if the constraints in the z-direction are lifted, the surfaces avalanche together. When the surfaces are allowed to move sideways, an interface initially out of registry (mismatch) will tend to avalanche toward an interface that is in registry (match). The effects of adsorbed S on the adhesion of Fe(100) and Fe(110) surfaces have been studied by introducing S impurity in atop, bridge, and hollow sites at a range of coverages in match and mismatch interfaces. The calculated minima of the adhesion energy curves show that the presence of S on the surface reduces the strength of the interface. However, the contaminated interfaces can be more prone to adhesion, as the increased adhesive energy values at larger separations show. The effects of adsorbed S on the charge-density distribution and magnetic properties of the interface have also been examined and related to the interfacial geometry. The effect of relaxation of the interfaces at equilibrium was also investigated and was shown to increase the strength of the interface while reducing the equilibrium interfacial separation.
Some recent studies have included modeling of the surface properties of the three low-index faces of Fe33,147–150; experiments and modeling of various properties of Fe nanoparticles,151,152 nanowires,153 and nanosized clusters154; adhesion and other properties of high-toughness steels155,156; and the behavior of segregated S at an Fe grain boundary.157 Finally, it must be emphasized that having developed several approaches to model Fe substrate structures, we can now create various surface defects and impurities as well as controlled modified surface models, with modifications ranging from individual atoms and molecules to nanoclusters and thin layers, in order to study their effects on surface and interface properties as well as the effects of temperature and pressure on the structure and properties of surfaces and interfaces. With the current focus on miniaturization, the ability to modify surfaces atomically for specific applications opens up enormous possibilities for theoretical experimentation with various conditions, surface modifications, and resultant properties, which has great potential to aid laboratory synthesis and fabrication.

Acknowledgments

We thank BHP Billiton and, specifically, their (now retired) chief scientist and vice president for technology, Robert O. Watts, for providing the initial motivation for this work and financial support. Useful discussions with Mike Finnis (Imperial College London) are gratefully acknowledged. This research was undertaken
on the Victorian Partnership for Advanced Computing and the NCI Facility, Australia, which is supported by the Australian Commonwealth Government.

REFERENCES

1. Baddoo, N. R. J. Constr. Steel Res. 2008, 64, 1199.
2. Kuziak, R.; Kawalla, R.; Waengler, S. Arch. Civ. Mech. Eng. 2008, 8, 103.
3. Camley, R. E.; Celinski, Z.; Fal, T.; Glushchenko, A. V.; Hutchison, A. J.; Khivintsev, Y.; Kuanr, B.; Harward, I. R.; Veerakumar, V.; Zagorodnii, V. V. J. Magn. Magn. Mater. 2009, 321, 2048.
4. Grabke, H. J. Mater. Corros. 2003, 54, 736.
5. Georg, D. Eng. Aus. 2000, 72, 30.
6. Castle, J. E. J. Adhes. 2008, 84, 368.
7. Hayashi, S.; Sawai, S.; Iguchi, Y. ISIJ Int. 1993, 33, 1078.
8. Payne, M. C.; Teter, M. P.; Allan, D. C.; Arias, T. A.; Joannopoulos, J. D. Rev. Mod. Phys. 1992, 64, 1045.
9. Greeley, J.; Norskov, J. K.; Mavrikakis, M. Annu. Rev. Phys. Chem. 2002, 53, 319.
10. Gross, A. Surf. Sci. 2002, 500, 347.
11. Segall, M. D.; Lindan, P. J. D.; Probert, M. J.; Pickard, C. J.; Hasnip, P. J.; Clark, S. J.; Payne, M. C. J. Phys. Condens. Matter 2002, 14, 2717.
12. Velde, G. T.; Bickelhaupt, F. M.; Baerends, E. J.; Guerra, C. F.; Van Gisbergen, S. J. A.; Snijders, J. G.; Ziegler, T. J. Comput. Chem. 2001, 22, 931.
13. Nagy, A. Phys. Rep. Rev. Sec. Phys. Lett. 1998, 298, 2.
14. Ordejon, P. Phys. Status Solidi B 2000, 217, 335.
15. Schwarz, K.; Blaha, P. Comput. Mater. Sci. 2003, 28, 259.
16. Pisani, C. J. Mol. Struct. (Theochem) 1999, 463, 125.
17. Hong, T.; Smith, J. R.; Srolovitz, D. J. J. Adhes. Sci. Technol. 1994, 8, 837.
18. Hong, T.; Smith, J. R.; Srolovitz, D. J. Phys. Rev. B 1993, 47, 13615.
19. Raynolds, J. E.; Smith, J. R.; Zhao, G.-L.; Srolovitz, D. J. Phys. Rev. B 1996, 53, 13883.
20. Hung, A.; Yarovsky, I.; Muscat, J.; Russo, S.; Snook, I.; Watts, R. O. Surf. Sci. 2002, 501, 261.
21. Spencer, M. J. S.; Hung, A.; Snook, I. K.; Yarovsky, I. Surf. Sci. 2002, 515, L464.
22. Hong, S. Y.; Anderson, A. B.; Smialek, J. L. Surf. Sci. 1990, 230, 175.
23. Hong, T.; Smith, J. R.; Srolovitz, D. J. Phys. Rev. Lett. 1993, 70, 615.
24. Hong, T.; Smith, J. R.; Srolovitz, D. J. Acta Metall. Mater. 1995, 43, 2721.
25. Raynolds, J. E.; Roddick, E. R.; Smith, J. R.; Srolovitz, D. J. Acta Mater. 1999, 47, 3281.
26. Smith, J. R.; Cianciolo, T. V. Surf. Sci. 1989, 210, L229.
27. Smith, J. R.; Hong, T.; Srolovitz, D. J. Phys. Rev. Lett. 1994, 72, 4021.
28. Smith, J. R.; Raynolds, J. E.; Roddick, E. R.; Srolovitz, D. J. J. Comput. Aided Mater. Des. 1996, 3, 169.
29. Smith, J. R.; Raynolds, J. E.; Roddick, E. R.; Srolovitz, D. J. Processing and Design Issues in High Temperature Materials: Proceedings of the Engineering Foundation Conference, 1997, p. 37.
30. Finnis, M. W. J. Phys. Condens. Matter 1996, 8, 5811.
31. Grochola, G.; Russo, S. P.; Snook, I. K.; Yarovsky, I. J. Chem. Phys. 2002, 117, 7685.
32. Grochola, G.; Russo, S. P.; Snook, I. K.; Yarovsky, I. J. Chem. Phys. 2002, 117, 7676.
33. Grochola, G.; Russo, S. P.; Yarovsky, I.; Snook, I. K. J. Chem. Phys. 2004, 120, 3425.
34. Grochola, G.; Russo, S. P.; Snook, I. K.; Yarovsky, I. J. Chem. Phys. 2002, 116, 8547.
35. Kresse, G.; Furthmuller, J. Phys. Rev. B 1996, 54, 11169.
36. Kresse, G.; Furthmuller, J. Comput. Mater. Sci. 1996, 6, 15.
37. Kresse, G.; Hafner, J. Phys. Rev. B 1993, 48, 13115.
38. Kohn, W.; Sham, L. J. Phys. Rev. 1965, 140, 1133.
39. Perdew, J. P.; Zunger, A. Phys. Rev. B 1981, 23, 5048.
40. Perdew, J. P.; Yue, W. Phys. Rev. B 1992, 45, 13244.
41. Vanderbilt, D. Phys. Rev. B 1990, 41, 7892.
42. Monkhorst, H. J.; Pack, J. D. Phys. Rev. B 1976, 13, 5188.
43. Herper, H. C.; Hoffmann, E.; Entel, P. Phys. Rev. B 1999, 60, 3839.
44. Jansen, H. J. F.; Peng, S. S. Phys. Rev. B 1988, 37, 2689.
45. Dupré, A. Théorie mécanique de la chaleur, Gauthier-Villars, Paris, 1869.
46. Rose, J. H.; Smith, J. R.; Ferrante, J. Phys. Rev. B 1983, 28, 1835.
47. Banerjea, A.; Smith, J. R. Phys. Rev. B 1988, 37, 6632.
48. Feibelman, P. J. Surf. Sci. 1996, 360, 297.
49. Shih, H. D.; Jona, F.; Jepsen, D. W.; Marcus, P. M. Surf. Sci. 1981, 104, 39.
50. Shih, H. D.; Jona, F.; Bardi, U.; Marcus, P. M. J. Phys. C 1980, 13, 3801.
51. Legg, K. O.; Jona, F.; Jepsen, D. W.; Marcus, P. M. J. Phys. C 1977, 10, 937.
52. Spencer, M. J. S.; Hung, A.; Snook, I. K.; Yarovsky, I. Surf. Sci. 2002, 513, 389.
53. Sokolov, J.; Jona, F.; Marcus, P. M. Phys. Rev. B 1986, 33, 1397.
54. Xu, C.; O’Connor, D. J. Nucl. Instrum. Methods Phys. Res. 1990, 51, 278.
55. Xu, C.; O’Connor, D. J. Nucl. Instrum. Methods Phys. Res. 1991, 53, 315.
56. Yalisove, S. M.; Graham, W. R. J. Vac. Sci. Technol. A 1988, 6, 588.
57. Rodriguez, A. M.; Bozzolo, G.; Ferrante, J. Surf. Sci. 1993, 289, 100.
58. Johnson, R. A.; White, P. J. Phys. Rev. B 1976, 13, 5293.
59. Kato, S. Jpn. J. Appl. Phys. 1974, 13, 218.
60. Tyson, W. R. J. Appl. Phys. 1976, 47, 459.
61. Tyson, W. R.; Ayres, R. A.; Stein, D. F. Acta Metall. 1973, 21, 621.
62. Haftel, M. I.; Andreadis, T. D.; Lill, J. V.; Eridon, J. M. Phys. Rev. B 1990, 42, 11540.
63. Linford, R. G.; Mitchell, L. A. Surf. Sci. 1971, 27, 142.
64. Schweitz, J. A.; Vingsbo, O. Mater. Sci. Eng. 1971, 8, 275.
65. Gvozdev, A. G.; Gvozdeva, L. I. Fiz. Met. Metalloved. 1971, 31, 640.
66. Avraamov, Y. S.; Gvozdev, A. G. Fiz. Met. Metalloved. 1967, 23, 405.
67. Gilman, J. J. Cleavage, ductility and tenacity in crystals. In Fracture in Solids, Averbach, B. L., Felbeck, D. K., Hahn, G. T., and Thomas, B. L., Eds., Wiley, New York, 1959, p. 193.
68. Nicholas, J. F. Aust. J. Phys. 1968, 21, 21.
69. Alden, M.; Skriver, H. L.; Mirbt, S.; Johansson, B. Surf. Sci. 1994, 315, 157.
70. Vitos, L.; Ruban, A. V.; Skriver, H. L.; Kollar, J. Surf. Sci. 1998, 411, 186.
71. Tyson, W. R.; Miller, W. A. Surf. Sci. 1977, 62, 267.
72. Braun, J.; Math, C.; Postnikov, A.; Donath, M. Phys. Rev. B 2002, 65, 184412.
73. Kishi, T.; Itoh, S. Surf. Sci. 1996, 358, 186.
74. Ostroukhov, A. A.; Floka, V. M.; Cherepin, V. T. Surf. Sci. 1995, 333, 1388.
75. Wu, R. Q.; Freeman, A. J. Phys. Rev. B 1993, 47, 3904.
76. Eriksson, O.; Boring, A. M.; Albers, R. C.; Fernando, G. W.; Cooper, B. R. Phys. Rev. B 1992, 45, 2868.
77. Alden, M.; Mirbt, S.; Skriver, H. L.; Rosengaard, N. M.; Johansson, B. Phys. Rev. B 1992, 46, 6303.
78. Freeman, A. J.; Fu, C. L. J. Appl. Phys. 1987, 61, 3356.
79. Ohnishi, S.; Freeman, A. J. Phys. Rev. B 1983, 28, 6741.
80. Wang, C. S.; Freeman, A. J. Phys. Rev. B 1981, 24, 4364.
81. Danan, H.; Herr, A.; Meyer, A. J. J. Appl. Phys. 1968, 39, 669.
82. Binns, C.; Baker, S. H.; Demangeat, C.; Parlebas, J. C. Surf. Sci. Rep. 1999, 34, 107.
83. Wu, R. Q.; Freeman, A. J. Phys. Rev. Lett. 1992, 69, 2867.
84. Shih, H. D.; Jona, F.; Jepsen, D. W.; Marcus, P. M. Phys. Rev. Lett. 1981, 46, 731.
85. Kelemen, S. R.; Kaldor, A. J. Chem. Phys. 1981, 75, 1530.
86. Oudar, J. Bull. Soc. Fr. Mineral. Cristallogr. 1971, 94, 225.
87. Spencer, M. J. S.; Hung, A.; Snook, I.; Yarovsky, I. Surf. Sci. 2003, 540, 420.
88. Spencer, M. J. S.; Snook, I. K.; Yarovsky, I. J. Phys. Chem. B 2005, 109, 9604.
89. Hung, A.; Muscat, J.; Yarovsky, I.; Russo, S. P. Surf. Sci. 2002, 513, 511.
90. Hung, A.; Muscat, J.; Yarovsky, I.; Russo, S. P. Surf. Sci. 2002, 520, 111.
91. Broden, G.; Gafner, G.; Bonzel, H. P. Appl. Phys. 1977, 13, 333.
92. Fischer, R.; Fischer, N.; Schuppler, S.; Fauster, T.; Himpsel, F. J. Phys. Rev. B 1992, 46, 9691.
93. Delchar, T. A. Surf. Sci. 1971, 27, 11.
94. Schonhense, G.; Getzlaff, M.; Westphal, C.; Heidemann, B.; Bansmann, J. J. Phys. 1988, C8, 1643.
95. Weissenrieder, J.; Gothelid, M.; Le Lay, G.; Karlsson, U. O. Surf. Sci. 2002, 515, 135.
96. Berbil-Bautista, L.; Krause, S.; Hanke, T.; Bode, M.; Wiesendanger, R. Surf. Sci. 2006, 600, L20.
97. Taga, Y.; Isogai, A.; Nakajima, K. Trans. Jpn. Inst. Met. 1976, 17, 201.
98. Spencer, M. J. S.; Snook, I.; Yarovsky, I. J. Phys. Chem. B 2006, 110, 956.
558

METAL SURFACES AND INTERFACES

99. Sinkovic, B.; Johnson, P. D.; Brookes, N. B.; Clarke, A.; Smith, N. V. Phys. Rev. B 1995, 52 , R6955. 100. Sinkovic, B.; Johnson, P. D.; Brookes, N. B.; Clarke, A.; Smith, N. V. Phys. Rev. Lett. 1989, 62 , 2740. 101. Johnson, P. D.; Clarke, A.; Brookes, N. B.; Hulbert, S. L.; Sinkovic, B.; Smith, N. V. Phys. Rev. Lett. 1988, 61 , 2257. 102. Clarke, A.; Brookes, N. B.; Johnson, P. D.; Weinert, M.; Sinkovic, B.; Smith, N. V. Phys. Rev. B 1990, 41 , 9659. 103. Fujita, D.; Ohgi, T.; Homma, T. Appl. Surf. Sci . 2002, 200 , 55. 104. Zhang, X. S.; Terminello, L. J.; Kim, S.; Huang, Z. Q.; Vonwittenau, A. E. S.; Shirley, D. A. J. Chem. Phys. 1988, 89 , 6538. 105. Didio, R. A.; Plummer, E. W.; Graham, W. R. Phys. Rev. Lett. 1984, 52 , 683. 106. Legg, K. O.; Jona, F.; Jepsen, D. W.; Marcus, P. M. Surf. Sci . 1977, 66 , 25. 107. Grabke, H. J.; Paulitschke, W.; Tauber, G.; Viefhaus, H. Surf. Sci . 1977, 63 , 377. 108. Grabke, H. J.; Petersen, E. M.; Srinivasan, S. R. Surf. Sci . 1977, 67 , 501. 109. Didio, R. A.; Plummer, E. W.; Graham, W. R. J. Vac. Sci. Technol. A 1984, 2 , 983. 110. Fernando, G. W.; Wilkins, J. W. Phys. Rev. B 1986, 33 , 3709. 111. Fernando, G. W.; Wilkins, J. W. Phys. Rev. B 1987, 35 , 2995. 112. Kishi, T.; Itoh, S. Surf. Sci . 1996, 363 , 100. 113. Huff, W. R. A.; Chen, Y.; Zhang, X. S.; Terminello, L. J.; Tao, F. M.; Pan, Y. K.; Kellar, S. A.; Moler, E. J.; Hussain, Z.; Wu, H.; Zheng, Y.; Zhou, X.; von Wittenau, A. E. S.; Kim, S.; Huang, Z. Q.; Yang, Z. Z.; Shirley, D. A. Phys. Rev. B 1997, 55 , 10830. 114. Chubb, S. R.; Pickett, W. E. J. Appl. Phys. 1988, 63 , 3493. 115. Chubb, S. R.; Pickett, W. E. Phys. Rev. B 1988, 38 , 10227. 116. Chubb, S. R.; Pickett, W. E. Phys. Rev. B 1988, 38 , 12700. 117. Anderson, A. B.; Hong, S. Y. Surf. Sci . 1988, 204 , L708. 118. Hong, S. Y.; Anderson, A. B. Phys. Rev. B 1988, 38 , 9417. 119. Nelson, S. G.; Spencer, M. J. S.; Snook, I.; Yarovsky, I. Surf. Sci . 2005, 590 , 63. 120. Todorova, N.; Spencer, M. J. 
S.; Yarovsky, I. Dynamic properties of the sulfurcontaminated Fe(110) surface. In Proceedings of the Australian Institute of Physics 16th Biennial Congress, Canberra, Australia, 2005. 121. Todorova, N.; Spencer, M. J. S.; Yarovsky, I. Surf. Sci . 2007, 601 , 665. 122. Verlet, L. Phys. Rev . 1967, 159 , 98. 123. Nose, S. Prog. Theor. Phys. Suppl . 1991, 1. 124. Jiang, D. E.; Carter, E. A. J. Phys. Chem. B 2004, 108 , 19140. 125. Kamakoti, P.; Sholl, D. S. J. Membr. Sci . 2003, 225 , 145. 126. Haug, K.; Jenkins, T. J. Phys. Chem. B 2000, 104 , 10017. 127. Spencer, M. J. S.; Todorova, N.; Yarovsky, I. Surf. Sci . 2008, 602 , 1547. 128. Spencer, M. J. S.; Yarovsky, I. J. Phy. Chem. C 2007, 111 , 16372. 129. Narayan, P. B. V.; Anderegg, J. W.; Chen, C. W. J. Electron Spectrosc. Relat. Phenom. 1982, 27 , 233. 130. Shanabarger, M. R. A comparison of adsorption kinetics on iron of H2 and H2 S. In Hydrogen Effects in Metals, Bernstein, J. M., and Thompson, A. W., Eds., The Metallurgical Society of AIME, Warrendale, PA, 1981, p. 135.

REFERENCES

559

131. Spencer, M. J. S.; Hung, A.; Snook, I.; Yarovsky, I. Surf. Rev. Lett. 2003, 10 , 169. 132. Spencer, M. J. S.; Snook, I. K.; Yarovsky, I. J. Phys. Chem. B 2004, 108 , 10965. 133. Handbook of Chemistry and Physics, 70th ed., CRC Press, Metals Park, OH, 1989–1990. 134. Taylor, P. A.; Nelson, J. S.; Dodson, B. W. Phys. Rev. B 1991, 44 , 5834. 135. Taylor, P. A. Phys. Rev. B 1991, 44 , 13026. 136. Smith, J. R.; Bozzolo, G.; Banerjea, A.; Ferrante, J. Phys. Rev. Lett. 1989, 63 , 1269. 137. Good, B. S.; Banerjea, A.; Smith, J. R.; Bozzolo, G.; Ferrante, J. Mater. Res. Soc. Symp. Proc. 1990, 193 , 313. 138. Lynden-Bell, R. M. Surf. Sci . 1991, 244 , 266. 139. Good, B. S.; Banerjea, A. J. Phys. Condens. Matter 1996, 8 , 1325. 140. Banerjea, A.; Good, B. S. Int. J. Mod. Phys. B 1997, 11 , 315. 141. Banerjea, A.; Good, B. S. Indian J. Phys. 1995, 69A, 105. 142. Nelson, J. S.; Dodson, B. W.; Taylor, P. A. Phys. Rev. B 1992, 45 , 4439. 143. Hartweck, W.; Grabke, H. J. Surf. Sci . 1979, 89 , 174. 144. Buckley, D. H. Int. J. Nondestructive Test. 1970, 2 , 171. 145. Hartweck, W. G.; Grabke, H. J. Acta Metall . 1981, 29 , 1237. 146. Spencer, M. J. S.; Snook, I. K.; Yarovsky, I. J. Phys. Chem. B 2005, 109 , 10204. 147. Jiang, D. E.; Carter, E. A. Surf. Sci . 2003, 547 , 85. 148. Zhang, J. M.; Ma, F.; Xu, K. W. Surf. Interface Anal . 2003, 35 , 662. 149. Blonski, P.; Kiejna, A. Vacuum 2004, 74 , 179. 150. Wang, X. C.; Jia, Y.; Qiankai, Y.; Wang, F.; Ma, J. X.; Hu, X. Surf. Sci . 2004, 551 , 179. 151. Postnikov, A. V.; Entel, P.; Soler, J. M. Eur. Phys. J. D 2003, 25 , 261. 152. Postnikov, A. V. Surface relaxation in solids and nanoparticles. In Computational Materials Science, Vol. 187, Catlow, R., and Kotomin, E., Eds., IOS Press, Amsterdam, 2003, p. 245. 153. Mohaddes-Ardabili, L.; Zheng, H.; Ogale, S. B.; Hannoyer, B.; Tian, W.; Wang, J.; Lofland, S. E.; Shinde, S. R.; Zhao, T.; Jia, Y.; Salamanca-Riba, L.; Schlom, D. G.; Wuttig, M.; Ramesh, R. Nat. Mater. 2004, 3 , 533. 154. 
De Hosson, J. T. M.; Palasantzas, G.; Vystavel, T.; Koch, S. JOM 2004, 56 , 40. 155. Hao, S.; Moran, B.; Liu, W. K.; Olson, G. B. J. Comput. Aided Mater. Des. 2003, 10 , 99. 156. Hao, S.; Liu, W. K.; Moran, B.; Vernerey, F.; Olson, G. B. Comput. Methods Appl. Mech. Eng. 2004, 193 , 1865. 157. Gesari, S. B.; Pronsato, M. E.; Juan, A. J. Phys. Chem. Solids 2004, 65 , 1337.

17

Surface Chemistry and Catalysis from Ab Initio–Based Multiscale Approaches

CATHERINE STAMPFL
School of Physics, The University of Sydney, Sydney, Australia

SIMONE PICCININ
CNR-INFM DEMOCRITOS National Simulation Center, Theory@Elettra Group, Trieste, Italy

Chemical problems involving heterogeneous catalysis, diffusion, and related processes occur in systems that are too large to simulate directly with electronic structure methods: a direct treatment would require prohibitively large samples, prohibitively long simulation times, or both. However, methods such as density functional theory, augmented by statistical mechanics techniques such as kinetic Monte Carlo, can address the critical issues within a multiscale framework. As a result, phase diagrams for catalytic processes can be calculated and used to model real-time catalytic processes. Significant applications considered include CO catalytic conversion, hydrogen storage, and fuel cell operation.

17.1 INTRODUCTION

Theory, computation, and simulation have been identified repeatedly in international reports and technology road maps as key components of a successful strategy toward the implementation of new energy technologies.1,2 Indeed, they play a crucial role in the advancement and development of all new technologies that require knowledge and understanding on the atomic level as well as on the nanoscale. Materials by design and the growing, exciting role of computation/simulation are making impacts across multidisciplinary fields such as physics, chemistry, engineering, and biology.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


Advances in catalytic science laid the foundation for the rapid development of the petroleum and chemical industries in the twentieth century, which contributed directly to the substantial increase in the standard of living in industrialized countries. Traditionally, catalytic science has progressed through trial and error, requiring many thousands of experiments involving complex combinations of metals, metal compounds, promoters, and inhibitors.3 With increased awareness of the need for new and improved green energy technologies and processes for an environmentally clean and sustainable future, catalysis researchers are focusing on ways to improve existing applications and develop new ones. Control and understanding of surface and material properties on the atomic level are crucial for the development of cutting-edge technologies. Lack of such knowledge presently hinders further progress in already established applications and prevents real advances in promising ones that are still at the conceptual level. Modern imaging and spectroscopic techniques are being extended to operate under increasingly realistic conditions (e.g., high pressures, high temperatures)4 and can provide quantitative information at an unprecedented level. However, determination of important properties such as adsorption and reaction energetics, the structure of surface species, and the nature of transient intermediates and transition states is still highly challenging. Increasingly, accurate quantum mechanical calculations are being used to investigate such quantities and to predict new materials and structures that may lead to improved efficiencies and selectivities. Indeed, an ultimate goal of catalysis and materials research is to control chemical reactions and materials properties so that one can synthesize any desired molecule or material.
Understanding the mechanisms and dynamics of such transformations has been identified as a grand challenge for catalysis and advanced materials research.5 Calculation methods derived from advanced theoretical models and implemented in efficient algorithms are crucial for fundamental understanding and ultimately for steps toward first-principles design. By combining density functional theory (DFT) calculations with statistical mechanical approaches, phenomena and properties occurring on macroscopic length scales and long time scales can be described, affording accurate predictions of surface structures, phase transitions, diffusion, and increasingly, heterogeneous catalysis.6–10 The present chapter contains some recent applications of first-principles-based multiscale modeling approaches for describing and predicting surface structures, phase transitions, and catalysis. In particular, through specific applications, these approaches are highlighted: (1) ab initio atomistic thermodynamics, which predicts stable (and metastable) phases, from a pool of considered structures, in equilibrium with a gas-phase environment; (2) the ab initio lattice-gas Hamiltonian plus equilibrium Monte Carlo method, which can predict stable surface structures (without their explicit consideration), including order–disorder phase transitions; and (3) ab initio kinetic Monte Carlo simulations, which in addition to the above can describe the kinetics of a system (e.g., reaction rates).


17.2 PREDICTING SURFACE STRUCTURES AND PHASE TRANSITIONS

17.2.1 Oxygen on Pd(111): The Lattice-Gas Plus Monte Carlo Approach

The surface structures that form when species adsorb on a solid surface are dictated by the lateral interactions between them. Such interactions can also significantly affect the stability of the adsorption phase and thus affect the surface function and properties. This has important consequences, for example, for heterogeneous catalysis, which involves surface processes such as adsorption, diffusion, desorption, and chemical reactions. In particular, the carbon monoxide oxidation reaction has long served as a prototypical “simple” chemical reaction for experimental study, with the aim of achieving a deeper understanding on the microscopic level.11 This reaction is the basic reaction step in many industrial reactions and is also an important reaction in its own right, as illustrated, for example, by the fact that it is one of the main reactions that the three-way automotive catalytic converter catalyzes for pollution control and environmental protection. If atomic oxygen, adsorbed on transition metal surfaces, is exposed to CO gas, the metal catalyzes the formation of carbon dioxide through a Langmuir–Hinshelwood mechanism, in which both reactants are adsorbed on the surface prior to product formation, in this case CO2.12 The activation energy of this reaction depends on the coverage of adsorbates, indicating that the lateral interactions are significant.13 In particular, for the O/Pd(111) system, it was found that upon exposure to CO, the p(2 × 2) islands, which initially form on adsorption of oxygen, compress into (√3 × √3)R30° (hereafter denoted by “√3”) domains and finally into p(2 × 1) domains.14 These structural rearrangements have profound effects on the reactivity toward CO2 formation: While the p(2 × 2) phase is unreactive for temperatures in the range 190 to 320 K, the √3 phase displays half-order kinetics with respect to oxygen coverage, suggesting that the reaction site is at the periphery of the O islands.
For the p(2 × 1) phase, the reaction is first order, implying that the reaction proceeds uniformly over the O islands. As an initial step toward a detailed understanding of the role played by lateral interactions in the CO oxidation reaction over Pd(111), it is appropriate to investigate the behavior of the system in the presence of just the oxygen adsorbate. In the following, the lattice-gas Hamiltonian (LGH) plus Monte Carlo (MC) approach15,16 will be used to describe the O/Pd(111) system and to predict order–disorder phase transition temperatures for varying oxygen coverages.17 Such an approach affords identification of unanticipated geometries and stoichiometries and can be used to describe the coexistence of phases and disordered phases, as well as associated order–order and order–disorder phase transitions. The first step is to create a sufficiently accurate LGH,


which can be written as

$$H^{\mathrm{LGH}} = V^{1} \sum_{i} n_i + \sum_{m=1}^{r} V_m^{2} \sum_{(ij)_m} n_i n_j + \sum_{m=1}^{q} V_m^{3} \sum_{(ijk)_m} n_i n_j n_k + \cdots \qquad (17.1)$$

where $n_i$ indicates the occupation of site i, which is 0 if the site is empty or 1 if it is occupied; $V^{1}$ is the one-body term, which represents the adsorption energy of the isolated adsorbate; $V_m^{2}$ are the two-body, or pair, interactions (where r pair interactions are considered, with m = 1 corresponding to nearest-neighbor interactions, m = 2 to second-nearest-neighbor interactions, and so on); $V_m^{3}$ are the three-body, or trio, interactions (where q trio interactions are considered); and so on. The LGH [Eq. (17.1)] formally contains an infinite number of terms, but in practice it can be truncated, since higher-order interactions become negligible compared to the lower-order terms. The interactions considered to describe the O/Pd(111) system are illustrated in Fig. 17.1. The values of the interactions are determined from least-squares fits to the energies of structures calculated using density functional theory, with oxygen coverages ranging from 1/9 monolayer (ML) to 1 ML. To determine which interactions to include in the expansion, and to evaluate the accuracy of the LGH, we use the leave-one-out cross-validation (LOO-CV) scheme (see Refs. 18–21). It is found for this system that the set of interaction

Fig. 17.1 (color online) Top view of the oxygen adsorbates on Pd(111), where the lateral interactions between O atoms considered in the lattice-gas Hamiltonian are shown. Light gray spheres represent Pd atoms, and small dark spheres, O atoms. (From Ref. 17.)


parameters that yields high accuracy consists of six lateral interactions: three two-body interactions ($V_1^{2}$, $V_2^{2}$, $V_3^{2}$, with respective values of 244, 39, and −6 meV; see Fig. 17.1) and three three-body interactions ($V_1^{3}$, $V_2^{3}$, $V_3^{3}$, with values 31, 30–49 meV).17 It is interesting to see that the values of the two-body interactions are remarkably similar to those reported for O/Pt(111) (238, 39, −6 meV)18 and for the O/Ru(0001) system (265, 44, −25 meV).22 Once the LGH has been constructed, its reliability can be tested by calculating the ground-state line (or convex hull), which identifies the lowest-energy surface structures for a given coverage. In particular, it can be checked whether the LGH correctly reproduces the ground-state line obtained directly from DFT. The formation energies (from DFT or the LGH) are calculated as

$$E_f = \Theta \left[ E_b^{\mathrm{O/Pd}} - E_b^{\mathrm{O(1\times1)/Pd}} \right] \qquad (17.2)$$

which shows the stability of a structure with respect to phase separation into a fraction Θ (the coverage) of the full monolayer O(1 × 1)/Pd and a fraction, 1 − Θ, of the clean slab. In Eq. (17.2), $E_b$ represents the binding energy per oxygen atom of a given oxygen adsorption structure on the Pd(111) surface. For example, the binding energy of oxygen on a surface with 1 ML coverage is given by $E_b^{\mathrm{O(1\times1)/Pd}} = E_{\mathrm{tot}}^{\mathrm{O(1\times1)/Pd}} - E_{\mathrm{tot}}^{\mathrm{Pd}} - \tfrac{1}{2} E_{\mathrm{tot}}^{\mathrm{O_2}}$, where $E_{\mathrm{tot}}^{\mathrm{O(1\times1)/Pd}}$, $E_{\mathrm{tot}}^{\mathrm{Pd}}$, and $E_{\mathrm{tot}}^{\mathrm{O_2}}$ are the total energies of the O(1 × 1)/Pd(111) structure, the clean Pd(111) surface, and an oxygen molecule, respectively. In Fig. 17.2, the formation energy as a function of oxygen coverage is shown. From it, the structures belonging to the convex hull (lowest-energy line) can be identified. All structures with a formation energy higher than that of the convex hull at the same coverage are unstable against phase

Fig. 17.2 (color online) Formation energy, Ef, versus coverage, Θ, of the twenty-two structures calculated directly from density-functional theory (DFT) (large pale dots) and those obtained from the lattice-gas Hamiltonian (LGH). The continuous (lowest energy) line represents the convex hull. (From Ref. 17.)


separation into the two closest structures belonging to the convex hull. It can be seen that there is excellent agreement between the DFT and the LGH formation energies, except at very high coverages, where there are large atomic relaxations that are difficult to capture in the LGH. The ground-state geometries lying on the convex hull are the p(2 × 2), √3, and p(2 × 1) structures. The former two agree with experimental results.23 The p(2 × 1) structure is also observed experimentally, but only, for example, when the O/Pd(111) system is exposed to CO gas.14,23 Importantly, both DFT and the LGH calculations predict the same ground-state structures, indicating that the LGH is sufficiently accurate to describe the correct ordering of the adsorbates on the surface. Having constructed the LGH, it can be used, for example, to predict temperature-driven phase transitions. Although there are no experimental results for the O/Pd(111) system published to date, it can be expected, for example, that configurational entropy will drive a phase transition to a disordered phase at elevated temperatures. Such phase transitions have been reported for O/Ru(0001),15,24 where it was shown that the transition temperature depends strongly on the oxygen coverage. For this latter system, two peaks occur in the transition temperature, one at 0.25 ML (800 K) and the other at 0.50 ML (600 K), which correspond to the stable p(2 × 2) and p(2 × 1) phases. Qualitatively, the same behavior was found for the O/Pt(111) system through similar theoretical simulations.18 Also, the O/Ni(111) system forms a stable p(2 × 2) structure, which exhibits a pronounced peak in the order–disorder transition temperature versus coverage curve.25 To investigate order–disorder phase transitions, Monte Carlo (MC) simulations can be carried out.
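As an illustration of how the pair terms of Eq. (17.1) distinguish the ordered structures found on the convex hull, the following minimal Python sketch counts first-, second-, and third-neighbor O–O pairs on a periodic triangular lattice and weights them with the two-body values quoted above (244, 39, and −6 meV). It is only a sketch under stated assumptions: the one-body and three-body terms are left out, the 6 × 6 periodic cell and the helper names (`build`, `pair_energy_per_adsorbate`, `STRUCTURES`) are hypothetical, and the numbers it returns are lateral pair energies per adsorbate, not the DFT formation energies of Fig. 17.2.

```python
import numpy as np

# Two-body interactions quoted in the text for O/Pd(111), in meV
# (shell 1 = nearest neighbours, 2 = second, 3 = third neighbours).
V_PAIR = {1: 244.0, 2: 39.0, 3: -6.0}

# Neighbour offsets (du, dv) on a triangular lattice whose sites are
# u*a1 + v*a2, with a1 and a2 enclosing an angle of 60 degrees.
SHELLS = {
    1: [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)],
    2: [(1, 1), (-1, 2), (2, -1), (-1, -1), (1, -2), (-2, 1)],
    3: [(2, 0), (0, 2), (-2, 2), (-2, 0), (0, -2), (2, -2)],
}


def build(rule, size=6):
    """Return a size x size 0/1 occupation array from a rule(u, v)."""
    u, v = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    return rule(u, v).astype(int)


def pair_energy_per_adsorbate(occ):
    """Pair part of the LGH per adsorbate (meV), periodic boundaries.

    Assumption: one-body and trio terms are omitted in this sketch.
    """
    total = 0.0
    for shell, offsets in SHELLS.items():
        for du, dv in offsets:
            shifted = np.roll(occ, shift=(du, dv), axis=(0, 1))
            # Every pair is seen from both of its ends, hence the 0.5.
            total += 0.5 * V_PAIR[shell] * np.sum(occ * shifted)
    return total / occ.sum()


STRUCTURES = {
    "p(2x2)": build(lambda u, v: (u % 2 == 0) & (v % 2 == 0)),  # 1/4 ML
    "sqrt3": build(lambda u, v: (u - v) % 3 == 0),              # 1/3 ML
    "p(2x1)": build(lambda u, v: u % 2 == 0),                   # 1/2 ML
    "(1x1)": build(lambda u, v: np.ones_like(u, dtype=bool)),   # 1 ML
}

if __name__ == "__main__":
    for name, occ in STRUCTURES.items():
        print(name, occ.mean(), pair_energy_per_adsorbate(occ))
```

With these inputs the p(2 × 2) structure contains only third-neighbor pairs and the √3 structure only second-neighbor pairs, while the p(2 × 1) structure is the first of the three to contain nearest-neighbor pairs, which is where the large repulsive first-neighbor term enters.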
In particular, we employ the Wang–Landau scheme, which affords an efficient evaluation of the configurational density of states, g(E) (i.e., the number of system configurations with a certain energy, E).26–29 From this, all major thermodynamic functions can be directly calculated, including the free energy,

$$F(T) = -k_B T \ln \sum_{E} g(E)\, e^{-E/k_B T} = -k_B T \ln(Z) \qquad (17.3)$$

where Z is the partition function, $k_B$ is the Boltzmann constant, and T is the temperature. The internal energy is given as

$$U(T) = \langle E \rangle_T = \frac{\sum_{E} E\, g(E)\, e^{-E/k_B T}}{Z} \qquad (17.4)$$

the specific heat as

$$C_v(T) = \frac{\langle E^2 \rangle_T - \langle E \rangle_T^2}{k_B T^2} \qquad (17.5)$$


Fig. 17.3 Order–disorder transition temperature, Tc (K), as a function of the oxygen coverage (ML). (From Ref. 17.)

and the entropy as

$$S(T) = \frac{U - F}{T} \qquad (17.6)$$
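The relations in Eqs. (17.3) to (17.6) can be evaluated numerically once ln g(E) is available. The sketch below is a minimal illustration, not the authors' implementation: the function names are hypothetical, energies are assumed to be in eV, the Boltzmann constant is written explicitly in the fluctuation formula for the specific heat, and a log-sum-exp shift is used so that large values of ln g(E) do not overflow. For a finite system the specific heat shows a finite peak rather than a true divergence, so the transition temperature is estimated here as the position of the maximum on a user-supplied temperature grid.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def thermodynamics(energies, ln_g, temperature):
    """Return (F, U, Cv, S) at one temperature from ln g(E).

    energies : energy levels in eV
    ln_g     : natural log of the density of states at those levels
    """
    energies = np.asarray(energies, dtype=float)
    ln_g = np.asarray(ln_g, dtype=float)
    beta = 1.0 / (K_B * temperature)

    # ln Z via a log-sum-exp shift for numerical stability
    w = ln_g - beta * energies
    w_max = w.max()
    ln_z = w_max + np.log(np.sum(np.exp(w - w_max)))

    prob = np.exp(w - ln_z)                     # canonical weight of each level
    u = np.sum(prob * energies)                 # Eq. (17.4)
    e2 = np.sum(prob * energies ** 2)
    f = -K_B * temperature * ln_z               # Eq. (17.3)
    cv = (e2 - u ** 2) / (K_B * temperature ** 2)  # Eq. (17.5)
    s = (u - f) / temperature                   # Eq. (17.6)
    return f, u, cv, s


def transition_temperature(energies, ln_g, temperatures):
    """Temperature on the given grid at which Cv is largest."""
    cv = [thermodynamics(energies, ln_g, t)[2] for t in temperatures]
    return float(temperatures[int(np.argmax(cv))])


if __name__ == "__main__":
    # Toy input (an assumption): two non-degenerate levels 50 meV apart.
    levels, ln_g = [0.0, 0.05], [0.0, 0.0]
    print(thermodynamics(levels, ln_g, 300.0))
    print(transition_temperature(levels, ln_g, np.arange(50.0, 600.0, 1.0)))
```

The two-level input only exercises the formulas (its specific-heat maximum is a Schottky anomaly, not an order–disorder transition); in practice `energies` and `ln_g` would come from a Wang–Landau run on the LGH.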

Using the Wang–Landau scheme for a given coverage, a single simulation yields g(E) and hence the transition temperature, Tc, whereas in traditional MC studies based on the Metropolis algorithm, one needs to perform a series of simulations at various temperatures to monitor the variation of a properly defined order parameter. From the divergence of the specific heat at the order–disorder transition temperature, the dependence of the transition temperature on coverage is obtained, as shown in Fig. 17.3. In this figure two pronounced peaks occur, corresponding to the p(2 × 2) and p(2 × 1) phases. As noted above, to date, no experimental results have been reported for order–disorder phase-transition temperatures as a function of coverage for this system; thus, the predictions in Fig. 17.3 await experimental confirmation. A similar theoretical approach has been used to study the O/Pd(100) system.19 This study was limited to low oxygen coverages (i.e., 0 to 0.35 ML), but a similar peak of Tc at 0.25 ML was observed. Through comparison with experiment and investigation of different theoretical treatments, Zhang et al.19 found that the main source of uncertainty in the lateral interactions is the exchange-correlation functional employed, and that other approximations, such as a finite number of lateral interactions, neglect of vibrational contributions, and neglect of population of sites other than the most favorable one, have comparatively negligible effects.
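The Wang–Landau idea referred to above can be sketched on a deliberately tiny model. The example below is an assumption-laden toy, not the simulation of Ref. 17: it uses a one-dimensional ring of eight sites with four adsorbates and a single nearest-neighbor pair interaction (energies are counted in units of that interaction), and all function names and parameter values are hypothetical. A random walk in energy accepts moves with probability min[1, g(E_old)/g(E_new)], adds ln f to ln g at the current energy after each step, and halves ln f whenever the energy histogram is sufficiently flat. Because the system is so small, the estimate can be compared with brute-force enumeration.

```python
import math
import random
from itertools import combinations


def pair_count(occ):
    """Number of occupied nearest-neighbour pairs on a ring of sites."""
    n = len(occ)
    return sum(occ[i] * occ[(i + 1) % n] for i in range(n))


def exact_degeneracies(n_sites, n_ads):
    """Brute-force g(E) for comparison (feasible only for tiny systems)."""
    g = {}
    for sites in combinations(range(n_sites), n_ads):
        occ = [0] * n_sites
        for s in sites:
            occ[s] = 1
        k = pair_count(occ)
        g[k] = g.get(k, 0) + 1
    return g


def wang_landau(n_sites=8, n_ads=4, ln_f_final=1e-4, flatness=0.8,
                batch=10000, max_batches=2000, seed=0):
    """Estimate ln g(E) at fixed coverage; E is the number of pairs."""
    rng = random.Random(seed)
    occ = [1] * n_ads + [0] * (n_sites - n_ads)
    rng.shuffle(occ)
    e = pair_count(occ)
    ln_g, hist = {e: 0.0}, {e: 0}
    ln_f = 1.0
    for _batch in range(max_batches):
        if ln_f <= ln_f_final:
            break
        for _step in range(batch):
            # Symmetric proposal: move one adsorbate to an empty site.
            i = rng.choice([s for s in range(n_sites) if occ[s] == 1])
            j = rng.choice([s for s in range(n_sites) if occ[s] == 0])
            occ[i], occ[j] = 0, 1
            e_new = pair_count(occ)
            # Accept with probability min(1, g(E_old) / g(E_new)).
            d = ln_g.get(e, 0.0) - ln_g.get(e_new, 0.0)
            if d >= 0.0 or rng.random() < math.exp(d):
                e = e_new
            else:
                occ[i], occ[j] = 1, 0  # reject: undo the move
            ln_g[e] = ln_g.get(e, 0.0) + ln_f
            hist[e] = hist.get(e, 0) + 1
        # Flat histogram reached: reset it and refine the modification factor.
        if min(hist.values()) >= flatness * sum(hist.values()) / len(hist):
            hist = dict.fromkeys(hist, 0)
            ln_f /= 2.0
    # Normalise so that the degeneracies sum to the number of configurations.
    top = max(ln_g.values())
    ln_total = top + math.log(sum(math.exp(v - top) for v in ln_g.values()))
    shift = math.log(math.comb(n_sites, n_ads)) - ln_total
    return {k: v + shift for k, v in sorted(ln_g.items())}


if __name__ == "__main__":
    estimate = wang_landau()
    exact = exact_degeneracies(8, 4)
    for k in sorted(exact):
        print(k, exact[k], round(math.exp(estimate[k]), 1))
```

A single run of this kind yields ln g(E) for one coverage; the thermodynamic functions at any temperature then follow from Eqs. (17.3) to (17.6) without further sampling, which is the practical advantage over a temperature-by-temperature Metropolis study noted above.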


17.3 SURFACE PHASE DIAGRAMS FROM AB INITIO ATOMISTIC THERMODYNAMICS

17.3.1 Ag–Cu Alloy Surface and Chemical Reactions in an Oxygen and Ethylene Atmosphere

The ab initio atomistic thermodynamics approach describes systems in thermodynamic equilibrium, taking into account the effect of the atmosphere or “environment” (e.g., a gas phase of one or more species) through the chemical potential.30–35 This method uses results from first-principles electronic structure theory to calculate the Gibbs free energy. Various surface structures can be compared to determine which is the most stable under given temperature and gas-pressure conditions, which enter through the chemical potential. It is an indirect approach in that its reliability depends on the structures explicitly considered. These structures are restricted to being ordered, due to the periodic boundary conditions employed in the supercell approach that most modern density functional theory codes use. Despite these restrictions, it represents a very valuable first step in the study of surfaces under realistic conditions. In the following, this approach is used for the study of ethylene epoxidation over an Ag–Cu alloy catalyst. On the basis of experiments and first-principles calculations, it has been proposed that if an Ag–Cu alloy is used instead of the traditional Ag catalyst, the selectivity toward ethylene oxide is improved. Experimentally, it was shown through ex situ x-ray photoelectron spectroscopy (XPS) measurements that the copper surface content is much higher than the overall content of the alloy, indicating copper segregation to the surface.36 This led to the theoretical consideration of a model in which one out of four silver atoms is replaced by a copper atom (i.e., representing a two-dimensional surface alloy).37,38 At the temperatures and pressures used in the experiments (e.g., ∼530 K, 0.1 atm), however, copper oxidizes to CuO, and at higher temperatures or lower pressures, to Cu2O. Therefore, it is possible that more complex structures are present on the catalyst surface.
Indeed, our recent studies show that a two-dimensional Ag–Cu surface alloy is not stable in an environment containing oxygen and ethylene at temperatures and pressures relevant for industrial applications, as explained below. Rather, the results show that thin surface copper oxide–like films form. These predictions are supported by recent XPS measurements and high-resolution transmission electron microscopy results.39 As a first step in the theoretical study of this system, the Ag–Cu alloy surfaces are considered in contact with a pure oxygen environment. As a second step, the effect of the ethylene gas phase is investigated. The most stable surface structures are those that minimize the change in the Gibbs surface free energy,

$$\Delta G(\mu_{\mathrm{O}}) = \frac{1}{A} \left( G^{\mathrm{O/Cu/Ag}} - G^{\mathrm{slab}} - \Delta N_{\mathrm{Ag}}\, \mu_{\mathrm{Ag}} - N_{\mathrm{Cu}}\, \mu_{\mathrm{Cu}} - N_{\mathrm{O}}\, \mu_{\mathrm{O}} \right) \qquad (17.7)$$

where $\Delta N_{\mathrm{Ag}}$ is the difference in the number of Ag atoms between the adsorption system and the clean Ag slab, and $N_{\mathrm{Cu}}$ and $N_{\mathrm{O}}$ are the numbers of Cu and O atoms. $\mu_{\mathrm{Cu}}$, $\mu_{\mathrm{Ag}}$,


and $\mu_{\mathrm{O}}$ are the copper, silver, and oxygen chemical potentials, respectively. The Ag and Cu chemical potentials are taken to be those of an Ag and a Cu atom in the respective bulk materials. This assumes that the system is in equilibrium with bulk Ag, which acts as the reservoir. $G^{\mathrm{O/Cu/Ag}}$ and $G^{\mathrm{slab}}$ are the free energies of the adsorbate structure and the clean Ag slab, respectively. Normalization to the surface area, A, allows comparison of structures with different unit cells. The temperature and pressure dependence enters through the oxygen chemical potential,31

$$\mu_{\mathrm{O}}(T, p) = \frac{1}{2} \left[ E_{\mathrm{O_2}}^{\mathrm{total}} + \tilde{\mu}_{\mathrm{O_2}}(T, p^0) + k_B T \ln\!\left( \frac{p_{\mathrm{O_2}}}{p^0} \right) \right] \qquad (17.8)$$

Here $p^0$ is the standard pressure (1 atm) and $\tilde{\mu}_{\mathrm{O_2}}(T, p^0)$ is the chemical potential at the standard pressure. This can be obtained either from thermochemical tables40 (as done in this case) or calculated directly. Contributions to the free energy due to vibrations should be taken into account. For the O/Ag34 and O/Cu41 systems studied in the literature, such contributions have been shown to be sufficiently small (e.g., ˚ 2 ) as not to play an important role. This was also found for two −1.23 eV. Figures 17.5a and 17.5b show the atomic geometry of the p2 and p4-OCu3 structures, as well as a CuO-like structure, CuO(1L) (Fig. 17.5c), which is like a layer of bulk CuO forced to match the (2 × 2) lattice of the underlying Ag(111) surface. Also shown is a structure with 1 ML of Cu and 1 ML of O on top of the Cu layer, labeled O1ML (Fig. 17.5d). It is worth noting that in the absence of oxygen, Cu prefers to be located in the subsurface layer, that is, beneath the outermost Ag layer, but when there is oxygen in the atmosphere, the copper atoms segregate to the surface and form thin surface oxide–like structures. Moreover, a two-dimensional surface Ag–Cu alloy is not stable anywhere in the range of chemical potential considered. On the other hand, there is a narrow region in which two-dimensional O–Cu surface oxides are stable. This is indicated in Fig. 17.4 by the region labeled “surface oxides.” In this region thin O–Cu structures have the lowest Gibbs surface free energy. The results presented in Fig. 17.4 correspond to the situation where there is no limit to the Cu concentration. For the Ag–Cu alloy catalysts, however, there is only ≈2.5% Cu. At the surface, in an oxygen and reaction atmosphere, it is estimated from experiment that there are around 50 times more Cu atoms than the nominal bulk content would suggest. Moreover, from XPS studies, the Cu content on the surface is suggested to be in the range 0.1 to 0.75 ML.42
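To make the role of Eqs. (17.7) and (17.8) concrete, the following minimal Python sketch evaluates the oxygen chemical potential relative to half the total energy of an O2 molecule, written here as Δμ_O = μ_O − ½E(O2), and the surface free energy per unit area of a candidate structure. It is a sketch under stated assumptions: the function names and argument names are hypothetical, the standard-pressure term μ̃_O2(T, p0) must be supplied by the user (e.g., from thermochemical tables), and the value −1.22 eV used in the example for 600 K is not a tabulated number but is inferred from the statement elsewhere in this chapter that an oxygen chemical potential of −0.61 eV corresponds to 1 atm and 600 K.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def delta_mu_O(temperature, p_O2, mu_tilde_O2, p0=1.0):
    """Oxygen chemical potential relative to (1/2) E_O2^total, in eV.

    temperature : K
    p_O2, p0    : O2 partial pressure and standard pressure (same units)
    mu_tilde_O2 : chemical potential of O2 at (T, p0) in eV, user supplied
    """
    return 0.5 * (mu_tilde_O2 + K_B * temperature * math.log(p_O2 / p0))


def surface_free_energy(g_ads, g_slab, area, d_n_ag, n_cu, n_o,
                        mu_ag, mu_cu, mu_o):
    """Change in Gibbs surface free energy per unit area, Eq. (17.7).

    All energies in eV; mu_o is the absolute oxygen chemical potential,
    i.e. (1/2) E_O2^total plus the relative value returned above.
    """
    return (g_ads - g_slab - d_n_ag * mu_ag - n_cu * mu_cu - n_o * mu_o) / area


if __name__ == "__main__":
    # Assumed input: mu_tilde_O2(600 K, 1 atm) = -1.22 eV (see lead-in).
    for pressure in (1.0, 1e-2, 1e-4):
        print(pressure, delta_mu_O(600.0, pressure, -1.22))
```

Lowering the O2 pressure at fixed temperature makes Δμ_O more negative through the logarithmic term, which is how a single axis in Δμ_O can represent a family of (T, p) conditions in the phase diagrams.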


Fig. 17.5 (color online) Top view of four surface structures considered: (a) p2; (b) p4-OCu3; (c) CuO(1L); (d) O1ML/Cu1ML. The gray spheres represent the underlying Ag(111) substrate. Copper atoms are shown as large dark circles, and oxygen atoms are the small dark circles. The black lines represent the surface unit cells. (From Ref. 30.)

To consider explicit Cu concentrations in the theory, we can use the results of Fig. 17.4 to determine the structures that will be present on the surface as a function of copper content and the oxygen chemical potential (expressed below as ΔμO, i.e., relative to half the total energy of an isolated O2 molecule). In doing this, published results for many O–Ag structures were also utilized for the system in the absence of copper. To construct such a surface phase diagram, for a given value of the oxygen chemical potential, the surface free energy is plotted versus the copper content in the various considered structures. From this, the convex hull of the stable structures can be identified. By repeating this for the other values of the oxygen chemical potential in the range considered, the phase diagram as a function of the oxygen chemical potential and Cu content can be constructed. This is shown in Fig. 17.6. It can be seen that for a value of ΔμO = −0.61 eV, which


Fig. 17.6 Surface phase diagram showing structures belonging to the convex hull as a function of the Cu surface content and the change in oxygen chemical potential, ΔμO. (From Ref. 30.)

corresponds to conditions typical of industrial applications (p = 1 atm, T = 600 K) and for Cu content below 0.5 ML, the results predict that there will be patches of one-layer oxidic structures (i.e., p4-OCu3) which coexist with the clean Ag surface. For higher values of ΔμO, O–Ag structures are predicted in coexistence with the p4-OCu3 structure. For higher Cu contents, the CuO(1L) and p2 structures are predicted to be present above and below ΔμO = −0.75 eV, respectively. For even higher Cu contents, bulk CuO is predicted to form on the surface. These predictions are consistent with recent experiments performed on the Ag–Cu system under catalytic conditions,43 where through a combination of in situ XPS and near-edge x-ray absorption fine structure measurements, thin layers of CuO are found to be present on the surface. Areas of clean Ag are also present on the surface, in agreement with theory. Analogous calculations have been carried out for the other two low-index surfaces, (100) and (110).44 A scenario similar to that of the (111) surface is found; that is, the presence of oxygen leads to copper segregation to the surface, and thin copper oxide–like layers are predicted on top of the silver surface, as well as copper-free structures. Having studied Ag–Cu alloy surfaces in a pure oxygen environment, it is important to consider the effect of the (reducing) reactant ethylene. This is discussed below for the (111) surface. To do this, a “constrained thermodynamic equilibrium” approach is assumed, which considers the stability of the thin oxide-like layers toward the oxidation of ethylene to acetaldehyde


(thermodynamically favored reaction product). For a surface with stoichiometry AgxCuyOz, the condition of stability is

$$\Delta\mu_{\mathrm{C_2H_4}} - \Delta\mu_{\mathrm{O}} \le -\frac{2}{z}\, \Delta H_f(T = 0\ \mathrm{K}) + \Delta E^{\mathrm{mol}} \qquad (17.11)$$

where $\Delta\mu_{\mathrm{C_2H_4}}$ is the ethylene chemical potential with respect to its zero-temperature value, $\Delta H_f(T = 0\ \mathrm{K})$ is the zero-temperature formation energy of the surface structure, and

$$\Delta E^{\mathrm{mol}} = E_{\mathrm{CH_3CHO}} - E_{\mathrm{C_2H_4}} - \tfrac{1}{2} E_{\mathrm{O_2}} \qquad (17.12)$$

is calculated to be −2.18 eV. Considering a Cu surface coverage of 0.5 ML, the surface phase diagram as a function of the oxygen and ethylene chemical potentials is shown in Fig. 17.7. The region corresponding to typical experimental conditions is indicated as that enclosed by the black dashed lines. It can be seen that


Fig. 17.7 Surface phase diagram for the (111) surface of the Ag–Cu alloy under constrained thermodynamic equilibrium with an atmosphere of oxygen and ethylene (axes: ΔμO and ΔμC2H4, in eV). The shaded areas represent the region of stability of a combination of two surface structures giving a Cu coverage of 0.5 ML. The white area corresponds to the clean Ag(111) surface, where Cu is assumed to be in a bulk reservoir, and ethylene is oxidized to acetaldehyde. The dashed polygon encloses the region that corresponds to typical values of temperature and pressure used in experiments (T = 300 to 600 K and pO2, pC2H4 = 10⁻⁴ to 1 atm). (From Ref. 39.)


several structures can be present, all stable with respect to reduction by ethylene. Neglecting the effect of ethylene, therefore, the relative stability of the structures on all the low-index surfaces can be investigated as a function of the Cu surface content for a representative oxygen chemical potential (ΔμO = −0.61 eV). Here the chemical potential of Cu (ΔμCu, referenced to bulk Cu) is used as a parameter to control the Cu content. The results are shown in Fig. 17.8, where for several values of ΔμCu the shapes predicted for the particles are shown, obtained by minimizing the surface free energy according to the Wulff construction.45 For the selected value of ΔμO, the value of ΔμCu above which Cu oxidizes to bulk copper oxide is −0.62 eV. The values of ΔμCu compatible with the experimentally indicated Cu coverages (0.1 to 0.75 ML) are those close to the formation of bulk copper oxide. Around this region, both the (100) and (110) surfaces are covered with

Fig. 17.8 (color online) (Top) Atomic geometry of four of the most stable oxidelike structures on the surface of Ag–Cu particles in an oxidizing atmosphere. Large light gray spheres represent Ag atoms, small spheres, O atoms; and dark spheres, Cu atoms. (Bottom) Surface energy versus the Cu chemical potential for μO of −0.61 eV (corresponding to T = 600 K and pO2 = 1 atm). At selected values of μCu , the predicted particle shape, as obtained through the Wulff construction, is presented. (From Ref. 39.)


a one-layer oxidelike structure with a ratio of Cu to O of 1, denoted “CuO/Ag.” For values of μCu < −0.65 eV, all facets are covered with Cu-free structures.

Having predicted the equilibrium shape and surface structures of the Ag–Cu catalyst under conditions of practical interest, the adsorption of ethylene and the two competing chemical reactions leading to the formation of acetaldehyde (Ac) and ethylene oxide (EO) (see Fig. 17.9) can be investigated. For the (2 × 2)-O/Ag(111) and (2 × 2)-O/Ag(100) surfaces, both reactions are known to proceed through a common oxametallacycle (OMC)37,38,46,47 intermediate, where ethylene is bonded with one C atom to a surface metal atom and with the other C atom to oxygen. The OMC is shown in Fig. 17.9 (leftmost panel). Similar findings have also been reported for Ag oxides.48 From calculations of the reaction pathways for Ac and EO formation over the predicted stable surface structures, it is found that the behavior can be quite varied,49 depending on the surface structure; in particular, for the (111) surface, formation of EO does not involve the formation of any intermediate for the p2/Ag(111), p4-OCu3/Ag(111), and CuO/Ag(111) structures. For formation of Ac over the CuO/Ag(111) surface, the reaction does, however, proceed by an OMC, but this is a metastable state. Ac formation over the p2/Ag(111) surface involves the formation of a different stable intermediate in which ethylene is bound to one oxygen on each carbon. The OMC, on the other hand, is a common intermediate for both Ac and EO formation over the (2 × 2)-O/Ag(111), CuO/Ag(100), and CuO/Ag(110) surfaces. In Fig. 17.10 the transition states for Ac and EO formation over the (2 × 2)-O/Ag(111) and CuO/Ag(111) surfaces are shown as an example. The activation barrier for EO formation is lower than that of Ac for the CuO/Ag(111) structure, while the trend is the opposite for the (2 × 2)-O/Ag(111) surface.
This is consistent with, and possibly partially explains, the greater selectivity reported experimentally for the Ag–Cu catalysts compared to pure silver. As mentioned above, the nature of the reaction pathways for the surface structures identified to be potentially catalytically relevant for

Fig. 17.9 (color online) Atomic geometry of the oxametallacycle (OMC) intermediate (left) and final states acetaldehyde (Ac) (center) and ethylene oxide (EO) (right) on (2×2)O/Ag(111). (From Ref. 49.)


Fig. 17.10 (color online) Transition-state geometries for the formation of acetaldehyde (top panels) and ethylene oxide (central panels) and top view of the surface for the reaction over (2×2)-O/Ag(111) and for the CuO/Ag(111) structure (bottom panels). The large light gray spheres represent Ag atoms; the large dark ones, Cu; the medium dark ones, O; and the very small spheres, H atoms. (From Ref. 49.)

the low-index surfaces is quite varied, but the preliminary results point to the Cu-containing structures providing better selectivity toward EO formation, consistent with experimental measurements. For more details, see Ref. 49.

17.4 CATALYSIS AND DIFFUSION FROM AB INITIO KINETIC MONTE CARLO SIMULATIONS

17.4.1 CO Oxidation Reaction over Pd(100)

The importance of molecular-level mechanisms and their interplay for determining observable macroscopic (and microscopic) material phenomena is without


question. Often, as, for example, in the study of order–disorder phase transition temperatures discussed in Section 17.2, there is no direct link between the microscopic (electronic) theory and experimental measurables, and appropriate “hierarchical” approaches have to be developed that link the physics across all relevant length and time scales into one multiscale simulation.50 A particularly successful approach is that of ab initio kinetic Monte Carlo (kMC). Considering, for example, the study of heterogeneous catalysis, for given gas-phase conditions, such calculations can determine the detailed surface composition and the occurrence of each individual elementary process at any time. From the latter, the catalytic activity (i.e., product formation) per surface area can also be obtained, either time-resolved (e.g., during induction, when the catalyst surface is being restructured to its active form) or time-averaged, during steady state. A recent comprehensive description of the kMC approach using microscopic parameters obtained from ab initio electronic structure total energy calculations for heterogeneous catalysis is given in Ref. 7.

First-principles-based kMC involves, first, a determination of the elementary steps involved in the particular process to be studied, and their evaluation by electronic structure total energy calculations (most typically using density functional theory). For catalysis, these would include adsorption and desorption of reactants and reaction intermediates, as well as surface diffusion and surface reactions. The second step concerns describing the statistical interplay of the elementary processes as achieved by kinetic Monte Carlo simulations.51 In kMC the relationship between “MC time” and “real time” is obtained by regarding the MC process as providing a numerical solution to the Markovian master equation describing the dynamic system evolution.52–56 A sequence of configurations is generated using random numbers.
For each step (new configuration), all possible elementary processes and the rates with which they occur are calculated. These processes are weighted by the rates, and one of the processes is executed randomly to achieve the new system configuration. In this way the kMC algorithm effectively simulates stochastic processes, and a direct relationship between kMC time and real time is established. The flow diagram for the kMC process is shown in Fig. 17.11. Properly evaluating the time evolution requires simulation cells that are large enough to capture the effects of correlation and spatial distribution of the species at the surface. Most processes considered in kMC are highly activated and occur on time scales orders of magnitude longer than, for example, a typical vibration (10⁻¹² s). Because of these “rare events,” the statistical interplay of the elementary processes needs to be evaluated over time scales that can reach seconds and more. A recent application demonstrating the power of this approach is the study of the CO oxidation reaction over the Pd(100) surface. The motivation for this study is related to the increasing awareness that for oxidation catalysis (i.e., under atmospheric oxygen conditions) the surface of a transition metal (TM) catalyst may be oxidized, and instead of being the pure TM surface, which is often the subject of quantitative ultrahigh-vacuum (UHV) surface science studies, the oxidized material may be active for the catalysis. This has recently


Fig. 17.11 (color online) Flow diagram showing the basic steps in a kinetic Monte Carlo simulation. First, loop over all the lattice sites and determine the elementary atomic processes that are possible for the current system configuration. Then generate two random numbers and advance the system configuration according to the process selected by the first random number. Then, increment the clock according to the rates and the second random number as prescribed by an ensemble of Poisson processes, and then start all over again or stop if the simulation time is sufficiently long. (From Ref. 6.)
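The loop summarized in Fig. 17.11 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used in the studies cited: the toy model (independent adsorption and desorption on a small lattice), the function names `kmc_step` and `simulate_adsorption`, and the rate constants are all hypothetical. The first random number selects a process with probability proportional to its rate, and the second sets the time increment drawn from an exponential (Poisson-process) waiting-time distribution.

```python
import math
import random


def kmc_step(processes, rng):
    """Select one process and a time increment (rejection-free kMC step).

    processes: non-empty list of (label, rate) pairs with rates > 0 in 1/s.
    Returns (label of the executed process, time increment in s).
    """
    total_rate = sum(rate for _, rate in processes)
    # First random number: pick a process with probability rate / total_rate.
    threshold = rng.random() * total_rate
    cumulative = 0.0
    chosen = processes[-1][0]  # fallback in case of floating-point round-off
    for label, rate in processes:
        cumulative += rate
        if threshold < cumulative:
            chosen = label
            break
    # Second random number: exponentially distributed waiting time.
    dt = -math.log(1.0 - rng.random()) / total_rate
    return chosen, dt


def simulate_adsorption(n_sites, k_ads, k_des, t_end, seed=0):
    """Time-averaged coverage of a toy lattice (hypothetical model)."""
    rng = random.Random(seed)
    occupied = [False] * n_sites
    n_occupied = 0
    t = 0.0
    coverage_integral = 0.0
    while t < t_end:
        # Loop over all sites and list the processes possible right now.
        processes = [
            ((("des", i), k_des) if occ else (("ads", i), k_ads))
            for i, occ in enumerate(occupied)
        ]
        (kind, site), dt = kmc_step(processes, rng)
        dt = min(dt, t_end - t)
        # The current configuration persists for dt before the event occurs.
        coverage_integral += dt * n_occupied / n_sites
        occupied[site] = (kind == "ads")
        n_occupied += 1 if kind == "ads" else -1
        t += dt
    return coverage_integral / t_end
```

For this toy model the time-averaged coverage should approach k_ads / (k_ads + k_des), which gives a simple sanity check. A realistic simulation would replace the constant rates with rates derived from first-principles barriers and would include diffusion, reaction, and lateral interactions.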

been revealed for CO oxidation employing Ru catalysts. In this case, bulk oxide RuO2 is, in fact, the stable phase under reactive conditions.57,58 For TMs farther to the right in the periodic table, the late TM and noble metals, which are also used in oxidation catalysis, the situation is different; thus, it is of great interest to consider the analogous reaction of CO oxidation over the more noble metal, Pd. Briefly, from the kMC simulations described below, it was found that oxide formation in the reactive environment also plays a significant role, but with the difference that this oxide is not a bulklike film that, once it becomes stable, actuates the catalysis; rather, the study indicates the relevance of a subnanometer surface oxide structure which is probably formed continuously and reacted away in the sustained catalytic operation. As a first step in this study, using the approach of ab initio atomistic thermodynamics described in Section 17.3, the surface structure and stability of the Pd(100) surface in an atmosphere containing oxygen and carbon monoxide are studied for a wide range of partial pressures and temperatures. The resulting phase diagram is shown in Fig. 17.12.59,60 Here, a constrained atomistic thermodynamics approach was employed,61,62 as for the Ag–Cu alloy catalysts described in Section 17.3 for ethylene oxidation, in which it is assumed that the surface is in equilibrium with i separate reservoirs representing the i gas-phase species, each characterized by the chemical potential μi(T, pi) with partial pressure pi and temperature T. The character of the surface phase diagram can be described in terms of three regions: first, a region where bulklike thick oxide films are stable (crosshatched region); then a region consisting of adsorption


phases on a (√5 × √5)R27° (hereafter denoted “√5”) surface oxide (hatched area), which has recently been characterized and resembles a layer of PdO(101) on the surface63; and finally, a region with different CO and O adsorption phases on Pd(100). Gas-phase conditions representative of technological CO oxidation catalysis (pi ∼ 1 atm, T ∼ 300 to 600 K) correspond to the phase boundary between the regions of adsorption on the √5 surface oxide and that of CO-covered Pd(100). Thus, unlike for Ru, the presence of bulk oxides in the reactive environment can be ruled out, while the stability region of the thin √5 surface oxide structure extends into such conditions.

To investigate the reactivity of the √5 phase, and to see if its stability region changes when the kinetic effects of the catalytic reaction on the surface are taken into account, kinetic Monte Carlo calculations are carried out. In these simulations, hollow and bridge sites are considered, together with all nonconcerted adsorption, desorption, diffusion, and Langmuir–Hinshelwood reaction processes (where both reactants are adsorbed on the surface prior to reaction to the product) involving these sites: in all, 26 elementary processes. Also, nearest-neighbor lateral interactions are taken into account in the elementary process rates. The 14 required interaction parameters are determined from DFT calculations of 29 ordered configurations with O and/or CO in bridge and hollow sites of the √5 surface unit cell. The resulting adsorption energies are expressed in terms of the LGH expansion. The kMC simulations are performed on a lattice comprising (50 × 50) surface unit cells for fixed (T, pO2, pCO) conditions, in particular for pO2 = 1 atm and temperatures in the range 300 to 600 K.
Initially, the CO partial pressure was chosen to be low, 10⁻⁵ atm, corresponding to the middle of the stability region of the √5 phase, and subsequently increased, moving closer and closer to the boundary of the stability region of the √5 phase. This is indicated by the vertical arrows in Fig. 17.12. When the surface reaction consumes surface oxygen faster than it is replenished, the √5 phase becomes destabilized. To determine the onset of the structural destabilization from the kMC simulations, the percentage occupation of O atoms in hollow sites is monitored as a function of CO partial pressure. Full occupation of these sites corresponds to the intact √5 phase. The results are shown in Fig. 17.13. Interpreting a reduction to 95% occupation as the onset of decomposition, the results predict critical CO pressures of 5 × 10⁻², 10⁻¹, and 10 atm at 300, 400, and 600 K, respectively. These results are rather similar to those obtained from the constrained atomistic thermodynamics approach, which are shown in Fig. 17.13 as the vertical lines. The critical pressures obtained (e.g., at 400 K, pO2/pCO ≈ 10 : 1) are in good accord with reactor scanning tunneling microscopy (STM) experiments64 performed under such gas-phase conditions. Importantly, the theoretical results show that for relevant pO2/pCO ratios, the turnover frequencies (number of CO2 molecules produced per site per second) for the intact √5 surface oxide alone are already of a similar order of magnitude to those reported experimentally65 for the Pd(100) surface under comparable gas-phase conditions. This shows that this particular surface oxide is certainly not “inactive” with respect to the oxidation of CO, contrary to earlier prevalent preconceptions.
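To illustrate how a lattice-gas Hamiltonian (LGH) expansion with nearest-neighbor lateral interactions assigns an energy to an adsorbate configuration, the following sketch may help. It is a deliberately simplified stand-in: it assumes a single adsorbate species on a single type of site on a periodic square lattice, whereas the study described above uses two species, hollow and bridge sites, and 14 interaction parameters. The function name `lgh_energy` and the numerical values of `e_site` and `v_nn` are hypothetical and are not taken from the cited work.

```python
import numpy as np


def lgh_energy(occupation, e_site, v_nn):
    """Total energy (eV) of a one-species lattice gas on a periodic square lattice.

    occupation: 2D array of 0/1 site occupations (each dimension >= 3).
    e_site: on-site adsorption energy (eV) of an isolated adsorbate.
    v_nn: nearest-neighbor pair interaction (eV); positive means repulsive.
    """
    n = np.asarray(occupation, dtype=int)
    # Count each nearest-neighbor pair once by pairing every site with its
    # neighbor in the +row and +column directions (periodic wrap-around).
    pairs = np.sum(n * np.roll(n, -1, axis=0)) + np.sum(n * np.roll(n, -1, axis=1))
    return e_site * n.sum() + v_nn * pairs


# Hypothetical parameters, for illustration only.
E_SITE = -1.0  # eV
V_NN = 0.1     # eV, repulsive

full = np.ones((4, 4), dtype=int)
checkerboard = np.indices((4, 4)).sum(axis=0) % 2 == 0

print(lgh_energy(full, E_SITE, V_NN))          # 16 sites, 32 pairs
print(lgh_energy(checkerboard, E_SITE, V_NN))  # 8 sites, no pairs
```

In a kMC setting, energy differences of this kind between configurations would enter the rates of adsorption, desorption, and diffusion, so that the local environment of a site changes how fast a process occurs there.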

[Fig. 17.12 plot: axes ΔμO (eV) and ΔμCO (eV), with the corresponding pO2 and pCO scales (atm) at 300 and 600 K. Labeled regions: PdO bulk; surface oxide (√5 × √5)R27°; surface oxide + O bridge; surface oxide + CO bridge; surface oxide + 2CO bridge; p(2 × 2)-O/Pd(100); clean Pd(100); (2√2 × √2)R45°, (3√2 × √2)R45°, and (4√2 × √2)R45° CO/Pd(100); (1 × 1)-CO bridge/Pd(100).]

Fig. 17.12 (color online) Surface phase diagram for the Pd(100) surface in constrained thermodynamic equilibrium with an environment containing O2 and CO. The various surface structures corresponding to the regions in the phase diagram are illustrated. The pressures corresponding to the O2 and CO chemical potentials are shown for temperatures of 300 and 600 K. The thick black line marks gas-phase conditions representative of that employed for technological CO oxidation catalysis (i.e., partial pressures of 1 atm and temperatures between 300 and 600 K). The three vertical lines correspond to the gas-phase conditions employed in the kinetic Monte Carlo simulations shown in Fig. 17.13. (From Ref. 60.)
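The correspondence between chemical potential and pressure axes in a diagram of this kind can be sketched as follows, assuming ideal-gas behavior so that the pressure dependence at fixed temperature is k_B T ln(p/p0) per molecule. The function names and the `per_molecule` parameter are hypothetical; the only number taken from the text is the oxygen chemical potential of −0.61 eV quoted for T = 600 K and pO2 = 1 atm. The temperature dependence of the reference value itself (which in practice comes from thermochemical tables or computed partition functions) is not modeled here.

```python
import math

K_B = 8.617333e-5  # Boltzmann constant in eV/K


def chemical_potential(mu_ref, temperature, pressure, p_ref=1.0, per_molecule=1.0):
    """Ideal-gas chemical potential (eV) at a given pressure.

    mu_ref: chemical potential at (temperature, p_ref), in eV.
    per_molecule: 1.0 for a molecular species such as CO, 0.5 for an O atom
                  referenced to O2 (mu_O = 1/2 mu_O2).
    """
    return mu_ref + per_molecule * K_B * temperature * math.log(pressure / p_ref)


def pressure_from_mu(mu, mu_ref, temperature, p_ref=1.0, per_molecule=1.0):
    """Inverse relation: pressure (same units as p_ref) for a given chemical potential."""
    return p_ref * math.exp((mu - mu_ref) / (per_molecule * K_B * temperature))


# Oxygen at 600 K: reference value quoted in the text for pO2 = 1 atm.
mu_o_low_p = chemical_potential(-0.61, 600.0, 1.0e-4, per_molecule=0.5)
print(round(mu_o_low_p, 3))  # about -0.848 eV at pO2 = 1e-4 atm
```

Under these assumptions, lowering the O2 pressure by four orders of magnitude at 600 K shifts the oxygen chemical potential by roughly −0.24 eV, which gives a feel for how a pressure scale maps onto a chemical potential axis at a given temperature.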


[Fig. 17.13 plot: coverage ΘOhol (%) versus CO pressure (atm) at pO2 = 1 atm for T = 300, 400, and 600 K.]

Fig. 17.13 (color online) Average coverage (occupation) of oxygen atoms in hollow sites of the (√5 × √5)R27° (√5) surface oxide-like structure as obtained from kinetic Monte Carlo simulations. 100% corresponds to the intact √5 structure. The reduction of the √5 surface oxide-like phase occurs at CO pressures close to those corresponding to the stability boundary (transition from the hatched to the plain areas in Fig. 17.12) and indicated by the vertical lines in Fig. 17.12. (From Ref. 59.)

17.4.2 Permeability of Hydrogen in Amorphous Materials

In a new application, first-principles kinetic Monte Carlo–based simulations have recently been used for the study of the permeability of hydrogen through crystalline and amorphous membranes.9,66,67 The use of metal membranes can potentially play an important role in the large-scale production of high-purity hydrogen, which is required for its use as a fuel in (polymer electrolyte) fuel cell technologies.68 In these membranes, hydrogen permeates through the film by dissociation of molecular hydrogen, diffusion of atomic H through interstitial sites, and then recombination to H2. Permeation of hydrogen occurs at much greater rates than that of other elements; thus, the membranes can deliver high-purity H2 from gas mixtures containing large concentrations of other species. There has been a recent focus on exploring the possibility that amorphous metals may represent a promising new class of membranes, which are to date relatively unexplored compared to crystalline metals and alloys.

Hao and Sholl9 have recently investigated hydrogen permeability through amorphous and crystalline Fe3B metal films. The scheme involves kinetic Monte Carlo simulations, and the goal is that this approach could be used to identify materials with high potential for improved performance through an efficient screening of candidate structures. The structure of crystalline Fe3B is shown in Fig. 17.14b, while an amorphous structure obtained from molecular dynamics simulations is shown in Fig. 17.14a. Considering H2 transport through a film, the rate is often limited by interstitial diffusion of H through the bulk material. In this case, the flux can be related to the operating conditions if the solubility and diffusion coefficient of interstitial H are known. The latter quantity can be accurately calculated for crystalline materials from first-principles-based approaches. For amorphous solids the situation is, however, more complex.
In this case a detailed model for the atomic structure must first be generated. Once this is established, the sites can

Fig. 17.14 (color online) Atomic structure of crystalline Fe3B (b) and an example of an amorphous structure of Fe3B (a) as generated from a molecular dynamics simulation. (From S. Hao, private communication.)

be occupied with interstitial hydrogen and the transition states for diffusion of H atoms between sites can be identified. For amorphous materials, the solubility is typically higher than in the crystalline counterpart, due to the greater range of interstitial binding sites, some of which can bind H notably more strongly. This results in the effects of H concentration being greater for amorphous systems, and this must be taken into account. To investigate this, Hao and Sholl9 carried out simulations for various concentrations for both crystalline c-Fe3B and amorphous a-Fe3B. As the first step, the amorphous geometry was created through an ab initio molecular dynamics simulation of a representative liquidlike sample of 100 atoms, which was rapidly quenched and then subjected to an energy minimization. Subsequently, the interstitial sites were identified. For the amorphous structure this was done using an automatic procedure, because of the large number of sites.


The binding energies and the interactions between H atoms in the interstitial sites were then calculated using density functional theory. From the site energies and the H–H interaction energies, the solubility of H in a-Fe3B and c-Fe3B was obtained using grand canonical Monte Carlo calculations.69 The result is shown in Fig. 17.15, plotted as a function of temperature and H2 pressure. An important finding (Fig. 17.15) is that the H solubility is far larger in the amorphous material than in the crystalline material (e.g., two to three orders of magnitude at 600 K). It can also be noticed that the qualitative dependence of the solubility on temperature is different for the amorphous and crystalline materials, which is attributed to the broad distribution of site energies in the amorphous material.9 Calculation of H diffusion requires the calculation of transition states between adjacent H sites. Initially, Hao and Sholl employed an approximation for the positions of the transition states before carrying out the more computationally expensive DFT calculations. For a-Fe3B, this involved determining a huge number (462) of transition states, highlighting the complexity of treating the amorphous structure. Once these are determined, the rates and the H diffusion can be calculated using kinetic MC. On investigating the concentration dependence of the diffusion coefficient for amorphous Fe3B, it was found that for increasing concentration, the diffusion coefficient increases (e.g., at 600 K by around three orders of magnitude for H concentration varying from 0 to 0.2 H/M) and then begins to decrease again. This behavior was explained by the fact that at low concentrations the strongest binding sites are occupied, which have associated large diffusion barriers. For higher concentrations, these sites are already filled, and less favored sites, which have smaller barriers for diffusion, become populated.
For even higher concentration, the diffusion coefficient decreases due to blocking effects by the interstitial H atoms.
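The idea that the strongest binding sites fill first, so that a broad distribution of site energies shapes both solubility and diffusion, can be illustrated with a much simpler model than the grand canonical Monte Carlo calculations described above. The sketch below assumes non-interacting interstitial sites, for which the equilibrium occupancy of each site follows a Fermi–Dirac-like expression in the site energy and the hydrogen chemical potential. It neglects the H–H interactions that the cited work includes, and the function names, site energies, and chemical potential are hypothetical values chosen only for illustration.

```python
import math

K_B = 8.617333e-5  # Boltzmann constant in eV/K


def site_occupancies(site_energies, mu_h, temperature):
    """Equilibrium occupancy of each non-interacting site (at most one H per site).

    site_energies: H binding energies of the sites (eV); lower binds more strongly.
    mu_h: chemical potential of atomic H (eV), set by the H2 gas phase.
    """
    beta = 1.0 / (K_B * temperature)
    return [1.0 / (1.0 + math.exp(beta * (e - mu_h))) for e in site_energies]


def h_per_metal(site_energies, mu_h, temperature, n_metal):
    """Solubility expressed as H atoms per metal atom (H/M)."""
    return sum(site_occupancies(site_energies, mu_h, temperature)) / n_metal


# Hypothetical site energies (eV) for a cell containing 10 metal atoms.
energies = [-0.4, -0.2, 0.0]
occ = site_occupancies(energies, mu_h=-0.2, temperature=600.0)
print([round(x, 3) for x in occ])  # strongest site nearly full, weakest nearly empty
print(round(h_per_metal(energies, -0.2, 600.0, n_metal=10), 3))
```

In this picture, raising the hydrogen chemical potential (higher H2 pressure) progressively populates weaker sites, which is consistent with the qualitative explanation given above for the concentration dependence of the diffusion coefficient.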

[Fig. 17.15 plot: solubility (H/M, logarithmic scale) versus temperature (K) for a-Fe3B and c-Fe3B at H2 pressures of 0.01, 1, and 10 atm.]

Fig. 17.15 (color online) Calculated H solubility in a-Fe3B (solid curves) and c-Fe3B (dashed curves) as a function of temperature for several H2 pressures. Lines are guides to the eye. (From Ref. 9.)

[Fig. 17.16 plot: H2 permeability (mol/m/s/Pa^0.5, logarithmic scale) versus temperature (K) for Pd, a-Fe3B, and c-Fe3B.]

Fig. 17.16 (color online) Calculated permeability of H2 in a-Fe3 B and c-Fe3 B at different temperatures. The “feed pressure” was 10 atm and the permeate pressure was 1 atm. The permeability of pure Pd is also shown for comparison. (From Ref. 9.)
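A permeability expressed per square root of pressure can be turned into a flux estimate with a short calculation. The sketch below assumes that transport is limited by bulk diffusion and that the dissolved H concentration follows Sieverts' law (proportional to the square root of the H2 pressure), which gives a flux proportional to the difference of the square roots of the feed and permeate pressures divided by the membrane thickness. This dilute-limit form is a simplification, since concentration effects can be significant for amorphous materials, and it is not presented as the procedure of Ref. 9. The function name, the permeability value, and the membrane thickness are hypothetical; only the feed and permeate pressures are those quoted in the caption of Fig. 17.16.

```python
import math

ATM_TO_PA = 101325.0  # pascals per standard atmosphere


def h2_flux(permeability, thickness, p_feed_atm, p_permeate_atm):
    """Steady-state H2 flux (mol m^-2 s^-1) under the Sieverts-law assumption.

    permeability: in mol m^-1 s^-1 Pa^-0.5.
    thickness: membrane thickness in m.
    p_feed_atm, p_permeate_atm: H2 pressures on the two sides, in atm.
    """
    driving_force = (math.sqrt(p_feed_atm * ATM_TO_PA)
                     - math.sqrt(p_permeate_atm * ATM_TO_PA))
    return permeability * driving_force / thickness


# Hypothetical permeability and thickness; pressures as in the caption of Fig. 17.16.
flux = h2_flux(permeability=1.0e-8, thickness=10.0e-6,
               p_feed_atm=10.0, p_permeate_atm=1.0)
print(round(flux, 4))  # mol m^-2 s^-1
```

Because the flux is linear in the permeability in this model, a difference of 1.5 to 2 orders of magnitude in permeability translates directly into the same difference in flux at fixed thickness and pressures.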

To make contact with the experimental results, the more relevant quantity is H permeation through these materials, which involves calculation of the flux through the membrane. Here it was assumed that the net transport is dominated by diffusion through the bulk of the membrane. The results obtained are shown in Fig. 17.16 for particular pressures. It can be seen that the permeability of the amorphous material is about 1.5 to 2 orders of magnitude larger than that of the crystalline material, supporting the notion that amorphous structures can indeed have higher permeabilities. It is noted that the permeability of pure Pd is greater than that of both a-Fe3B and c-Fe3B, although Fe3B was chosen not because it was thought that it might yield greater permeabilities than Pd, but because it represented a system in which a detailed comparison of the behavior of a crystalline versus an amorphous system could be achieved.

17.5 SUMMARY

In this chapter, recent applications and results of first-principles-based approaches to describing and predicting surface properties, such as structures, stoichiometry, phase transitions, and heterogeneous catalysis, and also bulk properties, including solubility, diffusivity, and permeability, were discussed. Three particular calculation approaches were highlighted which are often described under the label “multiscale modeling.” First, using the lattice-gas Hamiltonian (LGH) in combination with equilibrium Monte Carlo (MC) simulations, order–disorder phase transitions for the O/Pd(111) system were presented. This approach is truly predictive in nature in that completely unanticipated structures can be found. It can, in principle, also describe the coexistence of phases and configurational


entropy. For the case of O/Pd(111) the recently introduced MC scheme of Wang and Landau was used. This algorithm enables direct evaluation of the density of (configurational) states, and thus straightforward determination of the main thermodynamic functions.

Using the ab initio atomistic thermodynamics approach, the alloy catalyst Ag–Cu was investigated regarding its surface structure and activity for the ethylene epoxidation reaction. In this approach the free energies of surface structures are calculated, from which the stability ranges of the various identified low-energy phases are predicted. The main limitation of this method is that its predictive power is limited to the explicitly considered surface structures, and that due to the supercell approach used in most modern first-principles approaches, the structures investigated are restricted to be periodic. From investigation of the chemical reactions over the surface phases identified, the calculations showed, first, that under reaction conditions the catalyst surface is very different from the hitherto assumed AgCu surface alloy. In particular, the results point to a dynamical coexistence of thin CuO and AgO–CuO films on the Ag substrate. This is likely to have important consequences regarding the mechanism by which Cu enhances the catalyst selectivity, since the active O species will be part of the oxide layer rather than adsorbed O atoms on a metal surface. Preliminary investigations indicate that some reaction pathways for ethylene oxidation over such Cu-oxide layers have a lower activation energy than that of the (undesired) competing reaction to acetaldehyde. These findings may also be of high relevance for understanding the activity of other dilute alloy catalysts.
The most complex approach discussed, kinetic MC, links an accurate description of the elementary processes, which have a clear microscopic meaning (obtained through use of first-principles calculations) with a proper evaluation of their statistical interplay. Important to the success of this approach is the identification of all relevant elementary processes, which can be nontrivial. Further, for increasingly complex systems, the number of elementary processes can virtually explode. In the literature there have been some attempts to generate the list of elementary reactions “on the fly” (see, e.g., Refs. 70 and 71, where this approach is discussed in more detail and distributed). Typically, ab initio kMC studies have been carried out with “home-grown” codes written around a particular application. In the present chapter, two recent examples were described: the first, the carbon monoxide oxidation reaction over Pd(100) in which the importance of the formation of a thin surface-oxide-like film was identified, and the second, the permeability of hydrogen through amorphous and crystalline films of Fe3 B. In the latter study, the calculations predicted a greater permeability for the amorphous membrane, pointing to amorphous structures possibly representing a new class of higher-efficiency membranes for hydrogen purification. Over the years there has been a considerable increase in the atomic-level understanding of material systems, which has arisen primarily due to the synergy between experiment and first-principles-based studies. It is envisaged that this trend will continue, with the theoretical methods described here, as well as new


approaches that will be developed together with the seemingly ever-increasing computer power, proving very valuable for advancing the performance of technological applications right across the multidisciplinary fields of physics, chemistry, biology, engineering, and materials science, yielding many exciting discoveries along the way.

REFERENCES

1. Basic research needs for the hydrogen economy. Presented at the Workshop on Production, Storage and Use, U.S. Department of Energy, Office of Basic Energy Sciences, Washington, DC, 2003.
2. Basic research needs for solar energy utilization. Report of the Basic Energy Sciences Workshop on Solar Energy Utilization, 2005.
3. Satterfield, C. N. Heterogeneous Catalysis in Industrial Practice, McGraw-Hill, New York, 1991.
4. Lundgren, E.; Over, H. J. Phys. Condens. Matter 2008, 20, 180302, and references therein.
5. Basic research needs: catalysis for energy. Presented at the Workshop on Production, Storage and Use, U.S. Department of Energy, Office of Basic Energy Sciences, Washington, DC, 2007.
6. Reuter, K.; Stampfl, C.; Scheffler, M. Ab initio atomistic thermodynamics and statistical mechanics of surface properties and functions. In Handbook of Materials Modeling, Vol. 1, Yip, S., Ed., Springer-Verlag, Berlin, 2005, pp. 149–194.
7. Reuter, K. First-principles kinetic Monte Carlo simulations for heterogeneous catalysis: Concepts, status and frontiers. In Modeling Heterogeneous Catalytic Reactions: From the Molecular Process to the Technical System, Deutschmann, O., Ed., Wiley-VCH, Weinheim, Germany, 2009.
8. Stampfl, C. Catal. Today 2005, 105, 17.
9. Hao, S.; Sholl, D. S. Energy Environ. Sci. 2008, 1, 175.
10. Sholl, D. S.; Steckel, J. A. Density Functional Theory: A Practical Introduction, Wiley, New York, 2009.
11. Engel, T.; Ertl, G. J. Chem. Phys. 1978, 69, 1267; Adv. Catal. 1979, 28, 1; The Chemical Physics of Solid Surfaces and Heterogeneous Catalysis, Vol. 4, King, D. A. and Woodruff, D. P., Eds., Elsevier, Amsterdam, 1982.
12. Campbell, C. T.; Ertl, G.; Kuipers, H.; Segner, J. J. Chem. Phys. 1980, 73, 5862.
13. Zaera, F. Prog. Surf. Sci. 2002, 69, 1.
14. Nakai, I.; Kondoh, H.; Shimada, T.; Resta, A.; Andersen, J.; Ohta, T. J. Chem. Phys. 2006, 124, 224712.
15. McEwen, J.-S.; Payne, S. H.; Stampfl, C. Chem. Phys. Lett. 2002, 361, 317.
16. Borg, M.; Stampfl, C.; Mikkelsen, A.; Gustafson, J.; Lundgren, E.; Scheffler, M.; Andersen, J. N. ChemPhysChem 2005, 6, 1923.
17. Piccinin, S.; Stampfl, C. Phys. Rev. B 2010, 81, 155427.
18. Tang, H.; Van der Ven, A.; Trout, B. L. Phys. Rev. B 2004, 70, 045420.
19. Zhang, Y.; Blum, V.; Reuter, K. Phys. Rev. B 2007, 75, 235406.
20. Shao, J. J. Am. Stat. Assoc. 1993, 88, 486.
21. Zhang, P. Ann. Math. Stat. 1993, 21, 299.
22. Stampfl, C.; Kreuzer, H. J.; Payne, S. H.; Pfnür, H.; Scheffler, M. Phys. Rev. Lett. 1999, 83, 2993.
23. Mendez, J.; Kim, S. H.; Cerdá, J.; Wintterlin, J.; Ertl, G. Phys. Rev. B 2005, 71, 085409.
24. Piercy, P.; De’Bell, K.; Pfnür, H. Phys. Rev. B 1992, 45, 1869.
25. Kortan, A. R.; Park, R. L. Phys. Rev. B 1981, 23, 6340.
26. Wang, F.; Landau, D. P. Phys. Rev. Lett. 2001, 86, 2050.
27. Wang, F.; Landau, D. P. Phys. Rev. E 2001, 64, 056101.
28. Schulz, B. J.; Binder, K.; Müller, M.; Landau, D. P. Phys. Rev. E 2003, 67, 067102.
29. Keil, F. J. J. Univ. Chem. Technol. Metall. 2008, 43, 19.
30. Piccinin, S.; Stampfl, C.; Scheffler, M. Phys. Rev. B 2008, 77, 075426.
31. Reuter, K.; Scheffler, M. Phys. Rev. B 2002, 65, 035406.
32. Weinert, C.; Scheffler, M. Mater. Sci. Forum 1986, 10–12, 25.
33. Scheffler, M.; Dabrowski, J. Phil. Mag. A 1988, 58, 107.
34. Li, W.-X.; Stampfl, C.; Scheffler, M. Phys. Rev. B 2003, 67, 045408.
35. Stampfl, C. Catal. Today 2005, 105, 17.
36. Linic, S.; Jankowiak, J.; Barteau, M. A. J. Catal. 2004, 224, 489.
37. Linic, S.; Barteau, M. A. J. Am. Chem. Soc. 2002, 124, 310.
38. Linic, S.; Barteau, M. A. J. Am. Chem. Soc. 2004, 125, 4034.
39. Piccinin, S.; Zafeiratos, S.; Stampfl, C.; Hansen, T.; Hävecker, M.; Teschner, D.; Knop-Gericke, A.; Schlögl, R.; Scheffler, M. Phys. Rev. Lett. 2010, 104, 035503.
40. Stull, D. R.; Prophet, H. JANAF Thermochemical Tables, 2nd ed., U.S. National Bureau of Standards, Washington, DC, 1971.
41. Soon, A.; Todorova, M.; Delley, B.; Stampfl, C. Phys. Rev. B 2006, 73, 165424.
42. Jankowiak, J. T.; Barteau, M. A. J. Catal. 2005, 236, 366.
43. Zafeiratos, S.; Hävecker, M.; Teschner, D.; Vass, E.; Schnörch, P.; Girgsdies, F.; Hansen, T.; Knop-Gericke, A.; Schlögl, R.; Bukhiyarov, V. Unpublished.
44. Piccinin, S.; Stampfl, C.; Scheffler, M. Surf. Sci. 2009, 603, 1467.
45. Wulff, G. Z. Kristallogr. 1901, 34, 449.
46. Kokalj, A.; Gava, P.; de Gironcoli, S.; Baroni, S. J. Catal. 2008, 254, 304.
47. Torres, D.; Lopes, N.; Illas, F.; Lambert, R. J. Am. Chem. Soc. 2005, 127, 10774.
48. Bocquet, F.; Loffreda, D. J. Am. Chem. Soc. 2005, 127, 17207.
49. Piccinin, S.; Nguyen, N. L.; Stampfl, C.; Scheffler, M. J. Mater. Chem. 2010, 20, 10521.
50. Yip, S., Ed. Handbook of Materials Modeling, Springer-Verlag, Berlin, 2005.
51. Voter, A. F. Introduction to the kinetic Monte Carlo method. In Radiation Effects in Solids, Sickafus, K. E., Kotomin, E. A., and Uberuaga, B. P., Eds., Springer-Verlag, Berlin, 2007.
52. Bortz, A. B.; Kalos, M. H.; Lebowitz, J. L. J. Comput. Phys. 1975, 17, 10.
53. Gillespie, D. T. J. Comput. Phys. 1976, 22, 403.
54. Voter, A. F. Phys. Rev. B 1986, 34, 6819.
55. Kang, H. C.; Weinberg, W. H. J. Chem. Phys. 1989, 90, 2824.
56. Fichthorn, K. A.; Weinberg, W. H. J. Chem. Phys. 1991, 95, 1090.
57. Reuter, K.; Scheffler, M. Appl. Phys. A 2004, 78, 793.
58. Over, H.; Mühler, M. Prog. Surf. Sci. 2003, 72, 3.
59. Rogal, J.; Reuter, K.; Scheffler, M. Phys. Rev. B 2008, 77, 155410.
60. Rogal, J.; Reuter, K.; Scheffler, M. Phys. Rev. Lett. 2007, 98, 046101.
61. Reuter, K.; Scheffler, M. Phys. Rev. B 2003, 68, 045407.
62. Reuter, K.; Scheffler, M. Phys. Rev. Lett. 2003, 90, 046103.
63. Todorova, M.; Lundgren, E.; Blum, V.; Mikkelsen, A.; Gray, S.; Gustafson, J.; Borg, M.; Rogal, J.; Reuter, K.; Andersen, J. N.; Scheffler, M. Surf. Sci. 2003, 541, 101.
64. Hendriksen, B. L. M.; Bobaru, S. C.; Frenken, J. W. M. Surf. Sci. 2004, 552, 229.
65. Szanyi, J.; Goodman, D. W. J. Phys. Chem. 1994, 98, 2972.
66. Semidey-Flecha, L.; Sholl, D. S. J. Chem. Phys. 2008, 128, 144701.
67. Hao, S.; Sholl, D. S. J. Chem. Phys. 2009, 130, 244705.
68. Schlapbach, L.; Züttel, A. Nature 2001, 414, 353.
69. Ling, C.; Sholl, D. S. J. Membr. Sci. 2007, 303, 162.
70. Henkelman, G.; Jónsson, H. J. Chem. Phys. 2001, 115, 9657.
71. Pedersen, A.; Jónsson, H. Math. Comput. Simul. 2010, 10, 1487.

18 Molecular Spintronics

WOO YOUN KIM and KWANG S. KIM
Center for Superfunctional Materials, Pohang University of Science and Technology, Pohang, Korea

Molecular spintronics is an emerging field at the intersection of spintronics and molecular electronics. This chapter offers a pedagogical introduction to theoretical work on molecular spintronics. The theoretical backgrounds of both spintronics and molecular electronics are reviewed, and issues in their numerical implementation are discussed in detail. In particular, we review molecular analogs of conventional spin-valve devices and graphene nanoribbon–based super magnetoresistance.

18.1 INTRODUCTION

Spintronics is a promising research field in which electronic devices exploit the spin of the electron as the transport carrier, rather than its charge as in conventional electronics. Manipulating spins with external magnetic fields makes it possible to store information at high density in an electronic device.1 In addition, the nonvolatility of the spin allows a device to retain information without electric power. This idea, triggered by the discovery of the giant magnetoresistance (GMR) effect in 1988, has led to innovations in information storage, most notably the successful application of GMR devices as read-head sensors in hard disk drives,2,3 and it helped usher in the information-oriented era. In 2007 the Nobel Prize in Physics was awarded to A. Fert and P. Grünberg for their discovery of the GMR effect. In the meantime, the popularization of small, portable electronic devices has increased the demand for memory devices that are not only nonvolatile but also offer low power consumption, high-speed access, and high density. The emergence of tunneling magnetoresistance (TMR) has opened a new route to high-performance magnetoresistive random access memory (MRAM), which has attracted great attention as a next-generation information storage technology.4

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


On the other hand, molecular electronics is a rapidly growing field in which a single molecule or a few molecules serve as an individual electronic device.5–9 Such a bottom-up approach would provide an ideal means of constructing nanoscale devices, complementing or even replacing conventional top-down approaches.8,9 In addition, organic molecules have intrinsic advantages for spintronics. Two intrinsic mechanisms destroy spin coherence in materials: spin-orbit coupling and hyperfine interactions. The strength of the spin-orbit coupling increases with the atomic number Z (in proportion to $Z^4$ in the case of atoms), and organic molecules are composed of low-mass atoms, so their spin-orbit coupling is weak. Carbon-12 ($^{12}$C), the most abundant isotope of carbon and the main component of organic molecules, has zero nuclear spin and therefore no hyperfine interaction. Moreover, the delocalized orbitals of conjugated molecules have small hyperfine interactions. These properties promise long spin-relaxation lengths, which are vital for fabricating high-performance spintronic devices. In this regard, the combination of spintronics and molecular electronics is a natural evolution toward molecular-scale spintronic devices.

This emerging field, molecular spintronics, has already demonstrated its feasibility for real applications through successful measurements of spin-dependent electrical currents in molecule-based devices.10–15 The first experiment used a multiwall carbon nanotube (CNT) sandwiched between cobalt electrodes.11 CNTs have attracted much interest because of their superior properties, such as high carrier mobility, ballistic electron transport, and mechanical robustness. Furthermore, they are composed only of carbon atoms, so they have negligible spin-orbit coupling and hyperfine interactions. Indeed, CNTs have shown very long spin-relaxation lengths, exceeding micrometers.14 Subsequently, organic molecules and graphene (a single graphite layer) have been used in spintronic devices.12–15 In addition, a new type of spintronic device can be made by exploiting magnetic molecules.16–20 Certain molecules containing transition metals show internal spin ordering whose orientation can be controlled by an external magnetic field. Electron transport through such a magnetic molecule shows nontrivial spin-dependent effects due to the internal spin dynamics of the molecule. All this experimental evidence points to a bright future for molecular spintronics.

Alongside experimental work, theoretical studies have also been active.8 Because quantum chemistry, including density functional theory (DFT), the Hartree–Fock (HF) method, and post-HF methods, offers versatile tools for studying the electronic structure of a wide variety of materials, theoretical modeling should be a powerful means of investigating transport properties in molecular spintronic devices. However, conventional quantum chemistry cannot be applied directly for this purpose, because we must deal both with nonequilibrium states driven by a bias voltage (for which the variational principle is not valid) and with open-boundary systems formed by contacting a finite molecule with two semi-infinite metallic electrodes. A general way to study such a system is the nonequilibrium Green's function (NEGF) method.21,22 At present, several schemes based on the NEGF method are available that describe quantum transport quantitatively as well as qualitatively23–33 (see also Chapters 1 and 19). Some of them are also used for spin-polarized transport.29–33 In particular, parameter-free methods enable us to design novel spintronic devices as well as to interpret experimental observations.

The goal of this chapter is to offer a pedagogical introduction to molecular spintronics based on theoretical work. In the following sections we discuss the theoretical background of spintronics and molecular electronics, practical schemes for numerical implementation, and illustrative example studies.

18.2 THEORETICAL BACKGROUND

18.2.1 Magnetoresistance

A representative spintronic device is the spin valve, which is composed of two ferromagnetic (FM) electrodes connected by a spacer, as shown in Figs. 18.1 and 18.2. The resistance of a spin-valve device depends on the relative spin orientation of the two FM electrodes. In general, the resistance is smaller for the parallel spin orientation than for the antiparallel one. Consequently, the resistance of a spin-valve device can be tuned by an external magnetic field.

Fig. 18.1 (color online) (a,b) Schematic structure of a GMR device with parallel and antiparallel spin alignments; (c,d) corresponding density of states (with respect to energy) and spin-transfer paths (from the left to the right electrode through a spacer); (e,f) schematic representation of the resistance for the spin-transfer paths.

Fig. 18.2 (color online) Same as Fig. 18.1 for a TMR device.

Magnetoresistance (MR), the quantity that measures the effectiveness of a spin-valve device, is typically defined as

$$\mathrm{MR} = \frac{R_{\mathrm{AP}} - R_{\mathrm{P}}}{R_{\mathrm{P}}} = \frac{G_{\mathrm{P}} - G_{\mathrm{AP}}}{G_{\mathrm{AP}}} \quad \text{(optimistic)} \tag{18.1}$$

or

$$\mathrm{MR} = \frac{R_{\mathrm{AP}} - R_{\mathrm{P}}}{R_{\mathrm{AP}}} = \frac{G_{\mathrm{P}} - G_{\mathrm{AP}}}{G_{\mathrm{P}}} \quad \text{(pessimistic)} \tag{18.2}$$

where R (G) is the resistance (conductance) and P (AP) denotes the parallel (antiparallel) configuration. The optimistic version is most commonly used. However, the pessimistic MR is useful when a system has a vanishing $G_{\mathrm{AP}}$, because in this case the pessimistic MR is bounded by 1, whereas the optimistic MR is unbounded.

The type of MR is determined by the spacer material, since the mechanism of spin transport depends on it. Figures 18.1 and 18.2 show schematic structures of two conventional spin-valve devices. As shown in Fig. 18.1, a GMR device adopts a nonmagnetic metal (NM) as the spacer, so that spins injected from one FM electrode travel through the conducting channels of the NM spacer to the other FM electrode. Figure 18.1c and d show the density of states (DOS) and the spin-transfer paths from FM to NM and from NM to FM for the parallel and antiparallel spin configurations. In the parallel configuration, spins of the left FM electrode transfer to the nonmagnetic metal and then to the right FM electrode, which has the same spin DOS as the left FM electrode. In this process, spin-up and spin-down carriers experience different resistances because of the asymmetric spin DOS of the electrodes, as depicted in Fig. 18.1e and f. The resistances for the parallel and antiparallel spin configurations are

$$R_{\mathrm{P}} = \frac{2 R_{\mathrm{large}} R_{\mathrm{small}}}{R_{\mathrm{large}} + R_{\mathrm{small}}} \approx 2R_{\mathrm{small}} \quad \text{and} \quad R_{\mathrm{AP}} = \frac{R_{\mathrm{large}} + R_{\mathrm{small}}}{2} \approx \frac{R_{\mathrm{large}}}{2}$$

Thus, the GMR device gives a substantial MR value. When an insulator is used as the spacer, spin transfer between the two FM electrodes occurs by quantum mechanical tunneling through the potential barrier of the insulator, as shown in Fig. 18.2. The magnetoresistance arising from this mechanism is called TMR. As in the GMR device, the two spin carriers experience different resistances, as depicted in Fig. 18.2e and f. The resistances for the two relative spin configurations are

$$R_{\mathrm{P}} = \frac{R_{\mathrm{large}} R_{\mathrm{small}}}{R_{\mathrm{large}} + R_{\mathrm{small}}} \approx R_{\mathrm{small}} \quad \text{and} \quad R_{\mathrm{AP}} = \frac{R_{\mathrm{large}}}{2}$$
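As a quick numerical check of the two-channel resistor picture, the following sketch evaluates the GMR and TMR resistances quoted in the text together with both MR definitions of Eqs. (18.1) and (18.2). The function names and the sample values of R_small and R_large are illustrative assumptions and are not taken from the chapter.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors connected in parallel."""
    return r1 * r2 / (r1 + r2)


def gmr_resistances(r_small, r_large):
    """Two-current model of a GMR spin valve.

    Each spin channel crosses both FM/NM interfaces in series,
    and the two spin channels conduct in parallel.
    """
    r_p = parallel(2 * r_small, 2 * r_large)               # (small+small) || (large+large)
    r_ap = parallel(r_small + r_large, r_small + r_large)  # (small+large) || (large+small)
    return r_p, r_ap


def tmr_resistances(r_small, r_large):
    """TMR resistances in the form quoted in the text for a tunneling spacer."""
    r_p = parallel(r_small, r_large)
    r_ap = r_large / 2
    return r_p, r_ap


def mr_optimistic(r_p, r_ap):
    """Eq. (18.1): normalized by the parallel resistance."""
    return (r_ap - r_p) / r_p


def mr_pessimistic(r_p, r_ap):
    """Eq. (18.2): normalized by the antiparallel resistance."""
    return (r_ap - r_p) / r_ap


if __name__ == "__main__":
    r_small, r_large = 1.0, 9.0  # arbitrary units, illustrative values only
    for name, model in (("GMR", gmr_resistances), ("TMR", tmr_resistances)):
        r_p, r_ap = model(r_small, r_large)
        print(name, r_p, r_ap, mr_optimistic(r_p, r_ap), mr_pessimistic(r_p, r_ap))
```

The two definitions are related by MR(pessimistic) = MR(optimistic) / [1 + MR(optimistic)], which is why the pessimistic value stays below 1 even when the optimistic value grows without bound.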

Spin flips during the tunneling process are negligible, so the TMR can be expressed directly in terms of the spin polarizations of the two FM contacts, as derived by Jullière34:

$$\mathrm{TMR} = \frac{R_{\mathrm{AP}} - R_{\mathrm{P}}}{R_{\mathrm{P}}} = \frac{2P_1 P_2}{1 - P_1 P_2} \tag{18.3}$$

Here $P_{1(2)}$ is the spin polarization of the first (second) FM electrode:

$$P_i = \frac{N_{i\uparrow}(E_F) - N_{i\downarrow}(E_F)}{N_{i\uparrow}(E_F) + N_{i\downarrow}(E_F)} \tag{18.4}$$

where $N_{i\uparrow}(E_F)$ and $N_{i\downarrow}(E_F)$ are the numbers of spin-up and spin-down electrons at the Fermi level ($E_F$). For example, two identical electrodes with $P_1 = P_2 = 0.5$ give TMR = 2(0.25)/(1 − 0.25) ≈ 0.67, that is, about 67%. Typical TMR values (∼100%) are larger than typical GMR values (∼10%). The relatively low MR values of GMR devices may originate from spin flips that occur while the injected spins diffuse through the NM spacer.

18.2.2 Molecular Electronics

Figure 18.3 is a schematic of a two-terminal molecular electronic device. Under an applied bias voltage, an electrical current is driven through the molecule (center) from the source (left) to the drain (right) electrode. For small molecules whose spatial extension is smaller than the mean free path of the system, electron transport is ballistic if the device has continuum bands, whereas it shows resonant or nonresonant tunneling behavior if the device has discrete energy levels.8

Fig. 18.3 Two-terminal molecular electronic device.

Molecular orbitals (MOs) of the device provide channels for electron transport. Therefore, an accurate description of the molecular energy levels in the junction is vital to understanding transport properties. Because the molecule is bonded to metal electrodes, we need to take the following into account. First, there can be significant charge transfer between the electrodes and the molecule due to the dissimilarity of their electronic structures, resulting in shifts of the MO energy levels. Second, the molecular states are coupled to the continuum states of the electrodes, and this coupling results in a finite broadening ($\Gamma$) of the molecular energy levels. Consequently, the MO energy levels are renormalized by the contact effects in the junction, as depicted in Fig. 18.4. Here, we discuss how to calculate the renormalized molecular energy levels and the electrical currents through them.

Fig. 18.4 Renormalization of the molecular energy levels in the metal–molecule contact, comparing the isolated and the contacted molecule; the labels mark $E_{\mathrm{HOMO}}$, $E_F$, $E_{\mathrm{LUMO}}$, and the broadening $\Gamma$. (From Ref. 8, with permission of RSC Publishing.)

Before going into the detailed discussion, we describe how electrical currents are determined by the alignment of the molecular energy levels with respect to the energy bands of both leads. When an external bias voltage is applied, the chemical potentials of the two electrodes are split by the bias voltage, giving rise to two different Fermi functions in the electrodes. The two Fermi functions determine the energy range in which electrons can be transmitted, which is called the bias window. Incoming electrons are transmitted through the broadened energy levels, as depicted in Fig. 18.5. Some are transmitted with high probability, especially at a resonant energy level, whereas others are reflected. In this way, the transmission probability as a function of energy, $T(\varepsilon)$, is determined by the renormalized molecular energy levels. Finally, we can calculate the current (I) by integrating this function over the bias window defined by the two Fermi functions, $f_L(\varepsilon)$ and $f_R(\varepsilon)$:

$$I = \frac{2e}{h} \int_{-\infty}^{\infty} T(\varepsilon)\,[f_L(\varepsilon) - f_R(\varepsilon)]\, d\varepsilon \tag{18.5}$$

where h is the Planck constant and e is the electron charge. It should be emphasized that the energy-level shift and broadening are very important in determining the transmission probability and electrical currents.

Let us consider the simplest system, which has a single energy level. In this case, one can intuitively derive the explicit form of the transmission function. The energy broadening factor $\Gamma$ is related to the electron hopping rate between the energy state of the molecule and one of the electrodes by the energy–time uncertainty principle:

$$\Delta E\, \Delta t = \Gamma \tau \sim h \tag{18.6}$$

where τ is the lifetime of an electron in the molecular state, so that the hopping rate is given by 1/τ (∼ Γ/h). Using the definition of the current, we obtain the following formula for the current ($I_L$) from the left electrode to the molecule:

$$I_L = \frac{e(N_L - N)}{\tau} = e\,\frac{\Gamma_L}{h}\,(N_L - N) \tag{18.7}$$

where $\Gamma_L$ is the broadening factor due to the left contact, and N and $N_L$ [$= 2f_L(\varepsilon)$] are the numbers of electrons in the molecule and the left electrode, respectively. In the same way, the current at the right contact is given by

$$I_R = \frac{e(N_R - N)}{\tau} = e\,\frac{\Gamma_R}{h}\,(N_R - N) \tag{18.8}$$

where $N_R = 2f_R(\varepsilon)$. Assuming that $I = I_L = -I_R$, we calculate the number of electrons in the molecular energy level at the steady state. Then we have

$$N = \frac{2\,[\Gamma_L f_L(\varepsilon) + \Gamma_R f_R(\varepsilon)]}{\Gamma_L + \Gamma_R} \tag{18.9}$$

and

$$I(\varepsilon) = \frac{2e}{h}\,\frac{\Gamma_L \Gamma_R}{\Gamma_L + \Gamma_R}\,[f_L(\varepsilon) - f_R(\varepsilon)] \tag{18.10}$$

Fig. 18.5 (color online) Transmission probability in a molecular junction. R(E) and T(E) are the reflection and transmission probabilities as functions of energy, with T(E) + R(E) = 1. $\mu_{L/R}$ is the chemical potential of the left/right electrode, and $\mu_L - \mu_R = eV$, where V is the applied bias voltage.
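The steady-state balance behind the occupation and current of a single level can be checked numerically. The sketch below is a minimal illustration, assuming broadenings expressed in joules and Fermi occupations supplied directly as numbers between 0 and 1; the function name `steady_state` and the sample values are hypothetical.

```python
H_PLANCK = 6.62607015e-34    # Planck constant in J s
E_CHARGE = 1.602176634e-19   # elementary charge in C


def steady_state(gamma_l, gamma_r, f_l, f_r):
    """Occupation N and current I of a single level at one energy.

    gamma_l, gamma_r: contact broadenings in joules.
    f_l, f_r: Fermi occupations of the left/right electrode (0..1).
    The occupation is the broadening-weighted average of the electrode
    occupations (factor 2 for spin); the current is proportional to f_l - f_r.
    """
    total = gamma_l + gamma_r
    n = 2.0 * (gamma_l * f_l + gamma_r * f_r) / total
    current = (2.0 * E_CHARGE / H_PLANCK) * gamma_l * gamma_r / total * (f_l - f_r)
    return n, current


if __name__ == "__main__":
    gamma = 0.05 * E_CHARGE  # 0.05 eV expressed in joules (illustrative)
    n, current = steady_state(gamma, gamma, 1.0, 0.0)
    print("occupation:", n, "current (A):", current)
```

With equal couplings and a fully open bias window (f_L = 1, f_R = 0) the level is half filled, N = 1 out of a maximum of 2; making one contact much weaker than the other pins N to the occupation of the strongly coupled electrode and limits the current by the weaker coupling.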

On the other hand, the molecular energy level is broadened by a factor $\Gamma$ ($= \Gamma_L + \Gamma_R$) due to the contact effect, as shown in Fig. 18.5. To take this effect into account, the total current should be obtained by integrating the energy-dependent current of Eq. (18.10) over the whole energy range with a weighting factor $D(\varepsilon)$, which represents the energy-dependent distribution of the broadened molecular energy level:

$$I = \frac{2e}{h} \int_{-\infty}^{\infty} D(\varepsilon)\,\frac{\Gamma_L \Gamma_R}{\Gamma_L + \Gamma_R}\,[f_L(\varepsilon) - f_R(\varepsilon)]\, d\varepsilon \tag{18.11}$$

By comparing Eq. (18.11) with Eq. (18.5), we find that the transmission function for the single energy level is

$$T(\varepsilon) = D(\varepsilon)\,\frac{\Gamma_L \Gamma_R}{\Gamma_L + \Gamma_R} \tag{18.12}$$

To extend formula (18.12) to the realistic case of multiple energy levels, we need the Keldysh NEGF method.22

18.2.3 Nonequilibrium Green's Function Method for Quantum Transport

The target system that we want to describe with the NEGF method is composed of the device molecule and the left and right electrodes (Fig. 18.3). To establish the Hamiltonian for the system, we start from an uncoupled state in which each part is independently in its own equilibrium state; the interaction terms between the parts are turned on later as a perturbative potential. Assuming that both electrodes are noninteracting systems, the Hamiltonian of each electrode is

$$H_\alpha = \sum_k \varepsilon_{k\alpha}\, c^{+}_{k\alpha} c_{k\alpha} \tag{18.13}$$

where $c^{+}_{k\alpha}$ ($c_{k\alpha}$) is the creation (annihilation) operator of an electron with momentum k and kinetic energy $\varepsilon_{k\alpha}$ in the α (= L, R) electrode region. For the device region, the form of the Hamiltonian depends on how electron–electron or electron–phonon interactions are treated. For the sake of simplicity, we concentrate on the noninteracting case. Then the Hamiltonian of the device part ($H_{\mathrm{dev}}$) is

$$H_{\mathrm{dev}} = \sum_n \varepsilon_n\, d^{+}_n d_n \tag{18.14}$$

where $d^{+}_n$ ($d_n$) is the creation (annihilation) operator of an electron in the state $|n\rangle$ with energy $\varepsilon_n$. We refer readers to the more specialized literature for the generalization of the formalism to interacting systems.22,35 In most practical calculations, the electron–electron interaction is treated effectively through the noninteracting Kohn–Sham potential of DFT. The coupling is taken into account by turning on the interaction potential $V_{\mathrm{int},\alpha}$ between the device and electrode α:

$$V_{\mathrm{int},\alpha} = \sum_{k,n} \left( \tau_{k\alpha,n}\, c^{+}_{k\alpha} d_n + \tau^{*}_{k\alpha,n}\, d^{+}_n c_{k\alpha} \right) \tag{18.15}$$

where $\tau_{k\alpha,n}$ denotes the hopping term from state $|n\rangle$ to state $|k\rangle$. Finally, the total Hamiltonian is given by

$$H = H_{\mathrm{dev}} + H_L + H_R + V_{\mathrm{int},L} + V_{\mathrm{int},R} \tag{18.16}$$
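Because every term in the total Hamiltonian is quadratic in the operators, it can be represented by a single-particle matrix in the basis of left-electrode, device, and right-electrode states. The sketch below assembles such a matrix with NumPy for a toy model. The function name, the finite number of electrode states, and all numerical values are illustrative assumptions; real electrodes are semi-infinite.

```python
import numpy as np


def total_hamiltonian(eps_left, eps_dev, eps_right, tau_left, tau_right):
    """Single-particle matrix of H = H_L + H_dev + H_R + V_int,L + V_int,R.

    eps_left, eps_dev, eps_right: 1D arrays of level energies.
    tau_left[k, n], tau_right[k, n]: hopping from device state n to electrode state k.
    Basis ordering: left electrode, device, right electrode.
    """
    n_l, n_d, n_r = len(eps_left), len(eps_dev), len(eps_right)
    h = np.zeros((n_l + n_d + n_r,) * 2, dtype=complex)
    left = slice(0, n_l)
    dev = slice(n_l, n_l + n_d)
    right = slice(n_l + n_d, None)
    # Diagonal blocks: uncoupled electrodes and device
    h[left, left] = np.diag(eps_left)
    h[dev, dev] = np.diag(eps_dev)
    h[right, right] = np.diag(eps_right)
    # Off-diagonal blocks: hopping and its Hermitian conjugate
    h[left, dev] = tau_left
    h[dev, left] = tau_left.conj().T
    h[right, dev] = tau_right
    h[dev, right] = tau_right.conj().T
    return h


if __name__ == "__main__":
    eps_lead = np.linspace(-1.0, 1.0, 5)   # five discretized electrode levels (eV)
    eps_dev = np.array([0.2])              # one molecular level (eV)
    tau = 0.1 * np.ones((5, 1))            # uniform hopping (eV)
    h = total_hamiltonian(eps_lead, eps_dev, eps_lead, tau, tau)
    print(h.shape, np.allclose(h, h.conj().T))
```

The block structure makes the model assumption visible: the two electrodes couple only through the device, so the left–right blocks of the matrix remain zero.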

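By definition, the electrical current from the left electrode to the device part ($I_L$) can be calculated from Heisenberg's equation of motion22,35:

$$I_L = e\,\frac{d}{dt}\langle N_L(t) \rangle = \frac{ie}{\hbar}\,\langle [H, N_L(t)] \rangle \tag{18.17}$$

where $N_L(t) \equiv \sum_k c^{+}_{kL}(t)\, c_{kL}(t)$ is the number operator of electrons in the left electrode. Since $H_{L/R}$ and $H_{\mathrm{dev}}$ commute with the number operator, Eq. (18.17) simplifies to

$$I_L = \frac{ie}{\hbar}\,\langle [V_{\mathrm{int},L}, N_L(t)] \rangle = \frac{ie}{\hbar} \sum_{k,n} \left[ \tau_{kL,n}\, \langle c^{+}_{kL}(t)\, d_n(t) \rangle - \tau^{*}_{kL,n}\, \langle d^{+}_n(t)\, c_{kL}(t) \rangle \right] \tag{18.18}$$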

TABLE 18.1 Definition of Various Green's Functions (a)

| Definition | Name | Physical Meaning |
|---|---|---|
| $G^{r}_{ij}(t,t') = -i\theta(t-t')\,\langle \{c_i(t), c^{+}_j(t')\} \rangle$ | Retarded Green's function | |
| $G^{a}_{ij}(t,t') = i\theta(t'-t)\,\langle \{c_i(t), c^{+}_j(t')\} \rangle$ | Advanced Green's function | |
| $G^{<}_{ij}(t,t') = i\,\langle c^{+}_j(t')\, c_i(t) \rangle$ | Lesser Green's function | Particle propagator |
| $G^{>}_{ij}(t,t') = -i\,\langle c_i(t)\, c^{+}_j(t') \rangle$ | Greater Green's function | Hole propagator |
| $G^{t}_{ij}(t,t') = -i\,\langle T\{c_i(t)\, c^{+}_j(t')\} \rangle$ | Time-ordered Green's function | |
| $G^{\tilde{t}}_{ij}(t,t') = -i\,\langle \tilde{T}\{c_i(t)\, c^{+}_j(t')\} \rangle$ | Anti-time-ordered Green's function | |

Source: Ref. 22.
(a) $c^{+}_i$ ($c_i$) denotes the particle creation (annihilation) operator for state $|i\rangle$. $T$ ($\tilde{T}$) is the time-ordering (anti-time-ordering) operator. The symbol $\langle \hat{A} \rangle$ means the thermal average of the operator $\hat{A}$ over the grand canonical ensemble.

By introducing the lesser Green's function defined in Table 18.1, Eq. (18.18) becomes

$$I_L = \frac{e}{\hbar} \sum_{k,n} \left[ \tau_{kL,n}\, G^{<}_{n,kL}(t,t) - \tau^{*}_{kL,n}\, G^{<}_{kL,n}(t,t) \right] \tag{18.19}$$

Equation (18.19) can be rewritten in the energy domain by Fourier transformation:

$$I_L = \frac{e}{\hbar} \sum_{k,n} \int_{-\infty}^{\infty} \frac{d\varepsilon}{2\pi} \left[ \tau_{kL,n}\, G^{<}_{n,kL}(\varepsilon) - \tau^{*}_{kL,n}\, G^{<}_{kL,n}(\varepsilon) \right] \tag{18.20}$$

Equation (18.20) indicates that the current at the left contact equals the sum of all possible contributions of particle (electron) propagation from an arbitrary state $|n\rangle$ in the device part to an arbitrary state $|k\rangle$ in the left electrode, or vice versa. According to the Keldysh nonequilibrium Green's function formalism, the lesser Green's functions in Eq. (18.20) are decomposed into a propagation part in the electrode and a propagation part in the device molecule, with the corresponding hopping term between them22:

$$G^{<}_{kL,n}(\varepsilon) = \sum_m \tau_{kL,m} \left[ g^{t}_{kL,kL}(\varepsilon)\, G^{<}_{m,n}(\varepsilon) - g^{<}_{kL,kL}(\varepsilon)\, G^{\tilde{t}}_{m,n}(\varepsilon) \right] \tag{18.21}$$

$$G^{<}_{n,kL}(\varepsilon) = \sum_m \tau^{*}_{kL,m} \left[ g^{<}_{kL,kL}(\varepsilon)\, G^{t}_{n,m}(\varepsilon) - g^{\tilde{t}}_{kL,kL}(\varepsilon)\, G^{<}_{n,m}(\varepsilon) \right] \tag{18.22}$$

Here we have introduced the time-ordered and anti-time-ordered Green's functions from Table 18.1. In Eqs. (18.21) and (18.22), $G_{n,m}(\varepsilon)$ represents particle propagation between states $|n\rangle$ and $|m\rangle$ in the device part, and $g_{kL,kL}(\varepsilon)$ denotes the Green's function of the noninteracting left electrode:

$$g^{<}_{kL,kL}(\varepsilon) = 2\pi i\, f(\varepsilon)\, \delta(\varepsilon - \varepsilon_k) \tag{18.23}$$

$$g^{>}_{kL,kL}(\varepsilon) = -2\pi i\, [1 - f(\varepsilon)]\, \delta(\varepsilon - \varepsilon_k) \tag{18.24}$$
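The lesser and greater Green's functions of an electrode state can be explored numerically by replacing the delta function with a narrow Lorentzian. The sketch below is a rough illustration under that assumption; the broadening `eta`, the helper names, and the choice of energies in eV are not from the chapter. It also checks the general identity that the difference of the greater and lesser functions equals the difference of the retarded and advanced ones.

```python
import numpy as np


def fermi(eps, mu, kt):
    """Fermi-Dirac occupation; eps, mu, and kt share the same energy unit."""
    return 1.0 / (1.0 + np.exp((eps - mu) / kt))


def electrode_green_functions(eps, eps_k, mu, kt, eta=1e-2):
    """Green's functions of one noninteracting electrode state.

    The delta function is smeared into a Lorentzian of half-width eta.
    Returns (g_r, g_a, g_lesser, g_greater) on the energy grid eps.
    """
    g_r = 1.0 / (eps - eps_k + 1j * eta)
    g_a = np.conj(g_r)
    spectral = 1j * (g_r - g_a)             # tends to 2*pi*delta(eps - eps_k) as eta -> 0
    f = fermi(eps, mu, kt)
    g_lesser = 1j * f * spectral            # occupied part
    g_greater = -1j * (1.0 - f) * spectral  # empty part
    return g_r, g_a, g_lesser, g_greater


def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid NumPy version differences."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))


if __name__ == "__main__":
    eps = np.linspace(-1.0, 1.0, 20001)
    g_r, g_a, g_ls, g_gt = electrode_green_functions(eps, eps_k=0.1, mu=0.0, kt=0.025)
    print("identity holds:", np.allclose(g_gt - g_ls, g_r - g_a))
    print("spectral weight:", trapezoid((1j * (g_r - g_a)).real, eps) / (2 * np.pi))
```

The spectral weight falls slightly short of 1 because the Lorentzian tails extend beyond the finite energy grid; it approaches 1 as `eta` is reduced and the grid is refined.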

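By inserting Eqs. (18.21) and (18.22) into Eq. (18.20), one finally arrives at

$$I_L = \frac{ie}{\hbar} \sum_{n,m} \int_{-\infty}^{\infty} d\varepsilon\; \tau_{L,n}\, \tau^{*}_{L,m}\, \rho_L(\varepsilon) \left\{ G^{<}_{n,m}(\varepsilon) + f_L(\varepsilon) \left[ G^{r}_{n,m}(\varepsilon) - G^{a}_{n,m}(\varepsilon) \right] \right\} \tag{18.25}$$

where $\rho_L(\varepsilon)$ is the density of states of the left electrode and we have used the relations22 $G^{t}(\varepsilon) + G^{\tilde{t}}(\varepsilon) = G^{>}(\varepsilon) + G^{<}(\varepsilon)$ and $G^{>}(\varepsilon) - G^{<}(\varepsilon) = G^{r}(\varepsilon) - G^{a}(\varepsilon)$. In Eq. (18.25), $G^{r}(\varepsilon)$ and $G^{a}(\varepsilon)$ denote the retarded and advanced Green's functions of the device part, respectively, which are obtained by Fourier transformation of the retarded and advanced Green's functions defined in Table 18.1 to the energy domain. We can evaluate the current at the right contact, $I_R$, in the same way. For a steady state, in which $I = I_L = -I_R$, the current in matrix form is

$$I = \frac{ie}{2\hbar} \int_{-\infty}^{\infty} \frac{d\varepsilon}{2\pi} \Big( \mathrm{Tr}\{ [f_L(\varepsilon)\Gamma_L(\varepsilon) - f_R(\varepsilon)\Gamma_R(\varepsilon)]\,[G^{r}(\varepsilon) - G^{a}(\varepsilon)] \} + \mathrm{Tr}\{ [\Gamma_L(\varepsilon) - \Gamma_R(\varepsilon)]\, G^{<}(\varepsilon) \} \Big) \tag{18.26}$$

where

$$\Gamma_{L/R}(\varepsilon) = 2\pi\, \tau^{+}_{L/R}\, \rho_{L/R}(\varepsilon)\, \tau_{L/R} = -2\, \mathrm{Im}[\tau^{+}_{L/R}\, g^{r}_{L/R}(\varepsilon)\, \tau_{L/R}] = -2\, \mathrm{Im}[\Sigma^{r}_{L/R}(\varepsilon)] \tag{18.27}$$

That is, $\Gamma_{L/R}(\varepsilon)$ is minus twice the imaginary part of the retarded self-energy of the left/right electrode, $\Sigma^{r}_{L/R}(\varepsilon)$. The lesser Green's function of the device part for the noninteracting system is given by35

$$G^{<}(\varepsilon) = i f_L(\varepsilon)\, G^{r}(\varepsilon)\, \Gamma_L(\varepsilon)\, G^{a}(\varepsilon) + i f_R(\varepsilon)\, G^{r}(\varepsilon)\, \Gamma_R(\varepsilon)\, G^{a}(\varepsilon) \tag{18.28}$$

Finally, one obtains the electrical current:

$$I = \frac{e}{h} \int \mathrm{Tr}[G^{a}(\varepsilon)\, \Gamma_R(\varepsilon)\, G^{r}(\varepsilon)\, \Gamma_L(\varepsilon)]\,[f_L(\varepsilon) - f_R(\varepsilon)]\, d\varepsilon \tag{18.29}$$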
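The reduction of the general steady-state current formula to the single-trace form of Eq. (18.29) can be verified numerically at one energy. The following sketch assumes the standard noninteracting expression G^r = [εI − H − Σ_L − Σ_R]^(−1) with energy-independent (wide-band) self-energies Σ = −iΓ/2; that expression, the random test matrices, and all function names are assumptions made for illustration and are not derived in the text above.

```python
import numpy as np


def device_green(eps, h_dev, gamma_l, gamma_r):
    """Retarded and advanced device Green's functions in the wide-band limit."""
    sigma = -0.5j * (gamma_l + gamma_r)
    g_r = np.linalg.inv(eps * np.eye(len(h_dev)) - h_dev - sigma)
    return g_r, g_r.conj().T


def general_bracket(f_l, f_r, g_r, g_a, gamma_l, gamma_r):
    """Sum of the two traces in the steady-state formula, with the
    noninteracting lesser Green's function inserted."""
    g_lesser = 1j * f_l * g_r @ gamma_l @ g_a + 1j * f_r * g_r @ gamma_r @ g_a
    term1 = np.trace((f_l * gamma_l - f_r * gamma_r) @ (g_r - g_a))
    term2 = np.trace((gamma_l - gamma_r) @ g_lesser)
    return term1 + term2


def single_trace(g_r, g_a, gamma_l, gamma_r):
    """Tr[G^a Gamma_R G^r Gamma_L], the trace appearing in Eq. (18.29)."""
    return np.trace(g_a @ gamma_r @ g_r @ gamma_l)


def random_hermitian_psd(rng, n):
    """Random positive semidefinite Hermitian matrix (stand-in for a broadening)."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return a @ a.conj().T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    h_dev = 0.5 * (a + a.conj().T)  # random Hermitian device Hamiltonian
    gamma_l, gamma_r = random_hermitian_psd(rng, n), random_hermitian_psd(rng, n)
    g_r, g_a = device_green(0.3, h_dev, gamma_l, gamma_r)
    f_l, f_r = 0.8, 0.3
    lhs = general_bracket(f_l, f_r, g_r, g_a, gamma_l, gamma_r)
    rhs = -2j * (f_l - f_r) * single_trace(g_r, g_a, gamma_l, gamma_r)
    print("reduction holds:", np.isclose(lhs, rhs))
```

The bracket equals −2i (f_L − f_R) times the single trace; multiplying by the prefactor ie/(2ħ) and the measure dε/(2π) then gives e/h, the prefactor of Eq. (18.29).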

The final expression for the noninteracting system is exactly the same as Eq. (18.5) if Eq. (18.29) is multiplied by 2 to take the spin degeneracy into account. Thus, the transmission in the noninteracting regime is given by

$$T(\varepsilon) \equiv \mathrm{Tr}[G^{a}(\varepsilon)\, \Gamma_R(\varepsilon)\, G^{r}(\varepsilon)\, \Gamma_L(\varepsilon)] \tag{18.30}$$

The next step is to calculate the retarded/advanced Green’s function and the left/right coupling (i.e., self-energy) terms.
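To make Eqs. (18.5) and (18.30) concrete, the sketch below evaluates the transmission of a single level coupled to two electrodes and integrates it to obtain the current. It assumes the standard noninteracting form G^r = [ε − ε_0 + i(Γ_L + Γ_R)/2]^(−1) with energy-independent broadenings (wide-band limit), a symmetric bias drop, and energies in eV. These choices, the function names, and the parameter values are illustrative assumptions, since the chapter has not yet specified how the Green's function and the couplings are computed.

```python
import numpy as np

E_CHARGE = 1.602176634e-19   # elementary charge in C
H_PLANCK = 6.62607015e-34    # Planck constant in J s


def fermi(eps, mu, kt):
    """Fermi-Dirac occupation; all energies in eV."""
    return 1.0 / (1.0 + np.exp((eps - mu) / kt))


def transmission(eps, eps0, gamma_l, gamma_r):
    """Eq. (18.30) for one level: T = Gamma_L * Gamma_R * |G^r|^2."""
    g_r = 1.0 / (eps - eps0 + 0.5j * (gamma_l + gamma_r))
    return gamma_l * gamma_r * np.abs(g_r) ** 2


def current(bias, eps0, gamma_l, gamma_r, kt=0.025, e_fermi=0.0, n_grid=20001):
    """Eq. (18.5): I = (2e/h) * integral of T (f_L - f_R) d(eps), in amperes.

    bias in volts, energies in eV; mu_L - mu_R = e*V is split symmetrically
    around the equilibrium Fermi energy.
    """
    mu_l, mu_r = e_fermi + 0.5 * bias, e_fermi - 0.5 * bias
    half_width = 0.5 * abs(bias) + 20.0 * kt   # bias window plus thermal tails
    eps = np.linspace(e_fermi - half_width, e_fermi + half_width, n_grid)
    integrand = transmission(eps, eps0, gamma_l, gamma_r) * (
        fermi(eps, mu_l, kt) - fermi(eps, mu_r, kt)
    )
    integral_ev = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(eps))
    return 2.0 * E_CHARGE / H_PLANCK * integral_ev * E_CHARGE  # eV converted to J


if __name__ == "__main__":
    for v in (0.0, 0.2, 0.4, 0.8):
        print(v, current(v, eps0=0.3, gamma_l=0.05, gamma_r=0.05))
```

For symmetric coupling the transmission reaches 1 exactly on resonance, so a level that is broad compared with both the bias and the thermal energy conducts with nearly the conductance quantum 2e²/h, while a narrow level away from the Fermi energy contributes little current until the bias window reaches it.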

18.3 NUMERICAL IMPLEMENTATION

A theoretical description of quantum transport requires sophisticated calculations for a metal–molecule junction composed of a large number of atoms. Density functional theory (DFT), as reviewed in Chapters 1 to 3, enables accurate and computationally efficient first-principles calculations of the electronic structure of such a system. In addition, the NEGF method can easily be implemented in a standard DFT code, since the electron density, the main ingredient of DFT, can be obtained directly from the NEGF method for an open system. In this section we discuss in detail the numerical implementation of the NEGF method based on DFT.

18.3.1 Green's Function

Accurate description of the metal–molecu


Contributors

Jochen Autschbach, University at Buffalo–SUNY, Buffalo, New York

Eric Bylaska, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

Lung Wa Chung, Fukui Institute for Fundamental Chemistry, Kyoto University, Kyoto, Japan

Timothy Clark, Computer-Chemie-Centrum, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

Michelle L. Coote, ARC Centre of Excellence for Free-Radical Chemistry and Biotechnology, Research School of Chemistry, Australian National University, Canberra, Australia

Wibe A. de Jong, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

Marcus Elstner, Institute of Physical Chemistry, Universität Karlsruhe, Karlsruhe, Germany; Institute for Physical and Theoretical Chemistry, Technische Universität Braunschweig, Braunschweig, Germany

Ferdinand Evers, Institute of Nanotechnology and Institut für Theorie der Kondensierten Materie, Karlsruhe Institute of Technology, Karlsruhe, Germany

Peng-Dong Fan, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

Tao Fang, School of Chemistry and Chemical Engineering, Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, Nanjing University, Nanjing, China

Michael J. Ford, School of Physics and Advanced Materials, University of Technology, Sydney, NSW, Australia

Peter Fulde, Max-Planck-Institut für Physik komplexer Systeme, Dresden, Germany; Asia Pacific Center for Theoretical Physics, Pohang, Korea

Julian D. Gale, Department of Chemistry, Curtin University, Perth, Australia

Michael Gaus, Institute of Physical Chemistry, Universität Karlsruhe, Karlsruhe, Germany; Institute for Physical and Theoretical Chemistry, Technische Universität Braunschweig, Braunschweig, Germany

Niranjan Govind, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

Jeff R. Hammond, The University of Chicago, Chicago, Illinois

Kimihiko Hirao, Advanced Science Institute, RIKEN, Saitama, Japan

Liviu Hozoi, Max-Planck-Institut für Physik komplexer Systeme, Dresden, Germany

Weijie Hua, School of Chemistry and Chemical Engineering, Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, Nanjing University, Nanjing, China

Lasse Jensen, Pennsylvania State University, University Park, Pennsylvania

Kwang S. Kim, Center for Superfunctional Materials, Pohang University of Science and Technology, Pohang, Korea

Woo Youn Kim, Center for Superfunctional Materials, Pohang University of Science and Technology, Pohang, Korea

Karol Kowalski, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

Shuhua Li, School of Chemistry and Chemical Engineering, Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, Nanjing University, Nanjing, China

Wei Li, School of Chemistry and Chemical Engineering, Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, Nanjing University, Nanjing, China

Xin Li, Fukui Institute for Fundamental Chemistry, Kyoto University, Kyoto, Japan

Ching Y. Lin, ARC Centre of Excellence for Free-Radical Chemistry and Biotechnology, Research School of Chemistry, Australian National University, Canberra, Australia

Akihide Miyazaki, Toyohashi University of Technology, Toyohashi, Japan

Keiji Morokuma, Fukui Institute for Fundamental Chemistry, Kyoto University, Kyoto, Japan; Cherry L. Emerson Center for Scientific Computation and Department of Chemistry, Emory University, Atlanta, Georgia

Simone Piccinin, CNR-INFM DEMOCRITOS National Simulation Center, [email protected] Group, Trieste, Italy

Ben J. Powell, Centre for Organic Photonics and Electronics, School of Mathematics and Physics, The University of Queensland, Queensland, Australia

Mark A. Ratner, Northwestern University, Evanston, Illinois

George C. Schatz, Northwestern University, Evanston, Illinois

Hideo Sekino, Toyohashi University of Technology, Toyohashi, Japan

Ian K. Snook, Applied Physics, School of Applied Sciences, RMIT University, Victoria, Australia

Gemma C. Solomon, Northwestern University, Evanston, Illinois

Jong-Won Song, Advanced Science Institute, RIKEN, Saitama, Japan

Michelle J. S. Spencer, Applied Physics, School of Applied Sciences, RMIT University, Victoria, Australia

Catherine Stampfl, School of Physics, The University of Sydney, Sydney, Australia

James J. P. Stewart, Stewart Computational Chemistry, Colorado Springs, Colorado

Michael D. Towler, TCM Group, Cavendish Laboratory, Cambridge University, Cambridge, UK

Kiril Tsemekhman, University of Washington, Seattle, Washington

Marat Valiev, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

Nicholas Valley, Northwestern University, Evanston, Illinois

Dunyou Wang, William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

Irene Yarovsky, Applied Physics, School of Applied Sciences, RMIT University, Victoria, Australia

Preface: Choosing the Right Method for Your Problem

Computational methods have now advanced to the point where a choice of methods is available for almost any problem in nanotechnology and biotechnology. In this book, the various methods available are presented and applications developed. Given the difficulty in solving (relativistic) quantum mechanical equations for systems containing thousands of atoms, this situation is truly amazing and demonstrates the results of dedicated work by many researchers over a long period of time. Once demeaned by researchers as being useless for everything practical, computational methods have come into their own, providing fresh insight and predictive design power for wide-ranging problems: from superconductivity to semiconductivity to giant magnetoresistance to molecular electronics to spintronics to natural and synthetic polymer composition and properties to color design to nonlinear optics to energy flow to electron transport to catalysis to protein function to drug design.

Although much modern software is to be commended for its accessibility and ease of use, this advantage can be an alluring trap. Electronic structure calculations on systems of any size are never simple. Many things can go wrong, and just because a method has always done the job in the past doesn’t mean that it will continue to do so for a new problem that may appear very similar but which in fact embodies an additional unexpected effect. Proper understanding of the methods, including their strengths and weaknesses, is always essential. This book sets out to provide the background required for a range of approaches, containing extensive literature references to many of the subtle features that can arise. Practical examples of how this knowledge should be applied are then given.

Amazing as progress has been, many significant problems in physics, chemistry, biology, and engineering will forever remain outside the reach of direct quantum mechanical electronic structure calculations. By no means does this mean that the technologies now available cannot be usefully employed to tackle these problems, however, and a significant part of this book is devoted to multiscale-linking methods. For example, the surfaces of most heterogeneous catalysts are extremely complex, and hundreds of chemical reactions may be involved. Problems of this type include the combustion of fossil fuels, atmospheric pollution modeling, and many industrial chemical reactions and smelting processes. Natural and synthetic polymers present similar challenges. What existing electronic structure methods offer is the data to go into more complex, perhaps multiscale models of the phenomena. Other examples in quite different areas include protein folding, biological processes on the microsecond-to-second time scale, including the origin of intelligence, and long-range strong electron correlations in superconductors and other materials.

The fortunate position that we are in today is owed primarily to the development of density functional theory (DFT). This is the basic workhorse for electronic structure computations on large systems, being appropriate for biological, chemical, and physical problems. Part A of the book is devoted to the fundamentals of DFT, stressing the basics in Chapter 1 and then its two most common implementation strategies, atomic basis sets in Chapter 2 and plane-wave basis sets in Chapter 3. In the early days, atomic basis sets were designed to solve the burning issues at the time, such as the nature of the hydrogen molecule and the water molecule, while plane-wave basis sets could tackle problems of similar difficulty, such as the structure of simple metals. Today, both types of methods can be applied to almost any problem, each with its own advantages and disadvantages. An important feature of Chapter 1 is that it describes not only traditional DFT for the ground state of molecules and materials but also modern time-dependent approaches designed for excited states and nonequilibrium transport environments.

Deliberately missing from this book is an extensive discussion of which density functional to use. This may seem a terrible oversight in a book that is really intended as a practical tool for a new science. DFT gives the exact answer if the exact density functional is used, but alas this is unknown and perhaps even unknowable. So what we now have is a situation in which computational programs can let the user select between hundreds of proposed approximate functionals, or even make a new one. However, from a practical perspective, the situation is not that bad.
Only a handful of density functionals are in common use, with just 14 mentioned in this book (B3LYP, B97D, BLYP, BOP, BP86, CAM-B3LYP, LC-BOP, LDA, LDA+DMFT, LDA+U, PBE, PBE0, PW91, and SOAP), with the most commonly used functionals being B3LYP, LDA, PBE, and PW91. B3LYP is the most commonly used functional for chemical problems, owing to its inclusion of more physical effects, whereas PW91 and PBE are the most commonly used functionals in the physics community, as they are typically good enough in these applications and are much faster to compute. A density functional is not a single unit but usually comes as a combination of various parts, each intended to include some physical effect. Choosing a functional that includes all of the physical effects relevant to a particular application is thus essential. In this book the applications chapters provide significant discussion as to which functionals are appropriate for common applications. Many specialized functionals exist that are not discussed, so although the book describes what is good for most, experienced users should be aware that other attractive options do exist.

The most common physical effects included in modern density functionals are short-range correlation, short-range exchange, long-range correlation, long-range exchange, asymptotic correction, and strong correlation. All density functionals include short-range correlation and short-range exchange, with LDA including only these contributions and thus being one of the simplest and most computationally expedient functionals available. LDA gives the exact answer for the free-electron gas, a problem to which many simple metals can realistically be compared. When the nature of the atomic nuclei becomes important, this functional takes the wrong qualitative form, however. Nevertheless, it provides a useful reference point even in the worst-case scenarios and hence forms a simple and useful approach. It does not provide results of sufficient accuracy to address any chemical question, however, so its realistic use is confined to a few problems involving simple metals.

The next simplest functionals improve on LDA by adding a derivative correction to the local correlation description and are generically termed generalized-gradient approximations (GGAs), with classic functionals of this type including BP86, PW91, and PBE. In general, GGAs provide descriptions that attain chemical accuracy and hence can be widely applied. Sometimes LDA provides results in better agreement with experiment than common GGAs, however, and researchers are thus tempted simply to use LDA. This is a very bad practice, as GGAs always contain more of the essential physics than does LDA, and what is required instead is to move to a more complex functional that includes even more interactions. Get the right answer for the right reason.

In widespread use for chemical properties are hybrid functionals such as B3LYP and PBE0, which include long-range exchange contributions in the density functional. This improves magnetic properties, long-range interactions, excited- and transition-state energetics, and so on. Such methods are intrinsically much more expensive than GGAs, however. Recent advances of great relevance to biological simulations include the development of density functionals containing long-range correlation, such as B97D, as is required to model dispersive van der Waals intermolecular interactions.
As the exchange and correlation parts of the density functionals are obtained independently, physical constraints concerning their balance are not usually met, leading to errors in their properties at long range that become important for charge separation processes, extended conjugation, band alignment at metal–molecule interfaces, and so on. Modern functionals such as CAM-B3LYP and LC-BOP contain corrections that reestablish the proper balance, improving the computed properties. Finally, approaches such as LDA+U provide explicit empirical corrections for the extremely short range, strong electron correlation effects that dominate the chemistry of the rare earth elements, for example, and are often relevant for metal-to-insulator transitions and superconductivity.

Over the next decade, the future for density functional theory looks bright. There is much current interest not only in developing corrections to account for the shortcomings of standard GGA-type functionals but also in developing new classes of functionals that contain intrinsically the correct asymptotic properties for electrons in molecules. This should dramatically simplify functional design and implementation, making the use of DFT much easier for users.

Certainly the most significant issue with current implementations of DFT is that no systematic process exists for improving functionals toward the elusive exact functional. This is where alternative computational strategies of an ab initio nature can be very useful. Part B of the book looks at methods that can be used when modern DFT just doesn’t work. Historically, the most common ab initio method for electronic structure calculation has been Hartree–Fock configuration-interaction theory. This involves use of a simplistic approximation, that proposed by Hartree and Fock, followed by expansions that converge to or even explicitly determine the exact answer (within the basis set used). The Hartree–Fock approximation itself is about as accurate as LDA and is not suitable for studying chemical problems, but like LDA can provide good insight into the operation of more realistic approaches. Although codes exist that can in principle give the exact solution to any problem, in practice this can only be achieved for the smallest systems, certainly nothing of relevance to this book. As a result, some empirically determined level of truncation of the ab initio expansion is necessary (coupled to a choice of basis set, of course), making their practical use rather similar to that of DFT: always find out what works for your problem using model systems for which the correct answer is known.

The coupled-cluster method provides the “gold standard” for chemical problems, often producing results to an order-of-magnitude higher accuracy than can be achieved by DFT, but at much greater computational expense. Nevertheless, how such methods can be applied to large systems of nanotechnological and biotechnological relevance is shown in Chapter 5. These methods fail for metals, however, and so are less popular in solid-state physics applications.
They handle strong electron correlations properly and easily, of course, and Chapter 6 describes how they may be combined with DFT to solve such key problems as those relevant to metal–insulator transitions and superconductivity, the combination allowing the strengths of each method to be exploited while circumventing the weaknesses.

Hartree–Fock-based approaches will always scale extremely poorly as the system size increases, and an alternative ab initio method exists that scales much better while being applicable to molecules and metals alike: quantum Monte Carlo. The problem with this method has always been its startup cost, as even the simplest systems require enormous computational effort. But the time has now come where algorithms and computers are fast enough to solve many chemical and physical problems to a specifiable degree of accuracy. The method has come of age, and these advances are reviewed in Chapter 4. Because of the excellent scaling properties of this method, applications to larger and larger systems can now be expected to appear at a rapid rate.

But no matter how far computational methods such as DFT, configuration interaction, or quantum Monte Carlo methods advance, the researcher will hunger for the ability to treat larger systems, even if at a more approximate level. Part C of this book addresses these needs. Chapter 7 covers approximate but accurate schemes for implementing DFT and other methods that allow complex systems to be broken down into discrete fragments, achieving considerable computational savings while allowing chemical intuition to be used to ensure accuracy. Chapter 8 describes semiempirical Hartree–Fock-based approaches in which most of the interactions are neglected and the remainder parameterized, leaving a priori computation schemes that at times achieve chemical accuracy and are available for all atoms except the rare earths. A similar approach, but this time modeled after DFT, is described in Chapter 9. The DFT-based approach is widely applicable to both biological systems and materials science but requires parameters to be determined for every pair of atoms in the periodic table, providing increased accuracy at the expense of severe implementational complexity. It is now sufficiently parameterized to meet wide-ranging needs in biotechnology and nanotechnology.

Even so, some problems, such as superconductivity and the Kondo effect, require the study of electron correlations on length scales well beyond the reach of semiempirical electronic structure calculations. In Chapter 10 we look at a range of basic chemical models that describe the essential features of such systems empirically, leaving out all nonessential aspects of the phenomena in question. These methods follow from the analytical models used to put together the basics of chemical bonding and band structure theories in the 1930s–1960s, with the semiempirical methods described in Chapters 8 and 9 also originating from these sources. Accurate electronic structure calculations remain important, but in Chapter 10 we see that they only need to be applied to model systems to generate the empirical parameters that go into the electronic structure problem of the full system.

So, no matter what the size of the system, electronic structure methods are now in a position to contribute to the modeling of real-world problems in nanotechnology and biotechnology. Choosing whether to use empirical models parameterized by high-level calculations, use the DFT workhorse, or use methods that allow systematic improvement toward the exact answer is now a pleasant problem for researchers to ponder.
Just because a certain type of problem has been solved historically by one type of approach does not mean that this is the best thing to do now. I hope that this book will allow informed choices to be made and set new directions for the future.

Part D presents applications of electronic structure methods to nanoparticle and graphene structure (Chapter 11), photobiology (Chapter 12), control of polymerization processes (Chapter 13), nonlinear optics (Chapter 14), nanoparticle optics (Chapter 15), heterogeneous catalysis (Chapters 16 and 17), spintronics (Chapter 18), and molecular electronics (Chapter 19).

This book has its origins in the Computational Methods for Large Systems satellite meeting at the very successful WATOC-2008 conference organized by Leo Radom in Sydney, Australia. I hope the book captures some of the excitement of that meeting and the overwhelming feeling that we are now at the tip of an enormous expansion of electronic structure computation into everyday research in newly emerging technologies and sciences. I have had a go at most things described in this book at some stage of my career, and can vouch for a lot of it. As for the rest, well, they are things that I always wanted to do! I hope that you enjoy reading the book as much as I have enjoyed editing it.


Color Figures

Color versions of selected figures can be found online at ftp://ftp.wiley.com/public/sci_tech_med/computational_methods

Acknowledgments

I would like to thank Dianne Fisher and Rebecca Jacob for their help in assembling the book, Anita Lekhwani at Wiley for the suggestion of making a book based around WATOC-2008, Leo Radom for organizing WATOC-2008, and the many referees whose anonymous but difficult work helped so much with its production.

Jeffrey R. Reimers
School of Chemistry
The University of Sydney
January 2010

PART A DFT: The Basic Workhorse

1

Principles of Density Functional Theory: Equilibrium and Nonequilibrium Applications

FERDINAND EVERS
Institute of Nanotechnology and Institut für Theorie der Kondensierten Materie, Karlsruhe Institute of Technology, Karlsruhe, Germany

Arguably, the most important method for electronic structure calculations in intermediate- and large-scale atomic or molecular systems is density functional theory (DFT). In this introductory chapter we discuss fundamental theoretical aspects underlying this framework. Our aim is twofold. First, we briefly explain several aspects of DFT as we understand them. Second, we discuss the fundamentals underlying applications of DFT to transport problems. Here, we offer a derivation of the salient equations which is based on single-particle scattering theory; the more standard approach relies on the nonequilibrium Green’s function (or Keldysh) technique. More practical aspects of applying DFT to large systems such as nanoparticles, liquids, large molecules, and proteins are described in Chapter 2 (using atomic basis sets) and Chapter 3 (using plane-wave basis sets). Other recent reviews of basic application procedures by Kümmel and Kronik¹ and Neese² are also available. Chapters 11 to 19 focus on applications, introducing extensions of the basic methods when required.

1.1 EQUILIBRIUM THEORIES

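The interacting $N$-electron problem is a formidable challenge for the theoretical disciplines of physics and chemistry. It is formulated in terms of a Hamiltonian, $\hat{H}$, which has the general structure

$$\hat{H} = \sum_{i} \left[ \varepsilon(\hat{\mathbf{p}}_i) + v_{\mathrm{ex}}(\hat{\mathbf{r}}_i) \right] + \frac{1}{2} \sum_{i \neq j} u(\hat{\mathbf{r}}_i - \hat{\mathbf{r}}_j) \tag{1.1}$$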

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


Here we have introduced the following notation: $v_{\mathrm{ex}}$ describes the system-specific, time-independent external potential, which is generated, for example, by the atomic nuclei. $\varepsilon(\mathbf{p})$ denotes the dispersion of the free particle, establishing the relation between the momentum of the particle and its energy in free space (i.e., in the absence of $v_{\mathrm{ex}}$ and of the third term, which involves $u$). For example, a single free particle with mass $m$ has a dispersion $\varepsilon(\mathbf{p}) = \mathbf{p}^2/2m$. The third term introduces the two-particle interactions [e.g., $u(\mathbf{r}) = e^2/|\mathbf{r}|$ for the Coulomb case]. (We indicate an operator by $\hat{O}$ to distinguish it from its eigen- or expectation values.)

Density functional theory in its simplest incarnation serves to calculate several ground-state (GS) properties of this interacting many-body system. For example, one obtains the GS particle density, $n(\mathbf{r})$, the GS energy, $E_0$, or the workfunction (ionization energy), $W$. DFT owes its attractiveness to the fact that all of this can be obtained, in principle, by solving an optimization problem for the GS density alone without going through the often impractical explicit calculation of the GS wavefunction, $\Psi_0$, of the Hamiltonian (1.1). The actual task is to find a density profile, $n(\mathbf{r})$, so that the functional inside the brackets,

$$E_0 = \min_{\tilde{n}(\mathbf{r})} \left[ F[\tilde{n}] + \int d\mathbf{r}\, v_{\mathrm{ex}}(\mathbf{r})\, \tilde{n}(\mathbf{r}) \right] \tag{1.2}$$

is invariant under small variations, $\delta\tilde{n}(\mathbf{r})$. Here $F$ is a certain functional of the test density $\tilde{n}(\mathbf{r})$ that depends on the free dispersion, $\varepsilon(\mathbf{p})$, and the type of two-particle interactions, but not on the (static) environment, $v_{\mathrm{ex}}(\mathbf{r})$. [The explicit definition of $F$ is given in Eq. (1.10).] The optimizing density coincides with the GS density, and the optimum value of the functional inside the brackets delivers the GS energy.

1.1.1 Density as the Basic Variable

At first sight, the very existence of a formalism that allows us to obtain the GS properties mentioned without evaluating $\Psi_0$ itself may perhaps be surprising. After all, the particle density appears to involve a lot fewer degrees of freedom than $\Psi_0$, which is the canonical starting point for calculation of the expectation values of the observables. Indeed, $\Psi_0(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ is a complex field that depends on the individual coordinates of each of the $N$ particles. By contrast, the density is an expectation value of the density operator:

$$\hat{n}(\mathbf{r}) = \sum_{i=1}^{N} \delta(\mathbf{r} - \hat{\mathbf{r}}_i) \tag{1.3}$$

which may be obtained by integrating out most of the coordinates (“details”) of $\Psi_0$:

$$n(\mathbf{r}) = \int d\mathbf{r}_1 \cdots d\mathbf{r}_N \sum_{i} \delta(\mathbf{r} - \mathbf{r}_i)\, |\Psi_0(\mathbf{r}_1, \ldots, \mathbf{r}_N)|^2 \tag{1.4}$$

$n(\mathbf{r})$ is a real field depending on a single coordinate only.
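As a small numerical illustration of Eq. (1.4), the following sketch builds a two-fermion wavefunction on a grid and integrates out one coordinate per term of the sum to obtain the density. The system (two noninteracting spinless fermions in a one-dimensional hard-wall box), the grid size, and all variable names are assumptions chosen for illustration and do not come from the chapter.

```python
import numpy as np

# Illustrative setup (assumed, not from the text): 1D hard-wall box of length L
# sampled on a midpoint grid with M points.
L = 1.0
M = 200
dx = L / M
x = (np.arange(M) + 0.5) * dx

def box_orbital(k, x):
    """k-th normalized eigenfunction of a particle in a 1D box."""
    return np.sqrt(2.0 / L) * np.sin(k * np.pi * x / L)

phi1 = box_orbital(1, x)
phi2 = box_orbital(2, x)

# Antisymmetric two-particle wavefunction Psi(x1, x2) (Slater determinant).
psi = (np.outer(phi1, phi2) - np.outer(phi2, phi1)) / np.sqrt(2.0)

# Eq. (1.4): the sum over i of delta(x - x_i) yields one term per particle;
# each term integrates |Psi|^2 over the coordinate of the other particle.
prob = np.abs(psi) ** 2
density = prob.sum(axis=1) * dx + prob.sum(axis=0) * dx

# The density integrates to the particle number N = 2.
n_particles = density.sum() * dx
```

The wavefunction `psi` needs an M × M array, whereas `density` is a single array of length M. For this determinant the result reduces to the sum of the two orbital densities, which serves as a consistency check.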


At a second glance, however, the essential concepts underlying DFT are quite naturally understood. From a certain perspective, most of the information content of the ground state $\Psi_0$ is redundant. To see why this is the case, we discuss an example. Consider all thermodynamic properties of a system described by the Hamiltonian (1.1). Each property corresponds to calculating some ratio of expectation values:

$$O = \frac{\mathrm{Tr}\left[\hat{O}\, e^{-\beta \hat{H}}\right]}{\mathrm{Tr}\left[e^{-\beta \hat{H}}\right]} \tag{1.5}$$

with an inverse temperature, $\beta = 1/kT$, and $\hat{O}$ denoting the operator corresponding to the observable of interest. The important thing to notice is that the system characteristics enter the average only via $\hat{H}$. Therefore, within a given set of systems with members sharing the same kinetic energy and two-body interaction (“universality class”), all system specifics (i.e., observables) are determined uniquely by specifying the external potential, so $O$ is a functional of $v_{\mathrm{ex}}$: $O[v_{\mathrm{ex}}]$. This simple observation already implies that within such a universality class, the system behavior can be reconstructed from knowledge of a scalar field [here $v_{\mathrm{ex}}(\mathbf{r})$], and in this sense most of the information content of $\Psi_0$ is redundant.

In the Schrödinger theory, the classifying scalar field is the external potential. DFT amounts to a change of variables that replaces $v_{\mathrm{ex}}(\mathbf{r}) \to n(\mathbf{r})$. Such a transformation is feasible because the density operator and the external potential $v_{\mathrm{ex}}$ enter $\hat{H}$ as a product, $\sum_{i=1}^{N} v_{\mathrm{ex}}(\hat{\mathbf{r}}_i) = \int d\mathbf{r}\, v_{\mathrm{ex}}(\mathbf{r})\, \hat{n}(\mathbf{r})$. Therefore, the average density and $v_{\mathrm{ex}}$ are conjugate variables and a relation

$$n(\mathbf{r}) = \frac{\partial E_0[v_{\mathrm{ex}}]}{\partial v_{\mathrm{ex}}(\mathbf{r})} \tag{1.6}$$

holds true. Under the assumption that Eq. (1.6) can be inverted (at least “piecewise”), we can employ a Legendre transformation to facilitate the change in variables from $v_{\mathrm{ex}}$ to $n$:

$$F[n] = E_0[v_{\mathrm{ex}}] - \int d\mathbf{r}\, n(\mathbf{r})\, v_{\mathrm{ex}}(\mathbf{r}) \tag{1.7}$$

where the external potential is now the dependent variable given by

$$v_{\mathrm{ex}}(\mathbf{r}) = -\frac{\partial F[n]}{\partial n(\mathbf{r})} \tag{1.8}$$

Thus, it is suggested that the density $n$ can also be considered the fundamental variable, so that observables are functionals thereof. The ground-state energy is an example of this.

Summarizing: Underlying DFT is the insight that within a given universality class, each physical system can be identified uniquely either by its associated “environment,” $v_{\mathrm{ex}}(\mathbf{r})$, or by its GS density, $n(\mathbf{r})$. Therefore, in principle, knowing just the ground-state density is enough information to determine any observable (equilibrium) quantity of a many-body system.
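The change of variables in Eqs. (1.6) to (1.8) can be checked numerically on a toy model. The sketch below assumes a single particle hopping between two lattice sites with amplitude `t` and on-site potentials `v`; this model, the parameter values, and the closed form $F = -2t\sqrt{n_1 n_2}$ used for comparison are illustrative assumptions rather than anything taken from the chapter.

```python
import numpy as np

t = 1.0  # hopping amplitude (illustrative value)

def ground_state_energy(v):
    """E_0[v_ex] for one particle on two sites with on-site potentials v."""
    h = np.array([[v[0], -t], [-t, v[1]]])
    return np.linalg.eigvalsh(h)[0]  # eigenvalues are returned in ascending order

def density_from_energy(v, eps=1e-6):
    """Eq. (1.6): n_i = dE_0/dv_i, evaluated by central finite differences."""
    n = np.zeros(2)
    for i in range(2):
        dv = np.zeros(2)
        dv[i] = eps
        n[i] = (ground_state_energy(v + dv) - ground_state_energy(v - dv)) / (2 * eps)
    return n

def F_closed_form(n1):
    """Toy-model F: ground-state hopping energy for the density (n1, 1 - n1)."""
    return -2.0 * t * np.sqrt(n1 * (1.0 - n1))

v = np.array([0.3, -0.5])
n = density_from_energy(v)

# Eq. (1.7): Legendre transformation from v_ex to n.
F_legendre = ground_state_energy(v) - n @ v

# Eq. (1.8), restricted to n1 + n2 = 1: dF/dn1 = -(v1 - v2).
eps = 1e-6
dF_dn1 = (F_closed_form(n[0] + eps) - F_closed_form(n[0] - eps)) / (2 * eps)
```

In this model `F_legendre` agrees with `F_closed_form(n[0])` for any choice of `v`, which reflects the statement that $F$ depends on the density but not on the environment. Because the particle number is fixed, only the potential difference $v_1 - v_2$ is recovered from the derivative of $F$; a constant shift of the potential leaves the density unchanged.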


Remarks

• A formal proof that the density can act as the fundamental variable was presented by Hohenberg and Kohn³; see Section 1.1.2.
• A generalization of DFT to spin or current DFT may be indicated for systems with degeneracies. Then additional fields such as magnetization and current density are needed to distinguish among the system states.

1.1.2 Variational Principle and Levy’s Proof

The mere statement that equilibrium expectation values of observables can be calculated from some functionals once the GS density, n, is known is not very helpful. For DFT to be self-contained, a procedure is also needed to obtain this GS density without referring to anything other than the functionals of n itself. This is where the variational principle comes in: it says that the GS has the unique property that it minimizes the system’s total energy. This implies, in particular, that the GS has a density that minimizes (for a fixed environment $v_{\mathrm{ex}}$) the functional $E_0[n]$. Hence, we can find n by solving the optimization problem (1.2), involving only variations of the density. A particularly instructive derivation of Eq. (1.2) has been given by Levy.4 We summarize the essential logical steps, to remind ourselves that the connection between the variational principle and DFT is actually deep and not related only to practical matters. In fact, Levy’s proof starts with the variational principle for the GS. It implies that there is a configuration space, $\mathcal{C}$, of totally antisymmetric functions, $\tilde\Psi$, with the normalization property $N = \int d\mathbf{r}\,\langle\tilde\Psi|\hat n(\mathbf{r})|\tilde\Psi\rangle$, together with a functional $E[\tilde\Psi] = \langle\tilde\Psi|\hat H|\tilde\Psi\rangle$ defined on this space, which is optimized by the GS, $\Psi_0$, with the GS energy, $E_0$, being the optimum value; explicitly,

$$E[\tilde\Psi] = \langle\tilde\Psi|\hat T + \hat U|\tilde\Psi\rangle + \int d\mathbf{r}\, v_{\mathrm{ex}}(\mathbf{r})\,\tilde n(\mathbf{r}) \tag{1.9}$$

where $\hat T$ abbreviates the kinetic energy and $\hat U$ the interaction energy appearing in Eq. (1.1), and $\tilde n$ is the particle density associated with $\tilde\Psi$. The trick in Levy’s argument is to organize the minimum search in two steps. In the first step the total configuration space, $\mathcal{C}$, is subdivided into subspaces such that all wavefunctions inside a given subspace have identical density profiles $\tilde n(\mathbf{r}) = \langle\tilde\Psi|\hat n(\mathbf{r})|\tilde\Psi\rangle$. Next, within each subspace a search is launched for the elements that minimize E. Thus, a submanifold, $M_{\mathrm{preopt}}$, is identified which contains a set of “preoptimized” elements. By construction, each element $\Psi_{\tilde n}$ of $M_{\mathrm{preopt}}$ is uniquely labeled by the associated density profile $\tilde n$ (see Fig. 1.1). In the second step, the minimum search is continued, but it can now be restricted to finding the one element, $\Psi_0$, of $M_{\mathrm{preopt}}$ that minimizes E. The motivation behind this particular way of organizing the search is the following: The division procedure in step 1 has been constructed such that the second term in Eq. (1.9) does not contribute to preoptimizing; within a given subspace it is just a constant. In this step, only the first term is minimized, with an extremal value,

$$F[\tilde n] \equiv \langle\Psi_{\tilde n}|\hat T + \hat U|\Psi_{\tilde n}\rangle \tag{1.10}$$

Fig. 1.1 (color online) Schematic representation of the constrained search strategy in $\mathcal{C}$ space. One sorts the space of all possible (i.e., antisymmetrized, normalizable) wavefunctions into submanifolds. By definition, wavefunctions belonging to the same submanifold generate the same density profile, $\tilde n(\mathbf{r})$. Each submanifold has a wavefunction $\Psi[\tilde n(\mathbf{r})]$ (at fixed external potential $v_{\mathrm{ex}}$), which has the lowest energy. These wavefunctions sit on a hypersurface (a “line”) in the configuration space which is parameterized by $\tilde n(\mathbf{r})$. The surface is continuously connected if the evolution of $\Psi[\tilde n(\mathbf{r})]$ with the density profile is smooth (i.e., if degenerate shells with more than one optimum state do not exist). (We identify with each other states that differ only by a spatially homogeneous phase.) Typically, for every external potential, $v_{\mathrm{ex}}$, there is exactly one such surface. The ground-state energy is found by going over the surface and searching for the global energy minimum.

The important observation is that by construction the functional $F[\tilde n]$ is universal (i.e., independent of external conditions, $v_{\mathrm{ex}}$). (This statement is contained in the Hohenberg–Kohn theorem.3) Therefore, F is found by preoptimizing once and for all. After F has been identified, the calculation of system-specific properties (depending on $v_{\mathrm{ex}}$), which was described in Eq. (1.2), requires only a restricted search within the submanifold $M_{\mathrm{preopt}}$. The benefit is tremendous, since the volume to be searched, $M_{\mathrm{preopt}}$, is tiny compared to the original wavefunction space $\mathcal{C}$.

Remarks

• F[n] has the exact property

$$\left.\frac{\partial F[\tilde n]}{\partial \tilde n(\mathbf{r})}\right|_{\tilde n = n} + v_{\mathrm{ex}}(\mathbf{r}) = \mu$$

Proof: The ground-state density, n, is an extremal point by construction under the constraint $N = \int d\mathbf{r}\,\tilde n(\mathbf{r})$. Introducing a Lagrange parameter, μ, we can release the constraint and perform an unrestricted search minimizing $F[\tilde n] + \mu N + \int d\mathbf{r}\,[v_{\mathrm{ex}}(\mathbf{r}) - \mu]\,\tilde n(\mathbf{r})$. The claim follows after functional differentiation.
• The minimum search in Eq. (1.2) is in a space of scalar functions $\tilde n$, which have the property that they are “Ψ-representable”: For a given $\tilde n(\mathbf{r})$ there is at least one element $\tilde\Psi$ of $\mathcal{C}$ with the property $\tilde n(\mathbf{r}) = \langle\tilde\Psi|\hat n(\mathbf{r})|\tilde\Psi\rangle$. This implies, for example, positivity: $\tilde n \ge 0$.
• We presented Levy’s argument for ground-state DFT. It is obvious, however, that the restriction to the GS and to the collective mode “density” was not crucial. Only the variational principle and a linear coupling of an environmental field to some collective mode (e.g., density, spin density, current density) should be kept. Therefore, generalizations of ground-state DFT to many other cases have been devised: for example, (equilibrium) thermodynamic DFT at nonzero temperature, magnetic properties (spin DFT and current DFT), and relativistic DFTs. Moreover, it has been shown that certain excited states can also be calculated exactly with a ground-state (spin) DFT. This happens when the Hamiltonian, $\hat H$, exhibits symmetries, such as spin rotational invariance. Then the Hilbert space decomposes into invariant subspaces each carrying its own quantum number(s), q: for example, a spin multiplicity. The minimum search may then proceed in every subspace separately, giving a separate functional $F_q$ for each of them. The local q-minima thus obtained are valid eigenstates of the full Hamiltonian (Gunnarsson–Lundqvist theorem5).

1.2 LOCAL APPROXIMATIONS

The precise analytical dependency of the energy functional F[n] on the density $n(\mathbf{r})$ is not known, of course. Available approximations employ knowledge, analytical and computational, about homogeneous interacting Fermi gases (i.e., the case $v_{\mathrm{ex}}$ = const). Indeed, it turns out that the homogeneous system also provides a very useful starting point to build up a zeroth-order description in the inhomogeneous environments that are relevant for describing atoms and molecules.

1.2.1 Homogeneous Electron Gas

Homogeneous gases are relatively simple. The particle density, n, is just a parameter and all functionals, which in general involve multiple spatial integrals over expressions involving n(r) at different positions in space (nonlocality property), turn into functions of n. Analytical expressions for them can usually be derived from perturbative treatments of E0 (n), which are justified in two limiting cases: where a control parameter, rs , is either very large or very small. For the homogeneous electron gas, rs can easily be identified: It is the ratio of two energies. The first energy is the typical strength of the interaction that two


particles feel in the electron gas in three-dimensional space: $(e^2/\varepsilon_0)\,n^{1/3}$. To see whether or not this energy is actually sizable, one should compare it to another energy. The correct energy scale to consider will be a measure of the kinetic energy of the particles. The average kinetic energy of a fermion depends on the gas density, n. To derive an explicit expression, we recall that due to the Pauli principle, all particles that share the same spin state must be in different momentum states, $|\mathbf{p}\rangle$. Therefore, when filling up the volume, higher and higher momentum states, up to a maximum momentum value, $p_F$, will be occupied. The kinetic energy of the particles occupying the highest-energy (Fermi energy) states, $\varepsilon_F(n) \equiv \varepsilon(p_F)$, will be a good measure for the typical kinetic energy of a gas particle. The situation is best visualized recalling the familiar quantum mechanical textbook problem of “a particle in a box” with box size L. The energy levels of the box can be ordered according to the number of nodes exhibited by the corresponding wavefunctions. The spatial distance between two nodes gives half the wavelength, λ/2, with an associated wavenumber $k = 2\pi/\lambda$ and momentum $p = \hbar k$. The minimum wavelength reached by N particles (with spin 1/2) filling the box is given by $\lambda_F/2 = L/(N/2) = 2/n$, giving rise to a maximum wavenumber, the Fermi wavenumber $k_F = \pi n/2$, and a maximum momentum $p_F = \hbar k_F$. In three dimensions, similar considerations yield $4\pi k_F^3/3 = (2\pi)^3 (n/2)$. Employing these results, our dimensionless parameter can now be specified as $r_s \sim e^2 n^{1/3}/\varepsilon_0\,\varepsilon_F(n)$, which conventionally is cast into the form

$$\frac{4\pi}{3}\, r_s^3 = \frac{1}{n a_0^3}$$

stipulating a parabolic dispersion $\varepsilon(p) = p^2/2m$ ($\varepsilon_0$: effective dielectric constant; $a_0 = 4\pi\varepsilon_0\hbar^2/me^2 \approx 0.529$ Å: Bohr’s radius). Analytical expansions of $E_0(n)$ are available in the limiting cases $1/r_s \ll 1$ or $r_s \ll 1$. Typically, in particular with molecular systems, one has the marginal case $r_s \sim 1$. Here, computational methods such as quantum Monte Carlo calculations (see Chapter 4) help to interpolate between the two limits. Motivated by the weakly interacting limit ($r_s \ll 1$), conventionally we consider the following splitting of the GS energy per unit volume†:

$$\breve\varepsilon_0(n) = 2\sum_{|\mathbf{k}|\le k_F(n)} \varepsilon(\mathbf{k}) + v_{\mathrm{XC}}(n) \tag{1.11}$$

† For homogeneous densities, the Hartree term reads $n\int d\mathbf{r}'\,u(\mathbf{r}-\mathbf{r}')$. Since the spatial summation over the Coulomb potential, ∼1/r, does not converge, the integral makes a contribution to the energy balance which is formally infinite. This divergence is an artifact of modeling the interacting electron gas without taking into consideration the (positive) charge of those atomic nuclei (“counter charges”) that provide the source of the electrons to begin with. The physical system is always (close to) charge neutral, so that (on average) $n_{\mathrm{nuclei}} = -n_{\mathrm{electrons}}$. This implies that the nuclei provide a “background” potential, $n_{\mathrm{nuclei}}\int d\mathbf{r}'\,u(\mathbf{r}-\mathbf{r}')$, that leads to an exact cancellation of the divergent contribution in the Hartree term. Therefore, this particular term should be ignored when dealing with the homogeneous electron system (the Jellium model).


where the factor of 2 accounts for the electron spin. The first term comprises the kinetic energy of the free gas. Its dependency on the density is regulated via the Fermi wavenumber, $k_F(n)$. The second term includes the remaining correlation effects and therefore has a weak coupling expansion. For the Coulomb case, the leading term is $\sim 1/r_s$ with subleading corrections,6

$$v_{\mathrm{XC}}(n) = -n\,\frac{0.9163}{r_s} + n\left[-0.094 + 0.0622\ln r_s + 0.018\,r_s\ln r_s + O(r_s)\right] \tag{1.12}$$

in Rydberg units ($E_{\mathrm{Ry}} = E_{\mathrm{Hartree}}/2 \approx 13.6$ eV).

1.2.2 Local Density Functional

The information taken from homogeneous systems for constructing functionals describing inhomogeneous systems is the dependency of the GS energy per volume on the particle density, $\breve\varepsilon_0(n)$. A leading-order approximation for the general F-functional is obtained by

$$F[n] = \int d\mathbf{r}\,\breve\varepsilon_0(n(\mathbf{r})) \tag{1.13}$$

This approximation is valid if the inhomogeneous system is real-space separable, meaning that it can be decomposed into a large number of subsystems that (1) still contain sufficient particles to allow for treatment as an electron gas with a finite density, (2) are already small enough to be nearly homogeneous in density, and (3) have negligible interaction with each other. Systems exhibiting a relative change of density, which is large even on the shortest length scale available, the Fermi wavelength $\lambda_F$, do not satisfy (1) and (2) simultaneously. So a minimal condition for the applicability of Eq. (1.13) is

$$\lambda_F\,\frac{|\nabla n|}{n} \ll 1 \tag{1.14}$$
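To get a feeling for condition (1.14), the following minimal sketch evaluates its left-hand side for a model density. It assumes atomic units, the three-dimensional relation $k_F = (3\pi^2 n)^{1/3}$ that follows from the state counting above, and a hydrogen-like profile $n(r) = e^{-2r}/\pi$ chosen purely for illustration; the function names are hypothetical.

```python
import numpy as np

def fermi_wavelength(n):
    """Fermi wavelength 2*pi/k_F of a 3D spin-1/2 gas with k_F = (3*pi^2*n)^(1/3)."""
    k_f = (3.0 * np.pi**2 * n) ** (1.0 / 3.0)
    return 2.0 * np.pi / k_f

def lda_parameter(r):
    """Left-hand side of Eq. (1.14) for the model density n(r) = exp(-2r)/pi."""
    n = np.exp(-2.0 * r) / np.pi      # illustrative density (an assumption)
    grad_n = 2.0 * n                  # |dn/dr| of the exponential profile
    return fermi_wavelength(n) * grad_n / n

r = np.array([0.0, 1.0, 2.0, 4.0])    # distances in Bohr radii
params = lda_parameter(r)
for ri, pi_ in zip(r, params):
    print(f"r = {ri:3.1f}   lambda_F |grad n| / n = {pi_:8.2f}")
```

For this profile the parameter is larger than 1 at every radius and grows toward the tail, so an atom-like density lies outside the naive regime of validity of Eq. (1.13).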

Remarks

• Condition (3) implies that the interaction is short range, ideally $u(\mathbf{r}-\mathbf{r}') \sim \delta(\mathbf{r}-\mathbf{r}')$. For the Coulomb case, we separate from the $1/|\mathbf{r}-\mathbf{r}'|$ interaction a long-range term, which is then treated by introducing an extra term, the Hartree potential.
• Since the Fermi wavelength itself depends on the density, $\lambda_F \sim n^{-1/d}$, relation (1.14) is satisfied typically only in the large-n limit. There, the main contribution to the energy (1.13) stems from the kinetic term in Eq. (1.11). Therefore, the leading error in the local functional (1.13) usually comes from the fact that the Thomas–Fermi approximation [$k_F(\mathbf{r}) \equiv k_F(n(\mathbf{r}))$]

$$\langle\hat T\rangle \approx 2\int d\mathbf{r} \sum_{|\mathbf{k}|\le k_F(\mathbf{r})} \varepsilon(\mathbf{k}) \tag{1.15}$$

gives only a very poor estimate of the kinetic energy of an inhomogeneous electron gas, even for noninteracting particles.
• The failure of the Thomas–Fermi approximation is the main reason that orbital-free DFT has a predictive power too limited for most practical demands. The search for more accurate representations of the kinetic energy in terms of n-functionals is at present an active field of research.7,8
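The quality of the Thomas–Fermi estimate (1.15) can be checked in the simplest inhomogeneous setting, noninteracting spin-1/2 particles in a one-dimensional hard-wall box. The sketch below is an illustration under stated assumptions: units with ħ = m = 1, the one-dimensional relation $k_F = \pi n/2$ from Section 1.2.1, and the momentum sum per unit length replaced by $\int dk/2\pi$, which gives the local energy density $\pi^2 n^3/24$. The function name is hypothetical.

```python
import numpy as np

def box_kinetic_energies(n_particles, length=1.0, n_grid=20001):
    """Return (exact, Thomas-Fermi) kinetic energies for an even number of
    noninteracting spin-1/2 particles in a 1D hard-wall box (hbar = m = 1)."""
    levels = np.arange(1, n_particles // 2 + 1)       # doubly occupied levels
    exact = 2.0 * np.sum((np.pi * levels / length) ** 2 / 2.0)

    x = np.linspace(0.0, length, n_grid)
    dx = x[1] - x[0]
    orbitals = np.sqrt(2.0 / length) * np.sin(np.pi * np.outer(levels, x) / length)
    density = 2.0 * np.sum(orbitals**2, axis=0)

    # Local (Thomas-Fermi) energy density in 1D with k_F = pi*n/2:
    # 2 * int_{-k_F}^{k_F} dk/(2*pi) * k^2/2 = pi^2 * n^3 / 24
    tf_density = np.pi**2 * density**3 / 24.0
    thomas_fermi = np.sum(tf_density) * dx            # integrand vanishes at the walls
    return exact, thomas_fermi

for n_particles in (2, 4, 20):
    exact, tf = box_kinetic_energies(n_particles)
    print(f"N = {n_particles:2d}   T_TF / T_exact = {tf / exact:.4f}")
```

In this toy problem the local estimate misses about one sixth of the kinetic energy for two particles and improves only slowly as the box is filled, which is consistent with the remark that the approximation works best in the large-n limit.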

1.3 KOHN–SHAM FORMULATION

Better estimates for the kinetic energy can be obtained within the Kohn–Sham formalism.9 One addresses the optimization problem (1.2) by reintroducing an orbital representation of the density with single-particle states,

$$n(\mathbf{r}) = \sum_{\ell=1}^{\tilde N} |\phi_\ell(\mathbf{r})|^2 \tag{1.16}$$

called the Kohn–Sham or molecular orbitals. The orbitals $\phi_\ell$ are required to be orthonormal; the parameter $\tilde N$ is free, in principle. However, with an eye on approximating the kinetic energy of the interacting system by the energy of the free gas, $\tilde N$ is usually chosen to be equal to the number of particles, $\tilde N = N$. With this choice, the optimization problem formally reads

$$\frac{1}{2}\,\frac{\partial}{\partial\phi_\ell^*(\mathbf{r})}\Big[E_0[n(\mathbf{r})] - \varepsilon_\ell\big(\langle\phi_\ell|\phi_\ell\rangle - 1\big)\Big] = 0 \tag{1.17}$$

featuring the Kohn–Sham energies (or molecular orbital energies), $\varepsilon_\ell$, which play the role of Lagrange multipliers ensuring normalization. Equation (1.17) can be cast conveniently into a form reminiscent of a Schrödinger equation of N single particles:

$$[\varepsilon(\mathbf{p}) + v_s(\mathbf{r})]\,\phi_\ell(\mathbf{r}) = \varepsilon_\ell\,\phi_\ell(\mathbf{r}) \tag{1.18}$$

where we have employed a substitution ($\mathbf{p} = -i\hbar\,\partial_{\mathbf{x}}$),

$$\frac{1}{2}\,\frac{\partial E_0[n(\mathbf{r})]}{\partial\phi_\ell^*(\mathbf{r})} = [\varepsilon(\mathbf{p}) + v_s(\mathbf{r})]\,\phi_\ell(\mathbf{r}) \tag{1.19}$$

which is merely a definition of an auxiliary quantity, the effective potential $v_s(\mathbf{r})$. The set of N equations given by Eq. (1.18) constitutes the Kohn–Sham equations.

Remarks

• The Kohn–Sham (KS) formalism should give a much improved description of the kinetic energy, because by construction it reproduces exactly the kinetic energy of the inhomogeneous, noninteracting gas.
• The fictitious KS particles live in an effective potential which modulates their environment such that their density and all related properties coincide with those of a true many-body system.
• The potential term has a decomposition

$$v_s(\mathbf{r}) = v_{\mathrm{ex}}(\mathbf{r}) + v_H(\mathbf{r}) + v_{\mathrm{XC}}(\mathbf{r}) \tag{1.20}$$

where the second term includes the Hartree interaction, which for a specific two-body interaction potential $u(\mathbf{r}-\mathbf{r}')$ reads $v_H(\mathbf{r}) = \int d\mathbf{r}'\,u(\mathbf{r}-\mathbf{r}')\,n(\mathbf{r}')$. The third term, the exchange–correlation potential, incorporates all the remaining, more complicated many-body contributions. In particular, we have also lumped the difference between the free and interacting kinetic energies into this term.
• Solving the KS equations requires diagonalization of a KS-Hamiltonian:

$$\hat H_{\mathrm{KS}} = \varepsilon(\hat{\mathbf{p}}) + v_s(\hat{\mathbf{r}}) \tag{1.21}$$

The dimension of the corresponding Hilbert space, $N_\phi$, usually exceeds the particle number substantially: $N_\phi \gg N$. Therefore, occupied (real) eigenstates that finally enter the construction of the density [Eq. (1.16)] need to be distinguished from unoccupied (virtual) ones. The selection process follows the variational principle.
• Similar to the Hartree theory and in pronounced contrast to the Schrödinger equation for a single particle, the KS equations pose a self-consistency problem: The potential $v_s(\mathbf{r})$ is a functional of $n(\mathbf{r})$, so it needs to be determined “on the fly.”
• We emphasize that even though the functional $v_s[n](\mathbf{r})$ may exhibit a very complicated (in particular, nonlocal) dependency on the ground-state particle density, the effective potential that finally is felt by the KS particles is perfectly local in space. It provides an effective environment for the KS particles, so that the many-body density can be reproduced. The self-consistent field (SCF) problem in DFT is much easier to solve than the Hartree–Fock (HF) equations, which are nonlocal in space and, what is much worse, even orbital dependent. As a consequence of the orbital dependency of the Fock operator, a real HF orbital interacts with N − 1 other real orbitals, whereas a virtual orbital interacts with N real orbitals. The situation in DFT is much simpler in the sense that occupied and unoccupied orbitals all feel the same effective potential $v_s[n](\mathbf{r})$. Notice, however, that this computational advantage comes at the expense of the derivative discontinuity, an unphysical feature of exact exchange correlation functionals (see Section 1.5.3) that is very difficult to implement in efficient approximation schemes.
• Our derivation of the Kohn–Sham equations was tacitly assuming the following: The density of any electron system, including the interacting systems, can be represented in the manner of Eq. (1.16), where the orbitals


$\phi_\ell$ are normalizable solutions of a (single-particle) Schrödinger equation. Is this really true? The answer is: Not always. That is, systems with degenerate ground states may exhibit a particle density that can only be represented as a sum of independent contributions coming from a number g of single Slater determinants. A general statement that is valid for all practical purposes is that any fermionic density may be represented uniquely as a weighted average of g degenerate ground-state densities of some effective single-particle Schrödinger problem [Eq. (1.18)].10,11

1.3.1 Is the Choice of the KS–Hamiltonian Unique?

For an interacting many-body system, splitting between kinetic and potential energy as suggested in Eqs. (1.19) and (1.20) is not as unique as it may appear at first sight. To give a direct argument, recall that the dispersion relation of the free particles, ε(p), can be altered substantially by interaction effects. For example, the mass of the electron describes how the particle’s energy depends on its momentum. In the presence of interactions, an electron always moves together with its own screening cloud, brought about by the presence of other electrons. Although this does not change the wavelength (i.e., the momentum) of the electron, it does change its velocity. It tends to make it slower, so that the “effective” mass increases. Such interaction effects on parameters such as the mass, the thermodynamic density of states, and the magnetic susceptibility are called Fermi-liquid renormalizations. Having this in mind, one could easily imagine another splitting featuring a renormalized kinetic energy, ε*(p), which would give a better-adapted description of the dispersion of charged excitations (e.g., the propagation of screened electrons) in the interacting quantum liquid.12 A remaining, residual interaction, $V_{\mathrm{XC}}^{\mathrm{res}}$, would have to be designed so that the ground-state density produced by this effective system would also coincide with the true density. Such a renormalized splitting is rarely employed in practice, perhaps because a good approximation for the residual functionals is not available. For the effective single-particle problem that yields the exact ground-state density, we conclude that various choices are possible, the choices differing from one another in the dispersion ε(p) that enters the kinetic part of the KS-Hamiltonian. Very few restrictions on the possible functional forms of ε(p) exist; the parabolic shape and the trivial form ε ≡ 0 (with proper readjustments of $v_{\mathrm{XC}}$) are just two choices out of many.

1.4 WHY DFT IS SO SUCCESSFUL

The precise dependency of the exchange–correlation potential vXC on the density n(r) is not known. In the simplest approximation, the local density approximation (LDA), one takes for vXC the result obtained from the homogeneous electron gas [Eq. (1.12)], but replacing the homogeneous density with n(r) (see Section 1.2.2).


Remarks

• The universal success of DFT in chemistry and condensed matter physics came with the empirical finding that the combination of KS theory with LDA (and its close relatives) works in a sufficiently quantitative way to make it possible to calculate ground-state energies (and hence to determine molecular and crystal structure) even outside the naive regime of the validity of LDA as given by relation (1.14). This is due to a cancellation of errors in the kinetic and exchange correlation part of the KS-Hamiltonian (1.21).13
• In analogy with Hartree–Fock theory, a fictitious “KS ground state” wavefunction is often considered. It is constructed by building a Slater determinant from the real KS orbitals. In contrast to HF, this state is not optimal in an energetic sense. It does, however, reproduce the exact particle density. In the same spirit, KS energies are often interpreted as single-particle energies, even though from a dogmatic point of view there is no (close) connection between the Lagrange multipliers and the true many-body excitations; indeed, to the best of our knowledge, a precise justification of this practice has never been given. Still, the pragmatic approach has established itself widely, since it often gives semiquantitative estimates for Fermi-liquid renormalizations, which are important, for example, in band structure calculations.
• The implementation of efficient codes is much easier in DFT than in HF theory, due to the fact that functionals are only density and not orbital dependent. For this reason, many powerful codes are readily available in the marketplace.
• At present, because of the virtues noted above, DFT is by far the most widely used tool in electronic structure theory (lattice structures, band structures) and quantum chemistry (molecular configurations), with further applications in many other fields, such as nuclear physics, strongly correlated systems, and material science.

1.5 EXACT PROPERTIES OF DFTs

Since there is no analytic solution of the general interacting many-body problem, it is not surprising that exact statements about exchange correlation functionals are scarce. Precise information is, however, available in the presence of an interface to the vacuum. Imagine a situation in which a molecule or a piece of material is embedded in a vacuum. The material is associated with an attractive KS potential “well,” vs , which binds N electrons to the nuclei (or atomic ion cores). Outside the material, the binding potential and the particle density rapidly approach their asymptotic zero values. Exact information is available about how the asymptotic value is approached.


1.5.1 Asymptotic Behavior of vXC

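Consider the Hartree term

$$v_H(\mathbf{r}) = \sum_{\ell'=1}^{\mathrm{occ}} \int d\mathbf{r}'\,u(\mathbf{r}-\mathbf{r}')\,|\phi_{\ell'}(\mathbf{r}')|^2 \tag{1.22}$$

in the KS equations

$$[\varepsilon(\hat{\mathbf{p}}) + v_{\mathrm{ex}}(\mathbf{r}) + v_H(\mathbf{r}) + v_{\mathrm{XC}}(\mathbf{r})]\,\phi_\ell(\mathbf{r}) = \varepsilon_\ell\,\phi_\ell(\mathbf{r}) \tag{1.23}$$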

It contains at $\ell' = \ell$ a piece $\int d\mathbf{r}'\,u(\mathbf{r}-\mathbf{r}')\,|\phi_\ell(\mathbf{r}')|^2$, which incorporates an interaction of a particle in the occupied orbital $\phi_\ell$ with its own density. This spurious, nonphysical interaction is known as a self-interaction error. In principle, it should be eliminated by a counterpiece contained in the exchange part of $v_{\mathrm{XC}}$.† The construction and application of empirical corrections for this effect are the subject of Chapter 14. The Hartree term is known exactly in the asymptotic region. This is the reason that it is possible to draw a rigorous conclusion about $v_{\mathrm{XC}}$. To be specific, we consider the case of Coulomb interactions. In the asymptotic regime a distance r away from the material’s center, where the particle density is totally negligible, all spurious contributions made by an occupied orbital add up to $e^2/r$. To cancel this piece we must have

$$v_{\mathrm{XC}}(\mathbf{r}) \xrightarrow[r\to\infty]{} -\frac{e^2}{r} + \frac{-\alpha_{N-1}}{2r^4} + \cdots \tag{1.24}$$

whenever the particle density vanishes. The correction term, which we have also given here, describes the polarizability, $\alpha_{N-1}$, of the many-body system (with N − 1 particles). This term incorporates the interactions with the fluctuating charge density of the mother system that particles feel when they explore the asymptotic region.

† This cancellation may be seen explicitly within the Hartree–Fock approximation. That is, the interaction term reads

$$\sum_{\ell'}\sum_{\sigma'=\uparrow,\downarrow} \int d\mathbf{r}'\,u(\mathbf{r}-\mathbf{r}')\,\phi^*_{\ell'\sigma'}(\mathbf{r}')\,\big[\phi_{\ell'\sigma'}(\mathbf{r}')\,\phi_{\ell\sigma}(\mathbf{r}) - \delta_{\sigma\sigma'}\,\phi_{\ell'\sigma'}(\mathbf{r})\,\phi_{\ell\sigma}(\mathbf{r}')\big]$$

so that the piece with $\ell' = \ell$, $\sigma' = \sigma$ in the first (Hartree) term is eliminated by a corresponding piece in the second (Fock) term.


Remarks

• A more intuitive way to rationalize the leading asymptotics of $v_{\mathrm{XC}}$ is to recall that an electron that makes a virtual excursion from its host material into vacuum still interacts with the hole that it leaves behind. The first term in Eq. (1.24) describes the interaction with this virtual hole.
• Neither of the two terms appearing in Eq. (1.24) is recovered in local approximation schemes, such as LDAs and generalized-gradient approximations (GGAs), which stipulate the form $v_{\mathrm{XC}}(\mathbf{r}) \approx v_{\mathrm{XC}}(n(\mathbf{r}), \nabla n(\mathbf{r}), \ldots)$. The statement is obvious, because the density is exponentially small in the asymptotic region (see Section 1.5.2), whereas the potential (1.24) is not. This defect has very serious consequences, since the van der Waals dispersion interactions, $v_{\mathrm{XC}} \sim -\alpha_{N-1}/r^4$, ignored in LDAs and GGAs, provide the dominating intermolecular forces that prevail, for example, in biochemical environments. To address this problem, Grimme14 has proposed an ad hoc empirical procedure that adds a long-range term to standard energy functionals. The functional contains specific parameters, essentially modeling the local polarizability of single atoms or molecular groups, chosen so that a rough description of the van der Waals interaction is retained.
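The mismatch between a local potential and the asymptotics (1.24) can be made concrete with a few numbers. The sketch below compares the standard local-density exchange potential, $-(3n/\pi)^{1/3}$ in Hartree atomic units (a textbook expression that is not written out in this chapter), with the required leading tail $-e^2/r$. The exponentially decaying model density and the function name are assumptions made for illustration only.

```python
import numpy as np

def lda_exchange_potential(n):
    """Local-density exchange potential -(3*n/pi)**(1/3) in Hartree atomic units."""
    return -np.cbrt(3.0 * n / np.pi)

r = np.array([5.0, 10.0, 20.0])              # distances in Bohr radii
n = np.exp(-2.0 * r) / np.pi                 # model density, exponentially small far out
exact_tail = -1.0 / r                        # leading term of Eq. (1.24) with e = 1
ratio = lda_exchange_potential(n) / exact_tail

for ri, qi in zip(r, ratio):
    print(f"r = {ri:5.1f}   v_x(LDA) / (-1/r) = {qi:.2e}")
```

The ratio drops by orders of magnitude with increasing distance: a potential that depends only on the local density inherits the exponential decay of n and therefore cannot follow a power law.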

1.5.2 Workfunction

Now, consider the KS potential well in its ground state with N occupied bound orbitals $\phi_\ell$. Generically, every such orbital contributes to the particle density $n(\mathbf{r})$ at a point $\mathbf{r}$ unless it happens that $\phi_\ell$ has a node there: $\phi_\ell(\mathbf{r}) = 0$. This is also true in the asymptotic region far away from the well’s center. However, in this region the state $\phi_{\mathrm{HOMO}}$ with the largest KS energy [highest occupied molecular (or material) orbital (HOMO)] gives the dominating contribution almost everywhere (i.e., at all points where $|\phi_{\mathrm{HOMO}}(\mathbf{r})|^2 > 0$). It is easy to see why this is. In the asymptotic region $v_s(\mathbf{r})$ decays in a power-law manner with the distance r from the well’s center (Fig. 1.2). Therefore, to leading order the KS equations read

$$-\frac{\hbar^2}{2m}\,\partial_r^2\,(r\phi_\ell) = \varepsilon_\ell\,(r\phi_\ell) \tag{1.25}$$

where $\varepsilon_\ell < 0$ denotes the ionization energy of a bound KS state. The solution is

$$\phi_\ell \sim \frac{1}{r}\,e^{-\sqrt{2m|\varepsilon_\ell|/\hbar^2}\;r} \tag{1.26}$$

Generically, the HOMO has the smallest KS energy by modulus, $|\varepsilon_{\mathrm{HOMO}}|$, and therefore decays most slowly. At large enough distances, it will give the only relevant contribution. [Exceptions to the rule occur only in the case of a vanishing prefactor not written in Eq. (1.26).] For this reason, the KS energy of the highest occupied molecular level is actually a physical observable; it gives the ionization energy or workfunction (Janak’s theorem15,16).
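Both statements, that Eq. (1.26) solves Eq. (1.25) and that the HOMO dominates the density far outside, are easy to verify numerically. The following is a minimal sketch in units with ħ = m = 1; the two orbital energies are hypothetical values chosen for illustration, and prefactors of the tails are set to one.

```python
import numpy as np

def radial_tail(r, eps):
    """u(r) = r*phi(r) ~ exp(-sqrt(2*|eps|) * r) for a bound state (hbar = m = 1)."""
    return np.exp(-np.sqrt(2.0 * abs(eps)) * r)

eps_homo, eps_lower = -0.3, -0.9      # hypothetical KS energies (Hartree)
r = np.linspace(5.0, 25.0, 2001)
h = r[1] - r[0]

# Check Eq. (1.25), -(1/2) u'' = eps * u, with a central finite difference.
u = radial_tail(r, eps_homo)
lhs = -0.5 * (u[2:] - 2.0 * u[1:-1] + u[:-2]) / h**2
residual = np.max(np.abs(lhs / u[1:-1] - eps_homo))
print("max deviation from Eq. (1.25):", residual)

# Fraction of the total tail density carried by the HOMO.
dens_homo = (radial_tail(r, eps_homo) / r) ** 2
dens_lower = (radial_tail(r, eps_lower) / r) ** 2
homo_share = dens_homo / (dens_homo + dens_lower)
print("HOMO share at r = 5 and r = 25:", homo_share[0], homo_share[-1])
```

The share of the more weakly bound orbital approaches one as r grows, so a measurement of the density tail probes $|\varepsilon_{\mathrm{HOMO}}|$ alone.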

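Fig. 1.2 Effective potential $v_s$ (solid line) near a surface of a simple metal as a function of the distance r. Surface atoms (dark balls) and the electron liquid (light background) are also indicated. Labels in the figure: the zero of energy, the asymptotic tail $-e^2/r$, the level $-|\varepsilon_{\mathrm{HOMO}}|$, and the workfunction W.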

1.5.3 Derivative Discontinuity

The derivative discontinuity17,18 (DD) is perhaps one of the less intuitive properties that an exact XC potential must exhibit. We discuss it here in some detail, since the fact that local approximations are not capable of capturing it even qualitatively often leads to very important artifacts in the KS spectra which are not a genuine feature of DFT itself but, rather, of the LDA. We will see that the DD is related intimately to the fact that the N (real) particles in a many-body system interact with only N − 1 partners, while an infinitesimal test (virtual) charge in such a system would interact with N (i.e., all the other particles). Since $v_{\mathrm{XC}}[n]$ has access to the total density only, it cannot easily distinguish real and virtual orbitals with their different interacting environments (as HF does). It turns out that the way DFT implements such behavior is via a very sharp (i.e., nonanalytic) dependence of $v_{\mathrm{XC}}[n]$ on the particle density $n(\mathbf{r})$.

1.5.3.1 Isolated System

Consider an isolated quantum dot, such as a single atom or a molecule, with N electrons. The corresponding KS system comprises N KS particles that occupy the N lowest-lying KS states. It is important to recall that each KS particle interacts, via $v_{\mathrm{XC}}[n_N]$, with the total charge density only, including the density contribution that comes from itself. In this respect, KS particles are fundamentally different from physical particles, which do not interact with themselves, of course. Next, add one additional particle, the excess charge, δN = 1; to be specific, put it into the lowest unoccupied molecular orbital ($\mathrm{LUMO}_N$). The new XC potential of the “anion” will be $v_{\mathrm{XC}}[n_{N+1}]$. What are the consequences of charging for the KS energies? Due to the change $n_N \to n_{N+1}$, every original particle interacts with one more charge, δN, the excess particle in the $\mathrm{LUMO}_N$. Therefore, the energy of every one of the first N orbitals shifts by the amount U, which measures the interaction with the excess particle (see Fig. 1.3). Notice also that the energy of the $\mathrm{LUMO}_N$ (now, better, $\mathrm{HOMO}_{N+1}$) has shifted by U after it was occupied. This is because in KS theory, all orbitals, occupied and unoccupied, are calculated in the same effective potential.

Fig. 1.3 Evolution of the energy of KS-frontier orbitals with increasing electron number from N (left) to N + 1 (right). The KS-$\mathrm{LUMO}_N$ jumps upon occupation by an amount U. By contrast, in Hartree–Fock (HF) theory the energy of the HF-$\mathrm{LUMO}_N$ is already calculated anticipating an interaction with one more particle (as compared to HF-$\mathrm{HOMO}_N$). Therefore, such a jump does not occur in HF theory.

So far, no peculiarities have appeared. To see that there is indeed something looming on the horizon, now add a fractional excess charge, say an infinitesimally small one, δN ≪ 1, rather than an integer charge. Then the original KS orbitals should remain invariant by definition, since the perturbation is infinitesimally small so that the charge density is not disturbed. But what are the energy and shape of the newly occupied orbital? The salient point is that a real particle does not interact with itself. Therefore, the energy of a physical orbital should not be sensitive to its occupation. Hence, the workfunction of an atom with a fractionally occupied HOMO is the same as that of one with an integer occupation. We conclude that the fractionally occupied orbital must have the energy of $\mathrm{HOMO}_{N+1}$, which exceeds the energy of the empty orbital $\mathrm{LUMO}_N$ by the amount U. So the evolution of the energy of $\mathrm{HOMO}_{N+\delta N}$ with δN is not smooth; an arbitrarily small change in the density, δN, must result in a finite reaction of $v_{\mathrm{XC}}[n]$ if the particle number, N, is near integer values:

$$\Delta_{\mathrm{XC}}(\mathbf{r}) = \left.\frac{\delta E_{\mathrm{XC}}[n]}{\delta n(\mathbf{r})}\right|_{N+\delta N} - \left.\frac{\delta E_{\mathrm{XC}}[n]}{\delta n(\mathbf{r})}\right|_{N-\delta N} \tag{1.27}$$

This is the (in)famous derivative discontinuity (DD).

1.5.3.2 Coupled Subsystems (Partial Charge Transfer)

To illustrate the importance of the DD, we now give a typical example where fractional charge occurs.


Consider two subsystems, which are partially decoupled in the sense that electronic wavefunctions interact only weakly. These could be, for example, two functional groups in the same molecule or two neighboring molecules in a biological environment. To be specific, we imagine here the atom from Section 1.5.3.1 and a second many-body system, a metal surface. Each system has its own workfunction: for example, $W_A^{N+1} > W_S$. Let us bring the atom into the vicinity of the surface, but keeping their distance d extremely large. Since only the total particle number $N = N_A + N_S$ is conserved, there will be a net exchange of charge, δN, from S to A. This implies that the atomic orbitals acquire a finite broadening, Γ, which however is small, $\Gamma \ll |W_A^{N+1} - W_S|$, since d is large. In this situation and in the absence of ionization, the net particle flow from S to A is exponentially small. As a consequence, the $\mathrm{HOMO}_A^{N+1}$ fills up, but only with a very small fraction of an electron. To describe correctly how the $\mathrm{HOMO}_A^{N+1}$ fills upon approach of the two subsystems, it is crucial that the piece of the XC functional describing A indeed reacts to the flow, so that the $\mathrm{LUMO}_A^{N}$ of the coupled atom is shifted upward against the uncoupled atom by U. If U is on the order of the mean level spacing or even bigger, as it tends to be for nanoscopic systems such as atoms and small molecules, this shift is important for understanding charge transfer in DFT. On a qualitative level, the DD suppresses charge fluctuations between weakly coupled subsystems.

Remarks

• The spatial modulation of $v_{\mathrm{XC}}$ induced by the DD reflects the differences in the workfunction seen in different charge states of the isolated subsystems before they have been coupled. Therefore, quantitative estimates about the size of the DD-induced modulations can be obtained by calculating workfunctions of the constituting subsystems and their anions/cations.
• The DD enters in a crucial way the DFT-based description of the gate dependence of the charge inside a quantum dot. Without DD, the width of the Coulomb oscillations is U rather than max(Γ, T) and therefore qualitatively wrong.19
• In LDA-type approximations the DD is missing, since by construction the potentials evolve smoothly when an infinitesimal probing charge is added. Currently, attempts are under way to design orbital-dependent functionals which can take the DD into account (in a spirit similar to HF theory). Kümmel and Kronik1 have compiled a review about the most recent developments in this direction.
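The statement about the width of the Coulomb oscillations can be illustrated with a deliberately crude toy model, which is an assumption of this sketch and not taken from the text: a single level at gate-controlled energy, whose energy is shifted smoothly by U times its own occupation n, as a potential without DD would do. The occupation then follows from the self-consistency condition n = f(gate + U n), with f the Fermi function at temperature T and the chemical potential set to zero; the level broadening Γ is neglected and all names are hypothetical.

```python
import math

def occupation(gate, u, temperature):
    """Solve n = f(gate + u*n) by bisection (f: Fermi function, chemical potential 0)."""
    def mismatch(n):
        x = (gate + u * n) / temperature
        fermi = 1.0 / (1.0 + math.exp(x)) if x < 500.0 else 0.0  # avoid overflow
        return fermi - n
    lo, hi = 0.0, 1.0                     # mismatch decreases monotonically in n
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def charging_width(u, temperature, n_low=0.1, n_high=0.9):
    """Gate range over which the occupation rises from n_low to n_high."""
    def gate_at(n):                       # self-consistency condition solved for the gate
        return temperature * math.log((1.0 - n) / n) - u * n
    return gate_at(n_low) - gate_at(n_high)

T = 0.01
print("width without self-shift (u = 0):", charging_width(0.0, T))
print("width with smooth shift (u = 1): ", charging_width(1.0, T))
print("half filling reached at gate = -u/2:", occupation(-0.5, 1.0, T))
```

With the smooth shift switched on, the charging step is smeared over a gate range of order U instead of a few T. In the exact theory the level is not pushed up by its own fractional occupation, so the narrow, thermally limited step would be recovered.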

1.6 TIME-DEPENDENT DFT

Since the 1980s, attempts have been made to generalize the equilibrium theory to time-dependent phenomena. A detailed account of the foundations may be found in recent monographs.20,21 We discuss only those most basic aspects which are
important to shed some light on the connection between TDDFT and transport calculations. Consider the time-dependent Schrödinger equation

\[ i\partial_t \Psi(t) = \Big[ \hat T + \hat U + \hat V_{\rm ex} + \int dr\, \phi_{\rm ex}(rt)\,\hat n(r) \Big] \Psi(t) \tag{1.28} \]

where $\hat T$ and $\hat U$ abbreviate the kinetic and interaction energies given explicitly in Eq. (1.2) and, again, $\hat V_{\rm ex} = \int dr\, v_{\rm ex}(r)\hat n(r)$ describes the electrostatic environment. The time evolution of all observables is fixed by (1) the time-dependent external potential $\phi_{\rm ex}(rt)$ and (2) the initial conditions (i.e., the wavefunction $\Psi_i$ at the initial time $t = 0$). This suggests that the response of all those systems that have been prepared in an identical way, and therefore share the same initial state, is dictated by a single scalar field $v_{\rm ex}(t)$. In this respect, the situation is very reminiscent of the equilibrium case. To prove that the density may serve as the fundamental variable for time-dependent phenomena as well, one should demonstrate that an invertible relation analogous to Eq. (1.6) exists, at least in principle, which allows reconstruction of the probing potential $\phi_{\rm ex}(t)$ from knowledge of $n(t)$ (and $\Psi_i$) at all times $t \ge 0$. A proof that this is indeed the case for a wide class of potentials $\phi_{\rm ex}(t)$ was constructed first by Runge and Gross22 and corroborated by many later authors, in particular by van Leeuwen.23

1.6.1 Runge–Gross Theorem

The Runge–Gross theorem emphasizes that the time evolution of the density $n(t)$ is a unique characteristic of the probing potential $\phi_{\rm ex}(t)$: Two probing fields that differ by more than a homogeneous shift in space invoke two different density evolutions. This insight is then used to argue that a density profile, $n(rt)$, that is driven in one system with interaction $\hat U$ by $\phi_{\rm ex}(t)$ can also be seen in another system with a different interaction $\hat U'$ after $\phi_{\rm ex}(t)$ has been replaced by the appropriate modulation $\phi'_{\rm ex}(t)$. In particular, $\hat U'$ can also be zero, which is the foundation of the time-dependent DFT. We offer a proof of these statements which relies on the familiar fact that a solution of a partial differential equation (here in time) is unique once the initial situation and the evolution law have been specified.

Proof  The strategy is to relate the probing field $\phi_{\rm ex}$ to the second time derivative $\ddot n$. For the first time derivative, Heisenberg's equation of motion tells us that

\[ \dot n(rt) = \frac{1}{i}\,\langle\Psi(t)|[\hat n(r), \hat T]|\Psi(t)\rangle \tag{1.29} \]

because all other terms, $\hat U$, $\hat V_{\rm ex}$, and $\phi_{\rm ex}$, commute with the density operator $\hat n(r)$. By comparing with the continuity equation,

\[ \dot n(rt) + \partial_r \langle\Psi(t)|\hat j(r)|\Psi(t)\rangle = 0 \tag{1.30} \]

one may identify the proper definition of a current density operator, $\hat j(r)$. The procedure is familiar from elementary textbooks on quantum mechanics. The second derivative reads

\[ \ddot n(rt) = \Big(\frac{1}{i}\Big)^{2} \langle\Psi(t)|\big[[\hat n(r), \hat T], \hat H(t)\big]|\Psi(t)\rangle \tag{1.31} \]

where $\hat H(t)$ is the Hamiltonian driving the time evolution in Eq. (1.28). This equation is readily recast into the shape

\[ \delta\ddot n(rt) = -\int dr'\, \Gamma(rt, r't)\,\partial_{r'}\phi_{\rm ex}(r't) \tag{1.32} \]

where we have introduced a correlator,

\[ \Gamma(rt, r't) = i\,\langle\Psi(t)|[\hat j(r'), \hat n(r)]|\Psi(t)\rangle \tag{1.33} \]

and the abbreviation

\[ \delta\ddot n(rt) = \ddot n(rt) + \frac{1}{i}\,\partial_r \langle\Psi(t)|[\hat j(r), \hat T + \hat U + \hat V_{\rm ex}]|\Psi(t)\rangle \tag{1.34} \]

The second term appearing in this expression describes the internal relaxation of the electron system ("gas" or "liquid"; e.g., due to viscoelastic forces). The equal-time commutator in Eq. (1.33) is closely related to the density matrix; in terms of fermionic field operators, one has

\[ n(rt, r't) = \tfrac{1}{2}\,\langle\Psi(t)|\hat\psi^{\dagger}(r)\hat\psi(r') + \hat\psi^{\dagger}(r')\hat\psi(r)|\Psi(t)\rangle \]

so that

\[ \Gamma(rt, r't) = \frac{1}{m}\big[ n(rt, r't)\,\partial_{r'}\delta(r - r') - \delta(r - r')\,\partial_{r} n(rt, r't) \big] \tag{1.35} \]

Feeding this expression back into Eq. (1.32) and recalling that $n(rt, rt) \equiv n(rt)$, we recover Newton's second law,

\[ \delta\ddot n(rt) = \frac{1}{m}\,\partial_r \big[ n(rt)\,\partial_r \phi_{\rm ex}(rt) \big] \tag{1.36} \]

as we should. Clearly, a spatially homogeneous part of the probing potentials can never be recovered from the density evolution, since such potentials do not
exert a force. By contrast, the inhomogeneous piece can be reconstructed from its accelerating effect on the density.† Technically speaking, Eq. (1.36) represents a linear, first-order (in space) differential equation for the probing field $\phi_{\rm ex}(t)$. Combining it with the Schrödinger equation (1.28),

\[ i\partial_t \Psi(t) = \hat H(t)\Psi(t) \]

one obtains a system of two linear equations, which are local in time and readily integrated starting from the initial time $t = 0$. This is how, in principle, the probing field may be reconstructed (up to a homogeneous constant) if only $n(rt)$ is known: $n(rt) \to \phi_{\rm ex}(rt)$. Since the other direction, $\phi_{\rm ex}(rt) \to n(rt)$, is provided trivially by the Schrödinger equation, we readily conclude that

\[ \phi_{\rm ex}(rt) \leftrightarrow n(rt) \]

Extension  So far we have shown how the probing potential $\phi_{\rm ex}(rt)$ can be calculated if the density evolution and the initial state are given. It is also tacitly understood here that the Hamiltonian (i.e., the dispersion, $\hat T$, the electrostatic environment, $\hat V_{\rm ex}$, and the interaction, $\hat U$) is known. Its structure cannot be reconstructed from $n(rt)$. In conjunction with Eq. (1.36), this last observation has an important implication. Consider, for example, two systems with two different interactions, $\hat U$ and $\hat U'$, and two different initial states, $\Psi_i$ and $\Psi'_i$, that both satisfy the condition that their initial densities $n(rt_i)$, together with the time derivatives $\dot n(rt_i)$, coincide. Under this condition, an equation of the type (1.36) holds true for both systems, since the derivation made no special assumption about the structure of $\hat U$. Therefore, for any (reasonable) interaction $\hat U$ we can find a time-dependent single-particle potential such that the density of the many-body system follows a predefined time evolution $n(rt)$. We can even go a step further. In fact, we have shown how to calculate $\hat U$-dependent single-particle potentials, $v_s$, such that systems with different interactions can exhibit the same time-dependent density.
This means, in particular, that we can model the time evolution $n(rt)$ of interacting systems driven by $\phi_{\rm ex}(rt)$ by studying a reference system of noninteracting particles that experience a particular driving field $v_s(rt)$. This field can be constructed from the (invertible) mapping

\[ \phi_{\rm ex}(rt) \;\underset{\text{Eq. (1.28)}}{\overset{\hat U}{\longleftrightarrow}}\; n(rt) \;\underset{\text{Eq. (1.36)}}{\overset{\hat U = 0}{\longleftrightarrow}}\; v_s(rt) \tag{1.37} \]

at least in principle. Some of the conclusions that we have arrived at here were presented earlier by van Leeuwen24 based on the same equations but with somewhat different arguments.‡

† This statement is true in those spatial regions where the particle density is nonvanishing, $n(r) > 0$.
‡ We thank G. Stefanucci for bringing Ref. 24 to our attention and for a related discussion.
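The spatial inversion step behind Eq. (1.36) can be sketched numerically in one dimension. The example below is only an illustration under stated assumptions: a single time slice, a strictly positive model density, a known boundary value of the flux $n\,\partial_x\phi_{\rm ex}/m$ at the left edge, and hypothetical function names. It integrates $(1/m)\,\partial_x[n\,\partial_x\phi_{\rm ex}] = \delta\ddot n$ once to recover the force field $\partial_x\phi_{\rm ex}$; the homogeneous constant in $\phi_{\rm ex}$ itself remains undetermined, as stated in the text.

```python
import math

def reconstruct_force(x, n, ddn, m=1.0, flux0=0.0):
    """Integrate (1/m) d/dx [n dphi/dx] = ddn for dphi/dx (trapezoidal rule).

    flux0 is the value of n * dphi/dx / m at x[0] and acts as boundary condition.
    """
    flux = [flux0]
    for i in range(1, len(x)):
        h = x[i] - x[i - 1]
        flux.append(flux[-1] + 0.5 * h * (ddn[i] + ddn[i - 1]))
    return [m * f / ni for f, ni in zip(flux, n)]

def demo(N=2001, m=1.0):
    """Build ddn from a known potential, reconstruct dphi/dx, return the max error."""
    k = 2.0 * math.pi
    x = [i / (N - 1) for i in range(N)]
    n = [1.0 + 0.5 * math.cos(k * xi) for xi in x]          # model density, > 0
    dn = [-0.5 * k * math.sin(k * xi) for xi in x]          # its derivative
    dphi = [-0.3 * k * math.sin(k * xi) for xi in x]        # exact dphi/dx
    d2phi = [-0.3 * k * k * math.cos(k * xi) for xi in x]   # exact second derivative
    # right-hand side of Eq. (1.36): (1/m) d/dx [n dphi/dx]
    ddn = [(a * b + c * d) / m for a, b, c, d in zip(dn, dphi, n, d2phi)]
    rec = reconstruct_force(x, n, ddn, m=m, flux0=0.0)
    return max(abs(r - e) for r, e in zip(rec, dphi))
```

The division by `n` in the last step of `reconstruct_force` is where the restriction to regions of nonvanishing density enters.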

Remarks

• By including, in addition to the scalar probing potential $\phi_{\rm ex}(t)$, a vector probing potential, $A_{\rm ex}(t)$, and keeping the current density explicit as a second collective field, one can generalize the argument presented above to derive a time-dependent current DFT. A proof in the spirit of van Leeuwen24 has been given by Vignale.25

• Exactly the same arguments that have been presented for the case of a single wavefunction $\Psi(t)$ also apply, with only minor modifications, to an ensemble of wavefunctions characterized by a statistical operator $\hat\rho$: (1) quantum mechanical expectation values turn into ensemble averages, and (2) the Schrödinger equation is replaced by the von Neumann equation

\[ \dot{\hat\rho} = i\,[\hat\rho, \hat H(t)] \tag{1.38} \]

This prompts a generalization of TDDFT to finite temperatures.

• In principle, one can in this way also consider systems with a coupling to a heat bath (e.g., bosons). The only essential modification occurs in Newton's law, which now needs to account, for example, for a change in the effective dispersion $1/m$ due to the electron–boson coupling. First attempts to develop a TDDFT for a system coupled to reservoirs have been reported.26–28

• Notice that the appearance of the gradients in Eq. (1.36) is due to particle number conservation. The reason is that symmetric correlators of the type

\[ \langle\Psi(t)|\big[[\hat n(r), \hat O], \hat n(r')\big]|\Psi(t)\rangle \]

vanish after integration over one of the spatial coordinates if $\hat O$ commutes with the total particle number operator, $[\hat O, \hat N] = 0$. Indeed, in Eq. (1.31) this is the case, because any term in the Hamiltonian commutes with the total particle number operator $\hat N$. Hence, such correlators have vanishing (real space) Fourier components at zero wavenumbers, $q = 0$. Assuming analyticity, we can say that the correlator is proportional to the product of two wavenumbers, $q$ and $q'$, and for this reason two gradients appear in Eq. (1.36).

• The validity of time-dependent DFT is based on three elementary observations, all of which relate to the fact that (quantum) mechanics is governed by linear differential equations in time:
  1. The total force can be deduced from its action on the particle density.
  2. This force can be split into an external and an internal component; the internal component acting at time $t$ can be calculated knowing just $\Psi(t)$.
  3. To calculate $\Psi(t)$, only forces acting prior to $t$ and the initial conditions have to be known.


1.6.2 Dynamical Kohn–Sham Theory

The Runge–Gross theorem and its extensions teach us that there is a reference system of noninteracting particles living in a potential $v_s(rt)$ [Eq. (1.37)], so that at $t > 0$ its density evolves in time in exactly the same way as it does for the many-body system. The dynamics of this reference system are governed by an effective Schrödinger-type equation, the dynamic Kohn–Sham equations. With the decomposition $v_s = v_{\rm ex} + v_{\rm H} + v_{\rm XC} + \phi_{\rm ex}$, they read

\[ i\partial_t \phi_\ell(rt) = \big[ \varepsilon(\hat p) + v_{\rm ex}(r) + \phi_{\rm ex}(rt) + v_{\rm H}(rt) + v_{\rm XC}(rt) \big]\,\phi_\ell(rt) \tag{1.39} \]

where $\phi_{\rm ex}(rt)$ is the time-dependent probing field and

\[ n(r, t) = \sum_{\ell=1}^{N} |\phi_\ell(rt)|^2, \qquad v_{\rm H}[n](rt) = \int dr'\, u(r - r')\, n(r't) \tag{1.40} \]
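The structure of Eq. (1.39), a first-order equation in time that is propagated from an initial condition, can be illustrated with a deliberately minimal sketch. It is not a Kohn–Sham solver: the assumptions are a single particle on two lattice sites with hopping `hop` standing in for $\varepsilon(\hat p)$, and a potential `v_s(t)` that is simply prescribed rather than computed self-consistently from $v_{\rm H}$ and $v_{\rm XC}$. The Crank–Nicolson step used here is one common choice of propagator, not one named in the text.

```python
import math

def cn_step(psi, h, dt):
    """One Crank-Nicolson step psi -> (1 + i dt h/2)^-1 (1 - i dt h/2) psi, h is 2x2 Hermitian."""
    (a, b), (c, d) = h
    z = 0.5j * dt
    # right-hand side: (1 - z h) psi
    r0 = (1 - z * a) * psi[0] - z * b * psi[1]
    r1 = -z * c * psi[0] + (1 - z * d) * psi[1]
    # solve (1 + z h) x = r with the explicit 2x2 inverse
    m00, m01, m10, m11 = 1 + z * a, z * b, z * c, 1 + z * d
    det = m00 * m11 - m01 * m10
    return [(m11 * r0 - m01 * r1) / det, (-m10 * r0 + m00 * r1) / det]

def propagate(v_s, hop=1.0, dt=0.01, steps=2000):
    """Evolve one particle on two sites in the prescribed potential v_s(t) -> (v1, v2)."""
    psi = [1.0 + 0j, 0.0 + 0j]           # initial condition: particle on site 1
    densities = []
    for k in range(steps):
        v1, v2 = v_s((k + 0.5) * dt)      # potential at the midpoint of the step
        h = ((v1, -hop), (-hop, v2))
        psi = cn_step(psi, h, dt)
        densities.append((abs(psi[0]) ** 2, abs(psi[1]) ** 2))
    return densities
```

A call such as `propagate(lambda t: (0.5 * math.sin(t), 0.0))` returns the site densities after each step; their sum stays at one because the step is unitary for a Hermitian `h`.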

The functional $v_{\rm XC}[n](rt)$ is the piece of $v_s[n](rt)$ that accommodates the interactions beyond the mean-field (Hartree) type. It depends on the time-dependent particle density, including its history. Moreover, as a first-order differential equation, Eq. (1.39) needs to be complemented with an initial condition. Part of this is, of course, that $n(r, t = 0)$ coincides with the density of the many-body system at $t = 0$. However, in addition, the functional $v_{\rm XC}$ will in general also depend on the many-body wavefunction of the initial state, $\Psi_I \equiv \Psi(t = 0)$, which may be, but does not have to be, an equilibrium state.

1.6.3 Linear Density Response

Consider a situation where the many-body system is in thermal equilibrium at times $t < 0$ before the probing field $\phi_{\rm ex}(rt)$ is switched on. Moreover, assume that the perturbation is very weak, so that the requirements for the application of linear response theory are met. Under this condition, an explicit expression for the XC functional $v_{\rm XC}$ is readily written down. Indeed, there is a matrix $\chi(rt, r't')$, the density susceptibility, which relates the probing field to the (linear) system response, $\Delta n = n - n_{\rm eq}$:

\[ \Delta n(rt) = \int dt' \int dr'\, \chi(rt, r't')\,\phi_{\rm ex}(r't') \tag{1.41} \]

The matrix $\chi(t, t')$ is an equilibrium correlation function of the system, and it therefore depends only on the time difference $t - t'$. We can use its inverse, $\chi^{-1}$, to define an operator kernel $f_{\rm XC}$ via the decomposition

\[ \chi^{-1} = \chi_{\rm KS}^{-1} - f_{\rm H} - f_{\rm XC} \tag{1.42} \]

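The operator $\chi_{\rm KS}$ describes the density response of the equilibrium KS system, ignoring the feedback of $\phi_{\rm ex}(t)$ into $v_{\rm H}$ and $v_{\rm XC}$ [Eq. (1.39)]; explicitly,

\[ \chi_{\rm KS}(r r' z) = \sum_{\ell,\ell'} \frac{f(\varepsilon_\ell) - f(\varepsilon_{\ell'})}{\varepsilon_\ell - \varepsilon_{\ell'} - z}\, \langle \ell|\hat n(r)|\ell'\rangle \langle \ell'|\hat n(r')|\ell\rangle \]

where $|\ell\rangle$, $|\ell'\rangle$ and $\varepsilon_\ell$, $\varepsilon_{\ell'}$ denote the unperturbed ($\phi_{\rm ex} \equiv 0$) KS orbitals and KS energies and $z = \omega + i\eta$ lies in the complex plane. The feedback is then taken into account by $f_{\rm H} = u(r - r')$ for the Hartree term $v_{\rm H}$ and by $f_{\rm XC}$ for the exchange–correlation potential, $v_{\rm XC}$, in Eq. (1.39). From this point of view it is obvious how to construct the dynamic correction of the XC functional to the equilibrium functional:

\[ v_{\rm XC}[n](rt) = v_{\rm XC}^{\rm eq}[n_{\rm eq}](r) + \int dt' \int dr'\, f_{\rm XC}(r, r'; t - t')\,\Delta n(r't') \tag{1.43} \]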
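The decomposition in Eq. (1.42) can be checked on a small matrix model. The sketch below is an illustration only: it assumes a discretization with two spatial points at a single frequency, so that $\chi_{\rm KS}$ and the combined kernel $f_{\rm H} + f_{\rm XC}$ become $2 \times 2$ matrices, and the numbers in the usage example are arbitrary values chosen to be invertible, not physical response functions.

```python
def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dressed_response(chi_ks, f_hxc):
    """chi = (chi_KS^-1 - f_H - f_XC)^-1, following Eq. (1.42); f_hxc = f_H + f_XC."""
    inv_ks = inv2(chi_ks)
    return inv2([[inv_ks[i][j] - f_hxc[i][j] for j in range(2)]
                 for i in range(2)])
```

For example, with `chi_ks = [[-0.5, 0.2], [0.2, -0.5]]` and `f_hxc = [[1.0, 0.3], [0.3, 1.0]]`, the result of `dressed_response` satisfies the equivalent Dyson-type form $\chi = \chi_{\rm KS} + \chi_{\rm KS}(f_{\rm H} + f_{\rm XC})\chi$, which follows from multiplying Eq. (1.42) by $\chi_{\rm KS}$ from the left and $\chi$ from the right.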

Remarks

• We have just constructed a single-particle theory that has the property that it gives the correct linear dynamical response of the many-body system. The procedure relies only on the familiar notions of linear response theory and does not make reference to the underpinnings of the time-dependent DFT. It is emphasized here that the genuine statements of time-dependent DFT, when applied to systems that are in equilibrium at $t < 0$, reside in the claim that an effective single-particle description exists even outside the linear regime.

• Much of the recent improvement29 in quantitative calculations of optical spectra of single molecules is due to including the terms $f_{\rm H}$ and, in particular, $f_{\rm XC}$ in the analysis (in addition to $\chi_{\rm KS}$), which had often been ignored before. In this way the single-particle spectrum of the bare Kohn–Sham system is dressed so as to produce the correct many-body excitations. Often, the success of this procedure is attributed to the time-dependent DFT. This is misleading, however, since it is merely the consequence of a proper application of the standard theory of linear response.

• The most widely used approximation for $f_{\rm XC}$ is the adiabatic LDA (ALDA). It comprises two steps. The first is the adiabatic approximation,

\[ f_{\rm XC}^{\rm ad}(rt, r't') = \left.\frac{\partial v_{\rm XC}^{\rm eq}[n](r')}{\partial n(r)}\right|_{n(rt)} \delta(t - t') \tag{1.44} \]

This step, by definition, erases all memory effects, so a $\delta$-function in time appears. The complete absence of memory suggests one more approximation, which also eliminates nonlocal correlations in space. This is necessary, because signal propagation occurs with a finite velocity and therefore always has a retardation time. Therefore, density fluctuations in different spatial regions cannot be correlated instantaneously. This aspect is built into

\[ f_{\rm XC}^{\rm ALDA}(rt, r't') = \left.\frac{d v_{\rm XC}^{\rm eq}(n)}{dn}\right|_{n(rt)} \delta(r - r')\,\delta(t - t') \tag{1.45} \]

automatically, where $v_{\rm XC}^{\rm eq}$ in Eq. (1.44) has been replaced by its LDA approximant.
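To give the prefactor in Eq. (1.45) a concrete form, the following sketch evaluates $dv/dn$ for the exchange part of the LDA only. This restriction is an assumption made for brevity (the correlation part is omitted), Hartree atomic units are used, and the function names are illustrative. The exchange potential of the homogeneous electron gas, $v_x(n) = -(3n/\pi)^{1/3}$, is the standard textbook expression.

```python
import math

def v_x_lda(n):
    """LDA exchange potential of the homogeneous electron gas, -(3 n / pi)^(1/3), in Hartree a.u."""
    return -((3.0 * n / math.pi) ** (1.0 / 3.0))

def f_x_alda(n):
    """Exchange-only ALDA prefactor dv_x/dn at density n (multiplies the two delta functions)."""
    return -(1.0 / 3.0) * (3.0 / math.pi) ** (1.0 / 3.0) * n ** (-2.0 / 3.0)
```

The kernel is negative and grows in magnitude as $n^{-2/3}$ at low density; it carries no dependence on frequency or on the distance $r - r'$, which is the content of the two approximations described above.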

1.6.4 Time-Dependent Current DFT

The frequency structure of $f_{\rm XC}$ has been worked out in the hydrodynamic regime of small wavenumbers and frequencies by Kohn, Vignale, and co-workers.30,31 It is seen explicitly there that severe memory effects indeed exist due to general conservation laws, which express themselves as singular behavior in correlation functions with respect to wavenumber and frequency. As usual, singularities may be partly eliminated by reformulating in terms of correlation functions of the (generalized) velocities. In the case of the particle density, one introduces the longitudinal current density,

\[ j_\ell(q\omega) = \frac{-i\omega}{q}\, n(q\omega) \tag{1.46} \]

In this way one absorbs factors $q^{-1}$, thus removing nonlocal behavior in the density kernels, which indicates, for example, the slow density relaxation due to particle number conservation. In this spirit the time-dependent current DFT (TDCDFT) was developed.30,31 Apart from the fact that it works with current-density kernels, which are more local than those in TDDFT, TDCDFT offers yet another attraction. In addition to the density [or $j_\ell$, Eq. (1.46)] it also features a second independent collective field, the transverse currents $j_t$. Therefore, TDCDFT can in principle also describe the orbital response to probing vector potentials (i.e., magnetic fields).

1.6.5 Appendix: Variational Principle

Unlike the case with equilibrium theory, a variational principle is not required in order to derive the dynamical Kohn–Sham equations. Still, it is desirable to have a formulation of TDDFT available in terms of an action, for example, because one may hope to be able to calculate $v_s$ by performing a functional derivative. In this section we investigate the "naive" trial action

\[ A[\tilde\Psi] = \int_0^\infty dt\, \langle\tilde\Psi(t)|\, i\partial_t - \hat H(t)\, |\tilde\Psi(t)\rangle = \int_0^\infty dt\, \langle\tilde\Psi(t)|\, i\partial_t - \hat T - \hat U - \hat V_{\rm ex}\, |\tilde\Psi(t)\rangle - \int_0^\infty dt \int dr\, \phi_{\rm ex}(rt)\,\tilde n(rt) \tag{1.47} \]

which is defined over the space $C_I$ of complex fields $\tilde\Psi(t)$ with constraints given by (1) the antisymmetry requirement in all $N$ coordinates $r_1 \cdots r_N$, and (2) the initial condition $\tilde\Psi(0) = \Psi_I$. The solution of the Schrödinger equation for a given external field $\phi_{\rm ex}(rt)$ is the one element $\Psi(t)$ of $C_I$ that optimizes $A[\tilde\Psi]$. In full analogy to the equilibrium case, the functional equation (1.47) can be used as a basis to find an action functional of the density alone by preoptimizing. We first perform a decomposition of $C_I$ into subsets; the elements of each subset have the same evolution $\tilde n(rt)$. Second, we find within each one of these subsets those states $\tilde\Psi_{\tilde n}(t)$ that are optimal with respect to $A[\tilde\Psi]$. These states form the ensemble $M_{\rm preopt}$ of preoptimized fields.† In this way we arrive at an action functional, which is defined on $M_{\rm preopt}$:

\[ S_I[\tilde n] = \int_0^\infty dt\, \langle\tilde\Psi_{\tilde n}(t)|\, i\partial_t - \hat T - \hat U\, |\tilde\Psi_{\tilde n}(t)\rangle \tag{1.48} \]

$S_I[\tilde n]$ is the dynamical analog of $F$ [Eq. (1.37)]. The Schrödinger time evolution of the density, $n(rt)$, is the single one that optimizes the full action,

\[ A_I[v_{\rm ex}, \tilde n] = S_I[\tilde n] - \int_0^\infty dt \int dr\, [v_{\rm ex}(r) + \phi_{\rm ex}(rt)]\,\tilde n(rt) \tag{1.49} \]

The variational space associated with this action is spanned by all those $\tilde n(rt)$ which are $\Psi$-representable: There is at least one element $\tilde\Psi(t)$ of $C_I$ such that $\tilde n(rt) = \langle\tilde\Psi(t)|\hat n(r)|\tilde\Psi(t)\rangle$.

Remarks

• Preoptimizing is a constrained minimum search in the subspace of possible wavefunctions that satisfy the initial condition (2). Therefore, each initial condition carries its own functional: $S_I[n]$.

• By construction, the search over $\Psi$-representable densities leads to a variational equation,

\[ \left.\frac{\delta S_I[\tilde n]}{\delta \tilde n(rt)}\right|_{\tilde n(rt) = n(rt)} = \phi_{\rm ex}(rt) + v_{\rm ex}(r) \tag{1.50} \]

Its solution, $n(rt)$, defines the Schrödinger dynamics for the density corresponding to a given probing field $\phi_{\rm ex}(rt)$. A more explicit expression for the left-hand side may be obtained by taking the time derivative and comparing with Eq. (1.36).

• Consider generating all possible solutions of Eq. (1.50) by scanning through the space of all allowed (i.e., sufficiently smooth) probing fields $\phi_{\rm ex}(rt)$. The resulting subset of the $\Psi$-representable variational space is called v-representable. An arbitrary element of the variational space, $\tilde n(rt)$, is certainly $\Psi$-representable but may not be v-representable.

• The Schrödinger dynamics is unitary: $N = \int dr\, n(rt)$ is an invariant of motion. v-representable states obey unitarity, but $\Psi$-representable states may not.

• By taking a functional derivative,

\[ \frac{\partial}{\partial n(rt)} \left.\frac{\delta S_I[\tilde n]}{\delta \tilde n(r't')}\right|_{\tilde n = n} = \frac{\partial \phi_{\rm ex}(r't')}{\partial n(rt)} = \chi^{-1}(r'r, (t' - t)) \tag{1.51} \]

a relation to the reciprocal of the density correlation function is derived. Note that the $\partial$ derivative relates to density differences within the set of all $n(rt)$ that are v-representable. Our notation emphasizes this difference from the earlier $\delta$ derivative [Eq. (1.50)].

• The right-hand side of Eq. (1.51) is subject to causality; the density $n(rt)$ indicates changes in the probing potential $\phi_{\rm ex}(rt')$ only at later times, $t > t'$. Equation (1.51) pays respect to this asymmetry, since the $\partial$ and $\delta$ derivatives must not be interchanged.

• The causality issue noted above makes it very obvious that an action principle should not be based solely on the variational space of v-representable histories $n(rt)$. This issue has been discussed in detail by van Leeuwen.23,32 In response, this author derives an action $S$ employing the Keldysh formalism. The procedure by itself does not appear to lead to fundamentally new insights. However, compared with the naive starting point [Eq. (1.47)], it has the charming feature that only one (enlarged) variational space for $n(rt)$ appears. In addition, there is an important conceptual advantage since, in principle, within this approach it is clear how one can calculate $v_{\rm XC}$ in a systematic perturbation theory.

† With every optimum $\tilde\Psi_{\tilde n}(t)$, the related function $e^{i\varphi(t)}\tilde\Psi_{\tilde n}(t)$ with $\varphi(0) = 0$ is an optimum, which differs by a time-dependent, spatially homogeneous phase shift. The shift merely reflects the necessity to fix the zero of energy. We identify all those states with one another that differ only by a spatially homogeneous phase $\varphi(t)$.

1.7 TDDFT AND TRANSPORT CALCULATIONS

In this section we discuss the application of TDDFT in the context of charge transport. The focus will be on the dc limit. There are various ways to formulate the transport problem; we shall elaborate on the consequences of the linear response and scattering approaches. We concentrate on the presentation of those elementary facts that are specific to a treatment of transport within the framework of TDDFT. An attempt is made to be as self-contained as possible.

1.7.1 Linear Current Response

One way to establish a current flow in a system, which initially is in a thermodynamic equilibrium, is to switch on an electric field Eex (rt). This field is not
the one that an electron feels when it accelerates. The accelerating (local) field, $E$, also contains an induced component,

\[ E = E_{\rm ex} + E_{\rm ind} \tag{1.52} \]

We restrict ourselves to initial situations that respect time-reversal invariance. Then the induced field is generated by a shift of charges, $e\,\Delta n$, under the influence of $E_{\rm ex}$; we have

\[ E_{\rm ind}(rt) = -\partial_r \int dr'\, u(r - r')\,\Delta n(r't) \tag{1.53} \]

By definition, the conductivity matrix, $\sigma_{ij}$, relates only the total field, $E$, to the linear response of the current density by

\[ j_i(r\omega) = \int dr'\, \sigma_{ij}(r, r', \omega)\,E_j(r'\omega) \tag{1.54} \]

To make contact with TDDFT, we decompose $j$ into a longitudinal (curl free) piece, $j_\ell$, and a transverse (source free) field, $j_t$.

1.7.1.1 Magnetization (Transverse) Currents  By construction, $j_t$ incorporates the orbital ring currents that may be understood as a local magnetization density defined via $j_t(rt) = c\,\partial_r \times m(rt)$, where $c$ denotes the velocity of light. Nonvanishing magnetizations occur in equilibrium systems only in the presence of (spontaneously) broken time-reversal invariance. In these cases, the current DFT (CDFT) has to be employed, where the magnetization is explicitly kept as a second collective field in addition to the particle density. We consider here only systems that are invariant under time reversal. Then, ring currents vanish in the initial state, $j_t = 0$. In such systems transverse currents can emerge in the presence of external driving fields.† Since they are not accompanied by density fluctuations, TDDFT does not monitor them. This implies, in particular, that the transverse currents of the time-dependent KS system do not, in general, coincide with the physical magnetization currents.

1.7.1.2 Longitudinal Currents  The continuity equation connects $j_\ell$ with the time dependence of the particle density. Therefore, the physical longitudinal current density and the longitudinal KS currents coincide. Hence, it makes sense to introduce a conductivity of the KS particles via

\[ j_i(r, \omega) = \int dr'\, \sigma_{{\rm KS},ij}(r, r', \omega)\,[E_{\rm ex} + E_{\rm ind} + E_{\rm XC}]_j(r', \omega) \tag{1.55} \]

† As an example we mention a ring current flowing in a perfectly conducting cylinder that closes around a time-dependent magnetic flux.


Just like physical particles, KS particles do not react to the external field but, rather, to the local field. This field contains the same Hartree-type term that originates from $v_{\rm H}$ in Eq. (1.39) and that was already present for the physical particles [Eq. (1.53)]. However, for KS particles not only $v_{\rm H}$ but also $v_{\rm XC}$ acquires a correction with a change in the density, since

\[ f_{\rm XC}(r, r', t - t') = \frac{\partial v_{\rm XC}[n](rt)}{\partial n(r't')} \tag{1.56} \]

does not vanish [see Eq. (1.43)]. The resulting excess force $E_{\rm XC}$ from this contribution reads

\[ E_{\rm XC}(r\omega) = -\partial_r \int dr'\, f_{\rm XC}(r, r', \omega)\,\Delta n(r', \omega) \tag{1.57} \]

in full analogy with Eq. (1.53).

Remark

• The exchange–correlation field $E_{\rm XC}$ comprises a piece that originates from the adiabatic term given in Eq. (1.44). On the level of the ALDA, we have

\[ E_{\rm XC}^{\rm ALDA}(r\omega) = -\partial_r \left[ \left.\frac{d v_{\rm XC}^{\rm eq}(n)}{dn}\right|_{n^{\rm eq}(r)} \Delta n(r, \omega) \right] \tag{1.58} \]

In addition, $E_{\rm XC}$ also comprises a second piece, which brings in the viscoelastic properties of the correlated electron liquid. This piece is usually ignored in TDDFT, because it is very difficult to formulate in a purely density-based language. This is not surprising, because the viscosity is intimately related to shear forces within the liquid that derive from mixed terms $\partial j_x/\partial y$ typical of transverse current patterns. Such forces are more naturally described within time-dependent current DFT.30,31

1.7.1.3 Quasi-One-Dimensional Wire  We consider as an illustrative example the dc response of a quasi-one-dimensional wire of length $L$ to an electric field in the longitudinal direction, $E(r) = e_z E(z)$. The dc current, $I$, is given by

\[ I = \int_0^L dz'\, g_{\rm KS}(z, z')\,[E_{\rm ex} + E_{\rm ind} + E_{\rm XC}](z') \tag{1.59} \]

\[ g_{\rm KS}(z, z') = \int dr_\perp\, dr'_\perp\, \sigma_{\rm KS}(r, r') \tag{1.60} \]

where it was assumed that the longitudinal field components have negligible variation in the perpendicular wire direction $r_\perp$. Since any configuration of driving fields has an associated dc current $I$ that is the same for all observation points
$z$, we conclude that the kernel (1.60) is independent of its arguments and define a KS conductance, $G_{\rm KS} = g_{\rm KS}(z, z')$:

\[ I = G_{\rm KS} \int_0^L dz'\, [E_{\rm ex} + E_{\rm ind} + E_{\rm XC}](z') \tag{1.61} \]

The first two terms in the integral add up to the physical voltage drop, $V$, along the wire. The appearance of the third term indicates that the KS particles experience another voltage, which differs by the amount

\[ V_{\rm XC} = \int_0^L dz'\, E_{\rm XC}(z') \tag{1.62} \]

Remarks

• The ALDA contribution to the effective driving field is conservative, so it may be written as a gradient of a potential,

\[ \int_0^L dz'\, E_{\rm XC}^{\rm ALDA}(z') = -\,v_{\rm XC}^{\rm eq}(n(z))\Big|_{n(0)}^{n(L)} \]

As long as observation times are considered such that the effect of the charge transfer on the local charge density is still negligibly small (long wire limit), we can take $n(L) = n(0)$, so that the ALDA contribution vanishes (for macroscopically homogeneous wires). Nonzero contributions to $V_{\rm XC}$ come from the viscous term.

• The viscosity tends to reduce the response of the electron liquid to external forces. Density functional theories take this behavior into account by "renormalizing" the true forces with $E_{\rm XC}$. On a very qualitative level, the viscous forces tend to hinder the current flow through narrow constrictions with "sticky" walls. For this reason, their effect has been investigated in the context of current flows through single molecules.33 However, as pointed out previously19 (and as underlies the debate34,35), borrowing concepts from hydrodynamics and applying them on the molecular scale is not straightforward. Take, for example, the viscosity: it describes how much momentum is transferred per unit time from a fast-moving stream to a neighboring one that flows in the same direction but at a lower speed. On a microscopic level, momentum exchange is mediated via collisions between the flowing particles. Therefore, it is clear that a description in terms of the macroscopic parameter "viscosity" can be valid only on length and time scales that substantially exceed the interparticle scattering length and time. Both scales become very large in fermion systems at low temperature, and in particular can easily exceed the dimensions of those atomistic or molecular systems that one would like to treat. Applications in mesoscopic semiconductors enjoy a much better justification.
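The first remark above can be checked numerically. The sketch below is a hedged illustration rather than part of the original treatment: it assumes the exchange-only LDA potential $v_x(n) = -(3n/\pi)^{1/3}$ (Hartree atomic units) as a stand-in for $v_{\rm XC}^{\rm eq}$, a hypothetical density profile `density(z)` along the wire, and a simple finite-difference field $E(z) = -\partial_z v_x(n(z))$ integrated with the trapezoidal rule.

```python
import math

def v_x_lda(n):
    """LDA exchange potential -(3 n / pi)^(1/3), used here in place of the full v_XC^eq."""
    return -((3.0 * n / math.pi) ** (1.0 / 3.0))

def alda_voltage(density, L=1.0, N=4001):
    """Integrate E(z) = -d/dz v_x(n(z)) over 0..L (finite differences, trapezoidal rule)."""
    h = L / (N - 1)
    v = [v_x_lda(density(i * h)) for i in range(N)]
    # field from central differences, one-sided at the two ends
    E = [-(v[1] - v[0]) / h]
    E += [-(v[i + 1] - v[i - 1]) / (2 * h) for i in range(1, N - 1)]
    E.append(-(v[-1] - v[-2]) / h)
    return h * (0.5 * E[0] + sum(E[1:-1]) + 0.5 * E[-1])
```

For a profile with equal end densities, such as `lambda z: 0.05 + 0.02 * math.sin(2 * math.pi * z) ** 2`, the integral vanishes up to rounding, whatever the modulation in between. For a profile with $n(L) \neq n(0)$ it reduces to the difference of the potential at the two ends.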


1.7.2 Scattering Theory

Linear response theory is a framework for calculating the dynamical reaction of any many-body system to linear order in the probing field. Its advantage is its complete generality. For the same reason, situations are easily identified in which alternative formalisms are better adapted and therefore allow a simpler and more transparent analysis. In this section we consider an example thereof: the dc transport through a quantum dot (e.g., a molecule) which has been wired to a left and a right reservoir (see Fig. 1.4). We consider quasi-one-dimensional well-screened wires, so that particles inside the wire do not interact with each other. The traveling waves along the wire are categorized by scattering states. Each such state is equipped with a continuous longitudinal degree of freedom associated with a wavenumber, $k$, a discrete transverse degree, the channel index $n$ [which should not be confused with the particle density $n(r)$], and a dispersion relation $E_n(k)$. In this language the current flowing through the wire is described by a superposition of scattering states. How the particles that enter the wire from a reservoir distribute over the available scattering states is dictated by distribution functions, $f_{L,R}(E)$, which are properties solely of the left and right reservoirs. The specifics of the quantum dot enter the construction of the scattering states in terms of the reflection and transmission coefficients, $\tilde r_{nn'}(E, E')$ and $\tilde t_{nn'}(E, E')$. They describe the probability amplitude for a particle that approaches the quantum dot with energy $E$ in channel $n$ to be either reflected or transmitted into the channel $n'$ with energy $E'$.

1.7.2.1 Landauer Theory  The scattering description is particularly convenient if scattering is elastic, so that in each single scattering process the state of the quantum dot is preserved; in particular, each scattering event conserves the energy of the incoming particle, $E = E'$.
Under this specific condition, the current is simply given by the Landauer formula,

\[ I = \int dE\, T(E)\,[f_L(E) - f_R(E)] \tag{1.63} \]

Fig. 1.4 (color online) Wiring a molecule to source and drain reservoirs: scattering states description with longitudinal (k) and transverse (n) quantum numbers.

with a transmission function

\[ T(E) = \sum_{nn'} |t_{n'n}(E)|^2 \equiv {\rm Tr}\; t t^{\dagger} \tag{1.64} \]

where $t_{n'n} = \tilde t_{n'n}\,(v_{n'}/v_n)^{1/2}$, with $v_n = \partial E_n(k)/\partial k$ being the group velocity of particles traveling in channel $n$ with energy $E$. Here we follow the common convention that each reservoir acts as a thermal bath characterized by a temperature and an electrochemical potential, $\mu_{L,R}$. Then the distributions $f_{L,R}$ are simply Fermi functions with bath parameters.

1.7.2.2 Scattering Theory and TDDFT: Relaxation Problem  Scattering theory describes a nonequilibrium situation that is (quasi-)stationary in time. Even though a current flows, expectation values of local (intensive) operators, in particular of $\hat j(r)$ and $\hat n(r)$, are time independent.† By contrast, TDDFT has been developed to describe the time evolution of the density, $n(rt)$, under the action of a time-dependent potential, $\phi_{\rm ex}(t)$, away from some initial condition. Both approaches may apply simultaneously if in the course of time evolution a quasistationary nonequilibrium situation develops.36–38 This can happen if the superposition of $\phi_{\rm ex}(t)$ and the induced field, $v_{\rm ind}(t)$, shifts the electrochemical potentials of the two reservoirs against each other:

\[ \big[ v_{\rm ex}(rt) + v_{\rm ind}(rt) \big]_{L}^{R} \;\xrightarrow{\;t \gg \tau_{\rm trans}\;}\; \mu_R - \mu_L \tag{1.65} \]

Then, after waiting a time $\tau_{\rm trans}$ in which transient dynamic phenomena have died out due to internal relaxation processes, a flow may establish itself that is indeed quasistationary. The current will be monitored properly by TDDFT, since it equals the flux of particles out of one of the reservoirs: $I = \dot N_L = -\dot N_R$. In this quasistationary regime, by definition the particle and current densities are time independent. One might then suspect that the KS potentials should also have become stationary. This point is perhaps not quite as obvious as it might look. Namely, the fact that the density is time independent does not by itself always imply that the Hamiltonian is stationary. For example, homogeneous ring systems that close around time-dependent fluxes can exhibit time-dependent ring currents that leave the density completely invariant. To exclude such artifacts, one can operate with probing fields $\phi_{\rm ex}(t)$ that couple to the density itself and that become time independent after switching on. Then, at least in the linear response regime, functionals are guaranteed to become time independent, since they derive from linear-response kernels [Eq. (1.43)] (see the remark below). Once we accept that potentials become stationary, we may define scattering states. However, whether this concept is useful or not depends on whether one can identify the rules pertaining to how the physical current should be constructed from them. Whether or not the same rules that work for the truly noninteracting case also apply for the KS scattering states of TDDFT is not a priori clear, however. Indeed, after switching on the bias voltage, V, the workfunction of each reservoir shifts against the vacuum level. Apart from this effect, each reservoir stays in complete thermal equilibrium at all times owing to its macroscopic size. According to the general principles of DFT outlined in earlier sections, the distribution function of KS particles inside each reservoir should still be given by $f_{L,R}$ with the appropriate chemical potentials $\mu_{L,R}$ and $eV = \mu_L - \mu_R$, as usual. This is the point of view that has been adopted elsewhere.36 However, this conclusion is not fully consistent with a result that we derived above. Namely, as we have seen in the linear response theory, the KS voltage does not in general coincide with the difference of the reservoir workfunctions. This effect has been incorporated37,38 using Fermi functions with chemical potentials that do not coincide with physical values. Here it remains an open question as to how this finding could be reconciled with the requirement that each reservoir must stay in its own equilibrium. At the moment, this apparent inconsistency of DFT-based scattering theory is seemingly unresolved.

† We are assuming here that the reservoirs are ideal. They remain in thermodynamic equilibrium with fixed temperature, chemical potential, and so on, even in the presence of a current flow. In reality, this condition requires a separation of scales: macroscopic reservoirs and microscopic currents.

Remarks

• The precise conditions under which a nonequilibrium current flows in a quasistationary manner are very difficult to state. That flow at small enough currents is always quasistationary is supported by linear response analysis. It suggests (1) that linear responses to a sufficiently weak field never mix frequencies (i.e., they simply follow the external stimulus in time). Furthermore, (2) slow-enough driving fields, $\omega\tau_{\rm trans} \ll 1$, signal the dc behavior. So, combining (1) and (2), one concludes that the linear regime should always be quasistationary.
• A breakdown of the quasistationary regime at sufficiently large currents is suggested by analogy to hydrodynamics as described by the Navier–Stokes equations. Here it is known that a laminar (i.e., quasistationary) regime should be separated from turbulence that develops at larger currents. Since, at least on a qualitative level, the micro- or nanoscopic flow of the electron liquid is also a hydrodynamic phenomenon, a "turbulent" regime could exist here as well. This is also supported by the observation that the TDDFT equations are nonlinear in the density and therefore should host chaotic regimes.

1.8 MODELING RESERVOIRS IN AND OUT OF EQUILIBRIUM

1.8.1 External and Internal Hilbert Spaces

Scattering theory operates in a basis of scattering states; that is, it uses those quantum numbers that reflect the behavior of wavefunctions in the asymptotic (i.e., free of scattering potential) region of space (the external Hilbert space).


Fig. 1.5 (color online) Partitioning of the scattering zone near a molecule or quantum dot underlying the Hamiltonian equation (1.66).

For some applications, this representation is suboptimal. From a computational perspective, this can happen if the Hilbert space of states in the vicinity of the scatterer (the internal or microscopic Hilbert space) is very large or complicated, so that computations do not allow us to keep explicit track of additional degrees of freedom. For example, if one is to describe the current flow through a molecule (molecular electronics) or a quantum dot, one can keep molecular states that incorporate the molecule itself plus the states of a few lead atoms. The entire contact, which encompasses $10^{23}$ atoms, can certainly not be dealt with in a computer. In more technical terms, we consider a partitioning of the system into left and right asymptotic regions, which are connected by a center region as given in Fig. 1.5 and detailed in the Hamiltonian

$$H = \begin{pmatrix} H_L & u^\dagger & 0 \\ u & H_C & v \\ 0 & v^\dagger & H_R \end{pmatrix} \tag{1.66}$$

The matrices $H_{L,R}$ comprise all the leads and are macroscopic, whereas $H_C$ describes only the scattering region and therefore should have a microscopic size. If $H_C$ is still very complicated, a formulation is desired that does not refer explicitly to the external, macroscopic Hilbert space (leads and reservoirs) but just focuses on the internal space. Roughly speaking, one would like to convert the trace over the external, channel degrees of freedom [Eq. (1.64)] into another trace, which is only over the internal space of the molecule or quantum dot. A formal way to derive such a representation employs the Keldysh technique, also referred to as the nonequilibrium Green's function method.39 For noninteracting particles it yields predictions for physical observables which are identical to the scattering theory. Similar to earlier authors,40 we employ the latter method here to derive the key formulas that underlie a great many applications of ab initio transport calculations for nanostructures.

1.8.2 Born Approximation, $\hat T$-Matrix, and Transmission Function

Consider the situation where the left and the right leads are decoupled, u = v = 0, for t < 0. As before, we denote their eigenstates by a pair of indices $|nk\rangle$ (left) and $|n'k'\rangle$ (right). When contact is established at t = 0, an initial state $|nk\rangle$ becomes unstable. It can decay into the state $|n'k'\rangle$. The rate for this process is given to lowest order by the Born approximation, which is equivalent to the familiar "golden rule" when applied to the scattering problem:

$$\tau^{-1}_{n'n}(E_n(k)) = 2\pi\,\delta\bigl(E_n(k) - E_{n'}(k')\bigr)\,\bigl|\langle n'k'|\hat T(E_n(k))|nk\rangle\bigr|^2 \tag{1.67}$$

Here, we have already refined the bare expression by introducing the $\hat T$-matrix, which makes it formally exact. How to relate $\hat T$ to the original Hamiltonian, (1.66), will be shown in Section 1.8.3. The right-going current injected in this way from a left-hand-side wire state $|nk\rangle$ into the right lead is just

$$\sum_{n'} \int dk'\, \tau^{-1}_{n'n}(E_n(k))\, f_L(E_n(k))\,\bigl[1 - f_R(E_{n'}(k'))\bigr]$$

where $f_L(E_n(k))$ is the occupation of the initial state and $1 - f_R(E_{n'}(k'))$ is a measure of the available space in the final state. The total current is the difference between all right- and left-flowing components:

$$I = e \sum_{nn'} \int dk \int dk'\, \tau^{-1}_{n'n}(E_n(k))\,\bigl[f_L(E_n(k)) - f_R(E_{n'}(k'))\bigr] \tag{1.68}$$

Comparing this expression with the Landauer formula, Eq. (1.63), we conclude that

$$T(E) = \sum_{nn'} \int dk \int dk'\, \delta(E - E_n(k))\, \tau^{-1}_{n'n}(E) \tag{1.69}$$

$$= (2\pi)^2 \sum_{nn'} \int dk \int dk'\, \delta(E - E_n(k))\,\delta(E - E_{n'}(k'))\,\bigl|\langle n'k'|\hat T(E)|nk\rangle\bigr|^2 \tag{1.70}$$

$$= \sum_{nn'} \frac{(2\pi)^2\,\bigl|\langle n'k'|\hat T(E)|nk\rangle\bigr|^2}{|v_{n'} v_n|} \tag{1.71}$$

where the last line should be complemented with $E = E_n(k) = E_{n'}(k')$. Keeping Eq. (1.64) in mind, we have the identification (up to a phase factor)

$$t_{n'n} = \frac{2\pi\,\langle n'k'|\hat T(E)|nk\rangle}{\sqrt{|v_{n'} v_n|}} \tag{1.72}$$

Equation (1.70) has a compact notation if one introduces separate traces $\mathrm{Tr}_{L,R,C}$ over the Hilbert spaces of $H_{L,R,C}$:

$$T(E) = (2\pi)^2\, \mathrm{Tr}_R\bigl[\delta(E - H_R)\,\hat T(E)\,\delta(E - H_L)\,\hat T^\dagger(E)\bigr] \tag{1.73}$$
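The step from Eq. (1.70) to Eq. (1.71) rests on the usual change of variables for a delta function of the band energy. A brief sketch, under the assumption that in each channel a single state $k_E$ with $E_n(k_E) = E$ contributes to the integral, and with $F(k)$ standing for an arbitrary smooth integrand:

```latex
\int dk\, \delta\bigl(E - E_n(k)\bigr)\, F(k)
  = \frac{F(k_E)}{\bigl|\partial E_n(k)/\partial k\bigr|_{k = k_E}}
  = \frac{F(k_E)}{|v_n|}
```

Applying this once to the $k$ integral and once to the $k'$ integral in Eq. (1.70) produces the factor $1/|v_{n'} v_n|$ of Eq. (1.71) and fixes $E = E_n(k) = E_{n'}(k')$.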


1.8.3 $\hat T$-Matrix and Resolvent Operator

We now specify how to relate $\hat T$ to the original Hamiltonian, H, detailed in Eq. (1.66). Our derivation starts with the observation that all information about transport across the center region is encoded in the resolvent operator,

$$G(z) = \frac{1}{z - H} \tag{1.74}$$

Retarded (advanced) operators are defined via $G^{\rm ret}(E) = G(E + i\eta)$ [$G^{\rm av}(E) = G(E - i\eta)$]; the matrix elements $\langle x|G^{\rm ret,av}(E)|x'\rangle$ define the Green's functions.† Actually, we care only for transfer processes, so only those matrix elements $\langle n'k'|G(z)|nk\rangle$ are of interest that connect states in the left and right leads. The corresponding off-diagonal sector of the full resolvent matrix may be obtained from an elementary matrix inversion. Its matrix elements have the property

$$\langle n'k'|G(z)|nk\rangle = \langle n'k'|\,g_R(z)\,[v^\dagger G_C(z)\,u]\,g_L(z)\,|nk\rangle \tag{1.75}$$

The matrix product that appears here inside $\langle\cdots\rangle$ has the form familiar from the Dyson equation in T-matrix notation41:

$$G = G_0 + G_0\,\hat T\,G_0 \tag{1.76}$$

where $G_0$, defined by $G_0^{-1} = z - H_0$, is the bare Green's function in the absence of an interlead coupling, u = v = 0. In Eq. (1.75) the first term in the Dyson equation is missing, since the off-diagonal matrix elements that connect different leads vanish if there is no transmission. Thus it is clear that the desired relation is just

$$\hat T(z) = v^\dagger G_C(z)\,u \tag{1.77}$$

with the resolvent operators of the central region and the leads

$$G_C(z) = \frac{1}{z - H_C - \Sigma_R - \Sigma_L} \tag{1.78}$$

$$g_{R,L}(z) = \frac{1}{z - H_{R,L}} \tag{1.79}$$

and self-energies

$$\Sigma_L(z) = u\,g_L(z)\,u^\dagger, \qquad \Sigma_R(z) = v\,g_R(z)\,v^\dagger \tag{1.80}$$

† The infinitesimal parameter η in Eq. (1.74) shifts the poles of G into the complex plane. In this way it is ensured that the density of states, $-(1/\pi)\,\mathrm{Im}\,G(E + i\eta)$, becomes a smooth function of energy. Otherwise, the Hamiltonian (1.66) could not model metallic reservoirs, which by definition have a smooth, nonvanishing density of states near the Fermi energy.
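The block relations (1.75) and (1.77) to (1.80) can be checked numerically on a small finite model. The following is only a sketch under stated assumptions: the block sizes, the random Hermitian blocks, the value of z, and all variable names are illustrative choices, not part of the derivation in the text.

```python
import numpy as np

def random_hermitian(n, rng):
    """Return a random n x n Hermitian matrix (illustrative stand-in for a lead or center block)."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / 2

rng = np.random.default_rng(0)
nL, nC, nR = 4, 3, 5                      # assumed block sizes
HL, HC, HR = (random_hermitian(n, rng) for n in (nL, nC, nR))
u = rng.normal(size=(nC, nL)) + 1j * rng.normal(size=(nC, nL))  # maps left lead -> center
v = rng.normal(size=(nC, nR)) + 1j * rng.normal(size=(nC, nR))  # maps right lead -> center

# Full Hamiltonian in the block layout of Eq. (1.66)
H = np.block([
    [HL,                 u.conj().T, np.zeros((nL, nR))],
    [u,                  HC,         v],
    [np.zeros((nR, nL)), v.conj().T, HR],
])

z = 0.3 + 0.05j                           # complex energy E + i*eta (assumed values)
G = np.linalg.inv(z * np.eye(nL + nC + nR) - H)        # Eq. (1.74)

gL = np.linalg.inv(z * np.eye(nL) - HL)                # Eq. (1.79)
gR = np.linalg.inv(z * np.eye(nR) - HR)
sigma_L = u @ gL @ u.conj().T                          # Eq. (1.80)
sigma_R = v @ gR @ v.conj().T
GC = np.linalg.inv(z * np.eye(nC) - HC - sigma_L - sigma_R)  # Eq. (1.78)

T_hat = v.conj().T @ GC @ u                            # Eq. (1.77)
G_RL_full = G[nL + nC:, :nL]                           # right-left block of the full resolvent
G_RL_dyson = gR @ T_hat @ gL                           # right-hand side of Eq. (1.75)
max_dev = np.max(np.abs(G_RL_full - G_RL_dyson))
print(f"largest deviation between both sides of Eq. (1.75): {max_dev:.2e}")
```

The center-center block of the full resolvent should likewise coincide with `GC`, which is the statement that the leads enter the central region only through the self-energies.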


Notice that $G_C$ and $\Sigma_{R,L}$ act on the Hilbert space of $H_C$ only, whereas $g_{R,L}$ acts on the spaces of $H_{R,L}$. With this result, we can rewrite Eq. (1.73),

$$T(E) = \mathrm{Tr}_C\bigl[\Gamma_L\, G_C^{\rm ret}(E)\, \Gamma_R\, G_C^{\rm av}(E)\bigr] \tag{1.81}$$

where we have introduced

$$\Gamma_L = 2\pi\, u\,\delta(E - H_L)\,u^\dagger, \qquad \Gamma_R = 2\pi\, v\,\delta(E - H_R)\,v^\dagger \tag{1.82}$$

so that $\Gamma_{R,L} = -2\,\mathrm{Im}\,\Sigma^{\rm ret}_{R,L}$.† Equation (1.81) is the desired relation. The leads appear only implicitly in the self-energies, $\Sigma_{L,R}$; they have been "integrated out."
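To illustrate how Eq. (1.81) is used in practice, the sketch below evaluates T(E) for a toy model that is not taken from the text: a one-dimensional nearest-neighbor tight-binding chain with hopping −t, whose semi-infinite leads enter only through the analytically known self-energy of a uniform chain. The function names `lead_self_energy` and `transmission` and all parameter values are assumptions made for this example.

```python
import numpy as np

def lead_self_energy(E, t=1.0):
    """Retarded self-energy of a semi-infinite uniform chain (hopping -t, on-site 0).

    Valid inside the band, |E| < 2t; it solves Sigma = t^2 / (E - Sigma) with Im Sigma < 0.
    """
    return 0.5 * (E - 1j * np.sqrt(4 * t**2 - E**2))

def transmission(E, onsite, t=1.0):
    """T(E) = Tr_C[Gamma_L G^ret Gamma_R G^av] for a chain-shaped central region."""
    n = len(onsite)
    HC = np.diag(np.asarray(onsite, dtype=complex))
    HC = HC + np.diag(np.full(n - 1, -t), 1) + np.diag(np.full(n - 1, -t), -1)
    sigma_L = np.zeros((n, n), dtype=complex)
    sigma_R = np.zeros((n, n), dtype=complex)
    sigma_L[0, 0] = lead_self_energy(E, t)     # left lead couples to the first site only
    sigma_R[-1, -1] = lead_self_energy(E, t)   # right lead couples to the last site only
    gamma_L = -2 * sigma_L.imag                # Gamma = -2 Im Sigma^ret
    gamma_R = -2 * sigma_R.imag
    G_ret = np.linalg.inv(E * np.eye(n) - HC - sigma_L - sigma_R)
    G_adv = G_ret.conj().T
    return np.trace(gamma_L @ G_ret @ gamma_R @ G_adv).real

T_clean = transmission(0.3, [0.0] * 5)                       # perfect chain
T_impurity = transmission(0.3, [0.0, 0.0, 0.8, 0.0, 0.0])    # one on-site impurity
print(T_clean, T_impurity)
```

For this model the result can be compared with textbook expressions: the perfect chain transmits fully inside the band, and a single on-site impurity of strength ε reduces the transmission to $(4t^2 - E^2)/(4t^2 - E^2 + \varepsilon^2)$.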

Remarks

• Formula (1.81) is most useful whenever (1) one can give recursive algorithms, so that $\Sigma$ can be calculated without having to deal with the full Hilbert space at a time, or (2) one can design approximations for $\Sigma$ so that it is not necessary to deal with the Hilbert space of the leads at all. One can argue that simple but highly accurate approximations can indeed be given if $H_C$ is "large enough" (i.e., comprises a sufficiently large part of the leads).

• Almost all scientific works that perform a channel decomposition begin by rewriting Eq. (1.81), which employs the matrix

$$\tau = \Gamma_L^{1/2}\, G_C\, \Gamma_R^{1/2} \tag{1.83}$$

so that by construction, $T(E) = \mathrm{Tr}_C\,\tau\tau^\dagger$. Authors interpret $\tau$ as a transmission matrix and hence identify the eigenvectors of $\tau\tau^\dagger$ as the transmission channels. We wish to point out here that this widespread practice has to be taken with a grain of salt.

1. The trace in Eq. (1.81) is over the states of the central region and not over the (transverse) Hilbert space of the leads. Ironically, this is why we have derived it in the first place. Therefore, the matrix product in $\mathrm{Tr}_C[\cdots]$ acts on a Hilbert space that is disconnected from the transverse lead space, where the product $tt^\dagger$ that appears in the Landauer formula, Eq. (1.63), lives. Hence, the channels of the leads and the eigenvectors of $\tau\tau^\dagger$ have nothing to do with each other.

2. In particular, $\tau$ should not be confused with the true transfer matrix t, given in Eq. (1.72).

3. One of the irritating artifacts that an uncontemplated adoption of this practice may prompt is related to the fact that the size of the central Hilbert space is a matter of convention. For this reason, the common channel analysis produces results that cannot be, in general, model independent. For example, the number of transmitting states (evanescent and propagating ones) may increase with the Hilbert space size. A more detailed discussion of this and related issues can be found elsewhere.42,43

† We have used $\delta(E) = (i/2\pi)[G(E + i\eta) - G(E - i\eta)]$.

1.8.4 Nonequilibrium Density Matrix

So far, we have used scattering theory to describe the current flow through a nanojunction or molecule. A very similar analysis allows us to derive even a slightly more general object, the density matrix, $\rho(x, x')$, in the presence of nonequilibrium. It is a matrix representation of the operator

$$\hat\rho = \sum_n \int dk\, |nk\rangle_r\,{}_r\langle nk|\, f_L(E_n(k)) + \sum_{n'} \int dk'\, |n'k'\rangle_l\,{}_l\langle n'k'|\, f_R(E_{n'}(k')) \tag{1.84}$$

where $|nk\rangle_r$ ($|n'k'\rangle_l$) denote the right (left)-going states emerging from the left (right) electrodes. The diagonal elements are of particular importance, since they give the particle density, $n(x) = \rho(x, x)$, at any position x:

$$\rho(x, x) = \sum_n \int dk\, |\langle x|nk\rangle_r|^2\, f_L(E_n(k)) + \sum_{n'} \int dk'\, |\langle x|n'k'\rangle_l|^2\, f_R(E_{n'}(k')) \tag{1.85}$$

In this section we repeat what we did in the previous section for the Landauer formula, but now for the density matrix. We derive an expression that gives those elements of $\hat\rho$ from the central Hilbert space only, in terms of $G_C$ and $\Gamma_{L,R}$ alone. Indeed, consider the expression for the equilibrium density per spin inside the central region:

$$n^{\rm eq}(x) = \int dE\, \langle x|\delta(E - H)|x\rangle\, f^{\rm eq}(E) \tag{1.86}$$

Employing a series of standard transformations, which rely upon nothing but the definitions given in the preceding section, we may cast it into a form that is already similar to Eq. (1.85):

$$n^{\rm eq}(x) = -\frac{1}{2i\pi} \int dE\, \langle x|G_C^{\rm ret}(E) - G_C^{\rm av}(E)|x\rangle\, f^{\rm eq}(E) \tag{1.87}$$

$$= -\frac{1}{\pi} \int dE\, \langle x|G_C^{\rm ret}(E)\,\mathrm{Im}\bigl[\Sigma_L^{\rm ret} + \Sigma_R^{\rm ret}\bigr]\,G_C^{\rm av}(E)|x\rangle\, f^{\rm eq}(E) \tag{1.88}$$

$$= \frac{1}{2\pi} \int dE\, \langle x|G_C^{\rm ret}(E)\,[\Gamma_L + \Gamma_R]\,G_C^{\rm av}(E)|x\rangle\, f^{\rm eq}(E) \tag{1.89}$$

$$= \sum_n \int dk\, \bigl|\langle x|G_C^{\rm ret}(E_n(k))\,u|nk\rangle\bigr|^2 f^{\rm eq}(E_n(k)) + \sum_{n'} \int dk'\, \bigl|\langle x|G_C^{\rm ret}(E_{n'}(k'))\,v|n'k'\rangle\bigr|^2 f^{\rm eq}(E_{n'}(k')) \tag{1.90}$$

The states $|nk\rangle$ ($|n'k'\rangle$) denote the eigenstates of the left (right) lead in the absence of a coupling, u = v = 0. Comparing Eq. (1.90) with the equilibrium limit of Eq. (1.85), $f^{\rm eq} = f_L = f_R$, suggests the identification

$$\langle x|nk\rangle_r = \langle x|G_C^{\rm ret}(E_n(k))\,u|nk\rangle \tag{1.91}$$

$$\langle x|n'k'\rangle_l = \langle x|G_C^{\rm ret}(E_{n'}(k'))\,v|n'k'\rangle \tag{1.92}$$

for point x inside the central region. The educated reader may recognize the relations above as an incarnation of the well-known Lippmann–Schwinger equation. Thus equipped, we rephrase the original expression for the density operator in the following way:

$$\hat\rho = \int \frac{dE}{2\pi}\,\bigl[G_C^{\rm ret}\,\Gamma_L\,G_C^{\rm av}\,f_L(E) + G_C^{\rm ret}\,\Gamma_R\,G_C^{\rm av}\,f_R(E)\bigr] \tag{1.93}$$

which is valid inside the central region (matrix notation suppresses the argument energy, E). This equation is the main result of the present section. Needless to say, by differentiating off-diagonal elements of $\hat\rho$, the current density and therefore also the Landauer formula may be rederived.

1.8.5 Comment on Applications

By far the largest fraction of the vast body of DFT-based transport literature employs scattering theory in the formulation of the preceding section. The logic is that one solves the KS equations (1.39) with a particle density, n(x), which is calculated from the nonequilibrium density operator (1.93), which also takes the reservoirs into account. The KS Hamiltonian is then used, in turn, to construct the central Green's function and, finally, the transmission function, (1.81), and the current, (1.63). In this final section we comment briefly on several general aspects of this research. Practical aspects of applications in spintronics and molecular electronics are highlighted in Chapters 18 and 19, respectively.

Transmission functions, T(E), are of interest mostly near the Fermi energy, $E_F$, since the zero-bias conductance is given by $G = T(E_F)$. In this region, T(E) usually is dominated by the resonances originating from just two (transport) frontier orbitals. Calculations should yield the positions $E_{\rm Ho,Lu}$ and the broadenings $\Gamma_{\rm Ho,Lu}$ of the resonances. In the case of resonances that do not interfere with others (isolated resonances), these parameters may be extracted by simply fitting a Breit–Wigner (Lorentzian) lineshape to T(E). Sometimes more complicated situations exist, where electrons can flow through the molecule via different paths that interfere with each other.44 In this case the lineshape is not just a Lorentzian, but may, for example, be of the Fano type. This structure, too, is characterized by only a few parameters, which may be extracted from a suitable fit. The numerical accuracy of both types of parameters, resonance positions and line widths, that one can get from the DFT-transport calculation depends, of course, on the approximations made in the underlying exchange correlation (XC) functional. In transport calculations additional complications arise due to the presence of the electrodes (or reservoirs), which make it necessary to find a good approximation for the self-energies $\Sigma_{R,L}$.

1.8.5.1 Self-Energies $\Sigma_{R,L}$

The self-energies are crucial for the calculation of the resonance width. This is obvious, since without them, $\Sigma_{R,L} = 0$, there would be no level broadening at all: Each transport resonance would be arbitrarily sharp. Therefore, care is needed with the construction of these objects. However, quite in contrast to a widespread perception in the scientific community, it is not necessary (and in practice not even always helpful) to perform an exact construction of $\Sigma_{R,L}$ along the lines of Eq. (1.80). This point has been made earlier19,45,46 and we rephrase it here. Consider the KS equation of the central region in the presence of a coupling to the electrodes:

$$\bigl[E - H_C - \Sigma_L(E) - \Sigma_R(E)\bigr]\,|\psi\rangle = 0 \tag{1.94}$$

The Hermitian sector of $\Sigma$ adds to the Hamiltonian $H_C$ and therefore shifts the bare eigenvalues of $H_C$. The anti-Hermitian sector, $\Gamma_{L,R}$, leads to a violation of the continuity equation; it shifts eigenvalues away from the real axis into the complex plane, thus providing a finite lifetime. The physics that is incorporated in this way is transparent: Any traveling wave that moves toward the interface between the central region and the left and right electrodes will just penetrate it without being backscattered. From the viewpoint of the central system, the interface is absorbing. It has been well known since the early days of nuclear physics that absorbing boundaries are properly modeled via optical (i.e., non-Hermitian) potentials. This is exactly what the self-energy does. With this picture in mind, it is obvious that an interface modeling of $\Sigma_{L,R}$ with the property that incident waves are fully absorbed will give the same values for positions and lifetimes of transport resonances. Therefore, as long as the boundary of the central region does not itself hinder the current flow, a modeling of $\Sigma$ in terms of an optical potential will give accurate results. All the material specifics that are contained in the exact $\Sigma_{L,R}$ matrices can readily be ignored. To meet the condition for simple modeling, in practical terms the central region should comprise pieces of the electrodes that are large enough. Then complete absorption may be achieved with a leakage rate per interface site η that is still sufficiently small to prevent feedback into the resonance energies.
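The role of the anti-Hermitian part can be made concrete with a deliberately small model. The sketch below assumes a two-site central region (on-site energy `eps`, hopping `-w`) in which each site carries an energy-independent, purely imaginary self-energy $-i\gamma/2$, that is, a simple optical potential; the model and all parameter values are hypothetical. The complex eigenvalues of $H_C + \Sigma_L + \Sigma_R$ then give the resonance positions (real parts) and widths ($-2$ times the imaginary parts), and the transmission of Eq. (1.81) peaks at those positions.

```python
import numpy as np

eps, w, gamma = 0.0, 1.0, 0.1          # hypothetical model parameters

HC = np.array([[eps, -w], [-w, eps]], dtype=complex)
sigma_L = np.diag([-0.5j * gamma, 0.0])    # optical potential on the left site
sigma_R = np.diag([0.0, -0.5j * gamma])    # optical potential on the right site
H_eff = HC + sigma_L + sigma_R             # non-Hermitian operator appearing in Eq. (1.94)

poles = np.sort(np.linalg.eigvals(H_eff))  # complex eigenvalues, sorted by real part
positions = poles.real                     # resonance energies
widths = -2 * poles.imag                   # level broadenings (inverse lifetimes)

def transmission(E):
    """T(E) = Tr[Gamma_L G^ret Gamma_R G^av] for the two-site model."""
    G_ret = np.linalg.inv(E * np.eye(2) - H_eff)
    gamma_L = -2 * sigma_L.imag
    gamma_R = -2 * sigma_R.imag
    return np.trace(gamma_L @ G_ret @ gamma_R @ G_ret.conj().T).real

print("positions:", positions, "widths:", widths)
print("T at upper resonance:", transmission(positions[1]))
print("T half a width away: ", transmission(positions[1] + 0.5 * gamma))
```

In this model the two resonances are well separated ($2w \gg \gamma$), so each one is close to an isolated Breit–Wigner line: the transmission is nearly 1 on resonance and drops to roughly one half at a detuning of half the width.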


1.8.5.2 System-Size Dependency: Separation of Scales

To the best of our knowledge, all prominent DFT-based transport codes work with approximated self-energies. Unfortunately, a systematic check of how quantitative results depend on the approximation scheme used is still not a standard procedure. If optical potentials with strength η are employed, the transmission resonances, $\Gamma$, that we would ultimately like to calculate should be invariant under a change of η by a factor of 10 or more. The existence of such an invariance is a consequence of a separation of scales. The transport resonances reflect the lifetime of a state located in that subregion ("bottleneck") of the central region, which determines the resistance (see Fig. 1.5). If the particle has escaped this region, it vanishes into the leads once and for all (in reality). To catch this aspect, the modeling parameter η has just to be big enough to prevent the model particle from returning to the bottleneck. If the size of the central region is taken sufficiently large, much larger than the bottleneck, one can allow for η , and a separation of scales has been achieved.

Remark

• Self-energies, $\Sigma$, offer a rich toolbox for including effects of reservoirs with precision without keeping a large number of degrees of freedom explicit in the calculations. Recent applications of the principle describe systems with an inhomogeneous magnetization.47 Also in this context, working with model self-energies rather than (formally) exact expressions proves reasonably accurate and highly useful.48

Acknowledgments

In this chapter I give a pedagogical introduction to the field, which has grown partly out of several lectures given at Karlsruhe University in recent years. This explicit style is at the expense of accounting for a great many interesting developments pursued by many of my colleagues. Therefore, the chapter cannot serve as—and certainly has not been meant to be—a fair and proper review of the field. Finally, it is a pleasure to thank numerous colleagues for generously sharing their insights with me. Most notably, I am indebted to Alexei Bagrets, Kieron Burke, Peter Schmitteckert, and Gianluca Stefanucci for useful discussions that took place over recent years. Also, I am grateful to Alexei Bagrets and Soumya Bera for critical proofreading of the manuscript.

REFERENCES

1. Kümmel, A.; Kronik, L. Rev. Mod. Phys. 2008, 80, 3.
2. Neese, F. Coord. Chem. Rev. 2009, 253, 526–563.
3. Hohenberg, P.; Kohn, W. Phys. Rev. 1964, 136, 864.
4. Levy, M. Proc. Natl. Acad. Sci. USA 1979, 76, 6062.
5. Gunnarsson, O.; Lundqvist, B. I. Phys. Rev. B 1976, 13, 4274; ibid., 1977, 15, 6006.
6. Mahan, G. D. Many Particle Physics, Plenum Press, New York, 2000.
7. Parr, R.; Yang, W. Density Functional Theory of Atoms and Molecules, Oxford University Press, New York, 1989.
8. Ovchinnikov, I. V.; Neuhauser, D. J. Chem. Phys. 2006, 124, 024105.
9. Kohn, W.; Sham, L. J. Phys. Rev. 1965, 140, 1133.
10. Ullrich, C. A.; Kohn, W. Phys. Rev. Lett. 2002, 89, 156401.
11. Chayes, J. T.; Chayes, L.; Ruskai, M. B. J. Stat. Phys. 1985, 38, 497.
12. Ho, K. M.; Schmalian, J.; Wang, C. Z. Phys. Rev. B 2008, 77, 073101.
13. Burke, K. The ABC of DFT, chem.ps.uci.edu, 2007.
14. Grimme, S. J. Comput. Chem. 2004, 15, 1463.
15. Janak, J. F. Phys. Rev. B 1978, 18, 7165–7168.
16. Almbladh, C.-O.; von Barth, U. Phys. Rev. B 1985, 31, 3231.
17. Perdew, J. P.; Parr, R. G.; Levy, M.; Balduz, J. L. Phys. Rev. Lett. 1982, 49, 1691.
18. Perdew, J. P.; Levy, M. Phys. Rev. Lett. 1983, 51, 1884.
19. Koentopp, M.; Burke, K.; Evers, F. Phys. Rev. B 2006, 73, 121403.
20. Dreizler, R. M.; Gross, E. K. U. Density Functional Theory, Springer-Verlag, Berlin, 1990.
21. Marques, M. A. L.; Ullrich, C. A.; Nogueira, F.; Rubio, A.; Burke, K.; Gross, E. K. U., Eds. Time-Dependent Density-Functional Theory, Springer Lecture Notes in Physics, Vol. 706. Springer-Verlag, Berlin, 2006.
22. Runge, E.; Gross, E. K. U. Phys. Rev. Lett. 1984, 52, 997.
23. van Leeuwen, R. Phys. Rev. Lett. 1998, 80, 1280.
24. van Leeuwen, R. Phys. Rev. Lett. 1999, 82, 3863.
25. Vignale, G. Phys. Rev. B 2004, 70, 201102.
26. Burke, K.; Car, R.; Gebauer, R. Phys. Rev. Lett. 2005, 94, 146803.
27. D'Agosta, R.; Di Ventra, M. Phys. Rev. B 2008, 78, 165105.
28. Hyldgaard, P. Phys. Rev. B 2008, 78, 165109.
29. Onida, G.; Reining, L.; Rubio, A. Rev. Mod. Phys. 2002, 74, 601–659.
30. Vignale, G.; Kohn, W. Phys. Rev. Lett. 1996, 77, 2037–2040.
31. Vignale, G.; Ullrich, C. A.; Conti, S. Phys. Rev. Lett. 1997, 79, 4878.
32. van Leeuwen, R. Int. J. Mod. Phys. B 2001, 15, 1969.
33. Sai, N.; Zwolak, M.; Vignale, G.; Di Ventra, M. Phys. Rev. Lett. 2005, 94, 186810.
34. Sai, N.; Zwolak, M.; Vignale, G.; Di Ventra, M. Phys. Rev. Lett. 2007, 98, 259702.
35. Jung, J.; Bokes, P.; Godby, R. W. Phys. Rev. Lett. 2007, 98, 259701.
36. Evers, F.; Weigend, F.; Koentopp, M. Phys. Rev. B 2004, 69, 235411.
37. Stefanucci, G.; Almbladh, C.-O. Europhys. Lett. 2004, 67, 14.
38. Stefanucci, G.; Almbladh, C.-O. Phys. Rev. B 2004, 69, 195318.
39. Meir, Y.; Wingreen, N. S. Phys. Rev. Lett. 1992, 68, 2512.
40. Khomyakov, P. A.; Brocks, G.; Karpan, V.; Zwierzycki, M.; Kelly, P. J. Phys. Rev. B 2005, 72, 035450.
41. Ferry, D. K.; Goodnick, S. M. Transport in Nanostructures, Cambridge Studies in Semiconductor Physics and Microelectronic Engineering, Cambridge University Press, New York, 1997.
42. Bagrets, A.; Papanikolaou, N.; Mertig, I. Phys. Rev. B 2007, 75, 235448.
43. Solomon, G. C.; Gagliardi, A.; Pecchia, A.; Frauenheim, T.; Di Carlo, A.; Reimers, J. R.; Hush, N. S. Nano Lett. 2006, 6, 2431–2437.
44. Cardamone, D. M.; Stafford, C. A.; Mazumdar, S. Nano Lett. 2006, 6, 2422.
45. Evers, F.; Arnold, A. Molecular conductance from ab initio calculations: self energies from absorbing boundary conditions, arXiv:cond-mat/0611401, Lecture Notes, Summerschool on Nano-Electronics, Bad Herrenalb, Germany, 2005.
46. Arnold, A.; Weigend, F.; Evers, F. J. Chem. Phys. 2007, 126, 174101.
47. Jacob, D.; Rossier, J. F.; Palacios, J. J. Phys. Rev. B 2005, 71, 220403.
48. Bagrets, A. Unpublished, 2009.

2

SIESTA: A Linear-Scaling Method for Density Functional Calculations

JULIAN D. GALE
Department of Chemistry, Curtin University, Perth, Australia

This chapter provides a practical overview of the basic theory required to perform density functional calculations on nanoparticles, materials, and large biological systems using the SIESTA program. This program uses discrete atomic basis sets to enable rapid interpretation of results in terms of chemical models, a feature key to many applications, including an understanding of transport properties of materials. It achieves linear scaling (the computer resources required scale linearly with system size for very large systems) using basis set confinement techniques. Many examples of the use of SIESTA are provided in Chapter 11.

2.1 INTRODUCTION

The past two decades have seen the rise of density functional theory (DFT) from a technique largely confined to solid-state physics to arguably the most popular quantum mechanical technique, embraced by chemists, geologists, and most scientific disciplines concerned with the atomic structure of nature. This popularity has arisen largely from its ability to provide a reasonable-quality description of properties at a relatively modest computational cost in comparison to traditional wavefunction theory–based approaches. Whereas DFT in its purest sense is an exact theory,1 the practical realization through modern functionals is recognized as having several limitations, including the lack of a pathway for continuous improvement of the answers in the manner possible within post-Hartree–Fock techniques. Despite such caveats, there are many systems for which density functional theory is a valuable and worthwhile approach.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


In this chapter we do not set out to critique the use of DFT, but assume that the reader has already studied Chapter 1, which covers this approach to electronic structure theory, and determined that it represents an appropriate choice to solve the problem of interest. Instead, we focus on another aspect of DFT that has led to its widespread use: the plurality of numerical implementations of the method and the availability of efficient software. Because of the focus on the density for the exchange and correlation potentials, which typically represent the most complex contributions to calculate within electronic structure theory, Kohn–Sham DFT has lent itself to a far greater diversity of practical calculation schemes. While wavefunction theory (WFT) has been dominated by the use of Gaussian basis sets to expand the eigenstates (see, e.g., Chapter 5), DFT has seen a plethora of choices, including plane waves (see Chapter 3), Slater orbitals (see Chapter 15), Gaussians, grids, finite elements, and wavelets, to name but a few.

Nanoscience has pushed experiments to the lower limits of the length scale for the fabrication of materials. Conversely, for computational methods it has led to a push toward calculations with a greater number of atoms than ever before. Given that many nanoscale phenomena are related to the effects of quantum confinement on electronic properties, this has, in particular, driven the desire to perform large-scale theoretical studies based on electronic structure techniques rather than force-field approaches. Although simplified quantum mechanical approaches, such as tight binding (see Chapter 10) or semiempirical (see Chapter 8) methods, have a valuable role to play in this realm, ideally it would be possible to use first-principles methods to ensure the reliability of results.
In light of the above and the fact that there are many different numerical schemes for density functional theory, it is possible to reconsider the choice of algorithms and ask what represents the optimal implementation for large systems. Although there will never be an unambiguous answer to this question, we can define the key characteristics of any such method. First, the method must scale with the lowest power possible of the size of the system, typically related to the number of basis functions required, N, or number of atoms. Second, the cost per basis function, which represents the prefactor, or slope of the cost versus system size, must be as low as possible. If we consider Hartree–Fock or Kohn–Sham theory specifically, there are two main steps in a calculation: the construction of the Hamiltonian matrix and determination of the eigenstates at a self-consistent field. For small system sizes in a localized basis set, such as Gaussians, the first step is the dominant expense and scales formally as $N^4$, since the Hartree energy depends on the interaction of two density contributions and therefore up to four different basis functions. However, this can be reduced to $N^3$ for Kohn–Sham theory via density fitting.2 In practice, for large systems, the scaling is typically reduced through neglect of terms against a threshold. As system size increases, the solution for the eigenstates becomes the major cost since they must be orthogonalized with respect to each other, which leads to a scaling of $N^3$.

The key to achieving improved scaling is locality, which is usually considered to be in real space. For example, if an atom were only to interact with other particles out to a given radius, then once the dimensions of the system exceed this cutoff value, the number of interactions per atom remains constant regardless of increasing size. In other words, the total cost will scale linearly with the dimensions of the system. This will be equally true regardless of whether the system is a finite nanoparticle or a periodic solid. This raises the question of whether it is likely to be feasible in electronic structure theory to confine interactions to within a finite range. Given the central role of the long-range Coulomb potential in the Hamiltonian, at first sight it might be thought that this would not be possible. However, through screening, it turns out that even such interactions lead to quite short-ranged behavior in real space, leading to the near-sightedness principle.3 For example, in an insulating or semiconducting material it is known that states decay exponentially with distance, where the rate of decay depends on the bandgap of the substance. Even metals, where there is no gap, exhibit power-law decay behavior. Provided that it is possible to reformulate density functional theory in a way which ensures that both the generation and solution of the Kohn–Sham equations exploit the inherent locality that exists in many systems, it should be possible to achieve linear scaling of the computational expense for large enough problems. The challenge then becomes to lower the prefactor (i.e., cost per atom) sufficiently that the crossover point at which such algorithms become more efficient than traditional ones is as low as possible. Linear-scaling methods will only be of value if this occurs for numbers of atoms that are currently accessible and of interest for scientific study. Although the specific crossover point can vary strongly according to the details of the method, linear-scaling methods typically become competitive with established algorithms for a few hundred atoms in density functional theory.
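The counting argument behind linear scaling can be illustrated with a toy calculation. The sketch below is not SIESTA code; it assumes a hypothetical one-dimensional chain of equally spaced atoms and a hypothetical cutoff radius, and simply counts how many atom pairs lie within the cutoff as the chain grows. The function name `interacting_pairs` and the parameter values are illustrative.

```python
def interacting_pairs(n_atoms, spacing=1.0, r_cut=3.5):
    """Count pairs (i < j) of atoms on a 1D chain separated by at most r_cut."""
    positions = [i * spacing for i in range(n_atoms)]
    pairs = 0
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            if positions[j] - positions[i] > r_cut:
                break  # positions are sorted, so no farther atom can be in range
            pairs += 1
    return pairs

for n in (100, 1000, 10000):
    local = interacting_pairs(n)
    all_pairs = n * (n - 1) // 2      # pair count without any cutoff
    print(n, local, round(local / n, 3), all_pairs)
```

With a cutoff the number of pairs per atom approaches a constant (here 3), so the number of nonzero interactions grows linearly with the number of atoms, whereas the count of all pairs grows quadratically.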
Having set the scene, the objective of this chapter is to present an overview of one approach for achieving linear-scaling density functional theory, known as the SIESTA methodology4 and embodied in the code of the same name. This is just one of several possible methods, and a list of some of the other most widely used candidates is given in Table 2.1. It would take too long to review the relative strengths and weaknesses of each particular implementation. However, the main differences between methods usually involve a compromise between the ability to have a systematically improvable basis set (in the manner that is possible with plane waves) and the lowering of the prefactor of the linear scaling, which requires the most compact basis set representation. To place the SIESTA approach in context, it targets the lowest prefactor by using physically motivated basis functions while sacrificing the arbitrary convergence with respect to the size of the basis. The aim of this chapter is to provide a conceptual and practical guide to the use of SIESTA that will be useful to those encountering the program that implements the methodology for the first time. For full mathematical details of the SIESTA methodology we refer the reader to the original manuscripts where these can be found.4 Although the focus will be specifically on SIESTA, it is hoped that an understanding of the motivation and background will also be valuable to those wishing to engage in linear-scaling electronic structure theory, regardless of the particular implementation.

SIESTA: A LINEAR-SCALING METHOD FOR DFT

TABLE 2.1 Various Methodologies for Linear-Scaling Density Functional Theory, Classified According to the Nature of the Basis Functions(a)

Basis Set                    Implementation        Availability
Gaussian atomic orbitals     FreeOn (MONDO set)    GPL
                             GAUSSIAN              Commercial(c)
                             Q-CHEM(b)             Commercial
Gaussians/plane waves        QUICKSTEP             GPL
Numerical atomic orbitals    SIESTA                Free to academics
                             PLATO                 Contact authors
                             OpenMX                GPL
Blips                        CONQUEST              Contact authors (GPL proposed)
Periodic sinc functions      ONETEP                Commercial

Ref.: 5, 6, 7, 8, 9, 10, 11, 12

(a) Note that this tabulation aims to highlight the most widely known implementations rather than being exhaustively comprehensive. It is also subject to constant change due to developments in the field.
(b) The construction of the Fock matrix can be linear scaling, but diagonalization is used to solve the SCF.
(c) Features required for a fully linear-scaling calculation may not be available in the distributed version.

2.2 METHODOLOGY

2.2.1 Density Functional Theory

The fundamentals of density functional theory were outlined in Chapter 1, so only a concise statement of the relevant aspects is made here. For the purposes of the present discussion, we focus solely on the Kohn–Sham formulation of DFT, where a set of orthogonal wavefunction-like one-electron states is introduced to facilitate calculation of the kinetic energy, and the exchange-correlation potential is formulated as a local functional of the density and, where appropriate, its gradient. Thus, we will consider the linear-scaling implementation of the local density approximation (LDA) and the generalized-gradient approximation (GGA) formulations of DFT.13 Extension to other forms of approximation, such as metaGGAs,14 hybrid functionals,15 or LDA + U16 is possible, but beyond the scope of the present chapter.

2.2.2 Pseudopotentials

When solving for the electronic structure of a system, in principle, all electrons must be included since they contribute to the potential experienced by other particles and determine the nodal structure of the eigensolutions. In practice, it is intuitive that the core electrons of an atom are weakly perturbed by chemical changes to the geometry and bonding arrangements, in comparison to the valence electrons, and therefore, several approximate methods have evolved to treat these core states in order to reduce computational expense. At the simplest level, the frozen-core approximation can be made in which the occupancy of the core states is fixed to remove them from the self-consistent procedure. Alternatively, the core electrons and nucleus, which have opposite sign charges and therefore partially cancel each other, can be replaced by a combined effective potential, known as a pseudopotential.

In brief, the concept of a pseudopotential is that it replaces the exact potential due to nucleus and core electrons, within a given radius of the atomic center, by an effective potential. Within this distance, known as the core radius, the potential is smoothed and tends to a finite value at the nucleus while matching the true potential at the boundary. Due to the smoothing of the potential, the radial nodes of the valence states are lost in the core region since there is no longer a requirement to maintain orthogonality to the core states. In nearly all cases, a nonlocal pseudopotential is used, which implies that there is a different potential for each angular momentum channel l, with a separate core radius, rcore, appropriate to that channel. Outside the core radii, all channels, regardless of angular momentum, experience exactly the same potential, known as the local component. Thus, the nonlocal contribution to the pseudopotential acts only within a small spherical region close to the nucleus. Nonlocal pseudopotentials are most commonly formulated according to the prescription of Kleinman and Bylander.17 While in many implementations the local component of the pseudopotential is chosen to be one of the angular momentum channels, there is no requirement to do so. Indeed, SIESTA exploits the freedom to select the local component independently and chooses the potential that results from the smooth electron density:

$$\rho_{\mathrm{local}}(r) \propto \exp\left\{-\left[\frac{\sinh(1.82\,r/r_{\mathrm{core}})}{\sinh(1)}\right]^{2}\right\} \qquad (2.1)$$
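To get a feel for how quickly the smooth density of Eq. (2.1) falls off, it can be evaluated directly. This is a minimal sketch: the function name `rho_local` is hypothetical, the density is left unnormalized (the equation only fixes it up to a constant), and the core radius of 1.9 a.u. is an illustrative value rather than a recommended setting.

```python
import math


def rho_local(r, r_core):
    """Unnormalized smooth local charge density of Eq. (2.1)."""
    x = math.sinh(1.82 * r / r_core) / math.sinh(1.0)
    return math.exp(-x * x)


r_core = 1.9  # a.u.; illustrative value only
for fraction in (0.0, 0.5, 1.0, 1.5):
    # Print the density at multiples of the core radius, relative to r = 0.
    print(fraction, rho_local(fraction * r_core, r_core))
```

At r = rcore the density has already dropped to roughly 0.1% of its value at the nucleus, which illustrates why the resulting local potential is well localized around the core region.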

The construction of a pseudopotential generally involves satisfying at least four criteria:

1. Boundary matching. Beyond the core radius, the all-electron and pseudo-wavefunctions must match for each angular momentum channel.
2. Smoothness. Within the core radius, the pseudovalence wavefunction should have no radial nodes.
3. Eigenvalue matching. The eigenvalues for the pseudopotential problem must match the all-electron values for the atomic reference state chosen.
4. Norm conservation. The integral of the valence electron density from the nucleus to the core radius must be equal in the pseudopotential and all-electron cases.
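The four criteria above can also be stated compactly in symbols. The notation in this sketch is assumed for illustration and does not come from the chapter: R_l denotes the radial wavefunction of angular momentum channel l, the superscripts AE and PS label the all-electron and pseudopotential problems, and r_core,l is the core radius of that channel.

```latex
% Assumed notation (not from the chapter):
%   R_l^{AE}, R_l^{PS}   all-electron and pseudo radial wavefunctions, channel l
%   \varepsilon_l        eigenvalue for the chosen atomic reference state
%   r_{core,l}           core radius of channel l
\begin{align*}
\text{1. Boundary matching:}\quad
  & R_l^{\mathrm{PS}}(r) = R_l^{\mathrm{AE}}(r), \qquad r \ge r_{\mathrm{core},l} \\
\text{2. Smoothness:}\quad
  & R_l^{\mathrm{PS}}(r)\ \text{has no radial nodes for}\ 0 < r < r_{\mathrm{core},l} \\
\text{3. Eigenvalue matching:}\quad
  & \varepsilon_l^{\mathrm{PS}} = \varepsilon_l^{\mathrm{AE}} \\
\text{4. Norm conservation:}\quad
  & \int_0^{r_{\mathrm{core},l}} \left| R_l^{\mathrm{PS}}(r) \right|^2 r^2 \, dr
    = \int_0^{r_{\mathrm{core},l}} \left| R_l^{\mathrm{AE}}(r) \right|^2 r^2 \, dr
\end{align*}
```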

Other conditions may also be imposed; for example, the logarithmic derivatives and their first energy derivatives may also be required to match outside the core region.18 An all-electron and a pseudo-wavefunction are compared in Fig. 2.1. Although the conditions noted above are necessary for most pseudopotentials, they do not lead to a unique definition of the form the potential should take, so numerous schemes for the generation of pseudopotentials have arisen. In the case of SIESTA, pseudopotentials are usually generated through the use of a separate program known as ATOM, which presently supports three types of pseudopotential: improved Troullier–Martins (TM2),19 Hamann–Schlüter–Chiang (HSC),18 and Kerker.20 Of these, the Troullier–Martins scheme has become the standard choice for use with SIESTA.

In the plane-wave community, the use of pseudopotentials is almost mandatory for practical calculations since the effective potential is smoothed out and the nuclear cusp removed, thereby drastically reducing the number of basis functions required to construct the Fourier expansion of the eigenstates. Even when working with localized orbitals there are some benefits to the use of pseudopotentials, aside from the reduction of the number of electrons and orbitals. The core electrons are much more strongly bound than the valence electrons and therefore dominate the total energy. Because electronic structure calculations often rely on computing small energy differences between large total energies, the inclusion of the core electrons can decrease the level of numerical precision in such quantities. Furthermore, as the atomic number of an element increases, it becomes important to correct the calculation for relativistic effects, which most strongly affect the core electrons. Through the use of a pseudopotential it is possible to

Fig. 2.1 All-electron (solid line) versus pseudovalence state (- - -) for the silicon 3s orbital. The core radius for the 3s state is 1.9 a.u. For comparison, a poorly constructed pseudo-3s state (– · –) is included for the case when the core radius is too small (1.1 a.u.), leading to an inner maximum. [Plot axes: wavefunction versus radius (a.u.).]

subsume the majority of the relativistic effects into the effective potential, such that a fully relativistic calculation is required for the isolated atom only during generation of the pseudopotential, rather than for the entire problem. Of course, it is important to note that some relativistic effects, such as spin-orbit coupling, must be taken into account explicitly when necessary.

Recent years have seen a number of developments in the area of pseudopotentials with the advent of the ultrasoft pseudopotential (USP)21 and projector augmented wave (PAW)22 methods. For USPs, the requirement of norm conservation is relaxed and this is compensated for by the addition of an augmentation charge density. The PAW approach focuses on the augmentation of the wavefunction, rather than the density, and thus makes it possible to recover all-electron properties in the frozen core limit. Both methods lead to a dramatic reduction in the reciprocal space cutoff associated with the pseudopotential, which greatly accelerates the computation. In the case of SIESTA, which as we shall see works with basis functions localized in real space, there is likely to be little benefit associated with a switch to either of these more contemporary pseudopotential types, while the complexity of implementation is greatly increased. Consequently, SIESTA continues to employ norm-conserving pseudopotentials, which are generally more robust and easier to construct (see, e.g., an article by Bilić and Gale23).

Although it is impossible to give a comprehensive guide to the generation of pseudopotentials, some important general guidelines can be given.

2.2.2.1 Choice of Electronic Configuration

When generating a norm-conserving pseudopotential it is necessary to specify an atomic configuration whose eigenvalues and wavefunctions will be reproduced outside the core region. Usually, this is chosen to be the ground state for the isolated atom.
However, for the study of ionic materials there may be merit in using a positively ionized state if this is closer to the real oxidation state of the cation. Although, in principle, a pseudopotential is supposed to be transferable across a range of charge states, it will be most accurate close to the state for which it is generated. In the case of anions in ionic materials (e.g., the oxide ion), it is not generally a good idea to use the negatively charged state since this will be very diffuse and may be unbound (as is the case for O²⁻).

2.2.2.2 Choice of Functional

It is important to use the same density functional for generation of the pseudopotential as you intend to employ in the explicit valence calculation. Although the use of an LDA pseudopotential in a GGA calculation can often lead to fortuitously good results with respect to experimental data, it is important to remember that the objective is to reproduce the all-electron limit for a single given functional.

2.2.2.3 Choice of Core Radius

The general guiding principle in the choice of the core radius is that a larger radius leads to a softer (and for plane waves, therefore more efficient) pseudopotential, whereas a smaller radius should ensure greater transferability and reliability. Beyond this broad statement, there are a number of limitations on the upper and lower bounds to the core radius. If the radius becomes too large, there is a risk that the core regions of two adjacent atoms might overlap and this would invalidate the calculation. On the lower bound, the core radius must lie farther from the nucleus than the last radial node of the all-electron wavefunction; otherwise, the removal of nodal structure will not be possible. In practice, making the core radius too small can lead to spurious features in the pseudo-wavefunction, such as inner maxima, due to enforcement of the norm-conservation condition (see Fig. 2.1 for an example of what happens as the core radius becomes too small). The optimal choice for the core radius usually will lie close to the outer maximum in the all-electron wavefunction. With the Troullier–Martins construction scheme, the core radius can lie outside the maximum, and the wavefunction will still be well reproduced beyond the turning point.

2.2.2.4 Choice of Core–Valence Split

For many elements, especially those toward the right-hand side of the periodic table, there is no ambiguity as to the valence electrons of an atom. However, for quite a large number of elements there may be cause for careful consideration, depending on the material to be studied. For example, aluminum has the electron configuration [1s² 2s² 2p⁶]3s² 3p¹, where the brackets delimit the conventional core electrons. If one were to perform a study of aluminum nanoparticles, for example, only including the 3s and 3p electrons in the valence would be a reasonable choice, since the atom is close to the charge neutral state. However, if one were instead to study the material Al₂O₃, where the nominal oxidation state is Al(III), the 3s and 3p electrons have been largely ionized. Here the 2p electrons then become the highest occupied state of aluminum, and the conventional choice of valence would lead to a poor pseudopotential description. For elements toward the beginning of a new block of the periodic table, it is therefore necessary to modify the pseudopotential choice to allow for these semicore states.

2.2.2.5 Evaluating Pseudopotential Accuracy

A good indicator as to whether semicore states need to be included is whether there is any significant overlap between the electron density of the valence and core electrons (see Fig. 2.2, which shows the case of Fe where there is significant overlap between the 4s/3d states and the underlying 3s/3p). There are two common methods for handling semicore states: either the electrons can be explicitly included in the calculation, or partial core corrections can be applied.24 Partial core corrections, also known as nonlinear core corrections, aim to correct for the fact that the exchange-correlation potential depends on the total electron density and is therefore not readily separable into core and valence contributions if there is any overlap of the density between regions. To handle this, partial core corrections operate by including a smooth piece of frozen electron density that matches the exact core density down to a given radius and then tends smoothly to zero at the nucleus. This density is then added back during calculation of the exchange-correlation potential to capture the nonlinearity in the region of density overlap. Note that this extra density

Fig. 2.2 Electron density for an iron atom, showing the all-electron curve (core contribution in - - - and valence in – – -), the valence-only contribution from the pseudopotential-generated orbitals (solid line), and the partial core correction density (– · –) as a function of radius (in a.u.). Note the overlap between the core and valence densities in the region between 0.2 and 0.7 a.u. that leads to the need for partial core correction. [Legend: AE core charge, AE valence charge, PS core charge, PS valence charge.]

is not included in the norm-conservation requirement of the pseudopotential. The choice of the radius for the partial core corrections is a compromise between being small enough to describe sufficient core electron density and large enough to minimize the computational work associated with evaluating accurately the exchange-correlation potential for the combined density. While for plane-wave methods the use of partial core corrections is often the preferable approach to semicore states since it reduces the size of the basis set significantly, for the SIESTA method the two approaches are similar in cost, and therefore the use of explicit semicore states may be favored.

Having generated a new pseudopotential and inspected its properties visually to check that there are no untoward characteristics, the next important step is to test it by comparing the energies for changes in atomic state between the all-electron- and the pseudopotential-based calculation. Configurations for testing might usually include ionization from the various valence orbitals, as well as promoting electrons from one angular momentum to another. If the pseudopotential passes this examination, it is ready for validation in a full calculation of a molecule or solid.

2.2.3 Basis Sets

Numerical solution of the Kohn–Sham equations is performed by expanding the orbitals or bands in terms of a set of computationally convenient mathematical functions: the basis set. The coefficients that determine how much these functions contribute are found by applying the variational principle. As mentioned in the introduction, there are many possible choices that could be made for the basis set, although Gaussians25 have dominated the molecular community while plane waves have been the de facto standard for solid-state physics. In choosing the optimal basis set for large linear-scaling calculations, we are guided by the need for locality in real space and the requirement to minimize the number of basis functions needed to obtain reasonable numerical precision. Clearly, a physically motivated basis set that takes into account the shape of atomic orbitals will best satisfy the latter criterion. If pseudopotentials of the form described in the preceding section are employed, then neither existing Slater nor Gaussian basis sets will be of the correct form, due to the modification of shape in the nuclear region.

Taking the discussion above into account, it can be seen that the optimal compact basis set is to work with exact solutions to the pseudopotential form of the atomic problem, provided that they can be represented. Following the approach taken by other researchers, such as Becke and Dickson26 in the NUMOL code and Delley27 in DMol, the basis set can conveniently be represented by a numerical tabulation rather than a specific, but approximate, analytical form. In the SIESTA methodology, the standard choice of basis set is pseudoatomic orbitals (PAOs), $\phi^{A}_{nlm}$ for atom A, which are tabulated on a logarithmic radial grid for each angular momentum and then multiplied by the appropriate spherical harmonics:

$$\phi^{A}_{nlm}(r, \theta, \varphi) = R^{A}_{nl}(r)\, Y_{lm}(\theta, \varphi) \qquad (2.2)$$

These PAOs can be determined conveniently during generation of the pseudopotential and represent a “perfect” basis set for describing the isolated atom. While the PAOs above decay rapidly with distance, as do other atomic-centered basis functions, they only tend asymptotically to zero at infinite radius. To achieve linear scaling it is necessary to impose on the Hamiltonian strict locality in real space. The most common approach to achieving this is to introduce a drop tolerance in some form and to neglect integrals when they fall below a certain magnitude. However, this is fundamentally unappealing since it corresponds to modifying the Hamiltonian being solved, although this may be a philosophical point rather than a practical difficulty. In the SIESTA methodology, an alternative approach is taken in which the basis functions are localized rather than modifying the Hamiltonian. Following the fireball concept of Sankey and Niklewski,28 the eigenfunctions of the pseudoatomic problem are found within the confines of a spherical boundary at which the potential becomes infinite. In this way, the tails of the PAOs are modified such that they go rigorously to zero at a given radius, as shown in Fig. 2.3. This radius, rc , can be selected to be different for each angular momentum. Radial confinement is clearly an approximation, but it allows a choice to be made readily between higher precision, corresponding to large rc , or greater computational efficiency as the radius decreases. Although there is the flexibility

Fig. 2.3 (A) Pseudoatomic orbitals (PAOs) for oxygen 2s, illustrating the shape for the unconfined orbital (solid line), hard confinement with an energy shift of 0.02 Ry (- - -), and soft confinement with an energy shift of 0.02 Ry, a potential V0 of 50 Ry, and a radius of soft confinement commencement of 0.8 times the hard confinement radius (– · –). (B) Close-up of the region where the confined orbitals approach the cutoff radius of 3.2 a.u. [Both panels plot wavefunction versus radius (a.u.).]

to choose an individual radius for each orbital in the valence of every atom, it is preferable to have a more systematic method for selecting radii. Choosing a single fixed radius of confinement for all atoms is obviously not a sensible approach, since atoms with different atomic radii will be affected to varying extents. Hence, the calculation would be biased toward the precise description of light atoms. When an orbital is radially confined, its energy increases with respect to the free atom. Therefore, a natural concept to aid in the selection of appropriate radii is the energy shift. Here, a single energy value is specified for all atoms and the radius of confinement found that raises the energy of each orbital by this amount. Typically, energy shifts in the range 0.001 to 0.02 Ry (1 Ry = 0.5 Ha ≈ 13.6 eV) are useful depending on whether precision or speed is being sought, respectively. As with all approximations, it is important to test the consequences of a given choice for the specific property of interest before proceeding. Although the default energy shift–based scheme provides a good first estimate of the radii in many cases, there are alternative approaches to refining the truncation of the orbitals.

2.2.3.1 Soft Confinement

In the default confinement scheme the orbital goes to zero at the cutoff radius. However, there is a discontinuity in the derivatives of the orbital, which can lead to difficulties during structural optimization and more acutely during phonon calculations. The solution to this problem is to use a potential that tends asymptotically to infinity in a smooth manner rather than applying a discontinuous hard-wall potential.29 The form of the potential currently used is

$$V_{\mathrm{soft}}(r) = V_0\, \frac{e^{-(r_c - r_s)/(r - r_s)}}{r_c - r} \qquad (2.3)$$

This introduces two new parameters that determine the shape of the basis set tail by determining the radius at which the potential begins, rs, and the magnitude, V0.

2.2.3.2 Basis Set Enthalpy

In a further alternative scheme, an external pressure, Pext, can be applied to the atomic orbitals. This leads to determination of the radii through the associated enthalpy by adding a Pext V term to the intrinsic energy, where V represents the volume of the confinement sphere.30 Under this scheme, the confinement radii now correspond to equal hardness among the basis functions, rather than energy perturbation.
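The energy-shift idea described earlier in this section (confine the orbital, observe how much its energy rises, and choose the radius that gives a prescribed rise) can be illustrated with a toy model. The sketch below is not how SIESTA generates its PAOs: it assumes a bare hydrogen 1s orbital with the full Coulomb potential instead of a pseudo-atom, a hard wall instead of soft confinement, and a simple Numerov shooting scheme with bisection. All function names and numerical settings are hypothetical.

```python
E_FREE = -0.5  # Ha, exact 1s energy of the free hydrogen atom


def node_inside(energy, r_wall, h_target=0.005):
    """Integrate u'' = -k(r) u with k = 2 (E + 1/r) (hydrogen, l = 0, Hartree
    units) outward with the Numerov method; return True if u changes sign
    at or before r_wall, i.e. if `energy` is above the confined ground state."""
    n_steps = max(int(round(r_wall / h_target)), 10)
    h = r_wall / n_steps
    c = h * h / 12.0

    def k(r):
        return 2.0 * (energy + 1.0 / r)

    def series(r):
        # Small-r expansion of the regular solution: u = r - r^2 + (1 - E) r^3 / 3
        return r - r * r + (1.0 - energy) * r ** 3 / 3.0

    u_prev, u_curr = series(h), series(2.0 * h)
    k_prev, k_curr = k(h), k(2.0 * h)
    for i in range(3, n_steps + 1):
        k_next = k(i * h)
        u_next = (2.0 * u_curr * (1.0 - 5.0 * c * k_curr)
                  - u_prev * (1.0 + c * k_prev)) / (1.0 + c * k_next)
        if u_next <= 0.0:
            return True
        u_prev, u_curr = u_curr, u_next
        k_prev, k_curr = k_curr, k_next
    return False


def confined_energy(r_wall, tol=1e-7):
    """Ground-state energy (Ha) of hydrogen inside a hard sphere of radius
    r_wall (assumed >= about 1 a.u. so that the upper bracket is valid)."""
    lo, hi = E_FREE, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if node_inside(mid, r_wall):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def radius_for_shift(shift_ry, r_lo=1.5, r_hi=12.0, tol=1e-3):
    """Confinement radius (a.u.) that raises the 1s energy by shift_ry (Ry)."""
    while r_hi - r_lo > tol:
        r_mid = 0.5 * (r_lo + r_hi)
        shift = 2.0 * (confined_energy(r_mid) - E_FREE)  # Ha -> Ry
        if shift > shift_ry:
            r_lo = r_mid  # too tightly confined, move the wall outward
        else:
            r_hi = r_mid
    return 0.5 * (r_lo + r_hi)


for r_wall in (2.0, 4.0, 6.0):
    e_conf = confined_energy(r_wall)
    print(r_wall, e_conf, 2.0 * (e_conf - E_FREE))  # radius, E (Ha), shift (Ry)
print(radius_for_shift(0.02))  # radius giving a 0.02 Ry energy shift
```

A useful sanity check for this toy model is the wall at 2 a.u., where the confined ground state coincides with the inner lobe of the free 2s orbital and the energy is exactly -0.125 Ha. The energy shift falls rapidly as the wall moves outward, which is why a single shift value translates into different radii for orbitals of different spatial extent.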

Occasionally, it may be beneficial to intervene manually in the choice of radii. For example, in the case of negatively charged species such as the oxide ion, which is nominally O²⁻, the radii determined by typical energy shift values as being appropriate for a neutral oxygen atom may be too confined to allow a good description of the anion in an ionic crystal.

Although the formulation of PAOs above provides a good starting point for a basis set, it is well known that increased variational freedom is required to allow the system to respond to the changes associated with chemical bonding, external fields, or other perturbations to the electronic structure. In the Gaussian community this is achieved through the use of multiple-zeta basis sets, where one or more Gaussians (usually, the outermost function) are decontracted from the Slater-type orbital to allow the effective atom size to respond to its environment. When working with a numerical representation of the valence orbitals on a radial grid, there is no equivalent means of creating distinct “zetas.” Indeed, there is the flexibility to choose any arbitrary partitioning of the valence orbital into multiple components. From experience it is known that the objective is to allow the outermost part of the radial function to vary independently of the inner part while maintaining the smoothness of the basis functions. In the current SIESTA methodology, the division of the radial function into multiple components is achieved using the split-norm concept. Here a second, or higher, radial function is designed to possess the same tail as the full valence orbital, $\phi_l^{1\zeta}$, outside a split radius, rs, while inside this value it decays according to a polynomial to be zero at the nucleus:

$$\phi_l^{2\zeta}(r) = \begin{cases} r^{l}\,(a_l - b_l r^{2}) & r < r_s \\ \phi_l^{1\zeta}(r) & r \ge r_s \end{cases} \qquad (2.4)$$

The polynomial coefficients are determined by matching the function and its derivative at the split radius. If this new function is subtracted from the original valence orbital, the result is a contracted basis function that goes to zero at the split radius. Motivated by similar arguments to the use of the energy shift, the split radius is usually chosen indirectly by specifying the norm of the valence state to be included in the outer function. Typically, an outer zeta should contain on the order of 15% of the total norm. For hydrogen, in a double-zeta basis set, a value closer to 50% can prove more effective, given that the variation in effective size between a neutral hydrogen atom and a proton-like state can be particularly extreme. Conversely, very small values for the split norm can represent a poor choice since their effect is negligible and can lead to linear-dependence issues in the basis set.
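The matching conditions behind Eq. (2.4) can be worked through numerically. In the sketch below, the first-zeta orbital is a stand-in Slater-like function r^l exp(-r) rather than a real PAO, the norm is assumed to be the integral of the squared radial function weighted by r², and every function name is hypothetical. Requiring the polynomial r^l (a_l - b_l r²) to reproduce the value f and slope f' of the first-zeta orbital at rs gives b_l = (l f / rs - f') / (2 rs^(l+1)) and a_l = f / rs^l + b_l rs².

```python
import math


def split_norm_coefficients(l, r_split, value, slope):
    """Return (a_l, b_l) so that r**l * (a_l - b_l * r**2) matches the given
    value and slope of the first-zeta orbital at r_split."""
    b = (l * value / r_split - slope) / (2.0 * r_split ** (l + 1))
    a = value / r_split ** l + b * r_split ** 2
    return a, b


def model_first_zeta(l):
    """Illustrative stand-in for a PAO: phi(r) = r**l * exp(-r), plus its derivative."""
    def phi(r):
        return r ** l * math.exp(-r)

    def dphi(r):
        return (l / r - 1.0) * phi(r)

    return phi, dphi


def make_second_zeta(l, r_split, phi, dphi):
    """Build the second-zeta function of Eq. (2.4) from a first-zeta orbital."""
    a, b = split_norm_coefficients(l, r_split, phi(r_split), dphi(r_split))

    def phi2(r):
        if r < r_split:
            return r ** l * (a - b * r * r)
        return phi(r)

    return phi2


def tail_norm_fraction(phi, r_split, r_max=30.0, n=30000):
    """Fraction of the integral of phi(r)**2 * r**2 lying beyond r_split (midpoint rule)."""
    h = r_max / n
    total = tail = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        w = phi(r) ** 2 * r * r * h
        total += w
        if r >= r_split:
            tail += w
    return tail / total


l, r_split = 0, 2.0
phi, dphi = model_first_zeta(l)
phi2 = make_second_zeta(l, r_split, phi, dphi)


def contracted(r):
    """First-zeta minus second-zeta: vanishes identically beyond the split radius."""
    return phi(r) - phi2(r)


print(tail_norm_fraction(phi, r_split))  # norm carried by the shared tail
print(contracted(1.0), contracted(2.5))  # nonzero inside, zero outside
```

In practice the logic runs the other way around: the desired tail norm is specified and the split radius is searched for, for example by bisection on `tail_norm_fraction`.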
There are several things to note regarding the choice of the split-norm approach to increasing the radial variational freedom of the basis set. As already pointed out, this is just one possible choice and there are many other possible approaches. In the all-electron numerical methodology of Delley,27 an alternative strategy is employed in which the basis functions for charged atomic states are used for the additional radial functions to describe more contracted environments. Alternatively, one could use extra Gaussian functions to mimic a standard multiple-zeta basis set from conventional molecular quantum mechanics.31 A strength of the split-norm approach is that the operation can be applied as many times as desired to create a basis set of arbitrary size in a systematic fashion. Usually, a double- or triple-zeta basis is sufficient unless trying to achieve plane-wave levels of numerical convergence. We should note that the use of the terms double zeta (DZ) and triple zeta (TZ) is a matter of conforming to the nomenclature that has arisen in the Gaussian community, although strictly speaking it is incorrect since there are no “zetas” (i.e., Gaussian exponents) in the present approach. In the terminology of Delley, the basis sets are referred to more correctly as double numeric (DN), triple numeric (TN), and so on.

It may be questioned whether an approach that allows atoms to adopt a smaller effective radius, but not a larger one, is always sufficient. The answer is usually in the affirmative. If the minimal basis set is constructed for the neutral atom, when an atom is placed in a crystal, or even in a molecule, the rate of decay of the valence states will usually be increased by Pauli repulsion due to the neighboring atoms. Hence, a shorter-range basis set is generally appropriate, although with some exceptions. Although the split-norm approach provides increased radial variational freedom, there is also the need to consider angular augmentation of the basis set. For example, a minimal basis set for hydrogen would only include the 1s orbital, but the moment an external field is applied, or the hydrogen forms a covalent bond to another atom, there is a need to describe asymmetric contributions to the electronic structure about the hydrogen nucleus. Therefore, it is necessary to include basis functions of higher angular momentum than those from the occupied valence states alone, and these are known as polarization functions. Typically, functions with a value of the angular momentum quantum number, l , one higher than that of the highest occupied state, are needed as a minimum requirement for a reliable description of the electronic structure (i.e., 2p for H, 3d for C, 4f for Fe, etc). Hence, the default basis set, and minimum recommended quality, for SIESTA would be double-zeta polarized (DZP). Although some special cases, such as bulk silicon, are relatively well described with a minimal basis set, these are the exceptions rather than the rule. The key question with polarization functions is how to obtain the radial form of these basis functions. 
Unfortunately, the excited states of the pseudopotential atomic problem tend to be either rather extended in space or even unbound, and therefore taking the hard-confined unoccupied orbitals as basis functions can often be unsatisfactory. In an attempt to circumvent this problem, the default method for the generation of polarization functions uses perturbation theory. By applying an electric field to the atomic problem, states of higher angular momentum are created, and these are taken as the polarization functions. The choice of good polarization functions is the most difficult part of the basis set creation and is often responsible for lower-quality results, as can be demonstrated in an example. If we consider the comparison of results for the molecule SO₂, as obtained using the default DZP basis set in SIESTA and from the use of the same density functional with a range of standard Gaussian basis sets, it can be seen that there is some discrepancy (Table 2.2). If instead of using the default polarization functions, the shape of the radial part of this basis set is tuned by using a soft-confinement potential to lower the energy of the system variationally, a significant improvement is achieved. Indeed, the results for the DZP basis set are now very close to those for the equivalent Gaussian basis set. While default basis sets can be generated within the SIESTA methodology, according to the energy shift, split-norm, and perturbative polarization function schemes described above, there is also a possibility for the user to control the basis fully. Accordingly, there are methods to tune the basis set performance in a number of ways.

TABLE 2.2 Comparison of Optimized Structural Parameters for the Molecule SO₂ with the PBE Functional as a Function of Basis Set Quality

Basis Set                       r(S–O) (Å)    ∠(O–S–O) (deg)
STO-6G                          1.628         107.40
6-31G                           1.634         114.67
6-311G                          1.630         114.66
6-31G*                          1.483         119.34
6-311G*                         1.477         119.04
DZP (standard/0.01 Ry)          1.509         118.71
DZP (optimized polarization)    1.482         119.34

2.2.3.3 Charge State

By default the basis set is generated for the reference state used in pseudopotential generation. However, a charge on a species can also be specified during basis set creation. Here a positive charge will lead to more contracted basis functions, while a small negative charge will result in more diffuse PAOs. Note that a large negative charge would not be sensible since species become formally unbound.

2.2.3.4 Variational Optimization

The experience of other communities that have adapted molecular basis sets to the solid state shows that optimization of the basis set parameters with respect to the total energy of a target material can improve the results substantially.32 Although compromising the transferability, it allows the best results to be obtained for a particular problem while maintaining a low prefactor for the computational cost.

As with all numerical approximations, it is important to test the influence of basis set quality before embarking on any scientific study. While DZP should be adequate to obtain at least qualitatively correct results for most problems, this should not be assumed a priori for a new class of problem. It is also important to consider the consequences of radial confinement for the study to be undertaken. For example, if considering the decay properties of the electronic states of a surface into vacuum, by construction the answer will be in error unless steps are taken to rectify this.33 The present method will also share much of the cautionary advice common to all localized, atomic-centered basis sets, including basis set superposition error (BSSE) and the need for floating functions when describing states that involve electron density in a region away from atomic centers (e.g., a defect such as an F-center). BSSE can be a particular issue, since the overlap of basis functions from different atoms allows the radial confinement to be released, thereby artificially inflating the binding energy even more than usual. Therefore, when considering molecular adsorption, particularly if it is weak, it is essential to work with a low value for the energy shift and to apply a counterpoise correction34 to the final result in order to extract a meaningful binding energy.
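The counterpoise arithmetic mentioned above can be sketched as follows, assuming the usual scheme in which each fragment is recomputed in the full basis of the complex, with the partner's orbitals present as "ghost" functions. The function names and all energies are hypothetical and purely illustrative; they are not SIESTA output.

```python
def counterpoise_binding_energy(e_complex, e_frag_a_ghost, e_frag_b_ghost):
    """Binding energy with both fragments evaluated in the full basis of the complex."""
    return e_complex - e_frag_a_ghost - e_frag_b_ghost


def bsse_estimate(e_frag_a_own, e_frag_a_ghost, e_frag_b_own, e_frag_b_ghost):
    """Energy lowering of the fragments caused only by borrowing the partner's basis."""
    return (e_frag_a_own - e_frag_a_ghost) + (e_frag_b_own - e_frag_b_ghost)


# Hypothetical total energies in eV (illustrative values only).
e_ab = -1250.40        # adsorbate + surface
e_a_own = -1000.00     # surface, own basis only
e_a_ghost = -1000.05   # surface, with ghost adsorbate orbitals
e_b_own = -250.00      # adsorbate, own basis only
e_b_ghost = -250.10    # adsorbate, with ghost surface orbitals

uncorrected = e_ab - e_a_own - e_b_own
corrected = counterpoise_binding_energy(e_ab, e_a_ghost, e_b_ghost)
bsse = bsse_estimate(e_a_own, e_a_ghost, e_b_own, e_b_ghost)
print(f"uncorrected: {uncorrected:.2f} eV, corrected: {corrected:.2f} eV, BSSE: {bsse:.2f} eV")
```

With these made-up numbers the corrected binding is weaker than the uncorrected one, which is the direction of the BSSE artifact described in the text.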


2.2.4 Construction of the Kohn–Sham Equations

Once the basis set is defined, it is then possible to define the Kohn–Sham equations for the system of interest (see Section 1.3). Note that because the basis set is nonorthogonal, the overlap matrix must also be computed, in addition to the Hamiltonian. Although the average user of the SIESTA methodology need not understand all the details of how the elements of the Hamiltonian and overlap matrices are computed, it is essential to possess some appreciation of the underlying concepts and the numerical approximations that influence calculation quality. In considering the construction of the Kohn–Sham equations, it is possible to break the problem down into several components:

• Overlap matrix elements between basis functions
• Kinetic energy of basis functions
• Nonlocal contribution of the pseudopotential (confined to core region)
• Local contribution of the pseudopotential (long-range)
• Hartree potential (mean-field Coulomb interaction of electrons)
• Exchange-correlation contribution; either LDA or GGA

As emphasized previously, the key is to evaluate the terms in a manner that is linear scaling and efficient. The components naturally break down into two different classes of integral to be evaluated: those that depend on the basis functions only, and those that depend on the electron density or are potentially long-range. Considering first the overlap matrix elements, kinetic energy matrix elements, and the nonlocal contribution of the pseudopotential, these are all strictly local in real space, due to the finite range of the basis set. The first two terms depend on pairs of overlapping orbitals, and therefore the range is at most twice the largest orbital cutoff radius for any species. In the case of the nonlocal pseudopotential projectors, these give rise to matrix elements between the atomic center associated with the pseudopotential and the basis functions of up to two neighboring atoms. Hence, the range is slightly greater, spanning twice the largest orbital cutoff radius, plus twice the largest core radius for any pseudopotential. However, the range of interaction is still readily predefined. Evaluation of these two- or three-center integrals can be performed readily by use of a Fourier expansion (see the original papers for full details8,28). The key point is that these integrals are performed with a default reciprocal space cutoff of 2000 Ry, which is sufficient to ensure that they are numerically well converged in all but the most extreme circumstances. Furthermore, the cost of these matrix elements is usually a minor part of the total computing time of any calculation. Therefore, the user need not be particularly concerned with the evaluation of these contributions to the Hamiltonian and overlap matrix.

The remaining contributions to the potential and energy are more complex than the terms above since they involve the electron density rather than acting directly on the basis functions. The electron density is, of course, expanded in terms of


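the basis functions:

\[ \rho(\mathbf{r}) = \sum_{\mu\nu} \rho_{\mu\nu}\, \phi_{\nu}^{*}(\mathbf{r})\, \phi_{\mu}(\mathbf{r}) \tag{2.5} \]

\[ \rho_{\mu\nu} = \sum_{i} \int_{\mathrm{BZ}} c_{\mu i}(\mathbf{k})\, o_{i}(\mathbf{k})\, c_{i\nu}(\mathbf{k})\, e^{i\mathbf{k}\cdot(\mathbf{r}_{\nu} - \mathbf{r}_{\mu})}\, d\mathbf{k} \tag{2.6} \]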

where the coefficients are stored as the density matrix elements, ρμν. Here integration over the Brillouin zone is explicitly included and oi(k) represents the occupancy of eigenstate i at a given point in reciprocal space. If evaluated simplistically, this would make the Coulomb interaction between two points of electron density a long-range interaction that scales as the fourth power of the number of basis functions. Fortunately, this is less problematic than it appears for two reasons. First, the contribution due to the local part of the pseudopotential is of opposite sign to the interaction with the electron density. For a charge-neutral system, these two contributions cancel in the long-range limit, so the Coulomb interaction is ultimately screened. Second, the use of an auxiliary basis set to represent the electron density is well known to reduce the scaling problem and improve computational efficiency.2 Many different choices could be made to converge Coulomb sums efficiently, such as fast multipole methods,35 and to represent the electron density in an auxiliary basis set. In the SIESTA methodology, the choice was made to represent the electron density on a uniform Cartesian grid of points in real space. This decision can be justified for a number of reasons. First, unlike in some localized basis sets, there is no natural representation to choose for the density expansion; although the basis functions themselves have some of the correct properties, it is difficult to extend the minimal set to ensure an accurate representation of the density at all points. A Cartesian grid is systematic and basis set shape independent; as the fineness of the grid increases, the aliasing error should decrease, as all Fourier components become representable. Second, the construction of the electron density is rigorously linear scaling. As shown in Fig.
2.4, only basis functions within the maximum cutoff radius can contribute to the electron density at a given grid point, and therefore the cost per point does not depend on the overall system size. Third, calculation of the exchange-correlation contribution for both LDA and GGA becomes a trivial summation over grid points. In the case of GGAs, calculation of the gradient of the density is facilitated by the use of a finite difference expansion36 over the neighboring grid points (and equally important, the additional contribution to the potential from the GGA is straightforward to determine in the same way). Once the total electron density on the grid points is known, it is possible to begin computation of the electrostatic potential, consisting of the electron–electron interaction (Hartree potential) and the electron–local component of the pseudopotential interaction. We note that the Hartree term is based on the interaction between the electron density at all points to give a single orbital-independent potential and therefore contains the self-interaction of an


Fig. 2.4 Calculation of the density based on two orbitals (large circles) on an underlying Cartesian mesh. Here the density contribution would only be nonzero at the mesh points (small circles).

electron with its own density, as is the norm within standard Kohn–Sham theory. Rather than working directly with the total electron density, it is advantageous to divide the electrostatic contributions into two parts: the neutral contribution and the deformation density. The electron density of the neutral atoms can readily be computed on the grid and subtracted from the total electron density to leave the deformation density. The neutral atom density can then be added to the local part of the pseudopotential to yield a potential that goes strictly to zero at the outermost core radius. Being local, the electrostatic contribution of the neutral atoms is readily computed. Having determined the deformation density on a uniform grid, δρ, the calculation of electrostatic potential due to this quantity, δVH, can be made through solution of Poisson’s equation:

\[ \delta\rho(\mathbf{r}) = \rho_{\mathrm{tot}}(\mathbf{r}) - \rho_{\mathrm{NA}}(\mathbf{r}) = -\frac{1}{4\pi} \nabla^{2} \delta V_{H}(\mathbf{r}) \tag{2.7} \]
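In reciprocal space Eq. (2.7) becomes algebraic: each Fourier component of the potential is the corresponding component of the deformation density multiplied by 4π/k². The following is a minimal one-dimensional sketch of that idea under stated assumptions: a periodic cell, a naive discrete Fourier transform standing in for an optimized FFT library, and a single-cosine test density chosen because its potential is known analytically. All names are illustrative and do not correspond to SIESTA routines.

```python
import cmath
import math


def dft(values, sign):
    """Naive discrete Fourier transform; sign=-1 is forward, sign=+1 the unnormalized inverse."""
    n = len(values)
    return [
        sum(values[p] * cmath.exp(sign * 2j * math.pi * p * m / n) for p in range(n))
        for m in range(n)
    ]


def solve_poisson_periodic(delta_rho, cell_length):
    """Solve d2V/dx2 = -4*pi*delta_rho on a uniform periodic 1D grid."""
    n = len(delta_rho)
    rho_k = dft(delta_rho, -1)
    v_k = [0j] * n  # the k = 0 term stays zero (neutral density, arbitrary offset)
    for m in range(1, n):
        m_signed = m if m <= n // 2 else m - n
        k = 2.0 * math.pi * m_signed / cell_length
        v_k[m] = 4.0 * math.pi * rho_k[m] / (k * k)
    return [value.real / n for value in dft(v_k, +1)]


# Test density: a single cosine, whose potential is known analytically.
n_points, cell_length = 16, 10.0
k1 = 2.0 * math.pi / cell_length
grid = [cell_length * p / n_points for p in range(n_points)]
delta_rho = [math.cos(k1 * x) for x in grid]

delta_v = solve_poisson_periodic(delta_rho, cell_length)
exact = [4.0 * math.pi * math.cos(k1 * x) / k1**2 for x in grid]
max_error = max(abs(a - b) for a, b in zip(delta_v, exact))
print(f"max deviation from analytic potential: {max_error:.2e}")
```

The periodic boundary conditions are built into the transform itself, which is why this route to the potential requires a periodic cell.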

At present, SIESTA solves for the potential through the use of a fast Fourier transform (FFT), as many efficient libraries are available to perform this task. Although this approach is not strictly linear scaling (it scales as N ln N), the relatively low scaling, combined with the efficiency of the method, ensures that the contribution to the computational cost is negligible and therefore the deviation from linear scaling due to this contribution has yet to be observed. Arguably a more significant drawback of the use of FFTs, with practical consequences for the user, is the requirement that all systems must have three-dimensional periodic boundary conditions. In the implementation of the SIESTA method, all systems are automatically enclosed within a periodic cell, regardless


of whether it is a molecule, a polymer, a surface, or a solid. For cases where there is no natural periodicity, the fictitious cell parameter(s) is chosen so as to ensure that there is no overlap between the basis functions of images. Although this guarantees that there are no direct matrix elements between periodic repeats, there is a potential for interaction via electrostatic terms. Consequently, for systems with a strong dipole or higher-order moment, it is recommended that the explicit convergence with respect to cell size be tested. Unlike plane-wave methods, the cost of including a large region of vacuum is generally small since there is no change in the basis set associated with this, and the only computational cost lies in the Fourier transform step to compute the potential. Hence, it is usually straightforward and inexpensive to ensure that the interaction between periodic images is negligible. An alternative to the use of fast Fourier transforms is to employ multigrid methods to solve the problem.37,38 This has the advantage of being linear scaling and can be adapted to any set of boundary conditions that are required. Although it has been explored in conjunction with the SIESTA method,39 the absolute performance remains slower than the use of FFTs, so it has not yet been adopted within the distributed implementation. Once the potential due to the deformation density is determined, by either FFTs or multigrid, the contribution to the energy from this term can be calculated by summing the product of this potential with the total electron density across the mesh. Having discussed the background to the evaluation of the electron density–oriented contributions to the Hamiltonian, it remains to consider the practical consequences for the use of the methodology. The most significant point is that there will always be a numerical error in the integral of quantities involving the electron density. 
While the description of the electron density at the grid points is correct, the integration between adjacent points is approximate. As the grid spacing is reduced, the numerical integration becomes more precise. Rather than specifying the grid spacing directly, the fineness is controlled by a kinetic energy value, known as the mesh cutoff , for the highest-energy Fourier component that can be represented. For periodic systems, the grid spacings allowed are constrained by the requirement to be commensurate with the unit cell, so the nearest mesh cutoff above the target specified is chosen. Typical mesh cutoffs are between 80 and 400 Ry, although higher values may be required for very precise calculations. Ultimately, the value required will depend on the pseudopotentials present or basis set shape and must be tested for convergence behavior. Note that the use of partial core corrections often necessitates the use of higher mesh cutoffs, due to the larger total electron density to be integrated. The practical consequence of the numerical integration error above is that there will be a small breaking of translational invariance (i.e., the energy of a system will change slightly according to its absolute Cartesian position relative to the underlying mesh). This is referred to as space rippling or the “egg-box” effect. In addition to affecting the energy, this will also lead to numerical deviations in the


forces. As a result, there can be slight symmetry breaking of structures or convergence slowdown during geometry optimization if the mesh cutoff is too low. It should be noted that this issue is common to most methods that use non-atom-centered basis (or auxiliary basis) sets, although it can be hidden through explicit symmetry constraints, or reduced through the use of softer pseudopotentials/basis function shapes. A number of practical schemes to reduce the influence of the “egg box” have evolved. Obviously, increasing the mesh cutoff is one, but since the mesh dominates the computational expense for small to moderately sized systems, this is not the ideal solution. A more efficient technique is referred to as grid-cell sampling. Imagine an isolated atom being displaced relative to the underlying grid. The energy of the system will vary with the periodicity of the grid and may exhibit a behavior that to first order resembles either a simple sine or cosine wave (see Fig. 2.5). If this were the case, the energy and forces could be evaluated for two positions displaced by half of a grid spacing relative to each other and then averaged. The result would then be invariant to absolute position, because the first-harmonic contributions at the two positions are equal and opposite. While the situation for molecules and solids is more complex, with many Fourier components, the averaging over several displacements with respect to the grid points can lead to a reduction of the numerical error in the forces. This is the grid-cell sampling technique. On the face of it, this may not appear to represent a computational saving over increasing the mesh cutoff, since multiple energy/force evaluations appear to be required. However, it transpires that the breaking of translational invariance is much more significant for the forces than for the potential. Consequently, the self-consistent field procedure (see Section 2.2.5) can be performed for a single mesh position and then only the force evaluation need be conducted

Fig. 2.5 Egg-box effect for a Ne atom with a DZP basis set and an energy shift of 0.01 Ry. The total energy (vertical axis, in eV, spanning approximately –939.69 to –939.67) is plotted as a function of atom position relative to the underlying mesh in fractions of the mesh spacing (horizontal axis, 0 to 1). The curves shown are for a cutoff of 150 Ry, 250 Ry, 450 Ry, and 250 Ry with a two-point grid-cell sampling.


for multiple grid positions, thereby representing a considerable efficiency gain. The validity of this approximation can be seen in Fig. 2.5, where the grid-cell sampling correction largely removes the oscillation for a single atom. There are several further methodologies for the reduction of space-rippling effects. For example, the basis functions and pseudopotentials can be explicitly Fourier filtered to reduce the components beyond the mesh cutoff.40 Although this guarantees almost no invariance breaking for an isolated atom, it is difficult to limit the Fourier components that arise from combinations of basis functions from different atoms when they overlap. Ultimately, the only way to ensure that translational invariance is obeyed exactly is to use atom-centered integration grids, such as the radial grid techniques that have been employed for numerical basis sets.41 In such cases it is necessary to include the derivatives associated with the movement of the integration grid and the change of weights; terms that are often neglected for simplicity in some implementations, although there can also be numerical benefits to considering the grid to be fixed in some cases. So far we have focused on the requirements to achieve linear scaling in the CPU time cost of a calculation. However, for a scheme to be useful it is also necessary for the memory usage of an algorithm to increase linearly while being small in absolute size; otherwise, this will become the bottleneck that prevents large-scale calculations from being performed. The memory usage of a SIESTA calculation can be dominated by one of two things. First, there is the storage of the matrices used in the construction of the Kohn–Sham equations and subsequent quantities, which consists of the Hamiltonian, overlap, density, and energy-density matrices. 
Second, storage of the nonzero orbital values at the mesh points can represent a large amount of data, especially for high mesh cutoffs, and is often the dominant memory use. Other mesh-related quantities are typically much smaller since there can be several tens of orbitals that contribute to each mesh point in a dense solid, whereas other arrays involve just one number per grid point. In cases where the storage of the orbitals on the grid becomes a limiting factor, there is a direct-phi algorithm in which orbital values are recomputed on the fly (analogous to the direct SCF concept in Gaussian methods, but for different quantities). This approach greatly reduces memory usage at the expense of additional computational cost. The key to reducing the memory usage to linear scaling is to recognize that the Hamiltonian and overlap matrices are both sparse, due to the finite basis set range. Indeed, the number of nonzero elements per row or column remains fixed as the system size increases once the dimensions of the problem exceed the maximum interaction range. To exploit this, all matrices are stored in compressed row storage format, which is a standard technique for storing just the nonzero elements of a sparse array, at the cost of storing two extra integer pointer arrays to allow mapping of the stored elements to the dense matrix representation. To reduce this overhead, the overlap matrix is presently treated as possessing the same sparsity pattern as the Hamiltonian, even though it actually has a greater number of null elements. Along similar lines, the approximation is made that the density matrix obeys the same sparsity pattern as the Hamiltonian. Although the


density matrix is not physically constrained to be zero where the Hamiltonian is, the matrix elements that match the nonzero terms in the Hamiltonian capture the contributions that are important for the total energy.

2.2.5 Solving the Kohn–Sham Equations

Once the Hamiltonian and overlap matrices have been constructed, the next key step in any calculation is to solve for the new density matrix and then to iterate to self-consistency. The traditional approach to this problem has been to use matrix diagonalization to determine the Kohn–Sham eigenstates and then to use the coefficients of the basis functions to construct the next density matrix in the iterative sequence. This approach has the benefit of being able to determine both the occupied and unoccupied Kohn–Sham eigenstates, making it possible to compute properties such as the bandgap and densities of states. We note, of course, that these quantities should be interpreted with care since the Kohn–Sham wavefunctions do not represent true one-electron eigenstates as a result of self-interaction error.

For periodic systems it is necessary to integrate all observables across the Brillouin zone. This is usually approximated by a sum over discrete points in reciprocal space, and most commonly a uniform grid of k-points is chosen according to the scheme of Monkhorst and Pack.42 In the case of small unit cells it is necessary to take the same approach within the SIESTA methodology. One specific feature of the actual implementation is the standard method of choosing the grid size. Here a quantity called the K-grid cutoff can be chosen as a single value with units of distance. This methodology, due to Moreno and Soler,43 exploits the relationship between reciprocal space sampling on a grid of k-points and the equivalent sampling through the use of supercells (e.g., a 2 × 2 × 2 grid of k-points allows the same phase factors to be sampled as creating a 2 × 2 × 2 supercell in real space). By specifying the real space supercell length that is desired, the equivalent reciprocal space sampling for a single cell can be determined.
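The equivalence between k-point sampling and supercells can be checked on a toy model. The sketch below assumes a hypothetical one-dimensional chain with one orbital per cell and nearest-neighbor hopping (not a SIESTA calculation): it builds the gamma-point Hamiltonian of an N-cell supercell and verifies that the Bloch vectors at the N uniformly spaced k-points of the single cell are eigenvectors of it, with eigenvalues given by the single-cell band.

```python
import cmath
import math


def supercell_hamiltonian(n_cells, hopping):
    """Gamma-point Hamiltonian of a ring of n_cells one-orbital cells (nearest-neighbor hopping)."""
    h = [[0.0] * n_cells for _ in range(n_cells)]
    for site in range(n_cells):
        h[site][(site + 1) % n_cells] += hopping
        h[site][(site - 1) % n_cells] += hopping
    return h


def band_energy(k, hopping, a):
    """Band of the single-cell chain: e(k) = 2 t cos(k a)."""
    return 2.0 * hopping * math.cos(k * a)


n_cells, hopping, a = 6, -1.0, 1.0  # illustrative parameters
h_super = supercell_hamiltonian(n_cells, hopping)

max_residual = 0.0
for m in range(n_cells):
    k = 2.0 * math.pi * m / (n_cells * a)  # uniform k-grid of the single cell
    bloch = [cmath.exp(1j * k * site * a) for site in range(n_cells)]
    energy = band_energy(k, hopping, a)
    for row in range(n_cells):
        h_times_v = sum(h_super[row][col] * bloch[col] for col in range(n_cells))
        max_residual = max(max_residual, abs(h_times_v - energy * bloch[row]))
print(f"largest residual |H v - e(k) v|: {max_residual:.2e}")
```

A vanishing residual for every k in the grid means that the supercell at the gamma point samples exactly the same states as the single cell on the N-point grid.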
Through the use of a single control value it is possible to try to achieve consistent convergence across a range of different systems, provided that the bandgap and dispersion are similar. Of course, to be certain, the user must always check the convergence for each system. The SIESTA methodology is designed to target large systems containing several hundreds to thousands of atoms. Thus, by the time such dimensions are reached, it is often a good approximation to consider only the Brillouin zone center (gamma point) for sampling purposes. This greatly simplifies the calculation and leads to a dramatic increase in computational speed since the Hamiltonian and overlap matrices become real rather than complex. Hence, from this point onward the assumption will be made that the integration over the Brillouin zone can be dropped and the system will be treated at the gamma point only. Since there are many efficient machine-optimized libraries for dense matrix diagonalization, usually based on the LAPACK and BLAS routines, this approach can be highly competitive up to relatively large system sizes. However, the problem of cubic scaling and the need to work typically with dense matrices


ultimately dominates the computational cost. As a result, there has been considerable research over the last two decades into alternative techniques to determine the density matrix during self-consistency.44,45 Although improvements can be made to the diagonalization approach, such as solving for only the occupied states and iterative techniques for sparse matrices,46 there is a need for more radical alternatives to achieve linear scaling. The major difficulty when working with a localized atomic orbital basis set is the need to solve the generalized eigenvalue problem:

\[ H\psi = \varepsilon S\psi \tag{2.8} \]

which involves first transforming the problem to a standard eigenvalue equation:

\[ H'\psi' = \varepsilon\psi' \tag{2.9} \]

where the primes denote the transformed Hamiltonian and eigenvectors.
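The step from Eq. (2.8) to Eq. (2.9) can be illustrated with a Cholesky factorization of the overlap matrix, S = LLᵀ, which gives H' = L⁻¹HL⁻ᵀ. The sketch below is a two-orbital toy example with hypothetical matrix elements; it is written in plain Python for transparency and is not how a production code would do it (optimized LAPACK routines would normally be used). It checks that the eigenvalues of the transformed standard problem match the roots of det(H − εS) = 0.

```python
import math


def cholesky(s):
    """Lower-triangular L with S = L L^T (S symmetric positive definite)."""
    n = len(s)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            rest = s[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(rest) if i == j else rest / low[j][j]
    return low


def invert_lower(low):
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for row in range(col, n):
            rhs = (1.0 if row == col else 0.0) - sum(
                low[row][k] * inv[k][col] for k in range(col, row))
            inv[row][col] = rhs / low[row][row]
    return inv


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def eigenvalues_sym_2x2(m):
    """Closed-form eigenvalues of a symmetric 2x2 matrix, in ascending order."""
    mean = 0.5 * (m[0][0] + m[1][1])
    radius = math.hypot(0.5 * (m[0][0] - m[1][1]), m[0][1])
    return (mean - radius, mean + radius)


def generalized_eigenvalues_2x2(h, s):
    """Roots of det(H - e S) = 0 for symmetric 2x2 H and S, in ascending order."""
    a = s[0][0] * s[1][1] - s[0][1] ** 2
    b = -(h[0][0] * s[1][1] + h[1][1] * s[0][0] - 2.0 * h[0][1] * s[0][1])
    c = h[0][0] * h[1][1] - h[0][1] ** 2
    root = math.sqrt(b * b - 4.0 * a * c)
    return ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))


# Hypothetical two-orbital Hamiltonian and overlap matrices.
h = [[-1.0, -0.5], [-0.5, -0.8]]
s = [[1.0, 0.3], [0.3, 1.0]]

l_inv = invert_lower(cholesky(s))
h_prime = matmul(matmul(l_inv, h), transpose(l_inv))  # H' = L^-1 H L^-T
eps_standard = eigenvalues_sym_2x2(h_prime)
eps_generalized = generalized_eigenvalues_2x2(h, s)
print(eps_standard, eps_generalized)
```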

Transforming Eq. (2.8) into Eq. (2.9) implies the multiplication of the Hamiltonian by the effective inverse of the overlap matrix, which is often achieved indirectly through the use of Cholesky decomposition. Although both the Hamiltonian and overlap matrices may be very sparse, the difficulty is that the inverse of the overlap matrix is potentially much less sparse or even dense. While reordering techniques can reduce the degree of potential fill-in that occurs,47 and other factorization schemes48 may improve the level of sparsity of an effective inverted overlap matrix, the main challenge remains how to handle the nonorthogonality of the basis set while achieving linear scaling.

One of the first linear-scaling methods to be proposed was the divide-and-conquer method of Yang.49 The principle of the approach is to reduce the total set of Kohn–Sham equations into a series of smaller overlapping subproblems from which the overall electron density could be constructed. For example, a partition could be created centered on each atom of the system whereby all Hamiltonian and overlap matrix elements within a cutoff distance are collected and solved using diagonalization. Provided that the cutoff radius is much smaller than the total system size, the cost of each separate diagonalization is much less than that for solving for the whole system together, and will be independent of the number of atoms for the entire problem. Hence, linear scaling is achieved while retaining the use of efficient matrix diagonalization for small problems. The remaining issue is how to reconstruct the total density from the sum of the subproblems, since the same contribution will appear in many different partitions. While first formulated in terms of the electron density itself, the divide-and-conquer scheme was later also cast in terms of the coefficients of a density matrix,50 which is more appropriate here. Accordingly, the overlapping contributions can be partitioned as follows:

\[ \rho_{\mu\nu} = \sum_{\alpha} P^{\alpha}_{\mu\nu}\, \rho^{\alpha}_{\mu\nu} \tag{2.10} \]

\[ P^{\alpha}_{\mu\nu} = \begin{cases} 1 & \mu \in \alpha,\ \nu \in \alpha \\ \tfrac{1}{2} & \mu \in \alpha,\ \nu \notin \alpha \ \text{or}\ \mu \notin \alpha,\ \nu \in \alpha \\ 0 & \mu \notin \alpha,\ \nu \notin \alpha \end{cases} \tag{2.11} \]

where α represents a partition label. The density matrix divide-and-conquer approach above has recently been implemented in SIESTA and shown to be an effective linear-scaling solution.51 Divide and conquer, as described above, is a simple and appealing approach to achieving linear scaling and has found considerable favor in some communities.52 However, it is important to recognize the limitations. First, for reasons of simplicity, the division of the Hamiltonian into submatrices is usually made based on a distance cutoff. However, decay lengths for matrix elements and the density matrix in different systems can vary substantially according to the nature of the bandgap, atoms involved, and so on. Therefore, truncation methods that are more adaptable to the physical problem are arguably superior. Second, the prefactor for the divide-and-conquer method is relatively high because a large amount of duplicate work is being performed (i.e., the same density matrix element is being computed many times over as a result of partition overlap). Third, all the subsystems are connected by the requirement that the Fermi energy must be globally the same; otherwise, electron density would flow from one partition to another until the chemical potential was equalized. Hence, once the submatrices have been diagonalized to obtain the local eigenspectrum, the population of the states cannot be determined without knowledge of the eigenvalues for all partitions simultaneously. Consequently, either the eigenvalues and eigenvectors for all subsystems must be stored, which represents a large amount of memory, or multiple diagonalizations must be performed for each partition, thus further raising the prefactor. Because of the issues described above relating to divide and conquer, especially the second factor, there has been a search for more efficient algorithms that act on a single sparse density matrix. 
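Returning to the partition weights of Eqs. (2.10) and (2.11), the following sketch illustrates why they avoid double counting. It assumes that every basis function is assigned to exactly one partition (its "owner"), so that for any pair of functions the weights sum to one over all partitions. For simplicity the partition density matrices are stored at full size here, whereas a real implementation would hold only each partition and its buffer region; the four-function system and all names are hypothetical.

```python
def partition_weight(mu, nu, alpha, owner):
    """Weight P^alpha_{mu nu} of Eq. (2.11); owner[mu] is the partition owning basis function mu."""
    members = (owner[mu] == alpha) + (owner[nu] == alpha)
    return {2: 1.0, 1: 0.5, 0: 0.0}[members]


def assemble_density_matrix(sub_matrices, owner):
    """Eq. (2.10): weighted sum of the partition density matrices."""
    n = len(owner)
    rho = [[0.0] * n for _ in range(n)]
    for alpha, rho_alpha in sub_matrices.items():
        for mu in range(n):
            for nu in range(n):
                rho[mu][nu] += partition_weight(mu, nu, alpha, owner) * rho_alpha[mu][nu]
    return rho


# Hypothetical system: four basis functions, two partitions (0 and 1).
owner = [0, 0, 1, 1]
partitions = sorted(set(owner))

weight_sums = [
    [sum(partition_weight(mu, nu, alpha, owner) for alpha in partitions) for nu in range(4)]
    for mu in range(4)
]

# If every partition reproduced the same reference matrix, the assembly must return it.
reference = [[0.5, 0.2, 0.1, 0.0],
             [0.2, 0.5, 0.2, 0.1],
             [0.1, 0.2, 0.5, 0.2],
             [0.0, 0.1, 0.2, 0.5]]
assembled = assemble_density_matrix({alpha: reference for alpha in partitions}, owner)
print(weight_sums)
```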
All methods involve dropping negligible contributions to the density matrix in one way or another, and are generally applicable to materials with a HOMO–LUMO gap or bandgap. Within this there are two general classes of method: those that impose truncation on the density matrix and those that invoke localization of the wavefunction, similar to divide and conquer. Considering first the former class of methods, they recognize that the density matrix can be used directly without recourse to the Kohn–Sham wavefunctions. However, in doing so, the conditions of N-representability must be observed (i.e., the density matrix must be derivable from an underlying antisymmetric N-particle wavefunction).53 For an orthonormal basis set, the density matrix must therefore obey the following conditions:

• Symmetry. D = Dᵀ, where D is the density matrix and Dᵀ is its transpose.
• Trace. Tr(D) = Ne, where Tr represents the trace of a matrix and Ne is the number of electrons.
• Idempotency. D² = D, since eigenvalues are either 0 or 1.
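The three conditions above are easy to test numerically. The sketch below is a minimal illustration for an orthonormal basis, with a hypothetical 3 × 3 projector onto two occupied states. It also applies the cubic purification polynomial D ← 3D² − 2D³ (the McWeeny form of Eq. 2.12) to a slightly perturbed matrix, to show how idempotency can be restored iteratively. The perturbation values are arbitrary and chosen only for illustration.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def symmetry_error(d):
    """Largest |D_ij - D_ji|; zero for a symmetric matrix."""
    n = len(d)
    return max(abs(d[i][j] - d[j][i]) for i in range(n) for j in range(n))


def idempotency_error(d):
    """Largest element of |D^2 - D|; zero for an idempotent matrix."""
    d2 = matmul(d, d)
    n = len(d)
    return max(abs(d2[i][j] - d[i][j]) for i in range(n) for j in range(n))


def mcweeny_step(d):
    """One purification step, D <- 3 D^2 - 2 D^3."""
    d2 = matmul(d, d)
    d3 = matmul(d2, d)
    n = len(d)
    return [[3.0 * d2[i][j] - 2.0 * d3[i][j] for j in range(n)] for i in range(n)]


# Exact projector onto two orthonormal occupied states, (1,1,0)/sqrt(2) and (0,0,1).
d_exact = [[0.5, 0.5, 0.0],
           [0.5, 0.5, 0.0],
           [0.0, 0.0, 1.0]]

# Hypothetical small symmetric error added to mimic an approximate density matrix.
noise = [[0.03, 0.0, 0.03],
         [0.0, -0.03, 0.0],
         [0.03, 0.0, 0.0]]
d_trial = [[d_exact[i][j] + noise[i][j] for j in range(3)] for i in range(3)]

error_before = idempotency_error(d_trial)
d_purified = d_trial
for _ in range(8):
    d_purified = mcweeny_step(d_purified)
error_after = idempotency_error(d_purified)
print(f"idempotency error: {error_before:.2e} -> {error_after:.2e}; "
      f"trace = {trace(d_purified):.6f}")
```

Because the polynomial drives eigenvalues near 0 toward 0 and those near 1 toward 1, the purified matrix in this example keeps a trace of two while becoming idempotent to numerical precision.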


Given these constraints, a trial density matrix can be converged to an approximation to the true density matrix by invoking one of two broad classes of approach. In the first class, purification formulas are used to iteratively transform an approximate density matrix, D, into one that is more nearly idempotent, \(\tilde{D}\). The most widely known purification transformation is that due to McWeeny54:

\[ \tilde{D} = 3D^{2} - 2D^{3} \tag{2.12} \]

although this has recently been generalized to higher orders by Niklasson.55 The second class of density matrix–based methods involve minimization of an energy functional of the trial matrix, subject to the constraints above, based on the Hamiltonian. One of the best known examples is the method of Li et al.,56 with further refinements by other groups.57,58 All of the techniques above are valuable approaches to linear-scaling generation of the density matrix. However, they perform optimally for a basis set that is orthonormal. For a localized atom-centered basis there is the extra complexity of transforming the Hamiltonian or carrying the effective inverse of the overlap matrix through the formulas. For this reason, the SIESTA methodology currently employs a different class of method that focuses on the localization of the wavefunction. It is possible to perform a unitary transformation of a set of extended wavefunctions into a localized set of states known as Wannier functions. Although this is a nonunique transformation, there are well-developed approaches for this process, such as maximally localized Wannier functions.59 It should also be noted that when discussing the locality of these Wannier functions, this usually implies an exponential decay rather than strict confinement. The culmination of several developments led to the Kim–Mauri–Galli (KMG)60 order-N functional for linear-scaling construction of the Wannier functions, and thereby the density matrix. This represents the default approach for achieving true linear scaling within SIESTA. Here the Wannier functions are forced to be strictly local through the use of a cutoff radius, so the approach has much in common with the philosophy of the density matrix divide-and-conquer method, but avoids the duplicate generation of matrix elements. Each atomic center carries a number of localized Wannier functions (LWFs), such that the total number of localized states exceeds the number of occupied states. 
The number assigned to a given atom is specified by (Ne + 2)/2 for KMG. Within the KMG method, the orbital coefficients within the localized states are determined by minimization of a functional that depends on the Hamiltonian and overlap matrix, as well as the chemical potential, μ, of the electrons:

\[ U_{\mathrm{KMG}} = 2 \sum_{ij} (2\delta_{ij} - S_{ij})(H_{ij} - \mu S_{ij}) \tag{2.13} \]

Here the use of the distinct subscripts i and j indicates that the Hamiltonian and overlap matrices have been transformed to the basis of the localized Wannier functions according to the coefficients of the orbital basis set within the LWFs.
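Equation (2.13) can be evaluated directly once the LWF coefficients are known. The following minimal sketch assumes real coefficients, a hypothetical two-orbital model with a single localized function, and a chemical potential placed between the bonding and antibonding levels. It rescales the exact bonding state and evaluates the functional, showing that the lowest value occurs at unit norm even though no normalization constraint is imposed.

```python
import math


def transform(c, m):
    """Matrix in the LWF basis: M_ij = sum over mu, nu of c[i][mu] * m[mu][nu] * c[j][nu]."""
    n_lwf, n_ao = len(c), len(m)
    return [
        [sum(c[i][mu] * m[mu][nu] * c[j][nu] for mu in range(n_ao) for nu in range(n_ao))
         for j in range(n_lwf)]
        for i in range(n_lwf)
    ]


def kmg_functional(c, h_ao, s_ao, mu_chem):
    """U_KMG = 2 * sum_ij (2*delta_ij - S_ij) * (H_ij - mu*S_ij), as in Eq. (2.13)."""
    h_lwf = transform(c, h_ao)
    s_lwf = transform(c, s_ao)
    n_lwf = len(c)
    total = 0.0
    for i in range(n_lwf):
        for j in range(n_lwf):
            delta = 1.0 if i == j else 0.0
            total += (2.0 * delta - s_lwf[i][j]) * (h_lwf[i][j] - mu_chem * s_lwf[i][j])
    return 2.0 * total


# Hypothetical two-orbital model: bonding level at -1.25, antibonding level at -0.625.
h_ao = [[-1.0, -0.5], [-0.5, -1.0]]
s_ao = [[1.0, 0.2], [0.2, 1.0]]
mu_chem = -0.9  # assumed to lie in the gap between the two levels

norm = 1.0 / math.sqrt(2.0 * (1.0 + s_ao[0][1]))  # normalizes the bonding combination
u_values = {}
for scale in (0.9, 1.0, 1.1):
    c = [[scale * norm, scale * norm]]  # one LWF: a rescaled bonding state
    u_values[scale] = kmg_functional(c, h_ao, s_ao, mu_chem)
print(u_values)
```

At unit norm the functional reduces to 2(ε − μ) for the occupied level, and both rescaled states give a higher value.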


The conceptual key to achieving linear scaling is that this expression avoids the need for explicit orthogonalization, but instead, imposes an energy penalty for the deviation from orthogonality [the first term in parentheses in expression (2.13) represents a truncated polynomial expansion of the inverse of the overlap matrix]. During the minimization the localized states therefore gradually become orthonormal until this condition is met at convergence. It is important to note that this minimization is an extra iterative step that lies within each self-consistent field (SCF) cycle. The greatest challenge within the KMG approach is the determination of the chemical potential, which represents the Fermi energy of the system. Because there is no determination of eigenstates in this method, the Fermi energy is not computed directly, although techniques exist to evaluate subsets of the eigenvalue spectrum of a matrix at a considerably lower cost than full diagonalization. However, this extra calculation is generally undesirable and would have to be repeated at every step of the self-consistent field procedure, since the Fermi energy changes as a function of the density matrix. In the KMG method, the chemical potential need not be exactly equal to the true Fermi energy; it must just lie above the top of the valence band/HOMO and below the conduction band/LUMO. For an insulator, or even many semiconducting materials, the bandgap is sufficiently large and the Fermi energy is known to be in the vicinity of zero, such that it is possible to “guess” a value of the chemical potential that satisfies this requirement. Alternatively, a trial-and-error approach can be used. If the chemical potential is set too low, the number of electrons in the system will lie below the actual number, while if it is set too high, the converse will be true. 
Should the value lie within a band, the minimization procedure can diverge, again providing an indication that the value chosen is not suitable. Where it can be afforded, a practical scheme that avoids the difficulty in setting the chemical potential correctly is the following. First, a small number of iterations of diagonalization are performed to obtain a good approximation to the density matrix, and the Fermi energy can be determined, as well as being seen to be stable. Having written out the unconverged density matrix, the calculation can then be restarted to use the KMG scheme, taking the Fermi energy from this calculation. Although the first step may represent a considerable initial overhead for the initial geometry, the cost rapidly becomes insignificant if an extensive geometry optimization or molecular dynamics simulation is subsequently to be run. Let us now consider the convergence behavior of the minimization of the KMG functional, assuming that the chemical potential has been chosen correctly to lie within the correct energy window. In Table 2.3 the number of iterations required to achieve minimization of the KMG functional is quoted for the simple case of bulk silicon. There are several trends to note in the behavior. First, the initial minimization of the orbital coefficients within the LWFs is very slow to converge and can take over 1000 iterations. This is because the initial guess for the localized states involves the use of random coefficients to avoid artificially biasing the symmetry of the solutions. Minimization uses conjugate gradients, and therefore

TABLE 2.3 Number of Iterations Required to Converge the Localized Wannier Functions at Each of the First Five SCF Iterations and the Total Number of SCF Iterations Required for Convergence for Bulk Si^a

RcLWF (bohr)   Iter. 1   Iter. 2   Iter. 3   Iter. 4   Iter. 5   No. of SCF Cycles
 6              502        16         6         6         6         7
 8              902       171        30        18        10        12
10             1202       302       302       100         6        10
12              902       302         5         5         7        13
14             1502       302         7         7         5         9
16              902       302         1         1         3         8

^a The basis set and parameters are as in Fig. 2.6.

convergence is naturally slow. Attempts at using more sophisticated minimization algorithms have, however, generally proved no more effective. Second, subsequent SCF cycles require progressively fewer minimization steps since the LWFs from the previous cycle are reused and the number of iterations drops rapidly to less than 10. Third, the number of iterations required can decrease as the radius of confinement for the LWF (RcLWF) increases, especially for the later SCF cycles. Consequently, a more accurate calculation can actually be as fast overall, so the use of very small radii to confine the LWFs is not advisable. The variation of calculation quality as a function of the radius used for the localized states is illustrated in Fig. 2.6 for the case of bulk silicon. As can be

Fig. 2.6 Percentage error in the total energy and optimized lattice parameter as a function of the localization radius, RcLWF (bohr), for bulk silicon. Calculations are based on a 3 × 3 × 3 supercell containing 216 atoms for a SZ basis set and an energy shift of 0.01 Ry. The mesh cutoff is 250 Ry and the converged reference is for diagonalization using the gamma point only. The converged values for the total energy per atom and single-cell lattice parameter are −106.98172 eV and 5.541 Å, respectively.

seen, sensitivity to the localization radius varies according to the property being studied. While the energy converges to within an acceptable error (i.e., less than ambient thermal energy) relatively quickly, the error in lattice parameter is slightly larger, and the curvature-related properties, such as bulk modulus, greater still. Of course, the rate of convergence is also dependent on the bandgap, which influences the decay of the states, and therefore testing the influence of this approximation is important for each material of interest. Before concluding the topic of solving the Kohn–Sham equations, it is worth briefly mentioning two topics that are common to all numerical implementations: spin and SCF convergence acceleration. For the case where diagonalization is used to achieve self-consistency, the SIESTA code allows the user to include spin polarization where either the total spin may be fixed or the electrons allowed to flow between spin states to attain a common Fermi energy. In addition, there is the option to use noncollinear spin to describe spiral magnetic states.61 If using a linear-scaling solver, in particular the KMG form, the options for treatment of spin are more limited. Spin polarization is still allowed, but control of the spin state is achieved via the specification of two separate values of the chemical potential for alpha and beta spin. Turning to the second topic, there are a number of methods for assisting the convergence of the self-consistent field procedure that might otherwise diverge or require a larger number of iterations. The simplest technique is static mixing, which may be applied to either the Hamiltonian or the density matrix, but is applied more conventionally to the latter. 
Here the density matrix for a new iteration is taken to be a combination of the old density matrix with the undamped result of the current solution step, i (either diagonalization or order N), in a proportion controlled by the mixing parameter, α:

$$D_{\mathrm{in}}^{i+1} = \alpha D_{\mathrm{out}}^{i} + (1-\alpha)\,D_{\mathrm{in}}^{i} \qquad (2.14)$$

Typically, values of the mixing parameter in the range 0.05 to 0.35 are used, where a small value is used for a poorly convergent system, while the larger value is appropriate for a wide-gap material. If too large a value is used, there is a risk that the SCF procedure may start to oscillate. Even in cases that are intrinsically convergent, the iterative process may take numerous cycles to converge as a result of the damped mixing, so there are acceleration techniques to deal with this. SIESTA has the option to use either Pulay mixing62 or the Broyden–Vanderbilt–Louie–Johnson scheme,63 both of which store information from previous iterations, such as the density matrix, and then extrapolate forward. These methods can reduce the number of iterations considerably, though as a caution it should be noted that they could also prevent convergence in some problematic cases. Although there are numerous other convergence techniques, such as level shifting,64 dynamic mixing, and exponential transformation,65 these have yet to be combined with the SIESTA implementation but may be available in the future.
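As a minimal sketch of the static mixing in Eq. (2.14), the toy problem below replaces the real solution step with a hypothetical scalar map, `solve_step`. This map is an assumption made purely for illustration and is not part of SIESTA. It is chosen so that the undamped iteration oscillates with growing amplitude, while a damped iteration converges to the same fixed point.

```python
def solve_step(d_in):
    """Hypothetical stand-in for one solution step (diagonalization or order-N).

    The map d -> 1 - 1.5*d has the fixed point d = 0.4, but its slope of -1.5
    makes the undamped iteration oscillate with growing amplitude.
    """
    return 1.0 - 1.5 * d_in


def scf_static_mixing(d0, alpha, tol=1e-10, max_iter=500):
    """Iterate D_in^{i+1} = alpha*D_out^i + (1 - alpha)*D_in^i until self-consistent."""
    d_in = d0
    for iteration in range(1, max_iter + 1):
        d_out = solve_step(d_in)
        if abs(d_out - d_in) < tol:
            return d_in, iteration
        d_in = alpha * d_out + (1.0 - alpha) * d_in
    return d_in, None  # None signals that convergence was not reached


# Damped mixing converges; alpha = 1 (no damping) does not.
damped, n_damped = scf_static_mixing(0.0, alpha=0.25)
undamped, n_undamped = scf_static_mixing(0.0, alpha=1.0, max_iter=50)
```

In this toy model the damped update contracts the error by a constant factor per cycle, which also illustrates why small mixing parameters can require many cycles and why extrapolation schemes are attractive.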


2.3 FUTURE PERSPECTIVES

This chapter has sought to present a perspective on the key background aspects of the SIESTA methodology that will be of value to a new user of the technique. A complementary chapter in this volume (Chapter 11) highlights some applications of the SIESTA approach that are possible, with a focus on the area of nanoscience. Unlike other mature computational methods, the SIESTA methodology could be considered an evolving approach that may develop further in the future as we learn about the optimal methods for creating numerical basis sets in particular. In addition, implementation in the SIESTA code will develop in response to new trends and advances in the field of density functional theory, where this is compatible with linear scaling. For example, there is no reason why the method cannot be extended to encompass Hartree–Fock exchange, hybrid functionals, and localized post-HF correlation methods, as has been the case for other solid-state codes.

Acknowledgments

The author would like to express his grateful thanks to all those who have been involved in the development of the SIESTA methodology and software, whose hard work and inspiration the present chapter draws on significantly, while stressing that any opinions expressed are personal ones. The Australian Research Council is also thanked for support through the Discovery Program and for an Australian Professorial Fellowship.

REFERENCES

1. Hohenberg, P.; Kohn, W. Phys. Rev. 1964, 136, B864.
2. Dunlap, B. I.; Connolly, J. W. D.; Sabin, J. R. J. Chem. Phys. 1979, 71, 3396.
3. Kohn, W. Phys. Rev. Lett. 1996, 76, 3168.
4. Artacho, E.; Sánchez-Portal, D.; Ordejón, P.; García, A.; Soler, J. M. Phys. Status Solidi (b) 1999, 215, 809.
5. Bock, N.; Challacombe, M.; Chee-Kwan, G.; Henkleman, G.; Nemeth, K.; Niklasson, A.-M.-N.; Odell, A.; Schwegler, E.; Tymczak, C.-J.; Weber, V. Los Alamos National Laboratory (LA-CC 01-2. LA-CC-04-086).
6. Shao, Y. et al., PCCP 2006, 8, 3172.
7. VandeVondele, J.; Krack, M.; Mohamed, F.; Parrinello, M.; Chassaing, T.; Hutter, J. Comput. Phys. Commun. 2005, 167, 103.
8. Soler, J. M.; Artacho, E.; Gale, J. D.; García, A.; Junquera, J.; Ordejon, P.; Sanchez-Portal, D. J. Phys. Condens. Matter 2002, 14, 2745.
9. Kenny, S. D.; Horsfield, A. P.; Fujitani, H. Phys. Rev. B 2000, 62, 4899.
10. Ozaki, T. Phys. Rev. B 2003, 67, 155108.
11. Bowler, D. R.; Choudhury, R.; Gillan, M. J.; Miyazaki, T. Phys. Status Solidi (b) 2006, 243, 989.
12. Skylaris, C. K.; Haynes, P. D.; Mostofi, A. A.; Payne, M. C. J. Phys. Condens. Matter 2008, 20, 064209.
13. Perdew, J. P. Physica B 1991, 172, 1.
14. Perdew, J. P.; Kurth, S.; Zupan, A.; Blaha, P. Phys. Rev. Lett. 1999, 82, 2544.
15. Becke, A. D. J. Chem. Phys. 1993, 98, 5648.
16. Anisimov, V. I.; Zaanen, J.; Andersen, O. K. Phys. Rev. B 1991, 44, 943.
17. Kleinman, L.; Bylander, D. M. Phys. Rev. Lett. 1982, 48, 1425.
18. Hamann, D. R.; Schlüter, M.; Chiang, C. Phys. Rev. Lett. 1979, 43, 1494.
19. Troullier, N.; Martins, J. L. Phys. Rev. B 1991, 43, 1993.
20. Kerker, G. P. J. Phys. C 1980, 13, L189.
21. Vanderbilt, D. Phys. Rev. B 1990, 41, 7892.
22. Blöchl, P. E. Phys. Rev. B 1994, 50, 17953.
23. Bilić, A.; Gale, J. D. Phys. Rev. B 2009, 79, 174107.
24. Louie, S. G.; Froyen, S.; Cohen, M. L. Phys. Rev. B 1982, 26, 1738.
25. Ahlrichs, R.; Taylor, P. R. J. Chim. Phys. Phys. Chim. Biol. 1981, 78, 315.
26. Becke, A. D.; Dickson, R. M. J. Chem. Phys. 1990, 92, 3610.
27. Delley, B. J. Chem. Phys. 1990, 92, 508.
28. Sankey, O. F.; Niklewski, D. J. Phys. Rev. B 1989, 40, 3979.
29. Junquera, J.; Paz, O.; Sanchez-Portal, D.; Artacho, E. Phys. Rev. B 2001, 64.
30. Anglada, E.; Soler, J. M.; Junquera, J.; Artacho, E. Phys. Rev. B 2002, 66, 205101.
31. Sanchez-Portal, D.; Ordejon, P.; Artacho, E.; Soler, J. M. Int. J. Quantum Chem. 1997, 65, 453.
32. Causa, M.; Dovesi, R.; Pisani, C.; Roetti, C. Phys. Rev. B 1986, 33, 1308.
33. García-Gil, S.; García, A.; Lorente, N.; Ordejon, P. Phys. Rev. B 2009, 79, 075441.
34. Boys, S. B.; Bernardi, F. Mol. Phys. 1970, 19, 553.
35. Greengard, L.; Rokhlin, V. J. Comput. Phys. 1987, 73, 325.
36. Chelikowsky, J. R.; Troullier, N.; Wu, K.; Saad, Y. Phys. Rev. B 1994, 50, 11355.
37. Brandt, A. Math. Comput. 1977, 31, 333.
38. Briggs, E. L.; Sullivan, D. J.; Bernholc, J. Phys. Rev. B 1995, 52, R5471.
39. Artacho, E.; Anglada, E.; Dieguez, O.; Gale, J. D.; García, A.; Junquera, J.; Martin, R. M.; Ordejon, P.; Pruneda, J. M.; Sanchez-Portal, D.; Soler, J. M. J. Phys. Condens. Matter 2008, 20, 064208.
40. Anglada, E.; Soler, J. M. Phys. Rev. B 2006, 73, 115122.
41. Becke, A. D. J. Chem. Phys. 1988, 88, 2547.
42. Monkhorst, H. J.; Pack, J. D. Phys. Rev. B 1976, 13, 5188.
43. Moreno, J.; Soler, J. M. Phys. Rev. B 1992, 45, 13891.
44. Goedecker, S. Rev. Mod. Phys. 1999, 71, 1085.
45. Bowler, D. R.; Fattebert, J. L.; Gillan, M. J.; Haynes, P. D.; Skylaris, C. K. J. Phys. Condens. Matter 2008, 20, 290301.
46. Lehoucq, R. B.; Sorensen, D. C.; Yang, C. ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, Society for Industrial and Applied Mathematics, Philadelphia, 1998.
47. Karypis, G.; Kumar, V. SIAM J. Sci. Comput. 1999, 20, 359.
48. Benzi, M.; Meyer, C. D.; Tuma, M. SIAM J. Sci. Comput. 1996, 17.
49. Yang, W. Phys. Rev. Lett. 1991, 66, 1438.
50. Yang, W.; Lee, T.-S. J. Chem. Phys. 1995, 103, 5674.
51. Cankurtaran, B. O.; Gale, J. D.; Ford, M. J. J. Phys. Condens. Matter 2008, 20, 294208.
52. van der Vaart, A.; Gogonea, V.; Dixon, S. L.; Merz, Jr, K. M. J. Comput. Chem. 2000, 21, 1494.
53. Coleman, A. J. Rev. Mod. Phys. 1963, 35, 668.
54. McWeeny, R. Rev. Mod. Phys. 1960, 32, 335.
55. Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115.
56. Li, X. P.; Nunes, R. W.; Vanderbilt, D. Phys. Rev. B 1993, 47, 10891.
57. Millam, J. M.; Scuseria, G. E. J. Chem. Phys. 1997, 106, 5569.
58. Challacombe, M. J. Chem. Phys. 1999, 110, 2332.
59. Marzari, N.; Vanderbilt, D. Phys. Rev. B 1997, 56, 12847.
60. Kim, J.; Mauri, F.; Galli, G. Phys. Rev. B 1995, 52, 1640.
61. García-Suárez, V. M.; Newman, C. M.; Lambert, C. J.; Pruneda, J. M.; Ferrer, J. J. Phys. Condens. Matter 2004, 16, 5453.
62. Pulay, P. Chem. Phys. Lett. 1980, 73, 393.
63. Johnson, D. D. Phys. Rev. B 1988, 38, 12807.
64. Saunders, V. R.; Hillier, I. H. Int. J. Quantum Chem. 1973, 7, 699.
65. Douady, J.; Ellinger, Y.; Subra, R.; Levy, B. J. Chem. Phys. 1980, 72, 1452.

3

Large-Scale Plane-Wave-Based Density Functional Theory: Formalism, Parallelization, and Applications

ERIC BYLASKA
William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

KIRIL TSEMEKHMAN
University of Washington, Seattle, Washington

NIRANJAN GOVIND and MARAT VALIEV
William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

The basic density functional formalism presented in Chapter 1 is applied to the simulation of large materials, solutions, and molecules using plane-wave basis sets. This parallels the applications developed in Chapter 2 for similar systems using atomic basis sets. Much attention is focused on the pseudopotentials that describe the interaction of the atomic nuclei and their inner-shell electrons (“ions”) with the valence electrons. Methods for simulating charged systems are described, as well as the use of hybrid density functionals in simulations of chemical properties. Advances in numerical methods and software (contained in the NWChem package) are described that allow for both geometry optimization and multi-picosecond time scale Car–Parrinello molecular dynamics simulations of very large systems. Sample applications, including the structure of hematite and the aqueous solvation of cations, are described.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


3.1 INTRODUCTION

The development of fast and efficient ways to perform density functional theory (DFT) calculations using plane-wave basis sets1–8 combined with parallel supercomputers7,9–16 has opened the door to new classes of large-scale first-principles simulations. It is now routine at this level of theory to perform simulations containing hundreds of atoms,17 and simulations containing over 1000 atoms are feasible on today’s parallel supercomputers,20 making realistic descriptions of a variety of systems possible.

Several techniques are responsible for the efficiency of plane-wave DFT programs. The central feature is the representation of the electronic orbitals in terms of a plane-wave basis set. In this representation, one can take advantage of fast Fourier transform (FFT) algorithms21 for fast calculations of total energies and forces. Periodic boundary conditions (PBCs) are also incorporated automatically as a result. However, plane-wave basis sets do have an important shortcoming: their inefficient description of the electronic wavefunction in the vicinity of the atomic nucleus or core region. Valence wavefunctions vary rapidly in this region and much more slowly in the interstitial regions (or bonding regions) (see Fig. 3.1). Accurate description of the rapid variation of the wavefunction inside the atomic or core region would require very large plane-wave basis sets.

The pseudopotential plane-wave (PSPW) method can be used to resolve this problem.22–25 In this approach the fast-varying core regions of the atomic potentials and the core electrons are removed or pseudized and replaced by smoothly varying pseudopotentials. The pseudopotentials are constructed such that the scattering properties of the resulting pseudoatoms are the same as those of the original atoms.26,27 The rationale behind the pseudopotential approach is that changes in the electronic wavefunctions during bond formation occur only in the valence region, and therefore proper removal of the core from the problem should not affect the prediction of bonding properties of the system. The projector augmented plane-wave (PAW) method developed by Blöchl is a further enhancement of the pseudopotential approach in that it addresses some of the shortcomings encountered in a traditional PSPW approach. Since the main computational algorithms are essentially the same in the two approaches, we will not specifically discuss the PAW approach and refer the reader to comprehensive reviews.8,15,28–31

Fig. 3.1 Valence wavefunction.

3.2 PLANE-WAVE BASIS SET

Plane waves are natural for solid-state applications, since crystals are readily represented using periodic boundary conditions where the system is enclosed in a unit cell defined by the primitive lattice vectors $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$, as shown in Fig. 3.2. However, periodic plane-wave basis sets can also be used for molecular simulations as long as the unit cell is large enough to minimize the image interactions between cells. In terms of plane waves, the molecular orbitals are represented as

$$\psi_i(\mathbf{r}) = \frac{1}{\sqrt{\Omega}} \sum_{\{\mathbf{G}\}} \psi_i(\mathbf{G})\, e^{i\mathbf{G}\cdot\mathbf{r}} \qquad (3.1)$$

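where $\Omega$ is the volume of the primitive cell ($\Omega = [\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3] = \mathbf{a}_1 \cdot (\mathbf{a}_2 \times \mathbf{a}_3)$). Since the system is periodic, the plane-wave expansion must consist of only the plane waves $e^{i\mathbf{G}\cdot\mathbf{r}}$ that have the periodicity of the lattice, which can be determined using the constraint

$$e^{i\mathbf{G}\cdot(\mathbf{r}+\mathbf{L})} = e^{i\mathbf{G}\cdot\mathbf{r}} \qquad (3.2)$$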

where $\mathbf{L}$ is the Bravais lattice vector ($\mathbf{L} = n_1\mathbf{a}_1 + n_2\mathbf{a}_2 + n_3\mathbf{a}_3$, with $n_1$, $n_2$, $n_3$ integers) and $\mathbf{G}$ represents the wave vectors, which can be defined in terms of the reciprocal lattice vectors:

$$\mathbf{G}_{i_1 i_2 i_3} = \left(i_1 - \frac{N_1}{2}\right)\mathbf{b}_1 + \left(i_2 - \frac{N_2}{2}\right)\mathbf{b}_2 + \left(i_3 - \frac{N_3}{2}\right)\mathbf{b}_3 \qquad (3.3)$$

Fig. 3.2 Unit cell in periodic boundary conditions. The solid arrows represent the Bravais lattice vectors.

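where $N_1$, $N_2$, and $N_3$ are chosen sizes of the lattice vector grid, which can range from 1 to ∞; $i_1$, $i_2$, and $i_3$ are integers defined in the ranges $1 \cdots N_1$, $1 \cdots N_2$, and $1 \cdots N_3$, respectively; and

$$\mathbf{b}_1 = 2\pi\,\frac{\mathbf{a}_2 \times \mathbf{a}_3}{\Omega} \qquad \mathbf{b}_2 = 2\pi\,\frac{\mathbf{a}_3 \times \mathbf{a}_1}{\Omega} \qquad \mathbf{b}_3 = 2\pi\,\frac{\mathbf{a}_1 \times \mathbf{a}_2}{\Omega} \qquad (3.4)$$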

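are the primitive reciprocal lattice vectors. A real space grid that is dual to the reciprocal lattice grid can be defined and is given by

$$\mathbf{r}_{i_1 i_2 i_3} = \left(\frac{i_1}{N_1} - \frac{1}{2}\right)\mathbf{a}_1 + \left(\frac{i_2}{N_2} - \frac{1}{2}\right)\mathbf{a}_2 + \left(\frac{i_3}{N_3} - \frac{1}{2}\right)\mathbf{a}_3 \qquad (3.5)$$

The transformation between the reciprocal and real space representations is achieved via the discrete Fourier transform:

$$f(\mathbf{r}_{i_1 i_2 i_3}) = \frac{1}{\sqrt{\Omega}} \sum_{j_1=1}^{N_1}\sum_{j_2=1}^{N_2}\sum_{j_3=1}^{N_3} F(\mathbf{G}_{j_1 j_2 j_3})\, e^{i\mathbf{G}_{j_1 j_2 j_3}\cdot\mathbf{r}_{i_1 i_2 i_3}}$$

$$F(\mathbf{G}_{i_1 i_2 i_3}) = \frac{\sqrt{\Omega}}{N_1 N_2 N_3} \sum_{j_1=1}^{N_1}\sum_{j_2=1}^{N_2}\sum_{j_3=1}^{N_3} f(\mathbf{r}_{j_1 j_2 j_3})\, e^{-i\mathbf{G}_{i_1 i_2 i_3}\cdot\mathbf{r}_{j_1 j_2 j_3}} \qquad (3.6)$$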
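The transform pair of Eq. (3.6) can be checked with a one-dimensional analogue. The sketch below is only illustrative: it assumes a 1D cell of length `a` (which plays the role of Ω), uses direct O(N²) sums rather than an FFT, and the helper names `grids`, `to_reciprocal`, and `to_real` are hypothetical.

```python
import cmath
import math


def grids(n, a):
    """1D analogues of Eqs. (3.3) and (3.5): G_j = (j - n/2) b, r_i = (i/n - 1/2) a."""
    b = 2.0 * math.pi / a  # 1D reciprocal lattice vector
    g = [(j - n / 2) * b for j in range(1, n + 1)]
    r = [(i / n - 0.5) * a for i in range(1, n + 1)]
    return g, r


def to_reciprocal(f, g, r, a):
    """F(G_i) = sqrt(a)/n * sum_j f(r_j) exp(-i G_i r_j)."""
    n = len(f)
    return [math.sqrt(a) / n * sum(f[j] * cmath.exp(-1j * gi * r[j]) for j in range(n))
            for gi in g]


def to_real(coeffs, g, r, a):
    """f(r_i) = 1/sqrt(a) * sum_j F(G_j) exp(+i G_j r_i)."""
    n = len(coeffs)
    return [sum(coeffs[j] * cmath.exp(1j * g[j] * ri) for j in range(n)) / math.sqrt(a)
            for ri in r]


n, a = 8, 5.0
g, r = grids(n, a)
f = [math.exp(-x * x) for x in r]      # arbitrary smooth test function
F = to_reciprocal(f, g, r, a)
f_back = to_real(F, g, r, a)           # should reproduce f to rounding error
max_error = max(abs(x - y) for x, y in zip(f, f_back))
```

The round trip recovers the original samples because the prefactors 1/√Ω and √Ω/N cancel against the factor N produced by the sum over grid points. A production code would obtain the same result with an FFT at O(N log N) cost.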

These transformations can be calculated efficiently via fast Fourier transform (FFT) algorithms.21 In typical plane-wave calculations, the plane-wave expansion is truncated in that only the reciprocal lattice vectors whose kinetic energy is lower than a predefined maximum cutoff energy,

$$\tfrac{1}{2}|\mathbf{G}|^2 < E_{\mathrm{cut}} \qquad (3.7)$$

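are kept in the expansion, while the rest of the coefficients are set to zero. The density is also expanded using plane waves,

$$\rho(\mathbf{r}) = \sum_i \psi_i^{*}(\mathbf{r})\,\psi_i(\mathbf{r}) = \sum_{\mathbf{G}} \rho(\mathbf{G})\, e^{i\mathbf{G}\cdot\mathbf{r}} \qquad (3.8)$$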

Since the density is the square of the wavefunctions, it can vary twice as rapidly. Hence, for translational symmetry to be formally maintained, the density should contain eight times more plane waves than the corresponding wavefunction expansion. Often, the density cutoff energy is chosen to be the same as the wavefunction cutoff energy; this approximation is known as dualing. An added complication arises in the calculation of crystalline systems. In these systems the orbitals may have long-wavelength contributions that span over a large number of primitive unit cells. To account for the infinite number of electrons in the periodic system, an infinite number of k-points are required.
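The factor of eight quoted above for the density expansion follows from the cutoff condition of Eq. (3.7): doubling the maximum |G| quadruples the cutoff energy and multiplies the volume of the cutoff sphere by 2³. The sketch below illustrates this by building the reciprocal vectors of Eq. (3.4) and counting plane waves. The cubic cell of side 20 bohr, the cutoff of 2 hartree, and all function names are assumptions chosen for illustration only.

```python
import math


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def reciprocal_vectors(a1, a2, a3):
    """Eq. (3.4): b1 = 2*pi (a2 x a3)/Omega and cyclic permutations."""
    omega = dot(a1, cross(a2, a3))
    scale = 2.0 * math.pi / omega
    return tuple(tuple(scale * c for c in cross(u, v))
                 for u, v in ((a2, a3), (a3, a1), (a1, a2)))


def count_plane_waves(b, e_cut):
    """Count G = n1 b1 + n2 b2 + n3 b3 with |G|^2 / 2 < e_cut, as in Eq. (3.7)."""
    g_max = math.sqrt(2.0 * e_cut)
    # Search bound on |n_k|; sufficient for the orthogonal cell used below.
    n_max = [int(g_max / math.sqrt(dot(bk, bk))) + 1 for bk in b]
    count = 0
    for n1 in range(-n_max[0], n_max[0] + 1):
        for n2 in range(-n_max[1], n_max[1] + 1):
            for n3 in range(-n_max[2], n_max[2] + 1):
                g = [n1 * b[0][k] + n2 * b[1][k] + n3 * b[2][k] for k in range(3)]
                if 0.5 * dot(g, g) < e_cut:
                    count += 1
    return count


side = 20.0  # bohr, illustrative cubic cell
b = reciprocal_vectors((side, 0.0, 0.0), (0.0, side, 0.0), (0.0, 0.0, side))
n_wavefunction = count_plane_waves(b, 2.0)   # wavefunction cutoff (hartree)
n_density = count_plane_waves(b, 8.0)        # four times the cutoff energy
ratio = n_density / n_wavefunction
```

For a finite grid the ratio is only approximately eight, because the number of lattice points inside a sphere fluctuates around the sphere's volume.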


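The Bloch theorem, however, helps restate this problem of calculating an infinite number of wavefunctions to one of calculating a finite number of wavefunctions at an infinite number of k-points or BZ points:

$$\psi_i(\mathbf{r}) = \frac{e^{i\mathbf{k}\cdot\mathbf{r}}}{\sqrt{\Omega}} \sum_{\mathbf{G}} \psi_i(\mathbf{G})\, e^{i\mathbf{G}\cdot\mathbf{r}} \qquad (3.9)$$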

Since the occupied states at each k-point contribute to the electronic potential, an infinite number of calculations are required in principle. However, experience tells us that wavefunctions at nearby k-points are almost identical. As a result, one can replace the k-point summations or integrals in the DFT expressions by sums over only a small set of special k-points in the Brillouin zone. There are a number of prescriptions to generate these special points. Since a detailed discussion of the various prescriptions is beyond the scope of this chapter, we refer the reader to more comprehensive papers and reviews.1,32–34 Obviously, for molecular systems there is no need for k-point sampling. Systems with large unit cells (disordered systems) and large-bandgap systems also require little or no k-point sampling, because the long-wavelength components are typically contained within the unit cell in the former case and the electronic states are localized in the latter. In this work we restrict ourselves to the Γ-point (k = 0), since we are interested in isolated systems and systems with large unit cells.

3.3 PSEUDOPOTENTIAL PLANE-WAVE METHOD

The pseudopotential plane-wave method (PSPW) has its roots in the work on orthogonalized plane waves35 and core state projector methods,23 and empirical pseudopotentials have been used for some time in plane-wave calculations.25,36–38 However, this method was not considered entirely reliable until the development of norm-conserving pseudopotentials.26,39–41 It is currently a very popular method for solving DFT equations. In particular, PSPW can perform ab initio molecular dynamics very efficiently,3 and treat unit cells up to a couple of thousand atoms.4,6,7,17 Another advantage of PSPW methods is their transferability across a wide range of systems. In this section we describe implementation of the norm-conserving PSPW method. Formulas for the total energy, wavefunction gradient, and nuclear gradients are given in terms of a plane-wave basis set at the Γ-point.

3.3.1 Pseudopotentials

Pseudopotentials (effective core potentials) are based on two observations. First, in almost any system one can identify a set of core orbitals which change little from their atomic counterparts. Second, the remainder, or valence orbitals, acquire their oscillating behavior as a result of their orthogonality to the core orbitals. This also keeps valence electrons away from the core. In the


pseudopotential approximation the original atoms that constitute a given chemical system are modified by removing core states and replacing their effect by a repulsive pseudopotential. This removes the rapid oscillations from the atomic valence orbitals and allows efficient application of plane-wave basis set expansion. The resulting pseudoatoms will in general acquire a nonlocal potential term. There have been many ways to define pseudopotentials.1,23,24,27,40–58 The original procedure of Phillips and Kleinman formed pseudopotentials from pseudowavefunctions in which atomic core wavefunctions were added to the valence wavefunctions.23 Unfortunately, this procedure and related later developments44–46 resulted in “hard-core” potentials that contained singularities. These pseudopotentials were not useful in plane-wave calculations, since the nonregularized singularities could not be expanded using a reasonable number of plane waves. At about the same time, “soft-core” empirical pseudopotentials were developed.24,25,36–38 These potentials, which were made up of smooth functions with a few parameters, were fitted to reproduce one-electron eigenvalues and orbital shapes. Such soft-core pseudopotentials were readily expanded using plane waves. However, pseudopotentials generated in this way were not transferable, yielding pseudowavefunctions that were different from the true valence wavefunctions by a few percent outside the core. Later it was realized that soft-core pseudopotentials needed to maintain norm conservation for them to be transferable.26,39–41 The principle of norm conservation states that if the charge of the real valence densities and the pseudovalence densities are identical inside the core region, the real valence wavefunction and pseudowavefunction will be identical outside the core region. This procedure was refined over the years and now most soft-core pseudopotentials are designed to have the following properties:54

• The valence pseudowavefunction generated from the pseudopotentials should not contain nodes.
• The pseudowavefunctions near zero approach $\tilde{\varphi}_l(r) \to r^{l+1}$. This criterion removes the singularities from the pseudopotential.
• Real and pseudovalence eigenvalues agree for a chosen “prototype” atomic configuration ($\varepsilon_l^{\mathrm{AE}} = \varepsilon_l^{\mathrm{PP}}$).
• Real and pseudoatomic valence wavefunctions agree beyond a chosen core radius $r_c$.
• Real and pseudovalence charge densities agree for $r > r_c$.
• Logarithmic derivatives and the first energy derivatives agree for $r > r_c$.

These types of pseudopotentials are called norm-conserving pseudopotentials. Here we review briefly the construction of pseudopotentials suggested by Troullier and Martins.54 The first step is to solve the radial Kohn–Sham equation self-consistently for a given atom:

$$\left[-\frac{1}{2}\frac{d^2}{dr^2} + \frac{l(l+1)}{2r^2} + V_{\mathrm{AE}}(r)\right]\varphi_{nl}(r) = \varepsilon_{nl}\,\varphi_{nl}(r) \qquad (3.10)$$

to obtain a set of radial atomic orbitals, $\{\varphi_{nl}\}$. The self-consistent potential $V_{\mathrm{AE}}(r)$ is given by

$$V_{\mathrm{AE}}(r) = -\frac{Z}{r} + \int \frac{\rho(r')}{|r - r'|}\,dr' + V_{xc}(\rho(r)) \qquad (3.11)$$

where the density, $\rho(r)$, is given by the sum of the occupied orbital densities,

$$\rho(r) = \sum_{nl} f_{nl} \left|\frac{\varphi_{nl}(r)}{r}\right|^2 \qquad (3.12)$$

and $V_{xc}(\rho(r))$ is the exchange–correlation potential. In Eq. (3.12), $f_{nl}$ is the occupancy of the $nl$ state. Pseudopotential construction starts by introducing a smooth pseudovalence wavefunction, $\tilde{\varphi}_l(r)$, such that it and at least one derivative continuously approaches the all-electron valence wavefunction, $\varphi_l^{\mathrm{AE}}(r)$, beyond a chosen cutoff radius $r_{cl}$. In addition, to avoid a hard-core pseudopotential (i.e., a singularity in the pseudopotential), the pseudowavefunctions near zero have to approach $\tilde{\varphi}_l(r) \to r^{l+1}$. The actual functional form of $\tilde{\varphi}_l(r)$ could be chosen in many different ways. Troullier and Martins suggested the following form for the pseudowavefunctions:

$$\tilde{\varphi}_l(r) = \begin{cases} \varphi_l^{\mathrm{AE}}(r) & \text{if } r \ge r_{cl} \\ r^{l+1} e^{p(r)} & \text{if } r < r_{cl} \end{cases} \qquad (3.13)$$

where $p(r)$ is a polynomial of order 12:

$$p(r) = \sum_{n=0}^{6} c_n r^{2n} \qquad (3.14)$$

The seven coefficients are then determined using the following constraints:

• Norm conservation within the core
• Continuity of the pseudowavefunction and its first four derivatives at $r_{cl}$
• The curvature of the screened pseudopotential at the origin defined to be zero

An explicit procedure to do this can be found in the paper by Troullier and Martins.54 The next step is to generate the screened pseudopotentials, which are easily obtained by inverting the radial Schrödinger equation:

$$V_l^{\mathrm{scr}}(r) = \varepsilon_l - \frac{l(l+1)}{2r^2} + \frac{1}{2\tilde{\varphi}_l(r)}\frac{d^2}{dr^2}\tilde{\varphi}_l(r) = \begin{cases} V_{\mathrm{AE}}(r) & \text{if } r \ge r_{cl} \\ \varepsilon_l + \dfrac{l+1}{r}\,p'(r) + \dfrac{p''(r)}{2} + \dfrac{[p'(r)]^2}{2} & \text{if } r < r_{cl} \end{cases} \qquad (3.15)$$
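The closed-form expression for r < r_cl in Eq. (3.15) can be checked numerically against the inverted radial equation. The sketch below does so with a finite-difference second derivative. The angular momentum, the eigenvalue, and the polynomial coefficients are hypothetical values chosen for illustration; they are not a fitted Troullier–Martins set.

```python
import math

L_ANG = 1       # angular momentum l (illustrative)
EPS = -0.5      # eigenvalue epsilon_l (illustrative)
# Hypothetical coefficients c_0..c_6 of p(r) in Eq. (3.14).
COEFFS = [0.1, -0.3, 0.02, 0.0, 0.0, 0.0, 0.0]


def p(r):
    return sum(c * r ** (2 * n) for n, c in enumerate(COEFFS))


def dp(r):
    return sum(2 * n * c * r ** (2 * n - 1) for n, c in enumerate(COEFFS) if n > 0)


def d2p(r):
    return sum(2 * n * (2 * n - 1) * c * r ** (2 * n - 2)
               for n, c in enumerate(COEFFS) if n > 0)


def phi(r):
    """Pseudowavefunction inside the core, Eq. (3.13): r^(l+1) exp(p(r))."""
    return r ** (L_ANG + 1) * math.exp(p(r))


def v_scr_closed_form(r):
    """Second branch of Eq. (3.15)."""
    return EPS + (L_ANG + 1) / r * dp(r) + 0.5 * d2p(r) + 0.5 * dp(r) ** 2


def v_scr_inverted(r, h=1e-4):
    """First form of Eq. (3.15), with phi'' from a central finite difference."""
    d2phi = (phi(r + h) - 2.0 * phi(r) + phi(r - h)) / h ** 2
    return EPS - L_ANG * (L_ANG + 1) / (2.0 * r ** 2) + d2phi / (2.0 * phi(r))


difference = max(abs(v_scr_closed_form(r) - v_scr_inverted(r))
                 for r in (0.3, 0.6, 0.9, 1.2))
```

The two forms agree to within the finite-difference error because the centrifugal term generated by differentiating r^(l+1) cancels the explicit −l(l+1)/2r² term, leaving a potential that is finite at the origin.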


Three important properties of pseudopotentials result from Eq. (3.15). First, the pseudopotential will not be continuous if the pseudowavefunction does not have at least two continuous derivatives. Second, a hard-core singularity will be present in Eq. (3.15) if $\tilde{\varphi}_l(r)$ does not behave as $r^{l+1}$ at zero. Third, the pseudopotentials may contain discontinuities if the pseudowavefunctions have nodes. For rare gases, where all the electrons are in the core, these are the correct pseudopotentials to use. However, in cases where one wants to include valence electrons in a calculation, the screened potentials must be unscreened to remove the effects of the valence electrons from the pseudopotential, thus generating an ionic pseudopotential. This is done by subtracting the Hartree and exchange–correlation potentials that are calculated from the valence pseudowavefunctions from the screened pseudopotential:

$$V_l^{\mathrm{ion}}(r) = V_l^{\mathrm{scr}}(r) - \frac{4\pi}{r}\int_0^{r} \tilde{\rho}(r')\,r'^2\,dr' - 4\pi\int_r^{\infty} \tilde{\rho}(r')\,r'\,dr' - V_{xc}(\tilde{\rho}(r)) \qquad (3.16)$$

where

$$\tilde{\rho}(r) = \sum_l f_l \left|\frac{\tilde{\varphi}_l(r)}{r}\right|^2 \qquad (3.17)$$
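The two radial integrals in Eq. (3.16) together form the Hartree potential of a spherically symmetric density. The sketch below evaluates them with cumulative trapezoid sums on a uniform grid. Several assumptions are made for illustration: the density is taken to be an ordinary three-dimensional number density, the test density exp(−2r)/π (a hydrogen-like 1s density holding one electron) stands in for a pseudovalence density because its Hartree potential is known in closed form, and the exchange–correlation term is omitted.

```python
import math


def hartree_potential(rho, r_max=20.0, n=20000):
    """Return (r, V_H) with V_H(r) = (4 pi / r) int_0^r rho r'^2 dr' + 4 pi int_r^inf rho r' dr'."""
    h = r_max / n
    r = [i * h for i in range(n + 1)]
    inner = [rho(x) * x * x for x in r]   # integrand of the 0..r integral
    outer = [rho(x) * x for x in r]       # integrand of the r..infinity integral
    cum_inner = [0.0] * (n + 1)
    cum_outer = [0.0] * (n + 1)
    for i in range(1, n + 1):             # cumulative trapezoid rule
        cum_inner[i] = cum_inner[i - 1] + 0.5 * h * (inner[i - 1] + inner[i])
        cum_outer[i] = cum_outer[i - 1] + 0.5 * h * (outer[i - 1] + outer[i])
    v = [0.0] * (n + 1)
    v[0] = 4.0 * math.pi * cum_outer[n]   # limit r -> 0: only the outer integral survives
    for i in range(1, n + 1):
        tail = cum_outer[n] - cum_outer[i]  # integral from r_i to r_max
        v[i] = 4.0 * math.pi * (cum_inner[i] / r[i] + tail)
    return r, v


def rho_1s(x):
    """Hydrogen-like 1s density (atomic units), normalized to one electron."""
    return math.exp(-2.0 * x) / math.pi


r_grid, v_h = hartree_potential(rho_1s)
i_test = 1000                             # grid point at r = 1 bohr
r_test = r_grid[i_test]
v_exact = 1.0 / r_test - (1.0 + 1.0 / r_test) * math.exp(-2.0 * r_test)
```

Unscreening in the sense of Eq. (3.16) would subtract this potential, together with the exchange–correlation potential of the same density, from the screened pseudopotential.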

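In Eq. (3.17), $f_l$ is the occupancy of the valence state $l$. Based on these atomic pseudopotentials, the pseudopotential for the entire system takes the form

$$V_{\mathrm{psp}}(\mathbf{r},\mathbf{r}') = \sum_{l=0}^{l_{\max}} \sum_{m=-l}^{l} Y_{lm}(\hat{\mathbf{r}})\left(V_l^{\mathrm{ion}}(|\mathbf{r}|)\,\delta(|\mathbf{r}| - |\mathbf{r}'|)\right) Y_{lm}^{*}(\hat{\mathbf{r}}') \qquad (3.18)$$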

where $Y_{lm}(\hat{\mathbf{r}})$ are spherical harmonic functions. Because of the explicit angular dependence of the pseudopotentials, the formula for applying ionic pseudopotentials of Eq. (3.18) to nonspherical systems is fairly difficult. In this semilocal form, the pseudopotential is computationally difficult to calculate with a plane-wave basis set, since the kernel integration is not separable in $\mathbf{r}$ and $\mathbf{r}'$. This form of the pseudopotential is usually simplified by rewriting the potential kernel into a separable form suggested by Kleinman and Bylander,59 which was later shown by Blöchl60 to be the first term of a complete series expansion using atomic pseudowavefunctions. Equation (3.18) rewritten within the Kleinman–Bylander form is

$$V_{\mathrm{psp}}^{\mathrm{KB}}(\mathbf{r},\mathbf{r}') = V_{\mathrm{local}}(\mathbf{r}) + \sum_{l=0}^{l_{\max}} \sum_{m=-l}^{l} P_{lm}(\mathbf{r})\,h_l\,P_{lm}^{*}(\mathbf{r}') \qquad (3.19)$$

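where the atom-centered projectors $P_{lm}(\mathbf{r})$ are of the form

$$P_{lm}(\mathbf{r}) = \left[V_l^{\mathrm{ion}}(|\mathbf{r}|) - V_{\mathrm{local}}(|\mathbf{r}|)\right]\tilde{\varphi}_l(|\mathbf{r}|)\,Y_{lm}(\hat{\mathbf{r}}) \qquad (3.20)$$

and the coefficient

$$h_l = \left[4\pi\int_0^{\infty}\left[V_l^{\mathrm{ion}}(r) - V_{\mathrm{local}}(r)\right]\tilde{\varphi}_l(r)\,r^2\,dr\right]^{-1} \qquad (3.21)$$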

where $\tilde{\varphi}_l(r)$ are the zero radial node pseudowavefunctions corresponding to $V_l^{\mathrm{ion}}(r)$. The choice of the local potential, $V_{\mathrm{local}}(r)$, is somewhat arbitrary, but it is usually chosen to be the highest angular momentum pseudopotential.27,54 When a larger series expansion in atomic pseudowavefunctions is used,49,60 it is easy to show that Eq. (3.19) will have the general form

$$V_{\mathrm{psp}}(\mathbf{r},\mathbf{r}') = V_{\mathrm{local}}(\mathbf{r}) + \sum_{l=0}^{l_{\max}} \sum_{m=-l}^{l} \sum_{n=1}^{n_{\max}} \sum_{n'=1}^{n_{\max}} P_{nlm}(\mathbf{r})\,h^{l}_{n,n'}\,P_{n'lm}^{*}(\mathbf{r}') \qquad (3.22)$$

It is known that the norm-conservation condition results in harder pseudopotentials for some elements. For example, the p states in the first-row elements (oxygen, 2p) and the d states in the first-row transition elements (copper, 3d) do not have core counterparts of the same angular momentum. As a result, these states are compact and close to the core compared to the other valence states, resulting in higher plane-wave cutoffs. The ultrasoft pseudopotentials developed by Vanderbilt52,61 relax the norm-conservation condition by generalizing the norm-conservation sum rule. This results in pseudopotentials that are smoother and consequently require a lower plane-wave cutoff. We do not discuss the details of these pseudopotentials in this chapter and refer the reader to more comprehensive reviews.7,8,28,31,62

3.3.2 Total Energy

The total energy in the pseudopotential plane-wave method can be written as a sum of kinetic, external (i.e., pseudopotential), electrostatic, and exchange and correlation energies:

$$E_{\mathrm{total}} = E_{\mathrm{kinetic}} + E_{\mathrm{pseudopotential}} + E_{\mathrm{electrostatic}} + E_{xc} \tag{3.23}$$

The kinetic energy can be written

$$E_{\mathrm{kinetic}} = \frac{1}{2}\sum_{i,\mathbf G} f_i\,|\mathbf G|^{2}\,|\psi_i(\mathbf G)|^{2} \tag{3.24}$$
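As a minimal illustration of Eq. (3.24), the sketch below evaluates the kinetic energy from plane-wave coefficients with NumPy. The array layout (`psi_G[i, g]` for orbital i and plane wave g), the function name `kinetic_energy`, and the toy input are assumptions made for this example only; atomic units are assumed.

```python
import numpy as np

def kinetic_energy(psi_G, G_vectors, occupations):
    """E_kinetic = 1/2 * sum_{i,G} f_i |G|^2 |psi_i(G)|^2, as in Eq. (3.24)."""
    G2 = np.sum(np.asarray(G_vectors, dtype=float) ** 2, axis=1)  # |G|^2 per plane wave
    weights = np.abs(np.asarray(psi_G)) ** 2                      # |psi_i(G)|^2
    f = np.asarray(occupations, dtype=float)                      # occupation numbers f_i
    return 0.5 * np.einsum("i,ig,g->", f, weights, G2)

# Toy input: one doubly occupied orbital made of a single plane wave with |G| = 2.
G_vectors = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
psi_G = np.array([[0.0, 1.0]])
E = kinetic_energy(psi_G, G_vectors, occupations=[2.0])  # 0.5 * 2 * 4 * 1 = 4.0
```

Because the kinetic operator is diagonal in reciprocal space, the cost is linear in the number of plane waves per orbital.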

Here $f_i$ are the occupation numbers. To simplify the presentation, we restrict ourselves to spin-unpolarized systems, with $f_i = 2$. The extension to spin-polarized systems is straightforward and will not be discussed here. The pseudopotential energy $E_{\mathrm{pseudopotential}}$ is given as a sum of local and nonlocal contributions:

$$E_{\mathrm{pseudopotential}} = E^{\mathrm{local}}_{\mathrm{pseudopotential}} + E^{\mathrm{nonlocal}}_{\mathrm{pseudopotential}} \tag{3.25}$$


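The local portion of the pseudopotential energy can be evaluated as

$$E^{\mathrm{local}}_{\mathrm{pseudopotential}} = \sum_{I}\int V^{I}_{\mathrm{local}}(\mathbf r)\,\rho(\mathbf r)\,d\mathbf r = \sum_{I,\mathbf G} V^{I}_{\mathrm{local}}(\mathbf G)\,\rho^{*}(\mathbf G) \tag{3.26}$$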

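The valence electron density in reciprocal space, $\rho(\mathbf G)$, is obtained from its real-space representation, $\rho(\mathbf r) = \sum_n f_n|\psi_n(\mathbf r)|^2$, using a fast Fourier transform. The local potential is defined to be periodic and is represented as a sum of piecewise functions on the Bravais lattice by

$$V^{I}_{\mathrm{local}}(\mathbf r) = \sum_{\mathbf L} V^{I}_{\mathrm{local}}(|\mathbf r - \mathbf R_I - \mathbf L|) \tag{3.27}$$

where $\mathbf R_I$ is the location of atom I, $\mathbf L$ is a Bravais lattice vector, and $V^{I}_{\mathrm{local}}(r)$ is the radial local potential for the ion defined in Section 3.3.1. The local pseudopotential in reciprocal space is found by a spherical Bessel transform,

$$V^{I}_{\mathrm{local}}(\mathbf G) = \frac{4\pi}{\sqrt{\Omega}}\,e^{-i\mathbf G\cdot\mathbf R_I}\int_0^{\infty} V^{I}_{\mathrm{local}}(r)\,j_0(|\mathbf G|r)\,r^{2}\,dr \tag{3.28}$$

where $\Omega$ is the volume of the unit cell and $j_0(x) = \sin(x)/x$ is the zeroth-order spherical Bessel function.

The nonlocal part of the pseudopotential energy is given by

$$E^{\mathrm{nonlocal}}_{\mathrm{pseudopotential}} = \sum_{i}\sum_{I}\sum_{\mathbf G,\mathbf G'} f_i\,\psi_i^{*}(\mathbf G)\,\hat V^{I}_{NL}(\mathbf G,\mathbf G')\,\psi_i(\mathbf G') \tag{3.29}$$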

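where

$$\hat V^{I}_{NL}(\mathbf G,\mathbf G') = \sum_{lm} P^{I}_{lm}(\mathbf G)\,h^{I}_{l}\,P^{I*}_{lm}(\mathbf G') \tag{3.30}$$

and $P^{I}_{lm}(\mathbf G)$ is the reciprocal-space representation of the nonlocal projector [e.g., Eq. (3.20)], which can be obtained using the spherical Bessel transform

$$P^{I}_{lm}(\mathbf G) = \frac{4\pi}{\sqrt{\Omega}}\,e^{-i\mathbf G\cdot\mathbf R_I}\,i^{-l}\,Y_{lm}(\hat{\mathbf G})\int_0^{\infty} P^{I}_{lm}(r)\,j_l(|\mathbf G|r)\,r^{2}\,dr \tag{3.31}$$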

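The electron–electron repulsion energy can be written as

$$E^{e\text{-}e}_{\mathrm{electrostatic}} = \frac{1}{2}\int V_H(\mathbf r)\,\rho(\mathbf r)\,d\mathbf r = \frac{1}{2}\sum_{\mathbf G}\rho(\mathbf G)\,V_H^{*}(\mathbf G) \tag{3.32}$$

where the Hartree potential, $V_H(\mathbf r)$, is defined as

$$V_H(\mathbf r) = \sum_{\mathbf L}\int_{\Omega}\frac{\rho(\mathbf r' - \mathbf L)}{|\mathbf r - \mathbf r' + \mathbf L|}\,d\mathbf r' \tag{3.33}$$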

and in reciprocal space it is calculated as

$$V_H(\mathbf G) = \begin{cases}\dfrac{4\pi}{|\mathbf G|^{2}}\,\rho(\mathbf G) & \mathbf G \neq 0\\[2mm] 0 & \mathbf G = 0\end{cases} \tag{3.34}$$
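A minimal sketch of Eq. (3.34) is given below for a density sampled on a uniform grid in a cubic cell. The function name `hartree_potential`, the restriction to a cubic cell of side `L`, and the use of NumPy's unnormalized forward FFT are assumptions of this example, not details taken from any particular plane-wave code.

```python
import numpy as np

def hartree_potential(rho, L):
    """Periodic Hartree potential on an n^3 grid in a cubic cell of side L.

    Implements V_H(G) = 4*pi*rho(G)/|G|^2 for G != 0 and V_H(G = 0) = 0.
    """
    n = rho.shape[0]
    g1d = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)   # reciprocal-lattice components
    gx, gy, gz = np.meshgrid(g1d, g1d, g1d, indexing="ij")
    G2 = gx**2 + gy**2 + gz**2
    rho_G = np.fft.fftn(rho)
    v_G = np.zeros_like(rho_G)
    nonzero = G2 > 0.0
    v_G[nonzero] = 4.0 * np.pi * rho_G[nonzero] / G2[nonzero]   # G = 0 term stays zero
    return np.fft.ifftn(v_G).real

# Check with a single Fourier mode, rho = cos(k x) with k = 2*pi/L,
# for which the exact periodic solution is V_H = (4*pi/k^2) cos(k x).
L, n = 10.0, 16
x = np.arange(n) * L / n
k = 2.0 * np.pi / L
rho = np.cos(k * x)[:, None, None] * np.ones((n, n, n))
v = hartree_potential(rho, L)
```

Setting the G = 0 component to zero corresponds to the neutralizing-background convention discussed for charged systems in Section 3.4.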

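The ion–ion electrostatic energy for a periodic system can be evaluated using the Ewald decomposition63:

$$
\begin{aligned}
E^{\mathrm{ion\text{-}ion}}_{\mathrm{electrostatic}} ={}& \frac{1}{2\Omega}\sum_{\mathbf G\neq 0}\frac{4\pi}{|\mathbf G|^{2}}\exp\!\left(-\frac{|\mathbf G|^{2}}{4\varepsilon^{2}}\right)\left[\sum_{I,J} Z_I\,e^{i\mathbf G\cdot\mathbf R_I}\,Z_J\,e^{-i\mathbf G\cdot\mathbf R_J}\right]\\
&+ \frac{1}{2}\sum_{\mathbf L}\;\sum_{I,J\,\in\,|\mathbf R_I-\mathbf R_J+\mathbf L|\neq 0} Z_I Z_J\,\frac{\operatorname{erfc}(\varepsilon|\mathbf R_I-\mathbf R_J+\mathbf L|)}{|\mathbf R_I-\mathbf R_J+\mathbf L|}\\
&- \frac{\varepsilon}{\sqrt{\pi}}\sum_{I} Z_I^{2} - \frac{\pi}{2\varepsilon^{2}\Omega}\left(\sum_{I} Z_I\right)^{2}
\end{aligned}
\tag{3.35}
$$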

where $\varepsilon$ is a constant (typically on the order of 1) and $\mathbf L$ is a lattice vector. The exchange–correlation energy $E_{xc}$ within the LDA or GGA is given by

$$E_{xc} = \int f_{xc}\big(\rho(\mathbf r), |\nabla\rho(\mathbf r)|\big)\,d\mathbf r \approx \frac{\Omega}{N}\sum_{i_1 i_2 i_3} f_{xc}\big(\rho(\mathbf r_{i_1 i_2 i_3}), |\nabla\rho(\mathbf r_{i_1 i_2 i_3})|\big) \tag{3.36}$$

where $f_{xc}$ is the exchange–correlation energy density, $\Omega$ is the volume of the unit cell, and N is the number of real-space grid points in the FFT grid $\mathbf r_{i_1 i_2 i_3}$.

3.3.3 Electronic Gradient

During the course of a total energy minimization or a Car–Parrinello molecular dynamics simulation, one must calculate the electron gradient, defined as

$$S_i = \frac{\delta E_{\mathrm{total}}}{\delta\psi_i^{*}} \tag{3.37}$$

Part of the electron gradient is evaluated in reciprocal space and the other part in real space:

$$S_i = S_i^{G} + S_i^{r} \tag{3.38}$$


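The reciprocal-space portion contains contributions from the kinetic and nonlocal pseudopotential energy terms:

$$S_i^{G}(\mathbf G) = \frac{\partial E_{\mathrm{kinetic}}}{\partial\psi_i^{*}(\mathbf G)} + \frac{\partial E^{\mathrm{nonlocal}}_{\mathrm{pseudopotential}}}{\partial\psi_i^{*}(\mathbf G)} = \frac{1}{2}|\mathbf G|^{2}\psi_i(\mathbf G) + \sum_{I}\sum_{\mathbf G'}\hat V^{I}_{NL}(\mathbf G,\mathbf G')\,\psi_i(\mathbf G') \tag{3.39}$$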

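The real-space portion is given by

$$S_i^{r}(\mathbf r) = \frac{\partial}{\partial\psi_i^{*}(\mathbf r)}\left[E^{e\text{-}e}_{\mathrm{electrostatic}} + E^{\mathrm{local}}_{\mathrm{pseudopotential}} + E_{xc}\right] = \left[V_H(\mathbf r) + \sum_{I} V^{I}_{\mathrm{local}}(\mathbf r) + V_{xc}(\mathbf r)\right]\psi_i(\mathbf r) \tag{3.40}$$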

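where $V_H(\mathbf r)$ and $V^{I}_{\mathrm{local}}(\mathbf r)$ are the Hartree potential and the local pseudopotential, respectively. The exchange–correlation potential is given by64

$$V_{xc}(\mathbf r_{i_1 i_2 i_3}) = \frac{\delta E_{xc}}{\delta\rho(\mathbf r_{i_1 i_2 i_3})} = \frac{\partial f_{xc}}{\partial\rho(\mathbf r_{i_1 i_2 i_3})} - \frac{1}{N}\sum_{\mathbf G,\mathbf r'} i\mathbf G\cdot\frac{\nabla\rho(\mathbf r')}{|\nabla\rho(\mathbf r')|}\,\frac{\partial f_{xc}}{\partial|\nabla\rho(\mathbf r')|}\,e^{i\mathbf G\cdot(\mathbf r_{i_1 i_2 i_3}-\mathbf r')} \tag{3.41}$$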

Equivalently, all the real-space expressions above can be derived from a completely reciprocal-space representation using the convolution theorem. The real-space forms above are, however, considerably more efficient to compute.

3.3.4 Atomic Forces

The force acting on atom I in the system is defined as

$$\mathbf F_I = -\frac{\partial E_{\mathrm{total}}}{\partial\mathbf R_I} \tag{3.42}$$

Only the pseudopotential and ion–ion electrostatic energies contribute to the force:

$$\mathbf F_I = \mathbf F^{I}_{\mathrm{pseudopotential}} + \mathbf F^{I}_{\mathrm{ion\text{-}ion}}$$

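The force due to the pseudopotential is given by

$$\mathbf F^{I}_{\mathrm{pseudopotential}} = -\frac{\partial E^{\mathrm{local}}_{\mathrm{pseudopotential}}}{\partial\mathbf R_I} - \frac{\partial E^{\mathrm{nonlocal}}_{\mathrm{pseudopotential}}}{\partial\mathbf R_I} \tag{3.43}$$

$$= i\sum_{\mathbf G}\mathbf G\,\rho^{*}(\mathbf G)\,V^{I}_{\mathrm{local}}(\mathbf G) - 2\,\mathrm{Re}\sum_{i}\sum_{lm}\left[\sum_{\mathbf G}\psi_i^{*}(\mathbf G)\,P^{I}_{lm}(\mathbf G)\right]h^{I}_{l}\,\nabla_{\mathbf R_I}\left[\sum_{\mathbf G'}P^{I*}_{lm}(\mathbf G')\,\psi_i(\mathbf G')\right] \tag{3.44}$$

where

$$\nabla_{\mathbf R_I}\left[\sum_{\mathbf G'}P^{I*}_{lm}(\mathbf G')\,\psi_i(\mathbf G')\right] = i\sum_{\mathbf G'}\mathbf G'\,P^{I*}_{lm}(\mathbf G')\,\psi_i(\mathbf G').$$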

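The force due to the ion–ion interaction is given by

$$
\begin{aligned}
\mathbf F^{I}_{\mathrm{ion\text{-}ion}} ={}& -\frac{\partial E^{\mathrm{ion\text{-}ion}}_{\mathrm{electrostatic}}}{\partial\mathbf R_I}\\
={}& \sum_{\mathbf L}\;\sum_{J\,\in\,|\mathbf R_I-\mathbf R_J+\mathbf L|\neq 0} Z_I Z_J\,(\mathbf R_I-\mathbf R_J+\mathbf L)\\
&\times\left[\frac{\operatorname{erfc}(\varepsilon|\mathbf R_I-\mathbf R_J+\mathbf L|)}{|\mathbf R_I-\mathbf R_J+\mathbf L|^{3}} + \frac{2\varepsilon}{\sqrt{\pi}}\,\frac{\exp(-\varepsilon^{2}|\mathbf R_I-\mathbf R_J+\mathbf L|^{2})}{|\mathbf R_I-\mathbf R_J+\mathbf L|^{2}}\right]\\
&+ \frac{1}{\Omega}\sum_{\mathbf G\neq 0}\frac{4\pi\,\mathbf G}{|\mathbf G|^{2}}\exp\!\left(-\frac{|\mathbf G|^{2}}{4\varepsilon^{2}}\right) Z_I\,\mathrm{Im}\left[e^{i\mathbf G\cdot\mathbf R_I}\sum_{J} Z_J\,e^{-i\mathbf G\cdot\mathbf R_J}\right]
\end{aligned}
\tag{3.45}
$$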

3.4 CHARGED SYSTEMS

As we have discussed so far, plane waves are ideal for describing systems that are intrinsically periodic. Aperiodic systems, however, behave very differently within a periodic boundary condition (PBC) framework, and the difficulty is compounded if the system is charged (e.g., charged defects, charged ions). The electrostatic energy in these systems is, in principle, divergent. A standard approach to dealing with this issue is to impose a charge-neutrality condition via a uniform background charge, which implicitly introduces a jellium background. Makov and Payne66 have shown that this procedure results in errors which go as $L^{-1}$ for charged systems and $L^{-3}$ for isolated neutral systems in three dimensions, where L is the size of a cubic unit cell. One approach to minimizing these errors is to use the scheme developed by Leslie and Gillan65 and improved by Makov and Payne.66 They derived an analytic expression for the electrostatic correction between charged unit cells as follows:

$$E_{\mathrm{Makov\text{-}Payne}} = E_{\mathrm{total}} - \frac{q^{2}\alpha}{2L} - \frac{2\pi qQ}{3L^{3}} + O\!\left(\frac{1}{L^{5}}\right) \tag{3.46}$$


where $E_{\mathrm{total}}$ is the calculated energy of the charged cell, $\alpha$ is the Madelung constant for the lattice, q is the total charge of the cell, and Q is the quadrupole moment of the cell, given by

$$Q = \int r^{2}\rho(\mathbf r)\,d\mathbf r \tag{3.47}$$
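The correction of Eq. (3.46) is simple arithmetic once q, Q, L, and α are known. The sketch below is only an illustration: the function name `makov_payne_energy` is hypothetical, atomic units are assumed, and the default value of `alpha` is the commonly tabulated simple-cubic Madelung constant (about 2.8373), which is an assumption not stated in the text.

```python
import math

def makov_payne_energy(e_total, q, L, Q=0.0, alpha=2.837297):
    """Apply Eq. (3.46), dropping the O(1/L^5) remainder.

    e_total : energy computed for the charged periodic cell
    q       : total charge of the cell
    L       : side length of the cubic cell
    Q       : quadrupole moment of Eq. (3.47)
    alpha   : Madelung constant of the lattice (simple cubic assumed by default)
    """
    monopole = q**2 * alpha / (2.0 * L)
    quadrupole = 2.0 * math.pi * q * Q / (3.0 * L**3)
    return e_total - monopole - quadrupole

# Example with made-up numbers: a singly charged cell with L = 20 bohr.
e_corr = makov_payne_energy(-1.0, q=1.0, L=20.0)
```

The monopole term decays only as 1/L, which is why it dominates the finite-size error for charged cells.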

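Another approach for charged systems is to use free-space boundary conditions. Provided that the density has decayed to zero at the edge of the supercell, free-space boundary conditions can be implemented by restricting the integration to just one isolated supercell, $\Omega$:

$$E_{\mathrm{Coulomb}} = \frac{1}{2}\int_{\Omega}\int_{\Omega}\rho(\mathbf r)\,g(\mathbf r,\mathbf r')\,\rho(\mathbf r')\,d\mathbf r\,d\mathbf r', \qquad V_H(\mathbf r) = \int_{\Omega} g(\mathbf r,\mathbf r')\,\rho(\mathbf r')\,d\mathbf r' \tag{3.48}$$

This essentially defines a modified Coulomb interaction

$$g(\mathbf r,\mathbf r') = \begin{cases}\dfrac{1}{|\mathbf r-\mathbf r'|} & \text{for } \mathbf r,\mathbf r'\in\Omega\\[2mm] 0 & \text{otherwise}\end{cases} \tag{3.49}$$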

Hockney and Eastwood showed that an interaction of the form of Eq. (3.49) could still be used in conjunction with the fast Fourier transform convolution theorem.67,68 In their algorithm, the interaction between neighboring supercells is removed by padding the density with an external region of zero density. In the specific case of a density defined in a cubic supercell of length L, the density is extended to a cubic supercell of length 2L, where the original density is defined as before on the $[0, L]^3$ domain and the remainder of the $[0, 2L]^3$ domain is set to zero. The extended grid is therefore eight times larger than the conventional grid. The Coulomb potential is calculated by convolving the extended density with the Green's function kernel on the extended grid. After the aperiodic convolution, the free-space potential is obtained by restricting the extended grid to the conventional grid. In his original work, Hockney suggested that the cutoff Coulomb kernel could be defined by

$$g(\mathbf r_{i,j,k}) = \begin{cases}\dfrac{\mathrm{constant}}{h} & \text{for } |\mathbf r_{i,j,k}| = 0\\[2mm] \dfrac{1}{|\mathbf r_{i,j,k}|} & \text{otherwise}\end{cases} \tag{3.50}$$

where $h^3$ is the constant volume of the subintervals, defined as the unit cell volume divided by the number of conventional FFT grid points.67 Hockney suggested a constant at $|\mathbf r| = 0$ between 1 and 3. Barnett and Landman, in their implementation, defined the constant to be69

$$\frac{1}{h^{2}}\int_{h^{3}}\frac{1}{r}\,d\mathbf r \approx \begin{cases}2.380077 & \text{for SC lattice}\\ 0.910123 & \text{for FCC lattice}\\ 1.447944 & \text{for BCC lattice}\end{cases} \tag{3.51}$$
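The zero-padding procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: a cubic grid of n³ points with spacing h, the real-space kernel of Eq. (3.50), and the simple-cubic constant of Eq. (3.51) as the default for the hypothetical argument `g0`. It is not taken from any production code.

```python
import numpy as np

def free_space_potential(rho, h, g0=2.380077):
    """Aperiodic Coulomb potential of a density on an n^3 grid with spacing h.

    The density is zero-padded to a (2n)^3 grid and convolved with the
    real-space kernel of Eq. (3.50), using g(0) = g0 / h.
    """
    n = rho.shape[0]
    m = 2 * n
    idx = np.arange(m)
    # Signed displacements on the extended grid (wrap-around ordering).
    d = np.where(idx < n, idx, idx - m) * h
    dx, dy, dz = np.meshgrid(d, d, d, indexing="ij")
    r = np.sqrt(dx**2 + dy**2 + dz**2)
    g = np.full(r.shape, g0 / h)           # value used at |r| = 0
    g[r > 0] = 1.0 / r[r > 0]
    rho_ext = np.zeros((m, m, m))
    rho_ext[:n, :n, :n] = rho              # zero density outside the original cell
    # Circular convolution on the extended grid equals the aperiodic one here.
    v_ext = np.fft.ifftn(np.fft.fftn(rho_ext) * np.fft.fftn(g)).real * h**3
    return v_ext[:n, :n, :n]               # restrict to the conventional grid

rng = np.random.default_rng(0)
rho = rng.random((3, 3, 3))
v = free_space_potential(rho, h=0.5)
```

Because displacements between any two points of the original cell never exceed the padded range, no periodic image of the density contributes to the restricted potential.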

Regardless of the choice of the constant, the singular nature of $g(\mathbf r)$ in real space can lead to significant numerical error. James addressed this problem somewhat by expanding the Coulomb kernel to higher orders in real space.70 The convolution theorem suggests that defining $g(\mathbf r)$ in reciprocal space will lead to much higher accuracy. A straightforward definition in reciprocal space is

$$g(\mathbf r) = \sum_{\mathbf G} g_{\mathrm{uniform}}(\mathbf G)\,e^{i\mathbf G\cdot\mathbf r}, \qquad g_{\mathrm{uniform}}(\mathbf G) = \frac{1}{h^{3}}\int_{\Omega'}\frac{e^{-i(\mathbf G\cdot\mathbf r/2)}}{|\mathbf r|}\,d\mathbf r \tag{3.52}$$

where $\Omega'$ is the volume of the extended unit cell and $h^3$ is the volume of the unit cell divided by the number of conventional FFT grid points. The reciprocal-space definition gains accuracy because the singularity at $\mathbf r = \mathbf r'$ in Eq. (3.48) is integrated out analytically. Even when Eq. (3.52) is used to define the kernel, a slight inexactness in the calculated electron–electron Coulomb energy will always be present, due to the discontinuity introduced in the definition of the extended density, which is forced to be zero in the extended region outside $\Omega$. However, this discontinuity is small, since the densities we are interested in decay to zero within $\Omega$, which makes the finite Fourier expansion of the extended densities extremely close to zero in the extended region outside $\Omega$. Equation (3.52) could be calculated numerically; however, we have found that alternative definitions can be used with little loss of numerical accuracy. In an earlier work71,72 we suggested that the cutoff Coulomb kernel could be defined as

$$
\begin{aligned}
g(\mathbf r) &= \begin{cases}\displaystyle\sum_{\mathbf G} g_a(\mathbf G)\,e^{i\mathbf G\cdot\mathbf r} & \text{for } |\mathbf r| \le R_{\max} - \delta\\[2mm] \dfrac{1}{|\mathbf r|} & \text{otherwise}\end{cases}\\[2mm]
g_a(\mathbf G) &= \begin{cases}\dfrac{2\pi (R_{\max})^{2}}{h^{3}} & \text{for } \mathbf G = 0\\[2mm] \dfrac{4\pi}{h^{3}|\mathbf G|^{2}}\left[1 - \cos(|\mathbf G| R_{\max})\right] & \text{otherwise}\end{cases}\\[2mm]
R_{\max} &= \begin{cases} L & \text{(simple cubic)}\\[1mm] \dfrac{\sqrt{2}}{2}L & \text{(face-centered cubic)}\\[1mm] \dfrac{\sqrt{3}}{2}L & \text{(body-centered cubic)}\end{cases}\\[2mm]
\delta &= \text{small constant}
\end{aligned}
\tag{3.53}
$$

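Other forms have been suggested and could also be used.7,73–75 The Fourier-represented kernels improve the integration accuracy by removing the singularity at $|\mathbf r - \mathbf r'| = 0$ in a trapezoidal integration. A disadvantage of the kernel defined by Eq. (3.53) is that only regular-shaped cells can be used. To extend this method to irregular-shaped cells, a short- and long-range decomposition can be used15:

$$
\begin{aligned}
g(\mathbf r) &= g_{\mathrm{shortrange}}(\mathbf r) + g_{\mathrm{longrange}}(\mathbf r)\\[1mm]
g_{\mathrm{shortrange}}(\mathbf r) &= \sum_{\mathbf G} g_{\mathrm{shortrange}}(\mathbf G)\,e^{i\mathbf G\cdot\mathbf r}\\[1mm]
g_{\mathrm{shortrange}}(\mathbf G) &= \begin{cases}\dfrac{4\pi}{h^{3}|\mathbf G|^{2}}\left(1 - e^{-|\mathbf G|^{2}/4\varepsilon^{2}}\right) & \text{for } \mathbf G \neq 0\\[2mm] \dfrac{\pi}{h^{3}\varepsilon^{2}} & \text{for } \mathbf G = 0\end{cases}\\[2mm]
g_{\mathrm{longrange}}(\mathbf r) &= \begin{cases}\dfrac{\operatorname{erf}(\varepsilon r)}{r} & \text{for } r \neq 0\\[2mm] \dfrac{2\varepsilon}{\sqrt{\pi}} & \text{for } r = 0\end{cases}
\end{aligned}
\tag{3.54}
$$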
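The two pieces of Eq. (3.54) are straightforward to tabulate. The sketch below is an illustration only: the function names, the argument `h3` (the grid-cell volume h³), and the use of `math.erf` through `np.vectorize` are choices made for this example. It also shows that the special values at G = 0 and r = 0 are simply the continuous limits of the general expressions.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def g_short_range(G2, eps, h3):
    """Reciprocal-space kernel of Eq. (3.54); G2 holds |G|^2 values."""
    G2 = np.atleast_1d(np.asarray(G2, dtype=float))
    out = np.full(G2.shape, np.pi / (h3 * eps**2))           # G = 0 value
    nz = G2 > 0.0
    # -expm1(-x) = 1 - exp(-x), evaluated accurately for small x
    out[nz] = 4.0 * np.pi / (h3 * G2[nz]) * (-np.expm1(-G2[nz] / (4.0 * eps**2)))
    return out

def g_long_range(r, eps):
    """Real-space kernel erf(eps*r)/r of Eq. (3.54), with its r = 0 limit."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    out = np.full(r.shape, 2.0 * eps / math.sqrt(math.pi))   # r = 0 value
    nz = r > 0.0
    out[nz] = _erf(eps * r[nz]) / r[nz]
    return out

gs = g_short_range([0.0, 1e-8, 4.0], eps=1.0, h3=1.0)
gl = g_long_range([0.0, 1e-8, 10.0], eps=1.0)
```

Far from the origin `g_long_range` approaches 1/r, so only the smooth, finite part of the interaction is handled in real space.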

We have found this kernel to give very high accuracy, even for highly noncubic supercells. Marx and Hutter recently proposed the use of this kernel as well.7 Other kernel definitions are possible (e.g., using a short- and long-range decomposition based on a Lorentzian).74 Other schemes involve the use of countercharges, represented by Gaussian densities, whose potential can be derived analytically. Since a detailed discussion of the various approaches to this problem is beyond the scope of this chapter, we refer the reader to various papers on the subject.65,66,76–78

3.5 EXACT EXCHANGE

A number of failures are known to exist in DFT (see Chapter 1), such as underestimating bandgaps, the inability to localize excess spin density, and underestimating chemical reaction barriers. These problems are a consequence of having to rely on computationally efficient approximations to the exact exchange–correlation functional (e.g., LDA and GGA) used by plane-wave DFT programs, which amounts to an accuracy–performance trade-off. It is generally agreed that the largest error in these approximations is their failure to completely cancel out the orbital self-interaction energies, or in plain terms, that electrons partially "see" themselves.79,80 In the Hartree–Fock approximation, the exchange energy is calculated exactly and no self-interaction is present; however, by definition all electron correlation effects are missing from it. In all practical implementations of DFT the exchange energy is calculated approximately, and cancellation of the self-interaction is incomplete. Experience has shown that many of the failures associated with the erroneous self-interaction term can be corrected by approaches in which DFT exchange–correlation functionals are improved by inclusion of the nonlocal exchange term (hybrid DFT, e.g., B3LYP and PBE081),82

$$E_{x\text{-exact}} = -\frac{1}{2}\sum_{\sigma=\uparrow,\downarrow}\sum_{i}\sum_{j}\int\!\!\int\frac{\rho^{\sigma}_{ij}(\mathbf r)\,\rho^{\sigma}_{ji}(\mathbf r')}{|\mathbf r-\mathbf r'|}\,d\mathbf r\,d\mathbf r' \tag{3.55}$$

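where the overlap densities are given by

$$\rho^{\sigma}_{ij}(\mathbf r) = \psi^{\sigma*}_{i}(\mathbf r)\,\psi^{\sigma}_{j}(\mathbf r) \tag{3.56}$$

Using the expanded Bloch states83 representation

$$\psi^{\sigma}_{i\mathbf k}(\mathbf r) = \frac{e^{i\mathbf k\cdot\mathbf r}}{\sqrt{\Omega}}\sum_{\mathbf G}\psi^{\sigma}_{i\mathbf k}(\mathbf G)\,e^{i\mathbf G\cdot\mathbf r} \tag{3.57}$$

the exchange term takes the form

$$E_{x\text{-exact}} = -\frac{1}{2}\sum_{\sigma=\uparrow,\downarrow}\left(\frac{\Omega}{8\pi^{3}}\right)^{2}\int_{BZ}d\mathbf k\int_{BZ}d\mathbf l\,\sum_{i}\sum_{j}\sum_{\mathbf G}\frac{4\pi}{|\mathbf G-\mathbf k+\mathbf l|^{2}}\,\rho^{\sigma}_{j\mathbf l;i\mathbf k}(-\mathbf G)\,\rho^{\sigma}_{i\mathbf k;j\mathbf l}(\mathbf G) \tag{3.58}$$

where

$$\rho^{\sigma}_{i\mathbf k;j\mathbf l}(\mathbf G) = \sum_{\mathbf G'}\psi^{\sigma*}_{i\mathbf k}(\mathbf G')\,\psi^{\sigma}_{j\mathbf l}(\mathbf G'+\mathbf G) \tag{3.59}$$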

As pointed out by Gygi and Baldereschi84–86 and others,87–91 this expression must be evaluated with some care, especially for small Brillouin zone samplings and small unit cell sizes, because of the singularity at $\mathbf G - \mathbf k + \mathbf l = 0$. A better alternative for the evaluation of $E_{x\text{-exact}}$ for Γ-point ($\mathbf k = 0$) calculations with large unit cells can be found in terms of localized Wannier orbitals.92,93


The standard approach for the generation of Wannier orbitals, using a unitary transformation over $\mathbf k$,

$$w^{\sigma}_{i}(\mathbf r-\mathbf L) = \frac{\Omega}{8\pi^{3}}\int_{BZ}e^{-i\mathbf k\cdot\mathbf L}\,\psi^{\sigma}_{i\mathbf k}(\mathbf r)\,d\mathbf k \tag{3.60}$$

is not applicable for the Γ-point case. Instead, one can follow a Marzari–Vanderbilt localization procedure (which is the counterpart of the Foster–Boys transformation for finite systems),92–94 forming linear combinations of $\psi^{\sigma}_{i\mathbf k=0}(\mathbf r)$ over the different orbitals to produce a new set of Γ-point Bloch functions, $\bar w^{\sigma}_{i\mathbf k=0}(\mathbf r)$. These new periodic orbitals are extremely localized within each cell for nonmetallic systems with sufficiently large unit cells93 (see Fig. 3.3). In that case $\bar w^{\sigma}_{i\mathbf k=0}(\mathbf r)$ can be represented as a sum of piecewise localized functions, $w^{\sigma}_{i}(\mathbf r-\mathbf L)$, on the Bravais lattice,

$$\bar w^{\sigma}_{i\mathbf k=0}(\mathbf r) = \sum_{\mathbf L}w^{\sigma}_{i}(\mathbf r-\mathbf L) \tag{3.61}$$

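with the exchange term per unit cell written as

$$E_{x\text{-exact}} = -\frac{1}{2}\sum_{i}\sum_{j}\int\!\!\int\frac{w_i^{*}(\mathbf r)\,w_j(\mathbf r)\,w_j^{*}(\mathbf r')\,w_i(\mathbf r')}{|\mathbf r-\mathbf r'|}\,d\mathbf r\,d\mathbf r' \tag{3.62}$$

Fig. 3.3 (color online) Periodic localized function $\bar w_{i\mathbf k=0}(\mathbf r)$ for a 72-atom unit cell of a SiO2 crystal.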


Evaluation of this integral in a plane-wave basis set requires some care, since representing the overlap densities $[w_i^{*}(\mathbf r)w_j(\mathbf r)]$ with a plane-wave expansion [i.e., $\bar w_i^{*}(\mathbf r)\bar w_j(\mathbf r)$] will result in the inclusion of redundant periodic images. Interactions between such images can be eliminated95,96 by replacing the standard Coulomb kernel, 1/r, in Eq. (3.62) by the following cutoff Coulomb kernel:

$$f_{\mathrm{cutoff}}(r) = \frac{1 - \left[1 - e^{-(r/R_c)^{N_c+2}}\right]^{N_c}}{r} \tag{3.63}$$

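where $N_c$ and $R_c$ are adjustable parameters. This kernel decays rapidly to zero at distances larger than $R_c$. Hence, Eq. (3.62) can be transformed to

$$E_{x\text{-exact}} = -\frac{1}{2}\sum_{\sigma=\uparrow,\downarrow}\sum_{i}\sum_{j}\int\!\!\int\bar w^{\sigma*}_{i}(\mathbf r)\,\bar w^{\sigma}_{j}(\mathbf r)\,f_{\mathrm{cutoff}}(|\mathbf r-\mathbf r'|)\,\bar w^{\sigma*}_{j}(\mathbf r')\,\bar w^{\sigma}_{i}(\mathbf r')\,d\mathbf r\,d\mathbf r' \tag{3.64}$$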

That is, replacing $w_i(\mathbf r)$ with $\bar w_i(\mathbf r)$ in Eq. (3.62), combined with the use of Eq. (3.63), will give the same energy, since the cutoff Coulomb interaction is nearly 1/r for an orbital with itself and zero with its periodic images. The parameter $R_c$ must be chosen carefully. It has to exceed the size of each Wannier orbital so that all of the orbital is included in the integration, while $2R_c$ must be smaller than the shortest linear dimension of the unit cell to exclude periodic interactions. Finally, we note that when one uses the cutoff Coulomb kernel, localized orbitals are not needed to calculate the exchange term, since Eq. (3.62) can be unitarily transformed, resulting in

$$E_{x\text{-exact}} = -\frac{1}{2}\sum_{\sigma=\uparrow,\downarrow}\sum_{i}\sum_{j}\int\!\!\int\psi^{\sigma*}_{i}(\mathbf r)\,\psi^{\sigma}_{j}(\mathbf r)\,f_{\mathrm{cutoff}}(|\mathbf r-\mathbf r'|)\,\psi^{\sigma*}_{j}(\mathbf r')\,\psi^{\sigma}_{i}(\mathbf r')\,d\mathbf r\,d\mathbf r' \tag{3.65}$$

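and

$$\frac{\delta E_{x\text{-exact}}}{\delta\psi^{\sigma*}_{i}(\mathbf r)} = -\sum_{j}\psi^{\sigma}_{j}(\mathbf r)\int f_{\mathrm{cutoff}}(|\mathbf r-\mathbf r'|)\,\psi^{\sigma*}_{j}(\mathbf r')\,\psi^{\sigma}_{i}(\mathbf r')\,d\mathbf r' \tag{3.66}$$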
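The behavior of the cutoff kernel of Eq. (3.63) is easy to inspect numerically. The sketch below is illustrative only; the function name `f_cutoff` and the sample values $R_c = 8$ and $N_c = 8$ are assumptions chosen for the demonstration, not recommended settings.

```python
import numpy as np

def f_cutoff(r, Rc, Nc):
    """Cutoff Coulomb kernel of Eq. (3.63), valid for r > 0."""
    r = np.asarray(r, dtype=float)
    x = (r / Rc) ** (Nc + 2)
    return (1.0 - (1.0 - np.exp(-x)) ** Nc) / r

Rc, Nc = 8.0, 8
r = np.array([1.0e-3, 1.0, Rc, 2.0 * Rc])
ratio = f_cutoff(r, Rc, Nc) * r   # equals 1 where the kernel is purely 1/r
```

For r well inside $R_c$ the ratio is essentially 1, so the kernel reproduces the bare Coulomb interaction, while at $2R_c$ it has already vanished to machine precision, which is what removes the periodic images.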

We note that although the localized functions are not required in this formulation, one should still evaluate the set of maximally localized Wannier functions in order to estimate their extent and, consequently, the minimal size of the unit cell.

3.6 WAVEFUNCTION OPTIMIZATION FOR PLANE-WAVE METHODS

In DFT calculations it is necessary to determine the set of orthonormal one-electron wavefunctions $\{\psi_i\}$ that minimize the Kohn–Sham energy functional. There are two classes of methods available for optimizing the Kohn–Sham energy functional: the self-consistent field approach and the direct minimization approach.


3.6.1 Self-Consistent Field Method

The steps involved in the self-consistent field procedure are as follows:

1. Set the iteration number m = 0 and choose an initial set of trial molecular orbitals $\{\psi_i\}$ and input charge density $\rho^{(0)}(\mathbf r)$; for example,
$$\rho^{(0)}(\mathbf r) = \sum_{i=1}^{\mathrm{occ}}|\psi_i(\mathbf r)|^{2}$$
2. Use the input charge density to construct an effective potential, which is the sum of the Hartree and exchange–correlation potentials:
$$V_{\mathrm{eff}}(\mathbf r) = V_H[\rho^{(m)},\mathbf r] + V_{xc}[\rho^{(m)},\mathbf r]$$
3. Generate a new set of molecular orbitals by solving the linearized Kohn–Sham equations via an iterative scheme:
$$\left[-\frac{1}{2}\nabla^{2} + \sum_{I}\left(V^{I}_{\mathrm{local}}(\mathbf r) + \hat V^{I}_{NL}\right) + V_{\mathrm{eff}}(\mathbf r)\right]\psi_i(\mathbf r) = \varepsilon_i\,\psi_i(\mathbf r)$$
4. Use the new set of molecular orbitals to construct an output density:
$$\rho^{(m)}_{\mathrm{out}}(\mathbf r) = \sum_{n=1}^{\mathrm{occ}}|\tilde\psi_n(\mathbf r)|^{2}$$
5. Generate a new input density by mixing the output density with the previous input density:
$$\rho^{(m+1)}(\mathbf r) \Leftarrow \rho^{(m)},\ \rho^{(m)}_{\mathrm{out}}$$
6. If self-consistency is not achieved, set m = m + 1 and go to step 2.

In this scheme, self-consistency is achieved when the distance between the input and output charge densities is zero (in practice, smaller than a chosen tolerance):

$$D[\rho_{\mathrm{out}},\rho] = \langle\rho_{\mathrm{out}}-\rho\,|\,\rho_{\mathrm{out}}-\rho\rangle \tag{3.67}$$

For plane-wave methods, where the molecular orbitals are expanded using ∼10,000 to several million basis functions, an efficient iterative method for diagonalizing the Kohn–Sham Hamiltonian is needed. Many iterative methods have been developed4,6,97–101 and several good reviews on the subject are available in the literature. Two of the more popular algorithms used for plane-wave methods are the conjugate gradient algorithm applied to plane-wave calculations proposed by Teter et al.99 and the residual minimization method direct inversion in the iterative subspace (RMM-DIIS) proposed by Pulay.97,98 A preconditioning scheme is generally used with these methods.4,6,7,99

An important step in the self-consistent field procedure is the generation of a new trial density, $\rho^{(m+1)}$, from the prior input, $\rho^{(m)}$, and output, $\rho^{(m)}_{\mathrm{out}}$, densities. A simple-minded iteration,

$$\rho^{(m+1)} = \rho^{(m)}_{\mathrm{out}} \tag{3.68}$$

in which the input density is replaced by the output density, will usually result in the development of charge oscillations that cause the algorithm to diverge. The simplest way to control these oscillations is to damp them during the iteration process by a simple mixing algorithm,

$$\rho^{(m+1)} = (1-\alpha)\,\rho^{(m)} + \alpha\,\rho^{(m)}_{\mathrm{out}} \tag{3.69}$$
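The effect of the mixing parameter in Eq. (3.69) can be seen on a toy problem. In the sketch below, the callable `output_density` stands in for steps 2 to 4 of the self-consistent field procedure; it, the function name `scf_simple_mixing`, and the linear toy map are all hypothetical and are used only to show why damping helps.

```python
import numpy as np

def scf_simple_mixing(output_density, rho0, alpha, tol=1e-10, max_iter=200):
    """Iterate rho <- (1 - alpha) * rho + alpha * rho_out, as in Eq. (3.69)."""
    rho = np.asarray(rho0, dtype=float)
    for m in range(max_iter):
        rho_out = output_density(rho)
        diff = rho_out - rho
        if np.dot(diff, diff) < tol:          # distance D[rho_out, rho] of Eq. (3.67)
            return rho, m
        rho = (1.0 - alpha) * rho + alpha * rho_out
    raise RuntimeError("SCF did not converge")

# Toy map whose undamped iteration oscillates and diverges (slope -1.5).
c = np.array([1.0, 2.0])
toy_map = lambda rho: c - 1.5 * rho

rho_scf, n_iter = scf_simple_mixing(toy_map, np.zeros(2), alpha=0.3)
```

With α = 0.3 the error shrinks by a factor of four per iteration and the fixed point c/2.5 is reached, whereas α = 1, which is the undamped update of Eq. (3.68), amplifies the error by 1.5 each step and never converges.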

Here $\alpha$ is a mixing parameter in the interval [0, 1]. In many cases convergence can be achieved with a suitable choice of $\alpha$ (e.g., 0.1 ≤ α ≤ 0.5). Several other iteration schemes have been developed besides simple mixing.6,97,102–113

3.6.2 Direct Methods

An alternative approach is to treat the minimization of the DFT energy functional as an optimization problem and to minimize it directly.4,7,114–116 Interest in this method began with the introduction of the Car–Parrinello algorithm.3 These methods stand out in that they rarely, if ever, fail to achieve self-consistency. The simplest of this class of methods is the fixed-step steepest descent algorithm, which is effectively the Car–Parrinello algorithm (see Section 3.7) with the velocity set to zero at every step in the iteration. Orthonormality constraints are handled by Lagrange multipliers. A significantly more powerful approach is the conjugate gradient method on the Grassmann manifold developed by Edelman et al.117 This method is very fast and has been shown to exhibit superlinear convergence near the minimum. In this algorithm, the set of wavefunctions $\psi_i$ is written in terms of a tall and skinny $N_{\mathrm{basis}}\times N_e$ matrix:

$$Y = \begin{bmatrix}\psi_1(\phi_1) & \psi_2(\phi_1) & \cdots & \psi_{N_e}(\phi_1)\\ \psi_1(\phi_2) & \psi_2(\phi_2) & \cdots & \psi_{N_e}(\phi_2)\\ \psi_1(\phi_3) & \psi_2(\phi_3) & \cdots & \psi_{N_e}(\phi_3)\\ \vdots & \vdots & \ddots & \vdots\\ \psi_1(\phi_{N_{\mathrm{basis}}}) & \psi_2(\phi_{N_{\mathrm{basis}}}) & \cdots & \psi_{N_e}(\phi_{N_{\mathrm{basis}}})\end{bmatrix} \tag{3.70}$$

where the matrix is written in terms of the orthonormal basis $\phi_j(\mathbf r)$ (or $e^{i\mathbf G_j\cdot\mathbf r}$ for a plane-wave basis) by

$$\psi_i(\mathbf r) = \sum_{j=1}^{N_{\mathrm{basis}}}\psi_i(\phi_j)\,\phi_j(\mathbf r) \tag{3.71}$$


and obeys the orthogonality constraint $Y^{t}Y = I$. The following steps illustrate this algorithm:

1. Given an initial $Y_0$ such that $Y_0^{t}Y_0 = I$, calculate the tangent residual:
$$G_0 = (1 - Y_0Y_0^{t})\left.\frac{\delta E}{\delta Y^{t}}\right|_{Y=Y_0}$$
2. Set $H_0 = -G_0$ and $E_{\mathrm{new}} = E_{\mathrm{total}}(Y_0)$.
3. Find the compact singular value decomposition of $H_0$:
$$H_0 \rightarrow U\Sigma V^{t}$$
4. Minimize $E_{\mathrm{total}}(Y(\theta))$ along the following geodesic line parameterized by $\theta$:
$$Y(\theta) = Y_0V\cos(\Sigma\theta)V^{t} + U\sin(\Sigma\theta)V^{t}$$
5. Set $Y_1 = Y(\theta)$, $E_{\mathrm{old}} = E_{\mathrm{new}}$, and $E_{\mathrm{new}} = E_{\mathrm{total}}(Y_1)$.
6. Calculate the tangent residual:
$$G_1 = (1 - Y_1Y_1^{t})\left.\frac{\delta E}{\delta Y^{t}}\right|_{Y=Y_1}$$
7. Parallel-transport the previous search direction along the geodesic:
$$T_0 = \left[-Y_0V\sin(\Sigma\theta) + U\cos(\Sigma\theta)\right]\Sigma V^{t}$$
8. Compute the new search direction:
$$H_1 = -G_1 + \frac{\mathrm{Tr}[G_1^{t}G_1]}{\mathrm{Tr}[G_0^{t}G_0]}\,T_0$$
9. Set $Y_0 = Y_1$, $G_0 = G_1$, and $H_0 = H_1$.
10. If $E_{\mathrm{old}} - E_{\mathrm{new}} >$ tolerance, go to step 3.

3.7 CAR–PARRINELLO MOLECULAR DYNAMICS

The development of fast and efficient ab initio molecular dynamics (AIMD) methods, such as Car–Parrinello molecular dynamics,3 has opened the door to the study of strongly interacting many-body systems by direct dynamics simulation without the introduction of empirical interactions. In AIMD simulations the electronic degrees of freedom are continuously updated at each step in the simulation and all the changes in the electronic structure are properly accounted for. The forces are calculated as derivatives of the total energy with respect to the atomic positions. Hence, the dynamical simulation automatically includes all many-body interactions and effects, such as changes in coordination, bond saturation, and polarization. Applications of this first-principles method include the calculation of free energies, the search for global minima, explicit simulation of solvated molecules, and so on. This important generalization of molecular dynamics methods to include the essential physics of the interactions of complex systems comes at a considerable price. However, with present-day algorithms and parallel supercomputers, simulations of hundreds of atoms for a time scale of several picoseconds are feasible. Although this is far less, both in number of particles and in time, than is possible with conventional MD, AIMD simulations might be the only option for systems with complex chemistry where even a qualitative interpretation requires a proper description of interatomic interactions.

In the Car–Parrinello version of AIMD the electronic and ionic degrees of freedom are updated simultaneously. This is accomplished by introducing a fictitious electronic kinetic energy functional

$$KE(\{\psi_i\}) = \frac{1}{2}\sum_{i}\mu\int\dot\psi_i^{*}(\mathbf r)\,\dot\psi_i(\mathbf r)\,d\mathbf r \tag{3.72}$$

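where $\mu$ is a fictitious mass assigned to the electronic degrees of freedom. The equations of motion for the ions, $\mathbf R_I$, and the Kohn–Sham orbitals, $\psi_i$, are found by taking the first variation of the auxiliary Lagrangian:

$$L(\{\psi_i\},\{\mathbf R_I\}) = \frac{1}{2}\sum_{i}\mu\int\dot\psi_i^{*}(\mathbf r)\,\dot\psi_i(\mathbf r)\,d\mathbf r + \frac{1}{2}\sum_{I}M_I|\dot{\mathbf R}_I|^{2} - E_{\mathrm{total}}(\{\psi_i\},\{\mathbf R_I\}) + \sum_{i,j}\Lambda_{j,i}\left(\int\psi_i^{*}(\mathbf r)\,\psi_j(\mathbf r)\,d\mathbf r - \delta_{i,j}\right) \tag{3.73}$$

The resulting equations of motion are

$$\mu\,\ddot\psi_i(\mathbf r) = -H\psi_i(\mathbf r) + \sum_{j}\psi_j(\mathbf r)\,\Lambda_{j,i} \tag{3.74}$$

$$M_I\,\ddot{\mathbf R}_I = \mathbf F_I \tag{3.75}$$

where

$$\frac{\delta E_{\mathrm{total}}}{\delta\psi_i^{*}(\mathbf r)} = H\psi_i(\mathbf r) \tag{3.76}$$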

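Given the equations of motion and the gradients of Sections 3.3.3 and 3.3.4, the electronic and ionic degrees of freedom can be integrated using the Verlet algorithm:

$$\psi_i^{t+\Delta t}(\mathbf r) = 2\psi_i^{t}(\mathbf r) - \psi_i^{t-\Delta t}(\mathbf r) + \frac{(\Delta t)^{2}}{\mu}\left[-H\psi_i^{t}(\mathbf r) + \sum_{j}\psi_j^{t}(\mathbf r)\,\Lambda^{t+\Delta t}_{j,i}\right] \tag{3.77}$$

$$\mathbf R_I^{t+\Delta t} = 2\mathbf R_I^{t} - \mathbf R_I^{t-\Delta t} + \frac{(\Delta t)^{2}}{M_I}\,\mathbf F_I \tag{3.78}$$

The matrix $\Lambda^{t+\Delta t}_{j,i}$ is determined by the orthogonality constraint

$$\int\psi_i^{*\,t+\Delta t}(\mathbf r)\,\psi_j^{t+\Delta t}(\mathbf r)\,d\mathbf r = \delta_{i,j} \tag{3.79}$$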
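The position-Verlet update of Eq. (3.78) can be illustrated on a one-dimensional harmonic oscillator. The sketch below is a toy example under stated assumptions: the function name `verlet_step`, the force law F = −kR, and the parameter values are hypothetical, and the constraint term needed for the electronic update of Eq. (3.77) is deliberately left out.

```python
import numpy as np

def verlet_step(r_now, r_prev, force, mass, dt):
    """One step of Eq. (3.78): R(t+dt) = 2 R(t) - R(t-dt) + dt^2 F / M."""
    return 2.0 * r_now - r_prev + dt**2 * force / mass

# Harmonic oscillator with F = -k R, started from R(0) = 1 at rest.
k, mass, dt = 1.0, 1.0, 0.01
r_prev = np.array([1.0])            # R(0)
r_now = np.array([np.cos(dt)])      # R(dt), taken from the exact solution
for _ in range(999):
    r_now, r_prev = verlet_step(r_now, r_prev, -k * r_now, mass, dt), r_now

r_exact = np.cos(1000 * dt)         # exact R at t = 10
```

Only positions at two consecutive time steps are stored, and the local error of each step is of order $(\Delta t)^4$, which is why this integrator is a common choice for both the ionic and the fictitious electronic dynamics.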

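This constraint yields the matrix Riccati equation [to simplify the following equations, the following symbols are used: $\bar\psi_i^{t}(\mathbf r) = 2\psi_i^{t}(\mathbf r) - \psi_i^{t-\Delta t}(\mathbf r) + (\Delta t^{2}/\mu)H\psi_i^{t}(\mathbf r)$, $\alpha = \Delta t^{2}/\mu$]:

$$
\begin{aligned}
I &= \int\psi_i^{*\,t+\Delta t}(\mathbf r)\,\psi_j^{t+\Delta t}(\mathbf r)\,d\mathbf r\\
&= \int\left\{\psi_i^{*t}(\mathbf r) - \alpha\,\bar\psi_i^{*t}(\mathbf r) + \alpha\sum_{k}\Lambda_{i,k}\,\psi_k^{*t}(\mathbf r)\right\}\left\{\psi_j^{t}(\mathbf r) - \alpha\,\bar\psi_j^{t}(\mathbf r) + \alpha\sum_{l}\psi_l^{t}(\mathbf r)\,\Lambda_{l,j}\right\}d\mathbf r\\
&= A + X^{t}B + B^{t}X + X^{t}CX
\end{aligned}
\tag{3.80}
$$

where $X_{ij} = \alpha\Lambda_{ij}$ and the matrices $A_{i,j}$, $B_{i,j}$, and $C_{i,j}$ are given by

$$A_{ij} = \int\left\{\psi_i^{*t}(\mathbf r) - \alpha[\bar\psi_i^{*t}(\mathbf r)]\right\}\left\{\psi_j^{t}(\mathbf r) - \alpha[\bar\psi_j^{t}(\mathbf r)]\right\}d\mathbf r \tag{3.81}$$

$$B_{ij} = \int[\psi_i^{*t}(\mathbf r)]\left\{\psi_j^{t}(\mathbf r) - \alpha[\bar\psi_j^{t}(\mathbf r)]\right\}d\mathbf r \tag{3.82}$$

$$C_{ij} = \int\psi_i^{*t}(\mathbf r)\,\psi_j^{t}(\mathbf r)\,d\mathbf r \tag{3.83}$$

Blöchl28 suggested the following iteration for solving this matrix Riccati equation:

$$A^{(0)} = A \tag{3.84}$$

$$A^{(n)} = A^{(n-1)} + X^{(n-1)t}B + BX^{(n-1)} + X^{(n-1)t}CX^{(n-1)} \tag{3.85}$$

$$X^{(n+1)}_{rs} = \sum_{i,j,k,l}\frac{U^{t}_{ri}\,U_{ij}\,(A^{(n)}_{jk} - \delta_{jk})\,U^{t}_{kl}\,U_{ls}}{b_i + b_l} \tag{3.86}$$

where the eigenvalues b and the unitary matrix U are obtained from diagonalizing B: $B_{ij} = \sum_{l}U^{t}_{il}\,b_l\,U_{lj}$.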


3.8 PARALLELIZATION

During the course of a total energy minimization or molecular dynamics simulation the electron gradient $\delta E_{\mathrm{total}}/\delta\psi_i^{*}$ [Eq. (3.37)] needs to be calculated as efficiently as possible. For a pseudopotential plane-wave calculation the main parameters that determine the cost of a calculation are $N_g$, $N_e$, $N_a$, and $N_{\mathrm{proj}}$, where $N_g$ is the size of the three-dimensional FFT grid, $N_e$ is the number of occupied orbitals, $N_a$ is the number of atoms, and $N_{\mathrm{proj}}$ is the number of projectors per atom. In most plane-wave DFT programs the solution of the eigenvalue equations is typically approached by means of a conjugate gradient algorithm or, for dynamics, a Car–Parrinello algorithm, both of which require many evaluations of the electron gradient. The operation counts for each part of the electron gradient are shown in Fig. 3.4. The three (or four) major computational pieces of the gradient are:

1. The Hartree potential $V_H$, including the local exchange and correlation potentials $V_x + V_c$. The main computational kernel in these computations is the calculation of $N_e$ three-dimensional FFTs.
2. The nonlocal pseudopotential, $\hat V_{NL}$. The major computational kernel in this computation can be expressed by the following matrix multiplications: $W = P^{t}\cdot Y$ and $Y_2 = P\cdot W$, where P is an $N_g\times(N_{\mathrm{proj}}\cdot N_a)$ matrix, Y and $Y_2$ are $N_g\times N_e$ matrices, and W is an $(N_{\mathrm{proj}}\cdot N_a)\times N_e$ matrix. We note that for most pseudopotential plane-wave calculations, $N_{\mathrm{proj}}\cdot N_a\approx N_e$.
3. Enforcing orthogonality. The major computational kernels in this computation are the following matrix multiplications: $S = Y^{t}\cdot Y$ and $Y_2 = Y\cdot S$, where Y and $Y_2$ are $N_g\times N_e$ matrices, and S is an $N_e\times N_e$ matrix.
4. When exact exchange is included, the exact exchange operator $\sum_j K_{ij}\psi_j$. The major computational kernel in this computation involves the calculation of $(N_e+1)\cdot N_e$ three-dimensional FFTs.
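The matrix-multiplication kernels in items 2 and 3 can be written directly with NumPy. The sketch below uses small random matrices purely to show the shapes involved. The array names follow the text, while the toy sizes, the conjugate transpose used for $P^t$ and $Y^t$ with complex coefficients, and the vector `h` of projector coefficients inserted between the two products are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
Ng, Ne, Nproj_Na = 64, 4, 6        # toy sizes; Nproj_Na stands for Nproj * Na

def random_complex(shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

P = random_complex((Ng, Nproj_Na))     # projectors, Ng x (Nproj * Na)
Y = random_complex((Ng, Ne))           # orbitals,   Ng x Ne
h = rng.standard_normal(Nproj_Na)      # hypothetical projector coefficients h_l

# Nonlocal pseudopotential kernel: W = P^t . Y, then Y2 = P . W
W = P.conj().T @ Y                     # (Nproj * Na) x Ne
Y2_nonlocal = P @ (h[:, None] * W)     # Ng x Ne

# Orthogonality kernel: S = Y^t . Y, then Y2 = Y . S
S = Y.conj().T @ Y                     # Ne x Ne overlap matrix
Y2_orth = Y @ S                        # Ng x Ne
```

Both kernels cost O($N_g N_e^2$) operations when $N_{\mathrm{proj}}\cdot N_a\approx N_e$, which is why they map well onto level-3 BLAS routines.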

Fig. 3.4 (color online) Operation count of $H\psi$ in a plane-wave DFT simulation.


There are several ways to parallelize a plane-wave Hartree–Fock and DFT program.7,9,11,12,15,18 For many solid-state calculations the computation can be distributed over the Brillouin zone sampling space.11 This approach cannot be used for Γ-point ($\mathbf k = 0$) calculations with large unit cells. Another approach is to distribute the one-electron orbitals across processors.12 The drawback of this method is that the orthogonality parts of the computation will involve a lot of message passing. Furthermore, this method will not work for simulations with very large cutoff energy requirements (i.e., using large numbers of plane waves to describe the one-electron orbitals) on parallel computers that have nodes with a small amount of memory, because a complete one-electron orbital must be stored on each node. Hence this approach is not practical for Car–Parrinello simulations with large unit cells; however, it can work well for simulations with modest-size unit cells and small cutoff energies when used in combination with minimization algorithms that perform orthogonalization sparingly (e.g., RMM-DIIS). Another straightforward way is to do a spatial decomposition of the one-electron orbitals.7,9,15 This approach is versatile, easily implemented, and well suited for performing Car–Parrinello simulations with large unit cells and cutoff energies. However, a parallel three-dimensional fast Fourier transform (FFT) must be used, which is known not to scale beyond $\sim N_g^{1/3}$ processors (or processor groups), where $N_g$ is the number of FFT grid points. In Fig. 3.5, an example of timings versus the number of CPUs for this type of parallelization is shown. 
These timings were taken from a Car–Parrinello simulation of the hydrated uranyl cation, UO$_2^{2+}$ + 122 H$_2$O, using the plane-wave DFT module (PSPW) in NWChem.118 These calculations were performed using all four cores per node of the quad-core Cray-XT4 system (NERSC Franklin), which is composed of 2.3-GHz single-socket quad-core AMD Opteron (Budapest) processors. The NWChem program was compiled using a Portland Group FORTRAN 90 compiler, version 7.2.4, and linked with the Cray MPICH2 library, version 3.02, for message passing. The performance of the program is reasonable, with an overall parallel efficiency of 84% on 128 CPUs, dropping to 26% on 1024 CPUs. However, not every part of the program scales in exactly the same way. For illustrative purposes, the timings of the FFTs, nonlocal pseudopotential, and orthogonality are also shown. The efficiency of the FFTs is by far the biggest bottleneck in this implementation. At smaller processor counts the inefficiency of the FFTs is masked, because these parts of the code make up less than 5% of the overall computation, and the largest part of the calculation is the nonlocal pseudopotential evaluation. Ultimately, however, the lack of scalability of the three-dimensional FFT algorithm beyond $\sim N_g^{1/3}$ processors prevails, causing the simulation not to speed up. Recently, Gygi et al. have come up with an approach that can be used to improve the overall efficiency of a plane-wave DFT program.18 In this approach, both the spatial and the orbital dimensions are distributed in a two-dimensional processor geometry, as shown in Fig. 3.6. Using simple scaling arguments, it can be shown that with this decomposition the algorithms will require only $O(\log(p_1)) + O(\log(p_2))$ communications per CPU, as opposed to $O(\log(P))$ communications per CPU for algorithms in which only the spatial or orbital dimensions are


Fig. 3.5 (color online) Overall and component timings from AIMD simulations of UO$_2^{2+}$ + 122 H$_2$O using a one-dimensional processor geometry. Overall best timings are also shown for a two-dimensional processor grid. Timings are from calculations on the Franklin Cray-XT4 computer system at NERSC.

Fig. 3.6 (color online) Parallel distribution (shown on the left), implemented in most plane-wave DFT software. Each of the one-electron orbitals is identically spatially decomposed. The two-dimensional parallel distribution suggested by Gygi et al.18 is shown on the right.


distributed (where the total number of processors, P, can be written as $P = p_1p_2$). The overall performance of our plane-wave DFT simulations was found to improve considerably using this new approach. Using the optimal processor geometries, the running time per step went from 2699 s (45 min) on 1 CPU down to 3.7 s, with a 70% parallel efficiency, on 1024 CPUs. The fastest running time found was 1.8 s, with 36% parallel efficiency, on 4096 CPUs. As shown in Fig. 3.7, these timings were found to be very sensitive to the layout of the two-dimensional processor geometry. For 256, 512, 1024, and 2048 CPUs, the optimal processor geometries were 64 × 4, 64 × 8, 128 × 8, and 128 × 16 processor grids, respectively. The timings of the FFTs, nonlocal pseudopotential, and orthogonality are also shown in Fig. 3.7. Not every part of the program scaled perfectly. The parallel efficiency of several other key operations depends strongly on the shape of the processor geometry. It was found that distributing the processors over the orbitals significantly improved the efficiency of the FFTs and the nonlocal pseudopotential, while distributing the processors over the spatial dimensions favored the orthogonality computations. The two-dimensional processor geometry method can also be used to parallelize the computation of the exact exchange operator. This operator has a cost of $O(N_e^{2}\cdot N_g\cdot\log(N_g))$, and when it is included in a plane-wave DFT calculation it is by far the most demanding term. The exchange term is well suited for this method, whereas if only the spatial or orbital dimensions are distributed, the exchange term does not scale well. When only the spatial dimensions are distributed, each of the $N_e(N_e+1)$ FFTs is computed one at a time, using the entire machine for each evaluation. The drawback of this approach is that the resources are underutilized; parallel efficiency is effectively bounded to $\sim N_g^{1/3}$ processors. 
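The parallel efficiencies quoted above for the two-dimensional processor geometry follow directly from the per-step timings. The short sketch below reproduces them, taking efficiency as serial time divided by (number of CPUs × parallel time); the function name `parallel_efficiency` is chosen for this example.

```python
def parallel_efficiency(t_serial, t_parallel, n_cpus):
    """Fraction of ideal speedup achieved on n_cpus processors."""
    return t_serial / (n_cpus * t_parallel)

# Per-step timings quoted in the text for the two-dimensional processor grid.
eff_1024 = parallel_efficiency(2699.0, 3.7, 1024)   # about 0.71
eff_4096 = parallel_efficiency(2699.0, 1.8, 4096)   # about 0.37
```

Going from 1024 to 4096 CPUs roughly halves the time per step while quadrupling the processor count, which is why the efficiency drops by about a factor of two.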
When only the orbital dimensions are distributed, the parallelization is realized by multicasting the $O(N_e)$ orbitals to set up the $O(N_e^2)$ wavefunction pairs. This multicast is followed by a multireduction which reverses the pattern. We note that with this type of

Fig. 3.7 (color online) Overall and component timings in seconds for UO$_2^{2+}$ + 122 H$_2$O plane-wave DFT simulations at various processor sizes ($N_p$) and processor grids ($n_j$, $n_i = N_p/n_j$). Timings are from calculations on the Franklin Cray-XT4 computer system at NERSC.

PARALLELIZATION


algorithm one could readily fill a very large parallel machine by assigning a few FFTs to each processor. However, to obtain reasonable performance from this algorithm it is vital to mask latency, since the interconnects between the processors will be flooded with $O(N_e)$ streams, each consisting of long messages comprising $N_g$ floating-point words of data. When both the spatial and orbital dimensions are distributed, only the parallel three-dimensional FFTs along the processor grid columns need to be computed. Compared with a multicast across all processors, the benefit of this approach is to reduce latency costs, since broadcasting is done across the rows of the two-dimensional processor grid only. The overall best timings for hybrid-DFT calculations of an 80-atom supercell of hematite (Fe2O3) with an FFT grid of $N_g = 72^3$ ($N_e^{\mathrm{up}} = 272$, $N_e^{\mathrm{down}} = 272$), and a 160-atom supercell of hematite (Fe2O3) with an FFT grid of $N_g = 144 \times 72 \times 72$ ($N_e^{\mathrm{up}} = 544$ and $N_e^{\mathrm{down}} = 544$) (wavefunction cutoff energy = 100 Ry and density cutoff energy = 200 Ry) are shown in Fig. 3.8. The overall best timing per step found for the 80-atom supercell was 3.6 s on 9792 CPUs, and for the 160-atom supercell

Fig. 3.8 (color online) Overall fastest timings for 80- and 160-atom Fe2O3 hybrid-DFT energy calculations. Timings are from calculations on the Franklin Cray-XT4 computer system at NERSC.


of hematite, was 17.7 s on 23,936 CPUs. The timing results are somewhat uneven, since only a limited number of processor grids was tried at each processor size. However, even with this limited amount of sampling, these calculations were found to have speedups to at least 25,000 CPUs. We expect that further improvements will be obtained by trying more processor geometry layouts.

3.9 AIMD SIMULATIONS OF HIGHLY CHARGED IONS IN SOLUTION

An understanding of the structure and dynamics of the water molecules in the hydration shells surrounding ions is essential to the interpretation of many chemical processes in aqueous solutions. X-ray and neutron scattering results have been reported which provide direct results about shell structure for many ionic species.119,120 Information about the dynamics of water molecules in this region has also been obtained from other probes, such as NMR, infrared spectroscopy, and inelastic neutron scattering.119,120 For singly charged ions (Na+ , Li+ ), a structured first hydration shell can be identified. The residence time in this shell is short (e.g. j

$$\frac{A}{r_{ij}}\left(1 - e^{-r_{ij}/F_{\sigma_i,\sigma_j}}\right) \tag{4.8}$$

Here $r_{ij}$ is the distance between electrons $i$ and $j$ (of $N$ in total), the σ subscript is the spin label, and the parameter $F$ is chosen so that the electron–electron cusp conditions are obeyed (i.e., $F_{\uparrow\uparrow} = \sqrt{2A}$ and $F_{\uparrow\downarrow} = \sqrt{A}$). The value of $A$ could be optimized using variance minimization or another optimization method. For systems with both electrons and nuclei present, one can write a standard Jastrow with all three terms (ignoring the spin dependence for clarity) as follows:

$$J(\mathbf{R}, \{\mathbf{r}_I\}) = \sum_{i>j}^{N} u(r_{ij}) + \sum_{i=1}^{N}\sum_{I=1}^{N_I} \chi_I(r_{iI}) + \sum_{i>j}^{N}\sum_{I=1}^{N_I} f_I(r_{ij}, r_{iI}, r_{jI}) \tag{4.9}$$
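The cusp values quoted for $F$ can be verified numerically. The sketch below is only an illustration: it assumes that the two-body term of Eq. (4.8) enters the wavefunction as a factor $e^{-u}$, so that the standard cusp conditions require a slope of $u$ at $r = 0$ of $-1/2$ for antiparallel spins and $-1/4$ for parallel spins. The function names and the value of `a` are arbitrary choices.

```python
import math


def u_term(r, a, f):
    """Two-body term u(r) = (A / r) * (1 - exp(-r / F)) of Eq. (4.8)."""
    return (a / r) * (-math.expm1(-r / f))


def slope_near_origin(a, f, h=1e-4):
    """Finite-difference estimate of du/dr close to r = 0."""
    return (u_term(2.0 * h, a, f) - u_term(h, a, f)) / h


if __name__ == "__main__":
    a = 1.5  # arbitrary illustrative value of A
    print(slope_near_origin(a, math.sqrt(a)))        # antiparallel: close to -1/2
    print(slope_near_origin(a, math.sqrt(2.0 * a)))  # parallel: close to -1/4
```

Analytically the slope at the origin is $-A/(2F^2)$, which is why the cusp condition fixes $F$ once $A$ is chosen and leaves $A$ as the only free parameter.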

Other terms, such as an extra plane-wave expansion in the electron–electron separation for periodic systems or an additional three-body term, are part of our standard Jastrow54 and can be useful in certain circumstances but are not usually necessary. For particles with attractive interactions one finds that the usual Slater–Jastrow form is not appropriate, and in order to get a better description of exciton formation one might use a determinant of “pairing orbitals” instead.57 A further recent advance by members of our group has been the development of a completely general functional form for the Jastrow factor which allows the inclusion of arbitrary higher-order terms (depending on, for example, the separation of four or more particles); this has now been implemented in our code.58 To convince yourself that the Slater–Jastrow function is doing what it should, consider Fig. 4.2. These are the results of simple VMC calculations of the spin-dependent pair correlation function (PCF) in a silicon crystal with an electron fixed at a bond center.21 The figure on the left is for parallel spins and corresponds to the Fermi or exchange hole. The figure on the right is for antiparallel spins and corresponds to the correlation hole; note that the former is much wider and deeper than the latter. We have here then a pretty illustration of the different levels of theory that we use. In Hartree theory (where we use a Hartree product of all the orbitals as a wavefunction, and which thus corresponds to entirely uncorrelated electrons), both PCFs would have a value of 1 everywhere. In Hartree–Fock theory, the left-hand plot would look very similar, but the antiparallel PCF on the right would be 1 everywhere. The energy lowering over Hartree theory caused by the fact that parallel spin electrons tend to avoid each other is essentially the exchange energy, which correctly has a negative sign.
It is slightly sobering to note that the entire apparatus of quantum chemistry (an expansion in billions of determinants) is devoted to modeling the little hole on the right and thereby evaluating the correlation energy. In QMC our quantum of solace comes from

WAVEFUNCTIONS AND THEIR OPTIMIZATION


Fig. 4.2 (color online) VMC plots of the pair correlation function for (on the left) parallel spins and (on the right) antiparallel spins using a Slater–Jastrow wavefunction. The data are shown for crystalline silicon in the (110) plane passing through the atoms and show the pair correlation functions around a single electron fixed at a bond center. The atoms and bonds in the (110) plane are represented schematically. (From Ref. 20, with permission. Copyright © 1997 by The American Physical Society.)

our compact representation; with a Slater–Jastrow function we can do the same thing in VMC using a simple polynomial expansion involving a few tens of parameters, and if this is not accurate enough we can make the necessary minor corrections to it using the DMC algorithm. However, we do not know a priori what the shape of the hole is, and we must therefore optimize the various parameters in the Slater–Jastrow function in order to find out. The usual procedure is to leave the Slater determinant part alone and optimize the Jastrow factor. With a full inhomogeneous Jastrow such as that of Eq. (4.9), we generally optimize the coefficients of the various polynomial expansions (which appear linearly in the Jastrow factor) and the cutoff radii of the various terms (which are nonlinear). The linearity or otherwise of these terms clearly has a bearing on their ease of optimization. There is, of course, no absolute prohibition on optimizing the Slater part and one might also envisage, for example, optimization of the coefficients of the determinants of a multideterminant wavefunction, or even the orbitals in the Slater determinants themselves (although the latter is quite difficult to do in general, and often pointless). A higher-order technique called backflow , to be explained in a subsequent section, also involves functions with optimizable parameters. We thus turn our attention to the technicalities of the optimization procedure. Now, optimization of the wavefunction is clearly a critical step; it is also a numerically difficult one. It is apparent that the parameters appear in many different contexts, they need to be optimized in the presence of noise, and there can be a great many of them. As has already been stated, there are two basic


QUANTUM MONTE CARLO

approaches. Until recently, the most widely used was the optimization of the variance of the energy,

$$\sigma_E^2(\alpha) = \frac{\int [\Psi_T^{\alpha}(\mathbf{R})]^2 \, [E_L^{\alpha}(\mathbf{R}) - E_V^{\alpha}]^2 \, d\mathbf{R}}{\int [\Psi_T^{\alpha}(\mathbf{R})]^2 \, d\mathbf{R}} \tag{4.10}$$

where $E_V$ is the variational energy, with respect to the set of parameters {α}. Now, of course, there is no reason that one may not optimize the energy directly, and because wavefunctions corresponding to the minimum energy turn out to have more desirable properties, this has become the preferred approach in the last few years. Historically, variance minimization was much more widely used,60,61 not just for trivial reasons such as the variance having a known lower bound of zero, but most importantly because of the difficulties encountered in designing a robust, numerically stable algorithm to minimize the energy, particularly in the case of large systems. First, I briefly summarize how a simple variance minimization is done. Beginning with an initial set of parameters $\alpha_0$ (generated, for example, simply by zeroing the Jastrow polynomial coefficients), we proceed to minimize $\sigma_E^2(\alpha)$ with respect to them. A correlated-sampling approach turns out to be most efficient. First, a set of some thousands of configurations distributed according to $|\Psi_T^{\alpha_0}|^2$ is generated. Practically speaking, a configuration in this sense is just a snapshot of the system taken at intervals during a preliminary VMC run and consists of the current particle positions and the associated interaction energies written on a line of a file. We then calculate the variance in the energies for the fully sampled set of configurations. This is the objective function to be minimized. Now, unfortunately, every time we modify the parameters slightly, the wavefunction changes and our configurations are no longer distributed according to the square of the current $\Psi_T^{\alpha}$, but to the square of the initial wavefunction $\Psi_T^{\alpha_0}$. In principle, therefore, we should regenerate the configurations, a relatively expensive procedure.
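The correlated sampling is what allows us to avoid this; we reuse the initial set of configurations simply by including appropriate weights $w$ in the formula for the variance:

$$\sigma_E^2(\alpha) = \frac{\int [\Psi_T^{\alpha_0}(\mathbf{R})]^2 \, w_{\alpha_0}^{\alpha} \, [E_L^{\alpha}(\mathbf{R}) - E_V(\alpha)]^2 \, d\mathbf{R}}{\int [\Psi_T^{\alpha_0}(\mathbf{R})]^2 \, w_{\alpha_0}^{\alpha} \, d\mathbf{R}} \tag{4.11}$$

where

$$E_V(\alpha) = \frac{\int [\Psi_T^{\alpha_0}(\mathbf{R})]^2 \, w_{\alpha_0}^{\alpha} \, E_L^{\alpha}(\mathbf{R}) \, d\mathbf{R}}{\int [\Psi_T^{\alpha_0}(\mathbf{R})]^2 \, w_{\alpha_0}^{\alpha} \, d\mathbf{R}} \tag{4.12}$$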


and the weight factors $w_{\alpha_0}^{\alpha}$ are given simply by

$$w_{\alpha_0}^{\alpha} = \frac{[\Psi_T^{\alpha}(\mathbf{R})]^2}{[\Psi_T^{\alpha_0}(\mathbf{R})]^2} \tag{4.13}$$

The parameters α are then adjusted until $\sigma_E^2(\alpha)$ is minimized. This may be done using standard algorithms which perform an unconstrained minimization of a sum of m squares of functions that contain n variables (where m ≥ n) without requiring the derivatives of the objective function (see, e.g., Ref. 59). Although in principle we do not need to regenerate the configurations at all, one finds in practice that it usually pays to recalculate them occasionally when the wavefunction strays very far from its initial value. Generally, this needs to be done only a couple of times before we obtain complete convergence within the statistical noise. There is a problem, however. Thus far we have described the optimization of what is known as the reweighted variance. In the limit of perfect sampling, the reweighted variance is equal to the actual variance, and is therefore independent of the initial parameters and the configuration distribution, so that the optimized parameters would not change over successive cycles. The problem arises from the fact that the weights may vary rapidly as the parameters change, especially for large systems. This can lead to severe numerical instabilities. For example, one or a few configurations may acquire an exceedingly large weight, incorrectly reducing the estimate of the variance almost to zero. Somewhat surprisingly, perhaps, it usually turns out that the best solution to this is to do without the weights at all; that is, we minimize the unreweighted variance. We can do this because the minimum value of the variance (zero) is obtained only if the local energy is constant throughout configuration space, and this is possible only for eigenfunctions of the Hamiltonian. This procedure turns out to have a number of advantages beyond improving the numerical stability. The self-consistent minimum in the unreweighted variance almost always turns out to give lower energies than the minimum in the reweighted variance.
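The correlated-sampling procedure of Eqs. (4.11)–(4.13) can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the procedure used in any production code: it takes a hydrogen atom with the one-parameter trial function exp(−αr), whose local energy is −α²/2 + (α − 1)/r in atomic units, draws configurations once from the initial |Ψ|² (radii follow a gamma distribution), and then scans α on a grid rather than calling a proper minimizer. All function names are hypothetical.

```python
import numpy as np


def local_energy(r, alpha):
    """E_L for Psi = exp(-alpha * r) in the hydrogen atom (atomic units)."""
    return -0.5 * alpha**2 + (alpha - 1.0) / r


def sample_radii(alpha0, n, rng):
    """Radii drawn from |Psi|^2 r^2, i.e. Gamma(shape=3, scale=1/(2*alpha0))."""
    return rng.gamma(shape=3.0, scale=1.0 / (2.0 * alpha0), size=n)


def variance(r, alpha, alpha0, reweight):
    """Reweighted (Eq. 4.11) or unreweighted variance on fixed configurations."""
    e_loc = local_energy(r, alpha)
    if reweight:
        w = np.exp(-2.0 * (alpha - alpha0) * r)  # Eq. (4.13) for this trial function
    else:
        w = np.ones_like(r)
    e_v = np.sum(w * e_loc) / np.sum(w)          # Eq. (4.12)
    return np.sum(w * (e_loc - e_v) ** 2) / np.sum(w)


def best_alpha(r, alpha0, reweight, grid):
    """Grid scan standing in for a real minimization algorithm."""
    values = [variance(r, a, alpha0, reweight) for a in grid]
    return float(grid[int(np.argmin(values))])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha0 = 0.8
    radii = sample_radii(alpha0, 5000, rng)
    grid = np.linspace(0.6, 1.4, 81)
    print(best_alpha(radii, alpha0, True, grid))
    print(best_alpha(radii, alpha0, False, grid))
```

Both objectives locate α = 1 here because the trial function can become the exact eigenstate, for which the local energy is constant and the variance vanishes on any set of configurations. The differences between the two variances discussed in the text appear only when the trial form cannot reach an eigenstate.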
(For some examples of this for model systems, see Ref. 62.) It was recognized only relatively recently62 that one can obtain a huge speedup in the optimization procedure for parameters that occur linearly in the Jastrow, that is, for Jastrows expressible as $\sum_n \alpha_n f_n(\mathbf{R})$. These are the most important optimizable parameters in almost all wavefunctions that we use. The reason this can be done is that the unreweighted variance can be written analytically as a quartic function of the linear parameters. This function usually has a single minimum in the parameter space, and as the minima of multidimensional quartic functions may be found very rapidly, the optimization is extraordinarily efficient compared to the regular algorithm, in particular because we no longer need to generate large numbers of configurations to evaluate the variance. The main nonlinear parameters in the Jastrow factor are the cutoff lengths where the function is constrained to go to zero. These are important variational parameters, and some attempt to optimize them should always be made. We normally recommend that


a (relatively cheap) calculation using the standard variance minimization method should be carried out in order to optimize the cutoff lengths, followed by an accurate optimization of the linear parameters using the fast minimization method. For some systems, good values of the cutoff lengths can be supplied immediately (e.g., in periodic systems at high density with small simulation cells, the cutoff length $L_u$ should be set equal to the Wigner–Seitz radius of the simulation cell). Let us now move on to outlining the theory of energy minimization. We know that except in certain trivial cases the usual trial wavefunction forms cannot in general provide an exact representation of energy eigenstates. The minima in the energy and variance therefore do not coincide. Energy minimization should thus produce lower VMC energies, and although it does not necessarily follow that it produces lower DMC energies, experience indicates that more often than not, it does. It is also normally stated that the variance of the DMC energy is more or less proportional to the difference between the VMC and DMC energies,63,64 so one might suppose that energy-optimized wavefunctions may be more efficient in DMC calculations. For a long time, efficient energy minimization with QMC was extremely problematic. The methods that have now been developed are based on a well-known technique for finding approximations to the eigenstates of a Hamiltonian. One expands the wavefunction in some set of basis states, $\Psi_T(\mathbf{R}) = \sum_{i=1}^{N} a_i \phi_i(\mathbf{R})$. Following calculation of the Hamiltonian and overlap matrix elements, $H_{ij} = \langle \phi_i | \hat{H} | \phi_j \rangle$ and $S_{ij} = \langle \phi_i | \phi_j \rangle$, the two-sided eigenproblem $\sum_j H_{ij} a_j = E \sum_j S_{ij} a_j$ may be solved through standard diagonalization techniques. People have tried to do this in QMC directly,65 but it is apparent that the matrix elements converge only slowly with the number of configurations used to evaluate the integrals, because of statistical noise. As shown in Ref.
66, however, far fewer configurations are required if the diagonalization is first reformulated as a least-squares fit. Let us assume that the result of operating with $\hat{H}$ on any basis state $\phi_i$ is just some linear combination of all the functions $\phi_i$ (technically speaking, the set $\{\phi_i\}$ is then said to span an invariant subspace of $\hat{H}$). We may thus write (for all $i$)

$$\hat{H}\phi_i(\mathbf{R}) = \sum_{j=1}^{N} A_{ij}\,\phi_j(\mathbf{R}) \tag{4.14}$$
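As a concrete illustration of Eq. (4.14), the coefficients $A_{ij}$ can be recovered by sampling many more configurations than there are basis functions and solving the resulting overdetermined linear system in a least-squares sense. The sketch below is a toy case chosen for this purpose and not taken from Ref. 66: a one-dimensional harmonic oscillator with basis $\phi_0 = e^{-x^2/2}$ and $\phi_1 = x^2 e^{-x^2/2}$, for which $\hat{H}\phi_0 = \tfrac12\phi_0$ and $\hat{H}\phi_1 = -\phi_0 + \tfrac52\phi_1$, so the basis really does span an invariant subspace. Both sides are divided by $\phi_0$, which plays the role of the trial function here; `fit_coefficients` is a hypothetical name.

```python
import numpy as np


def fit_coefficients(x):
    """Least-squares estimate of A_ij in H phi_i = sum_j A_ij phi_j."""
    # Basis functions divided by exp(-x^2/2): phi_0 -> 1, phi_1 -> x^2.
    basis = np.column_stack([np.ones_like(x), x**2])
    # H phi_i divided by the same factor, from the analytic derivatives.
    h_basis = np.column_stack([0.5 * np.ones_like(x), -1.0 + 2.5 * x**2])
    # Column i of the solution holds the coefficients A_i0, A_i1.
    solution, *_ = np.linalg.lstsq(basis, h_basis, rcond=None)
    return solution.T


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # M = 200 configurations drawn from |phi_0|^2, far more than N = 2.
    x = rng.normal(0.0, np.sqrt(0.5), size=200)
    a_matrix = fit_coefficients(x)
    print(a_matrix)
    print(np.sort(np.linalg.eigvals(a_matrix).real))
```

In this idealized case the fit is exact and the eigenvalues of the fitted matrix are the oscillator energies 1/2 and 5/2. For realistic basis sets no exact solution exists, and the least-squares residual is what makes the result insensitive to the particular configurations chosen.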

To compute the required eigenstates and associated eigenvalues of $\hat{H}$, we then simply diagonalize the $A_{ij}$ matrix. Within a Monte Carlo approach we could evaluate the $\phi_i(\mathbf{R})$ and $\hat{H}\phi_i(\mathbf{R})$ for $N$ uncorrelated configurations generated by a VMC calculation and solve the resulting set of linear equations for the $A_{ij}$. For problems of interest, however, the assumption that the set $\{\phi_i\}$ spans an invariant subspace of $\hat{H}$ does not hold, and there exists no set of $A_{ij}$ that solves Eq. (4.14). If we took $N$ configurations and solved the set of $N$ linear equations, the values of $A_{ij}$ would depend on which configurations had been chosen. To overcome this problem, a number of configurations $M \gg N$ is sampled to obtain

DIFFUSION MONTE CARLO


an overdetermined set of equations which can be solved in a least-squares sense using the singular value decomposition technique. In Ref. 66 it is recommended that Eq. (4.14) be divided by $\Psi_T(\mathbf{R})$ so that in the limit of perfect sampling the scheme corresponds precisely to standard diagonalization. The method of Ref. 66 is pretty good for linear parameters. How might we generalize it for nonlinear parameters? The obvious way is to consider the basis of the initial trial wavefunction ($\phi_0 = \Psi_T$) and its derivatives with respect to the variable parameters, $\phi_i = \partial\Psi_T/\partial a_i |_{a_i^0}$. The simplest such algorithm is, in fact, unstable, and this turns out to be because the implied first-order approximation is often not good enough. To overcome this problem, Umrigar et al. introduced a stabilized method67,68 that works well and is quite robust (the details need not concern us here). The VMC energies given by this method are usually slightly lower than those obtained from variance minimization. David Ceperley once asked: “How many graduate students’ lives have been lost optimizing wavefunctions?”69 That was in 1996. To give a more twenty-first century feeling for the time scale involved in optimizing wavefunctions, I can tell you about the weekend a few years back when I added the entire G2-1 set70,71 to the examples included with the CASINO distribution. This is a standard set of 55 molecules with various experimentally well-characterized properties intended for benchmarking of different quantum chemistry methods (see, e.g., Ref. 72). Grossman has published the results of DMC calculations of these molecules using pseudopotentials,16 and we have now done the same with all-electron calculations.73,74 It took a little over three days using only a few single-processor workstations to create all 55 sets of example files from scratch, including optimizing the Jastrow factors for each molecule.
Although if one concentrated very hard on each individual case, one might be able to pull a little more energy out of a VMC simulation, the optimized Jastrow factors were all good enough to be used as input to DMC simulations. The entire procedure of variance minimization can be, and in CASINO is, thoroughly automated, and provided that a systematic approach is adopted, optimizing VMC wavefunctions is not the complicated time-consuming business that it once was. This is certainly the case if one requires the optimized wavefunction only for input into a DMC calculation, in which case one need not be overly concerned with lowering the VMC energy as much as possible. I suggest that the process is sufficiently automated these days that graduate students are better employed elsewhere; certainly we have not suffered any fatalities here in Cambridge.

4.4 DIFFUSION MONTE CARLO

Let us imagine that we are ignorant, or have simply not been paying attention in our quantum mechanics class, and that we believe that the wavefunction of the hydrogen atom has the shape of a big cube centered on the nucleus. If we tried to calculate the expectation value of the Hamiltonian using VMC, we would obtain an energy that was substantially in error. What DMC does, in essence, is to automatically correct the shape of the guessed square box wavefunction so


that it looks like the correct exponentially decaying one before calculating the expectation value. In principle it can do this even though our formula for the VMC wavefunction that we have spent so long justifying turns out not to have enough variational freedom to represent the true wavefunction. This is clearly a nice trick, particularly when, as is more usual, we have very little practical idea of what the exact many-electron wavefunction looks like. As one might expect, the DMC algorithm is necessarily rather more involved than that for VMC. I think that an approachable way of understanding it is to focus on the properties of quantum mechanical propagators, so we begin by reminding ourselves about these. Let’s say that we wish to integrate the time-dependent Schrödinger equation,

$$i\hbar\,\frac{\partial \Psi(\mathbf{R}, t)}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2 \Psi(\mathbf{R}, t) + V(\mathbf{R}, t)\,\Psi(\mathbf{R}, t) = \hat{H}\,\Psi(\mathbf{R}, t) \tag{4.15}$$

where $\mathbf{R} = \{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N\}$, $V$ is the potential energy operator, and $\nabla = (\nabla_1, \nabla_2, \ldots, \nabla_N)$ is the 3N-dimensional gradient operator. Integrating this is equivalent to wanting a formula for $\Psi$, and to find this, we must invert this differential equation. The result is an integral equation involving the propagator $K$:

$$\Psi(\mathbf{R}, t) = \int K(\mathbf{R}, t; \mathbf{R}', t')\,\Psi(\mathbf{R}', t')\, d\mathbf{R}' \tag{4.16}$$

The propagator is interpreted as the probability amplitude for a particle to travel from one place to another (in this case, from $\mathbf{R}'$ to $\mathbf{R}$) in a given time $t - t'$. It is a Green’s function for the Schrödinger equation. We see that the probability amplitude for a particle to be at $\mathbf{R}$ sometime in the future is given by the probability amplitude of it traveling there from $\mathbf{R}'$ (which is just $K(\mathbf{R}, t; \mathbf{R}', t')$), weighted by the probability amplitude of it actually starting at $\mathbf{R}'$ in the first place (which is $\Psi(\mathbf{R}', t')$), summed over all possible starting points $\mathbf{R}'$. This is a straightforward concept. How might we calculate the propagator? A typical way might be to use the Feynman path-integral method. For given start and end points $\mathbf{R}'$ and $\mathbf{R}$, one gets the overall amplitude by summing the contributions of the infinite number of all possible “histories” or paths that include those points. It doesn’t matter why for the moment (look it up!), but the amplitude contributed by a particular history is proportional to $e^{iS_{cl}/\hbar}$, where $S_{cl}$ is the classical action of that history (i.e., the time integral of the classical Lagrangian $\tfrac{1}{2}mv^2 - V$ along the corresponding phase-space path of the system). The full expression for the propagator in Feynman’s method may then be written as

$$K^{F}(\mathbf{R}, t; \mathbf{R}', t') = N \sum_{\text{all paths}} \exp\left[\frac{i}{\hbar}\int_{t'}^{t} L_{cl}(t'')\, dt''\right] \tag{4.17}$$


An alternative way to calculate the propagator is to use the de Broglie–Bohm pilot-wave interpretation of quantum mechanics,52 where the electrons both objectively exist and have the obvious definite trajectories derived from a straightforward analysis of the streamlines of the quantum mechanical probability current. From this perspective we find that we can achieve precisely the same result as that obtained using the Feynman method, by integrating the quantum Lagrangian $L_q(t) = \tfrac{1}{2}mv^2 - (V + Q)$ along precisely one path (the path that the electron actually follows), as opposed to linearly superposing amplitudes obtained from the classical Lagrangian associated with the infinite number of all possible paths. Here $Q$ is the quantum potential, which is the potential energy function of the quantum force (the force that the wave field exerts on the electrons). It is easy to show that the equivalent pilot-wave propagator is

$$K^{B}(\mathbf{R}, t; \mathbf{R}', t') = \frac{1}{J(t)^{1/2}} \exp\left[\frac{i}{\hbar}\int_{t'}^{t} L_q(t'')\, dt''\right] \tag{4.18}$$

where $J$ is a simple Jacobian factor. This formula should be contrasted with Eq. (4.17). One should also note that because de Broglie–Bohm trajectories do not cross, one need not sum over all possible starting points $\mathbf{R}'$ to compute $\Psi(\mathbf{R}, t)$; one simply uses the $\mathbf{R}'$ that the unique trajectory passes through. What is the connection of all this with the diffusion Monte Carlo method? Well, in DMC an arbitrary starting wavefunction is evolved using a (Green’s function) propagator just like the ones we have been discussing. The main difference is that the propagation occurs in imaginary time $\tau = it$ as opposed to real time $t$. For reasons that will shortly become apparent, this has the effect of “improving” the wavefunction (i.e., making it look more like the ground state as imaginary time passes). For technical reasons, it also turns out that the propagation has to take place in a sequence of very short hops in imaginary time, so our evolution equation now looks like this:

$$\Psi(\mathbf{R}, \tau + \delta\tau) = \int K^{DMC}(\mathbf{R}, \mathbf{R}', \delta\tau)\,\Psi(\mathbf{R}', \tau)\, d\mathbf{R}' \tag{4.19}$$

The evolving wavefunction is not represented in terms of a basis set of known analytic functions but by the distribution in space and time of randomly diffusing electron positions over an ensemble of copies of the system (“configurations”). In other words, the DMC method is a stochastic projector method whose purpose is to evolve or project out the solution to the imaginary-time Schrödinger equation from an arbitrary starting state. We shall write this equation (which is simply what you get by taking the regular time-dependent equation and substituting $\tau$ for the time variable $it$) in atomic units as

$$-\frac{\partial \Psi_{DMC}(\mathbf{R}, \tau)}{\partial \tau} = -\frac{1}{2}\nabla^2 \Psi_{DMC}(\mathbf{R}, \tau) + (V(\mathbf{R}) - E_T)\,\Psi_{DMC}(\mathbf{R}, \tau) \tag{4.20}$$


Here the real variable $\tau$ measures the progress in imaginary time, and for purposes to be revealed presently, I have included a constant $E_T$, an energy offset to the zero of the potential which affects only the wavefunction normalization. How, then, does propagating our trial function in imaginary time “improve” it? For eigenstates, the general solution to the usual time-dependent Schrödinger equation is clearly $\phi(\mathbf{R}, t) = \phi(\mathbf{R}, 0)\,e^{-i(\hat{H} - E_T)t}$. By definition, we may expand an arbitrary “guessed” $\Psi(\mathbf{R}, t)$ in terms of a complete set of these eigenfunctions of the Hamiltonian $\hat{H}$:

$$\Psi(\mathbf{R}, t) = \sum_{n=0}^{\infty} c_n \phi_n(\mathbf{R})\, e^{-i(E_n - E_T)t} \tag{4.21}$$

On substituting $it$ with imaginary time $\tau$, the oscillatory time dependence of the complex exponential phase factors becomes an exponential decay:

$$\Psi(\mathbf{R}, \tau) = \sum_{n=0}^{\infty} c_n \phi_n(\mathbf{R})\, e^{-(E_n - E_T)\tau} \tag{4.22}$$

Let us assume that our initial guess for the wavefunction is not orthogonal to the ground state (i.e., $c_0 \neq 0$). Then if we magically choose the constant $E_T$ to be the ground-state eigenvalue $E_0$ (or, in practice, keep very tight control of it through some type of feedback procedure), it is clear we should eventually get imaginary-time independence of the probability distribution, in the sense that as $\tau \to \infty$, our initial $\Psi(\mathbf{R}, 0)$ comes to look more and more like the stationary ground state $\phi_0(\mathbf{R})$ as the contribution of the excited-state eigenfunctions dies away:

$$\Psi(\mathbf{R}, \tau) = c_0 \phi_0 + \sum_{n=1}^{\infty} c_n \phi_n(\mathbf{R})\, e^{-(E_n - E_0)\tau} \tag{4.23}$$
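The decay of the excited-state contributions in Eq. (4.23) is easy to see numerically. The sketch below is a deterministic illustration rather than a Monte Carlo calculation: it assumes a one-dimensional harmonic oscillator discretized by finite differences on a grid, diagonalizes the resulting matrix, and damps each expansion coefficient by $e^{-(E_n - E_0)\tau}$. The grid size, the box-shaped starting guess, and the function names are arbitrary choices.

```python
import numpy as np


def harmonic_eigensystem(n=201, half_width=8.0):
    """Finite-difference eigenstates of H = -1/2 d^2/dx^2 + x^2/2."""
    x = np.linspace(-half_width, half_width, n)
    h = x[1] - x[0]
    off_diagonal = np.full(n - 1, -0.5 / h**2)
    hamiltonian = (np.diag(1.0 / h**2 + 0.5 * x**2)
                   + np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1))
    energies, states = np.linalg.eigh(hamiltonian)
    return x, energies, states


def ground_state_overlap(tau):
    """Overlap of the normalized Psi(tau) with the ground state, E_T = E_0."""
    x, energies, states = harmonic_eigensystem()
    psi0 = (np.abs(x) < 3.0).astype(float)      # deliberately poor box-shaped guess
    c = states.T @ psi0                          # expansion coefficients c_n
    c_tau = c * np.exp(-(energies - energies[0]) * tau)
    return abs(c_tau[0]) / np.linalg.norm(c_tau)


if __name__ == "__main__":
    for tau in (0.0, 0.5, 2.0, 10.0):
        print(tau, ground_state_overlap(tau))
```

The overlap rises monotonically toward 1 as τ grows, at a rate set by the gap to the lowest excited state that has a nonzero coefficient in the initial guess.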

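So now we know why we do this propagation. How, in practice, do we find an expression for the propagator $K$? Consider now the imaginary-time Schrödinger equation in two parts:

$$\frac{\partial \Psi(\mathbf{R}, \tau)}{\partial \tau} = \frac{1}{2}\nabla^2 \Psi(\mathbf{R}, \tau) \tag{4.24}$$

$$\frac{\partial \Psi(\mathbf{R}, \tau)}{\partial \tau} = -(V(\mathbf{R}) - E_T)\,\Psi(\mathbf{R}, \tau) \tag{4.25}$$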

These two formulas have the form of the usual diffusion equation and of a rate equation with a position-dependent rate constant, respectively. The appropriate propagator for the diffusion equation is well known; it is a 3N-dimensional Gaussian with variance $\delta\tau$ in each dimension. The propagator for the rate equation is also known; it gives a branching factor which can be interpreted as a position-dependent weight or stochastic survival probability for a member of an ensemble.


Multiplying the two together to get the following propagator for the imaginary-time Schrödinger equation is an approximation, the short-time approximation, valid only in the limit of small $\delta\tau$ (which is why we need to do the evolution as a sequence of short hops):

$$K^{DMC}(\mathbf{R}, \mathbf{R}', \delta\tau) = \frac{1}{(2\pi\delta\tau)^{3N/2}} \exp\left[-\frac{|\mathbf{R} - \mathbf{R}'|^2}{2\delta\tau}\right] \exp\left[-\delta\tau\,\frac{V(\mathbf{R}) + V(\mathbf{R}') - 2E_T}{2}\right] \tag{4.26}$$

Let us then summarize with a simple example how the DMC algorithm works. If we interpret $\Psi$ as a probability density, the diffusion equation $\partial\Psi/\partial\tau = \tfrac{1}{2}\nabla^2\Psi$ represents the movement of $N$ diffusing particles. If we turn this around, we may decide to represent $\Psi(x, \tau)$ by an ensemble of such sets of particles. Each member of such an ensemble will be called a configuration. We interpret the full propagator $K^{DMC}(\mathbf{R}, \mathbf{R}', \delta\tau)$ as the probability of a configuration moving from $\mathbf{R}'$ to $\mathbf{R}$ in a time $\delta\tau$. The branching factor in the propagator will generally be interpreted as a stochastic survival probability for a given configuration rather than as a simple weight, as the latter is prone to numerical instabilities. This means that the configuration population becomes dynamically variable; configurations that stray into regions of high $V$ have a good chance of being killed (removed from the calculation); in low-$V$ regions, configurations have a high probability of multiplying (i.e., they create copies of themselves, which then propagate independently). It is solely this branching or reweighting that “changes the shape of the wavefunction” as it evolves. So, as we have seen, after a sufficiently long period of imaginary-time evolution, all the excited states will decay away, leaving only the ground-state wavefunction, at which point the propagation may be continued to accumulate averages of interesting observables. As a simple example, consider Fig. 4.3. Here we make a deliberately bad guess that the ground-state wavefunction for a single electron in a harmonic potential well is a constant in the vicinity of the well and zero everywhere else. We begin with seven copies of the system or configurations in our ensemble; the electrons in this ensemble are initially randomly distributed according to the uniform probability distribution in the region where the trial function is finite. The particle distribution is then evolved in imaginary time according to the scheme developed above.
The electrons are subsequently seen to become distributed according to the proper Gaussian shape of the exact ground-state wavefunction. It is evident from the figure that the change in shape is produced by the branching factor occasionally eliminating configurations in high-V regions and duplicating them in low-V regions. This “pure DMC” algorithm works very well in a single-particle system with a nicely behaved potential, as in the example. Unfortunately, it suffers from two very serious drawbacks which become evident in multiparticle systems with divergent Coulomb potentials.
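The single-electron example above can be sketched in a few lines. The code below is a minimal illustration of the “pure DMC” scheme under stated assumptions, not a production algorithm: one electron in the well $V(x) = x^2/2$, a uniform starting distribution, Gaussian diffusion steps of variance δτ, integer branching from the factor in Eq. (4.26), and a simple logarithmic feedback on $E_T$ to hold the population near its target. The energy is estimated as the average of $V$ over the walkers, which equals $E_0$ when the walkers are distributed according to the ground state. All names and parameter values are illustrative.

```python
import numpy as np


def potential(x):
    """Harmonic well V(x) = x^2 / 2 (atomic units, unit mass and frequency)."""
    return 0.5 * x**2


def pure_dmc(n_target=500, dt=0.01, n_equil=2000, n_steps=3000, seed=0):
    """Diffuse and branch walkers; return the energy estimate and final walkers."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=n_target)   # deliberately bad flat guess
    e_t = float(np.mean(potential(x)))
    samples = []
    for step in range(n_equil + n_steps):
        # Diffusion: Gaussian displacement with variance dt.
        x_new = x + np.sqrt(dt) * rng.standard_normal(x.size)
        # Branching factor of the short-time propagator, Eq. (4.26).
        branch = np.exp(-dt * (0.5 * (potential(x) + potential(x_new)) - e_t))
        # Stochastic survival: each walker leaves int(branch + u) copies.
        copies = np.floor(branch + rng.uniform(size=x.size)).astype(int)
        x = np.repeat(x_new, copies)
        e_now = float(np.mean(potential(x)))
        # Feedback on E_T keeps the population close to n_target.
        e_t = e_now + np.log(n_target / x.size)
        if step >= n_equil:
            samples.append(e_now)
    return float(np.mean(samples)), x


if __name__ == "__main__":
    energy, walkers = pure_dmc()
    print(energy)          # should lie near the exact value 0.5
    print(walkers.size)    # population stays near the target
```

A histogram of the final walker positions approaches the Gaussian ground state, mirroring the evolution sketched in Fig. 4.3.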


Fig. 4.3 Schematic illustration of the DMC algorithm for a single electron in a harmonic potential well, showing the evolution of the shape of the wavefunction due to propagation in imaginary time. (From Ref. 5, with permission. Copyright © 2001 by The American Physical Society.)

The first problem arises due to our assumption that $\Psi$ is a probability distribution (necessarily positive everywhere), even though the antisymmetric nature of multiparticle fermionic wavefunctions means that it must have both positive and negative parts separated by a nodal surface, that is, a (3N − 1)-dimensional hypersurface on which it has the value zero. One might think that two separate populations of configurations with attached positive and negative weights might get around this problem (essentially, the well-known fermion sign problem), but in practice there is a severe signal-to-noise issue. It is possible to construct formally exact algorithms of this nature which overcome some of the worst practical problems,75 but to date all seem highly inefficient, with poor system-size scaling. The second problem is less fundamental but in practice very severe. The required rate of removing or duplicating configurations diverges when the


potential energy diverges (which occurs whenever two particles are coincident) due to the presence of $V$ in the branching factor of Eq. (4.26). This leads to stability problems and poor statistical behavior. These problems may be dealt with at the cost of introducing the most important approximation in the DMC algorithm: the fixed-node approximation.76 We say, in effect, that particles may not cross the nodal surface of the trial wavefunction $\Psi_T$; that is, there is an infinite repulsive potential barrier on the nodes. This forces the DMC wavefunction to be zero on that hypersurface. If the nodes of the trial function coincide with the exact nodes, such an algorithm will give the exact ground-state energy (it is, of course, well known that the exact de Broglie–Bohm particle trajectories cannot pass through the nodal surface). If the trial function nodes do not coincide with the exact nodes, the DMC energy will be higher than the ground-state energy (but less than or equal to the VMC energy). The variational principle thus applies. To make such an algorithm efficient we must introduce importance sampling, and this is done in the following way. We require that the imaginary-time evolution produces the mixed distribution $f = \Psi_T\Psi$ rather than the pure distribution. Substituting this into the imaginary-time Schrödinger equation, Eq. (4.20), we obtain

$$-\frac{\partial f(\mathbf{R}, \tau)}{\partial \tau} = -\frac{1}{2}\nabla^2 f(\mathbf{R}, \tau) + \nabla \cdot [\mathbf{v}_D(\mathbf{R})\, f(\mathbf{R}, \tau)] + (E_L(\mathbf{R}) - E_T)\, f(\mathbf{R}, \tau) \tag{4.27}$$

where $\mathbf{v}_D(\mathbf{R})$ is the 3N-dimensional drift velocity vector, defined by

$$\mathbf{v}_D(\mathbf{R}) = \nabla \ln |\Psi_T(\mathbf{R})| = \frac{\nabla \Psi_T(\mathbf{R})}{\Psi_T(\mathbf{R})} \tag{4.28}$$

and

$$E_L(\mathbf{R}) = \Psi_T^{-1}\left(-\tfrac{1}{2}\nabla^2 + V(\mathbf{R})\right)\Psi_T \tag{4.29}$$

is the usual local energy. The propagator from R to R for the importance sampled algorithm now looks like this: K DMC (R, R , δτ) =

(R − R − δτF (R ))2 1 exp − (2πδτ)3N/2 2δτ

δτ exp − (EL (R) + EL (R ) − 2ET ) 2

(4.30)

Because the nodal surface of $\Psi$ is constrained to be that of $\Psi_T$, their product $f$ is positive everywhere and can now be properly interpreted as a probability distribution. The time evolution generates the distribution $f = \Psi_T \Psi$, where $\Psi$ is now the lowest-energy wavefunction with the same nodes as $\Psi_T$. This solves


the first of our two problems. The second problem of the poor statistical behavior due to the divergences in the potential energy is also solved because the term $V(\mathbf{R}) - E_T$ in Eq. (4.20) has been replaced by $E_L(\mathbf{R}) - E_T$ in Eq. (4.27), which is much smoother. Indeed, if $\Psi_T$ were an exact eigenstate, $E_L(\mathbf{R}) - E_T$ would be independent of position in configuration space. Although we cannot in practice find the exact $\Psi_T$, it is possible to eliminate the local energy divergences due to coincident particles by choosing a trial function that has the correct cusplike behavior at the relevant points in the configuration space.56 Note that this is all reflected in the branching factor of the new propagator of Eq. (4.30).

The nodal surface partitions the configuration space into regions that we call nodal pockets. The fixed-node approximation implies that we are restricted to sampling only those nodal pockets that are occupied by the initial set of configurations, and this appears to introduce some kind of ergodicity concern, since at first sight it seems that we ought to sample every nodal pocket. This would be an impossible task in large systems. However, the tiling theorem for exact fermion ground states77,78 asserts that all nodal pockets are in fact equivalent and related by permutation symmetry; one need therefore only sample one of them. This theorem is intimately connected with the existence of a variational principle for the DMC ground-state energy.78 Other interesting investigations of properties of nodal surfaces have been published.79–81

A practical importance-sampled DMC simulation proceeds as follows. First we pick an ensemble of a few hundred configurations chosen from the distribution $|\Psi_T|^2$ using VMC and the standard Metropolis algorithm. This ensemble is then evolved according to the short-time approximation to the Green's function of the importance-sampled imaginary-time Schrödinger equation [Eq.
(4.27)], which involves repeated steps of biased diffusion followed by the deletion and/or duplication of configurations. The bias in the diffusion is caused by the drift vector arising out of the importance sampling, which directs the sampling toward parts of configuration space where $|\Psi_T|$ is large (i.e., it plays the role of an Einsteinian osmotic velocity). This drift step is always directed away from the node, and $\nabla\Psi_T$ is in fact a normal vector of the nodal hypersurface. After a period of equilibration the excited-state contributions will have largely died out and the configurations start to trace out the probability distribution $f(\mathbf{R})/\int f(\mathbf{R})\,d\mathbf{R}$. We can then start to accumulate averages, in particular the DMC energy. Note that throughout this process the reference energy $E_T$ is varied to keep the configuration population under control through a specific feedback mechanism. The initial stages of a DMC simulation (for solid antiferromagnetic NiO crystal with 128 atoms per cell, using unrestricted Hartree–Fock trial functions of the type discussed in Refs. 82 and 83) are shown in Fig. 4.4. The DMC energy is given by

$$E_{\mathrm{DMC}} = \frac{\int f(\mathbf{R})\,E_L(\mathbf{R})\,d\mathbf{R}}{\int f(\mathbf{R})\,d\mathbf{R}} \approx \frac{1}{M}\sum_{i=1}^{M} E_L(\mathbf{R}_i) \tag{4.31}$$

where the $\mathbf{R}_i$ are the $M$ configurations sampled from $f$.

[Figure 4.4: two panels plotted against the number of moves (0 to 1500). Upper panel: population. Lower panel: local energy (Ha), reference energy, and best estimate.]

Fig. 4.4 (color online) DMC simulation of solid antiferromagnetic NiO. In the lower panel, the noisy black line is the local energy after each move, the smoother green line is the current best estimate of the DMC energy, and the red line is ET in Eq. (4.27), which is varied to control the population of configurations through a feedback mechanism. As the simulation equilibrates, the best estimate of the energy, initially equal to the VMC energy, decreases significantly, then approaches a constant, which is the final DMC energy. The upper panel shows the variation in the population of the ensemble during the simulation as walkers are created or destroyed.

This energy expression would be exact if the nodal surface of $\Psi_T$ were exact, and the fixed-node error is second order in the error in the $\Psi_T$ nodal surface (when a variational theorem exists78). The accuracy of the fixed-node approximation can be tested on small systems and normally leads to very satisfactory results. The trial wavefunction thus limits the final accuracy that can be obtained and it also controls the statistical efficiency of the algorithm. Like VMC, the DMC algorithm satisfies a zero-variance principle (i.e., the variance of the energy goes to zero as the trial wavefunction goes to an exact eigenstate). For other expectation values of operators that do not commute with the Hamiltonian, the DMC mixed estimator is biased and other techniques are required in order to sample the pure distribution.84–86

A final point: The necessity of using the fixed-node approximation suggests that the best way of optimizing wavefunctions would be to do it in DMC directly. The nodal surface could then in principle be optimized to the shape that minimizes the DMC energy. The backflow technique discussed in Section 4.5.1 has some bearing on the problem, but the usual procedure involving optimization of the energy or variance in VMC will not usually lead to the optimal nodes in the sense that the fixed-node DMC energy is minimal. The large number of parameters (up to a few hundred) in a typical Slater–Jastrow(-backflow)


wavefunction means that direct variation of the parameters in DMC is too expensive (although this has been done, see, e.g., Refs. 87 and 88). Furthermore, we note that optimizing the energy in DMC is tricky for the nodal surface, as the contribution of the region near the nodes to the energy is small. More exotic ways of optimizing the nodes are still being actively developed.89,90
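The drift, diffusion, and branching cycle described in this section can be sketched on a toy problem. The following is a minimal illustration only, not the CASINO algorithm: it assumes a one-dimensional harmonic oscillator ($\hbar = m = \omega = 1$) with the hypothetical trial function $\Psi_T = x\,e^{-a x^2/2}$, whose single node at $x = 0$ plays the role of the fixed nodal surface. The function name `fixed_node_dmc`, the drift clipping, the rejection of node-crossing moves, and the simple logarithmic population feedback are all assumptions made for this sketch.

```python
import numpy as np


def fixed_node_dmc(a=0.7, n_target=500, dtau=0.01,
                   n_equil=1000, n_steps=3000, seed=1):
    """Toy fixed-node DMC for H = -1/2 d^2/dx^2 + x^2/2.

    Trial function (an assumption of this sketch): psi_T = x exp(-a x^2 / 2),
    which has a single node at x = 0.
    """
    rng = np.random.default_rng(seed)

    def local_energy(x):
        # E_L = psi_T^{-1} H psi_T, worked out analytically for this psi_T
        return 1.5 * a + 0.5 * (1.0 - a * a) * x * x

    def drift(x):
        # v_D = d/dx ln|psi_T| = 1/x - a x; clipped (crudely) because it
        # diverges at the node
        v_max = 1.0 / np.sqrt(dtau)
        return np.clip(1.0 / x - a * x, -v_max, v_max)

    # Start every walker in the x > 0 nodal pocket
    x = np.abs(rng.normal(0.0, 1.0, n_target)) + 0.1
    e_ref = local_energy(x).mean()
    samples = []

    for step in range(n_equil + n_steps):
        e_old = local_energy(x)
        # Biased diffusion: drift plus Gaussian random step
        trial = x + dtau * drift(x) + np.sqrt(dtau) * rng.normal(size=x.size)
        # Fixed-node constraint: reject any move that crosses the node
        crossed = trial * x <= 0.0
        x_new = np.where(crossed, x, trial)
        e_new = local_energy(x_new)
        # Branching factor of Eq. (4.30)
        weight = np.exp(-0.5 * dtau * (e_new + e_old - 2.0 * e_ref))
        copies = np.floor(weight + rng.random(x.size)).astype(int)
        x = np.repeat(x_new, copies)  # delete or duplicate configurations
        # Mixed estimator, Eq. (4.31), and population feedback on E_T
        e_mix = local_energy(x).mean()
        e_ref = e_mix - np.log(x.size / n_target)
        if step >= n_equil:
            samples.append(e_mix)

    return float(np.mean(samples))
```

Calling `fixed_node_dmc()` returns the accumulated mixed-estimator energy. Because the node of this trial function coincides with the exact node of the lowest odd state, the result should settle near $3/2$ up to time-step and statistical error, whatever the value of `a`, whereas the variational energy of the trial function depends on `a`.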

4.5 BITS AND PIECES

4.5.1 More About Wavefunctions, Orbitals, and Basis Sets

Single-determinant Slater–Jastrow wavefunctions often work very well in QMC calculations since the orbital part alone provides a pretty good description of the system. In the ground state of the carbon pseudoatom, for example, a single Hartree–Fock determinant retrieves about 98.2% of the total energy. The remaining 1.8%, which at the VMC level must be recovered by the Jastrow factor, is the correlation energy, and in this case it amounts to 2.7 eV—clearly important for an accurate description of chemical bonding. By definition a determinant of Hartree–Fock orbitals gives the lowest energy of all single-determinant wavefunctions, and DFT orbitals are often very similar to them. These orbitals are not optimal when a Jastrow factor is included, but it turns out that the Jastrow factor does not change the detailed structure of the optimal orbitals very much, and the changes are well described by a fairly smooth change to the orbitals. This can conveniently be included in the Jastrow factor itself. How, though, might we improve on the Hartree–Fock/DFT orbitals in the presence of the Jastrow factor? One might naturally consider optimizing the orbitals themselves. This has been done, for example, with the atomic orbitals of a neon atom by Drummond et al.,91 optimizing a parameterized function that is added to the self-consistent orbitals. This was found to be useful only in certain cases. In atoms one often sees an improvement in the VMC energy but not in DMC, indicating that the Hartree–Fock nodal surface is close to optimal even in the presence of a correlation function. Unfortunately, direct optimization of both the orbitals and the Jastrow factor cannot easily be done for large polyatomic systems because of the computational cost of optimizing large numbers of parameters, so it is difficult to know how far this observation extends to more complex systems. 
One technique that has been tried92,93 is to optimize the potential that generates the orbitals rather than the orbitals themselves. It was also suggested by Grossman and Mitas94 that another way to improve the orbitals over the Hartree–Fock form is to use a determinant of the natural orbitals, which diagonalize the one-electron density matrix. While the motivation here is that the convergence of configuration interaction expansions is improved by using natural orbitals instead of Hartree–Fock orbitals, it is not clear why this would work in QMC. The calculation of reasonably accurate natural orbitals costs a lot, and such an approach is therefore less attractive for large systems. It should be noted that all such techniques which move the nodal surface of the trial function (and hence potentially improve the DMC energy) make


wavefunction optimization with fixed configurations more difficult. The nodal surface deforms continuously as the parameters are changed, and in the course of this deformation the fixed set of electron positions of one of the configurations may end up being on the nodal surface. As the local energy $\hat{H}\Psi/\Psi$ diverges on the nodal surface, the unreweighted variance of the local energy of a fixed set of configurations also diverges, making it difficult to locate the global minimum of the variance. A discussion of what one might do about this can be found elsewhere.62

In some cases it is necessary to use multideterminant wavefunctions to preserve important symmetries of the true wavefunction. In other cases a single determinant may give the correct symmetry, but a significantly better wavefunction can be obtained by using a linear combination of a few determinants. Multideterminant wavefunctions have been used successfully in QMC studies of small molecules and even in periodic calculations such as the study of the neutral vacancy in diamond due to Hood et al.27 However, other studies have shown that although using multideterminant functions improves VMC, this sometimes does not extend to DMC, indicating that the nodal surface has not been improved.91 Of course, there is very little point in using methods that employ expansions of large numbers of determinants to generate QMC trial functions, not only because the use of methods that scale so badly as a preliminary calculation completely defeats the entire point of QMC, but because the medium- and short-range correlation which these expansions describe95,96 is dealt with directly and vastly more efficiently by the Jastrow factor.

By far the most useful way to go beyond the Slater–Jastrow form is the backflow technique, to which we have already alluded.
Backflow correlations were originally derived from a current conservation argument by Feynman97 and by Feynman and Cohen98 to provide a picture of the excitations in liquid $^4$He and the effective mass of a $^3$He impurity in $^4$He. In a modern context they can also be derived from an imaginary-time evolution argument.99,100 In the simplest form of backflow trial function the electron coordinates $\mathbf{r}_i$ appearing in the Slater determinants of Eq. (4.7) are replaced by quasiparticle coordinates,

$$\bar{\mathbf{r}}_i = \mathbf{r}_i + \sum_{j \neq i}^{N} \eta(r_{ij})\,(\mathbf{r}_i - \mathbf{r}_j) \tag{4.32}$$

where rij = |ri − rj |. This is supposed to represent the characteristic flow pattern where the quantum fluid is “pushed out of the way” in front of a moving particle and fills in the space behind it. The optimal function η(rij ) may be determined variationally, and in so doing the nodal surface is shifted. Backflow thus represents another practical possibility for relaxing the constraints of the fixed-node approximation in DMC. Kwon et al.99,101 found that the introduction of backflow significantly lowered the VMC and DMC energies of the two- and three-dimensional uniform electron gas at high densities. The use of backflow has also been investigated for metallic hydrogen.102 For real polyatomic systems, a much more complicated inhomogeneous backflow function is required; the one


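developed in our group and implemented in the CASINO program by López Ríos103 has the following functional form:

$$\Psi_{\mathrm{BF}}(\mathbf{R}) = e^{J(\mathbf{R})} \det\left[\psi_i\left(\mathbf{r}_j^{\uparrow} + \boldsymbol{\xi}_j(\mathbf{R})\right)\right] \det\left[\psi_i\left(\mathbf{r}_j^{\downarrow} + \boldsymbol{\xi}_j(\mathbf{R})\right)\right] \tag{4.33}$$

with the backflow displacement for electron $i$ in a system of $N$ electrons and $N_{\mathrm{ion}}$ nuclei given by

$$\boldsymbol{\xi}_i = \sum_{j \neq i}^{N} \eta_{ij}\,\mathbf{r}_{ij} + \sum_{I}^{N_{\mathrm{ion}}} \mu_{iI}\,\mathbf{r}_{iI} + \sum_{j \neq i}^{N} \sum_{I}^{N_{\mathrm{ion}}} \left(\Phi_i^{jI}\,\mathbf{r}_{ij} + \Theta_i^{jI}\,\mathbf{r}_{iI}\right) \tag{4.34}$$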

Here $\eta_{ij} = \eta(r_{ij})$ is a function of electron–electron separation, $\mu_{iI} = \mu(r_{iI})$ is a function of electron–ion separation, and $\Phi_i^{jI} = \Phi(r_{iI}, r_{jI}, r_{ij})$ and $\Theta_i^{jI} = \Theta(r_{iI}, r_{jI}, r_{ij})$. The functions $\eta$, $\mu$, $\Phi$, and $\Theta$ are parameterized using power expansions with optimizable coefficients.103

Now, of course, the use of backflow wavefunctions can significantly increase the cost of a QMC calculation. This is largely because every element of the Slater determinant has to be recomputed each time an electron is moved, whereas only a single column of the Slater determinant has to be updated after each move when the basic Slater–Jastrow wavefunction is used. The basic scaling of the algorithm with backflow (assuming localized orbitals and basis set) is thus $N^3$ rather than $N^2$. Backflow functions also introduce more parameters into the trial wavefunction, making the optimization procedure more difficult and costly. However, the reduction in the variance normally observed with backflow greatly improves the statistical efficiency of QMC calculations in the sense that the number of moves required to obtain a fixed error in the energy is smaller. In our Ne-atom calculations,91 for example, it was observed that the computational cost per move in VMC and DMC increased by a factor of between 4 and 7, but overall the time taken to complete the calculation to a fixed error bar increased only by a factor of between 2 and 3. One interesting thing that we found is that energies obtained from VMC with backflow approached those of DMC without backflow. VMC with backflow may thus represent a useful level of theory since it is significantly less expensive than DMC (although the problem with obtaining accurate energy differences in VMC presumably remains). Finally, it should be noted that backflow is expected to improve the QMC estimates of all expectation values, not just the energy. We like it.

We now move on to consider the issue of basis sets.
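Before turning to basis sets, the homogeneous backflow transformation of Eq. (4.32) can be sketched in a few lines. This is an illustration only: the Gaussian form of `eta` and its `strength` and `width` parameters are hypothetical choices made for the example, not the power expansions used in CASINO, and the function names are invented here.

```python
import numpy as np


def eta(r, strength=0.1, width=1.0):
    # Hypothetical smooth, short-ranged backflow function (illustration only)
    return strength * np.exp(-(r / width) ** 2)


def quasiparticle_coordinates(positions, eta_fn=eta):
    """Quasiparticle coordinates r_i + sum_{j != i} eta(r_ij) (r_i - r_j).

    positions: array of shape (N, 3) holding the electron coordinates r_i.
    """
    diff = positions[:, None, :] - positions[None, :, :]  # r_i - r_j, (N, N, 3)
    dist = np.linalg.norm(diff, axis=-1)                  # r_ij, (N, N)
    weights = eta_fn(dist)
    np.fill_diagonal(weights, 0.0)                        # exclude j = i
    return positions + np.einsum("ij,ijk->ik", weights, diff)
```

Every quasiparticle coordinate depends on all electron positions, so moving a single electron changes every argument passed to the orbitals. This is why the whole Slater matrix, rather than a single column, has to be re-evaluated after each move.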
The importance of using good-quality single-particle orbitals in building up the Slater determinants in the trial wavefunction is clear. The determinant part accounts for by far the most significant fraction of the variational energy. However, the evaluation of single-particle orbitals and their first and second derivatives can sometimes take up more than half of the total computer time, and consideration must therefore be given to obtaining accurate orbitals that can be evaluated rapidly at arbitrary points in space. It is not difficult to see that the most critical thing is to expand


the single-particle orbitals in a basis set of localized functions. This ensures that beyond a certain system size, only a fixed number of the localized functions will give a significant contribution to a particular orbital at a particular point. The cost of evaluating the orbitals does not then increase rapidly with the size of the system. Note that localized basis functions can (1) be strictly zero beyond a certain radius, or (2) can decrease monotonically and be prescreened before the calculation starts, so that only those functions that could be significant in a particular region are considered for evaluation. An alternative procedure is to tabulate the orbitals and their derivatives on a grid, and this is feasible for small systems such as atoms, but for periodic solids or larger molecules the storage requirements quickly become enormous. This is an important consideration when using parallel computers, as it is much more efficient to store the single-particle orbitals on every node. Historically, a very large proportion of condensed matter electronic structure theorists have used plane-wave basis sets in their DFT calculations. However, in QMC, plane-wave expansions are normally extremely inefficient because they are not localized in real space; every basis function contributes at every point, and the number of functions required increases linearly with system size. Only if there is a short repeat length in the problem are plane waves not totally unreasonable. Note that this does not mean that all plane-wave DFT codes (such as CASTEP,104 ABINIT,105 and PWSCF106 ) are useless for generating trial wavefunctions for CASINO; a postprocessing utility can be used to reexpand a function expanded in plane waves in another localized basis before the wavefunction is read into CASINO. 
The usual thing here is to use some form of localized spline functions on a grid such as “blip” functions.107,108 Another reasonable way to do this is to expand the orbitals in a basis of Gaussian-type functions. These are localized, relatively quick to evaluate, and are available from a wide range of sophisticated software packages. Such a large expertise has been built up within the quantum chemistry community with Gaussians that there is significant resistance to using any other type of basis. A great many Gaussian-based packages have been developed by quantum chemists for treating molecules. The best known of these are probably the various versions of the GAUSSIAN software.3 In addition to the regular single-determinant methods, these codes implement various techniques involving multideterminant correlated wavefunctions and are flexible tools for developing accurate molecular trial wavefunctions. For systems with periodic boundary conditions, the Gaussian basis set program CRYSTAL109 turns out to be very useful; it can perform all-electron or pseudopotential Hartree–Fock and DFT calculations both for molecules and for systems periodic in one, two, or three dimensions. For some systems, Slater basis sets may be useful in QMC (since they provide a more compact representation than Gaussians, and hence more rapidly calculable orbitals).74 To this end, we have implemented an interface to the program ADF.110 There is one more issue we must consider that is relevant to all basis sets but is particular to the case of Gaussian-type functions. This has to do with cusp conditions. At a nucleus the exact wavefunction has a cusp so that the divergence


in the potential energy is canceled by an equal and opposite divergence in the kinetic energy. Therefore, if this cusp is represented accurately in the QMC trial wavefunction, the fluctuations in the local energy will be greatly reduced. It is relatively easy to produce an accurate representation of this cusp when using a grid-based numerical representation of the orbitals. However, as we have already remarked, such representations cannot really be used for large polyatomic systems because of the excessive storage requirements, and we would prefer to use a Gaussian basis set. But then there can be no cusp in the wavefunction since Gaussians have zero gradient at r = 0. The local energy thus diverges at the nucleus. In practice, one finds that the local energy has wild oscillations close to the nucleus, which can lead to numerical instabilities in DMC calculations. To solve this problem we can make small corrections to the single-particle orbitals close to the nuclei, which impose the correct cusp behavior; these need to be applied at each nucleus for every orbital which is larger than a given tolerance at that nucleus. The scheme we developed to correct for this is outlined elsewhere.73 Generalizations of this method have been developed for other basis set types. To see the cusp corrections in action, let us first look at a hydrogen atom where the basis set has been made to model the cusp very closely by using very sharp Gaussians with high exponents. Visually (top left in Fig. 4.5), the fact that the orbital does not obey the cusp condition is not immediately apparent. If we zoom in on the region close to the nucleus (top right), we see the problem; the black line is the orbital expanded in Gaussians and the red line is the cusp-corrected orbital. The effect on the gradient and local energy is clearly significant. 
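To make the divergence concrete, the local energy of a hydrogen atom can be written down analytically for two model orbitals (Hartree atomic units). The sketch below is an illustration, not the CASINO cusp-correction scheme: it compares the exact 1s orbital $e^{-r}$, which satisfies the cusp condition, with a single Gaussian $e^{-\alpha r^2}$. The function names are invented here, and the exponent $\alpha = 8/(9\pi)$ is simply the energy-optimal value for a single Gaussian.

```python
import numpy as np


def local_energy_slater(r):
    # psi = exp(-r): the kinetic term +1/r cancels the potential -1/r exactly,
    # so E_L = -1/2 at every radius
    return np.full_like(np.asarray(r, dtype=float), -0.5)


def local_energy_gaussian(r, alpha):
    # psi = exp(-alpha r^2): laplacian(psi)/psi = 4 alpha^2 r^2 - 6 alpha, so
    # E_L = 3 alpha - 2 alpha^2 r^2 - 1/r, which diverges as r -> 0
    r = np.asarray(r, dtype=float)
    return 3.0 * alpha - 2.0 * alpha ** 2 * r ** 2 - 1.0 / r


alpha = 8.0 / (9.0 * np.pi)  # energy-optimal exponent for one Gaussian
radii = np.array([1.0, 0.1, 0.01, 0.001])
for r, e_s, e_g in zip(radii, local_energy_slater(radii),
                       local_energy_gaussian(radii, alpha)):
    print(f"r = {r:7.3f}  E_L(Slater) = {e_s:8.3f}  E_L(Gaussian) = {e_g:10.3f}")
```

With the Gaussian orbital nothing cancels the $-1/r$ term, so the local energy becomes arbitrarily negative as a configuration approaches the nucleus. The cusp corrections described above modify the orbital close to the nucleus so that this cancellation is restored.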
This scheme has been implemented within the CASINO code for both finite and periodic systems, and produces a significant reduction in the computer time required to achieve a specified error bar, as one can appreciate from looking at the bottom two panels in Fig. 4.5, which show the local energy as a function of move number for a carbon monoxide molecule with and without cusp corrections. The problem with electron–nucleus cusps is clearly more significant for atoms of higher atomic number. To understand how this helps to do all-electron DMC calculations for heavier atoms, and to understand how the necessary computer time scales with atomic number, we performed calculations for various noble gas atoms.64 By ensuring that the electron–nucleus cusps were represented accurately, it proved perfectly possible to produce converged DMC energies with acceptably small error bars for atoms up to xenon (Z = 54).

4.5.2 Pseudopotentials

Well, “perfectly possible,” I said. Possible, maybe, but definitely somewhat tiresome. On trying to do all-electron calculations for heavier atoms than xenon, we were quickly forced to stop when smoke was observed coming out of the side of the computer.111 Might it therefore be better to do heavy atoms using pseudopotentials, as is commonly done with other methods, such as DFT? In electronic structure calculations pseudopotentials or effective core potentials are used to remove the inert core electrons from the problem and to improve the computational efficiency. Although QMC scales very favorably with system size

[Figure 4.5: upper panels show the orbital, its x-gradient, and the local energy near the nucleus as functions of r (Å); lower panels show the local energy against the number of moves (0 to 20000).]

Fig. 4.5 (color online) The top two rows show the effect of Gaussian basis set cusp corrections in the hydrogen atom (red straight-line segments corrected; black lines not corrected). The bottom row shows local energy as a function of move number in a VMC calculation for a carbon monoxide molecule with a standard reasonably good Gaussian basis set. The cusp corrections are imposed only in the figure on the right. The reduction in the local energy fluctuations with the new scheme is clearly apparent.

in general, it has been estimated63 that the scaling of all-electron calculations with the atomic number $Z$ is approximately $Z^{5.5}$, which in the relatively recent past was generally considered to rule out applications to atoms with $Z$ greater than about 10. Our paper64 pushing all-electron QMC calculations to $Z = 54$ was therefore a significant step. The use of a pseudopotential then serves to reduce the effective value of $Z$ and to improve the scaling to $Z^{3.5}$. Although errors are inevitably introduced, the gain in computational efficiency is easily sufficient to make pseudopotentials preferable in heavier atoms. They also offer a simple way to incorporate approximate relativistic corrections.
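A rough feel for these exponents can be had from a back-of-the-envelope calculation. The sketch below takes the quoted power laws at face value and ignores all prefactors, as well as the reduction in the effective value of $Z$ that a pseudopotential brings, so it is indicative only. The helper `relative_cost` is a name invented for this example.

```python
def relative_cost(z, z_ref, exponent):
    """Cost at atomic number z relative to z_ref, assuming cost ~ Z**exponent."""
    return (z / z_ref) ** exponent


# Xenon (Z = 54) relative to neon (Z = 10), under the two quoted scalings
all_electron = relative_cost(54, 10, 5.5)
pseudopotential = relative_cost(54, 10, 3.5)
print(f"Z^5.5 scaling: {all_electron:.3g} times the Z = 10 cost")
print(f"Z^3.5 scaling: {pseudopotential:.3g} times the Z = 10 cost")
```

The ratio of the two numbers is $(54/10)^2$, so the change of exponent alone accounts for a factor of about 29 at xenon under these assumptions.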


Accurate pseudopotentials for single-particle theories such as DFT or Hartree–Fock theory are well developed, but pseudopotentials for correlated wavefunction techniques such as QMC present additional challenges. The presence of core electrons causes two related problems. The first is that the shorter length-scale variations in the wavefunction near a nucleus of large $Z$ require the use of a small time step. In VMC this problem can, at least in principle, be somewhat reduced by the use of acceleration schemes.112,113 The second problem is that the fluctuations in the local energy tend to be large near the nucleus because both the kinetic and potential energies are large.

The central idea of pseudopotential theory is to create an effective potential that reproduces the effects of both the nucleus and the core electrons on the valence electrons. This is done separately for each of the different angular momentum states, so the pseudopotential contains angular momentum projectors and is therefore a nonlocal operator. It is convenient to divide the pseudopotential for each atom into a local part $V^{\mathrm{ps}}_{\mathrm{loc}}(r)$ common to all angular momenta and a correction, $V^{\mathrm{ps}}_{\mathrm{nl},l}(r)$, for each angular momentum $l$. The electron–ion potential energy term in the full many-electron Hamiltonian of the atom then takes the form

$$V_{\mathrm{loc}} + \hat{V}_{\mathrm{nl}} = \sum_i V^{\mathrm{ps}}_{\mathrm{loc}}(r_i) + \sum_i \hat{V}^{\mathrm{ps}}_{\mathrm{nl},i} \tag{4.35}$$

where $\hat{V}^{\mathrm{ps}}_{\mathrm{nl},i}$ is a nonlocal operator that acts on an arbitrary function $g(\mathbf{r}_i)$ as follows:

$$\hat{V}^{\mathrm{ps}}_{\mathrm{nl},i}\, g(\mathbf{r}_i) = \sum_l V^{\mathrm{ps}}_{\mathrm{nl},l}(r_i) \sum_{m=-l}^{l} Y_{lm}(\Omega_{\mathbf{r}_i}) \int Y^{*}_{lm}(\Omega_{\mathbf{r}'_i})\, g(\mathbf{r}'_i)\, d\Omega'_i \tag{4.36}$$

where the angular integration is over the sphere passing through the $\mathbf{r}_i$. This expression can be simplified by choosing the $z$-axis along $\mathbf{r}_i$, noting that $Y_{lm}(0, 0) = 0$ for $m \neq 0$, and using the definition of the spherical harmonics to give

$$\hat{V}^{\mathrm{ps}}_{\mathrm{nl},i}\, g(\mathbf{r}_i) = \sum_l V^{\mathrm{ps}}_{\mathrm{nl},l}(r_i)\, \frac{2l + 1}{4\pi} \int P_l[\cos(\theta'_i)]\, g(\mathbf{r}'_i)\, d\Omega'_i \tag{4.37}$$

where $P_l$ denotes a Legendre polynomial. While the use of nonlocal pseudopotentials is relatively straightforward in a VMC calculation,115,116 there is an issue with DMC. The fixed-node boundary condition turns out not to be compatible with the nonlocality. This forces us to introduce an additional approximation (the locality approximation117) whereby the nonlocal pseudopotential operator $\hat{V}_{\mathrm{nl}}$ acts on the trial function rather than the DMC wavefunction; that is, we replace $\hat{V}_{\mathrm{nl}}$ by $\Psi_T^{-1} \hat{V}_{\mathrm{nl}} \Psi_T$. The leading-order error term is proportional to $(\Psi_T - \Psi_0)^2$, where $\Psi_0$ is the exact fixed-node ground-state wavefunction.117 Unfortunately, this error may be positive or negative, so the method is no longer strictly variational. An alternative to this approximation


is the semilocalization scheme for DMC nonlocal pseudopotentials introduced by Casula et al. in 2005 (Refs. 118 and 119); as well as restoring the variational property, this method appears to have better numerical stability than the older scheme.

It is not currently possible to construct pseudopotentials for heavy atoms entirely within a QMC framework, although progress in this direction was made by Acioli and Ceperley.114 It is therefore currently necessary to use pseudopotentials generated within some other framework. Possible schemes include Hartree–Fock theory and local DFT, where there is a great deal of experience in generating accurate pseudopotentials. There is evidence to show that Hartree–Fock pseudopotentials give better results within QMC calculations than DFT pseudopotentials,120 although the latter work quite well in many cases. The problem with DFT pseudopotentials appears to be that they already include a (local) description of correlation which is quite different from the QMC description. Hartree–Fock theory, on the other hand, does not contain any effects of correlation. The QMC calculation puts back the valence–valence correlations but neglects core–core correlations (which have only an indirect and small effect on the valence electrons) and core–valence correlations. Core–valence correlations are significant when the core is highly polarizable, such as in alkali-metal atoms. The core–valence correlations may be approximately included by using a core polarization potential (CPP), which represents the polarization of the core due to the instantaneous positions of the surrounding electrons and ions.

Another issue is that relativistic effects are important for heavy elements. It is still, however, possible to use a QMC method for solving the Schrödinger equation with the scalar relativistic effects obtained within the Dirac formalism incorporated within the pseudopotentials.
The combination of Dirac–Hartree–Fock pseudopotentials and CPPs appears to work well in many QMC calculations. CPPs have been generated for a wide range of elements (see, e.g., Ref. 121).

Many Hartree–Fock pseudopotentials are available in the literature, mostly in the form of sets of parameters for fits to Gaussian basis sets. Unfortunately, many of them diverge at the origin and it is well known that this can lead to significant time step errors in DMC calculations.120 It was thus apparent a few years ago that none of the available sets were ideal for QMC calculations, and it was decided that it would be helpful if we generated an online periodic table of smooth nondivergent Hartree–Fock pseudopotentials (with relativistic corrections) developed specifically for QMC. This project has now been completed and has been described in detail by Trail and Needs.122,123 The resulting pseudopotentials are available online124; the repository includes both Dirac–Fock and Hartree–Fock potentials, and a choice of small or large core potentials (the latter being more amenable to plane-wave calculations). Burkatzki et al. have since developed another set of pseudopotentials, also intended for use in QMC calculations.125 Although data are limited, tests126,127 appear to show that the Trail–Needs pseudopotentials give essentially the same results as the Burkatzki pseudopotentials, although the smaller core radii of the former appear to lead to a slight increase in efficiency.
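Returning to the angular integral of Eq. (4.37), the projection onto a single angular momentum channel can be sketched with a simple product quadrature on the sphere. This is a minimal illustration under stated assumptions, not the integration grid used in any particular QMC code: the function name `project_l`, the Gauss–Legendre and uniform-azimuth rule, and the default grid sizes are all choices made for this example. The full operator would multiply each projection by the radial function $V^{\mathrm{ps}}_{\mathrm{nl},l}(r_i)$ and sum over $l$.

```python
import numpy as np


def project_l(g, r_i, l, n_theta=8, n_phi=16):
    """Return (2l+1)/(4 pi) * integral of P_l(cos theta') g(r') over the sphere
    |r'| = |r_i|, with theta' measured from the direction of r_i."""
    r_i = np.asarray(r_i, dtype=float)
    radius = np.linalg.norm(r_i)
    e3 = r_i / radius                          # local z-axis along r_i
    helper = (np.array([1.0, 0.0, 0.0]) if abs(e3[0]) < 0.9
              else np.array([0.0, 1.0, 0.0]))
    e1 = np.cross(helper, e3)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(e3, e1)                      # completes the orthonormal frame

    cos_t, w_t = np.polynomial.legendre.leggauss(n_theta)  # nodes in cos(theta')
    sin_t = np.sqrt(1.0 - cos_t ** 2)
    phi = 2.0 * np.pi * np.arange(n_phi) / n_phi            # uniform azimuth grid
    p_l = np.polynomial.legendre.legval(cos_t, [0.0] * l + [1.0])

    total = 0.0
    for ct, st, wt, pl in zip(cos_t, sin_t, w_t, p_l):
        points = radius * (st * np.cos(phi)[:, None] * e1
                           + st * np.sin(phi)[:, None] * e2
                           + ct * e3)
        values = np.array([g(p) for p in points])
        total += wt * pl * 2.0 * np.pi * values.mean()      # azimuthal integral
    return (2 * l + 1) / (4.0 * np.pi) * total
```

As a check on the construction, a function of pure $l = 1$ character such as $g(\mathbf{r}) = z$ should be returned unchanged by the $l = 1$ projection and annihilated by the $l = 0$ and $l = 2$ projections, for example `project_l(lambda p: p[2], [0.3, 0.4, 1.2], 1)`.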


4.5.3 Periodic Systems

As with other methods, QMC calculations for extended systems may be performed using finite clusters or infinitely large crystals with periodic boundary conditions. The latter are generally preferred because they approximate the desired large-size limit (i.e., the infinite system size without periodic boundary conditions) more closely. One can also use the standard supercell approach for aperiodic systems such as point defects. For such cases, cells containing a point defect and a small part of the host crystal are repeated periodically throughout space; the supercell must clearly be made large enough so the interactions between defects in different cells are negligible. In periodic DFT calculations the charge density and potentials are taken to have the periodicity of a suitably chosen lattice. The single-particle orbitals can then be made to obey Bloch’s theorem, and the results for the infinite system are obtained by summing quantities obtained from the different Bloch wave vectors within the first Brillouin zone. The situation with many-particle wavefunctions is rather different, since it is not possible to reduce the problem to solving within a primitive unit cell. Such a reduction is allowed in single-particle methods because the Hamiltonian is invariant under the translation of a single electronic coordinate by a translation vector of the primitive lattice, but this is not a symmetry of the many-body Hamiltonian.129,128 Consequently, QMC calculations must be performed at a single k-point. This normally gives a poor approximation to the result for the infinite system, unless one chooses a pretty large nonprimitive simulation cell. One may also average over the results of QMC calculations done at different single k-points.130 There are also a number of problems associated with the long-range Coulomb interaction in many-body techniques such as QMC. 
It is well known that simply summing the 1/r interaction out over cells on the surface of an ever-expanding cluster never settles down because of the contribution from shape-dependent arrangements of surface charge. The usual solution to this problem is to employ the Ewald method.131 The Ewald interaction contains an effective depolarization field intended to cancel the field produced by the surface charges (and is thus equivalent to what you get if you put the large cluster in a medium of infinite dielectric constant). Long-range interactions also induce long-range exchange-correlation interactions, and if the simulation cell is not large enough, these effects are described incorrectly. Such effects are absent in local DFT calculations because the interaction energy is written in terms of the electronic charge density, but Hartree–Fock calculations show very strong effects of this kind, and various ways to accelerate the convergence have been developed. The finite-size effects arising from the long-range interaction can be divided into potential and kinetic energy contributions.132,133 The potential energy component can be removed from the calculations by replacing the Ewald interaction by the model periodic Coulomb (MPC) interaction.134–136 Recent work has added substantially to our understanding of finite-size effects, and theoretical expressions have been derived for them,132,133 but at the moment it seems that they cannot entirely
replace extrapolation procedures. An alternative approach to estimating finite-size errors in QMC calculations has been developed recently.137 DMC results for the three-dimensional homogeneous electron gas are used to obtain a system-size-dependent local-density approximation functional. The correction to the total energy is given by the difference between the DFT energies for finite-sized and infinite systems. This approach is interesting, although it does rely on the LDA giving a reasonable description of the system. As will be shown later, DMC calculations using periodic boundary conditions with thousands of atoms per cell have now been done, and the technology is clearly approaching maturity.

4.5.4 Differences, Derivatives, and Forces

Calculations in computational electronic structure theory almost always involve the evaluation of differences in energy, and all methods that work in complex systems rely for their accuracy on the cancellation of errors in such energy differences. Apart from the statistical errors, all known errors in DMC have the same sign and partially cancel out in the subtraction because the method is variational. That said, incomplete cancellation of nodal errors is the most important source of error in DMC results, even though DMC often retrieves 95% or more of the correlation energy. Correlated sampling138 is one way of computing the energy difference between two similar systems with a smaller statistical error than that obtained by subtracting the individual energies. This is relatively straightforward in VMC, and a version of it was described briefly in Section 4.3 when discussing variance minimization. As well as simple differences, we would quite often like to calculate derivatives. Many quantities of physical interest can be formulated as an energy derivative, and thus an ability to calculate them accurately in QMC considerably enhances the scope of the method. Normally, of course, this sort of thing would be encountered in the calculation of forces on atoms, but if we expand the energy in a Taylor series in a perturbation such as the strength of an applied electric field, for example, the coefficients of the first- and second-order terms, respectively, give the dipole moment and the various elements of the dipole polarizability tensor:

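\[
E(F_i) = E(0) + F_i \underbrace{\left[\frac{\partial E}{\partial F_i}\right]_{F_i=0}}_{\text{dipole moment}} + \frac{1}{2}\sum_{j=1}^{3} F_i F_j \underbrace{\left[\frac{\partial^2 E}{\partial F_i\,\partial F_j}\right]_{F_i=0,\,F_j=0}}_{\text{dipole polarizability tensor}} + \cdots \tag{4.38}
\]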
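As a minimal sketch of how the coefficients in Eq. (4.38) might be extracted numerically, the example below applies central finite differences to a model energy function of a single field component. The quadratic model, its parameter values, and the function names are assumptions made for illustration; in a real calculation each energy would come from a separate QMC run and carry a statistical error bar, so the step would have to be chosen much larger than here.

```python
def energy_in_field(field, e0=-1.0, dipole=0.3, alpha=2.0):
    """Hypothetical model energy E(F) = E0 - mu*F - 0.5*alpha*F**2 (arbitrary units)."""
    return e0 - dipole * field - 0.5 * alpha * field ** 2

def finite_field_derivatives(energy, step=1e-3):
    """Central-difference estimates of dE/dF and d2E/dF2 at F = 0."""
    e_plus, e_zero, e_minus = energy(step), energy(0.0), energy(-step)
    first = (e_plus - e_minus) / (2.0 * step)
    second = (e_plus - 2.0 * e_zero + e_minus) / step ** 2
    return first, second

first, second = finite_field_derivatives(energy_in_field)
# With the sign convention assumed in this model, mu = -dE/dF and alpha = -d2E/dF2.
print(f"dE/dF   = {first:.6f}")
print(f"d2E/dF2 = {second:.6f}")
```

Because the model is exactly quadratic, the central differences reproduce the model parameters up to rounding error. With noisy QMC energies the second difference is far more sensitive to statistical error than the first.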

One may also calculate the dipole moment (no surprise) by evaluating the expectation value of the dipole-moment operator. However, since the operator doesn’t commute with the Hamiltonian, there will be a significant error using the mixed distribution in DMC; one needs to sample the pure distribution instead, using future walking84,85 or a similar technique. This is a significant extra complication, which is avoided by formulating the quantity as an energy derivative. As well as the electric field, the perturbation could be the displacement of nuclear positions
(giving forces, etc.) or a combination of both (e.g., the intensity of peaks in infrared spectra depends on changes in the dipole moment corresponding to changes in geometry). Such energy derivatives can, of course, be computed numerically (by finite differencing) or analytically (by differentiating the appropriate energy expressions), the latter being clearly preferable in this case. First, we focus on atomic forces. These are generally used in three main areas of computational electronic structure theory: structural optimization, the computation of vibrational properties, and in explicit molecular dynamics simulations of atomic behavior.139 Unfortunately, methods for calculating accurate forces in QMC in a reasonable amount of computer time have proved elusive, at least until relatively recently, due to the lack of readily calculable expressions with reasonable statistical properties. As usual, we begin with a discussion of the Hellmann–Feynman theorem (HFT), which in this context is the statement that the force is the expectation value of the gradient of the Hamiltonian Ĥ:

\[
\mathbf{F} = -\nabla E = -\frac{\int \Psi\,\nabla\hat{H}\,\Psi\,d\mathbf{R}}{\int \Psi\,\Psi\,d\mathbf{R}} \tag{4.39}
\]

The other terms in the expression for the gradient of the expectation value of the energy (the ones involving derivatives of the wavefunction itself) have disappeared only because we are assuming that the wavefunction is an exact eigenstate. Inevitably, then, the use of the HFT is an approximation in QMC because we have only an inexact trial function. The correct QMC expressions for the forces must contain additional (“Pulay”) terms, which depend on wavefunction derivatives. There is also an additional term which accounts for the action of the gradient operator on parameters which couple only indirectly with the nuclear positions (e.g., orbital coefficients), but this can be greatly reduced by optimizing the wavefunction through minimization of the energy rather than the variance. There is another type of Pulay term which arises in DMC. 
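The role of the wavefunction-derivative (Pulay) terms can be seen in a toy model. The sketch below assumes a one-dimensional harmonic oscillator centred on a "nuclear" position R and a Gaussian trial function whose centre follows the nucleus only imperfectly; the hypothetical `tracking` parameter (1.0 means an exact eigenstate) and all function names are assumptions for illustration, and the integrals are done on a grid rather than by Monte Carlo. The force is minus each of the derivatives printed.

```python
import math

def expectation_values(R, tracking, a=0.5, half_width=12.0, n_grid=4801):
    """Return (E, <dH/dR>) for the trial function exp(-a (x - tracking*R)^2)
    in the displaced oscillator H = -1/2 d2/dx2 + 1/2 (x - R)^2."""
    centre = tracking * R
    dx = 2.0 * half_width / (n_grid - 1)
    norm = energy = hf = 0.0
    for i in range(n_grid):
        x = -half_width + i * dx
        psi = math.exp(-a * (x - centre) ** 2)
        d2psi = (4.0 * a * a * (x - centre) ** 2 - 2.0 * a) * psi  # analytic psi''
        h_psi = -0.5 * d2psi + 0.5 * (x - R) ** 2 * psi
        norm += psi * psi
        energy += psi * h_psi
        hf += psi * psi * (-(x - R))  # dH/dR = dV/dR = -(x - R)
    return energy / norm, hf / norm   # the grid spacing cancels in the ratios

def total_derivative(R, tracking, step=1e-4):
    """Central-difference derivative of the full energy expectation value."""
    e_plus, _ = expectation_values(R + step, tracking)
    e_minus, _ = expectation_values(R - step, tracking)
    return (e_plus - e_minus) / (2.0 * step)

R = 1.0
for tracking in (1.0, 0.8):
    _, hf = expectation_values(R, tracking)
    full = total_derivative(R, tracking)
    print(f"tracking = {tracking}:  <dH/dR> = {hf:+.6f}  dE/dR = {full:+.6f}  "
          f"Pulay = {full - hf:+.6f}")
```

For the exact eigenstate the Hellmann–Feynman term and the total derivative agree. For the imperfect trial function they differ, and the difference is what the wavefunction-derivative terms must supply.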
The HFT is expected to be valid for the exact DMC algorithm since it solves for the ground state of the fixed-node Hamiltonian exactly. However, this Hamiltonian differs from the physical one due to the presence of the infinite potential barrier on the trial nodal surface, which constrains the DMC wavefunction φ_0 to go to zero there. As we vary the nuclear position(s), the nodal surface moves, and hence the infinite potential barrier moves, giving a contribution to ∇Ĥ that depends on both Ψ_T and its first derivative.140–142 To calculate the Pulay terms arising from the derivative of the mixed estimator of Eq. (4.31), we need in principle to calculate a derivative of the DMC wavefunction φ_0. Because we don’t have any kind of formula for φ_0, this derivative cannot be readily evaluated, and what has been done in the past is to use the expression for the derivative of the trial function Ψ_T in its place.142–150 The resulting errors are of first order in (Ψ_T − φ_0) and (Ψ_T′ − φ_0′); the accuracy of this approximation therefore depends sensitively on the quality of the trial function and its derivative.
In practice the results obtained from this procedure are not generally accurate enough. Instead of using the usual mixed DMC energy expression, one may calculate forces from the “pure DMC” energy given by E_D = ∫φ_0 Ĥ φ_0 dR / ∫φ_0 φ_0 dR, which, by construction, is equal to the mixed DMC energy. It is more expensive to do things this way, but the benefits are now clear. Despite the fact that the derivative E_D′ contains the derivative of the DMC wavefunction, φ_0′, Badinski et al.142 were able to show that φ_0′ can be eliminated from the pure DMC formula to give the following exact expression (where dS is a nodal surface element):

\[
E_D' = \frac{\int \phi_0\phi_0\,\phi_0^{-1}\hat{H}'\phi_0\,d\mathbf{R}}{\int \phi_0\phi_0\,d\mathbf{R}} - \frac{1}{2}\,\frac{\int \phi_0\phi_0\,\Psi_T^{-2}\,|\nabla_{\mathbf{R}}\Psi_T|\,\Psi_T'\,dS}{\int \phi_0\phi_0\,d\mathbf{R}} \tag{4.40}
\]

Of course it is not easy to compute integrals over the nodal surface; luckily, the expression can be converted into a regular volume integral with no φ_0′. The error in the required approximation is then of order (Ψ_T − φ_0)^2, giving

\[
E_D' = \frac{\int \phi_0\phi_0\left[\phi_0^{-1}\hat{H}'\phi_0 + \Psi_T^{-1}(\hat{H} - E_D)\Psi_T'\right]d\mathbf{R}}{\int \phi_0\phi_0\,d\mathbf{R}} + \frac{\int \Psi_T\Psi_T\,(E_L - E_D)\,\Psi_T^{-1}\Psi_T'\,d\mathbf{R}}{\int \Psi_T\Psi_T\,d\mathbf{R}} + O[(\Psi_T - \phi_0)^2] \tag{4.41}
\]

One may readily evaluate this expression by generating configurations distributed according to the pure (φ_0^2) and variational (Ψ_T^2) distributions. The approximation is in the Pulay terms, which are smaller in pure than in mixed DMC, and in addition, the approximation in Eq. (4.41) is second order, in contrast to the first-order error obtained by simply substituting Ψ_T′ for φ_0′. This equation satisfies the zero-variance condition; if Ψ_T and Ψ_T′ are exact, the variance of the force obtained from this formula is zero (the variance of the Hellmann–Feynman estimator is, strictly speaking, infinite!). Although it remains true that not many papers have been published with actual applications of these methods (some calculations of very accurate forces in small molecules can be found, e.g., in Refs. 150 and 151), one can certainly say that reasonable approximations for the difficult expressions have been found and that the outlook for QMC forces is very promising.

4.6 APPLICATIONS

Time and space preclude me from presenting a long list of applications. Here are two: (1) a somewhat unfair comparison of the worst DFT functional with VMC
and DMC for some cohesive energies of tetrahedrally bonded semiconductors, and (2) the equations of state of diamond and iron. Many other applications can be found, for example, in Ref. 5.

4.6.1 Cohesive Energies

A number of VMC and DMC studies have been performed on the cohesive energies of solids. This quantity is given by the difference between the summed energies of the appropriate isolated atoms and the energies of the same atoms in the bulk crystal. This is generally reckoned to be a severe test of QMC methods because the trial wavefunctions used in the two cases must be closely matched in quality to maximize the effective cancellation of errors. Data for Si, Ge, C, and BN have been collected in Table 4.1. The local spin density approximation (LSDA) density functional theory data show the standard overestimation of the cohesive energy, while the QMC data are in good agreement with experiment. Studies such as these have been important in establishing DMC as an accurate method for calculating the energies of crystalline solids.

TABLE 4.1 Cohesive Energies of Tetrahedrally Bonded Semiconductors Calculated Within the LSDA, VMC, and DMC Methods and Compared with Experimental Values^a

Method   Si          Ge          C            BN
LSDA     5.28^b      4.59^b      8.61^b       15.07^c
VMC      4.38(4)^d   3.80(2)^e   7.27(7)^f    12.85(9)^c
         4.82(7)^f   —           7.36(1)^g
         4.48(1)^h
DMC      4.63(2)^h   3.85(2)^e   7.346(6)^g
Expt.    4.62(8)^b   3.85^b      7.37^b       12.9^i

^a The energies for Si, Ge, and C are quoted in eV per atom, while those for BN are in eV per two atoms.
^b From Ref. 152 and references therein.
^c From Ref. 153.
^d From Ref. 162.
^e From Ref. 128.
^f From Ref. 115. Zero-point energy corrections of 0.18 eV for C and 0.06 eV for Si have been added to the published values for consistency with the other data in the table.
^g From Ref. 27.
^h From Ref. 26.
^i From Ref. 154, estimated from experimental results on hexagonal BN.

4.6.2 Equations of State of Diamond and Iron

The equation of state is the equilibrium relationship between the pressure, volume, and temperature. Computed equations of state are of particular interest in regions where experimental data are difficult to obtain. Diamond anvil cells are widely used in high-pressure research, and one of the important problems is the measurement of the pressure inside the cell. The most common approach is to place a small grain of ruby in the sample chamber and measure the frequency of a strong laser-stimulated fluorescence line. The resolution is, however, poor at pressures above about 100 GPa, and alternative methods are being investigated. One possibility is to measure the Raman frequency of diamond itself, assuming that the highest frequency derives from the diamond faces adjacent to the sample chamber. Calibrating such a scale requires an accurate equation of state and the corresponding pressure dependence of the Raman frequency. Maezono et al. performed VMC, DMC, and DFT calculations of the equation of state of diamond.12 The DMC and DFT data are shown in Fig. 4.6, along with equations of state derived from experimental data.155,156 The experimentally derived equations of state differ significantly at high pressures. It is now believed that the pressure calibration in the more modern experiment of Occelli et al.156 is inaccurate, and our DMC data support this view. As can be seen in Fig. 4.6, the equations of state calculated within DFT depend on the choice of exchange-correlation functional, undermining confidence in the DFT method. A recent QMC study of the equation of state and Raman frequency of cubic boron nitride has produced data that could be used to calibrate pressure measurements in diamond anvil cells.157 Another example of a DMC equation of state was produced by Sola et al.,158 who calculated the equation of state of hexagonal close-packed (hcp) iron under Earth’s core conditions. With up to 150 atoms or 2400 electrons per
cell, these represent some of the largest systems studied with DMC to date and demonstrate the ability of QMC to treat heavier transition metal atoms. Figure 4.7 shows the calculated equation of state, which agrees closely with experiments and with previous DFT calculations. (DFT is expected to work well in this system and the DMC calculations appear to confirm this.) Notice the discontinuity due to the hcp–bcc (body-centered cubic) phase transition in the experimental values reported by Dewaele et al.159 At low pressures, the calculations and experiments differ because of the magnetism, which is not taken into account in these particular calculations (although it could be in principle).

Fig. 4.6 (color online) Equation of state of diamond at high pressures from measurements by McSkimin and Andreatch155 and Occelli et al.,156 and as calculated using DFT with two different functionals and DMC.12 The shaded areas indicate the uncertainty in the experimental equations of state. The zero-point phonon pressure calculated using DFT with the PBE functional is included in the theoretical curves. [Plot of pressure (GPa) against volume per atom (Å3); curves: Expt (McSkimin & Andreatch), Expt (Occelli et al.), DFT-LDA, DFT-PBE, DMC.]

Fig. 4.7 (color online) Pressure–volume curve in iron obtained from DMC calculations (solid line158). The small yellow error band above the DMC curve is due to the errors in the parameters of a fit to the Birch–Murnaghan equation of state. DFT-PW91 results (dotted line160) and experimental data (circles161 and open triangles159) are reported for comparison.

4.7 CONCLUSIONS

Quite a lot of progress has been made in the theory and practical implementation of quantum Monte Carlo over the past few years, but certainly many interesting problems remain to be solved. For its most important purpose of calculating highly accurate total energies, the method works well and currently has no serious competitors for medium-sized and large systems. Our group has developed the software package CASINO,46 – 48 which has been designed to allow researchers to explore the potential of QMC in arbitrary molecules, polymers, slabs, and crystalline solids and in various model systems, including standard electron and electron–hole phases such as the homogeneous electron gas and Wigner crystals. Many young people also seem to believe that QMC is way cooler than boring old density functional theory, and they’re probably right. So that’s all right, then.


Acknowledgments

M.D.T. would like to thank the Royal Society for the award of a long-term university research fellowship. He also wishes to acknowledge the many contributions of R.J. Needs, N.D. Drummond, and P. López Ríos to the work described in this chapter, along with all the other members of the Cavendish Laboratory TCM Group, plus our many collaborators around the world. Computing facilities were provided largely by the Cambridge High Performance Computing Service.

REFERENCES

1. Cramer, C. J. Essentials of Computational Chemistry, Wiley, Hoboken, NJ, 2002, pp. 191–232.
2. Parr, R. G.; Yang, W. Density Functional Theory of Atoms and Molecules, Oxford University Press, New York, 1994.
3. Frisch, M. J.; et al. Gaussian 09, Gaussian Inc., Wallingford, CT, 2009.
4. Hammond, B. L.; Lester, W. A., Jr.; Reynolds, P. J. Monte Carlo Methods in Ab Initio Quantum Chemistry, World Scientific, Singapore, 1994.
5. Foulkes, W. M. C.; Mitas, L.; Needs, R. J.; Rajagopal, G. Rev. Mod. Phys. 2001, 73, 33.
6. Ceperley, D. M.; Alder, B. J. Phys. Rev. Lett. 1980, 45, 566.
7. Vosko, S. H.; Wilk, L.; Nusair, M. Can. J. Phys. 1980, 58, 1200.
8. Perdew, J. P.; Zunger, A. Phys. Rev. B 1981, 23, 5048.
9. Wu, Y. S. M.; Kuppermann, A.; Anderson, J. B. Phys. Chem. Chem. Phys. 1999, 1, 929.
10. Natoli, V.; Martin, R. M.; Ceperley, D. M. Phys. Rev. Lett. 1993, 70, 1952.
11. Delaney, K. T.; Pierleoni, C.; Ceperley, D. M. Phys. Rev. Lett. 2006, 97, 235702.
12. Maezono, R.; Ma, A.; Towler, M. D.; Needs, R. J. Phys. Rev. Lett. 2007, 98, 025701.
13. Pozzo, M.; Alfè, D. Phys. Rev. B 2008, 77, 104103.
14. Alfè, D.; Alfredsson, M.; Brodholt, J.; Gillan, M. J.; Towler, M. D.; Needs, R. J. Phys. Rev. B 2005, 72, 014114.
15. Manten, S.; Lüchow, A. J. Chem. Phys. 2001, 115, 5362.
16. Grossman, J. C. J. Chem. Phys. 2002, 117, 1434.
17. Aspuru-Guzik, A.; El Akramine, O.; Grossman, J. C.; Lester, W. A., Jr. J. Chem. Phys. 2004, 120, 3049.
18. Gurtubay, I. G.; Drummond, N. D.; Towler, M. D.; Needs, R. J. J. Chem. Phys. 2006, 124, 024318.
19. Gurtubay, I. G.; Needs, R. J. J. Chem. Phys. 2007, 127, 124306.
20. Hood, R. Q.; Chou, M.-Y.; Williamson, A. J.; Rajagopal, G.; Needs, R. J.; Foulkes, W. M. C. Phys. Rev. Lett. 1997, 78, 3350.
21. Hood, R. Q.; Chou, M.-Y.; Williamson, A. J.; Rajagopal, G.; Needs, R. J. Phys. Rev. B 1998, 57, 8972.
22. Nekovee, M.; Foulkes, W. M. C.; Needs, R. J. Phys. Rev. Lett. 2001, 87, 036401.
23. Nekovee, M.; Foulkes, W. M. C.; Needs, R. J. Phys. Rev. B 2003, 68, 235108.
24. Williamson, A. J.; Grossman, J. C.; Hood, R. Q.; Puzder, A.; Galli, G. Phys. Rev. Lett. 2002, 89, 196803.
25. Drummond, N. D.; Williamson, A. J.; Needs, R. J.; Galli, G. Phys. Rev. Lett. 2005, 95, 096801.
26. Leung, W.-K.; Needs, R. J.; Rajagopal, G.; Itoh, S.; Ihara, S. Phys. Rev. Lett. 1999, 83, 2351.
27. Hood, R. Q.; Kent, P. R. C.; Needs, R. J.; Briddon, P. R. Phys. Rev. Lett. 2003, 91, 076403.
28. Alfè, D.; Gillan, M. J. Phys. Rev. B 2005, 71, 220101.
29. Towler, M. D.; Needs, R. J. Int. J. Mod. Phys. B 2003, 17, 5425.
30. Wagner, L. K.; Mitas, L. Chem. Phys. Lett. 2003, 370, 412.
31. Wagner, L. K.; Mitas, L. J. Chem. Phys. 2007, 126, 034105.
32. Mitas, L.; Martin, R. M. Phys. Rev. Lett. 1994, 72, 2438.
33. Williamson, A. J.; Hood, R. Q.; Needs, R. J.; Rajagopal, G. Phys. Rev. B 1998, 57, 12140.
34. Towler, M. D.; Hood, R. Q.; Needs, R. J. Phys. Rev. B 2000, 62, 2330.
35. Ghosal, A.; Guclu, A. D.; Umrigar, C. J.; Ullmo, D.; Baranger, H. Nature Phys. 2006, 2, 336.
36. Healy, S. B.; Filippi, C.; Kratzer, P.; Penev, E.; Scheffler, M. Phys. Rev. Lett. 2001, 87, 016105.
37. Filippi, C.; Healy, S. B.; Kratzer, P.; Pehlke, E.; Scheffler, M. Phys. Rev. Lett. 2002, 89, 166102.
38. Kim, Y.-H.; Zhao, Y.; Williamson, A.; Heben, M. J.; Zhang, S. Phys. Rev. Lett. 2006, 96, 016102.
39. Carlson, J.; Chang, S.-Y.; Pandharipande, V. R.; Schmidt, K. E. Phys. Rev. Lett. 2003, 91, 050401.
40. Astrakharchik, G. E.; Boronat, J.; Casulleras, J.; Giorgini, S. Phys. Rev. Lett. 2004, 93, 200404.
41. Carlson, J.; Reddy, S. Phys. Rev. Lett. 2008, 100, 150403.
42. Schrödinger, E. Ann. Phys. 1926, 79, 361.
43. Ashcroft, N. W.; Mermin, N. D. Solid State Physics, W. B. Saunders, Philadelphia, 1976, p. 330.
44. Kent, P. R. C.; Towler, M. D.; Needs, R. J.; Rajagopal, G. Phys. Rev. B 2000, 62, 15394.
45. http://www.qmcwiki.org/index.php/Research_resources.
46. Needs, R. J.; Towler, M. D.; Drummond, N. D.; López Ríos, P. CASINO Version 2.5 User Manual, Cambridge University, Cambridge, UK, 2009.
47. CASINO Web site: http://www.tcm.phy.cam.ac.uk/~mdt26/casino2.html.
48. http://www.vallico.net/tti/tti.html. Click on “PUBLIC EVENTS.”
49. Trail, J. R. Phys. Rev. E 2008, 77, 016703.
50. Trail, J. R. Phys. Rev. E 2008, 77, 016704.
51. Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. M.; Teller, E. J. Chem. Phys. 1953, 21, 1087.

52. Towler, M. D. De Broglie-Bohm pilot-wave theory and the foundations of quantum mechanics. Graduate lecture course, available at http://www.tcm.phy.cam.ac.uk/~mdt26/pilot_waves.html, 2009.
53. Jastrow, R. J. Phys. Rev. 1955, 98, 1479.
54. Drummond, N. D.; Towler, M. D.; Needs, R. J. Phys. Rev. B 2004, 70, 235119.
55. Aragon, S. Density Functional Theory: A Primer, San Francisco State University teaching material, available at www.wag.caltech.edu/PASI/lectures/SFSUElectronicStructure-Lect-6.doc.
56. Kato, T. Commun. Pure Appl. Math. 1957, 10, 151.
57. de Palo, S.; Rapisarda, F.; Senatore, G. Phys. Rev. Lett. 2002, 88, 206401.
58. López Ríos, P.; Needs, R. J. Unpublished.
59. Dennis, J. E.; Gay, D. M.; Welsch, R. E. ACM Trans. Math. Software 1981, 7, 369.
60. Umrigar, C. J.; Wilson, K. G.; Wilkins, J. W. Phys. Rev. Lett. 1988, 60, 1719.
61. Kent, P. R. C.; Needs, R. J.; Rajagopal, G. Phys. Rev. B 1999, 59, 12344.
62. Drummond, N. D.; Needs, R. J. Phys. Rev. B 2005, 72, 085124.
63. Ceperley, D. M. J. Stat. Phys. 1986, 43, 815.
64. Ma, A.; Drummond, N. D.; Towler, M. D.; Needs, R. J. Phys. Rev. E 2005, 71, 066704.
65. Riley, K. E.; Anderson, J. B. Mol. Phys. 2003, 101, 3129.
66. Nightingale, M. P.; Melik-Alaverdian, V. Phys. Rev. Lett. 2001, 87, 043401.
67. Umrigar, C. J.; Toulouse, J.; Filippi, C.; Sorella, S.; Hennig, R. G. Phys. Rev. Lett. 2007, 98, 110201.
68. Toulouse, J.; Umrigar, C. J. J. Chem. Phys. 2007, 126, 084102.
69. Ceperley, D. M. Top-ten reasons why no-one uses quantum Monte Carlo, Ceperley group Web site, 1996; since removed.
70. Pople, J. A.; Head-Gordon, M.; Fox, D. J.; Raghavachari, K.; Curtiss, L. A. J. Chem. Phys. 1989, 90, 5622.
71. Curtiss, L. A.; Jones, C.; Trucks, G. W.; Raghavachari, K.; Pople, J. A. J. Chem. Phys. 1990, 93, 2537.
72. Curtiss, L. A.; Raghavachari, K.; Redfern, P. C.; Pople, J. A. J. Chem. Phys. 1997, 106, 1063.
73. Ma, A.; Drummond, N. D.; Towler, M. D.; Needs, R. J. J. Chem. Phys. 2005, 122, 224322.
74. Nemec, N.; Towler, M. D.; Needs, R. J. J. Chem. Phys. 2010, 132, 034111.
75. Kalos, M. H.; Colletti, L.; Pederiva, F. J. Low Temp. Phys. 2005, 138, 747.
76. Anderson, J. B. J. Chem. Phys. 1975, 63, 1499; Ibid., 1976, 65, 4121.
77. Ceperley, D. M. J. Stat. Phys. 1991, 63, 1237.
78. Foulkes, W. M. C.; Hood, R. Q.; Needs, R. J. Phys. Rev. B 1999, 60, 4558.
79. Glauser, W.; Brown, W.; Lester, W.; Bressanini, D.; Hammond, B. J. Chem. Phys. 1992, 97, 9200.
80. Bressanini, B.; Reynolds, P. J. Phys. Rev. Lett. 2005, 95, 110201.
81. Bajdich, M.; Mitas, L.; Drobný, G.; Wagner, L. K. Phys. Rev. B 1999, 60, 4558.
82. Towler, M. D.; Allan, N. L.; Harrison, N. M.; Saunders, V. R.; Mackrodt, W. C.; Aprà, E. Phys. Rev. B 1994, 50, 5041.

83. Needs, R. J.; Towler, M. D. Int. J. Mod. Phys. B 2003, 17, 5425.
84. Liu, S. K.; Kalos, M. H.; Chester, G. V. Phys. Rev. A 1974, 10, 303.
85. Barnett, R. N.; Reynolds, P. J.; Lester, W. A., Jr. J. Comput. Phys. 1991, 96, 258.
86. Baroni, S.; Moroni, S. Phys. Rev. Lett. 1999, 82, 4745.
87. Drummond, N. D.; Radnai, Z.; Trail, J. R.; Towler, M. D.; Needs, R. J. Phys. Rev. B 2004, 69, 085116.
88. Drummond, N. D.; Needs, R. J. Phys. Rev. Lett. 2009, 102, 126402.
89. Lüchow, A.; Petz, R.; Scott, T. C. J. Chem. Phys. 2007, 126, 144110.
90. Reboredo, F. A.; Hood, R. Q.; Kent, P. R. C. Phys. Rev. B 2009, 79, 195117.
91. Drummond, N. D.; López Ríos, P.; Ma, A.; Trail, J. R.; Spink, G.; Towler, M. D.; Needs, R. J. J. Chem. Phys. 2006, 124, 224104.
92. Fahy, S. In Quantum Monte Carlo Methods in Physics and Chemistry, Nato Science Series C: Mathematical and Physical Sciences, Vol. 525, Nightingale, P., Umrigar, C. J., Eds., Kluwer Academic, Dordrecht, The Netherlands, 1999, p. 101.
93. Filippi, C.; Fahy, S. J. Chem. Phys. 2000, 112, 3523.
94. Grossman, J. C.; Mitas, L. Phys. Rev. Lett. 1995, 74, 1323.
95. Kutzelnigg, W.; Morgan, J. D., III. J. Phys. Chem. 1992, 96, 4484.
96. Prendergast, D.; Nolan, M.; Filippi, C.; Fahy, S.; Greer, J. C. J. Chem. Phys. 2001, 115, 1626.
97. Feynman, R. P. Phys. Rev. 1954, 94, 262.
98. Feynman, R. P.; Cohen, M. Phys. Rev. 1956, 102, 1189.
99. Kwon, Y.; Ceperley, D. M.; Martin, R. M. Phys. Rev. B 1993, 48, 12037.
100. Holzmann, M.; Ceperley, D. M.; Pierleoni, C.; Esler, K. Phys. Rev. E 2003, 68, 046707.
101. Kwon, Y.; Ceperley, D. M.; Martin, R. M. Phys. Rev. B 1998, 58, 6800.
102. Pierleoni, C.; Ceperley, D. M.; Holzmann, M. Phys. Rev. Lett. 2004, 93, 146402.
103. López Ríos, P.; Ma, A.; Drummond, N. D.; Towler, M. D.; Needs, R. J. Phys. Rev. E 2006, 74, 066701.
104. Segall, M. D.; Lindan, P. L. D.; Probert, M. J.; Pickard, C. J.; Hasnip, P. J.; Clark, S. J.; Payne, M. C. J. Phys. Condens. Matter 2002, 14, 2717.
105. Gonze, X.; Beuken, J.-M.; Caracas, R.; Detraux, F.; Fuchs, M.; Rignanese, G.-M.; Sindic, L.; Verstraete, M.; Zerah, G.; Jollet, F.; Torrent, M.; Roy, A.; Mikami, M.; Ghosez, Ph.; Raty, J.-Y.; Allan, D. C. Comput. Mater. Sci. 2002, 25, 478.
106. Baroni, S.; Dal Corso, A.; de Gironcoli, S.; Giannozzi, P. http://www.pwscf.org.
107. Hernandez, E.; Gillan, M. J.; Goringe, C. M. Phys. Rev. B 1997, 55, 13485.
108. Alfè, D.; Gillan, M. J. Phys. Rev. B 2004, 70, 161101.
109. Dovesi, R.; Saunders, V. R.; Roetti, C.; Orlando, R.; Zicovich-Wilson, C. M.; Pascale, F.; Civalleri, B.; Doll, K.; Harrison, N. M.; Bush, I. J.; D’Arco, Ph.; Llunell, M. CRYSTAL06 User’s Manual, University of Torino, Torino, Italy, 2006.
110. te Velde, G.; Bickelhaupt, F. M.; Baerends, E. J.; Guerra, C. F.; van Gisbergen, S. J. A.; Snijders, J. G.; Ziegler, T. J. Comput. Chem. 2001, 22, 931.
111. This practice has recently been outlawed in our department by new university antismoking legislation. My thanks to an anonymous referee for supplying me with this joke.

112. Umrigar, C. J. Phys. Rev. Lett. 1993, 71, 408.
113. Stedman, M. L.; Foulkes, W. M. C.; Nekovee, M. J. Chem. Phys. 1998, 109, 2630.
114. Acioli, P. H.; Ceperley, D. M. J. Chem. Phys. 1994, 100, 8169.
115. Fahy, S.; Wang, X. W.; Louie, S. G. Phys. Rev. B 1990, 42, 3503.
116. Fahy, S.; Wang, X. W.; Louie, S. G. Phys. Rev. Lett. 1998, 61, 1631.
117. Mitas, L.; Shirley, E. L.; Ceperley, D. M. J. Chem. Phys. 1991, 95, 3467.
118. Casula, M.; Filippi, C.; Sorella, S. Phys. Rev. Lett. 2005, 95, 100201.
119. Casula, M. Phys. Rev. B 2006, 74, 161102.
120. Greeff, C. W.; Lester, W. A., Jr. J. Chem. Phys. 1998, 109, 1607.
121. Shirley, E. L.; Martin, R. M. Phys. Rev. B 1993, 47, 15413.
122. Trail, J. R.; Needs, R. J. J. Chem. Phys. 2005, 122, 174109.
123. Trail, J. R.; Needs, R. J. J. Chem. Phys. 2005, 122, 014112.
124. http://www.tcm.phy.cam.ac.uk/~mdt26/casino2_pseudopotentials.html.
125. Burkatzki, M.; Filippi, C.; Dolg, M. J. Chem. Phys. 2007, 126, 234105; ibid., 2008, 129, 164115.
126. Trail, J. R.; Needs, R. J. J. Chem. Phys. 2008, 128, 204103.
127. Santra, B.; Michaelides, A.; Fuchs, M.; Tkatchenko, A.; Filippi, C.; Scheffler, M. J. Chem. Phys. 2008, 129, 194111.
128. Rajagopal, G.; Needs, R. J.; James, A. J.; Kenny, S. D.; Foulkes, W. M. C. Phys. Rev. B 1995, 51, 10591.
129. Rajagopal, G.; Needs, R. J.; Kenny, S. D.; Foulkes, W. M. C.; James, A. J. Phys. Rev. Lett. 1994, 73, 1959.
130. Lin, C.; Zong, F. H.; Ceperley, D. M. Phys. Rev. E 2001, 64, 016702.
131. Ewald, P. P. Ann. Phys. 1921, 64, 25.
132. Chiesa, S.; Ceperley, D. M.; Martin, R. M.; Holzmann, M. Phys. Rev. Lett. 2006, 97, 076404.
133. Drummond, N. D.; Needs, R. J.; Sorouri, A.; Foulkes, W. M. C. Phys. Rev. B 2008, 78, 125106.
134. Fraser, L. M.; Foulkes, W. M. C.; Rajagopal, G.; Needs, R. J.; Kenny, S. D.; Williamson, A. J. Phys. Rev. B 1996, 53, 1814.
135. Williamson, A. J.; Rajagopal, G.; Needs, R. J.; Fraser, L. M.; Foulkes, W. M. C.; Wang, Y.; Chou, M.-Y. Phys. Rev. B 1997, 55, R4851.
136. Kent, P. R. C.; Hood, R. Q.; Williamson, A. J.; Needs, R. J.; Foulkes, W. M. C.; Rajagopal, G. Phys. Rev. B 1999, 59, 1917.
137. Kwee, H.; Zhang, S.; Krakauer, H. Phys. Rev. Lett. 2008, 100, 126404.
138. Dewing, M.; Ceperley, D. M. Methods for coupled electronic–ionic Monte Carlo. In Recent Advances in Quantum Monte Carlo Methods, Part II, Lester, W. A., Rothstein, S. M., and Tanaka, S., Eds., World Scientific, Singapore, 2002.
139. Grossman, J. C.; Mitas, L. Phys. Rev. Lett. 2005, 94, 056403.
140. Huang, K. C.; Needs, R. J.; Rajagopal, G. J. Chem. Phys. 2000, 112, 4419.
141. Schautz, F.; Flad, H.-J. J. Chem. Phys. 2000, 112, 4421.
142. Badinski, A.; Haynes, P. D.; Needs, R. J. Phys. Rev. B 2008, 77, 085111.
143. Reynolds, P. J.; Barnett, R. N.; Hammond, B. L.; Grimes, R. M.; Lester, W. A., Jr. Int. J. Quantum Chem. 1986, 29, 589.

144. Assaraf, R.; Caffarel, M. Phys. Rev. Lett. 1999, 83, 4682.
145. Casalegno, M.; Mella, M.; Rappe, A. M. J. Chem. Phys. 2003, 118, 7193.
146. Assaraf, R.; Caffarel, M. J. Chem. Phys. 2003, 119, 10536.
147. Lee, M. W.; Mella, M.; Rappe, A. M. J. Chem. Phys. 2005, 122, 244103.
148. Badinski, A.; Needs, R. J. Phys. Rev. E 2007, 76, 036707.
149. Badinski, A.; Needs, R. J. Phys. Rev. B 2008, 78, 035134.
150. Badinski, A.; Trail, J. R.; Needs, R. J. J. Chem. Phys. 2008, 129, 224101.
151. Badinski, A.; Haynes, P. D.; Trail, J. R.; Needs, R. J. J. Phys. Condens. Matter 2010, 22, 074202.
152. Farid, B.; Needs, R. J. Phys. Rev. B 1992, 45, 1067.
153. Malatesta, A.; Fahy, S.; Bachelet, G. B. Phys. Rev. B 1997, 56, 12201.
154. Knittle, E.; Wentzcovitch, R.; Jeanloz, R.; Cohen, M. L. Nature 1989, 337, 349.
155. McSkimin, H. J.; Andreatch, P. J. Appl. Phys. 1972, 43, 2944.
156. Occelli, F.; Loubeyre, P.; LeToullec, R. Nature Mater. 2003, 2, 151.
157. Esler, K. P.; Cohen, R. E.; Militzer, B.; Kim, J.; Needs, R. J.; Towler, M. D. Phys. Rev. Lett. 2010, 104, 185702.
158. Sola, E.; Brodholt, J. P.; Alfè, D. Phys. Rev. B 2009, 79, 024107.
159. Dewaele, A.; Loubeyre, P.; Occelli, F.; Mezouar, M.; Dorogokupets, P. I.; Torrent, M. Phys. Rev. Lett. 2006, 97, 215504.
160. Söderlind, P.; Moriarty, J. A.; Wills, J. M. Phys. Rev. B 1996, 53, 14063.
161. Mao, K.; Wu, Y.; Chen, L. C.; Shu, J. F. J. Geophys. Res. 1990, 95, 21737.
162. Li, X.-P.; Ceperley, D. M.; Martin, R. M. Phys. Rev. B 1991, 44, 10929.
163. Towler, M. D.; Russell, N. J.; Valentini, A. arXiv 2011, 1103.1589v1 [quant-ph].

5

Coupled-Cluster Calculations for Large Molecular and Extended Systems

KAROL KOWALSKI
William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

JEFF R. HAMMOND
The University of Chicago, Chicago, Illinois

WIBE A. de JONG, PENG-DONG FAN, MARAT VALIEV, DUNYOU WANG, and NIRANJAN GOVIND
William R. Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington

The ever-increasing power of modern computer systems is advancing many areas of computational chemistry and allowing one to study significantly larger systems with extremely accurate quantum chemistry methods. This has been made possible, in part, by the development of highly scalable implementations of core quantum chemistry methodologies. In particular, there has been significant progress in the parallel implementations of coupled-cluster (CC) methods, which have become methods of choice for studying complex chemical processes that require accurate treatment of the electron correlation. In this chapter we outline the various CC formalisms available in NWChem and discuss the parallel implementation of these methods in our code. Performance issues, system-size limitations, and the accuracies that can be achieved with these calculations are also discussed. Representative examples from two key domains of CC theory (excited-state formalism and linear response studies) are reviewed and the possibilities of coupling CC methods with different multiscale approaches are highlighted.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


5.1 INTRODUCTION

Many aspects of computational chemistry require accuracies that can be achieved only by highly accurate computational methods that account appropriately for the instantaneous interactions or correlations between electrons in molecules.1 Including these electronic correlation effects is necessary to be able to compare theory and experiment in a precise manner. Even though these correlation effects contribute less than 1% of the total energy, they are fundamental to an understanding of the electronic structure of various systems and in the development of predictive models. For this reason these methods have become an integral part of many computational chemistry packages. Among the many methods that describe correlation effects systematically, the coupled-cluster (CC) formalism2,3 has evolved into a widely used and very accurate method for solving the electronic Schrödinger equation. Compared with other formalisms, such as perturbative methods or approaches based on the linear expansion of the wavefunction (e.g., configuration interaction methods), the main advantage of CC methods lies in the fact that the correlation effects are elegantly captured in the exponential form of the wavefunction. A simple consequence of this ansatz is the size extensivity of the resulting energies or, equivalently, proper scaling of the energy with the number of electrons. Although the CC method was initially proposed in nuclear physics,4,5 it was quickly adopted by quantum chemists, and since the late 1960s there has been steady development that has spawned a variety of CC methodologies. In the last decade this formalism has been “rediscovered” by the nuclear physics community.6–8 This clearly demonstrates the universal applicability of the method across a wide energy scale. Despite these successes, the inherent numerical cost of CC methods, which grows rapidly with system size, significantly hampers the wide applicability of this formalism.
This difficulty may be overcome through the use of massively parallel computer systems and highly scalable CC implementations. The parallel implementations available in quantum chemistry programs such as ACES II MAB,9 ACES III,10,11 PQS,12–15 MOLPRO,16 GAMESS(US),17–19 and NWChem20–24 are excellent examples of recent developments. In this chapter we demonstrate the capabilities and review the parallel CC implementation in NWChem. We refer the reader to the other papers listed above for discussions of other implementations. The rest of this chapter is organized as follows. An overview of CC theory for ground or excited states and CC linear response theory is given in Section 5.2. The details of our parallel CC implementation are described in Section 5.3. In Section 5.4 we present various ground- and excited-state examples and studies involving coupling CC methodologies with multiphysics approaches.

5.2 THEORY

The details of the CC formalism have been discussed in many review articles.1,25–27 For the purpose of this chapter we present only the most important approaches within the single reference formulation, where the CC ground-state wavefunction $|\Psi_0\rangle$ is represented in the form of the exponential Ansatz,

$$|\Psi_0\rangle = e^{T}\,|\Phi\rangle \tag{5.1}$$

where the reference function $|\Phi\rangle$ is usually chosen as a Hartree–Fock (HF) determinant and the cluster operator $T$ is represented as

$$T = \sum_{i=1}^{N} T_i \tag{5.2}$$

where $N$ refers to the total number of correlated electrons. Each component $T_n$ takes the form

$$T_n = \sum_{\substack{i_1<\cdots<i_n \\ a_1<\cdots<a_n}} t^{\,i_1\cdots i_n}_{a_1\cdots a_n}\, X^{+}_{a_1}\cdots X^{+}_{a_n}\, X_{i_n}\cdots X_{i_1} \tag{5.3}$$

One can find that the GEBF-HF energy differs from the conventional HF energy by less than 1 mHa (see Table 7.A2). It should be mentioned that other properties can be calculated similarly as a linear combination of corresponding properties of all subsystems.
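The linear-combination rule mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the coefficients, and the subsystem values are hypothetical, and any further correction terms of the full GEBF expressions are not shown.

```python
def combine_subsystem_properties(coefficients, values):
    """Return sum_m C_m * P_m for subsystem properties P_m.

    Both arguments are hypothetical inputs: one coefficient and one
    property value per subsystem, in the same order.
    """
    if len(coefficients) != len(values):
        raise ValueError("need one coefficient per subsystem value")
    return sum(c * v for c, v in zip(coefficients, values))


# Hypothetical example: two overlapping subsystems (coefficient +1 each)
# and their shared overlap region (coefficient -1), so that the overlap
# is not counted twice.
coeffs = [1, 1, -1]
subsystem_energies = [-10.0, -12.0, -5.0]  # made-up numbers, arbitrary units
total = combine_subsystem_properties(coeffs, subsystem_energies)
print(total)
```

The signed coefficients play the role of an inclusion-exclusion count: regions shared by several subsystems enter with negative weights so that each part of the molecule contributes once.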

TABLE 7.A3 NPA Charges of All Atoms Used in the GEBF Approach

| Atom | Element | Charge | Atom | Element | Charge | Atom | Element | Charge |
|---|---|---|---|---|---|---|---|---|
| 1 | C | −0.478580 | 34 | H | 0.152910 | 67 | H | 0.151950 |
| 2 | C | −0.306510 | 35 | H | 0.151950 | 68 | H | 0.151950 |
| 3 | C | −0.312180 | 36 | H | 0.151950 | 69 | H | 0.151950 |
| 4 | C | −0.307940 | 37 | H | 0.165180 | 70 | H | 0.151950 |
| 5 | C | −0.306800 | 38 | H | 0.159400 | 71 | H | 0.151950 |
| 6 | C | −0.306840 | 39 | H | 0.159400 | 72 | H | 0.151950 |
| 7 | C | −0.306660 | 40 | H | 0.154180 | 73 | H | 0.151950 |
| 8 | C | −0.306660 | 41 | H | 0.154180 | 74 | H | 0.151950 |
| 9 | C | −0.306660 | 42 | H | 0.152640 | 75 | H | 0.151950 |
| 10 | C | −0.306660 | 43 | H | 0.152640 | 76 | H | 0.151940 |
| 11 | C | −0.306670 | 44 | H | 0.152960 | 77 | H | 0.151940 |
| 12 | C | −0.306670 | 45 | H | 0.152960 | 78 | H | 0.151940 |
| 13 | C | −0.306670 | 46 | H | 0.152910 | 79 | H | 0.151940 |
| 14 | C | −0.306670 | 47 | H | 0.152910 | 80 | H | 0.151920 |
| 15 | C | −0.306670 | 48 | H | 0.152550 | 81 | H | 0.151920 |
| 16 | C | −0.306670 | 49 | H | 0.152550 | 82 | H | 0.151920 |
| 17 | C | −0.306670 | 50 | H | 0.151880 | 83 | H | 0.151920 |
| 18 | C | −0.306670 | 51 | H | 0.151880 | 84 | H | 0.151910 |
| 19 | C | −0.306670 | 52 | H | 0.151910 | 85 | H | 0.151910 |
| 20 | C | −0.306670 | 53 | H | 0.151910 | 86 | H | 0.151880 |
| 21 | C | −0.306670 | 54 | H | 0.151920 | 87 | H | 0.151880 |
| 22 | C | −0.306670 | 55 | H | 0.151920 | 88 | H | 0.152550 |
| 23 | C | −0.306660 | 56 | H | 0.151920 | 89 | H | 0.152550 |
| 24 | C | −0.306660 | 57 | H | 0.151920 | 90 | H | 0.152910 |
| 25 | C | −0.306660 | 58 | H | 0.151940 | 91 | H | 0.152960 |
| 26 | C | −0.306660 | 59 | H | 0.151940 | 92 | H | 0.152960 |
| 27 | C | −0.306840 | 60 | H | 0.151940 | 93 | H | 0.152640 |
| 28 | C | −0.306800 | 61 | H | 0.151940 | 94 | H | 0.152640 |
| 29 | C | −0.307940 | 62 | H | 0.151950 | 95 | H | 0.154180 |
| 30 | C | −0.312180 | 63 | H | 0.151950 | 96 | H | 0.154180 |
| 31 | C | −0.306510 | 64 | H | 0.151950 | 97 | H | 0.165180 |
| 32 | C | −0.478580 | 65 | H | 0.151950 | 98 | H | 0.159400 |
| 33 | H | 0.159400 | 66 | H | 0.151950 | | | |


8

MNDO-like Semiempirical Molecular Orbital Theory and Its Application to Large Systems

TIMOTHY CLARK Computer-Chemie-Centrum, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

JAMES J. P. STEWART Stewart Computational Chemistry, Colorado Springs, Colorado

In this chapter we describe modern MNDO-like semiempirical theory and its application either to very large molecules or to a very large number of smaller ones. We use the term MNDO-like to describe methods that use variations of the original MNDO1 and MNDO/d2–6 techniques. This covers essentially all commonly used techniques, which all use the original multipole formulation for the two-electron integrals, and many of the original MNDO approximations. We first outline the theory of LCAO-SCF methods in general, followed by a more detailed discussion of the neglect of diatomic differential overlap (NDDO) approximation and the MNDO technique. We discuss individual Hamiltonians and their parameterization and describe the strengths of these remarkably powerful methods and their application to large systems.

8.1 BASIC THEORY

8.1.1 LCAO-SCF Theory

The two approximations, linear combination of atomic orbitals (LCAO) and self-consistent field (SCF), form the core of modern (MNDO-like) semiempirical molecular orbital theory. They have been described in many standard textbooks but are important for understanding MNDO-like techniques and so are outlined briefly here. We can write the Hamiltonian (in atomic units) for a molecule that consists of M nuclei and N electrons as

$$H = -\sum_{i=1}^{N}\frac{1}{2}\nabla_i^2 \;-\; \sum_{A=1}^{M}\frac{1}{2M_A}\nabla_A^2 \;-\; \sum_{i=1}^{N}\sum_{A=1}^{M}\frac{Z_A}{R_{Ai}} \;+\; \sum_{i=1}^{N}\sum_{j>i}^{N}\frac{1}{r_{ij}} \;+\; \sum_{A=1}^{M}\sum_{B>A}^{M}\frac{Z_A Z_B}{R_{AB}} \tag{8.1}$$

where the indices i and j run over the electrons and A and B over the nuclei. The individual terms that make up the Hamiltonian are defined in Table 8.1.

TABLE 8.1 Definitions of the Individual Terms in Eq. (8.1)

| Term | Definition | Variables |
|---|---|---|
| $-\sum_{i=1}^{N}\frac{1}{2}\nabla_i^2$ | Kinetic energy of the electrons | $\nabla_i^2$ is the Laplacian (second derivative) with respect to the coordinates of electron i |
| $-\sum_{A=1}^{M}\frac{1}{2M_A}\nabla_A^2$ | Kinetic energy of the nuclei (zero within the Born–Oppenheimer approximation) | $\nabla_A^2$ is the Laplacian with respect to the coordinates of nucleus A, and $M_A$ is the mass of nucleus A |
| $-\sum_{i=1}^{N}\sum_{A=1}^{M}\frac{Z_A}{R_{Ai}}$ | Nucleus–electron attraction | $Z_A$ is the nuclear charge of atom A and $R_{Ai}$ is the distance between atom A and electron i |
| $\sum_{i=1}^{N}\sum_{j>i}^{N}\frac{1}{r_{ij}}$ | Electron–electron repulsion | $r_{ij}$ is the distance between electrons i and j |
| $\sum_{A=1}^{M}\sum_{B>A}^{M}\frac{Z_A Z_B}{R_{AB}}$ | Nucleus–nucleus repulsion (constant within the Born–Oppenheimer approximation) | $R_{AB}$ is the distance between atoms A and B |

We make use of the Born–Oppenheimer approximation,7 which in turn uses the fact that the nuclei move so much more slowly than the electrons that the former can, in effect, be regarded as being stationary. This reduces the kinetic energy of the nuclei to zero and makes the nucleus–nucleus repulsion term a constant, so that they can be neglected in the electronic Hamiltonian:

$$H = H_{\text{nuclear}} + H_{\text{electronic}} = H_{\text{nuclear}} \;-\; \sum_{i=1}^{N}\frac{1}{2}\nabla_i^2 \;-\; \sum_{i=1}^{N}\sum_{A=1}^{M}\frac{Z_A}{R_{Ai}} \;+\; \sum_{i=1}^{N}\sum_{j>i}^{N}\frac{1}{r_{ij}} \tag{8.2}$$

where the total Hamiltonian H has now been separated into nuclear and electronic components. This allows us to write the total energy as the sum of the nuclear repulsion energy and the electronic energy defined by the Hamiltonian $H_{\text{electronic}}$:

$$E_{\text{total}} = E_{\text{electronic}} + \sum_{A=1}^{M}\sum_{B>A}^{M}\frac{Z_A Z_B}{R_{AB}} \tag{8.3}$$
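The nuclear repulsion sum in Eq. (8.3) is simple enough to evaluate directly. The following is a minimal sketch, assuming point charges, atomic units (bohr for distances, hartree for energies), and a hypothetical function name; it is not taken from any particular program.

```python
import math


def nuclear_repulsion(charges, coords):
    """Sum Z_A * Z_B / R_AB over all unique pairs A < B.

    charges: nuclear charges Z_A
    coords:  (x, y, z) positions in bohr, same order as charges
    """
    energy = 0.0
    for a in range(len(charges)):
        for b in range(a + 1, len(charges)):  # B > A: count each pair once
            r_ab = math.dist(coords[a], coords[b])
            energy += charges[a] * charges[b] / r_ab
    return energy


# Two protons separated by 1.4 bohr: the sum has a single term, 1/1.4
print(nuclear_repulsion([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]))
```

Because this term depends only on the fixed nuclear positions, it is computed once per geometry and added to the electronic energy afterward.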

Thus, we “only” need to calculate the electronic energy, which according to the Schrödinger equation8 is obtained from the electronic wavefunction. The electronic wavefunction $\Psi_{\text{electronic}}$ in turn is a function of the positions and spins of the N electrons of the system:

$$\Psi_{\text{electronic}} = \Psi(x_1, x_2, x_3, \ldots, x_N) \quad \text{where } x_i = \{r_i, \omega_i\} \tag{8.4}$$

Here $r_i$ denotes the (vector) position of electron i and $\omega_i$ its spin. Thus, the wavefunction is a function of 4N variables (the three coordinates and the spin per electron). To cut a long story short, we can solve Schrödinger's equation exactly only for systems with one electron, so we are forced to introduce approximations. The first of these is the SCF (also known as mean-field or Hartree–Fock) approximation.9,10 Basically, rather than solving the Schrödinger equation for many particles, we approximate the many-particle solution in terms of many one-electron wavefunctions, which are solvable. This means that we make the approximation that

$$H_{\text{electronic}} \approx \sum_{i=1}^{N} h_i \tag{8.5}$$

where $h_i$ is the one-electron Hamiltonian for electron i. This leads to the Hartree product, $\Psi_{HP}$, which is an approximation for a many-electron wavefunction, $\Psi_{\text{electronic}}$:

$$\Psi_{HP}(x_1, x_2, \ldots, x_N) = \chi_1(x_1)\,\chi_2(x_2)\cdots\chi_N(x_N) \tag{8.6}$$

In Eq. (8.6), $\chi_i$ are the spin orbitals, which are one-electron wavefunctions. The Schrödinger equation based on the Hartree approximation becomes

$$H\,\Psi_{HP} = E\,\Psi_{HP} \tag{8.7}$$

so that the eigenvalues $\varepsilon_i$ of the one-electron wavefunctions $\chi_i$ can be summed to give the electronic energy:

$$E_{\text{electronic}} = \sum_{i=1}^{N} \varepsilon_i \tag{8.8}$$
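Equations (8.5) to (8.8) can be checked numerically for a toy case. The sketch below assumes non-interacting particles in a one-dimensional box of unit length (atomic units, spin ignored), so that each one-electron Hamiltonian is $h_i = -\tfrac{1}{2}\,d^2/dx_i^2$; the model and all function names are illustrative assumptions, not part of any MNDO-like method. It applies the summed Hamiltonian to a Hartree product by finite differences and compares the result with the sum of the orbital energies.

```python
import math

BOX = 1.0  # box length in atomic units (assumed toy model)


def orbital(n, x):
    """Particle-in-a-box orbital with quantum number n."""
    return math.sqrt(2.0 / BOX) * math.sin(n * math.pi * x / BOX)


def orbital_energy(n):
    """Eigenvalue of h = -1/2 d^2/dx^2 for orbital n."""
    return 0.5 * (n * math.pi / BOX) ** 2


def hartree_product(ns, xs):
    """Product of one-particle orbitals, as in Eq. (8.6)."""
    value = 1.0
    for n, x in zip(ns, xs):
        value *= orbital(n, x)
    return value


def local_energy(ns, xs, step=1e-4):
    """(H Psi)/Psi for H = sum_i h_i, using central differences."""
    psi = hartree_product(ns, xs)
    h_psi = 0.0
    for i in range(len(xs)):
        up = list(xs)
        down = list(xs)
        up[i] += step
        down[i] -= step
        second = (hartree_product(ns, up) - 2.0 * psi
                  + hartree_product(ns, down)) / step ** 2
        h_psi += -0.5 * second
    return h_psi / psi


occupied = (1, 2)       # orbitals occupied by particles 1 and 2
positions = (0.3, 0.6)  # arbitrary sample point away from the nodes
e_sum = sum(orbital_energy(n) for n in occupied)
e_local = local_energy(occupied, positions)
print(e_sum, e_local)   # the two values agree to finite-difference accuracy
```

The agreement holds only because the model Hamiltonian contains no electron–electron term; it illustrates why the separable form of Eq. (8.5) makes the problem tractable.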

This would all be fine except for one significant complication. Because electrons are fermions (i.e., they have half-integer spin), they must obey the Pauli exclusion principle,11 which can be formulated as the antisymmetry principle: the wavefunction must be antisymmetric with respect to the exchange of any two electrons. Fock's contribution was to point out that the Hartree product does not obey the antisymmetry principle. Slater12 later pointed out that the wavefunction suggested by Fock can be expressed as a determinant, now known as a Slater determinant, $\Psi_{\text{Slater}}$:

$$\Psi_{\text{Slater}} = \frac{1}{\sqrt{N!}} \begin{vmatrix} \chi_1(x_1) & \chi_2(x_1) & \cdots & \chi_N(x_1) \\ \chi_1(x_2) & \chi_2(x_2) & \cdots & \chi_N(x_2) \\ \vdots & \vdots & & \vdots \\ \chi_1(x_N) & \chi_2(x_N) & \cdots & \chi_N(x_N) \end{vmatrix} \tag{8.9}$$

The prefactor $1/\sqrt{N!}$ is simply a normalization constant. This is the Hartree–Fock (or SCF) wavefunction, but the question remains as to how we define the spin orbitals $\chi_i$. This is where the almost universal LCAO approximation, introduced by Erich Hückel,13 comes into play. Hückel's idea was that molecular orbitals (in our case the $\chi_i$ introduced above) can be represented as a linear combination of atomic orbitals appropriate for the constituent atoms. For a system constituted of $N_{\text{AOs}}$ atomic orbitals (AOs),

$$\chi_i = \sum_{j=1}^{N_{\text{AOs}}} c_j^i\,\varphi_j \tag{8.10}$$

where $c_j^i$ is the coefficient of atomic orbital $\varphi_j$ in molecular orbital $\chi_i$, and the coefficients are normalized so that $\sum_{j=1}^{N_{\text{AOs}}} (c_j^i)^2 = 1$.

We still cannot solve for the wavefunction directly, even using the SCF and LCAO approximations. This is where the variational principle, which says that no trial wavefunction can have a lower energy than the correct wavefunction, comes into play. Solutions are generally found by starting with a set of guessed molecular orbitals $\chi_i$ and iterating until the energy converges to its minimum value and the electron density does not vary. We discuss this algorithm in more detail below.

8.1.2 Implications of LCAO-SCF Theory

LCAO-SCF theory is remarkably successful but has two limitations that we need to discuss in order to understand MNDO-like theories better. The first is a consequence of the SCF approximation and is known as electron correlation. Physically, the introduction of the Hartree product [Eq. (8.6)] means that the electrons do not feel each other individually. Instead, each electron feels the electron density (but not the instantaneous positions) of the others. This means that the individual electrons are not given the opportunity to avoid each other instantaneously, which they would obviously do because they are negatively charged. Thus, the SCF approximation means that the electron–electron repulsion is overestimated. This effect, which is purely a consequence of the SCF approximation, is known as dynamic correlation.14 A second type of correlation (nondynamic or static correlation) has also been defined. It is a consequence of using only a single Slater determinant to describe the wavefunction. Although most “normal” molecules can be described very well using a single Slater determinant, some (such as diradicals) cannot. This is essentially because the wavefunction cannot be described adequately by a single scheme in which a single set of molecular orbitals is occupied by zero, one, or two electrons. This second type of correlation is very different from the first and not as easily treated. However, the implicit treatment of dynamic correlation in MNDO-like theories is poorly appreciated and will be discussed below.

The second implication of the LCAO-SCF approximations concerns the limitations placed on the wavefunction by the atomic orbitals used to form the MOs. Although the LCAO approximation is very intuitive and actually forms the basis of our qualitative understanding of bonding effects,15 it nevertheless has no physical basis. It is very convenient for calculations, but we can also describe MOs as combinations of non-atom-centered functions or simply as numerical grids. The LCAO approach, however, does bring some limitations. We can only describe wavefunctions that are linear combinations of the atomic orbitals [which are usually called the basis set in ab initio and density functional theory (DFT) calculations]. Current MNDO-like semiempirical techniques use single-valence basis sets. This means that each atomic orbital in the valence shell is represented by only one basis function. This, in turn, means that the size of the orbital is fixed, although in reality some valence orbitals are more or less diffuse than others. This is a serious limitation in ab initio and DFT calculations, but appears to be less serious in MNDO-like techniques. The one possible exception is hydrogen, for which a single valence 1s orbital is not ideal in some bonding situations.16

8.1.3 Neglect of Diatomic Differential Overlap

The NDDO approximation is perhaps the key simplification made in MNDO-like semiempirical MO theories. Interestingly, although some adverse effects of other approximations have been identified (see below), the NDDO approximation appears to be extremely robust and does not lead to identifiable systematic errors. In full (ab initio) Hartree–Fock theory, calculating the electron–electron repulsion requires that all integrals of the type $(\mu\nu|\lambda\sigma)$ (i.e., all integrals in which the indices μ, ν, λ, and σ vary from 1 to $N_{\text{AOs}}$, the number of atomic orbitals) be calculated. This means that a very large number of integrals (formally $N_{\text{AOs}}^4/4$, if we ignore symmetry) must be calculated and processed in every iteration of the SCF procedure. The NDDO approximation sets all integrals $(\mu\nu|\lambda\sigma)$ to zero in which either atomic orbitals μ and ν or λ and σ are on different atoms. The combinations μν and λσ are known as charge distributions, so that the NDDO approximation can also be expressed as meaning that we only consider integrals between charge distributions μν and λσ situated on single, but not necessarily the same, atoms. Thus, the NDDO approximation reduces the problem of calculating and using the two-electron integrals (i.e., those needed for calculating the electron–electron repulsion) from one of four centers to one of only two; we calculate only one- and two-center two-electron integrals and ignore three- and four-center two-electron integrals.

Having reduced the number of integrals to be calculated, we need an efficient technique to calculate them. Ab initio and DFT calculations often use basis sets based on Gaussian functions because these are particularly suitable for calculating the integrals. Gaussian orbitals have the form

$$\varphi_{lm}(r) = Y_{lm}\, e^{-\zeta r^2} \tag{8.11}$$

where $Y_{lm}$ is the angular part (a spherical harmonic function) of the orbital with angular momentum quantum number l and magnetic quantum number m. The expression $e^{-\zeta r^2}$ describes the radial behavior of the wavefunction, where ζ is the exponent that governs how fast the wavefunction falls off with increasing distance r from the nucleus. Despite their almost universal use as atom-centered basis sets in ab initio and DFT techniques, Gaussian functions are far from ideal. Because the distance from the nucleus is squared in the exponent, a Gaussian falls off far faster than it should and also does not describe the wavefunction at the nucleus correctly. A far better choice would be Slater orbitals, which have the form

$$\varphi_{lm}(r) = Y_{lm}\, e^{-\zeta |r|} \tag{8.12}$$
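The two shortcomings of Gaussians named above, the behavior at the nucleus and the decay at long range, can be seen by comparing the radial factors of Eqs. (8.11) and (8.12) numerically. This is a minimal sketch with hypothetical function names; using the same exponent ζ = 1 for both functions is an assumption made purely for comparison, since in practice the exponents are chosen separately.

```python
import math


def gaussian_radial(r, zeta):
    """Radial factor of Eq. (8.11): exp(-zeta * r^2)."""
    return math.exp(-zeta * r * r)


def slater_radial(r, zeta):
    """Radial factor of Eq. (8.12): exp(-zeta * |r|)."""
    return math.exp(-zeta * abs(r))


def forward_slope(func, r, zeta, step=1e-6):
    """One-sided finite-difference slope of a radial function at r."""
    return (func(r + step, zeta) - func(r, zeta)) / step


zeta = 1.0  # same exponent for both, for illustration only

# At the nucleus the Slater function has a finite negative slope (a cusp),
# whereas the Gaussian is flat.
print(forward_slope(slater_radial, 0.0, zeta))
print(forward_slope(gaussian_radial, 0.0, zeta))

# Far from the nucleus the Gaussian has decayed much further.
print(slater_radial(5.0, zeta), gaussian_radial(5.0, zeta))
```

The slope of the Slater function at r = 0 is close to −ζ, while that of the Gaussian is close to zero; at r = 5 the Gaussian value is smaller by many orders of magnitude.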

However, the two-electron integrals are very expensive to calculate for Slater orbitals, so that they are not used as often as Gaussians, despite their inherent advantages. MNDO-like techniques use Slater-type orbitals, but must therefore resort to a fast, approximate method for calculating the two-electron integrals. This is the multipole approach introduced with MNDO1 and extended to d-orbitals for MNDO/d.2 In this approximation, the interactions between Slater orbitals are approximated as interactions between electrostatic monopoles, dipoles, and quadrupoles, which allows the integrals to be calculated very efficiently and with reasonable accuracy. The multipole model has been used to calculate the molecular electrostatic potential for MNDO-like wavefunctions, and the definitions for all the multipoles for the 45 charge distributions that arise with an s-, p-, d-basis set have been listed.17

An important approximation in standard MNDO-like theories is that the basis set (the atomic orbitals) is assumed to be orthogonal (i.e., the orbitals have zero overlap with each other). This saves an initial orthogonalization step in the SCF calculation, which would slow semiempirical calculations considerably. Jorgensen et al.18 reintroduced this orthogonalization into MNDO and found that the resulting method (NO-MNDO) performed as well as later, more highly parameterized, methods and gave improvements in two problem areas: the rotational barriers about C–C single bonds and the relative stabilities of branched and unbranched hydrocarbons. NO-MNDO requires about twice the CPU time needed for a standard MNDO calculation. A better known solution to the orthogonalization problem is to add an orthogonalization correction that mimics the effects of the orthogonalization step at less cost in CPU time. This is the basis of the OMn (n = 1 to 3) methods introduced by Thiel and co-workers.19–22 These methods are probably the most sophisticated MNDO-like techniques available.

One of the most difficult areas in MNDO-like theories is the treatment of the nucleus–nucleus repulsion. What appears initially in Eq. (8.1) and Table 8.1 to be a very simple Coulomb repulsion is, in fact, a fairly complex entity in MNDO-like theories. The problem arises from the fact that the Coulomb interactions in MNDO-like theories are not all treated equally well. Whereas we treat the nucleus–nucleus repulsion exactly in Eq. (8.1), introducing the NDDO approximation leads to some neglect of Coulomb terms involving the electrons. Specifically, the long-range behavior of the electron–electron and nucleus–electron integrals is not correct, so that the simple, physically correct nucleus–nucleus repulsion term in Eq. (8.1) would lead to a net repulsion between neutral atoms or molecules at distances outside their van der Waals radii. Thus, an artificial screening effect must be introduced. In MNDO, the nucleus–nucleus repulsion term $E_{AB}$ becomes

$$E_{AB}^{\text{MNDO}} = Z_A Z_B\,(s_A s_A | s_B s_B)\left(1 + e^{-\alpha_A R_{AB}} + e^{-\alpha_B R_{AB}}\right) \tag{8.13}$$

where the integral is treated in the same way as the electron–electron integrals and the two constants $\alpha_A$ and $\alpha_B$ are parameters specific to the elements A and B. However, MNDO is not able to reproduce hydrogen bonds, an effect that was,23 probably erroneously,16 attributed to the nucleus–nucleus repulsion being too strong. Therefore, this term was modified by the addition of up to four Gaussian terms in MNDO/H.23 These Gaussian terms were later adopted for other methods (see below), but lead to some artifacts. The corresponding expression for $E_{AB}$ becomes

$$E_{AB} = E_{AB}^{\text{MNDO}} + \frac{Z_A Z_B}{R_{AB}}\left(\sum_i a_{A,i}\, e^{-b_{A,i}(R_{AB}-c_{A,i})^2} + \sum_j a_{B,j}\, e^{-b_{B,j}(R_{AB}-c_{B,j})^2}\right) \tag{8.14}$$

where there are i Gaussian functions for atom A and j for atom B. The variables a, b, and c are parameterized for each element [A and B in Eq. (8.14)] and each individual Gaussian function [1 to i and 1 to j in Eq. (8.14)]. Use of these Gaussian functions is not without hazard because they can lead to spurious minima24 and is generally undesirable because the functions introduce a large number of additional parameters for each element. A solution that has been found more practical and yields very good results is to introduce two-center terms into the nucleus–nucleus repulsion, as suggested originally for AM1(d) by Voityuk and Rösch.25 The nucleus–nucleus repulsion term then becomes

$$E_{AB} = E_{AB}^{\text{MNDO}}\left(1 + \delta_{AB}\, e^{-\alpha_{AB} R_{AB}}\right) \tag{8.15}$$

where $\delta_{AB}$ and $\alpha_{AB}$ are parameters specific to the pair of elements AB. In addition, it is common to use distance-dependent expressions for metal–hydrogen nucleus–nucleus interactions. The problem with all these corrections is that they essentially represent fixes to a fundamental deficiency of current MNDO-like theories. In addition, they all represent modifications to a two-center potential and can adversely affect the parameterization of other such interactions because the effects of the two potentials are not independent of each other.

8.1.4 SCF Iterations and Pseudodiagonalization

Figure 8.1 is a standard flow diagram for a semiempirical MO SCF iteration algorithm. Given a set of Cartesian coordinates, the number of electrons, and the spin multiplicity, the program first assigns atomic orbitals (the basis set) to the atoms and calculates the one-electron matrix, which contains all the interactions except the electron–electron term. In order to proceed, an initial guess density matrix is required. In standard semiempirical MO programs, this initial guess consists of simply dividing the electrons evenly over the available atomic orbitals. More sophisticated initial guesses, such as extended Hückel MOs, could be envisaged but would involve an extra diagonalization. The two-electron contribution is then added to the one-electron matrix to give the Fock matrix. This two-electron contribution depends on the density matrix and the two-electron integrals, which are generally precalculated and stored in memory. The Fock matrix is then diagonalized to give a new set of MOs, from which a new density matrix can be generated. The total energy and the density matrix are then tested

Fig. 8.1 Standard semiempirical MO SCF flow diagram. [Flowchart boxes: calculate one-electron matrix; calculate two-electron integrals; calculate initial guess density matrix; assemble Fock matrix; diagonalize (MOs); calculate density matrix; convergence test.]


for convergence by comparison with the last cycle, and if they have not yet converged, another SCF cycle is started using the new density matrix. The energy improves from cycle to cycle and the density converges steadily until they are both static within predefined thresholds, after which the program exits the SCF cycles. In practice, additional features, such as interpolation schemes, damping, or level shifting, are often included to improve convergence, but Fig. 8.1 gives the basics of the algorithm. However, because the other steps of the calculation are so fast, the diagonalization of the Fock matrix typically takes up approximately 50% of the CPU time for an implementation such as that shown in Fig. 8.1. This is often not appreciated because the diagonalization is a relatively minor component of the calculation for ab initio or DFT calculations. Modern semiempirical programs therefore do not perform full diagonalizations in every SCF cycle but, rather, switch to pseudodiagonalization 26 as soon as the SCF converges far enough. This is shown in Fig. 8.2. The pseudodiagonalization procedure is key to the remaining discussion and therefore is described in detail. The principle of pseudodiagonalization is that the MO eigenvectors are updated but not their eigenvalues. However, as the differences between eigenvalues are needed for the pseudodiagonalization procedure, full diagonalizations must be performed until the eigenvalues have settled to more or less constant values. This is shown in Fig. 8.2. Full diagonalizations are performed until a given threshold (usually, convergence on the density matrix, although convergence of the eigenvalues would be more relevant), after which the pseudodiagonalization can be used until the SCF criteria are met. A final full diagonalization must be performed after convergence to obtain the final eigenvalues and eigenvectors. 
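The control flow described above (full diagonalizations until the density change drops below a switching threshold, pseudodiagonalization afterwards, and one final full diagonalization) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: a hypothetical two-orbital, two-electron closed-shell model in an orthogonal basis, made-up values for the one-electron matrix `H_CORE` and the one-center repulsion `U`, and a pseudodiagonalization step that uses the first-order estimate F_ov/(ε_o − ε_v) as the sine of a single occupied-virtual rotation. It is not the implementation used in any particular program.

```python
import math

# Hypothetical two-orbital, two-electron closed-shell model (orthogonal basis).
H_CORE = [[-1.0, -0.5], [-0.5, -0.5]]  # one-electron matrix, made-up values
U = 0.5                                # one-center repulsion, made-up value


def fock(p):
    """One-electron matrix plus a density-dependent one-center term."""
    return [[H_CORE[0][0] + 0.5 * U * p[0][0], H_CORE[0][1]],
            [H_CORE[1][0], H_CORE[1][1] + 0.5 * U * p[1][1]]]


def bracket(a, f, b):
    """Matrix element a^T F b."""
    return sum(a[i] * f[i][j] * b[j] for i in range(2) for j in range(2))


def energy(p):
    """Electronic energy, 0.5 * sum over P * (H + F)."""
    f = fock(p)
    return 0.5 * sum(p[i][j] * (H_CORE[i][j] + f[i][j])
                     for i in range(2) for j in range(2))


def full_diagonalize(f):
    """Exact eigenvectors and eigenvalues of a symmetric 2 x 2 matrix."""
    theta = 0.5 * math.atan2(2.0 * f[0][1], f[0][0] - f[1][1])
    v1 = [math.cos(theta), math.sin(theta)]
    v2 = [-math.sin(theta), math.cos(theta)]
    e1, e2 = bracket(v1, f, v1), bracket(v2, f, v2)
    return (v1, v2, e1, e2) if e1 <= e2 else (v2, v1, e2, e1)


def pseudo_diagonalize(f, c_occ, c_virt, e_occ, e_virt):
    """Update the vectors only; the eigenvalues stay frozen."""
    x = bracket(c_occ, f, c_virt) / (e_occ - e_virt)  # first-order amplitude
    c = math.sqrt(1.0 - x * x)
    new_occ = [c * c_occ[i] + x * c_virt[i] for i in range(2)]
    new_virt = [c * c_virt[i] - x * c_occ[i] for i in range(2)]
    return new_occ, new_virt


def scf(use_pseudo=True, switch=1e-1, tol=1e-8, max_cycles=200):
    p = [[1.0, 0.0], [0.0, 1.0]]  # initial guess: electrons spread evenly
    c_occ, c_virt, e_occ, e_virt = None, None, 0.0, 0.0
    change = float("inf")         # forces full diagonalization at the start
    for cycle in range(1, max_cycles + 1):
        f = fock(p)
        if use_pseudo and change < switch:
            c_occ, c_virt = pseudo_diagonalize(f, c_occ, c_virt, e_occ, e_virt)
        else:
            c_occ, c_virt, e_occ, e_virt = full_diagonalize(f)
        p_new = [[2.0 * c_occ[i] * c_occ[j] for j in range(2)] for i in range(2)]
        change = max(abs(p_new[i][j] - p[i][j]) for i in range(2) for j in range(2))
        p = p_new
        if change < tol:
            break
    # Final full diagonalization for the converged eigenvalues and eigenvectors.
    c_occ, c_virt, e_occ, e_virt = full_diagonalize(fock(p))
    return energy(p), cycle, change < tol


if __name__ == "__main__":
    for flag in (False, True):
        e, cycles, ok = scf(use_pseudo=flag)
        print(f"pseudodiagonalization={flag}: E={e:.8f}, cycles={cycles}, converged={ok}")
```

In a model this small the two routes cost the same; the sketch is meant only to show where the switch and the final diagonalization sit in the loop, and that both routes arrive at the same converged energy.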
Using the pseudodiagonalization procedure rather than full diagonalizations at every cycle does not slow convergence and speeds up the calculation by approximately a factor of 2. Just as important, the

Fig. 8.2 Cyclic section of the SCF iteration algorithm with pseudodiagonalization. [Flowchart boxes: assemble Fock matrix; convergence? (> 10^-1: diagonalize, MO vectors and eigenvalues; < 10^-1: pseudodiagonalize, MO vectors only); calculate density matrix; convergence test; final diagonalization.]


pseudodiagonalization procedure has properties that can be exploited for alternative SCF iteration schemes, as outlined below. Note that separate calculation of the eigenvalues and pseudodiagonalization can be used to replace the full diagonalizations in Fig. 8.2. Alternatively, if the initial guess is close enough to the final solution, no initial full diagonalizations are needed.

The principle behind pseudodiagonalization is that improvements in the eigenvectors for the occupied MOs must come from mixing with virtual MOs. Essentially, there is nothing to gain by mixing two occupied MOs. Therefore, the first step is to calculate the occupied-virtual block of the Fock matrix, denoted here by Δ, in the current MO basis:

\[
\Delta = c_o^{+} F c_v \tag{8.16}
\]

where the subscripts o and v denote the occupied and virtual blocks, respectively, c are the current eigenvector coefficients, and F is the Fock matrix. Large elements of Δ indicate strong interactions between occupied and virtual MOs, which must be removed by mixing the two. The mixing is achieved by a Givens rotation. For an updated occupied eigenvector c̃o,

\[
\tilde{c}_o = x_{ov} c_o - (1 - x_{ov}^2)\, c_v \tag{8.17}
\]

where co and cv are the coefficients of the relevant occupied and virtual eigenvectors, respectively, and xov is the rotation angle between the two eigenvectors. The expression for the corresponding updated virtual eigenvector is

\[
\tilde{c}_v = x_{ov} c_v + (1 - x_{ov}^2)\, c_o \tag{8.18}
\]

Thus, the Givens rotations simply mix an occupied MO with a virtual MO with which it interacts strongly. However, the rotation angle xov must be determined before the rotation can be carried out. This is achieved using what is essentially a first-order perturbation theory expression:

\[
x_{ov} = \frac{\Delta_{ov}}{\varepsilon_o - \varepsilon_v} \tag{8.19}
\]

where Δov is the element of Δ that connects the occupied and virtual orbitals o and v, and εo and εv are the eigenvalues of these two orbitals, respectively. This expression explains the need for relatively constant eigenvalues (or eigenvalues calculated explicitly from the eigenvectors) before using the pseudodiagonalization, as these determine the rotation angles.

The importance of the pseudodiagonalization procedure is that it allows us to select which orbitals to mix in a very transparent way. This feature is used, for example, in the MOZYME algorithm (see below). For normal-sized molecules, one possible implementation is to calculate Δ and to select a certain proportion of the largest elements (the details of this step vary from implementation to implementation) in order to carry out the rotations between the orbitals connected by these elements. After testing for convergence and calculating the new density and Fock matrices, Δ is calculated for the new Fock matrix and the process is repeated until convergence.

8.1.5 Dispersion

MNDO-like semiempirical MO techniques exhibit the weakness also found for ab initio Hartree–Fock and DFT: that weak (van der Waals) interactions (dispersion) are not reproduced. This problem is more severe than might seem at first sight because, in addition to the obvious intermolecular interaction energies, the intramolecular dispersion energies, which become very significant for large molecules such as those now treated routinely by MNDO-like methods, are also affected. The solution that was introduced for ab initio Hartree–Fock27 and has also been used for DFT28–30 has been to add a classical two-center potential with a damping function for short distances to the DFT Hamiltonian. A similar correction has been added to SCC-DFTB calculations (see Chapter 9).31 Such corrections are very successful, but suffer from the inherent problem for MNDO-like methods that they represent an additional two-center potential that can lead to linear dependencies with the nucleus–nucleus potential function. This is not a problem if the dispersion term is added after parameterization, as in OMnD,32 although some methods have been reported in which a dispersion potential was parameterized together with the remaining parameters.33 A more consistent way to treat this problem is to modify the existing two-center potential (the nucleus–nucleus repulsion potential) to include the effects of dispersion. This is the approach used by PM6,34 for which the core–core term is given by

\[
E_{AB}^{\mathrm{PM6}} = E_{AB}^{\mathrm{MNDO}} \left( 1 + \delta_{AB}\, e^{-\alpha_{AB}\left(R_{AB} + 0.0003\, R_{AB}^{6}\right)} \right) \tag{8.20}
\]
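To see how the extra R^6 term in Eq. (8.20) changes the scaling factor of Eq. (8.15), the two expressions can be evaluated side by side. The short Python sketch below does this; the values of `delta` and `alpha` are placeholders chosen only for illustration and are not fitted parameters for any element pair, and distances are assumed to be in ångströms.

```python
import math


def voityuk_rosch_factor(r, delta, alpha):
    """Scaling factor of Eq. (8.15): 1 + delta_AB * exp(-alpha_AB * R_AB)."""
    return 1.0 + delta * math.exp(-alpha * r)


def pm6_factor(r, delta, alpha):
    """Scaling factor of Eq. (8.20), with the extra 0.0003 * R_AB**6 term."""
    return 1.0 + delta * math.exp(-alpha * (r + 0.0003 * r ** 6))


if __name__ == "__main__":
    delta, alpha = 1.0, 1.0  # placeholder values, not fitted parameters
    for r in (1.0, 2.0, 3.0, 4.0, 5.0):  # distances in angstroms (assumed)
        print(r, voityuk_rosch_factor(r, delta, alpha), pm6_factor(r, delta, alpha))
```

At 1 Å the additional term is 0.0003 and the two factors are practically identical; at 3 Å it is 0.0003 × 729 ≈ 0.22, which for these placeholder values reduces the exponential correction by roughly 20%.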

This modification of Voityuk and Rösch's formula [Eq. (8.15)] behaves very similarly at short distances, but at distances of 3 Å and larger gives a noticeably smaller repulsion. This, together with an additional correction to take account of the nonvalence electrons (which are neglected in MNDO-like methods), leads to better performance and behavior similar to that expected from a method that includes dispersion. Each of these modifications assumes that the dispersion interaction attributable to a given atom is isotropic. Even if we accept the hypothesis that dispersion interactions can be assigned on an atom–atom basis, this is probably not a good approximation, for example, for sp2-hybridized carbon atoms or atoms with lone pairs. One Ansatz that takes this effect into account also has the advantage that the dispersion term can be separated from other two-center potentials because it is based on (and parameterized for) the polarizability. In the early 1970s, Rinaldi and Rivail introduced a variational treatment for calculating molecular electronic polarizabilities using MNDO-like methods.35 This approach leads to very fast calculations but is not very accurate. However, Schürer et al.36 were


able to show that parameterizing the atomic multipole integrals (three per nonhydrogen element), rather than using the analytical values, gave very accurate molecular electronic polarizabilities. Furthermore, this technique lends itself to (arbitrarily) partitioning the molecular electronic polarizability into atomic, or even atomic–orbital, contributions.37 The “atomic polarizability tensors” thus obtained can be used in conjunction with the London equation38 and a damping function at short distances to provide a dispersion correction to MNDO.39

8.1.6 Need for Linear-Scaling Methods

Using current, readily available computers, conventional semiempirical SCF methods are limited to systems of only a few hundred atoms; above that, the computational effort becomes prohibitive. This limit is a direct consequence of the use of matrix algebra for solving the SCF equations, for which several operations, such as inversion and diagonalization, scale as the third power of the size of the system. By using special methods, such as pseudodiagonalization, this effort can be minimized, but elimination of the N^3 dependency is impossible when matrix algebra is used. Before larger systems could be studied, alternatives to matrix algebra methods had to be developed; two of the more successful are the divide-and-conquer linear-scaling method, and the localized molecular orbital method MOZYME.

8.1.7 Divide-and-Conquer Linear Scaling

Given that the N^3 dependency cannot be eliminated, the computational effort required to solve the SCF for a large system can be reduced by splitting the system into smaller ones, which can then be solved separately. Thus, if a system of N atoms is split into m equal parts, each of the m parts will require a computational effort approximately proportional to (N/m)^3. The total effort is then proportional to m(N/m)^3 = N^3/m^2; that is, it is reduced by a factor of m^2. This is the basis for the divide-and-conquer (D&C) method.40 Once special care is taken to ensure that the joins between the various parts are handled correctly, the results are almost indistinguishable from those obtained using exact matrix algebra methods.41 The computational effort involved in the D&C method scales linearly with the size of system, which makes it suitable for modeling phenomena in very large species, including protein–protein interactions.42

8.1.8 Localized Orbital SCF

For a self-consistent field to exist, it is a necessary and sufficient condition that all Fock integrals involving occupied and virtual molecular orbitals be zero. On the assumption that a rough approximation to the electronic structure of a molecule is provided by its Lewis structure, the conditions necessary for an SCF provide a guide for moving from the simple Lewis structure to the optimized electronic structure. This is the premise for MOZYME43: Starting with a Lewis structure represented by localized molecular orbitals (LMOs) on one or at most two atoms,


in order to generate an SCF it is sufficient to eliminate the Fock terms between these LMOs and the nearby virtual LMOs. For each pair of LMOs, this operation is very fast and can be performed using a 2 × 2 Givens rotation. The operation is carried out on every occupied LMO and every nearby virtual LMO. A result of this operation is to move the system in the direction of the SCF. However, because each Givens rotation modifies the occupied and virtual LMOs, the result of one annihilation rotation is to cause some matrix elements that had been eliminated by earlier Givens rotations now to become nonzero. This means that the process of annihilating occupied-virtual LMO interactions must be repeated. Over the first few complete sweeps of Givens rotations, the size of the LMOs, represented by the number of atoms on which the LMO has significant intensity, increases rapidly, and then tapers off as the system converges toward self-consistency. To the degree that each complete set of annihilation steps results in the system moving closer to the energy minimum, the MOZYME method is similar to the conventional matrix algebra procedure. Indeed, when an SCF is achieved, MOZYME and conventional matrix algebra give rise to identical electron density distributions.

Surprisingly, the MOZYME method is intrinsically more arithmetically stable than the conventional method. Using conventional methods, an SCF sometimes fails to form: the charge distribution simply oscillates from iteration to iteration. This propensity increases as the HOMO–LUMO energy gap decreases. When the gap is very small, the polarizabilities of the HOMO and LUMO become very large, and autoregenerative charge fluctuations effectively prevent an SCF from forming. In conventional methods the MOs are eigenvectors; therefore, the HOMO–LUMO gap is irreducibly small. By contrast, when LMOs are used, the HOMO–LUMO gap is at or near its maximum possible value, and the polarizability of the HOMO is correspondingly small.
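The sweeps of 2 × 2 Givens rotations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Fock matrix `F` is a made-up, fixed 3 × 3 matrix (no density rebuild between sweeps), there is one occupied and two virtual orbitals, no distance cutoff selects "nearby" virtual LMOs, and the rotation angle is the exact 2 × 2 annihilation angle rather than the first-order estimate of Eq. (8.19). The function names are hypothetical and do not come from MOZYME.

```python
import math


def bracket(a, f, b):
    """Fock matrix element a^T F b between two orbital coefficient vectors."""
    n = len(a)
    return sum(a[i] * f[i][j] * b[j] for i in range(n) for j in range(n))


def annihilate(orbitals, f, o, v):
    """Rotate orbitals o (occupied) and v (virtual) so their Fock element vanishes."""
    f_ov = bracket(orbitals[o], f, orbitals[v])
    gap = bracket(orbitals[v], f, orbitals[v]) - bracket(orbitals[o], f, orbitals[o])
    theta = 0.5 * math.atan2(-2.0 * f_ov, gap)  # exact 2 x 2 angle
    c, s = math.cos(theta), math.sin(theta)
    occ, virt = orbitals[o], orbitals[v]
    orbitals[o] = [c * a + s * b for a, b in zip(occ, virt)]
    orbitals[v] = [c * b - s * a for a, b in zip(occ, virt)]


def largest_ov_element(orbitals, f, occupied, virtual):
    """Largest remaining occupied-virtual Fock element (zero at self-consistency)."""
    return max(abs(bracket(orbitals[o], f, orbitals[v]))
               for o in occupied for v in virtual)


def sweep(orbitals, f, occupied, virtual):
    """One complete set of annihilation rotations."""
    for o in occupied:
        for v in virtual:
            annihilate(orbitals, f, o, v)


if __name__ == "__main__":
    F = [[-1.0, 0.3, 0.2], [0.3, 0.5, 0.1], [0.2, 0.1, 0.8]]  # made-up values
    orbitals = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for n in range(1, 7):
        sweep(orbitals, F, [0], [1, 2])
        print(n, largest_ov_element(orbitals, F, [0], [1, 2]))
```

After the first sweep the element annihilated first is no longer zero, because the second rotation has changed the occupied orbital; repeating the sweep drives all occupied-virtual elements toward zero, which is the behavior the text describes.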
One practical consequence is that, in general, the MOZYME procedure requires fewer iterations to achieve an SCF. Using the MOZYME technique, the computational effort scales approximately as N^1.4, and much larger systems can be studied, with the upper limit now being on the order of 15,000 atoms.44 Because having a starting Lewis structure is a prerequisite, the MOZYME method is limited to systems for which a Lewis structure can be defined. At present, only closed-shell systems are allowed, so while ferrocene, Fe(II)(Cp)2, and crystalline potassium chromate, K2Cr(VI)O4, can be modeled, no open-shell system (e.g., [Cr(III)(H2O)6]3+) can be run. Similarly, systems with extended π-conjugation cannot be treated using the MOZYME or D&C techniques because individual orbitals are delocalized across the boundaries between subsystems or cannot be localized.

8.2 PARAMETERIZATION

Many of the equations used in semiempirical methods contain adjustable parameters. Within the broad family of NDDO45,46 methods, the main difference between the various methods lies in the values of these parameters. Provided that the set


of approximations is sufficiently flexible and physically realistic, the accuracy of a semiempirical method depends on precisely two quantities: the accuracy and range of the reference data used in determining the values of the parameters and the thoroughness of optimization of the parameters.

8.2.1 Data

The set of reference data used in parameterization must satisfy several criteria: It obviously must be as accurate as possible, it must represent a wide range of chemical systems and properties, and it must be manipulated easily by the parameter optimization program. Several useful collections of reference data are available, such as the NIST databases of atomic energy levels,47 reference heats of formation,48 and atomic47 and molecular ionization potentials,49 and the Cambridge Structural Database50 for molecular geometries. Despite the large amount of available experimental reference data, important gaps or deficiencies exist. For the organic elements C, H, N, and O, this is not a problem, but for less popular elements, particularly transition metals, such as Sc and Tc, there is a paucity of reliable reference data. Where data are missing or are incomplete, the few data that do exist can be augmented by using reference data generated from the results of high-level (i.e., highly accurate) theoretical calculations. Of course, since the objective of a semiempirical method is to model the real world, great care must be taken to maximize confidence in the accuracy of all calculated reference data. In the most recent parameterization, the training set consisted of over 10,000 individual data representing over 9000 separate species.

8.2.2 Parameterization Techniques

Although parameterization might initially appear to be a complicated process, in principle it is really very simple51: Given a set of reference data, x^ref, and a set of adjustable parameters, Pi, the values of the parameters are modified so as to minimize the root-mean-square difference between the data predicted and the reference data. That is, given

\[
S = \sum_i \left( x_i - x_i^{\mathrm{ref}} \right)^2 \tag{8.21}
\]

parameters are considered optimized when ∂S/∂Pi = 0 and ∂²S/∂Pi² > 0 for all parameters. The first step is to take all the various reference data (dipole moments, bond lengths, heats of formation, etc.) and render them dimensionless, so that they can be manipulated using standard mathematical tools. Default weighting factors for this operation are shown in Table 8.2. In the early days of parameter optimization, making decisions regarding the initial values for the various parameters for the different elements was difficult52; in that groundbreaking work, there was no precedent to refer to. A real risk at that time was that an incorrect choice could result in the parameters converging


TABLE 8.2 Weighting Factors for Reference Data

Reference Data          Weight
ΔHf°                    1.0 mol·kcal^-1
Bond length             0.7 Å mol·kcal^-1
Angle                   0.7 Å mol·kcal^-1
Dipole                  20 debye^-1
Ionization potential    10 V^-1

on a false minimum. This risk was not hypothetical; computers available in the 1970s were much less powerful than now and only a small number of reference data could be used in a parameter optimization. This increased the probability that spurious minima might be encountered. Over time, and by dint of hard work, these issues were resolved, and now, more than 30 years later, there is a wealth of knowledge of suitable starting values for parameter optimization.

8.2.3 Methods and Hamiltonians

In ab initio work, different methods (e.g., Hartree–Fock and density functional) can be defined using quantum mechanical terms such as the one- and two-electron operators and instantaneous correlation. These terms are a natural consequence of the underlying quantum theory. Within a given method, a balance can be struck between computational effort and accuracy. In part, this is achieved by the choice of basis set—a small set would give rise to a faster but less accurate method, and vice versa. Ab initio methods are thus defined by two quantities: the method and the basis set. The NDDO-based semiempirical methods, on the other hand, use similar sets of approximations and are best distinguished by the values of the parameters. Minor differences do exist in the approximations, with most of these having to do with the core–core terms. Thus, the oldest NDDO method, MNDO,1 had the simplest core–core term; AM1,53 PM3,54,55 and RM156 had terms added to mimic the van der Waals attraction; and in PM634 diatomic parameters were used. These changes were the results of attempts to make the set of approximations more realistic. That the main difference between the methods lies in the values of the parameters can be readily shown. If the original MNDO set of approximations were used and the parameters for H, C, N, and O were reoptimized using modern reference data and modern optimization techniques, the accuracy of the resulting method would be significantly higher than that of the original MNDO method. This is not to disparage the quality of parameterization in MNDO (when it was first developed, it represented a large improvement over even older methods); rather, it demonstrates how the accuracy of methods can be increased as the quality of parameter optimization improves. NDDO methods are best defined by the set of approximations and the set of parameters. This definition is easily seen to be necessary: If the set of parameters


is not specified, the three methods AM1, PM3, and RM1, methods of very different accuracies, would become indistinguishable.

8.2.3.1 MNDO

First published in 1977, MNDO1,52 is the oldest of the NDDO methods. At that time it represented a large increase in accuracy over the then-popular MINDO/3.57 There were two reasons for this increase in accuracy: For the first time, a semiempirical method could represent the lone-pair/lone-pair interaction of the type found in hydrazine and in hydrogen peroxide (hitherto, such interactions had simply been ignored) and also for the first time reference data based on experimental results for molecular systems were used in the parameter optimization. Parameters for H, C, N, and O were optimized using data on 34 compounds. The much-increased accuracy of MNDO resulted in its becoming instantly popular. But as it was applied to more and more species, various systematic errors became apparent, the most serious of these being the almost complete absence of a hydrogen bond.

8.2.3.2 AM1

Hydrogen bonds are much weaker than covalent bonds and can best be represented by three terms: an electrostatic, a covalent, and a third term variously called the instantaneous correlation, dispersion, or van der Waals (VDW) interaction. MNDO included the electrostatic and covalent terms, but not the VDW term. To mimic the effect of the VDW term, during the development of AM1 the core–core interaction in MNDO was modified by the addition of simple Gaussian functions to provide a weak attractive force. This extra stabilization allowed hydrogen bonds to form. Parameters for H, C, N, and O were again optimized, now using a larger set of reference data, and the resulting AM1 method was published in 1985.53 Over the following few years, parameters were optimized for many more main-group elements. Each new element was parameterized without changing the parameters for the original AM1 elements. This resulted in a piecemeal method: the values of the parameters depended on the sequence in which the parameterizations were done.

At the time the parameters in the AM1 method were being optimized, two different philosophical approaches were explored. One, advocated by Michael Dewar, was to guide the progress of the optimization by using chemical knowledge. At the same time, by carefully selecting the reference data used in the parameterization, the size of the training data set could be kept to a minimum. The quality of such a method could then be determined by its accuracy and predictive power; that is, the ability of the method to predict the properties of systems not used in the training set. As Dewar had an encyclopedic knowledge in this field, this approach had obvious merit. The other approach, advocated by one of us (J.S.), was to provide the parameter optimization procedure with a wide range of reference data, in the hope that if enough data were provided, the rules of chemistry would be implicitly provided to the parameter optimization. In the development of AM1, the first of these two approaches was used.

8.2.3.3 PM3

In contrast to the approach used in AM1, a large amount of reference data was used in the training set for the development of PM3.54,55 In


the initial parameter optimization, parameters for 12 elements, H, C, N, O, F, Al, Si, P, S, Cl, Br, and I, were optimized simultaneously. Also, in contrast to the development of AM1, no external constraints based on chemical experience were applied. When PM3 was completed, it was found that the average errors for common properties such as heats of formation were lower than those in AM1, but the troubling question of predictive power of PM3 versus AM1 became more difficult to answer. Possibly because of this, although PM3 was widely used, it was never as widely used as was AM1. PM3 was soon extended to include most,58 and ultimately all,59 of the main group. As with AM1, the later parameterizations were carried out using fixed values for the elements that had previously been parameterized. In the initial PM3 work, parameters for all 12 elements were optimized simultaneously, thus eliminating any error due to undesired restrictions on the values of the parameters. At the same time, the training set increased in both size and quality. Each entry in it was checked for consistency with the other data. Errors due to incomplete parameterization and inconsistent reference data were minimized. Despite all this, the average unsigned error in the heat of formation remained stubbornly and unacceptably large.

8.2.3.4 PM6

In 2000, in an attempt to improve the accuracy of a method for modeling systems containing molybdenum, Voityuk and Rösch25 proposed using diatomic core–core parameters. This modification was tested using various pairs of elements in the first PM3 set. In every case, the average error decreased. The next step was obvious: to replace the original MNDO core–core term with a simple function that used diatomic parameters. A few other minor modifications were made to the core–core term, mainly to cater for highly specific interactions such as the acetylenic triple bond. Parameters for the whole of the main group, plus Zn, Cd, and Hg (three elements that behave like main-group elements), 42 elements in all, were then optimized simultaneously. This was followed by the remaining 27 transition metals of periods 4, 5, and 6, and the fourteenth lanthanide, Lu. Two other approaches had been considered, but these were not completed (PM4) or not published (PM5), so the new method was named PM6.

A reasonable question to ask is: How does the accuracy of PM6, the most recent semiempirical method, compare with standard ab initio methods? This can best be answered by comparing standard quantities. In PM6, the accuracy of prediction of heats of formation of common organic compounds is somewhat better than that of B3LYP DFT calculations using the 6-31G(d) basis set,60 which in turn is significantly better than Hartree–Fock, using the same basis set. Unfortunately, ΔHf° is the only property for which PM6 is superior to B3LYP; for geometries it is somewhat worse, and for ionization potentials and dipole moments, which are purely electronic properties, it is significantly worse. There is a reason for this initially surprising high accuracy relative to standard ab initio HF and DFT methods, methods that require considerably more computational effort than PM6. Semiempirical methods are parameterized to reproduce experimental reference data, which by definition take into account


all possible phenomena. Many of these phenomena (e.g., instantaneous correlation) are extremely difficult to calculate ab initio, but in semiempirical methods their effects are simply absorbed into the values of the parameters, and, in turn, when the methods are used in modeling chemical systems, the effects are reproduced. This benefit comes at a price: In semiempirical methods, each atomic basis set is normally referred to by using the standard principal quantum number (PQN), but because the associated parameters are optimized using experimental data, the basis set cannot strictly be identified with a specific PQN. Instead, it represents the blend of atomic functions that most precisely reproduces the phenomena observed. A result of this is that the theoretical underpinnings of semiempirical methods cannot, and should not, be compared with those of ab initio methods.

8.2.3.5 AM1*

AM1*61–66 provides an interesting contrast to PM6. In AM1*, d-orbitals were added to various elements that had previously been parameterized at the AM1 level, but the original AM1 parameterization was retained for the elements H, C, N, O, and F. Using the original AM1 parameters for these elements obviously limits its ultimate accuracy. Unlike other methods, where the objective was to increase accuracy, the motivation for the development of AM1* was an exploration of the role of the training data and development of a strategy for increasing the robustness or predictive power. To this end, training data calculated using DFT or ab initio techniques were used extensively to supplement the experimental data available. Also in contrast to PM6, the “chemical intuition” approach was used to provide a “reasonable” parameterization. The resulting method performs very similarly to PM6 in terms of its overall statistics. AM1* is usually statistically better than PM6 for its own training data, but usually not for the PM6 training data set. This is expected for local parameterizations, especially so for cases in which it is impossible to use an independent validation data set because of the lack of experimental data. Together, PM6 and AM1* provide an opportunity to validate results by comparing the results of the two methods, which are essentially identical quantum mechanically but were parameterized using different data and philosophies.

8.2.3.6 Methods with Orthogonalization Corrections

The desirability of either explicit orthogonalization of the atomic orbitals18 or a more computationally efficient orthogonalization correction was discussed above. The latter technique has been used by Thiel and co-workers in the OMn methods. The first such method, OM1,19 introduced orthogonalization corrections to the one-electron terms within the NDDO approximation. This work was extended to include two-center corrections and the use of effective core potentials in place of the frozen-core approximation in OM2.20 The faster OM3 method22 neglects some of the expensive, but less important, terms included in OM2. The benefits of orthogonalization corrections lie predominantly in improved performance in reproducing relative conformational energies in, for example, peptides.21 OM2 combined with a multireference configuration-interaction technique performs extremely well for excited states (see below).67


8.2.3.7 Other Hamiltonians Over the past 30 years, several avenues for improving semiempirical methods have been explored. In each instance there were good reasons to believe that the proposed change would be beneficial. Sometimes this was true; other times the proposed benefit did not materialize or there were competing factors that militated against the change being adopted. Some of the more important ideas that were examined will now be described. MNDOC An increase in accuracy should occur if correlation effects were included in semiempirical methods such as MNDO. This principle was examined by Thiel68 in 1982, when parameters for H, C, N, and O were optimized using a modification of MNDO in which a perturbational correction for electron correlation was included explicitly. Whereas the results obtained using the new method, MNDOC, were better than for stand-alone MNDO, the computational effort was significantly larger, and MNDOC was not widely used. MNDO/d In its original form, MNDO was limited to an sp-basis set. This obviously constrained its use to modeling normal-valent systems; the study of hypervalent species such as H2 SVI O4 and PV Cl5 , which occur frequently in normal chemistry, was precluded. During chemical reactions, many main-group elements expand their valency temporarily to form extra bonds with ligands; such phenomena could not be modeled using MNDO. In 1992, Thiel and Voityuk2 added d -orbitals to some elements, and in 1996 demonstrated6 that this resulted in a significant increase in accuracy, particularly in reducing the average unsigned errors (AUE) in Hf0 . The new method involved optimizing parameters for several elements that could be hypervalent, but did not involve reoptimizing those for the other MNDO elements. As such, it was a piecemeal approach. Nevertheless, the demonstration was convincing, and all subsequent methods employed used Thiel and Voityuk’s multipole formalism for the integrals involving d -orbitals. 
SAM1 While modifications to the core–core repulsion function have resulted in large improvements in accuracy, another function, the electron repulsion integral (ER), should also be regarded as a candidate for examination. Various forms of the ER were examined, and parameters for H, C, N, and O were optimized. When it was published in 1993, the new method, SAM1,69 was shown to be more accurate than the then-current methods AM1 and PM3. It is unfortunate that no further work has been reported on this topic: if the improvements resulting from modifying the ER approximation are real, and there is no reason to doubt that, there is a high probability that further work on modifying the ER term would result in significant improvements over current methods.

PDDG As just mentioned, a computationally inexpensive way to reduce error in NDDO-type methods is by modification of the core–core term. In MNDO itself, the analytic expression ZA ZB/RAB had been replaced by an approximation that took into account the long-range electron–nuclear attraction and


electron–electron repulsion terms. The core–core term had been further modified in AM1 and PM3, and in PDDG, Jorgensen et al. explored the effects of using a pairwise distance-directed Gaussian modification.70 At the heart of the PDDG method is a modification of the core repulsion function, the modification being the addition of the following term:

$$\mathrm{PDDG}(A,B) = \frac{1}{n_A + n_B} \left\{ \sum_{i=1}^{2} \sum_{j=1}^{2} \left( n_A P_{A_i} + n_B P_{B_j} \right) \exp\left[ -10 \left( R_{AB} - D_{A_i} - D_{B_j} \right)^2 \right] \right\} \qquad (8.22)$$

where $n_A$ is the number of valence electrons on atom A, and $P_{A_i}$ and $D_{A_i}$ are parameters. As with SAM1, the PDDG method resulted in an increase in accuracy over AM1 and PM3.
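As a minimal sketch of how the term in Eq. (8.22) could be evaluated for a single atom pair, the following Python function carries out the double sum directly. The function name, the argument layout, and the assumption that distances are supplied in ångströms are illustrative choices made here, not details taken from the PDDG implementation; any parameter values passed in are the caller's responsibility.

```python
import math


def pddg_term(n_a, n_b, p_a, p_b, d_a, d_b, r_ab):
    """Evaluate the pairwise PDDG term of Eq. (8.22) for one atom pair.

    n_a, n_b : numbers of valence electrons on atoms A and B
    p_a, p_b : the two P parameters of each atom (sequences of length 2)
    d_a, d_b : the two D parameters of each atom (sequences of length 2)
    r_ab     : interatomic distance (assumed here to share the units of D)
    """
    total = 0.0
    for p_ai, d_ai in zip(p_a, d_a):          # sum over i = 1, 2
        for p_bj, d_bj in zip(p_b, d_b):      # sum over j = 1, 2
            weight = n_a * p_ai + n_b * p_bj
            total += weight * math.exp(-10.0 * (r_ab - d_ai - d_bj) ** 2)
    return total / (n_a + n_b)


# Hypothetical parameters, for illustration only (not published PDDG values).
example = pddg_term(4, 4, (1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.5, 0.5), 1.0)
print(example)
```

Because each Gaussian is centered at a distance equal to the sum of two D parameters, the term acts only over a narrow range of interatomic separations, which is what makes it a targeted correction to the core repulsion function.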

RM1 A convincing demonstration of the importance of training set and parameterization is provided by RM1.56 Starting with the AM1 method, and without making any change to the formalism, parameters for H, C, N, O, P, S, F, Cl, Br, and I were reoptimized. The AUE for heats of formation dropped to about half of that for AM1, and for dipole moments the accuracy exceeded that of PM6.
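The average unsigned error (AUE) used throughout this comparison of methods is simply the mean absolute deviation between calculated and reference values. A minimal sketch follows; the function name and the numerical values are hypothetical and serve only to show the arithmetic.

```python
def average_unsigned_error(calculated, reference):
    """Return the mean of |calculated - reference| over paired values."""
    if len(calculated) != len(reference) or len(calculated) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    deviations = [abs(c - r) for c, r in zip(calculated, reference)]
    return sum(deviations) / len(deviations)


# Hypothetical heats of formation (kcal/mol), for illustration only.
calculated = [-17.0, 12.0, -57.5]
reference = [-18.0, 10.0, -57.5]
print(average_unsigned_error(calculated, reference))
```

Because the deviations are taken as absolute values, errors of opposite sign cannot cancel, which is why the AUE is preferred over the mean signed error when comparing parameterizations.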

8.3 NATURAL HISTORY OR EVOLUTION OF MNDO-LIKE METHODS

The evolution of NDDO methods has followed a completely logical course. When it first appeared, MNDO represented a large improvement over the earlier purely atom-based method, MINDO/3. This improvement was due to the more sophisticated set of approximations and to the use of molecular reference data. Only after it had been used for a while did severe errors in MNDO become apparent, the most important of these being the almost complete lack of a hydrogen bond. This deficiency contained within it an indication of the direction for further improvement: to add a term to represent the hydrogen bond. Still using a small set of reference data, parameters for H, C, N, and O were reoptimized; this resulted in AM1. A consequence of the piecemeal parameterization of AM1, in which the first elements parameterized were not reoptimized when more elements were added, was that the final set of parameters was by no means optimal. An obvious next step to correct this was to investigate the consequences of optimizing many elements simultaneously using large amounts of reference data. This gave rise to PM3. No further reduction in error could be achieved by better parameterization or better reference data, so the focus turned to the third and last possible cause of error: the set of approximations used. The core–core terms were modified


to include diatomic parameters, and a reparameterization involving the entire main group resulted in a dramatic drop in AUE for heats of formation. The new method was named PM6. Each modification addressed a definite fault in the earlier method and resulted in a significant improvement in accuracy. This sequence of incremental improvement is both clear and simple, and the overall effect is a natural evolution in the direction of increased accuracy. As the accuracy improves, various faults in any given method that were hidden by much more severe errors in earlier methods become apparent, and these can then be addressed. There is every indication that this sequence will continue far into the future. As just mentioned, the most recent method, PM6, represents a large improvement over PM3. Nevertheless, soon after it was released, errors that were masked by the relatively large errors in PM3 became apparent, the most important of these being a bias in favor of zwitterions instead of neutral biochemical species. It is likely that such errors had existed in earlier methods, but they only became obvious in PM6. In principle, correcting such an error is straightforward: appropriate reference data are added to the training set and the parameterization is rerun. In practice, such operations are time consuming, as checks have to be run to ensure that none of the previous gains made are compromised.

8.3.1 Strengths of MNDO-like Methods

The most recent methods developed from the MNDO line, PM6 and AM1*, are particularly useful, that is, accurate, in modeling the structural and thermochemical properties of a broad swath of ordinary chemistry, particularly biochemical systems. However, like the earlier methods, their accuracy is much reduced when they are used for modeling exotic systems, such as transition states, electronic excited states, high-energy systems such as radicals, and solids with low or zero bandgaps, such as metals. For such systems, ab initio methods still reign supreme. In part, this reflects the emphasis or bias imposed on the parameterization: Since one of the objectives of the development of PM6 was to focus on systems of biochemical interest, it is not surprising that it is particularly suitable for modeling such systems. This accuracy comes at a price: A direct consequence of the increased emphasis on ordinary chemistry is the inability to model exotic systems accurately. AM1* provides some contrast because of the conscious attempt to represent “more chemistry” in its parameterization. Once again, the dominant effect of the training data on determining the range of applicability of a semiempirical molecular orbital method cannot be overemphasized. Nevertheless, MNDO-like methods as a general class have important strengths that have tended to be forgotten since the rise of DFT techniques. We outline some of these below.

8.3.1.1 Correlation in MNDO-like Methods As outlined in Section 8.1, MNDO-like methods are based on the LCAO-SCF approximations. They do not, therefore, explicitly include electron correlation. However, in an analogy


to DFT that is often overlooked, dynamic correlation is included implicitly in MNDO-like techniques. This is achieved through parameterization (experimental results clearly include correlation) and through scaling of the two-electron integrals so that they are correct at the one-center limit (i.e., at RAB = 0). Perhaps the best known pre-MNDO scaling scheme is that of Klopman–Ohno.71,72 In MNDO (ref. 1), this scaling is achieved by constructing the multipoles used to calculate the two-electron integrals so that they give the correct values at RAB = ∞ and at the one-center limit. The values at the one-center limit are determined by fitting to atomic spectra using Oleari’s method.73,74 This restriction was relaxed when PM3 was introduced54 and the one-center two-electron repulsion integrals were treated as variable parameters. The result of this integral scaling is similar to that of treating electron correlation using a functional of the density in DFT. Dynamic correlation can be treated quite effectively in this fashion, and the implicit consideration of dynamic correlation in MNDO-like methods has important consequences for configuration-interaction (CI) calculations on excited states, as discussed below.

8.3.2 One-Electron Properties

One-electron properties,75 in this case primarily the molecular electrostatic potential and field and electrostatic and transition moments, are generally reproduced very well by MNDO-like methods, almost independent of the particular Hamiltonian being used. As an example, consider the molecular electrostatic potential (MEP), which has been shown to be a dominant factor in determining intermolecular interactions.76 The MNDO formalism offers a convenient model for representing the electrostatics of molecules because we can derive an atom-centered multipole model77 (up to quadrupoles) directly from the MNDO multipole model for the two-electron integrals.1 Using the AM1* Hamiltonian,61–66 for a small test set of diverse molecules, the standard deviations between AM1* multipole MEPs at points on the isodensity surfaces of the molecules and those calculated at the same points using MP2/6-31G(d) or B3LYP/6-31G(d) were only on the order of 2 kcal mol⁻¹ if a simple linear scaling factor was used. This observation has significant consequences for many branches of chemistry. It means, for example, that we can use MNDO-like methods with confidence to calculate solvation energies using polarizable continuum methods because the electrostatics of the molecules are reproduced well. Further examples are given below for the use of transition moments in ensemble models.

8.3.3 Excited States

Semiempirical molecular orbital techniques were used very early to investigate excited states and to predict spectra. The early π-only Pople–Pariser–Parr technique78 was quite successful in predicting ultraviolet/visible spectra.79 Later, the specially parameterized INDO/S technique,80 which uses CI calculations limited to single excitations, became the method of choice for calculating spectra of organic and inorganic molecules.81 In the late 1990s, INDO/S


allowed calculation of the excited states of systems as large as a bacteriochlorophyll hexadecamer with 704 atoms, more than 2000 electrons, and a CI expansion of 4096 symmetry-selected configurations.82 Semiempirical CI calculations are not limited to INDO/S. Even “general purpose” methods such as AM1 give surprisingly good results for predicting absorption and fluorescence spectra and nonlinear optical (NLO) properties.83,84 It is probably fair to say that semiempirical CI calculations can agree with experimental excitation energies about as well as current standard time-dependent DFT (TDDFT) methods do, although the latter clearly have considerable potential for improvement. Multireference semiempirical techniques can provide remarkably accurate results when used with an orthogonalization correction and are eminently suitable for geometry optimizations on excited states.67 One major advantage of semiempirical CI calculations is that they are computationally very efficient, so that we can afford to perform tens of thousands of calculations on snapshots from classical molecular-dynamics simulations. This is the basis of the ensemble model, which has been used to simulate fluorescence resonance energy transfer (FRET) in proteins85 and field-dependent second-harmonic generation by a dye embedded in a biological membrane.86 Such applications demonstrate the real potential of MNDO-like methods and represent one of their most promising areas of application.

8.4 LARGE SYSTEMS

By large systems we mean both very large molecules and large databases of smaller molecules. Semiempirical molecular orbital methods are useful for the former because of their potential linear scaling. Their inherent speed makes them the ideal choice for both applications.

8.4.1 Databases

Because of their ability to deliver accurate geometries, energies, and one-electron properties, semiempirical MO methods are ideally suited for providing extra information about, for example, druglike molecules.87 It is important to emphasize that the all-important76 molecular electrostatic potential (MEP) is reproduced very poorly by the atomic monopoles commonly used in force fields. The MEP calculated from an atomic-monopole model may even be so much in error as to preclude important intermolecular bonding effects, such as halogen bonding.88 The MEP generated from common semiempirical methods is, however, in very good agreement with that calculated by DFT or ab initio methods.77 Furthermore, semiempirical MO techniques can be used to calculate an array of local properties that describe intermolecular interactions.89 It is therefore not surprising that a complete database of 53,000 compounds was treated with AM1 (the geometries of all molecules optimized) as early as 1998 (ref. 90) and that the entire NCI database (250,000 compounds) was processed in 2005 (ref. 91). Several in-house databases of companies in the pharmaceutical industry (1 to 2 million compounds) have been treated similarly.
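The point about atomic monopoles and halogen bonding can be illustrated with a toy calculation. The sketch below evaluates the electrostatic potential of a single atomic site carrying a monopole and an axially symmetric quadrupole, using the standard multipole expansion in atomic units. The function name and all numerical values are hypothetical, chosen only to show the qualitative effect; they are not taken from any semiempirical program or parameter set.

```python
def mep_site(q, theta_zz, r, cos_angle):
    """Electrostatic potential (atomic units) of one site at distance r.

    q         : atomic monopole (charge)
    theta_zz  : axial (traceless) quadrupole moment along the bond axis z
    r         : distance from the site
    cos_angle : cosine of the angle between the bond axis and the probe point
    """
    monopole = q / r
    # Second-order Legendre term of the axial multipole expansion.
    quadrupole = theta_zz * (3.0 * cos_angle ** 2 - 1.0) / (2.0 * r ** 3)
    return monopole + quadrupole


# Hypothetical halogen-like site: slightly negative charge, positive quadrupole.
Q_SITE = -0.05
THETA_ZZ = 1.5
R = 3.5

on_axis_monopole_only = mep_site(Q_SITE, 0.0, R, 1.0)
on_axis_with_quadrupole = mep_site(Q_SITE, THETA_ZZ, R, 1.0)
in_belt_with_quadrupole = mep_site(Q_SITE, THETA_ZZ, R, 0.0)

print("monopole only, on axis:      ", on_axis_monopole_only)
print("with quadrupole, on axis:    ", on_axis_with_quadrupole)
print("with quadrupole, perpendicular:", in_belt_with_quadrupole)
```

With these illustrative numbers, the monopole-only model gives a negative potential everywhere around the site, whereas adding the quadrupole produces a positive region along the extension of the bond axis and a more negative belt perpendicular to it. A positive region of this kind is what a monopole-only description cannot represent for a negatively charged atom.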


8.4.2 Ensemble Models

Large databases are not the only area in which very many calculations are required. The two major challenges that face computational chemistry are to represent the potential energy hypersurface of the system correctly (the Hamiltonian) and, for large flexible systems, to sample the conformational space adequately enough to calculate thermodynamic or spectral properties of the real system (sampling). Clearly, we cannot calculate Avogadro’s number of molecules in order to simulate a mole of substance. We can, however, use the ergodic hypothesis,92,93 which basically proposes that if we sample for long enough, we will obtain a distribution of conformations for a single molecule that corresponds to that of an ensemble of very many molecules. This leads to the ensemble models94 for simulating macroscopic systems. In these models, very many snapshots (instantaneous geometries of the system) are taken from one or more molecular-dynamics simulations, their properties are calculated by a suitable method (semiempirical CI in the examples given here), and the properties of the real system are calculated as the average of those of the individual snapshots. Such models have been very successful in calculating the details of FRET in the tetracycline repressor protein85 and simulating the effects of an applied potential on an NLO dye embedded in a cell membrane.86 Semiempirical CI calculations are the only techniques that can provide the necessary accuracy and throughput for such applications.

8.4.3 Proteins

Linear scaling techniques have made the calculation of protein properties (structure, energetics, interactions) possible with quantum mechanical techniques. Until recently, this was not the case. In part, this was because the computational effort required to solve the SCF equations had limited the size of the systems to just a few hundred atoms; this meant that only the smaller proteins, such as crambin, could be studied. More important, weak interatomic interactions, such as those found in hydrogen bonds and π–π stacking, were poorly represented by the “fast” quantum mechanical techniques (semiempirical and DFT). As interactions of this type are important in proteins, this fault cast doubt on any predicted results. But now, with the development of linear scaling methods, the properties of proteins containing up to 15,000 atoms can be modeled; less than 13% of all entries in the Protein Data Bank95 are larger than that. With the advent of PM6, weak interactions of the type found in proteins can also be reproduced with unprecedented accuracy using semiempirical MO theory. These developments have resulted in the ability to model protein chemistry with relative ease; using PM6 and the linear scaling function MOZYME, the properties of over 40 proteins were modeled using a simple desktop computer.96 Among these properties are structure (albeit starting from the PDB geometry), heat of formation, transition states for enzyme-catalyzed reactions, and elastic modulus for structural proteins. The more general problem of de novo prediction of protein structure is still unsolved.


D&C methods were the first to be used for calculations on moderately sized proteins, both with97 and without98 solvent effects simulated using the Poisson–Boltzmann equation. Both AM1 and PM3 have proven to be useful in distinguishing between native and misfolded protein structures.99 The more recent PM6 technique, in combination with the LMO linear scaling approach, has proven to be very useful for studying proteins.96 Many phenomena in proteins can be modeled with good accuracy using PM6, but significant limitations remain. The long-standing fault of semiempirical methods, that predicted barrier heights for covalent reactions are of low accuracy, still exists in PM6. Another fault is that, despite the improvements in modeling weak interactions, intermolecular interactions of the type that occur when a substrate binds to a protein are also poorly reproduced. Very recent work suggests that by making simple modifications to the core–core interactions, to include100 an explicit correction for hydrogen bonds involving oxygen or nitrogen, and by adding a correlation term,29 the accuracy of prediction of intermolecular interactions can be increased significantly. Thus, for the S22 data set,101 intermolecular interactions were reproduced with chemical accuracy (average unsigned error = 0.8 kcal mol⁻¹), considerably less than the 3.4 kcal mol⁻¹ found when PM6 was used.

REFERENCES

1. Dewar, M. J. S.; Thiel, W. J. Am. Chem. Soc. 1977, 99, 4899.
2. Thiel, W.; Voityuk, A. A. Theor. Chim. Acta 1992, 81, 391.
3. Thiel, W.; Voityuk, A. A. Theor. Chim. Acta 1996, 93, 315.
4. Thiel, W.; Voityuk, A. A. Int. J. Quantum Chem. 1994, 44, 807.
5. Thiel, W.; Voityuk, A. A. J. Mol. Struct. 1994, 313, 141.
6. Thiel, W.; Voityuk, A. A. J. Phys. Chem. 1996, 100, 616.
7. Born, M.; Oppenheimer, J. R. Ann. Phys. (Leipzig) 1927, 84, 457.
8. Schrödinger, E. Phys. Rev. 1926, 28, 1049.
9. Hartree, D. R. Proc. Cambridge Phil. Soc. 1928, 24, 89, 111, 426.
10. Fock, V. Z. Phys. 1930, 61, 126.
11. Pauli, W. Z. Phys. 1925, 31, 765.
12. Slater, J. C. Phys. Rev. 1929, 34, 1293; 1930, 35, 509.
13. Hückel, E. Z. Phys. 1931, 70, 204; 1931, 72, 310; 1932, 76, 628; 1933, 83, 632.
14. Sinanoglu, O.; Fu-Tai Tan, D. Chem. Phys. 1963, 38, 1740.
15. Clark, T.; Koch, R. The Chemist’s Electronic Book of Orbitals, Springer-Verlag, Berlin, 1999.
16. Winget, P.; Selçuki, C.; Horn, A. H. C.; Martin, B.; Clark, T. Theor. Chem. Acc. 2003, 110, 254.
17. Horn, A. H. C.; Lin, J.-H.; Clark, T. Theor. Chem. Acc. 2005, 114, 159–168; erratum: Theor. Chem. Acc. 2007, 117, 461–465.
18. Sattelmeyer, K. W.; Tubert-Brohmann, I.; Jørgensen, W. L. J. Chem. Theor. Comput. 2006, 2, 413.

19. Kolb, M.; Thiel, W. J. Comput. Chem. 1993, 14, 775.
20. Weber, W.; Thiel, W. Theor. Chem. Acc. 2000, 103, 495.
21. Mohle, K.; Hofmann, H.-J.; Thiel, W. J. Comput. Chem. 2001, 22, 509.
22. Scholten, M. Ph.D. dissertation, Heinrich-Heine-Universität, Düsseldorf, Germany, 2003.
23. Burstein, K. Y.; Isaev, A. N. Theor. Chim. Acta 1984, 64, 397.
24. Csonka, G. I.; Ángyán, J. G. J. Mol. Struct. (Theochem) 1997, 393, 31.
25. Voityuk, A. A.; Rösch, N. J. Phys. Chem. A 2000, 104, 4089.
26. Stewart, J. J. P.; Császár, P.; Pulay, P. J. Comput. Chem. 1982, 3, 227.
27. Ahlrichs, R.; Penco, R.; Scoles, G. Chem. Phys. 1977, 19, 119.
28. Grimme, S. J. Comput. Chem. 2004, 25, 1463.
29. Jurecka, J.; Cerny, J.; Hobza, P.; Salahub, D. J. Comput. Chem. 2007, 28, 555.
30. Cerny, J.; Jurecka, J.; Hobza, P.; Valdes, H. J. Phys. Chem. A 2007, 111, 1146.
31. Elstner, M.; Hobza, P.; Frauenheim, T.; Suhai, S.; Kaxiras, E. J. Chem. Phys. 2001, 114, 5149.
32. Tuttle, T.; Thiel, W. Phys. Chem. Chem. Phys. 2008, 10, 2159.
33. McNamara, J. P.; Hillier, I. H. Phys. Chem. Chem. Phys. 2007, 9, 2362.
34. Stewart, J. J. P. J. Mol. Model. 2007, 13, 1173.
35. Rinaldi, D.; Rivail, J.-L. Theor. Chim. Acta 1973, 32, 57; 1974, 32, 243.
36. Schürer, G.; Gedeck, P.; Gottschalk, M.; Clark, T. Int. J. Quantum Chem. 1999, 75, 17.
37. Martin, B.; Gedeck, P.; Clark, T. Int. J. Quantum Chem. 2000, 77, 473.
38. Eisenschitz, R.; London, F. Z. Phys. 1930, 60, 491.
39. Martin, B.; Clark, T. Int. J. Quantum Chem. 2006, 106, 1208.
40. Yang, W. Phys. Rev. Lett. 1991, 66, 1438.
41. Dixon, S. L.; Merz, K. M., Jr. J. Chem. Phys. 1997, 107, 879.
42. Ababoua, A.; van der Vaart, A.; Gogonea, V.; Merz, K. M., Jr. Biophys. Chem. 2007, 125, 221.
43. Stewart, J. J. P. Int. J. Quantum Chem. 1996, 58, 133.
44. Stewart, J. J. P. J. Mol. Model. 2009, 15, 765.
45. Pople, J. A.; Santry, D. P.; Segal, G. A. J. Chem. Phys. 1965, 43, S129.
46. Pople, J. A.; Beveridge, D. L.; Dobosh, P. A. J. Chem. Phys. 1967, 47, 2026.
47. Kramida, A. E.; Martin, W. C.; Musgrove, A.; Olsen, K.; Reader, J.; Saloman, E. B. http://physics.nist.gov/cgi-bin/ASBib1/Elevbib/search_form.cgi, 2009.
48. Afeefy, H. Y.; Liebman, J. F.; Stein, S. E. Neutral thermochemical data. In NIST Chemistry WebBook, Linstrom, P. J., and Mallard, W. G., Eds., NIST Standard Reference 69, National Institute of Standards and Technology, Gaithersburg, MD, 2003. Available at http://webbook.nist.gov/chemistry.
49. Levin, R. D.; Lias, S. G. Ionization Potentials and Appearance Potential Measurements, National Standards Reference Data Series, Vol. 71, National Bureau of Standards, Washington, DC, 1982.
50. Allen, F. H. Acta Crystallogr. B 2007, 58, 380.


51. Stewart, J. J. P. Parameterization of semiempirical M.O. methods. In Encyclopedia of Computational Chemistry, Vol. 3, Schleyer, P. v. R., Allinger, N. L., Clark, T., Gasteiger, J., Kollman, P. A., Schaefer, H. F. S., III, and Schreiner, P. R., Eds., Wiley, Chichester, UK, 2000.
52. Dewar, M. J. S.; Thiel, W. J. Am. Chem. Soc. 1977, 99, 4907.
53. Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. J. Am. Chem. Soc. 1985, 107, 3902.
54. Stewart, J. J. P. J. Comput. Chem. 1989, 10, 209.
55. Stewart, J. J. P. J. Comput. Chem. 1989, 10, 221.
56. Rocha, G. B.; Freire, R. O.; Simas, A. M.; Stewart, J. J. P. J. Comput. Chem. 2006, 27, 1101.
57. Bingham, R. C.; Dewar, M. J. S.; Lo, D. H. J. Am. Chem. Soc. 1975, 97, 1285.
58. Stewart, J. J. P. J. Comput. Chem. 1991, 12, 320.
59. Stewart, J. J. P. J. Mol. Model. 2004, 10, 155.
60. (a) Ditchfield, R.; Hehre, W. J.; Pople, J. A. J. Chem. Phys. 1971, 54, 724. (b) Hehre, W. J.; Ditchfield, R.; Pople, J. A. J. Chem. Phys. 1972, 56, 2257. (c) Hariharan, P. C.; Pople, J. A. Mol. Phys. 1974, 27, 209. (d) Gordon, M. S. Chem. Phys. Lett. 1980, 76, 163. (e) Hariharan, P. C.; Pople, J. A. Theor. Chim. Acta 1973, 28, 213. (f) Blaudeau, J.-P.; McGrath, M. P.; Curtiss, L. A.; Radom, L. J. Chem. Phys. 1997, 107, 5016. (g) Francl, M. M.; Pietro, W. J.; Hehre, W. J.; Binkley, J. S.; DeFrees, D. J.; Pople, J. A.; Gordon, M. S. J. Chem. Phys. 1982, 77, 3654. (h) Binning, R. C., Jr.; Curtiss, L. A. J. Comput. Chem. 1990, 11, 1206. (i) Rassolov, V. A.; Pople, J. A.; Ratner, M. A.; Windus, T. L. J. Chem. Phys. 1998, 109, 1223. (j) Rassolov, V. A.; Ratner, M. A.; Pople, J. A.; Redfern, P. C.; Curtiss, L. A. J. Comput. Chem. 2001, 22, 976. (k) Frisch, M. J.; Pople, J. A.; Binkley, J. S. J. Chem. Phys. 1984, 80, 3265.
61. Winget, P.; Horn, A. H. C.; Selçuki, C.; Martin, B.; Clark, T. J. Mol. Model. 2004, 9, 408.
62. Winget, P.; Clark, T. J. Mol. Model. 2005, 11, 439.
63. Kayi, H.; Clark, T. J. Mol. Model. 2007, 13, 965.
64. Kayi, H.; Clark, T. J. Mol. Model. 2009, 15, 295.
65. Kayi, H.; Clark, T. J. Mol. Model. 2009, 15, 1253.
66. Kayi, H.; Clark, T. J. Mol. Model. 2010, 16, 29.
67. Koslowski, A.; Beck, M. E.; Thiel, W. J. Comput. Chem. 2003, 24, 714–726.
68. Thiel, W. Quantum Chemistry Program Exchange, QCPE 438, University of Indiana, Bloomington, IN, 1982.
69. Dewar, M. J. S.; Jie, C.; Yu, J. Tetrahedron 1993, 49, 5003.
70. Repasky, M. P.; Chandrasekhar, J.; Jørgensen, W. L. J. Comput. Chem. 2002, 23, 1601.
71. Klopman, G. J. Am. Chem. Soc. 1964, 86, 4550.
72. Ohno, K. Theor. Chim. Acta 1964, 3, 219.
73. Oleari, L.; DiSipio, L.; DeMichelis, G. Mol. Phys. 1966, 10, 97.
74. Dewar, M. J. S.; Lo, D. H. J. Am. Chem. Soc. 1972, 94, 5296.
75. See Karplus, M.; Kuppermann, A.; Isaacson, L. M. J. Chem. Phys. 1958, 29, 1240.


76. Murray, J. S.; Politzer, P. J. Mol. Struct. (Theochem) 1998, 425, 107; Murray, J. S.; Lane, P.; Brinck, T.; Paulsen, K.; Grince, M. E.; Politzer, P. J. Phys. Chem. 1993, 97, 9369.
77. Horn, A. H. C.; Lin, J.-H.; Clark, T. Theor. Chem. Acc. 2005, 114, 159; erratum: Theor. Chem. Acc. 2007, 117, 461.
78. Pariser, R.; Parr, R. G. J. Chem. Phys. 1963, 21, 466.
79. See, e.g., Griffiths, J. Dyes Pigments 1982, 3, 211.
80. Ridley, J.; Zerner, M. C. Theor. Chim. Acta 1973, 32, 111.
81. Zerner, M. C. In Reviews of Computational Chemistry, Vol. 2, Lipkowitz, K. B., Ed., VCH, New York, 1991, p. 313.
82. Cory, M. G.; Zerner, M. C.; Hu, X.; Schulten, K. J. Phys. Chem. B 1998, 102, 7640.
83. Clark, T.; Chandrasekhar, J. Israel J. Chem. 1993, 33, 435.
84. Göller, A.; Grummt, U. W. Int. J. Quantum Chem. 2000, 77, 727.
85. Beierlein, F. R.; Othersen, O. G.; Lanig, H.; Schneider, S.; Clark, T. J. Am. Chem. Soc. 2006, 128, 5142.
86. Rusu, C.; Lanig, H.; Clark, T.; Kryschi, C. J. Phys. Chem. B 2008, 112, 2445.
87. Clark, T. In Molecular Informatics: Confronting Complexity, Hicks, M. G., and Kettner, C., Eds., Logos Verlag, Berlin, 2003, p. 193.
88. Politzer, P.; Murray, J. S.; Concha, M. J. Mol. Model. 2008, 14, 659.
89. Clark, T.; Byler, K. G.; de Groot, M. J. In Molecular Interactions: Bringing Chemistry to Life, Hicks, M. G., and Kettner, C., Eds., Logos Verlag, Berlin, 2008, p. 129.
90. Beck, B.; Horn, A. H. C.; Carpenter, J. E.; Clark, T. J. Chem. Inf. Comput. Sci. 1998, 38, 1214.
91. Murray-Rust, P.; Rzepa, H. S.; Stewart, J. J. P.; Zhang, Y. J. Mol. Model. 2005, 11, 532.
92. Boltzmann, L. Einige allgemeine Sätze über das Wärmegleichgewicht, Vienna, Austria, 1871.
93. Boltzmann, L. Creeles J. 1884, 98, 68.
94. Lee, M.; Tang, J.; Hochstrasser, R. M. Chem. Phys. Lett. 2001, 344, 501.
95. http://www.pdb.org/, Research Collaboratory for Structural Bioinformatics, The San Diego Supercomputer Center, San Diego, CA, 2007.
96. Stewart, J. J. P. J. Mol. Model. 2008, 15, 765.
97. Gogonea, V.; Merz, K. M., Jr. J. Phys. Chem. A 1999, 103, 5171.
98. For a review, see van der Vaart, A.; Gogonea, V.; Dixon, S. L.; Merz, K. M., Jr. J. Comput. Chem. 2000, 21, 1494.
99. Wollacott, A. M.; Merz, K. M., Jr. J. Chem. Theor. Comput. 2007, 3, 1609.
100. Rezac, J.; Fanfrlik, J.; Salahub, D.; Hobza, P. J. Chem. Theor. Comput. 2009, 5, 1749.
101. Jurecka, P.; Sponer, J.; Cerny, J.; Hobza, P. Phys. Chem. Chem. Phys. 2006, 8, 1985.

9

Self-Consistent-Charge Density Functional Tight-Binding Method: An Efficient Approximation of Density Functional Theory

MARCUS ELSTNER and MICHAEL GAUS
Institute of Physical Chemistry, Universität Karlsruhe, Karlsruhe, Germany; Institute for Physical and Theoretical Chemistry, Technische Universität Braunschweig, Braunschweig, Germany

In this chapter we describe the derivation of the approximate DFT method SCC-DFTB from DFT. The basic formalism of SCC-DFTB results from a second-order expansion of the DFT total energy, followed by appropriate approximations. The formal basis of SCC-DFTB is the non-self-consistent Harris functional. We discuss the performance of SCC-DFTB as well as recent extensions such as the inclusion of third-order terms and van der Waals corrections.

9.1 INTRODUCTION

Most semiempirical (SE) methods are derived from either Hartree–Fock (HF) theory or density functional theory (DFT) by applying two types of approximations. First, they are based primarily on a minimal atomic orbital-like basis set. Second, the numerous integrals that have to be evaluated in HF and DFT are partially neglected, and the remaining ones are either calculated using further approximations or replaced by parameters, which in turn are fitted to reproduce experimental data. As a result, no integrals have to be evaluated during the runtime of the program, and the dominant computational cost is given by the diagonalization of the Fock (Hamilton) matrix. Since this matrix is represented in a minimal atomic basis set, solution of the eigenvalue problem is much less expensive than for full DFT and HF methods, which usually

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


apply more extended basis sets. Typically, SE methods are about three orders of magnitude faster than HF/DFT methods using double-zeta basis sets. They exhibit O(N³) scaling behavior; that is, the computing time increases cubically with the system size (which is roughly proportional to the number of atoms or, more correctly, to the number of electrons N). Since DFT also scales as O(N³), the factor of 1000 gained in computational speed with respect to DFT means that systems about 10-fold larger can be treated. For example, today about 100 atoms can be handled by DFT on standard desktop PCs, whereas roughly 1000 atoms can be treated using SE methods. The bottleneck here is the diagonalization of the Fock–Hamilton matrix, and methods that avoid this step, such as O(N) scaling algorithms,1 help to increase the system size dramatically, as discussed in Chapters 2 and 8.

However, in many cases the system size is not the limiting issue. Chemistry often occurs in localized regions, and the “active site” of interest often contains only a few tens to 100 atoms; that is, a quantum mechanical (QM) treatment is needed only for this small subsystem (this often applies in biological systems). The remainder of the system can be treated by empirical potentials [molecular mechanics (MM)]. A combination of QM methods with MM force fields in QM/MM methods can now be applied routinely (for recent comprehensive reviews, see, e.g., Refs. 2 and 3). A major issue, however, is the time scale that can be reached using molecular dynamics (MD) simulations. HF and DFT make it possible to follow the system dynamics (for several tens of atoms) in the picosecond regime. In this case, the factor of 1000 gained in computational speed by SE methods allows for 1000-fold longer MD simulations (i.e., the nanosecond time scale is easily accessible).
In many applications, this helps to follow the relevant conformational changes or, much more important, to compute free-energy changes along reaction pathways.4 This is probably the main reason why SE methods have been used increasingly in recent years, although they sacrifice accuracy compared to DFT in many cases (note that this can be reversed for specific applications).

In quantum chemistry, the classical route to deriving SE methods is to start from HF theory and fit the remaining parameters (integrals) to experimental data. This approach leads to a family of SE methods, with MNDO, AM1, and PM3 being the best known. The latest and most accurate members of this family are discussed by Clark and Stewart in Chapter 8. In solid-state physics, tight-binding (TB) approaches have been used extensively to study the properties of solids and clusters,5,6 directly paralleling the development of the Hückel model in chemistry; these methods are reviewed in Chapter 10. Standard tight-binding methods are usually based on the Harris functional approach7 (i.e., they diagonalize a suitable Hamiltonian once and use this non-self-consistent solution to derive further properties, such as forces and second derivatives). The relation of DFT and TB methods has been discussed in detail by Foulkes and Haydock.8 TB methods can be understood as a stationary approximation to DFT and tend to work well when the “guess” density, which is incorporated into the predetermined Hamilton matrix, is a good approximation to the DFT ground-state density.
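The scaling argument made earlier in this section, that a 1000-fold speedup buys roughly 10-fold larger systems under cubic scaling but 1000-fold longer MD trajectories at fixed system size, can be checked with a few lines of arithmetic. The function names below are hypothetical, and the starting values (100 atoms, a 1 ps trajectory) are only the rough orders of magnitude quoted in the text.

```python
def size_gain(speedup, scaling_exponent=3):
    """Factor by which the system size can grow at constant computing time.

    If cost ~ N**scaling_exponent, a speedup s allows N to grow by
    s**(1/scaling_exponent).
    """
    return speedup ** (1.0 / scaling_exponent)


def trajectory_gain(speedup):
    """Factor by which an MD trajectory can be extended at fixed system size.

    The cost of MD is linear in the number of time steps, so the gain in
    simulated time equals the speedup itself.
    """
    return speedup


SPEEDUP = 1000.0             # SE versus DFT, order of magnitude from the text
atoms_dft = 100              # rough DFT limit on a desktop PC (from the text)
trajectory_dft_ps = 1.0      # assumed picosecond-scale DFT trajectory

atoms_se = atoms_dft * size_gain(SPEEDUP)
trajectory_se_ps = trajectory_dft_ps * trajectory_gain(SPEEDUP)

print(round(atoms_se), "atoms;", trajectory_se_ps, "ps")
```

The contrast between the cube root and the linear factor is the reason the text identifies accessible time scale, rather than system size, as the main practical benefit of SE methods.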


SCC-DFTB is an approximate quantum chemical method that is derived from DFT by a second-order expansion of the DFT total energy with respect to density fluctuations around a suitable reference density.9 Alternatively, SCC-DFTB can be viewed as an extension of a tight-binding method that includes charge self-consistency and is parameterized using DFT. The energy in tight-binding methods consists of two parts: electronic and repulsive. The electronic part is described by a Hamiltonian, which is usually represented in a minimal basis of atom-centered basis functions. In DFTB, this Hamilton matrix is derived from DFT, using as a reference density the superposition of neutral atomic densities and a minimal basis of atomic wavefunctions, which is calculated explicitly.10–14 The repulsive energy, which consists of the DFT double-counting contributions and the core–core repulsion, can be approximated as a sum of atomic pair repulsion functions.

SCC-DFTB is parameterized using the generalized-gradient approximation (GGA). In the current version, the electronic parameters are calculated using the PBE functional.15 This means, however, that the well-known DFT-GGA deficiencies are inherited by SCC-DFTB. Of particular relevance are the DFT-GGA tendency to overpolarize extended π-conjugated systems,16 the problems with ionic and charge-transfer excited states,17 and the missing dispersion interactions, which have been included by augmenting SCC-DFTB with an empirical extension.18 The performance and deficiencies of SCC-DFTB with respect to biological applications have been reviewed recently,19,20 and methodological developments have been described elsewhere.21

9.2 THEORY

The derivation of SCC-DFTB starts from the DFT total energy. In a first step, we discuss the Harris functional approximation as the basis for non-self-consistent TB methods. In a second step, second-order corrections to Harris functional theory are introduced, leading after further approximations to the SCC-DFTB formalism. Finally, the remaining approximations, the performance, and possible extensions of this methodology are discussed.

9.2.1 DFT and the Harris Functional

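The DFT total energy reads

$$E[\rho] = T[\rho] + \int v^{\mathrm{ext}}(\mathbf{r})\,\rho(\mathbf{r})\,d\mathbf{r} + \frac{1}{2}\iint \frac{\rho(\mathbf{r})\,\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' + E^{\mathrm{xc}}[\rho] + \frac{1}{2}\sum_{\alpha\beta}\frac{Z_\alpha Z_\beta}{R_{\alpha\beta}} \tag{9.1}$$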

where $\rho(\mathbf{r})$ is the electron density, $T[\rho]$ the kinetic energy of the electrons, $v^{\mathrm{ext}}$ the external potential arising from the nuclei with charge $Z$, and $E^{\mathrm{xc}}[\rho]$ the exchange-correlation energy. Application of the variational principle leads to the well-known Kohn–Sham (KS) equations,

$$\left[-\tfrac{1}{2}\nabla^2 + v^{\mathrm{eff}}[\rho]\right]\phi_i = \varepsilon_i\,\phi_i \tag{9.2}$$

with $v^{\mathrm{eff}}[\rho]$ being the KS effective potential, which determines the KS eigenvalues (molecular orbital energies) $\varepsilon_i$ and KS (molecular) orbitals $\phi_i$. Since $v^{\mathrm{eff}}[\rho]$ already contains the electron density, which is calculated as

$$\rho = \sum_i |\phi_i|^2 \tag{9.3}$$

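these equations have to be solved iteratively until self-consistency is achieved. Using the Kohn–Sham energies $\varepsilon_i$, the total energy can be written22

$$E[\rho] = \sum_i^{\mathrm{occ}} \varepsilon_i - \frac{1}{2}\iint \frac{\rho(\mathbf{r})\,\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' + E^{\mathrm{xc}}[\rho] - \int v^{\mathrm{xc}}(\mathbf{r})\,\rho(\mathbf{r})\,d\mathbf{r} + \frac{1}{2}\sum_{\alpha\beta}\frac{Z_\alpha Z_\beta}{R_{\alpha\beta}} \tag{9.4}$$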

In the Harris-functional approach,7 an initial density $\rho^0$ is constructed as a superposition of fragment densities $\rho^0_\alpha$,

$$\rho^0 = \sum_\alpha \rho^0_\alpha \tag{9.5}$$

and it can be shown that the total energy can be approximated in first order as

$$E[\rho] = \sum_i^{\mathrm{occ}} \varepsilon_i^{H} - \frac{1}{2}\iint \frac{\rho^0(\mathbf{r})\,\rho^0(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' + E^{\mathrm{xc}}[\rho^0] - \int v^{\mathrm{xc}}(\mathbf{r})\,\rho^0(\mathbf{r})\,d\mathbf{r} + \frac{1}{2}\sum_{\alpha\beta}\frac{Z_\alpha Z_\beta}{R_{\alpha\beta}} \tag{9.6}$$

where the $\varepsilon_i^{H}$ are determined from Eq. (9.2) using $\rho^0$ instead of the true density $\rho$, which would have to be determined self-consistently by iterating Eqs. (9.2) and (9.3). Any DFT method has to be initialized by choosing a proper initial density $\rho^0$, which is usually taken as a superposition of atomic densities. As pointed out by Harris,7 the KS equations (9.2) do not have to be solved iteratively if the starting density $\rho^0$ is close to the ground-state density $\rho_G$ (introducing an error of second order in the difference density $\delta\rho = \rho - \rho^0$). This non-self-consistent solution of the KS equations is the basis of the Harris functional approach, and proper implementation boils down to the question of how to find a good starting density $\rho^0$, which has been elaborated in particular in TB theory.


9.2.2 Non-Self-Consistent TB Methods

To get started, consider a case where one already knows the ground-state density $\rho^0$ to sufficient accuracy. In this case, one can omit the self-consistent solution of the KS equations and get the orbitals immediately through

$$\left[-\tfrac{1}{2}\nabla^2 + v^{\mathrm{eff}}[\rho^0]\right]\phi_i = \varepsilon_i\,\phi_i \tag{9.7}$$

($\rho^0$ stands for a properly chosen input density in the following). This alone saves a factor of 5 to 10 in computing time; more important, it is the starting point for further approximations. Consider a minimal basis set consisting of atomic orbitals: that is, $\eta_\mu = 2s, 2p_x, 2p_y,$ and $2p_z$ for first-row elements (core orbitals are usually omitted) and $\eta_\mu = 1s$ for H. With the basis set expansion

$$\phi_i = \sum_\mu c_{\mu i}\,\eta_\mu$$

and the Hamiltonian $\hat{H}[\rho^0] = \hat{T} + v^{\mathrm{eff}}[\rho^0]$ we find that

$$\sum_\mu c_{\mu i}\,\hat{H}[\rho^0]\,|\eta_\mu\rangle = \varepsilon_i \sum_\mu c_{\mu i}\,|\eta_\mu\rangle \tag{9.8}$$

Multiplication with $\langle\eta_\nu|$ leads to

$$\sum_\mu c_{\mu i}\,\langle\eta_\nu|\hat{H}[\rho^0]|\eta_\mu\rangle = \varepsilon_i \sum_\mu c_{\mu i}\,\langle\eta_\nu|\eta_\mu\rangle \tag{9.9}$$

or equivalently, in matrix notation,

$$H^0 C = S C \varepsilon \tag{9.10}$$

This means that we just have to solve the eigenvalue equation once; that is, we have to diagonalize the Hamilton matrix $H^0_{\mu\nu} = \langle\eta_\nu|\hat{H}[\rho^0]|\eta_\mu\rangle$. The superscript zero indicates that the matrix elements are evaluated using the reference density $\rho^0$. Diagonalization leads to the one-particle energies $\varepsilon_i$, that is, to the electronic energy:

$$E^{\mathrm{elec}} = \sum_i \varepsilon_i \tag{9.11}$$
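As an illustration of Eqs. (9.10) and (9.11), the following minimal sketch solves the generalized eigenvalue problem for a two-orbital model in closed form. The matrix entries are hypothetical numbers chosen only for illustration, and the occupation of the lowest level by two electrons is an assumption of the example, not something prescribed by the equations above.

```python
import math


def generalized_eigenvalues_2x2(h, s):
    """Eigenvalues eps of H C = S C eps for symmetric 2x2 matrices H and S.

    They are the roots of det(H - eps * S) = 0, which is a quadratic in eps.
    """
    h11, h12, h22 = h[0][0], h[0][1], h[1][1]
    s11, s12, s22 = s[0][0], s[0][1], s[1][1]
    a = s11 * s22 - s12 * s12
    b = -(h11 * s22 + h22 * s11 - 2.0 * h12 * s12)
    c = h11 * h22 - h12 * h12
    root = math.sqrt(b * b - 4.0 * a * c)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])


# Hypothetical two-orbital model (atomic units, illustration only)
H0 = [[-0.5, -0.4], [-0.4, -0.5]]   # Hamilton matrix evaluated at the reference density
S = [[1.0, 0.5], [0.5, 1.0]]        # overlap matrix of the nonorthogonal basis

eps = generalized_eigenvalues_2x2(H0, S)

# Electronic energy as a sum over occupied one-particle energies;
# the lowest level is assumed to hold two electrons.
e_elec = 2.0 * eps[0]
```

For this symmetric model the roots agree with the textbook closed form $(\alpha \pm \beta)/(1 \pm s)$, where $\alpha$ and $\beta$ denote the diagonal and off-diagonal Hamilton matrix elements and $s$ the overlap. For larger matrices one would use a library routine for the generalized symmetric eigenvalue problem instead.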

Note that the basis set is nonorthogonal; that is, the overlap matrix $S_{\mu\nu} = \langle\eta_\nu|\eta_\mu\rangle$ appears in the eigenvalue equations. In such a scheme, the Hamilton and overlap matrix elements have to be determined. Effectively, the Hamilton matrix elements can be fitted to reproduce properties of well-chosen benchmark systems. Goringe et al.5 and Colombo6 discuss several examples. Since the general form of the Hamilton operator is always known, fitting determines implicitly a proper starting density, as pointed out by Foulkes and Haydock.8 The overlap matrix, however, is difficult to obtain if matrix elements are not computed from first principles but are fitted to experimental data. Therefore, orthogonal TB methods are usually employed.

9.2.2.1 Orthogonal Empirical Tight Binding (ETB) or Hückel Theory

In empirical schemes, the basis functions are taken to be orthogonal (i.e., $S_{\mu\nu} = \delta_{\mu\nu}$). The background is the Löwdin orthogonalization, where we get orthonormal orbitals through

$$\eta' = S^{-1/2}\,\eta$$

Introducing orthonormal orbitals means multiplying Eq. (9.10) from the left with $S^{-1/2}$ and inserting a "1" (i.e., $S^{-1/2}S^{1/2}$):

$$S^{-1/2} H S^{-1/2}\, S^{1/2} C = S^{-1/2} S^{1/2}\, S^{1/2} C \varepsilon$$

to get the orthonormal equations ($C' = S^{1/2} C$, $H' = S^{-1/2} H S^{-1/2}$):

$$H' C' = C' \varepsilon$$

Introducing orthonormal orbitals means effectively changing the Hamiltonian. This is convenient, since in empirical schemes the Hamilton matrix is completely fitted to empirical data: for example, for carbon to the solid-state band structures of several crystal structures (e.g., diamond, graphite, body-centered cubic) or, in Hückel theory, to properties of hydrocarbons.5,6

9.2.2.2 Density Functional Tight Binding (DFTB)

The derivation of parameters via fitting is a quite involved process. If one could derive the parameters from DFT calculations, one would gain much more flexibility and a simplified parameterization scheme. In a first step, one has to choose a basis set. In TB theory, basis functions are atomic orbitals $\eta_\mu$, and these can be calculated from the atomic KS equations:

$$\left[-\tfrac{1}{2}\nabla^2 + v^{\mathrm{eff}}[\rho^{\mathrm{atom}}]\right]\eta_\mu = \varepsilon_\mu\,\eta_\mu \tag{9.12}$$

The choice of a basis is to a large degree arbitrary, and several functional forms have been applied in quantum chemistry. Atomic orbitals have the disadvantage that they are very diffuse compared to the bonding situation in solids, molecules, or clusters, where atomiclike orbitals would be "compressed" due to interaction with the neighbors. Therefore, it would be wise to use orbitals that anticipate this interaction/compression to some degree. One way to enforce this is to add an additional (harmonic) potential to the atomic Kohn–Sham equations, which leads to compressed atomic orbitals or optimized atomic orbitals (O-LCAO), as introduced by Eschrig23:

$$\left[-\tfrac{1}{2}\nabla^2 + v^{\mathrm{eff}}[\rho^{\mathrm{atom}}] + \left(\frac{r}{r^0}\right)^2\right]\eta_\mu = \varepsilon_\mu\,\eta_\mu \tag{9.13}$$

A measure of the distance between neighbors is given by the covalent radius $r^0$ and is determined for all atoms empirically. This parameter enters the evaluation of the matrix elements and is, of course, of empirical nature. As a result of the atomic calculations, we get the orbitals $\eta_\mu$, the electron density at (the charge neutral) atom $\alpha$,

$$\rho^0_\alpha = \sum_\mu |\eta_\mu|^2 \tag{9.14}$$

and the overlap matrix $S_{\mu\nu} = \langle\eta_\nu|\eta_\mu\rangle$. To solve the eigenvalue problem in Eq. (9.9) or (9.10), we only need the Hamiltonian matrix. This leads to further approximations, since although we have the complete input density $\rho^0 = \sum_\alpha \rho^0_\alpha$, the Hamiltonian evaluation would be very complicated:

$$H_{\mu\nu} = \langle\eta_\nu|\hat{H}[\rho^0]|\eta_\mu\rangle = \Big\langle\eta_\nu\Big|\hat{H}\Big[\sum_\alpha \rho^0_\alpha\Big]\Big|\eta_\mu\Big\rangle$$

We therefore usually make the two-center approximation for $\mu \neq \nu$:

$$H_{\mu\nu} = \langle\eta_\nu|\hat{H}[\rho^0]|\eta_\mu\rangle = \langle\eta_\nu|\hat{H}[\rho^0_\alpha + \rho^0_\beta]|\eta_\mu\rangle \tag{9.15}$$

where the orbital $\nu$ is located on atom $\alpha$ and the orbital $\mu$ is located on atom $\beta$. The diagonal Hamiltonian elements $H_{\mu\mu} = \varepsilon_\mu$ are taken from Eq. (9.13). The two-center approximation neglects two types of integrals that contain contributions of the density $\rho_\gamma$ of a third atom. The terms that would enter the diagonal elements $H_{\mu\mu}$ are crystal field terms, while the terms missing in the off-diagonal elements $H_{\mu\nu}$ are three-center terms. These approximations are discussed in detail elsewhere.24,25 As can be shown, the neglect of crystal field terms becomes more severe for short interatomic distances, which, however, may be compensated for by a properly chosen repulsive potential.25 The missing crystal field terms may also be responsible for errors in the cohesive energies for highly coordinated systems, as has been described for some bulk silicon systems.26 In the context of semiempirical MO theory, the neglect of three-center terms has been discussed as being responsible for an underestimation of rotational barriers. In DFTB, this may have a similar consequence: rotational barriers are slightly underestimated, which manifests itself in an underestimation of the frequencies of the low-lying vibrational modes.

In DFTB,10–13 $H_{\mu\nu}$ and $S_{\mu\nu}$ are tabulated for various distances between atom pairs up to 10 a.u., where they vanish (the compression also contributes to this). For any molecular geometry, these matrix elements are read in based on the distance between two atoms and then oriented in space using the Slater–Koster sin/cos combination rules27 (see, e.g., Ref. 6). Then the generalized eigenvalue problem (9.10) is solved and the electronic part of the energy, $E^{\mathrm{elec}}$, from Eq. (9.11) can be calculated. It should be emphasized that this is a nonorthogonal TB scheme, which is more transferable, due to the appearance of the overlap matrix.

9.2.3 Repulsive Energy $E^{\mathrm{rep}}$

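Up to now, we have only discussed the first part of the total energy in DFT in Eq. (9.6), the sum over the Kohn–Sham energies $\varepsilon_i^{H}$ as calculated in Eq. (9.11):

$$E[\rho] = \sum_i^{\mathrm{occ}} \varepsilon_i^{H} - \frac{1}{2}\iint \frac{\rho^0(\mathbf{r})\,\rho^0(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' + E^{\mathrm{xc}}[\rho^0] - \int v^{\mathrm{xc}}(\mathbf{r})\,\rho^0(\mathbf{r})\,d\mathbf{r} + \frac{1}{2}\sum_{\alpha\beta}\frac{Z_\alpha Z_\beta}{R_{\alpha\beta}} \tag{9.16}$$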

In TB theory, the remaining terms, the DFT double-counting and core–core repulsion terms, are put together into an energy term called the repulsive energy, $E^{\mathrm{rep}}$, so that the TB total energy reads:

$$E^{\mathrm{TB}}[\rho] = \sum_i^{\mathrm{occ}} \varepsilon_i^{H} + E^{\mathrm{rep}}[\rho] \tag{9.17}$$

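First, it is interesting to note that the double-counting terms depend on the input/reference density $\rho^0$ only. If we introduce the atomic density decomposition, $\rho^0 = \sum_\alpha \rho^0_\alpha$, where the atomic densities are computed according to Eq. (9.14), the Coulomb contributions

$$\frac{1}{2}\sum_{\alpha\beta}\left[\frac{Z_\alpha Z_\beta}{R_{\alpha\beta}} - \iint \frac{\rho^0_\alpha(\mathbf{r})\,\rho^0_\beta(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}'\right]$$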

decay exponentially with distance $R_{\alpha\beta}$, since the overlap of the atomic densities decays exponentially. The Coulomb terms therefore can be regarded as a sum of two-body interactions, which is not the case for the exchange–correlation part in Eq. (9.4). Foulkes and Haydock8 suggested applying a cluster expansion,

$$E^{\mathrm{xc}}[\rho^0] = \sum_\alpha E^{\mathrm{xc}}[\rho^0_\alpha] + \frac{1}{2}\sum_{\alpha\beta}\left(E^{\mathrm{xc}}[\rho^0_\alpha + \rho^0_\beta] - E^{\mathrm{xc}}[\rho^0_\alpha] - E^{\mathrm{xc}}[\rho^0_\beta]\right) + \cdots \tag{9.18}$$

The three-center terms are assumed to be small and are neglected. Therefore, the repulsive potential $E^{\mathrm{rep}}$ is approximated as the sum of a set of pairwise atom–atom potentials. Because $\rho^0_\alpha$ corresponds to the charge density of a neutral atom, the electron–electron and nucleus–nucleus repulsions cancel for large interatomic distances. Therefore, $E^{\mathrm{rep}}$ can be assumed to be short-ranged. However, due to the first term on the right-hand side of Eq. (9.18), the repulsive potential does not approach zero for large interatomic distances $R$.28 Because in DFTB $E^{\mathrm{rep}}$ is assumed to be short-ranged anyhow, an additive constant has to be taken into account for some applications (e.g., when computing proton affinities). Early ETB models had the form

$$E^{\mathrm{tot}} = \sum_i \varepsilon_i + \frac{1}{2}\sum_{\alpha\beta} U_{\alpha\beta}$$

with the two-body terms $U_{\alpha\beta}$ being exponentials fitted to reproduce, for example, geometries, vibrational frequencies, and reaction energies of suitable systems. There are various approaches in the literature to treating this repulsive part, including attempts to account for the many-body nature of $E^{\mathrm{rep}}$. In DFTB,

$$E^{\mathrm{rep}}[\rho^0] = \frac{1}{2}\sum_{\alpha\beta} U_{\alpha\beta}$$

is calculated pointwise as follows: To get the repulsive potential for carbon, for example, one could take the carbon dimer C2, stretch its bond, and for each distance calculate the total energy with DFT and the electronic TB part $\sum_i \varepsilon_i$. $U_{\mathrm{CC}}(R_{\mathrm{C-C}})$ is given pointwise for every $R_{\mathrm{C-C}}$ by

$$U_{\mathrm{CC}}(R_{\mathrm{C-C}}) = E^{\mathrm{DFT}}_{\mathrm{tot}}(R_{\mathrm{C-C}}) - \sum_i \varepsilon_i \tag{9.19}$$
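The pointwise construction in Eq. (9.19) amounts to a simple subtraction on a grid of bond lengths. The sketch below illustrates it with made-up numbers; it also applies the shift to zero at the cutoff distance that is described in the text following this equation. All values and function names are hypothetical and serve only to show the bookkeeping; an actual parameterization would additionally fit a smooth polynomial or spline through the points.

```python
def repulsive_points(e_dft, e_elec):
    """Pointwise repulsive potential U(R) = E_DFT(R) - sum_i eps_i(R), Eq. (9.19)."""
    return [ed - ee for ed, ee in zip(e_dft, e_elec)]


def shift_to_zero_at_cutoff(u):
    """Shift the tabulated potential so that its last point (the cutoff) is zero."""
    return [value - u[-1] for value in u]


# Hypothetical data in atomic units (illustration only, not from any parameter set)
distances = [1.1, 1.3, 1.5, 1.7, 1.9]           # bond lengths, last one is the cutoff
e_dft = [0.30, 0.05, -0.02, 0.01, 0.06]         # (shifted) DFT total energies
e_elec = [-0.35, -0.22, -0.16, -0.08, 0.00]     # DFTB electronic energies

u_raw = repulsive_points(e_dft, e_elec)
u_rep = shift_to_zero_at_cutoff(u_raw)

for r, u in zip(distances, u_rep):
    print(f"R = {r:.1f}  U_rep = {u:.3f}")
```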

Since many state crossings appear in the DFT calculations when $R_{\mathrm{C-C}}$ is varied in the carbon dimer, this example becomes more complex in practice. Another possibility is to include information on C–C single, double, and triple bonds.20 Here, DFT calculations are performed for various carbon–carbon distances $R_{\mathrm{C-C}}$ of the molecules ethyne, ethene, and ethane, and the resulting curves are connected. This example is illustrated in Fig. 9.1. The repulsive potential is shifted so that it goes to zero at the cutoff distance. This shift makes the construction of repulsive potentials the most time-consuming part in a new parameterization. The shift affects the atomization energy and, consequently, the heat of formation of a molecule. More important, reaction energies are controlled by the relative shifts of two potentials. Additionally, a potential cannot be shifted arbitrarily, due to restrictions at the cutoff radius. Further restrictions apply to the slope and the curvature of a potential, which are directly connected to the description of bond lengths and harmonic vibrational frequencies.

With this conventional approach, every repulsive potential was individually hand-constructed. For illustration, we take the example of the C–H bond. In practice, one C–H bond of methane is stretched and compressed, and the DFT total energy and DFTB electronic energy are recorded pointwise for a sufficient number of geometries. Then the difference in the energies according to Eq. (9.19) is fitted to a polynomial (or a spline), yielding the desired repulsive potential. At the end, the repulsive potential is shifted in order to match the atomization energy of methane. In practice, the potentials could not be shifted upward sufficiently; therefore, the potentials were constructed to yield a consistent overbinding for every bond type, as noted recently.29

[Figure 9.1: plot of energy (a.u., from −0.4 to 0.4) versus distance (Å, from 1.1 to 1.9) with the curves EDFT, Eel, and Erep.]

Fig. 9.1 $E^{\mathrm{DFT}}$ shows the (shifted) total energy versus C–C distance for HC≡CH, H2C=CH2, and H3C–CH3; $E^{\mathrm{el}}$ represents $\sum_i \varepsilon_i$ + shift for the same structures [the second term on the right-hand side of Eq. (9.19)]; and $E^{\mathrm{rep}}$ is the difference of these two curves.

Recent work has been carried out to find an automated approach. Knaup et al. use a genetic algorithm to reproduce reference forces and reaction barriers.30 Gaus et al. solve a linear equation system containing parameters for the repulsive potentials as unknowns in order to fit them to reference geometries, atomization energies, reaction energies, and vibrational frequencies.31

The resulting DFTB method works very well for homonuclear systems, where charge transfer between the atoms in the system does not occur or is very small. As soon as charge flows between atoms because of an electronegativity difference, the resulting density is no longer well approximated by the superposition of the atomic densities $\rho^0 = \sum_\alpha \rho^0_\alpha$. As examples of the breakdown of the standard non-self-consistent method, the molecules CO2 and formamide have been discussed.9 However, the formalism works very well when the charge flow is small; therefore, an extension will try to start from the non-self-consistent scheme and augment the Hamiltonian with appropriate additional terms.

9.2.4 Second-Order Approximation of the DFT Total Energy: Self-Consistent-Charge Density Functional Tight-Binding Method

The problem with the charge transfer is that the effective Kohn–Sham potentials contain only the neutral reference density $\rho^0$, which does not account for charge transfer between atoms. Let us try a Taylor series expansion (functional expansion) of the potential with the ground-state density $\rho$ around the reference density $\rho^0$:

$$v^{\mathrm{eff}}[\rho] = v^{\mathrm{eff}}[\rho^0] + \int \frac{\delta v^{\mathrm{eff}}[\rho]}{\delta\rho}\,\delta\rho\,d\mathbf{r} \tag{9.20}$$

This potential could be inserted into Eqs. (9.9) and (9.10). The first term on the right-hand side of Eq. (9.20) would lead to the zero-order terms in Eqs. (9.9) and (9.10), $H_{\mu\nu}[\rho^0]$, depending on the reference density, while the second term on the right-hand side of Eq. (9.20) would lead to corrections for charge transfer. In a second step, one would have to find approximations for the functional derivatives. Since we need the total energy and not only the KS equations, it is better to start the functional expansion with the DFT total energy. The SCC-DFTB method is derived from density functional theory (DFT) by a second-order expansion of the DFT total energy functional with respect to the charge-density fluctuations $\delta\rho$ (denoted $\Delta\rho$ in the following) around a given reference density $\rho^0$ [$\rho^{0\prime} = \rho^0(\mathbf{r}')$, $\int' = \int d\mathbf{r}'$]:

$$E = \sum_i^{\mathrm{occ}} \langle\Psi_i|\hat{H}^0|\Psi_i\rangle + \frac{1}{2}\int\!\!\int' \left(\frac{1}{|\mathbf{r}-\mathbf{r}'|} + \left.\frac{\delta^2 E^{\mathrm{xc}}}{\delta\rho\,\delta\rho'}\right|_{\rho^0}\right)\Delta\rho\,\Delta\rho' - \frac{1}{2}\int\!\!\int' \frac{\rho^0\,\rho^{0\prime}}{|\mathbf{r}-\mathbf{r}'|} + E^{\mathrm{xc}}[\rho^0] - \int V^{\mathrm{xc}}[\rho^0]\,\rho^0 + E^{\mathrm{cc}} \tag{9.21}$$

After introducing an LCAO ansatz $\Psi_i = \sum_\mu c_{\mu i}\,\eta_\mu$, the first term becomes

$$\sum_i^{\mathrm{occ}} \langle\Psi_i|\hat{H}^0|\Psi_i\rangle = \sum_{i\mu\nu} c_{\mu i}\,c_{\nu i}\,H^0_{\mu\nu}$$

and can be evaluated as discussed above. The last four terms in Eq. (9.21) depend only on the reference density $\rho^0$ and represent the repulsive energy contribution $E^{\mathrm{rep}}$, as discussed above. Therefore, we only have to deal with the second-order terms. Going from DFTB to SCC-DFTB, the second-order term $E^{\mathrm{2nd}}$ in the charge density fluctuations $\Delta\rho$ [second term in Eq. (9.21)] is approximated by writing $\Delta\rho$ as a superposition of atomic contributions:

$$\Delta\rho = \sum_\alpha \Delta\rho_\alpha$$

To further simplify $E^{\mathrm{2nd}}$, we apply a monopole approximation

$$\Delta\rho_\alpha \approx \Delta q_\alpha\,F^\alpha_{00}\,Y^{00} \tag{9.22}$$

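Basically, $\Delta\rho_\alpha$ is assumed to look like a 1s orbital. $F^\alpha_{00}$ denotes the normalized radial dependence of the density fluctuation on atom $\alpha$, which is constrained (approximated) to be spherical ($Y^{00}$) (i.e., the angular deformation of the charge density change in second order is neglected):

$$E^{\mathrm{2nd}} \approx \frac{1}{2}\sum_{\alpha\beta} \Delta q_\alpha\,\Delta q_\beta \int\!\!\int' \left(\frac{1}{|\mathbf{r}-\mathbf{r}'|} + \left.\frac{\delta^2 E^{\mathrm{xc}}}{\delta\rho\,\delta\rho'}\right|_{\rho^0}\right) F^\alpha_{00}\,F^\beta_{00}\,(Y^{00})^2\,d\mathbf{r}\,d\mathbf{r}' \tag{9.23}$$

This formula looks complicated but has a quite simple curve shape:

• For large distances, $R_{\alpha\beta} = |\mathbf{r}-\mathbf{r}'| \to \infty$, the XC terms vanish and the integral describes the Coulomb interaction of two spherical normalized charge densities, which reduces basically to $1/R_{\alpha\beta}$; that is, we get

$$E^{\mathrm{2nd}} \approx \frac{1}{2}\sum_{\alpha\beta} \frac{\Delta q_\alpha\,\Delta q_\beta}{R_{\alpha\beta}}$$

• For vanishing interatomic distance, $R_{\alpha\beta} = |\mathbf{r}-\mathbf{r}'| \to 0$, the integral describes the electron–electron interaction on atom $\alpha$. We can approximate the integral as

$$E^{\mathrm{2nd}} \approx \frac{1}{2}\,\Delta q_\alpha^2\,\frac{\partial^2 E_\alpha}{\partial q_\alpha^2} = \frac{1}{2}\,\Delta q_\alpha^2\,U_\alpha$$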

$U_\alpha$, known as the Hubbard parameter (which is twice the chemical hardness), describes how much the energy of a system changes upon adding or removing electrons. Now we need a formula $\gamma$ to interpolate between these two cases. A very similar situation appears in semiempirical quantum chemical methods such as MNDO, AM1, or PM3, where $\gamma$ has a simple form, as given, for example, by the Klopman–Ohno approximation,

$$\gamma_{\alpha\beta} = \frac{1}{\sqrt{R_{\alpha\beta}^2 + 0.25\,(1/U_\alpha + 1/U_\beta)^2}} \tag{9.24}$$
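The two limiting cases that the interpolation formula has to reproduce can be verified directly for the Klopman–Ohno form of Eq. (9.24). The short sketch below does this numerically; the function name is illustrative, and the carbon Hubbard value is the one quoted in the caption of Fig. 9.2. Note that this is the Klopman–Ohno expression only, not the γ function actually used in SCC-DFTB, which is derived in the text that follows.

```python
import math


def gamma_klopman_ohno(r_ab: float, u_a: float, u_b: float) -> float:
    """Klopman-Ohno interpolation, Eq. (9.24); all quantities in atomic units."""
    return 1.0 / math.sqrt(r_ab ** 2 + 0.25 * (1.0 / u_a + 1.0 / u_b) ** 2)


U_C = 0.3647  # Hubbard parameter of carbon quoted in the caption of Fig. 9.2 (a.u.)

on_site = gamma_klopman_ohno(0.0, U_C, U_C)       # R -> 0: reduces to U for equal atoms
far_apart = gamma_klopman_ohno(100.0, U_C, U_C)   # large R: approaches 1/R

print(on_site, far_apart, 1.0 / 100.0)
```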

To derive an expression analytically, we approximate the charge density fluctuations with spherical charge densities. Slater-like distributions

$$F^\alpha_{00} = \frac{\tau_\alpha^3}{8\pi}\,\exp(-\tau_\alpha\,|\mathbf{r}-\mathbf{R}_\alpha|) \tag{9.25}$$

located at $\mathbf{R}_\alpha$ allow for an analytical evaluation of the Hartree contribution of two spherical charge distributions. This leads to a function $\gamma_{\alpha\beta}$, which depends on the parameters $\tau_\alpha$ and $\tau_\beta$, determining the extension of the charge densities of atoms $\alpha$ and $\beta$. This function has a $1/R_{\alpha\beta}$ dependence for large $R_{\alpha\beta}$ and approaches a finite value for $R_{\alpha\beta} \to 0$. For zero interatomic distances (i.e., $\alpha = \beta$) one finds that

$$\tau_\alpha = \frac{16}{5}\,\gamma_{\alpha\alpha} \tag{9.26}$$
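Equation (9.26) can be checked numerically for the Hartree part: the Coulomb self-interaction of a normalized Slater-type density with exponent $\tau$ equals $5\tau/16$. The sketch below assumes that the full spherical density is $\tau^3/(8\pi)\,e^{-\tau r}$ and uses the closed-form electrostatic potential of this density, $1/r - e^{-\tau r}(1/r + \tau/2)$; the function names, the integration grid, and the example value of $U$ are choices made for this illustration.

```python
import math


def slater_density(r: float, tau: float) -> float:
    """Normalized spherical Slater-type density tau^3 / (8 pi) * exp(-tau r)."""
    return tau ** 3 / (8.0 * math.pi) * math.exp(-tau * r)


def slater_potential(r: float, tau: float) -> float:
    """Electrostatic potential generated by slater_density (closed form)."""
    return 1.0 / r - math.exp(-tau * r) * (1.0 / r + 0.5 * tau)


def radial_integral(f, r_max: float, n: int = 20000) -> float:
    """Midpoint rule for the integral of 4 pi r^2 f(r) dr from 0 to r_max."""
    h = r_max / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        total += 4.0 * math.pi * r * r * f(r)
    return total * h


U = 0.3647               # example Hubbard value in a.u. (carbon, caption of Fig. 9.2)
tau = 16.0 / 5.0 * U     # Eq. (9.26) with gamma_aa = U
r_max = 40.0 / tau       # the density is negligible beyond this radius

norm = radial_integral(lambda r: slater_density(r, tau), r_max)
gamma_aa = radial_integral(
    lambda r: slater_density(r, tau) * slater_potential(r, tau), r_max)

print(norm, gamma_aa)
```

With this choice of $\tau$ the on-site Coulomb integral reproduces the input value of $U$, which is the content of Eq. (9.26) when the exchange–correlation contribution is left out.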

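The function $\gamma_{\alpha\beta}$ is shown schematically in Fig. 9.2. After integration, $E^{\mathrm{2nd}}$ becomes a simple two-body expression depending on atomic-like charges:

$$E^{\mathrm{2nd}} = \frac{1}{2}\sum_{\alpha\beta} \Delta q_\alpha\,\Delta q_\beta\,\gamma_{\alpha\beta} \tag{9.27}$$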
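Once the charges and the matrix $\gamma_{\alpha\beta}$ are known, Eq. (9.27) is a plain double sum that includes the diagonal ($\alpha = \beta$) terms. The following sketch evaluates it for a hypothetical two-atom example: the diagonal elements are the Hubbard parameters quoted in the caption of Fig. 9.2, while the off-diagonal element and the charge fluctuations are made-up numbers used only for illustration.

```python
def second_order_energy(dq, gamma):
    """E_2nd = 1/2 * sum over a, b of dq[a] * dq[b] * gamma[a][b], Eq. (9.27)."""
    n = len(dq)
    return 0.5 * sum(dq[a] * dq[b] * gamma[a][b]
                     for a in range(n) for b in range(n))


# Hypothetical C-H pair (atomic units)
gamma = [[0.3647, 0.30],    # gamma_CC = U_C, gamma_CH (made up)
         [0.30, 0.4195]]    # gamma_HC (made up), gamma_HH = U_H
dq = [-0.1, 0.1]            # charge fluctuations, chosen for illustration

e_2nd = second_order_energy(dq, gamma)
print(e_2nd)
```

The on-site terms raise the energy when charge accumulates on an atom, while the off-diagonal term lowers it for charges of opposite sign.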

The diagonal terms $\gamma_{\alpha\alpha}$ model the dependence of the total energy on charge density fluctuations (decomposed into atomic contributions) in second order. The monopole approximation restricts the change of the electron density considered, and no spatial deformations are included; only the change of energy with respect to the change of charge on atom $\alpha$ is considered. By neglecting the effect of the chemical environment on atom $\alpha$, the diagonal part of $\gamma$ can be approximated by the chemical hardness $\eta$ of the atom,

$$\gamma_{\alpha\alpha} = 2\eta_\alpha = U_\alpha = \frac{\partial^2 E_\alpha}{\partial q_\alpha^2} \tag{9.28}$$

where $E_\alpha$ is the energy of the isolated atom $\alpha$. $U_\alpha$, the Hubbard parameter, is twice the chemical hardness of atom $\alpha$, which can be estimated from the difference in the ionization potential and the electron affinity of atom $\alpha$. For SCC-DFTB, it is calculated using Janak's theorem, by taking the first derivative of the energy of the highest-occupied molecular orbital with respect to the occupation number. Therefore, Eq. (9.26) implies that the extension of the charge distribution is inversely proportional to the chemical hardness of the respective atom (i.e., the size of an atom is inversely related to its chemical hardness). This is an important finding which is discussed in more detail below.

The total SCC-DFTB energy finally reads

$$E^{\mathrm{SCC\text{-}DFTB}} = \sum_{i\mu\nu} c_{\mu i}\,c_{\nu i}\,H^0_{\mu\nu} + E^{\mathrm{2nd}} + E^{\mathrm{rep}} \tag{9.29}$$

[Figure 9.2: plot of γ (a.u., from 0 to 0.4) versus r (a.u., from 0 to 10) with the curves C-C, H-H, and C-H.]

Fig. 9.2 Function $\gamma_{\mathrm{CC}}$ for two carbon atoms with the Hubbard parameter $U_{\mathrm{C}} = 0.3647$ a.u. and $\gamma_{\mathrm{HH}}$ for two hydrogen atoms with $U_{\mathrm{H}} = 0.4195$ a.u. over the interatomic distance. The function $\gamma_{\mathrm{CH}}$ differs from $\gamma_{\mathrm{CC}}$ and $\gamma_{\mathrm{HH}}$ for short interatomic distances. Clearly, the case $R_{\mathrm{C-H}} = 0$ a.u. will not appear in a calculation.
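The estimate of the Hubbard parameter from the ionization potential (IP) and electron affinity (EA) mentioned above follows from a finite-difference approximation of the second derivative in Eq. (9.28): $E(N-1) - 2E(N) + E(N+1) = \mathrm{IP} - \mathrm{EA}$. The sketch below applies it to hydrogen and carbon. The atomic IP and EA values are approximate experimental numbers supplied here for illustration; they are not taken from this chapter, and the function name is likewise illustrative.

```python
HARTREE_IN_EV = 27.2114  # 1 hartree in eV (rounded)


def hubbard_from_ip_ea(ip_ev: float, ea_ev: float) -> float:
    """Finite-difference estimate U = d2E/dq2 ~ IP - EA, returned in hartree."""
    return (ip_ev - ea_ev) / HARTREE_IN_EV


# Approximate experimental atomic data in eV (illustrative input, not from the chapter)
ATOMS = {"H": (13.598, 0.754), "C": (11.260, 1.262)}

u_estimates = {name: hubbard_from_ip_ea(ip, ea) for name, (ip, ea) in ATOMS.items()}

for name, u in u_estimates.items():
    print(f"U({name}) ~ {u:.4f} a.u.")
```

For carbon the estimate lies close to the value $U_{\mathrm{C}} = 0.3647$ a.u. quoted in the caption of Fig. 9.2, whereas for hydrogen it is noticeably larger than $U_{\mathrm{H}} = 0.4195$ a.u. Some difference is to be expected, since the SCC-DFTB parameters are computed from DFT via Janak's theorem rather than from experimental data.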

9.3 PERFORMANCE OF STANDARD SCC-DFTB

9.3.1 Timings

The substantial advantage of SCC-DFTB is its computational efficiency. Before turning to the performance for several properties in the following subsections, we show in Table 9.1 benchmark CPU times for a single-point energy calculation on C60, polyalanine, and some water clusters. All calculations were carried out on a single processor of a standard desktop PC. For SCC-DFTB the DFTB+ code32 was used. The DFT values were obtained using the TURBOMOLE program package.33 For the PBE functional calculations the resolution of the identity (RI) integral evaluation has been used.34 As a basis set for the DFT methods we chose 6-31G(d), which is a rather small basis set for practical use. Table 9.1 shows that SCC-DFTB is faster than RI-PBE by a factor of about 250 or more and faster than B3LYP by a factor of more than 1000. This acceleration is due primarily to two factors: (1) the use of a minimal basis set within SCC-DFTB, and (2) the tabulation and neglect of integrals. For the water cluster (H2O)48, for example, $N = 288$ basis functions are needed for a minimal basis set and $N = 864$ basis functions for the 6-31G(d) basis set. The time-limiting step for obtaining the energy with all methods discussed here is a matrix diagonalization, which scales with $N^3$. Thus, the use of the minimal basis alone gives an acceleration by a factor of $(864/288)^3 = 27$. The remaining factor is due to the tabulation and neglect of integrals; in this example this factor is roughly 10 and 40 for the comparison with RI-PBE and B3LYP, respectively.

TABLE 9.1 Calculation Time (s) for Various Molecules with DFT and SCC-DFTB

Molecule        n^a    SCC-DFTB    RI-PBE^b    B3LYP^b,c
C60^d            60           1       1,112        9,398
(Ala)10^e       112           4         966        6,655
(Ala)20^e       212          12       3,418       27,605
(H2O)48^f       144           3         769        3,466
(H2O)123^f      369          15       5,488       30,822

a Number of atoms.
b Basis set 6-31G(d).
c B3LYP_Gaussian keyword in TURBOMOLE.
d Buckminsterfullerene C60.
e Polyalanine in α-helical form and including capping groups.
f Water cluster.

9.3.2 Small Organic Molecules

SCC-DFTB has been tested for various properties of small organic molecules, such as heats of formation, geometries, vibrational frequencies, and dipole moments, as documented in several recent publications. It should be noted that all these test sets contain a large number of molecules, representative of many chemical bonding situations. In general, SCC-DFTB is excellent in reproducing geometries. Also, reaction energies are reproduced reasonably well on average,9,35 while heats of formation are overestimated, owing to the overbinding tendency of SCC-DFTB. Recently, the SCC-DFTB heats of formation have been tested systematically. It turned out that reparametrization of atomic contributions can improve the performance for heats of formation significantly; however, refined NDDO methods such as OM2 (Ref. 36) or PDDG/PM3 (Ref. 37) are still superior to SCC-DFTB in this respect.29,38 For a set of 622 neutral molecules containing the elements C, H, N, and O, Sattelmeyer et al. found mean absolute errors (MAEs) in heats of formation of 3.2 kcal mol$^{-1}$ for PDDG/PM3 and 5.8 kcal mol$^{-1}$ for SCC-DFTB.38 Similarly, for a set of 140 CHNO-containing molecules, the respective mean absolute errors for OM2 and SCC-DFTB are 3.1 and 7.7 kcal mol$^{-1}$.29 The performance of SCC-DFTB for vibrational frequencies, although reasonable on average, is less satisfactory than for geometries. However, vibrational frequencies could also be improved significantly after reparametrization.39 The MAE for harmonic vibrational frequencies of 14 hydrocarbons drops from 59 cm$^{-1}$ for the standard parameterization to 33 cm$^{-1}$ for the reparameterized version. The MAE for the GGA functional BLYP with the Dunning-type basis set cc-pVTZ is 25 cm$^{-1}$. Currently, parameters are available for O, N, C, H,9 S,40 Zn,28 Mg,41 and many transition metals.42

9.3.3 Peptides

Good performance for small molecules does not guarantee a good description of larger molecules. Good examples are the structures and relative energies of peptides, which pose significant problems for semiempirical models such as AM1 (Ref. 43) and PM3 (Ref. 44) but are well described at the SCC-DFTB level45,46 or by more elaborate NDDO methods such as OM1 (Ref. 47) or OM2 (Refs. 36 and 48). Therefore, the performance for small organic molecules does not necessarily tell much about the performance for larger complexes, and SE methods should be benchmarked carefully before applying them to new classes of molecules.


9.3.4 Hydrogen-Bonded Systems

Standard SCC-DFTB slightly underestimates the dipole moments of polar molecules, as discussed, for example, for peptides.45,46,49 This leads to a slight underestimation of binding energies of weak hydrogen-bonded complexes18,49 by 1 to 2 kcal mol$^{-1}$ (e.g., the binding energy of the water dimer is found to be 3.3 kcal mol$^{-1}$, in contrast to 5 kcal mol$^{-1}$ at a high computational level). Also, relative energies of peptide conformations are underestimated due to this error. It should be noted that this underestimation is quite systematic (i.e., the relative stability of different conformers is preserved).

9.4 EXTENSIONS OF STANDARD SCC-DFTB

9.4.1 Inclusion of Dispersion Forces

SCC-DFTB is derived from DFT and therefore inherits the well-known failures of the gradient-corrected (GGA) DFT functionals. This concerns the problem of overpolarizability,16 the problem of charge transfer and ionic excited states,50 and deficiencies in describing van der Waals interactions. These problems have been reviewed briefly by Elstner.20 Dispersion interactions become important for larger molecules, since they stabilize more complex structures. Therefore, we proposed to include them empirically on top of DFT and implemented this for SCC-DFTB.18 This approach was later adapted to DFT51,52 and has become increasingly available in many DFT codes. We have shown that DFT would fail to describe the stacking interaction between DNA bases without proper inclusion of dispersion interactions;18 DNA would not be stable. Surprisingly, dispersion interactions are also vital for stable peptide and protein structures. If dispersion forces are neglected, many peptide and protein conformations are not stable; that is, standard DFT and SCC-DFTB are not able to describe the structure and dynamics of complex biological matter (and other materials where dispersion forces are important). To include dispersion forces, simple two-body potentials with $1/R^6$ dependence are added to the DFTB total energy. However, they have to be damped for short distances using a properly chosen damping function $f(R_{\alpha\beta})$18:

$$E^{\mathrm{SCC\text{-}DFTB\text{-}D}} = E^{\mathrm{SCC\text{-}DFTB}} - \sum_{\alpha \neq \beta} f(R_{\alpha\beta})\,\frac{C^6_{\alpha\beta}}{R^6_{\alpha\beta}} \tag{9.30}$$

with $C^6_{\alpha\beta}$ being properly chosen van der Waals parameters. Note that including such an extension to DFT leads to very different results, depending on the DFT functional used for exchange and dispersion.51 Only a properly chosen scaling function leads to quantitatively satisfying results.52 More details may be found elsewhere.20
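The structure of the correction in Eq. (9.30) can be sketched as follows. The chapter does not specify the damping function, so the form used below, $f(R) = [1 - \exp(-d\,(R/R_0)^n)]^m$ with $d = 3$, $n = 7$, $m = 4$, is an assumption made for this illustration, as are the $C^6$ and $R_0$ values and all function names. The sum runs over all ordered pairs $\alpha \neq \beta$, exactly as the equation is written.

```python
import math


def damping(r: float, r0: float, d: float = 3.0, n: int = 7, m: int = 4) -> float:
    """Assumed damping function: close to 0 for r << r0 and close to 1 for r >> r0."""
    return (1.0 - math.exp(-d * (r / r0) ** n)) ** m


def dispersion_energy(coords, c6, r0):
    """Damped -C6/R^6 sum over all pairs a != b, following Eq. (9.30) as written."""
    energy = 0.0
    for a, pos_a in enumerate(coords):
        for b, pos_b in enumerate(coords):
            if a == b:
                continue
            r = math.dist(pos_a, pos_b)
            energy -= damping(r, r0[a][b]) * c6[a][b] / r ** 6
    return energy


# Hypothetical two-atom example (arbitrary consistent units, illustration only)
c6 = [[0.0, 20.0], [20.0, 0.0]]   # pair coefficients C6_ab
r0 = [[0.0, 3.5], [3.5, 0.0]]     # pair distances at which the damping switches on

e_far = dispersion_energy([(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)], c6, r0)
e_near = dispersion_energy([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)], c6, r0)
print(e_far, e_near)
```

At large separation the correction reduces to the undamped $-C^6/R^6$ terms, while at short distances the damping function switches it off, so that the description of covalent bonding by SCC-DFTB itself is not disturbed.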


9.4.2 Beyond Standard Second-Order DFTB

The approximation of the second derivatives of the total DFT energy by the γ function in order to model charge-transfer effects contains several approximations. As we have discussed in detail, the use of the γ function implicitly assumes that the size of an atom is represented by the inverse of the Hubbard (chemical hardness) parameter Uα , which enters the γ function.20,53 This relation holds quite well for many main-group elements but is completely wrong for the hydrogen atom.53 Therefore, the function γ has been modified to account for this irregularity. This leads to a significant improvement in hydrogen-bonding energies. The large error of 1 to 2 kcal mol−1 per hydrogen bond in the standard SCC-DFTB scheme can be reduced to about 0.5 kcal mol−1 using the modified γ function. Whereas for the description of hydrogen bonds a second-order expansion of total energy seems to be adequate, the calculation of proton affinities have been shown to be largely in error. This property is crucial, however, for an appropriate description of proton transfer reactions, and semiempirical methods in general have problems predicting this value accurately.54 The second-order approximation of DFTB works well for many systems, including charged systems, where the charge is delocalized over extended molecular fragments. For charged molecules, however, where the charge is localized, this approximation breaks down. It has been shown that for these cases the total energy [Eq. (9.21)] has to be expanded up to third order in the density fluctuations.20,53,55 This is crucial in particular for the calculation of deprotonation energies, where the inclusion of third-order terms leads to significant improvement. For example, the deprotonation energy of water is in error by nearly 30 kcal/mol in standard SCC-DFTB, whereas it has an error of a few kcal mol−1 in the third-order formulation. 
Formally, the expansion of the DFT total energy is carried out up to third order, and similar approximations are made as in the second-order case.53 In third order, the Hubbard parameter $U_\alpha$ becomes charge dependent. Since $1/U_\alpha$ reflects the atom size, the charge dependence of $U_\alpha$ can account for the larger size of anions compared to neutral atoms or cations. In third-order DFTB, a new parameter occurs, the derivative of the Hubbard parameter, which can be calculated from DFT53 or fitted to minimize the error in the deprotonation energies of a suitably chosen reference set of molecules.55

9.4.3 Excited States via Time-Dependent DFT

The core of SCC-DFTB is an efficient approximation of the second derivatives of the total energy by the function $\gamma_{\alpha\beta}$. Such a second derivative also appears in the TD-DFT linear response formalism, which makes it possible to compute excited-state energies within the DFT framework. We have implemented this formalism for SCC-DFTB,40 finding surprisingly good results for singlet excitations at very low computational cost, while the problems of TD-DFT for higher excitations, charge transfer, and ionic excited states are retained.50 More details are available in a recent review by Niehaus.56

9.4.4 QM/MM Methods

To represent the coupling between the environment and the quantum region effectively, quantum mechanical methods have been coupled to empirical force-field methods in the QM/MM methods. Although introduced as early as 1976,57 it was not until the early 1990s that QM/MM methods became widely used in the study of biological systems (a recent comprehensive review can be found in Ref. 2). Several QM/MM implementations with SCC-DFTB as the QM part have been realized up to now, incorporating it into various empirical force-field packages.58–62 But even for QM/MM approaches using SE methods as QM, the collective reorganization in the environment can become a computational bottleneck. Therefore, much effort is invested in developing multiscale methods, which combine QM/MM with continuum electrostatic methods (CM) for an integrated treatment of large systems. DFTB QM/MM coupling to CHARMM has been combined with a continuum approach,63,64 the generalized solvent-boundary potential developed originally by Roux and co-workers65 for classical simulations. The SCC-DFTB/MM methodology19,20 as well as the SCC-DFTB/MM/CM methodology63,66 has recently been reviewed.

9.5 CONCLUSIONS

SCC-DFTB is a semiempirical method derived from DFT-GGA. This means that all deficiencies of DFT-GGA are inherited directly. Note that SCC-DFTB uses a pure GGA functional (PBE); no hybrid variant, which could ameliorate these failures to some degree, is available. On the other hand, SCC-DFTB also inherits the merits of DFT, its conceptual simplicity in incorporating correlation effects, and its good performance for many molecular properties of interest. As a result, SCC-DFTB predicts molecular geometries surprisingly well; vibrational frequencies are also satisfactory. Reproduction of heats of formation for small organic molecules is comparable to the performance of modern semiempirical methods, although new variants such as PDDG-PM3 or OM2 are still slightly superior in this respect. It should be noted that approximate methods should be carefully benchmarked for classes of molecules and not applied blindly.†

† This also applies to DFT methods (although to a lesser degree), since their approximate nature leads to a variety of problems and failures.

REFERENCES

1. Bowler, D. R.; Aoki, M.; Goringe, C. M.; Horsfield, A. P.; Pettifor, D. G. Model. Simul. Mater. Sci. Eng. 1997, 5, 199.


2. Senn, H. M.; Thiel, W. Curr. Opin. Chem. Biol. 2007, 11, 182.
3. Senn, H. M.; Thiel, W. Angew. Chem. Int. Ed. 2009, 48, 1198.
4. Elstner, M.; Cui, Q. Multi-scale Methods for the Description of Chemical Events in Biological Systems, Multiscale Simulation Methods in Molecular Sciences, NIC-Serie, Publikationsreihe des John von Neumann-Instituts für Computing, Jülich, Germany, 2009.
5. Goringe, C. M.; Bowler, D. R.; Hernandez, E. Rep. Prog. Phys. 1997, 60, 1447.
6. Colombo, L. Riv. Nuovo Cimento Soc. Ital. Fisi. 2005, 28, 1.
7. Harris, J. Phys. Rev. B 1985, 31, 1770.
8. Foulkes, W. M. C.; Haydock, R. Phys. Rev. B 1989, 39, 12520.
9. Elstner, M.; Porezag, D.; Jungnickel, G.; Elsner, J.; Haugk, M.; Frauenheim, T.; Suhai, S.; Seifert, G. Phys. Rev. B 1998, 58, 7260.
10. Porezag, D.; Frauenheim, T.; Köhler, T.; Seifert, G.; Kaschner, R. Phys. Rev. B 1995, 51, 12947.
11. Seifert, G.; Eschrig, H.; Bieger, W. Z. Phys. Chem. (Leipzig) 1986, 267, 529.
12. Widany, J.; Frauenheim, T.; Köhler, T.; Sternberg, M.; Porezag, D.; Jungnickel, G.; Seifert, G. Phys. Rev. B 1996, 53, 4443.
13. Seifert, G. J. Phys. Chem. A 2007, 111, 5609.
14. Witek, H. A.; Köhler, C.; Frauenheim, T.; Morokuma, K.; Elstner, M. J. Phys. Chem. A 2007, 111, 5712.
15. Perdew, J. P.; Burke, K.; Ernzerhof, M. Phys. Rev. Lett. 1996, 77, 3865.
16. Wanko, M.; Hoffmann, M.; Frauenheim, T.; Elstner, M. J. Comput. Aided Mol. Des. 2006, 20, 511.
17. Wanko, M.; Hoffmann, M.; Strodel, P.; Koslowski, A.; Thiel, W.; Neese, F.; Frauenheim, T.; Elstner, M. J. Phys. Chem. B 2005, 109, 3606.
18. Elstner, M.; Hobza, P.; Frauenheim, T.; Suhai, S.; Kaxiras, E. J. Chem. Phys. 2001, 114, 5149.
19. Elstner, M.; Frauenheim, T.; Suhai, S. J. Mol. Struct. (Theochem) 2003, 632, 29.
20. Elstner, M. Theor. Chem. Acc. 2006, 116, 316.
21. Frauenheim, T.; Seifert, G.; Elstner, M.; Niehaus, T.; Köhler, C.; Amkreutz, M.; Sternberg, M.; Hajnal, Z.; Di Carlo, A.; Suhai, S. J. Phys. Condens. Matter 2002, 14, 3015.
22. Parr, R. G.; Yang, W. Density-Functional Theory of Atoms and Molecules; Oxford University Press, New York, 1989.
23. Eschrig, H. Optimized LCAO Method and Electronic Structure of Extended Systems, Springer-Verlag, Berlin, 1989.
24. Seifert, G. J. Phys. Chem. A 2007, 111, 5609.
25. Seifert, G.; Porezag, D.; Frauenheim, T. Int. J. Quantum Chem. 1996, 58, 185.
26. Frauenheim, T.; Weich, F.; Köhler, T.; Uhlmann, S.; Porezag, D.; Seifert, G. Phys. Rev. B 1995, 52, 11492.
27. Slater, J. C.; Koster, G. F. Phys. Rev. 1954, 94, 1498.
28. Elstner, M.; Cui, Q.; Munih, P.; Kaxiras, E.; Frauenheim, T.; Karplus, M. J. Comput. Chem. 2003, 24, 565.
29. Otte, N.; Scholten, M.; Thiel, W. J. Phys. Chem. A 2007, 111, 5751.


30. Knaup, J. M.; Hourahine, B.; Frauenheim, T. J. Phys. Chem. A 2007, 111, 5637.
31. Gaus, M.; Chou, C.; Witek, H.; Elstner, M. J. Phys. Chem. A 2009, 113, 11866.
32. DFTB+, a development of Bremen Center of Computational Material Science (Prof. Frauenheim), available at http://www.dftb.org.
33. TURBOMOLE V6.1 2009, a development of University of Karlsruhe and Forschungszentrum Karlsruhe GmbH, 1989–2007, TURBOMOLE GmbH, since 2007; available at http://www.turbomole.com.
34. Ahlrichs, R. Phys. Chem. Chem. Phys. 2004, 6, 5119.
35. Krüger, T.; Elstner, M.; Schiffels, P.; Frauenheim, T. J. Chem. Phys. 2005, 122, 114110.
36. Weber, W.; Thiel, W. Theor. Chem. Acc. 2000, 103, 495.
37. Repasky, M. P.; Chandrasekhar, J.; Jørgensen, W. L. J. Comput. Chem. 2002, 23, 1601.
38. Sattelmeyer, K. W.; Tirado-Rives, J.; Jorgensen, W. L. J. Phys. Chem. A 2006, 110, 13551.
39. Małolepsza, E.; Witek, H. A.; Morokuma, K. Chem. Phys. Lett. 2005, 412, 237.
40. Niehaus, T. A.; Suhai, S.; Della Sala, F.; Lugli, P.; Elstner, M.; Seifert, G.; Frauenheim, T. Phys. Rev. B 2001, 6308, 085108.
41. Cai, Z.; Lopez, P.; Reimers, J. R.; Cui, Q.; Elstner, M. J. Phys. Chem. A 2007, 111, 5743.
42. Zheng, G.; Witek, H. A.; Bobadova-Parvanova, P.; Irle, S.; Musaev, D. G.; Prabhakar, R.; Morokuma, K.; Lundberg, M.; Elstner, M.; Köhler, C.; Frauenheim, T. J. Chem. Theory Comput. 2007, 3, 1349.
43. Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. J. Am. Chem. Soc. 1985, 107, 3902.
44. Stewart, J. J. P. J. Comput. Chem. 1989, 10, 209.
45. Elstner, M.; Jalkanen, K.; Knapp-Mohammady, M.; Frauenheim, T.; Suhai, S. Chem. Phys. 2000, 256, 15.
46. Elstner, M.; Jalkanen, K.; Knapp-Mohammady, M.; Frauenheim, T.; Suhai, S. Chem. Phys. 2001, 263, 203.
47. Kolb, M.; Thiel, W. J. Comput. Chem. 1993, 14, 775.
48. Möhle, K.; Hofmann, H.-J.; Thiel, W. J. Comput. Chem. 2001, 22, 509.
49. Elstner, M.; Frauenheim, T.; Kaxiras, E.; Seifert, G.; Suhai, S. Phys. Status Solidi B 2000, 217, 357.
50. Wanko, M.; Garavelli, M.; Bernardi, F.; Niehaus, T. A.; Frauenheim, T.; Elstner, M. J. Chem. Phys. 2004, 120, 1674.
51. Wu, Q.; Yang, W. J. Chem. Phys. 2002, 116, 515.
52. Grimme, S. J. Comput. Chem. 2004, 25, 1463.
53. Elstner, M. J. Phys. Chem. A 2007, 111, 5614.
54. Range, K.; Riccardi, D.; Elstner, M.; Cui, Q.; York, D. Phys. Chem. Chem. Phys. 2005, 7, 3070.
55. Yang, Y.; Yu, H.; York, D.; Cui, Q.; Elstner, M. J. Phys. Chem. A 2007, 111, 10861.
56. Niehaus, T. A. J. Mol. Struct. (Theochem) 2009, 914, 38.
57. Warshel, A.; Levitt, M. J. Mol. Biol. 1976, 103, 227.


58. Han, W.; Elstner, M.; Jalkanen, K. J.; Frauenheim, T.; Suhai, S. Int. J. Quantum Chem. 2000, 78, 459.
59. Cui, Q.; Elstner, M.; Kaxiras, E.; Frauenheim, T.; Karplus, M. J. Phys. Chem. B 2001, 105, 569.
60. Seabra, G. D. M.; Walker, R. C.; Elstner, M.; Case, D. A.; Roitberg, A. E. J. Phys. Chem. A 2007, 111, 5655.
61. Hu, H.; Elstner, M.; Hermans, J. Proteins Struct. Funct. Genet. 2003, 50, 451.
62. Liu, H.; Elstner, M.; Kaxiras, E.; Frauenheim, T.; Hermans, J.; Yang, W. Proteins Struct. Funct. Genet. 2001, 44, 484.
63. Riccardi, D.; Schaefer, P.; Yang, Y.; Yu, H.; Ghosh, N.; Prat-Resina, X.; König, P.; Li, G.; Xu, D.; Guo, H.; Elstner, M.; Cui, Q. J. Phys. Chem. B 2006, 110, 6458.
64. König, P. H.; Ghosh, N.; Hoffmann, M.; Elstner, M.; Tajkhorshid, E.; Frauenheim, T.; Cui, Q. J. Phys. Chem. A 2006, 110, 548.
65. Im, W.; Berneche, S.; Roux, B. J. Chem. Phys. 2001, 114, 2924.
66. Cui, Q. Theor. Chem. Acc. 2006, 116, 51.

10

Introduction to Effective Low-Energy Hamiltonians in Condensed Matter Physics and Chemistry

BEN J. POWELL
Centre for Organic Photonics and Electronics, School of Mathematics and Physics, The University of Queensland, Queensland, Australia

In this chapter I discuss some simple effective Hamiltonians that have widespread applications to solid-state and molecular systems. Although the chapter is meant as an introduction for a beginning graduate student, I hope that it may also help to break down the divide between the physics and chemistry literatures. After a brief introduction to second quantization notation (Section 10.1), which is used extensively, I focus on the “four H’s”: the Hückel (or tight binding; Section 10.2), Hubbard (Section 10.3), Heisenberg (Section 10.4), and Holstein (Section 10.6) models. These models play central roles in our understanding of condensed matter physics, particularly for materials where electronic correlations are important, but are less well known to the chemistry community. Some related models, such as the Pariser–Parr–Pople model, the extended Hubbard model, multiorbital models, and the ionic Hubbard model, are also discussed in Section 10.6. As well as their practical applications, these models allow us to investigate electronic correlations systematically by “turning on” various interactions in the Hamiltonian one at a time. Finally, in Section 10.7, I discuss the epistemological basis of effective Hamiltonians and compare and contrast this approach with ab initio methods before discussing the problem of the parameterization of effective Hamiltonians.

As this chapter is intended to be introductory, I do not attempt to make frequent comparisons to the latest research problems; rather, I compare the predictions of model Hamiltonians with simple systems chosen for pedagogical reasons. Similarly, references have been chosen for their pedagogical and historical value rather than on the basis of scientific priority.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


Given the similarity in the problems addressed by theoretical chemistry and theoretical condensed matter physics, surprisingly few advanced texts discuss the interface of the two subjects. Unfortunately, this leads to many cultural differences between the fields. Nevertheless, some textbooks do try to bridge the gap, and the reader in search of more than the introductory material presented here is referred to a book by Fulde1 and several other chapters in this book: Chapter 6 describes the state of the art in using density functional theory and ab initio Hartree–Fock-based approaches to the a priori evaluation of properties of systems involving strongly correlated electrons, and Chapter 4 describes ab initio approaches based on quantum Monte Carlo.

10.1 BRIEF INTRODUCTION TO SECOND QUANTIZATION NOTATION

The models discussed in this chapter are easiest to understand if one employs the second quantization formalism. In this section we introduce its basic formalism briefly and informally. More details may be found in many textbooks (e.g., Schatz and Ratner2 or Mahan3). Readers already familiar with this notation may wish to skip this section, although the last two paragraphs do define some nomenclature that is used throughout the chapter.

10.1.1 Simple Harmonic Oscillator

Let us begin by considering a particle of mass $m$ moving in a one-dimensional harmonic potential:

$$V(x) = \tfrac{1}{2} k x^2 \tag{10.1}$$

This may be familiar as the potential experienced by an ideal spring displaced from its equilibrium position by a distance $x$, in which context $k$ is known as the spring constant.4 Equation (10.1) is also the potential felt by an atom as it is displaced (by a small amount) from its equilibrium position in a molecule.5 Classically, this problem is straightforward to solve,4 and as well as the trivial solution, one finds that the particle may oscillate with a resonant frequency $\omega = \sqrt{k/m}$. The time-independent Schrödinger equation for a simple harmonic oscillator is therefore

$$\hat{H}_{\mathrm{sho}} \psi_n \equiv \left( \frac{\hat{p}^2}{2m} + \frac{1}{2} m \omega^2 \hat{x}^2 \right) \psi_n = E_n \psi_n \tag{10.2}$$

where $\hat{p} = (\hbar/i)(\partial/\partial x)$ is the particle’s momentum and $\psi_n$ is the $n$th wavefunction or eigenfunction, which has energy, or eigenvalue, $E_n$. This problem is solved in many introductory texts on quantum mechanics6 using the standard methods of “first quantized” quantum mechanics. However, a more elegant way to solve this problem is to introduce the ladder operator:

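$$\hat{a} \equiv \sqrt{\frac{m\omega}{2\hbar}}\,\hat{x} + i\,\frac{\hat{p}}{\sqrt{2m\hbar\omega}} \tag{10.3}$$

and its hermitian conjugate:

$$\hat{a}^\dagger \equiv \sqrt{\frac{m\omega}{2\hbar}}\,\hat{x} - i\,\frac{\hat{p}}{\sqrt{2m\hbar\omega}} \tag{10.4}$$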

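One of the most important features of quantum mechanics is that momentum and position do not commute6 (i.e., $[\hat{p}, \hat{x}] \equiv \hat{p}\hat{x} - \hat{x}\hat{p} = -i\hbar$). From this commutation relation it is straightforward to show that

$$\hat{H}_{\mathrm{sho}} = \hbar\omega \left( \hat{a}^\dagger \hat{a} + \tfrac{1}{2} \right) \tag{10.5}$$

and

$$[\hat{a}, \hat{a}^\dagger] \equiv \hat{a}\hat{a}^\dagger - \hat{a}^\dagger \hat{a} = 1 \tag{10.6}$$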
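The step from the commutation relation to Eqs. (10.5) and (10.6) can be written out explicitly. The following is a short worked sketch that uses only the definitions in Eqs. (10.3) and (10.4) together with $\hat{x}\hat{p} - \hat{p}\hat{x} = i\hbar$:

```latex
\begin{align*}
\hat{a}^\dagger \hat{a}
 &= \left(\sqrt{\frac{m\omega}{2\hbar}}\,\hat{x} - \frac{i\hat{p}}{\sqrt{2m\hbar\omega}}\right)
    \left(\sqrt{\frac{m\omega}{2\hbar}}\,\hat{x} + \frac{i\hat{p}}{\sqrt{2m\hbar\omega}}\right) \\
 &= \frac{m\omega}{2\hbar}\,\hat{x}^2 + \frac{\hat{p}^2}{2m\hbar\omega}
    + \frac{i}{2\hbar}\left(\hat{x}\hat{p} - \hat{p}\hat{x}\right) \\
 &= \frac{1}{\hbar\omega}\left(\frac{\hat{p}^2}{2m} + \frac{1}{2}m\omega^2\hat{x}^2\right) - \frac{1}{2}
\end{align*}
```

Rearranging the last line gives Eq. (10.5). Repeating the calculation with the two factors in the opposite order flips the sign of the cross term, so that $\hat{a}\hat{a}^\dagger = \hat{H}_{\mathrm{sho}}/\hbar\omega + \tfrac{1}{2}$; subtracting the two results gives Eq. (10.6).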

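One can also show that $[\hat{H}_{\mathrm{sho}}, \hat{a}] = \hbar\omega\,[(\hat{a}^\dagger \hat{a} + \tfrac{1}{2}), \hat{a}] = \hbar\omega\,[\hat{a}^\dagger, \hat{a}]\,\hat{a} = -\hbar\omega\,\hat{a}$ in a similar manner. Therefore, $[\hat{H}_{\mathrm{sho}}, \hat{a}]\psi_n = -\hbar\omega\,\hat{a}\psi_n$, and hence

$$\hat{H}_{\mathrm{sho}}\,\hat{a}\psi_n = (E_n - \hbar\omega)\,\hat{a}\psi_n \tag{10.7}$$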

Equation (10.7) tells us that $\hat{a}\psi_n$ is an eigenstate of $\hat{H}_{\mathrm{sho}}$ with energy $E_n - \hbar\omega$, provided that $\hat{a}\psi_n \neq 0$. That is, the operator $\hat{a}$ moves the system from one eigenstate to another whose energy is lower by $\hbar\omega$; thus, $\hat{a}$ is known as the lowering or destruction operator. Note that for any wavefunction $\phi$, $\langle\phi|\hat{p}^2|\phi\rangle \geq 0$ and $\langle\phi|\hat{x}^2|\phi\rangle \geq 0$. Therefore, it follows from Eq. (10.2) that $E_n \geq 0$ for all $n$. Hence, there is a lowest energy state, or ground state, which we will denote as $\psi_0$. Therefore, there is a limit to how often we can keep lowering the energy of the state (i.e., $\hat{a}\psi_0 = 0$). We can now calculate the ground-state energy of the harmonic oscillator,

$$\hat{H}_{\mathrm{sho}}\psi_0 = \hbar\omega \left( \hat{a}^\dagger \hat{a} + \tfrac{1}{2} \right) \psi_0 = \tfrac{1}{2}\hbar\omega\,\psi_0 \tag{10.8}$$

In the same way as we derived Eq. (10.7), one can easily show that $\hat{H}_{\mathrm{sho}}\,\hat{a}^\dagger\psi_n = (E_n + \hbar\omega)\,\hat{a}^\dagger\psi_n$. Therefore, $\hat{a}^\dagger$ moves us up the ladder of states that $\hat{a}$ moved us down. Hence $\hat{a}^\dagger$ is known as a raising or creation operator. Thus, we have

$$\hat{a}^\dagger \psi_n = \sqrt{n+1}\,\psi_{n+1} \tag{10.9}$$

and

$$\hat{a}\psi_n = \sqrt{n}\,\psi_{n-1} \tag{10.10}$$

where the terms inside the radicals are required for the correct normalization of the wavefunctions.7 Therefore, $\psi_n = (1/\sqrt{n!})(\hat{a}^\dagger)^n \psi_0$ and

$$E_n = \hbar\omega \left( n + \tfrac{1}{2} \right) \tag{10.11}$$
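The ladder-operator algebra above can be checked numerically. The following is a minimal sketch, assuming a basis truncated to the first `dim` eigenstates and units in which $\hbar\omega = 1$; the function names are invented for this illustration. It builds $\hat{a}$ from Eq. (10.10), forms $\hat{a}^\dagger\hat{a}$, and reads off the energies of Eq. (10.11).

```python
import math


def ladder_matrices(dim):
    """Return (a, a_dagger) as dim x dim nested lists in the basis psi_0 ... psi_{dim-1}."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        # Eq. (10.10): a psi_n = sqrt(n) psi_{n-1}, so <n-1| a |n> = sqrt(n).
        a[n - 1][n] = math.sqrt(n)
    # The matrix is real, so the hermitian conjugate is just the transpose.
    a_dagger = [[a[j][i] for j in range(dim)] for i in range(dim)]
    return a, a_dagger


def matmul(x, y):
    """Plain matrix product of two square nested lists."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def sho_energies(dim, hbar_omega=1.0):
    """Diagonal of hbar*omega*(a_dagger a + 1/2), Eq. (10.5), in the truncated basis."""
    a, a_dagger = ladder_matrices(dim)
    number = matmul(a_dagger, a)
    return [hbar_omega * (number[n][n] + 0.5) for n in range(dim)]


def commutator_diagonal(dim):
    """Diagonal of [a, a_dagger]; equals 1 except in the last (truncated) entry."""
    a, a_dagger = ladder_matrices(dim)
    forward = matmul(a, a_dagger)
    backward = matmul(a_dagger, a)
    return [forward[n][n] - backward[n][n] for n in range(dim)]


energies = sho_energies(6)
diagonal = commutator_diagonal(6)
print(energies)   # n + 1/2 for n = 0 ... 5, up to rounding
print(diagonal)   # 1 everywhere except the last entry
```

The last diagonal entry of the commutator deviates from 1 only because the basis has been cut off at a finite size; Eq. (10.6) holds exactly in the full, infinite-dimensional space.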

Notice that above we solved the simple harmonic oscillator (i.e., calculated the energies of all of the eigenstates) without needing to find explicit expressions for any of the first quantized eigenfunctions, $\psi_n$. This general feature of the second quantized approach is extremely advantageous when we are dealing with the complex many-body wavefunctions typical in condensed matter physics and chemistry.

10.1.2 Second Quantization for Light and Matter

We can extend the second quantization formalism to light and matter. Let us first consider bosons, which are not subject to the Pauli exclusion principle (e.g., phonons, photons, deuterium nuclei, $^4$He atoms). We define the bosonic field operator $\hat{b}^\dagger(\mathbf{r})$ as creating a boson at position $\mathbf{r}$; similarly, $\hat{b}(\mathbf{r})$ annihilates a boson at position $\mathbf{r}$. The bosonic field operators obey the commutation relations $[\hat{b}(\mathbf{r}), \hat{b}(\mathbf{r}')] = 0$, $[\hat{b}^\dagger(\mathbf{r}), \hat{b}^\dagger(\mathbf{r}')] = 0$, and

$$[\hat{b}(\mathbf{r}), \hat{b}^\dagger(\mathbf{r}')] = \delta(\mathbf{r} - \mathbf{r}') \tag{10.12}$$

This is just the generalization of Eq. (10.6) for the field operators. We can create any state by acting with products, or sums of products, of the $\hat{b}^\dagger(\mathbf{r})$ on the vacuum state (i.e., the state that does not contain any bosons), which is usually denoted as $|0\rangle$.

Many-body wavefunctions for fermions (e.g., electrons, protons, neutrons, $^3$He atoms) are complicated by the need for the antisymmetrization of the wavefunction (i.e., the wavefunction must change sign under the exchange of any two fermions). Therefore, if we introduce the fermionic field operators $\hat{\psi}^\dagger(\mathbf{r})$ and $\hat{\psi}(\mathbf{r})$, which, respectively, create and annihilate fermions at position $\mathbf{r}$, we must make sure that any wavefunction that we can make by acting with some set of these operators on the vacuum state is properly antisymmetrized. This is ensured8 if one insists that the field operators anticommute, that is, if

$$\{\hat{\psi}(\mathbf{r}), \hat{\psi}^\dagger(\mathbf{r}')\} \equiv \hat{\psi}(\mathbf{r})\hat{\psi}^\dagger(\mathbf{r}') + \hat{\psi}^\dagger(\mathbf{r}')\hat{\psi}(\mathbf{r}) = \delta(\mathbf{r} - \mathbf{r}') \tag{10.13}$$

$$\{\hat{\psi}(\mathbf{r}), \hat{\psi}(\mathbf{r}')\} = 0 \tag{10.14}$$

$$\{\hat{\psi}^\dagger(\mathbf{r}), \hat{\psi}^\dagger(\mathbf{r}')\} = 0 \tag{10.15}$$

This guarantee of an antisymmetrized wavefunction is one of the most obvious advantages of the second quantization formalism, as it is much easier than having to deal with the Slater determinants that are typically used to ensure the antisymmetrization of the many-body wavefunction in the first quantized formalism.2

For any practical calculation one needs to work with a particular basis set, $\{\phi_i(\mathbf{r})\}$. The field operators can be expanded in an arbitrary basis set as

$$\hat{\psi}(\mathbf{r}) = \sum_i \hat{c}_i\,\phi_i(\mathbf{r}) \tag{10.16}$$

$$\hat{\psi}^\dagger(\mathbf{r}) = \sum_i \hat{c}_i^\dagger\,\phi_i^*(\mathbf{r}) \tag{10.17}$$

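Thus, $\hat{c}_i^{(\dagger)}$ annihilates (creates) a fermion in the state $\phi_i(\mathbf{r})$. These operators also obey fermionic anticommutation relations,

$$\{\hat{c}_i, \hat{c}_j^\dagger\} = \delta_{ij} \tag{10.18}$$

$$\{\hat{c}_i, \hat{c}_j\} = 0 \tag{10.19}$$

$$\{\hat{c}_i^\dagger, \hat{c}_j^\dagger\} = 0 \tag{10.20}$$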

As fermions obey the Pauli exclusion principle, there can be at most one fermion in a given state. We denote a state in which the $i$th basis function contains zero (one) particles by $|0_i\rangle$ ($|1_i\rangle$). Therefore,

$$\hat{c}_i|1_i\rangle = |0_i\rangle, \qquad \hat{c}_i|0_i\rangle = 0, \qquad \hat{c}_i^\dagger|0_i\rangle = |1_i\rangle, \qquad \hat{c}_i^\dagger|1_i\rangle = 0 \tag{10.21}$$
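The rules in Eqs. (10.18) to (10.21) can be made concrete with a small sketch. It assumes a finite number of orbitals, represents a state as a dictionary mapping occupation tuples to amplitudes, and adopts one common sign convention for several orbitals (a factor of $-1$ for every occupied orbital with a lower index). That convention and the function names are assumptions of this illustration, not something fixed by the text.

```python
from itertools import product


def annihilate(i, state):
    """Apply c_i to a state given as {occupation tuple: amplitude}."""
    out = {}
    for occ, amp in state.items():
        if occ[i] == 1:                      # c_i |0_i> = 0, Eq. (10.21)
            sign = (-1) ** sum(occ[:i])      # assumed ordering convention
            new = occ[:i] + (0,) + occ[i + 1:]
            out[new] = out.get(new, 0) + sign * amp
    return out


def create(i, state):
    """Apply c_i^dagger to a state given as {occupation tuple: amplitude}."""
    out = {}
    for occ, amp in state.items():
        if occ[i] == 0:                      # c_i^dagger |1_i> = 0, Eq. (10.21)
            sign = (-1) ** sum(occ[:i])
            new = occ[:i] + (1,) + occ[i + 1:]
            out[new] = out.get(new, 0) + sign * amp
    return out


def add(first, second):
    """Sum of two states, dropping components whose amplitude cancels to zero."""
    out = dict(first)
    for occ, amp in second.items():
        out[occ] = out.get(occ, 0) + amp
    return {occ: amp for occ, amp in out.items() if amp != 0}


def anticommutator(op_a, op_b, state):
    """{A, B} applied to a state."""
    return add(op_a(op_b(state)), op_b(op_a(state)))


def relations_hold(n_orbitals):
    """Check Eqs. (10.18)-(10.20) on every basis state of n_orbitals orbitals."""
    for occ in product((0, 1), repeat=n_orbitals):
        basis = {occ: 1}
        for i in range(n_orbitals):
            for j in range(n_orbitals):
                c_i = lambda s, i=i: annihilate(i, s)
                c_j = lambda s, j=j: annihilate(j, s)
                cd_i = lambda s, i=i: create(i, s)
                cd_j = lambda s, j=j: create(j, s)
                expected = basis if i == j else {}
                if anticommutator(c_i, cd_j, basis) != expected:
                    return False
                if anticommutator(c_i, c_j, basis) or anticommutator(cd_i, cd_j, basis):
                    return False
    return True


vacuum = {(0, 0, 0): 1}
ordered = create(0, create(1, vacuum))    # first orbital created last
swapped = create(1, create(0, vacuum))    # same two orbitals, opposite order
print(relations_hold(3), ordered, swapped)
```

Creating the same two particles in the opposite order changes the sign of the state, which is the antisymmetry that a Slater determinant enforces in the first quantized formalism.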

It is important to realize that the number 0 is very different from the state $|0_i\rangle$. Any operator acting on a system of fermions can be expressed in terms of the $\hat{c}$ operators. A particularly important example is the number operator, $\hat{n}_i \equiv \hat{c}_i^\dagger \hat{c}_i$, which simply counts the number of particles in the state $i$, as can be confirmed by explicit calculation from Eqs. (10.21). The total number of particles in the system is therefore simply the expectation value of the operator $\hat{N} = \sum_i \hat{n}_i = \sum_i \hat{c}_i^\dagger \hat{c}_i$. Importantly, because we can write any operator in terms of the $\hat{c}$ operators, we can calculate any observable from the expectation value of some set of $\hat{c}$ operators. Thus we have access to a complete description of the system from the second quantization formalism. Further, we can always write the wavefunction in terms of the $\hat{c}$ operators if an explicit description of the wavefunction is required. For example, the sum of Slater determinants,

$$\Psi(\mathbf{r}_1, \mathbf{r}_2) = \alpha \begin{vmatrix} \phi_1(\mathbf{r}_1) & \phi_2(\mathbf{r}_1) \\ \phi_1(\mathbf{r}_2) & \phi_2(\mathbf{r}_2) \end{vmatrix} + \beta \begin{vmatrix} \phi_3(\mathbf{r}_1) & \phi_4(\mathbf{r}_1) \\ \phi_3(\mathbf{r}_2) & \phi_4(\mathbf{r}_2) \end{vmatrix} \tag{10.22}$$

describes the same state as

$$|\Psi\rangle = \left( \alpha\,\hat{c}_1^\dagger \hat{c}_2^\dagger + \beta\,\hat{c}_3^\dagger \hat{c}_4^\dagger \right) |0\rangle \tag{10.23}$$

where $|0\rangle = |0_1, 0_2, 0_3, 0_4, \ldots\rangle$ is the vacuum state, as $\Psi(\mathbf{r}_1, \mathbf{r}_2) = \langle \mathbf{r}_1, \mathbf{r}_2 | \Psi \rangle$ (cf., e.g., Ref. 7).

Often, in order to describe solid-state and chemical systems, one needs to describe a set of $N$ electrons whose behavior is governed by a Hamiltonian of the form

$$H = \sum_{n=1}^{N} \left[ -\frac{\hbar^2 \nabla_n^2}{2m} + U(\mathbf{r}_n) + \frac{1}{2} \sum_{m \neq n} V(\mathbf{r}_n - \mathbf{r}_m) \right] \tag{10.24}$$

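where $V(\mathbf{r}_n - \mathbf{r}_m)$ is the potential describing the interactions between electrons and $U(\mathbf{r}_n)$ is an external potential (including interactions with ions or nuclei, which may often be considered to be stationary on the time scales relevant to electronic processes, although we discuss effects due to the displacement of the nuclei in Section 10.6). In terms of our second quantization operators, this Hamiltonian may be written

$$\hat{H} = -\sum_{ij} t_{ij}\,\hat{c}_i^\dagger \hat{c}_j + \frac{1}{2} \sum_{ijkl} V_{ijkl}\,\hat{c}_i^\dagger \hat{c}_k^\dagger \hat{c}_l \hat{c}_j \tag{10.25}$$

where

$$t_{ij} = -\int d^3\mathbf{r}\; \phi_i^*(\mathbf{r}) \left[ -\frac{\hbar^2 \nabla^2}{2m} + U(\mathbf{r}) \right] \phi_j(\mathbf{r}) \tag{10.26}$$

$$V_{ijkl} = \int d^3\mathbf{r}_1 \int d^3\mathbf{r}_2\; \phi_i^*(\mathbf{r}_1)\,\phi_j(\mathbf{r}_1)\,V(\mathbf{r}_1 - \mathbf{r}_2)\,\phi_k^*(\mathbf{r}_2)\,\phi_l(\mathbf{r}_2) \tag{10.27}$$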

and the labels $i$, $j$, $k$, and $l$ are taken to define the spin as well as the basis function. This is exact, provided that we have an infinite complete basis. But practical calculations require the use of finite basis sets and often use incomplete basis sets. The simplest approach is simply to ignore this problem and calculate $t_{ij}$ and $V_{ijkl}$ directly from the finite basis set. However, this is often not the best approach. We delay until Section 10.7 a detailed discussion of why this is and of the deep philosophical issues that it raises. We also delay until Section 10.7 discussion of how to calculate these parameters. Until then we simply assume that $t_{ij}$, $V_{ijkl}$, and similar parameters required are known and focus instead on how to perform practical calculations using models of the form of Eq. (10.25) and closely related Hamiltonians.

In what follows we assume that the states created by the $\hat{c}_i^\dagger$ operators form an orthonormal basis. This greatly simplifies the mathematics but differs from the approach usually taken in introductory chemistry textbooks, as most quantum chemical calculations are performed in nonorthogonal bases for reasons of computational expedience.

10.2 HÜCKEL OR TIGHT-BINDING MODEL

The simplest model with the form of Eq. (10.25) is usually called the Hückel model in the context of molecular systems9 and the tight-binding model in the context of crystals.10 In these models one makes the approximation that $V_{ijkl} = 0$ for all $i$, $j$, $k$, and $l$. Therefore, these models explicitly neglect interactions between electrons. The models are identical, but slightly different notation is standard in the different traditions. We assume that our basis set consists of orbitals centered on particular sites, as we will for all of the models considered in this chapter. These sites might, for example, be atoms in a molecule or solid, chemical groups within a molecule, p-d hybrid states in a transition metal oxide, entire molecules in a molecular crystal, or even larger structures. Below we will often use a nomenclature motivated by the case where the sites are atoms; however, this does not mean that the mathematics is only applicable to that case. In the simplest case of only one orbital per spin state on each site,

$$\hat{H}_{\mathrm{tb}} = -\sum_{ij\sigma} t_{ij}\,\hat{c}_{i\sigma}^\dagger \hat{c}_{j\sigma} \tag{10.28}$$

where $\hat{c}_{i\sigma}^{(\dagger)}$ annihilates (creates) an electron with spin $\sigma$ in an orbital centered on site $i$.
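To make Eq. (10.28) concrete: acting with $\hat{H}_{\mathrm{tb}}$ on a one-electron state $\sum_n v_n \hat{c}_{n\sigma}^\dagger|0\rangle$ multiplies the coefficient vector $v$ by the matrix $h_{ij} = -t_{ij}$, so the one-electron eigenstates are the eigenvectors of $h$. The sketch below assumes a hypothetical ring of six identical sites with nearest-neighbor hopping $t$ and on-site terms set to zero (illustrative choices, not parameters from the text) and checks that plane-wave coefficients are eigenvectors with energy $-2t\cos k$. The function names are invented for this example.

```python
import cmath
import math


def ring_hamiltonian(n_sites, t):
    """Matrix h[i][j] = -t_ij for a ring (n_sites >= 3) with nearest-neighbour hopping t."""
    h = [[0.0] * n_sites for _ in range(n_sites)]
    for i in range(n_sites):
        j = (i + 1) % n_sites            # site index is defined modulo n_sites
        h[i][j] = -t
        h[j][i] = -t
    return h


def bloch_state(n_sites, m):
    """Normalized plane-wave coefficients exp(i k n) with k = 2 pi m / n_sites."""
    k = 2.0 * math.pi * m / n_sites
    return [cmath.exp(1j * k * n) / math.sqrt(n_sites) for n in range(n_sites)]


def band_energy(n_sites, m, t):
    """Energy -2 t cos(k) expected for the plane wave labelled by m."""
    return -2.0 * t * math.cos(2.0 * math.pi * m / n_sites)


def residual(h, v, energy):
    """Largest component of (h v - energy v); zero for an exact eigenvector."""
    size = len(v)
    hv = [sum(h[i][j] * v[j] for j in range(size)) for i in range(size)]
    return max(abs(hv[i] - energy * v[i]) for i in range(size))


N_SITES = 6      # assumed ring size
HOPPING = 1.0    # assumed hopping amplitude, in arbitrary energy units

hamiltonian = ring_hamiltonian(N_SITES, HOPPING)
for m in range(N_SITES):
    energy = band_energy(N_SITES, m, HOPPING)
    print(m, round(energy, 6), residual(hamiltonian, bloch_state(N_SITES, m), energy))
```

Checking a proposed eigenvector in this way avoids the need for a general diagonalization routine; for less symmetric connectivities one would diagonalize $h$ numerically instead.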

10.2.1 Molecules (the Hückel Model)

The standard notation in this context is $t_{ii} = -\alpha_i$, $t_{ij} = -\beta_{ij}$ if sites $i$ and $j$ are connected by a chemical bond, and $t_{ij} = 0$ otherwise. Note that the subscripts on $\alpha$ and $\beta$ are also often dropped, but they are usually implicit; if the molecule contains more than one species of atom, the $\alpha$’s will clearly be different on the different species and the $\beta$’s will depend on the species of each of the atoms between which the electron is hopping. Therefore,

$$\hat{H}_{\text{Hückel}} = \sum_{i\sigma} \alpha_i\,\hat{c}_{i\sigma}^\dagger \hat{c}_{i\sigma} + \sum_{\langle ij \rangle \sigma} \beta_{ij}\,\hat{c}_{i\sigma}^\dagger \hat{c}_{j\sigma} \tag{10.29}$$

where $\langle ij \rangle$ serves to remind us that the sum is only over those pairs of atoms joined by a chemical bond. Note that $\beta_{ij}$ is typically negative.

10.2.1.1 Molecular Hydrogen. Clearly, in H$_2$ there is only a single atomic species. In this case one can set $\alpha_i = \alpha$ for all $i$ without loss of generality. Further, as there is also only a single bond, we may choose $\beta_{ij} = \beta$, giving

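$$\hat{H}_{\text{Hückel}} = \alpha \sum_{\sigma} (\hat{n}_{1\sigma} + \hat{n}_{2\sigma}) + \beta \sum_{\sigma} (\hat{c}_{1\sigma}^\dagger \hat{c}_{2\sigma} + \hat{c}_{2\sigma}^\dagger \hat{c}_{1\sigma}) \tag{10.30}$$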

where we have labeled the two atomic sites 1 and 2. This Hamiltonian has two eigenstates: one is known as the bonding orbital,

$$|\psi_{b\sigma}\rangle = \frac{1}{\sqrt{2}} (\hat{c}_{1\sigma}^\dagger + \hat{c}_{2\sigma}^\dagger)|0\rangle \tag{10.31}$$

and the other is known as the antibonding orbital,

$$|\psi_{a\sigma}\rangle = \frac{1}{\sqrt{2}} (\hat{c}_{1\sigma}^\dagger - \hat{c}_{2\sigma}^\dagger)|0\rangle \tag{10.32}$$

[Figure 10.1: energy-level diagram showing the two atomic orbitals, with the bonding orbital a distance $|\beta|$ below them and the antibonding orbital a distance $|\beta|$ above them.]

Fig. 10.1 (color online) Energy levels of the atomic and molecular orbitals in the Hückel description of H$_2$. The bonding orbital is $|\beta|$ lower in energy than the atomic orbital, whereas the antibonding orbital is $|\beta|$ higher in energy than the atomic orbital. Therefore, neutral H$_2$ is stabilized by $2|\beta|$ relative to 2H.
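The orbitals of Eqs. (10.31) and (10.32) have energies $\alpha + \beta$ and $\alpha - \beta$, respectively, as applying Eq. (10.30) to them shows. Because the electrons do not interact in this model, the ground-state energy for any number of electrons follows from filling these levels from the bottom with at most two electrons (one per spin) in each. The following is a minimal sketch of that bookkeeping; the values of `ALPHA` and `BETA` are placeholders chosen for illustration, not fitted parameters, and the function names are invented here.

```python
def huckel_h2_levels(alpha, beta):
    """Orbital energies of the two-site Hueckel model, lowest first."""
    return sorted([alpha + beta, alpha - beta])


def ground_state_energy(levels, n_electrons):
    """Fill the levels from the bottom, two electrons (spin up and down) per orbital."""
    if not 0 <= n_electrons <= 2 * len(levels):
        raise ValueError("electron count does not fit into the available orbitals")
    energy = 0.0
    remaining = n_electrons
    for level in levels:
        occupation = min(2, remaining)
        energy += occupation * level
        remaining -= occupation
    return energy


def binding_energy(alpha, beta, n_electrons):
    """Energy relative to n_electrons * alpha, the value for separated atoms (beta = 0)."""
    levels = huckel_h2_levels(alpha, beta)
    return ground_state_energy(levels, n_electrons) - n_electrons * alpha


ALPHA = 0.0    # placeholder on-site energy
BETA = -1.0    # placeholder hopping parameter; negative, as is typical

for n_electrons, label in [(1, "H2+"), (2, "H2"), (3, "H2-"), (4, "H2 2-")]:
    print(label, binding_energy(ALPHA, BETA, n_electrons))
```

With these placeholder values the results are $\beta$, $2\beta$, $\beta$, and 0 in units of $|\beta|$ for one to four electrons.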

The bonding orbital has energy $\alpha + \beta$, whereas the antibonding orbital has energy $\alpha - \beta$. Recall that $\beta < 0$; therefore, every electron in the bonding state stabilizes the molecule by an amount $|\beta|$, whereas electrons in the antibonding state destabilize the molecule by an amount $|\beta|$, hence the nomenclature.† This is sketched in Fig. 10.1. Because $V_{ijkl} = 0$, the electrons are noninteracting, so the molecular orbitals are not dependent on the occupation of other orbitals. Therefore, to calculate the total energy of the ground state of the molecule, one simply fills up the states, starting with the lowest-energy states and respecting the Pauli exclusion principle. If the two protons are infinitely separated, $\beta = 0$ and the system has total energy $N\alpha$, where $N$ is the total number of electrons. H$_2^+$ has only one electron, which, in the ground state, will occupy the bonding orbital, so H$_2^+$ has a binding energy of $\beta$. H$_2$ has two electrons; in the ground state these electrons have opposite spin and therefore can both occupy the bonding orbital. Thus, H$_2$ has a binding energy of $2\beta$. H$_2^-$ has three electrons, so while two can occupy the bonding state, one must be in the antibonding state; therefore, the binding energy is only $\beta$. Finally, H$_2^{2-}$ has four electrons, so one finds two in each molecular orbital. Therefore, the binding energy is zero: the molecule is predicted to be unstable.

† Note that in a nonorthogonal basis, the antibonding orbital may be destabilized by a greater amount than the bonding orbital is stabilized.

Thus, the Hückel model makes several predictions: neutral H$_2$ is predicted to be significantly more stable than any of the ionic states; the two singly ionic species are predicted to be equally stable; and the doubly anionic species is predicted to be unstable. Further, the lowest optical absorption is expected to correspond to the transition between the bonding orbital and the antibonding orbital. The energy gap for this transition is $2|\beta|$. Therefore, the lowest optical absorption is predicted to be the same in the neutral species as in the singly cationic species. Further, this absorption is predicted to occur at a frequency with the same energy as the heat of formation for the neutral species. Although these predictions do capture qualitatively what is observed experimentally, they are certainly not within chemical accuracy (i.e., within $k_B T \sim 1$ kcal mol$^{-1}$ $\sim 0.03$ eV for $T = 300$ K). For example, the experimentally determined binding energies9 are 2.27 eV for H$_2^+$, 4.74 eV for H$_2$, and 1.7 eV for H$_2^-$, while H$_2^{2-}$ is indeed unstable.

10.2.1.2 π-Hückel Theory of Benzene. For many organic molecules a model known as π-Hückel theory is very useful. In π-Hückel theory one considers only the π-electrons. A simple example is a benzene molecule. The hydrogen atoms have no π-electrons and therefore are not represented in the model. This leaves only the carbon atoms, so again we can set $\alpha_i = \alpha$ and $\beta_{ij} = \beta$. Because of the ring geometry of benzene (and assuming that the molecule is planar), the Hamiltonian becomes

$$\hat{H}_{\text{Hückel}} = \alpha \sum_{i\sigma} \hat{n}_{i\sigma} + \beta \sum_{i\sigma} (\hat{c}_{i\sigma}^\dagger \hat{c}_{i+1\,\sigma} + \hat{c}_{i+1\,\sigma}^\dagger \hat{c}_{i\sigma}) \tag{10.33}$$

where the addition in the site index is defined modulo six (i.e., site seven is site one). For benzene we have six solutions per spin state:

$$|\psi_{A_{2u}}\rangle = \frac{1}{\sqrt{6}} (\hat{c}_{1\sigma}^\dagger + \hat{c}_{2\sigma}^\dagger + \hat{c}_{3\sigma}^\dagger + \hat{c}_{4\sigma}^\dagger + \hat{c}_{5\sigma}^\dagger + \hat{c}_{6\sigma}^\dagger)|0\rangle$$

$$|\psi_{E_{1g}}\rangle = \frac{1}{\sqrt{6}} (\hat{c}_{1\sigma}^\dagger + \varepsilon\,\hat{c}_{2\sigma}^\dagger + \varepsilon^2 \hat{c}_{3\sigma}^\dagger - \hat{c}_{4\sigma}^\dagger - \varepsilon\,\hat{c}_{5\sigma}^\dagger - \varepsilon^2 \hat{c}_{6\sigma}^\dagger)|0\rangle$$

$$|\psi'_{E_{1g}}\rangle = \frac{1}{\sqrt{6}} (\hat{c}_{1\sigma}^\dagger - \varepsilon^2 \hat{c}_{2\sigma}^\dagger - \varepsilon\,\hat{c}_{3\sigma}^\dagger - \hat{c}_{4\sigma}^\dagger + \varepsilon^2 \hat{c}_{5\sigma}^\dagger + \varepsilon\,\hat{c}_{6\sigma}^\dagger)|0\rangle$$

$$|\psi_{E_{2u}}\rangle = \frac{1}{\sqrt{6}} (\hat{c}_{1\sigma}^\dagger + \varepsilon^2 \hat{c}_{2\sigma}^\dagger - \varepsilon\,\hat{c}_{3\sigma}^\dagger + \hat{c}_{4\sigma}^\dagger + \varepsilon^2 \hat{c}_{5\sigma}^\dagger - \varepsilon\,\hat{c}_{6\sigma}^\dagger)|0\rangle$$

$$|\psi'_{E_{2u}}\rangle = \frac{1}{\sqrt{6}} (\hat{c}_{1\sigma}^\dagger - \varepsilon\,\hat{c}_{2\sigma}^\dagger + \varepsilon^2 \hat{c}_{3\sigma}^\dagger + \hat{c}_{4\sigma}^\dagger - \varepsilon\,\hat{c}_{5\sigma}^\dagger + \varepsilon^2 \hat{c}_{6\sigma}^\dagger)|0\rangle$$

and

$$|\psi_{B_{2g}}\rangle = \frac{1}{\sqrt{6}} (\hat{c}_{1\sigma}^\dagger - \hat{c}_{2\sigma}^\dagger + \hat{c}_{3\sigma}^\dagger - \hat{c}_{4\sigma}^\dagger + \hat{c}_{5\sigma}^\dagger - \hat{c}_{6\sigma}^\dagger)|0\rangle$$

where $\varepsilon = e^{i\pi/3}$. These wavefunctions are sketched in Fig. 10.2. The energies of these states are $E_{A_{2u}} = \alpha - 2|\beta|$, $E_{E_{1g}} = E'_{E_{1g}} = \alpha - |\beta|$, $E_{E_{2u}} = E'_{E_{2u}} = \alpha + |\beta|$, and $E_{B_{2g}} = \alpha + 2|\beta|$. The subscripts are symmetry labels11,12 for the group $D_{6h}$; one should recall that because we are dealing with π-orbitals, all of the orbitals sketched here are antisymmetric under reflection through the plane of the page. The degenerate ($E_{1g}$ and $E_{2u}$) orbitals are typically written or drawn rather differently (see Lowe and Peterson9). However, any linear combination of degenerate eigenstates is also an eigenstate; this representation was chosen as it highlights the symmetry of the problem. For a more detailed discussion of this problem, see Coulson’s Valence.13

Fig. 10.2 (color online) Molecular orbitals for benzene from π-Hückel theory. Different colors indicate a change in sign of the wavefunction. In the neutral molecule the $A_{2u}$ and both $E_{1g}$ states are occupied, while the $B_{2g}$ and $E_{2u}$ states are virtual. Note that we have taken real superpositions9 of the twofold degenerate states in these plots.

10.2.1.3 Electronic Interactions and Parameterization of the Hückel Model. As noted above, the Hückel model does not explicitly include interactions between electrons. This leads to serious qualitative and quantitative failures of the model, some of which we have seen above and discuss further below. However, given the (mathematical and conceptual) simplicity and the computational economy of the method, one would like to improve the method as far as possible. So far we have treated the theory as parameter free. However, if we treat the model as a semiempirical method instead, we can include some of the effects due to electron–electron interactions without greatly increasing the computational cost of the method. For example, one can make $\alpha$ dependent on the charge on the atom. This is reasonable, as the more electrons we put on an atom, the more difficult it is to add another, due to the additional Coulomb repulsion from the extra electrons. The simplest way to account for this is by use of the ω technique,9 where one replaces

$$\alpha_i \rightarrow \alpha_i' = \alpha_i + \omega(q_0 - q_i)\beta \tag{10.34}$$

where $q_i$ is the charge on atom $i$, $q_0$ is a (fixed) reference charge, and ω is a parameter. The ω technique suppresses the unphysical fluctuations of the electron density, which are often predicted by the Hückel model (cf. the discussion of H2 above). Similar techniques can also be applied to β. These parameterizations only slightly complicate the model and do not lead to a major inflation of the computational cost, but can significantly improve the accuracy of the predictions of the Hückel model [14].

10.2.2 Crystals (the Tight-Binding Model)

For infinite systems it is necessary to work with a fixed chemical potential rather than a fixed particle number. Therefore, before we discuss the tight-binding model, we briefly review the chemical potential (see also the discussion by Atkins and de Paula [5] of the chemical potential in a chemical context).

10.2.2.1 Chemical Potential

When one is dealing with a large system, keeping track of the number of particles can become difficult. This is particularly true in the thermodynamic limit, where the number of electrons $N_e \equiv \langle\hat N\rangle \to \infty$ and the volume of the system $V \to \infty$ in such a way as to ensure that the electronic density, $n_e = N_e/V$, remains constant.

Lagrange multipliers [15] are a powerful and general method for imposing constraints on differential equations (such as the Schrödinger equation) without requiring the solution of integrodifferential equations. Briefly, consider a function, $f(x, y, z, \ldots)$, that we wish to extremize (minimize or maximize) subject to a constraint which means that $x, y, z, \ldots$ are no longer independent. In general, we may write the constraint in the form $\phi(x, y, z, \ldots) = 0$. This allows us to define the function $g(x, y, z, \ldots, \lambda) \equiv f(x, y, z, \ldots) + \lambda\phi(x, y, z, \ldots)$, where λ is known as a Lagrange multiplier. One may show [15] that the extremum of $g(x, y, z, \ldots, \lambda)$ with respect to $x, y, z, \ldots$ and λ is the extremum of $f(x, y, z, \ldots)$ with respect to $x, y, z, \ldots$ subject to the constraint that $\phi(x, y, z, \ldots) = 0$.

Typically, the problem we wish to solve in chemistry and condensed matter physics is to minimize the free energy, F (which reduces to the energy, E, at T = 0) subject to the constraint of having a fixed number of electrons (determined by the chemistry of the material in question). This suggests that one should simply introduce a Lagrange multiplier to resolve the difficulty of constraining the number of electrons in the thermodynamic limit.
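As a minimal worked illustration of the Lagrange multiplier construction described above, consider extremizing $f(x, y) = x^2 + y^2$ subject to $x + y = 1$. This function and constraint are chosen here purely for illustration and are not taken from the text.

```latex
\begin{aligned}
f(x,y) &= x^2 + y^2, \qquad \phi(x,y) = x + y - 1 = 0,\\
g(x,y,\lambda) &= x^2 + y^2 + \lambda\,(x + y - 1),\\
\frac{\partial g}{\partial x} &= 2x + \lambda = 0,\qquad
\frac{\partial g}{\partial y} = 2y + \lambda = 0,\qquad
\frac{\partial g}{\partial \lambda} = x + y - 1 = 0,\\
\Rightarrow\quad x &= y = \tfrac{1}{2},\qquad \lambda = -1,\qquad f = \tfrac{1}{2}.
\end{aligned}
```

The condition $\partial g/\partial\lambda = 0$ simply returns the constraint, which is why extremizing with respect to λ as well as the original variables enforces it.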
A suitable constraint could be introduced by adding the term $\lambda(N_0 - \hat N)$ to the Hamiltonian, where $N_0$ is the chemically required number of electrons, and requiring that the free energy is an extremum with respect to λ. However, one can also impose the same constraint and achieve additional physical insight by subtracting the term $\mu\hat N$ from the Hamiltonian and requiring that

$$N_0 = -\frac{\partial F}{\partial \mu} \tag{10.35}$$

The chemical potential (for electrons), μ, is then given by

$$\mu = -\frac{\partial F}{\partial N_e} \tag{10.36}$$

Therefore, specifying a system's chemical potential is equivalent to specifying the number of electrons, but provides a far more powerful approach for bulk systems. Physically, this approach is equivalent to thinking of the system as being attached to an infinite bath of electrons (i.e., one is working in the grand canonical ensemble) [16]. Thus, the Fermi distribution for the system is given by

$$f(E, T) = \frac{1}{1 + e^{(E-\mu)/k_B T}} \tag{10.37}$$
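As a quick numerical illustration of Eq. (10.37), the following sketch evaluates the Fermi distribution for a few temperatures. The function name `fermi` and the argument `kT` (standing for $k_B T$ in the same units as the energy) are illustrative choices, and the zero-temperature branch is simply the step-function limit of the formula.

```python
import math


def fermi(energy, mu, kT):
    """Fermi distribution of Eq. (10.37); kT is k_B*T in the units of energy."""
    if kT == 0.0:
        # Zero-temperature limit: a step function at the chemical potential.
        if energy == mu:
            return 0.5
        return 1.0 if energy < mu else 0.0
    x = (energy - mu) / kT
    if x >= 0.0:
        # Equivalent form exp(-x) / (1 + exp(-x)) avoids overflow for large x.
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


if __name__ == "__main__":
    mu = 0.0
    for kT in (0.0, 0.05, 0.5):
        row = [round(fermi(e, mu, kT), 4) for e in (-1.0, 0.0, 1.0)]
        print(f"kT = {kT}: f(-1), f(0), f(1) = {row}")
```

For any nonzero `kT` the occupation at `energy == mu` is one half, and the step sharpens as `kT` is reduced.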

Therefore, at T = 0 all of the states with energies lower than the chemical potential are occupied, and all of the states with energies greater than the chemical potential are unoccupied. Hence the Fermi energy is $E_F = \mu(T = 0)$. Note that as F is temperature dependent, Eq. (10.36) shows that, in general, μ will also be temperature dependent.† Nevertheless, Eq. (10.37) gives a clear interpretation of the chemical potential at any nonzero temperature: μ(T) is the energy of a state with a 50% probability of occupation at temperature T.

† In contrast, as $E_F$ is only defined at T = 0, it is not temperature dependent.

10.2.2.2 Tight-Binding Model

For periodic systems (crystals) one usually refers to the Hückel model as the tight-binding model. Often, one only considers models with nearest-neighbor terms; that is, one takes $t_{ii} = -\varepsilon_i$, $t_{ij} = t$ if $i$ and $j$ are at nearest-neighbor sites, and $t_{ij} = 0$ otherwise. Thus, for nearest-neighbor hopping only,

$$\hat H_{tb} - \mu\hat N = -t\sum_{\langle ij\rangle\sigma}\hat c^\dagger_{i\sigma}\hat c_{j\sigma} + \sum_{i\sigma}(\varepsilon_i - \mu)\hat c^\dagger_{i\sigma}\hat c_{i\sigma} \tag{10.38}$$

where μ is the chemical potential and $\langle ij\rangle$ indicates that the sum is over nearest neighbors only. Further, if we consider materials with only a single atomic species, we can set $\varepsilon_i = 0$, yielding

$$\hat H_{tb} - \mu\hat N = -t\sum_{\langle ij\rangle\sigma}\hat c^\dagger_{i\sigma}\hat c_{j\sigma} - \mu\sum_{i\sigma}\hat c^\dagger_{i\sigma}\hat c_{i\sigma} \tag{10.39}$$

10.2.2.3 One-Dimensional Chain

The simplest infinite system is a chain with nearest-neighbor hopping only. As we are on a chain, the sites have a natural ordering and the Hamiltonian may be written as

$$\hat H_{tb} - \mu\hat N = -t\sum_{i\sigma}\left(\hat c^\dagger_{i\sigma}\hat c_{i+1\sigma} + \hat c^\dagger_{i+1\sigma}\hat c_{i\sigma}\right) - \mu\sum_{i\sigma}\hat c^\dagger_{i\sigma}\hat c_{i\sigma} \tag{10.40}$$

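We can solve this model exactly by performing a lattice Fourier transform. We begin by introducing the reciprocal space creation and annihilation operators:

$$\hat c_{i\sigma} = \frac{1}{\sqrt N}\sum_k \hat c_{k\sigma}e^{ikR_i} \tag{10.41}$$

and

$$\hat c^\dagger_{i\sigma} = \frac{1}{\sqrt N}\sum_k \hat c^\dagger_{k\sigma}e^{-ikR_i} \tag{10.42}$$

where $k$ is the lattice wavenumber or crystal momentum and $R_i$ is the position of the $i$th lattice site. Therefore,

$$\hat H_{tb} - \mu\hat N = \frac{1}{N}\sum_{ikk'\sigma}\hat c^\dagger_{k\sigma}\hat c_{k'\sigma}e^{i(k'-k)R_i}\left[-t\left(e^{ik'a} + e^{-ika}\right) - \mu\right] \tag{10.43}$$

where $a$ is the lattice constant (i.e., the distance between neighboring sites $R_i$ and $R_{i+1}$). As $(1/N)\sum_i e^{i(k'-k)R_i} = \delta(k' - k)$ [17], we have

$$\hat H_{tb} - \mu\hat N = \sum_{k\sigma}\left[-2t\cos(ka)\,\hat c^\dagger_{k\sigma}\hat c_{k\sigma} - \mu\,\hat c^\dagger_{k\sigma}\hat c_{k\sigma}\right] = \sum_{k\sigma}(\varepsilon_k - \mu)\,\hat c^\dagger_{k\sigma}\hat c_{k\sigma} \tag{10.44}$$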

where $\varepsilon_k = -2t\cos ka$ is known as the dispersion relation. Notice that Eq. (10.44) is diagonal (i.e., it depends only on number operator terms, $\hat n_{k\sigma} = \hat c^\dagger_{k\sigma}\hat c_{k\sigma}$). Therefore, the energy is just the sum of $\varepsilon_k$ for the states $k\sigma$ that are occupied, and we have solved the problem. We plot the dispersion relation in Fig. 10.3a. For a tight-binding model, calculating the dispersion relation is equivalent to solving the problem.

The chemical potential, μ, must be chosen to ensure that there is the physically required number of electrons. Changing the chemical potential has the effect of moving the Fermi energy up or down the band and hence changing the number of electrons in the system. For example (cf. Fig. 10.3b to d), in the problem above the filled fraction of the band is $\arccos(-\mu/2t)/\pi$, so the half-filled band corresponds to μ = 0, μ = −t to a one-third-filled band, and μ = t to a two-thirds-filled band; the quarter-filled and three-quarter-filled bands correspond to $\mu = -\sqrt{2}\,t$ and $\mu = \sqrt{2}\,t$, respectively.

10.2.2.4 Square, Cubic, and Hypercubic Lattices

In more than one dimension the notation becomes slightly more complicated, but the mathematics does not necessarily become any more difficult. The simplest generalization of the chain we have solved above is the two-dimensional square lattice, where

$$\hat H_{tb} - \mu\hat N = -t\sum_{\langle ij\rangle\sigma}\hat c^\dagger_{i\sigma}\hat c_{j\sigma} - \mu\sum_{i\sigma}\hat c^\dagger_{i\sigma}\hat c_{i\sigma} \tag{10.45}$$

Fig. 10.3 (color online) (a) The dispersion relation, $\varepsilon_k = -2t\cos(ka)$, of the one-dimensional tight-binding chain with nearest-neighbor hopping only, plotted as $\varepsilon_k$ (from −2t to 2t) against $ka$ (from −3 to 3). (b) Shaded area shows the filled states for μ = 0. (c) Shaded area shows the filled states for μ = −t. (d) Shaded area shows the filled states for μ = t.
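The relation between the chemical potential and the filling of the one-dimensional band shown in Fig. 10.3 can be checked numerically. The sketch below counts the states with $\varepsilon_k < \mu$ on a grid over the first Brillouin zone and compares the result with the closed form $k_F a/\pi$, where $\cos(k_F a) = -\mu/2t$. The function names, the grid size `n_k`, and the choice of units a = 1 are assumptions made for illustration only.

```python
import math


def chain_dispersion(k, t=1.0, a=1.0):
    """Dispersion relation of Eq. (10.44): eps_k = -2 t cos(k a)."""
    return -2.0 * t * math.cos(k * a)


def filling_fraction(mu, t=1.0, n_k=20000):
    """Fraction of band states with eps_k < mu at T = 0.

    Uses a midpoint grid of n_k wavenumbers over -pi/a < k < pi/a (a = 1).
    """
    a = 1.0
    occupied = 0
    for j in range(n_k):
        k = -math.pi / a + (j + 0.5) * (2.0 * math.pi / a) / n_k
        if chain_dispersion(k, t, a) < mu:
            occupied += 1
    return occupied / n_k


def filling_exact(mu, t=1.0):
    """Closed form k_F a / pi, with cos(k_F a) = -mu / (2 t)."""
    x = max(-1.0, min(1.0, -mu / (2.0 * t)))
    return math.acos(x) / math.pi


if __name__ == "__main__":
    for mu in (0.0, -1.0, 1.0, -math.sqrt(2.0), math.sqrt(2.0)):
        print(f"mu = {mu:+.4f} t: grid {filling_fraction(mu):.4f}, "
              f"closed form {filling_exact(mu):.4f}")
```

With t = 1 the closed form gives one half at μ = 0, one third at μ = −t, and one quarter at μ = −√2 t. Each $k$ state can hold two electrons (one per spin), so the fraction returned here is the number of electrons divided by twice the number of sites.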

Recall that $\langle ij\rangle$ indicates that the sum is over nearest neighbors only. To solve this problem we simply generalize our reciprocal lattice operators to

$$\hat c_{i\sigma} = \frac{1}{\sqrt N}\sum_{\mathbf k}\hat c_{\mathbf k\sigma}e^{i\mathbf k\cdot\mathbf R_i} \tag{10.46}$$

$$\hat c^\dagger_{i\sigma} = \frac{1}{\sqrt N}\sum_{\mathbf k}\hat c^\dagger_{\mathbf k\sigma}e^{-i\mathbf k\cdot\mathbf R_i} \tag{10.47}$$

where $\mathbf k = (k_x, k_y)$ is the lattice wavevector or crystal momentum and $\mathbf R_i = (x_i, y_i)$ is the position of the $i$th lattice site. We then simply repeat the process we used to solve the one-dimensional chain. As the lattice only contains bonds in perpendicular directions, the calculations for the x and y directions go through independently and one finds that

$$\hat H_{tb} - \mu\hat N = \sum_{\mathbf k\sigma}(\varepsilon_{\mathbf k} - \mu)\,\hat c^\dagger_{\mathbf k\sigma}\hat c_{\mathbf k\sigma} \tag{10.48}$$

where the dispersion relation is now $\varepsilon_{\mathbf k} = -2t(\cos k_x a_x + \cos k_y a_y)$ and $a_\nu$ represents the lattice constant in the ν direction.

A three-dimensional cubic lattice is not any more difficult. In this case, $\mathbf k = (k_x, k_y, k_z)$ and the solution is of the form of Eq. (10.48) but with $\varepsilon_{\mathbf k} = -2t(\cos k_x a_x + \cos k_y a_y + \cos k_z a_z)$. Indeed, as long as we keep all the bonds mutually perpendicular, we can keep generalizing this solution to higher dimensions. This may sound somewhat academic, as no materials live in more than three dimensions, but the infinite-dimensional hypercubic lattice has become important in recent years because many models that include interactions can be solved exactly in infinite dimensions, as we discuss in Section 10.3.4.2.
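The statement that plane waves diagonalize the square-lattice model can be verified directly on a small periodic lattice. The sketch below applies the single-particle nearest-neighbor hopping matrix (element −t between neighboring sites) to a plane wave and checks that the result is $\varepsilon_{\mathbf k}$ times the same plane wave. The lattice size L = 8, the choice $a_x = a_y = 1$, and all function names are assumptions made for this illustration.

```python
import cmath
import math


def apply_hopping(psi, t=1.0):
    """Apply the nearest-neighbor hopping matrix (-t between neighbors,
    periodic boundaries) to the amplitudes psi[x][y] on an L x L lattice."""
    L = len(psi)
    return [[-t * (psi[(x + 1) % L][y] + psi[(x - 1) % L][y]
                   + psi[x][(y + 1) % L] + psi[x][(y - 1) % L])
             for y in range(L)] for x in range(L)]


def plane_wave(nx, ny, L):
    """Normalized plane wave with k = 2*pi*(nx, ny)/L (lattice constant 1)."""
    kx, ky = 2.0 * math.pi * nx / L, 2.0 * math.pi * ny / L
    psi = [[cmath.exp(1j * (kx * x + ky * y)) / L for y in range(L)]
           for x in range(L)]
    return kx, ky, psi


def eigen_residual(nx, ny, L=8, t=1.0):
    """Largest component of (h psi - eps_k psi) for the chosen plane wave."""
    kx, ky, psi = plane_wave(nx, ny, L)
    eps_k = -2.0 * t * (math.cos(kx) + math.cos(ky))
    h_psi = apply_hopping(psi, t)
    return max(abs(h_psi[x][y] - eps_k * psi[x][y])
               for x in range(L) for y in range(L))


if __name__ == "__main__":
    worst = max(eigen_residual(nx, ny) for nx in range(8) for ny in range(8))
    print(f"largest residual over all allowed k: {worst:.2e}")
```

The residual is at the level of floating-point round-off for every allowed wavevector, which is the finite-lattice counterpart of Eq. (10.48).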

Fig. 10.4 (color online) (a) Hexagonal (triangular), (b) anisotropic triangular, (c) honeycomb, and (d) kagome lattices. The honeycomb lattice contains two inequivalent types of lattice site, some of which are labeled A and B. The sets of equivalent sites are referred to as sublattices.

10.2.2.5 Hexagonal and Honeycomb Lattices

Even if the bonds are not all mutually perpendicular, the solution to the tight-binding model can still be found by Fourier-transforming the Hamiltonian. Three important examples of such lattices are the hexagonal lattice (which is often referred to as the triangular lattice, although this is formally incorrect), the anisotropic triangular lattice, and the honeycomb lattice, which are sketched in Fig. 10.4. For each lattice the solution is of the form of Eq. (10.48). For the hexagonal lattice,

$$\varepsilon_{\mathbf k} = -2t\cos k_x a_x - 4t\cos\frac{k_x a_x}{2}\cos\frac{\sqrt{3}\,k_y a_y}{2} \tag{10.49}$$

For the anisotropic triangular lattice,

$$\varepsilon_{\mathbf k} = -2t(\cos k_x a_x + \cos k_y a_y) - 2t'\cos(k_x a_x + k_y a_y) \tag{10.50}$$

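The honeycomb lattice has an important additional subtlety, which it is worthwhile to work through: there are two inequivalent types of lattice site (see Fig. 10.4c). We begin by introducing new operators, $\hat c_{i\nu\sigma}$, which annihilate an electron with spin σ on the νth sublattice in the $i$th unit cell, where ν = A or B. Therefore, we can rewrite Eq. (10.45) as

$$\begin{aligned}
\hat H_{tb} &= -t\sum_{\langle ij\rangle\sigma}\left(\hat c^\dagger_{iA\sigma}\hat c_{jB\sigma} + \hat c^\dagger_{jB\sigma}\hat c_{iA\sigma}\right)\\
&= -t\sum_{\langle ij\rangle\sigma}\begin{pmatrix}\hat c^\dagger_{iA\sigma} & \hat c^\dagger_{jB\sigma}\end{pmatrix}\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}\hat c_{iA\sigma}\\ \hat c_{jB\sigma}\end{pmatrix}\\
&= -t\sum_{\mathbf k\sigma}\begin{pmatrix}\hat c^\dagger_{\mathbf kA\sigma} & \hat c^\dagger_{\mathbf kB\sigma}\end{pmatrix}\begin{pmatrix}0 & h_{\mathbf k}\\ h^*_{\mathbf k} & 0\end{pmatrix}\begin{pmatrix}\hat c_{\mathbf kA\sigma}\\ \hat c_{\mathbf kB\sigma}\end{pmatrix}
\end{aligned} \tag{10.51}$$

where $h_{\mathbf k} = e^{ik_x a} + e^{-i(k_x + \sqrt{3}k_y)a/2} + e^{-i(k_x - \sqrt{3}k_y)a/2}$. Therefore,

$$\varepsilon_{\mathbf k} = \pm t|h_{\mathbf k}| = \pm t\sqrt{3 + 2\cos\left(\sqrt{3}\,k_y a\right) + 4\cos\frac{\sqrt{3}\,k_y a}{2}\cos\frac{3k_x a}{2}} \tag{10.52}$$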
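Equation (10.52) can be checked numerically against $|h_{\mathbf k}|$ from Eq. (10.51). The sketch below does this at an arbitrary wavevector, evaluates the band energy at the point $\mathbf K = (2\pi/3a, 2\pi/3\sqrt{3}a)$, where the two branches meet, and estimates how quickly the energy grows on moving a small distance $q$ away from that point. The function names and the choice of units t = a = 1 are assumptions made for illustration.

```python
import cmath
import math

SQRT3 = math.sqrt(3.0)


def h_k(kx, ky, a=1.0):
    """Off-diagonal element h_k appearing in Eq. (10.51)."""
    return (cmath.exp(1j * kx * a)
            + cmath.exp(-1j * (kx + SQRT3 * ky) * a / 2.0)
            + cmath.exp(-1j * (kx - SQRT3 * ky) * a / 2.0))


def band_energy(kx, ky, t=1.0, a=1.0):
    """Upper branch of Eq. (10.52); the lower branch is its negative."""
    arg = (3.0 + 2.0 * math.cos(SQRT3 * ky * a)
           + 4.0 * math.cos(SQRT3 * ky * a / 2.0) * math.cos(1.5 * kx * a))
    return t * math.sqrt(max(arg, 0.0))  # clamp tiny negative round-off


if __name__ == "__main__":
    K = (2.0 * math.pi / 3.0, 2.0 * math.pi / (3.0 * SQRT3))
    print("closed form vs |h_k| at k = (0.3, -1.1):",
          band_energy(0.3, -1.1), abs(h_k(0.3, -1.1)))
    print("|h_k| at K:", abs(h_k(*K)))
    q = 1e-4
    print("slope |h_(K+q)| / q:", abs(h_k(K[0] + q, K[1])) / q)
```

The slope printed in the last line approaches 3ta/2 as $q$ is reduced, so near $\mathbf K$ the energy grows linearly with the distance from that point.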

We plot this dispersion relation in Fig. 10.5. The most interesting features of this band structure are the Dirac points. The Dirac points are located at $\mathbf k = n\mathbf K + m\mathbf K'$, where $n$ and $m$ are integers, $\mathbf K = (2\pi/3a, 2\pi/3\sqrt{3}a)$, and $\mathbf K' = (2\pi/3a, -2\pi/3\sqrt{3}a)$. To see why these points are interesting, consider a point $\mathbf K + \mathbf q$ in the neighborhood of $\mathbf K$. Recalling that $\cos(K + q) = \cos K - q\sin K - \tfrac{1}{2}q^2\cos K + \cdots$, one finds that for small $|\mathbf q|$,

$$\varepsilon_{\mathbf K + \mathbf q} = \hbar v_F|\mathbf q| + \cdots \tag{10.53}$$

where $v_F = 3ta/2\hbar$ is known as the Fermi velocity. This result should be compared with the relativistic result

$$E_k^2 = m^2c^4 + \hbar^2c^2|\mathbf k|^2 \tag{10.54}$$

where $m$ is a particle's rest mass and $c$ is the speed of light. This reduces to the famous $E = mc^2$ for $\mathbf k = 0$, but for massless particles such as photons, one finds that $E_k = \hbar c|\mathbf k|$. Thus, the low-energy electronic excitations on a honeycomb lattice behave as if they are massless relativistic particles, with the Fermi velocity playing the role of the speed of light in the theory. Therefore, much excitement [18] has been caused by the recent synthesis of atomically thick sheets of graphene [19], in which carbon atoms form a honeycomb lattice. In graphene $v_F \simeq 1 \times 10^6$ m s$^{-1}$, two orders of magnitude smaller than the speed of light in vacuum. This has opened the possibility of exploring and controlling "relativistic" effects in a solid-state system [18].

Fig. 10.5 Dirac dispersion of the honeycomb lattice, plotted as $\varepsilon_{\mathbf k}/t$ (from −3 to 3) against $k_x$ and $k_y$.

10.3 HUBBARD MODEL

So far we have neglected electron–electron interactions. In real materials the electrons repel each other, due to the Coulomb interaction between them. The most obvious extension to the tight-binding model that describes some of the electron–electron interactions is to allow only on-site interactions (i.e., $V_{ijkl} \neq 0$ if and only if $i$, $j$, $k$, and $l$ all refer to the same orbital). For one orbital per site we then have the Hubbard model,

$$\hat H_{\text{Hubbard}} = -t\sum_{\langle ij\rangle\sigma}\hat c^\dagger_{i\sigma}\hat c_{j\sigma} + U\sum_i \hat c^\dagger_{i\uparrow}\hat c_{i\uparrow}\hat c^\dagger_{i\downarrow}\hat c_{i\downarrow} \tag{10.55}$$

where we have assumed nearest-neighbor hopping only. It follows from Eq. (10.27) that U > 0 (i.e., electrons repel one another).

10.3.1 Two-Site Hubbard Model: Molecular Hydrogen H2

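The two-site Hubbard model is a nice context in which to consider some of the basic properties of the chemical bond. The two-body term in the Hubbard model greatly complicates the problem relative to the tight-binding model. Therefore, the Hubbard model also presents a nice context in which to introduce one of the most important tools in theoretical physics and chemistry: mean-field theory.

10.3.1.1 Mean-Field Theory, the Hartree–Fock Approximation, and Molecular Orbital Theory

To construct a mean-field theory of any two as-yet-unspecified physical quantities, $m = \langle m\rangle + \delta m$ and $n = \langle n\rangle + \delta n$, where $\langle n\rangle$ ($\langle m\rangle$) is the mean value of $n$ ($m$) and $\delta n$ ($\delta m$) are the fluctuations about the mean, which are assumed to be small, one notes that

$$mn = (\langle m\rangle + \delta m)(\langle n\rangle + \delta n) = \langle m\rangle\langle n\rangle + \langle m\rangle\,\delta n + \delta m\,\langle n\rangle + \delta m\,\delta n \approx \langle m\rangle\langle n\rangle + \langle m\rangle\,\delta n + \delta m\,\langle n\rangle \tag{10.56}$$

Thus, mean-field approximations neglect terms that are quadratic in the fluctuations. Hartree theory is a mean-field theory in the electron density; that is,

$$\begin{aligned}
\hat c^\dagger_\alpha\hat c_\beta\hat c^\dagger_\gamma\hat c_\delta &= \left[\langle\hat c^\dagger_\alpha\hat c_\beta\rangle + \left(\hat c^\dagger_\alpha\hat c_\beta - \langle\hat c^\dagger_\alpha\hat c_\beta\rangle\right)\right]\left[\langle\hat c^\dagger_\gamma\hat c_\delta\rangle + \left(\hat c^\dagger_\gamma\hat c_\delta - \langle\hat c^\dagger_\gamma\hat c_\delta\rangle\right)\right]\\
&\approx \langle\hat c^\dagger_\alpha\hat c_\beta\rangle\,\hat c^\dagger_\gamma\hat c_\delta + \hat c^\dagger_\alpha\hat c_\beta\,\langle\hat c^\dagger_\gamma\hat c_\delta\rangle - \langle\hat c^\dagger_\alpha\hat c_\beta\rangle\langle\hat c^\dagger_\gamma\hat c_\delta\rangle
\end{aligned} \tag{10.57}$$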

However, it was quickly realized that this does not allow for electron exchange; that is, one should also include averages such as $\langle\hat c^\dagger_\alpha\hat c_\delta\rangle$. Therefore, a better mean-field theory is Hartree–Fock theory, which includes these terms. However, because of the limited interactions included in the Hubbard model, Hartree theory is identical to Hartree–Fock theory if one assumes that spin-flip terms are negligible (i.e., that $\langle\hat c^\dagger_{i\uparrow}\hat c_{i\downarrow}\rangle = 0$), which we will. The Hartree–Fock approximation to the Hubbard Hamiltonian is therefore

$$\begin{aligned}
\hat H_{HF} &= -t\sum_{\langle ij\rangle\sigma}\hat c^\dagger_{i\sigma}\hat c_{j\sigma} + U\sum_i\left[\hat c^\dagger_{i\uparrow}\hat c_{i\uparrow}\langle\hat c^\dagger_{i\downarrow}\hat c_{i\downarrow}\rangle + \langle\hat c^\dagger_{i\uparrow}\hat c_{i\uparrow}\rangle\hat c^\dagger_{i\downarrow}\hat c_{i\downarrow} - \langle\hat c^\dagger_{i\uparrow}\hat c_{i\uparrow}\rangle\langle\hat c^\dagger_{i\downarrow}\hat c_{i\downarrow}\rangle\right]\\
&= -t\sum_{\langle ij\rangle\sigma}\hat c^\dagger_{i\sigma}\hat c_{j\sigma} + U\sum_i\left[n_{i\uparrow}\hat c^\dagger_{i\downarrow}\hat c_{i\downarrow} + n_{i\downarrow}\hat c^\dagger_{i\uparrow}\hat c_{i\uparrow} - n_{i\uparrow}n_{i\downarrow}\right]
\end{aligned} \tag{10.58}$$

where $n_{i\sigma} = \langle\hat c^\dagger_{i\sigma}\hat c_{i\sigma}\rangle$. Thus, we have a Hamiltonian for a single electron moving in the mean field of the other electrons. Note that this Hamiltonian is equivalent to the ω-method parameterization of the Hückel model [see Section 10.2.1.3, particularly Eq. (10.34)] if we set ω = U/β. Thus, the ω method is just a parameterization of the Hubbard model solved in the Hartree–Fock approximation.

The Hubbard model with two sites and two electrons can be taken as a model for molecular hydrogen. In the Hartree–Fock ground state, $|\Psi^0_{HF}\rangle$, the two electrons have opposite spin and each occupies the bonding state, which we found to be the ground state of the Hückel model in Section 10.2.1.1:

$$|\Psi^0_{HF}\rangle = |\psi_{b\downarrow}\rangle \otimes |\psi_{b\uparrow}\rangle = \tfrac{1}{2}\left(\hat c^\dagger_{1\uparrow} + \hat c^\dagger_{2\uparrow}\right)\left(\hat c^\dagger_{1\downarrow} + \hat c^\dagger_{2\downarrow}\right)|0\rangle \tag{10.59}$$

$$= \tfrac{1}{2}\left(\hat c^\dagger_{1\uparrow}\hat c^\dagger_{1\downarrow} + \hat c^\dagger_{1\uparrow}\hat c^\dagger_{2\downarrow} - \hat c^\dagger_{1\downarrow}\hat c^\dagger_{2\uparrow} + \hat c^\dagger_{2\uparrow}\hat c^\dagger_{2\downarrow}\right)|0\rangle \tag{10.60}$$

Notice that $|\Psi^0_{HF}\rangle$ is just a product of two single-particle wavefunctions [one for the spin-up electron and another for the spin-down electron; cf. Eq. (10.59)]. Thus, we say that the wavefunction is uncorrelated and that the two electrons are unentangled. An important prediction of the Hartree–Fock theory is that if we pull the protons apart, we are equally likely to get two hydrogen atoms (H + H) or two hydrogen ions (H$^+$ + H$^-$). This is not what is observed experimentally. In reality the former is far more likely.

10.3.1.2 Heitler–London Wavefunction and Valence-Bond Theory

Just a year after the appearance of Schrödinger's wave equation [20], Heitler and London [21] proposed a theory of the chemical bond based on the new quantum mechanics. Explaining the nature of the chemical bond remains one of the greatest achievements of quantum mechanics. Heitler and London's theory led to the valence-bond theory of the chemical bond [22]. The two-site Hubbard model of H2 is the simplest context in which to study this theory. The Heitler–London wavefunction is

$$|\Psi^0_{HL}\rangle = \frac{1}{\sqrt{2}}\left(\hat c^\dagger_{1\uparrow}\hat c^\dagger_{2\downarrow} - \hat c^\dagger_{1\downarrow}\hat c^\dagger_{2\uparrow}\right)|0\rangle \tag{10.61}$$

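Notice that the wavefunction is correlated, as it cannot be written as a product of a wavefunction for each of the particles. Equivalently, one can say that the two electrons are entangled. The Heitler–London wavefunction overcorrects the physical errors in the Hartree–Fock molecular orbital wavefunction, as it predicts zero probability of H2 dissociating to an ionic state but is, nevertheless, a significant improvement on molecular orbital theory.

10.3.1.3 Exact Solution of the Two-Site Hubbard Model

The Hilbert space of the two-site, two-electron Hubbard model is sufficiently small that we can solve it analytically; nevertheless, this problem can be greatly simplified by using the symmetry properties of the Hamiltonian. First, note that the total spin operator commutes with the Hamiltonian, Eq. (10.55), as none of the terms in the Hamiltonian cause spin flips. Therefore, the energy eigenstates must also be spin eigenstates. For two electrons this means that all of the eigenstates will be either singlets (S = 0) or triplets (S = 1).

Let us begin with the triplet states, $|\Psi^1_m\rangle$. Consider a state with two spin-up electrons, $|\Psi^1_1\rangle$. Because there is only one orbital per site, the Pauli exclusion principle ensures that there will be exactly one electron per site (i.e., $|\Psi^1_1\rangle = \hat c^\dagger_{1\uparrow}\hat c^\dagger_{2\uparrow}|0\rangle$). The electrons cannot hop between sites, as the presence of the other electron and the Pauli principle forbid it. Therefore, $\langle\Psi^1_1|(-t\hat c^\dagger_{1\sigma}\hat c_{2\sigma})|\Psi^1_1\rangle = \langle\Psi^1_1|(-t\hat c^\dagger_{2\sigma}\hat c_{1\sigma})|\Psi^1_1\rangle = 0$ for σ = ↑ or ↓. There is exactly one electron on each site, so $\langle\Psi^1_1|U\sum_i\hat c^\dagger_{i\uparrow}\hat c_{i\uparrow}\hat c^\dagger_{i\downarrow}\hat c_{i\downarrow}|\Psi^1_1\rangle = 0$. Thus, the total energy of this state is $E^1_1 = 0$. The same chain of reasoning shows that $|\Psi^1_{-1}\rangle = \hat c^\dagger_{1\downarrow}\hat c^\dagger_{2\downarrow}|0\rangle$ and $E^1_{-1} = 0$. It then follows from spin rotation symmetry that $|\Psi^1_0\rangle = (1/\sqrt{2})(\hat c^\dagger_{1\uparrow}\hat c^\dagger_{2\downarrow} + \hat c^\dagger_{1\downarrow}\hat c^\dagger_{2\uparrow})|0\rangle$ and $E^1_0 = 0$.

As the Hilbert space contains six states, this leaves three singlet states. A convenient basis for these is formed by the Heitler–London state and the two charge-transfer states: $|\Psi_{HL}\rangle = (1/\sqrt{2})(\hat c^\dagger_{1\uparrow}\hat c^\dagger_{2\downarrow} - \hat c^\dagger_{1\downarrow}\hat c^\dagger_{2\uparrow})|0\rangle$, $|\Psi_{ct+}\rangle = (1/\sqrt{2})(\hat c^\dagger_{1\uparrow}\hat c^\dagger_{1\downarrow} + \hat c^\dagger_{2\uparrow}\hat c^\dagger_{2\downarrow})|0\rangle$, and $|\Psi_{ct-}\rangle = (1/\sqrt{2})(\hat c^\dagger_{1\uparrow}\hat c^\dagger_{1\downarrow} - \hat c^\dagger_{2\uparrow}\hat c^\dagger_{2\downarrow})|0\rangle$. Note that $|\Psi_{HL}\rangle$† and $|\Psi_{ct+}\rangle$ are even under "inversion" symmetry, which swaps the site labels 1 ↔ 2, whereas $|\Psi_{ct-}\rangle$ is odd under inversion symmetry. As the Hamiltonian is symmetric under inversion the eigenstates will have a definite parity, so $|\Psi_{ct-}\rangle$ is an eigenstate, with energy $E_{ct-} = U$. The other two singlet states are not distinguished by any symmetry of the Hamiltonian, so they do couple, yielding the Hamiltonian matrix

$$H = \begin{pmatrix}\langle\Psi_{HL}|\hat H_{\text{Hubbard}}|\Psi_{HL}\rangle & \langle\Psi_{HL}|\hat H_{\text{Hubbard}}|\Psi_{ct+}\rangle\\ \langle\Psi_{ct+}|\hat H_{\text{Hubbard}}|\Psi_{HL}\rangle & \langle\Psi_{ct+}|\hat H_{\text{Hubbard}}|\Psi_{ct+}\rangle\end{pmatrix} = \begin{pmatrix}0 & -2t\\ -2t & U\end{pmatrix} \tag{10.62}$$

† It may not be immediately obvious that $|\Psi_{HL}\rangle$ is even under inversion symmetry, but this is easily confirmed as $\hat I|\Psi_{HL}\rangle = (1/\sqrt{2})(\hat c^\dagger_{2\uparrow}\hat c^\dagger_{1\downarrow} - \hat c^\dagger_{2\downarrow}\hat c^\dagger_{1\uparrow})|0\rangle = (1/\sqrt{2})(-\hat c^\dagger_{1\downarrow}\hat c^\dagger_{2\uparrow} + \hat c^\dagger_{1\uparrow}\hat c^\dagger_{2\downarrow})|0\rangle = |\Psi_{HL}\rangle$, where $\hat I$ is the inversion operator, which swaps the labels 1 and 2.

This has eigenvalues $E_{CF} = \tfrac{1}{2}\left(U - \sqrt{U^2 + 16t^2}\right)$ and $E_{S2} = \tfrac{1}{2}\left(U + \sqrt{U^2 + 16t^2}\right)$. The corresponding eigenstates are

$$|\Psi_{CF}\rangle = \cos\theta\,|\Psi_{HL}\rangle + \sin\theta\,|\Psi_{ct+}\rangle = \left[\frac{\cos\theta}{\sqrt{2}}\left(\hat c^\dagger_{1\uparrow}\hat c^\dagger_{2\downarrow} - \hat c^\dagger_{1\downarrow}\hat c^\dagger_{2\uparrow}\right) + \frac{\sin\theta}{\sqrt{2}}\left(\hat c^\dagger_{1\downarrow}\hat c^\dagger_{1\uparrow} + \hat c^\dagger_{2\downarrow}\hat c^\dagger_{2\uparrow}\right)\right]|0\rangle \tag{10.63}$$

and

$$|\Psi_{S2}\rangle = \sin\theta\,|\Psi_{HL}\rangle + \cos\theta\,|\Psi_{ct+}\rangle = \left[\frac{\sin\theta}{\sqrt{2}}\left(\hat c^\dagger_{1\uparrow}\hat c^\dagger_{2\downarrow} - \hat c^\dagger_{1\downarrow}\hat c^\dagger_{2\uparrow}\right) + \frac{\cos\theta}{\sqrt{2}}\left(\hat c^\dagger_{1\downarrow}\hat c^\dagger_{1\uparrow} + \hat c^\dagger_{2\downarrow}\hat c^\dagger_{2\uparrow}\right)\right]|0\rangle \tag{10.64}$$

where $\tan\theta = \left(U - \sqrt{U^2 + 16t^2}\right)/4t$. For U > 0, as is required physically, the state $|\Psi_{CF}\rangle$ is the ground state for all values of U/t. $|\Psi_{CF}\rangle$ is often called the Coulson–Fischer wavefunction. Inspection of Eq. (10.63) reveals that for U/t → ∞, the Coulson–Fischer state tends to the Heitler–London wavefunction, while for U/t → 0 we regain the molecular orbital picture (Hartree–Fock wavefunction).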
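The singlet-sector results above are easy to explore numerically. The sketch below evaluates the two eigenvalues of the matrix in Eq. (10.62), the weight $\sin^2\theta$ of the charge-transfer state in the ground state (which depends only on $\tan^2\theta$ and so is insensitive to sign conventions), and, for comparison, the energy expectation values of the Hartree–Fock state of Eq. (10.60) and of the Heitler–London state. The function names are illustrative; the Hartree–Fock energy U/2 − 2t is obtained here by noting that the state of Eq. (10.60) is an equal-weight combination of the Heitler–London and ct+ basis states.

```python
import math


def singlet_energies(U, t):
    """Eigenvalues of the 2x2 matrix [[0, -2t], [-2t, U]] of Eq. (10.62)."""
    root = math.sqrt(U * U + 16.0 * t * t)
    return 0.5 * (U - root), 0.5 * (U + root)


def ionic_weight(U, t):
    """Weight sin^2(theta) of the charge-transfer state in the ground state."""
    tan_theta = (U - math.sqrt(U * U + 16.0 * t * t)) / (4.0 * t)
    return tan_theta ** 2 / (1.0 + tan_theta ** 2)


def hartree_fock_energy(U, t):
    """Energy expectation value of the state in Eq. (10.60):
    (H_11 + H_22 + 2 H_12) / 2 with the matrix elements of Eq. (10.62)."""
    return 0.5 * U - 2.0 * t


HEITLER_LONDON_ENERGY = 0.0  # diagonal element <HL|H|HL> in Eq. (10.62)

if __name__ == "__main__":
    t = 1.0
    for U in (0.0, 1.0, 4.0, 16.0, 100.0):
        e_cf, e_s2 = singlet_energies(U, t)
        print(f"U/t = {U:6.1f}: E_CF = {e_cf:8.4f}, "
              f"E_HF = {hartree_fock_energy(U, t):8.4f}, "
              f"E_HL = {HEITLER_LONDON_ENERGY:6.4f}, "
              f"ionic weight = {ionic_weight(U, t):.4f}")
```

The exact ground-state energy lies at or below both approximate energies for every U/t, and the ionic weight falls from one half at U = 0 (the molecular orbital limit) toward zero at large U/t (the Heitler–London limit).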


10.3.2 Mott Insulators and the Mott–Hubbard Metal–Insulator Transition

In 1949, Mott [23] asked an apparently simple question with a profound and surprising answer. As we have seen above, for the two-site Hubbard model both the molecular orbital (Hartree–Fock) and valence-bond (Heitler–London) wavefunctions are just approximations of the exact (Coulson–Fischer) wavefunction. Mott asked whether the equivalent statement is true in an infinite solid and, surprisingly, found that the answer is no. Further, Mott showed that the Hartree–Fock and Heitler–London wavefunctions predict very different properties for crystals.

One of the most important properties of a crystal is its conductivity. In a metal the conductivity is high and increases as the temperature is lowered, whereas in a semiconductor or an insulator the conductivity is low and decreases as the temperature is lowered. These behaviors arise because of fundamental differences between the electronic structures of metals and semiconductors/insulators [10]. In metals there are excited states at arbitrarily low energies above the Fermi energy. This means that even at the lowest temperatures, electrons can move in response to an applied electric field. In semiconductors and insulators there is an energy gap between the highest occupied electronic state and the lowest unoccupied electronic state at zero temperature. This means that a thermal activation energy must be provided if electrons are to move in response to an applied field. The difference between semiconductors and insulators is simply the size of the gap; therefore, we will not distinguish between the two below and will refer to any material with a gap as an insulator.

Consider a Hubbard model at half-filling, that is, with the same number of electrons as lattice sites. For a macroscopic current to flow, an electron must move from one lattice site (leaving an empty site with a net positive charge) to a distant site (creating a doubly occupied site with a net negative charge). The net charges may move through the collective motions of the electrons. One could keep track of this by describing the movement of all the electrons, but it is easier to introduce an equivalent description where we treat the net charges as particles moving in a neutral background. Therefore, we refer to the positive charge as a holon and the negative charge as a doublon. In the ground state of valence-bond theory, all of the sites are neutral and there are no holons or doublons [cf. Eq. (10.61)]. However, it is reasonable to postulate that there are low-lying charge-transfer excited states and hence thermal states that contain a few doublons and holons. These doublons and holons interact via the Coulomb potential, $V(r) = -e^2/\kappa r$, where κ is the dielectric constant of the crystal. We know from the theory of the hydrogen atom (or, better, positronium; see Gasiorowicz [7]) that this potential gives rise to bound states. Therefore, one expects that in valence-bond theory, holons and doublons are bound and that separating holon–doublon pairs costs a significant amount of energy. Thus, one expects the number of distant holon–doublon pairs to decrease as the temperature is lowered. Therefore, valence-bond theory predicts that a half-filled Hubbard model is an insulator.

In contrast, molecular orbital theory has large numbers of holons and doublons [cf. Eq. (10.60), which suggests that for an N-site model there will be N/2 neutral sites, N/4 empty sites, and N/4 doubly occupied sites]. Mott reasoned that if there are many holon–doublon pairs "it no longer follows that work must necessarily be done to form some more." This is because the holon and doublon now interact via a screened potential, $V(r) = -(e^2/\kappa r)\exp(-qr)$, where $q$ is the Thomas–Fermi wavevector (see Ashcroft and Mermin [10]). For sufficiently large $q$ there will be no bound states, hence molecular orbital theory predicts that the half-filled Hubbard model is metallic.

Thus, Mott argued that there are two (local) minima of the free energy in a crystal (see Fig. 10.6). One of the minima corresponds to a state with no holon–doublon pairs that is well approximated by a valence-bond wavefunction and is now known as the Mott insulating state. The second minimum corresponds to a state with many doublon–holon pairs that is well approximated by a molecular orbital wavefunction and is metallic. As we saw above, valence-bond theory works well for U ≫ t and molecular orbital theory works well for U ≪ t. Therefore, in the half-filled Hubbard model we expect a Mott insulator for large U/t and a metal for small U/t. Further, the "double-well" structure of the energy predicted by Mott's argument (Fig. 10.6) suggests that there is a first-order metal–insulator phase transition, known as the Mott transition. Mott predicted that this metal–insulator transition can be driven by applying pressure to a Mott insulator. This has now been observed in a number of systems; perhaps the purest examples are the organic charge-transfer salts (BEDT-TTF)$_2$X [24].

It is interesting to note that this infusion of chemical ideas into condensed matter physics has remained important in studies of the Mott transition. Of particular note is Anderson's resonating valence-bond theory of superconductivity in high-temperature superconductors [26,27], which describes superconductivity in a doped Mott insulator in terms of a generalization of the valence-bond theory discussed above.
This theory can also be modified to describe superconductivity on the metallic side of the Mott transition for a half-filled lattice. This theory then provides a good description of the superconductivity observed in (BEDT-TTF)$_2$X salts [28].

Note that theories such as Hartree–Fock and density functional theory [25] that do not include the strong electronic correlations present in the Hubbard model do not predict a Mott insulating state. Thus, weakly correlated theories make the qualitatively incorrect prediction that materials such as NiO, V$_2$O$_3$, La$_2$CuO$_4$, and κ-(BEDT-TTF)$_2$Cu[N(CN)$_2$]Cl are metals, whereas experimentally, all are insulators. We will discuss a quantitative theory of the Mott transition in Section 10.3.3.2.

Fig. 10.6 (color online) Mott's proposal for the energy of the Hubbard model as a function of the number of holon–doublon pairs, $n_p$, at low (zero) temperature(s) for large and small U/t.

10.3.3 Mean-Field Theories for Crystals

10.3.3.1 Hartree–Fock Theory of the Hubbard Model: Stoner Ferromagnetism

In a manner similar to that in which we constructed the Hartree–Fock mean-field theory for the two-site Hubbard model in Section 10.3.1.1, we can also construct a Hartree–Fock theory of the infinite lattice Hubbard model. Again, we simply replace the number operators in the two-body term by their mean values, $n_{i\sigma} \equiv \langle\hat c^\dagger_{i\sigma}\hat c_{i\sigma}\rangle$, plus the fluctuations about the mean, $(\hat c^\dagger_{i\sigma}\hat c_{i\sigma} - n_{i\sigma})$, and neglect terms that are quadratic in the fluctuations:

$$\begin{aligned}
U\sum_i\hat c^\dagger_{i\uparrow}\hat c_{i\uparrow}\hat c^\dagger_{i\downarrow}\hat c_{i\downarrow} &= U\sum_i\left[n_{i\uparrow} + \left(\hat c^\dagger_{i\uparrow}\hat c_{i\uparrow} - n_{i\uparrow}\right)\right]\left[n_{i\downarrow} + \left(\hat c^\dagger_{i\downarrow}\hat c_{i\downarrow} - n_{i\downarrow}\right)\right]\\
&\approx U\sum_i\left[n_{i\downarrow}\hat c^\dagger_{i\uparrow}\hat c_{i\uparrow} + n_{i\uparrow}\hat c^\dagger_{i\downarrow}\hat c_{i\downarrow} - n_{i\uparrow}n_{i\downarrow}\right]
\end{aligned} \tag{10.65}$$

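If we make the additional approximation that $n_{i\sigma} = n_\sigma$ for all $i$ (i.e., that the system is homogeneous and does not spontaneously break translational symmetry), we find that the Hartree–Fock Hamiltonian for the Hubbard model is

$$\hat H_{HF} - \mu\hat N = -t\sum_{\langle ij\rangle\sigma}\hat c^\dagger_{i\sigma}\hat c_{j\sigma} + \sum_{i\sigma}(U n_{\bar\sigma} - \mu)\,\hat c^\dagger_{i\sigma}\hat c_{i\sigma} - UN n_\uparrow n_\downarrow \tag{10.66}$$

where $N$ is the number of lattice sites and $\bar\sigma$ is the opposite spin to σ. It is convenient to write this Hamiltonian in terms of the total electron density, $n = n_\uparrow + n_\downarrow$, and the magnetization density, $m = n_\uparrow - n_\downarrow$, which gives

$$\begin{aligned}
\hat H_{HF} - \mu\hat N &= -t\sum_{\langle ij\rangle\sigma}\hat c^\dagger_{i\sigma}\hat c_{j\sigma} - \mu\sum_{i\sigma}\hat c^\dagger_{i\sigma}\hat c_{i\sigma} + U\sum_i\left[\tfrac{1}{2}(n - m)\,\hat c^\dagger_{i\uparrow}\hat c_{i\uparrow} + \tfrac{1}{2}(n + m)\,\hat c^\dagger_{i\downarrow}\hat c_{i\downarrow} - \tfrac{1}{4}(n + m)(n - m)\right]\\
&= \sum_{\mathbf k\sigma}\left(\varepsilon^0_{\mathbf k} - \sigma\frac{Um}{2}\right)\hat n_{\mathbf k\sigma} - \sum_{\mathbf k\sigma}\left(\mu - \frac{Un}{2}\right)\hat n_{\mathbf k\sigma} - \frac{NU}{4}\left(n^2 - m^2\right)
\end{aligned} \tag{10.67}$$

where $\varepsilon^0_{\mathbf k}$ is the dispersion relation for U = 0 and σ = ±1 = ↑↓. The last term is just a constant and will not concern us greatly. The penultimate term is the "renormalized" chemical potential; that is, the chemical potential, μ, of the system with U = 0 is decreased by Un/2 due to the interactions. The first term is just the renormalized dispersion relation; in particular, we find that if the magnetization density is nonzero the dispersion relation for spin-up electrons is different from that for spin-down electrons (see Fig. 10.7). It is important to note that the Hartree–Fock approximation has reduced the problem to a single-particle (single-determinant) theory. Thus, we can write

$$\hat H_{HF} - \mu\hat N = \sum_{\mathbf k\sigma}\left(\varepsilon^*_{\mathbf k\sigma} - \mu^*\right)\hat n_{\mathbf k\sigma} - \frac{NU}{4}\left(n^2 - m^2\right) \tag{10.68}$$

where $\varepsilon^*_{\mathbf k\sigma} = \varepsilon^0_{\mathbf k} - \tfrac{1}{2}\sigma Um$ and $\mu^* = \mu - \tfrac{1}{2}Un$. We can now calculate the magnetization density (magnetic moment):

$$\begin{aligned}
m &= n_\uparrow - n_\downarrow = \int_{-\infty}^{0} d\varepsilon\,\left[D_\uparrow(\varepsilon - \mu^*) - D_\downarrow(\varepsilon - \mu^*)\right]\\
&= \int_{-\infty}^{0} d\varepsilon\,\left[D_0\!\left(\varepsilon - \tfrac{1}{2}Um + \tfrac{1}{2}Un - \mu\right) - D_0\!\left(\varepsilon + \tfrac{1}{2}Um + \tfrac{1}{2}Un - \mu\right)\right]\\
&\equiv f(m) = D_0(0)\,Um + O(m^2)
\end{aligned} \tag{10.69}$$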

where $D_0(\varepsilon) = \partial N_0(\varepsilon)/\partial\varepsilon|_\varepsilon$ is the density of states (DOS; see Ashcroft and Mermin [10]) per spin for U = 0, $N_0(\varepsilon)$ is the number of electrons (per spin species) for which $\varepsilon^0_{\mathbf k} \le \varepsilon$ for U = 0, $D_\sigma(\varepsilon) = \partial N_\sigma(\varepsilon)/\partial\varepsilon|_\varepsilon$ is the full interacting DOS for spin σ electrons, and $N_\sigma(\varepsilon)$ is the number of electrons with spin σ for which $\varepsilon_{\mathbf k\sigma} \le \varepsilon$.

The standard way to solve mean-field theories, known as the method of self-consistent solution, is illustrated in Fig. 10.8. The major difficulty with self-consistent solutions is that it is not possible to establish whether or not one has found all of the self-consistent solutions, and therefore it is not possible to establish whether or not one has found the global minimum. Therefore, it is prudent to try a wide range of initial guesses for $m$ (or whatever variable the initial guess is made in).

Clearly, m = 0 is always a solution of Eq. (10.69), and for $UD_0(0) < 1$ this turns out to be the only solution. But for $UD_0(0) > 1$ there are additional solutions with m ≠ 0. This is easily understood from the sketch in Fig. 10.9. Furthermore, the m ≠ 0 solutions typically have lower energy than the m = 0 solution, and therefore for $UD_0(0) > 1$ the ground state is ferromagnetic. $UD_0(0) \ge 1$ is known as the Stoner condition for ferromagnetism. For the Stoner condition to be satisfied, a system must have narrow bands [small $t$, and hence large $D_0(0)$] and strong interactions (large U).

There are three elemental ferromagnets, Fe, Co, and Ni, each of which is also metallic. As the Hartree–Fock theory of the Hubbard model predicts metallic magnetism if the Stoner criterion is satisfied and these materials have narrow bands of strongly interacting electrons, it is natural to ask whether this is a good description of these materials. However, if one extends the treatment above to finite temperatures [29], one finds that the Hartree–Fock theory of the Hubbard model does not provide a good theory of the three elemental magnets. The Curie temperatures, $T_C$ (i.e., the temperature at which the material becomes ferromagnetic), of Fe, Co, and Ni are ∼1000 K (see, e.g., Table 33.1 of Ashcroft and Mermin [10]).

Fig. 10.7 (color online) Dispersion relations for spin-up and spin-down electrons in the Hartree–Fock theory of the Hubbard chain (Stoner model of ferromagnetism) with m = 0.8t/U.

Fig. 10.8 How to find the self-consistent solution of Eq. (10.69). If the convergence works well, one can take α = 1, but for some problems convergence can be reached more reliably with a small value of α (often a value as small as ∼0.05 is used).
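The self-consistent procedure referred to in Fig. 10.8 can be sketched as a damped fixed-point iteration, $m \rightarrow (1 - \alpha)m + \alpha f(m)$, applied to Eq. (10.69). To keep the example self-contained, it assumes a semicircular noninteracting DOS of half-width `W`, for which $N_0(\varepsilon)$ has a closed form and $D_0(0) = 2/\pi W$, and it fixes the renormalized chemical potential at the band center (half filling). Neither assumption comes from the text, and the function names are hypothetical.

```python
import math


def n0(eps, W=1.0):
    """Integrated semicircular DOS per spin: fraction of states below eps."""
    x = max(-1.0, min(1.0, eps / W))
    return 0.5 + (x * math.sqrt(1.0 - x * x) + math.asin(x)) / math.pi


def f_rhs(m, U, W=1.0):
    """Right-hand side f(m) of Eq. (10.69) with mu* at the band center."""
    return n0(0.5 * U * m, W) - n0(-0.5 * U * m, W)


def solve_magnetization(U, W=1.0, m_guess=0.5, alpha=0.5,
                        tol=1e-12, max_iter=100000):
    """Iterate m -> (1 - alpha) m + alpha f(m) until the update is below tol."""
    m = m_guess
    for _ in range(max_iter):
        m_new = (1.0 - alpha) * m + alpha * f_rhs(m, U, W)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    raise RuntimeError("self-consistency loop did not converge")


if __name__ == "__main__":
    W = 1.0
    for U in (1.0, 1.5, 1.7, 2.5):
        stoner = 2.0 * U / (math.pi * W)  # U * D0(0) for this model DOS
        m = solve_magnetization(U, W)
        print(f"U = {U}: U*D0(0) = {stoner:.3f}, converged m = {m:.6f}")
```

For this model DOS the iteration collapses to m = 0 when $UD_0(0) < 1$ and settles on a nonzero magnetization when $UD_0(0) > 1$. Starting from `m_guess = 0` always returns the trivial solution, which is one reason to try several initial guesses.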
Fig. 10.9 (color online) Graphical solution of the self-consistency equation [Eq. (10.69)] for the Stoner model of ferromagnetism. (Axes: f(m) versus m.)

Hartree–Fock theory predicts that TC ∼ Um0, where m0 is the magnetization at T = 0. If the parameters in the Hubbard model are chosen so that Hartree–Fock theory reproduces the observed m0, the predicted critical temperature is ∼10,000 K. This order-of-magnitude disagreement with experiment results from the failure of the mean-field Hartree–Fock approximation to account properly for the fluctuations in the local magnetization. This is closely related to the (incorrect) prediction of the Hartree–Fock approximation that there are no local moments above TC. (Experimentally, local moments are observed above TC.) However, for weak ferromagnets, such as ZrZn2 (TC ∼ 30 K), the Hartree–Fock theory of the Hubbard model provides an excellent description of the behavior observed.30

The effects missed by Hartree–Fock theory are referred to as electronic correlations. The dramatic failure of Hartree–Fock theory in Fe, Co, and Ni shows that electron correlations are very important in these materials, as do other comparisons of theory and experiment.31 However, it is important to note that mean-field theory is not limited to Hartree–Fock theory (although the terms are often, but imprecisely, used synonymously). Rather, Hartree–Fock theory is the mean-field theory of the electronic density. By constructing mean-field theories of other properties it is possible to construct mean-field theories that capture (some) electronic correlations. We now consider an example of a rather different mean-field theory.

10.3.3.2 Gutzwiller Approximation, Slave Bosons, and the Brinkman–Rice Metal–Insulator Transition

In 1963, Gutzwiller32 proposed a variational wavefunction for the Hubbard model:

\[ |G\rangle = \prod_i \left(1 - \alpha\,\hat n_{i\uparrow}\hat n_{i\downarrow}\right)|0\rangle = \exp\!\Big(-g\sum_i \hat n_{i\uparrow}\hat n_{i\downarrow}\Big)|0\rangle \tag{10.70} \]


where g = −ln(1 − α) is a variational parameter and |0⟩ is the ground state for uncorrelated electrons. One should note that the Gutzwiller wavefunction is closely related to the coupled cluster ansatz,1 which is widely used in both physics and chemistry. Gutzwiller used this ansatz to study the problem of itinerant ferromagnetism. This leads to an improvement over the Hartree–Fock theory discussed above. However, in 1970, Brinkman and Rice33 showed that this wavefunction also describes a metal–insulator transition, now referred to as a Brinkman–Rice transition. Rather than studying this wavefunction in detail, we use an equivalent technique known as slave bosons. This has the advantage of making it clear that the Brinkman–Rice theory is just a mean-field description of the Mott transition.

The ith site in a Hubbard model has four possible states: the site can be empty, |e_i⟩; can contain a single spin σ (= ↑ or ↓) electron, |σ_i⟩; or can contain two electrons, |d_i⟩. The Kotliar–Ruckenstein slave boson technique introduces an overcomplete description of these states:

\[ |e_i\rangle = \hat e_i^\dagger |0_i\rangle \tag{10.71} \]

\[ |\sigma_i\rangle = \hat p_{i\sigma}^\dagger \hat c_{i\sigma}^\dagger |0_i\rangle \tag{10.72} \]

\[ |d_i\rangle = \hat d_i^\dagger \hat c_{i\uparrow}^\dagger \hat c_{i\downarrow}^\dagger |0_i\rangle \tag{10.73} \]

where $\hat e_i^\dagger$, $\hat p_{i\sigma}^\dagger$, and $\hat d_i^\dagger$ are bosonic creation operators which correspond to empty, partially filled, and doubly occupied sites. |0_i⟩ is a state with no fermions and no bosons on site i; note that this is not a physically realizable state. This transformation is not only kosher, but also exact, as long as we also introduce the constraints

\[ \hat e_i^\dagger \hat e_i + \sum_\sigma \hat p_{i\sigma}^\dagger \hat p_{i\sigma} + \hat d_i^\dagger \hat d_i = 1 \tag{10.74} \]

which ensures that there is exactly one boson per site and therefore that each site is either empty, partially occupied, or doubly occupied, and

\[ \hat c_{i\sigma}^\dagger \hat c_{i\sigma} - \hat p_{i\sigma}^\dagger \hat p_{i\sigma} - \hat d_i^\dagger \hat d_i = 0 \tag{10.75} \]

which ensures that if a site contains a spin σ electron, it is either singly occupied (with spin σ) or doubly occupied. Writing the Hubbard Hamiltonian in terms of the slave bosons yields

\[ \hat H_{\mathrm{Hubbard}} = -t \sum_{\langle ij\rangle\sigma} \hat z_{i\sigma}^\dagger \hat c_{i\sigma}^\dagger \hat c_{j\sigma} \hat z_{j\sigma} + U \sum_i \hat d_i^\dagger \hat d_i \tag{10.76} \]

where $\hat z_{j\sigma} = \hat e_j^\dagger \hat p_{j\sigma} + \hat p_{j\bar\sigma}^\dagger \hat d_j$ and $\bar\sigma$ denotes the spin opposite to σ. We now make a mean-field approximation and replace the bosonic operators by their expectation values: ⟨ê_i⟩ = e, ⟨p̂_i↑⟩ = ⟨p̂_i↓⟩ = p, ⟨d̂_i⟩ = d. Note that we have additionally assumed that the system is homogeneous (the expectation values do not depend on i) and paramagnetic (⟨p̂_i↑⟩ = ⟨p̂_i↓⟩). Therefore, the constraints reduce to

\[ |e|^2 + 2|p|^2 + |d|^2 = 1 \tag{10.77} \]

and

\[ |p|^2 + |d|^2 = \langle \hat c_{i\sigma}^\dagger \hat c_{i\sigma} \rangle = \frac{n}{2} \tag{10.78} \]

where n is the average number of electrons per site. This amounts to enforcing the constraints only on average. This theory does not reproduce the correct result for U = 0. However, this deficiency can be fixed if $\hat z_{j\sigma}$ is replaced by the “renormalized” quantity, $\tilde z_{j\sigma}$, defined such that

\[ \langle \tilde z_{j\sigma}^\dagger \tilde z_{j\sigma} \rangle = \frac{\left(n/2 - |d|^2\right)\left(d + \sqrt{1 - n + |d|^2}\right)^2}{(1 - n/2)(n/2)} \tag{10.79} \]

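Let us specialize to a half-filled band, n = 1. The constraints now allow us to eliminate |p|² = 1/2 − |d|² and |e|² = |d|², so that Eq. (10.79) reduces to 8(|d|² − 2|d|⁴), which equals 1 at the uncorrelated value |d|² = 1/4. Thus, we find that

\[ \hat H_{\mathrm{Hubbard}} \simeq -t \sum_{\langle ij\rangle\sigma} 8\left(|d|^2 - 2|d|^4\right) \hat c_{i\sigma}^\dagger \hat c_{j\sigma} + U N_0 |d|^2 = 8\left(|d|^2 - 2|d|^4\right) \sum_{k\sigma} \varepsilon^0_k\, \hat n_{k\sigma} + U N_0 |d|^2 \tag{10.80} \]

where ε0k is the dispersion for U = 0 and N0 is the number of lattice sites. Recall that |d|² = ⟨d̂†_i d̂_i⟩ (i.e., |d|² is the probability of a site being doubly occupied). We construct a variational theory by ensuring that the energy is minimized with respect to |d|, which yields

\[ \frac{\partial E}{\partial |d|} = 16\left(|d| - 4|d|^3\right) \sum_{k\sigma} \varepsilon^0_k \langle \hat n_{k\sigma} \rangle + 2 U N_0 |d| = 0 \tag{10.81} \]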

Equation (10.81) allows one to solve the problem self-consistently (see Fig. 10.8). For small U this equation has more than one solution and the lowest-energy state has |d|² > 0, which corresponds to a correlated metallic state (the details of this minimum depend on ε0k). But above some critical U the ground-state solution has |d|² = 0, which corresponds to no doubly occupied states (i.e., the Mott insulator). Thus, the dependence of the energy on the number of holon–doublon pairs (np = |d|²) calculated from the mean-field slave boson theory is exactly as Mott predicted on rather general grounds (shown in Fig. 10.6).
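The variational step can be illustrated numerically. The sketch below is not taken from the text: it assumes the standard half-filled Kotliar–Ruckenstein band-narrowing factor q = 8|d|²(1 − 2|d|²) (which equals 1 at |d|² = 1/4), writes the energy per site as q ε̄ + U|d|² with ε̄ < 0 a hypothetical value for the U = 0 kinetic energy per site, and minimizes on a grid. Under these assumptions the minimum sits at |d|² = (1/4)(1 − U/Uc) with Uc = 8|ε̄|, and at |d|² = 0 for U ≥ Uc.

```python
import numpy as np

def energy_per_site(x, U, eps_bar):
    """Assumed mean-field energy per site as a function of x = |d|^2 (half filling)."""
    q = 8.0 * x * (1.0 - 2.0 * x)   # band-narrowing factor; q = 1 at x = 1/4
    return q * eps_bar + U * x

def optimal_double_occupancy(U, eps_bar, n_grid=200_001):
    """Brute-force minimization over 0 <= |d|^2 <= 1/2."""
    x = np.linspace(0.0, 0.5, n_grid)
    return float(x[np.argmin(energy_per_site(x, U, eps_bar))])

eps_bar = -1.0                 # hypothetical U = 0 kinetic energy per site
U_c = 8.0 * abs(eps_bar)       # critical interaction under the assumptions above

results = []
for U in (0.0, 2.0, 4.0, 6.0, 8.0, 10.0):
    x_num = optimal_double_occupancy(U, eps_bar)
    x_ana = max(0.0, 0.25 * (1.0 - U / U_c))
    results.append((U, x_num, x_ana))
    print(f"U = {U:4.1f}   |d|^2 (grid) = {x_num:.5f}   |d|^2 (analytic) = {x_ana:.5f}")
```

The double occupancy falls linearly from 1/4 at U = 0 and vanishes at Uc, which is the Brinkman–Rice picture of the metal–insulator transition described above.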


10.3.4 Exact Solutions of the Hubbard Model

10.3.4.1 One Dimension

Lieb and Wu34 famously solved the Hubbard chain at T = 0 using the Bethe ansatz.35,36 Lieb and Wu found that the half-filled Hubbard chain is a Mott insulator for any nonzero U. Nevertheless, the Bethe ansatz solution is not straightforward to understand, and weighty textbooks have been written on the subject.35,36

10.3.4.2 Infinite Dimensions: Dynamical Mean-Field Theory

As one increases the dimension of a lattice, the coordination number (the number of nearest neighbors for each lattice site) also increases. In infinite dimensions each lattice site has infinitely many nearest neighbors. For a classical model, mean-field theory becomes exact in infinite dimensions, as the environment (the infinite number of nearest neighbors) seen by each site is exactly the same as the mean field. However, quantum mechanically, things are complicated by the internal dynamics of the site. In the Hubbard model each site can contain zero, one, or two electrons, and a dynamic equilibrium between the different charge and spin states is maintained. However, the environment is still described by a mean field, even though the dynamics are not. Therefore, although the Hartree–Fock theory of the Hubbard model does not become exact in infinite dimensions, it is possible to construct a theory that treats the on-site dynamics exactly and the spatial correlations at the mean-field level; this theory is known as dynamical mean-field theory (DMFT).37

The importance of DMFT is not in the somewhat academic limit of infinite dimensions. Rather, DMFT has become an important approximate theory in the finite numbers of dimensions relevant to real materials.37 It has been found that DMFT captures a great deal of the physics of strongly correlated electrons. Typically, the most important correlations are on-site and therefore are described correctly by DMFT. These include the correlations that are important in metallic magnetism38 and many other strongly correlated materials.24,37 Cluster extensions to DMFT, such as cellular dynamical mean-field theory (CDMFT) and the dynamical cluster approximation (DCA), which capture some of the nonlocal correlations, have led to further insights into strongly correlated materials.39 Considerable success has also been achieved by combining DMFT with density functional theory.40

10.3.4.3 Nagaoka Point

The Nagaoka point in the phase diagram of the Hubbard model is the U → ∞ limit when we add one hole to a half-filled system. Nagaoka rigorously proved41,42 that at this point the state that maximizes the total spin of the system [i.e., the state with Sz = (N − 1)/2, for an N-site lattice] is an extremum in energy (i.e., either the ground state or the highest-lying excited state). On most bipartite lattices (cf. Fig. 10.11a) one finds that this “Nagaoka state” is indeed the ground state.42 However, on frustrated lattices (Fig. 10.11b) the Nagaoka state is typically only the ground state for one sign of t.43

It is quite straightforward to understand why the Nagaoka state is often the ground state. As we are considering the U → ∞ limit there will strictly be no double occupation of any sites. One therefore need only consider the subspace of states with no double occupation. As none of these states contain any potential energy (i.e., terms proportional to U), the ground state will be the state that minimizes the kinetic energy (the term proportional to t). Thus, the ground state is the state that maximizes the magnitude of the kinetic energy with a negative sign. In the Nagaoka state all of the electrons align, which means that the holon can hop unimpeded by the Pauli exclusion principle, thus maximizing the magnitude of the kinetic energy. It is a simple matter to check whether this is the ground state or the highest-lying excited state, as we just compare the energy of the Nagaoka state with that of any other state satisfying the constraint of no double occupation. Nagaoka’s rigorous treatment has not been extended to doping by more than one hole and it remains an outstanding problem to further understand this interesting phenomenon, which shares important features with the magnetism observed in the elemental magnets38 and many strongly correlated materials.43

10.4 HEISENBERG MODEL

Like the Stoner ferromagnetism we discussed above in the context of the Hartree–Fock solution for the Hubbard model (Section 10.3.3.1) and Hund’s rules (which we discuss in Section 10.5.2), the Heisenberg model is an important paradigm for understanding magnetism. The Heisenberg model does not provide a realistic description of the three elemental ferromagnets (Fe, Co, and Ni) as they are metals, whereas the Heisenberg model only describes insulators. However, as we will see in Section 10.4.3, the Heisenberg model is a good description of Mott insulators such as La2CuO4 (the parent compound of the high-temperature superconductors) and κ-(BEDT-TTF)2Cu[N(CN)2]Cl (the parent compound for the organic superconductors). The Heisenberg model also plays an important role in the valence-bond theory of the chemical bond.44

In the Heisenberg model one assumes that there is a single (unpaired) electron localized at each site and that the charge cannot move. Therefore, the only degrees of freedom in the Heisenberg model are the spins of each site (the model can also be generalized to spin > 1/2). The Hamiltonian for the Heisenberg model is

\[ \hat H_{\mathrm{Heisenberg}} = \sum_{ij} J_{ij}\, \hat{\mathbf S}_i \cdot \hat{\mathbf S}_j \tag{10.82} \]

where $\hat{\mathbf S}_i = (\hat S_i^x, \hat S_i^y, \hat S_i^z) = \frac{1}{2}\sum_{\alpha\beta} \hat c_{i\alpha}^\dagger \boldsymbol\sigma_{\alpha\beta} \hat c_{i\beta}$ is the spin operator on site i, σ = (σx, σy, σz) is the vector of Pauli matrices, and Jij is the exchange energy between sites i and j.

10.4.1 Two-Site Model: Classical Solution

In the classical Heisenberg model one replaces the spin operator, $\hat{\mathbf S}_i$, with a classical spin (i.e., a real vector, S_i). Thus, on two sites, with J12 = J, the energy of the model is

\[ E^{(2)}_{\mathrm{Heisenberg}} = J\, \mathbf S_1 \cdot \mathbf S_2 = J |\mathbf S_1||\mathbf S_2| \cos\phi \tag{10.83} \]

where φ is the angle between the two spins (vectors). The classical energy is minimized by φ = π for J > 0 and φ = 0 for J < 0. For J > 0 the lowest-energy solution is for the two spins to point antiparallel (i.e., in opposite directions to one another); we refer to this as the antiferromagnetic solution. For J < 0 the lowest-energy solution is for the two spins to point parallel (the ferromagnetic solution).

[…]

[…] for J > 0, we cannot optimize the energy of each bond individually. When this is the case one says that the lattice is frustrated. For a frustrated lattice with S = 1/2, we expect the solution for J > 0 to have energy $E^{(3)}_{\mathrm{Heisenberg}} > -3J/4$, and thus one expects the difference in energy between this state and the ferromagnetic state to be < JNz/4. The concept of frustration can also be generalized to itinerant systems where a similar reduction in the bandwidth of the itinerant electrons is found.43

Having outlined our expectations, let us now consider the three-site Heisenberg model more carefully. The energy is given by

\[ E^{(3)}_{\mathrm{Heisenberg}} = J \sum_{\langle ij\rangle} \mathbf S_i \cdot \mathbf S_j \tag{10.97} \]


Without loss of generality we can choose S1 = S1(1, 0, 0), S2 = S2(cos φ2, sin φ2, 0), and S3 = S3(cos θ3 cos φ3, cos θ3 sin φ3, sin θ3). Thus, for S1 = S2 = S3 = 1/2,

\[ E^{(3)}_{\mathrm{Heisenberg}} = \frac{J}{4}\left[\cos\phi_2 + \cos\theta_3 \cos(\phi_2 - \phi_3) + \cos\theta_3 \cos\phi_3\right] \tag{10.98} \]

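Physically, we seek the minimum energy, which yields the conditions

\begin{align*}
\frac{\partial E^{(3)}_{\mathrm{Heisenberg}}}{\partial \theta_3} &= -\frac{J}{4} \sin\theta_3 \left[\cos(\phi_2 - \phi_3) + \cos\phi_3\right] = 0 \\
\frac{\partial E^{(3)}_{\mathrm{Heisenberg}}}{\partial \phi_3} &= \frac{J}{4} \cos\theta_3 \left[\sin(\phi_2 - \phi_3) - \sin\phi_3\right] = 0 \\
\frac{\partial E^{(3)}_{\mathrm{Heisenberg}}}{\partial \phi_2} &= -\frac{J}{4} \left[\cos\theta_3 \sin(\phi_2 - \phi_3) + \sin\phi_2\right] = 0
\end{align*}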
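A brute-force scan of Eq. (10.98) is a quick way to locate the minima that these stationarity conditions describe. The sketch below is illustrative only: the function names and the 5° grid are choices made here, and the scan simply reports the lowest-energy grid point for each sign of J.

```python
import numpy as np

def classical_energy(theta3, phi2, phi3, J=1.0):
    """Eq. (10.98): classical three-site energy for |S1| = |S2| = |S3| = 1/2."""
    return 0.25 * J * (np.cos(phi2)
                       + np.cos(theta3) * np.cos(phi2 - phi3)
                       + np.cos(theta3) * np.cos(phi3))

def grid_minimum(J, step_deg=5):
    """Return (energy, theta3, phi2, phi3) at the lowest grid point; angles in degrees."""
    theta = np.deg2rad(np.arange(-90, 91, step_deg))
    phi = np.deg2rad(np.arange(0, 360, step_deg))
    T, P2, P3 = np.meshgrid(theta, phi, phi, indexing="ij")
    E = classical_energy(T, P2, P3, J)
    k = np.unravel_index(np.argmin(E), E.shape)
    return (float(E[k]), float(np.rad2deg(T[k])),
            float(np.rad2deg(P2[k])), float(np.rad2deg(P3[k])))

for J in (+1.0, -1.0):
    E_min, th3, p2, p3 = grid_minimum(J)
    print(f"J = {J:+.0f}: E_min = {E_min:+.4f}, "
          f"theta3 = {th3:.0f}, phi2 = {p2:.0f}, phi3 = {p3:.0f} (degrees)")
```

For J = +1 the scan lands on a coplanar configuration with the spins 120° apart (either of the two degenerate orderings of φ2 and φ3 may be reported), and for J = −1 on the fully aligned configuration.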

For J < 0 the global minimum is, unsurprisingly, θ3 = φ2 = φ3 = 0 (i.e., ferromagnetism). The energy of the ferromagnetic state is 3J/4. For J > 0 there are several degenerate minima, which all show the same physics. For simplicity we will just consider the minimum θ3 = 0, φ2 = 2π/3, and φ3 = 4π/3. In this solution each of the spins points 120° away from each of the other spins; hence, this is known as the 120° state. It is left as an exercise to the reader to identify the other solutions, to show that there are none with lower energy than those discussed above, and to show that all of the degenerate solutions are physically equivalent. The energy of the 120° state is −3J/8 and hence the energy difference between the ferromagnetic state and the 120° state is just 9J/8, less than we would expect (JNz/4 = 3J/2 for N = 3, z = 2) for a bipartite lattice.

10.4.5 Three-Site Model: Exact Quantum Mechanical Solution

Group theory, the mathematics of symmetry, allows one to solve the quantum spin-1/2 three-site Heisenberg model straightforwardly. Unfortunately, space does not permit an introduction to the relevant group theory. Therefore, the reader who is not familiar with the mathematics is advised either to refer to one of the many excellent textbooks on the subject (e.g., Tinkham11 or Lax12) or, failing that, simply to check that the wavefunctions derived by the group-theoretic arguments below are indeed eigenstates. The Hamiltonian is

\[ \hat H^{(3)}_{\mathrm{Heisenberg}} = J \sum_{\langle ij\rangle} \hat{\mathbf S}_i \cdot \hat{\mathbf S}_j = J \sum_{\langle ij\rangle} \left[ \tfrac{1}{2}\left(\hat S_i^+ \hat S_j^- + \hat S_i^- \hat S_j^+\right) + \hat S_i^z \hat S_j^z \right] \tag{10.99} \]


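We begin by noting that 2 ⊗ 2 ⊗ 2 = 2 ⊕ 2 ⊕ 4;† that is, a system formed from three spin-1/2 particles will have two doublets (with twofold-degenerate spin-1/2 eigenstates) and one quadruplet (with fourfold-degenerate spin-3/2 eigenstates). There are only four possible quadruplet states consistent with the C3 point-group symmetry‡ of the model. Each of these belongs to the A irreducible representation of C3. They are

\begin{align*}
|\psi^{3/2}_{3/2}\rangle &= |{\uparrow\uparrow\uparrow}\rangle \\
|\psi^{1/2}_{3/2}\rangle &= \frac{1}{\sqrt{3}}\left(|{\downarrow\uparrow\uparrow}\rangle + |{\uparrow\downarrow\uparrow}\rangle + |{\uparrow\uparrow\downarrow}\rangle\right) \\
|\psi^{-1/2}_{3/2}\rangle &= \frac{1}{\sqrt{3}}\left(|{\uparrow\downarrow\downarrow}\rangle + |{\downarrow\uparrow\downarrow}\rangle + |{\downarrow\downarrow\uparrow}\rangle\right) \\
|\psi^{-3/2}_{3/2}\rangle &= |{\downarrow\downarrow\downarrow}\rangle
\end{align*}

where |αβγ⟩ = |S1z, S2z, S3z⟩ with α, β, and γ = ↑ or ↓. Each of these states has energy E = 3J/4, and they are the (degenerate) ground states for J < 0. We are left with the four doublet states. These belong to the two-dimensional E irreducible representation of C3, and as the Hamiltonian is time-reversal symmetric, all four doublet states are degenerate. Explicitly the states are

\begin{align*}
|\psi^{1/2}_{1/2}\rangle &= \frac{1}{\sqrt{3}}\left(|{\downarrow\uparrow\uparrow}\rangle + e^{i2\pi/3}|{\uparrow\downarrow\uparrow}\rangle + e^{-i2\pi/3}|{\uparrow\uparrow\downarrow}\rangle\right) \\
|\psi^{-1/2}_{1/2}\rangle &= \frac{1}{\sqrt{3}}\left(|{\uparrow\downarrow\downarrow}\rangle + e^{i2\pi/3}|{\downarrow\uparrow\downarrow}\rangle + e^{-i2\pi/3}|{\downarrow\downarrow\uparrow}\rangle\right) \\
|\tilde\psi^{1/2}_{1/2}\rangle &= \frac{1}{\sqrt{3}}\left(|{\downarrow\uparrow\uparrow}\rangle + e^{-i2\pi/3}|{\uparrow\downarrow\uparrow}\rangle + e^{i2\pi/3}|{\uparrow\uparrow\downarrow}\rangle\right) \\
|\tilde\psi^{-1/2}_{1/2}\rangle &= \frac{1}{\sqrt{3}}\left(|{\uparrow\downarrow\downarrow}\rangle + e^{-i2\pi/3}|{\downarrow\uparrow\downarrow}\rangle + e^{i2\pi/3}|{\downarrow\downarrow\uparrow}\rangle\right)
\end{align*}

Each of these states has energy E = −3J/4 [as follows from Eq. (10.99), since $\sum_{\langle ij\rangle} \hat{\mathbf S}_i \cdot \hat{\mathbf S}_j = \frac{1}{2}[S(S+1) - 9/4]$ for total spin S] and they are the (degenerate) ground states for J > 0. Thus, the energy difference between the highest spin state and the lowest spin state is 3J/2. From the solution to the two-site model (Section 10.4.2), we expected each of the three bonds to yield an energy difference of J between the lowest and highest spin states. Thus, the frustration has a similar effect on both the quantum and classical models (i.e., frustration lowers the energy difference between the highest spin and lowest spin states).

† In this notation the integers are the degeneracy of the state.

‡ One might, reasonably, take the view that the model has either D3h or C3v symmetry. In fact, the arguments in this section go through almost identically for either of these symmetries (with appropriate changes in notation), due to the homomorphisms from these groups to C3. We use C3 notation for simplicity.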
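For readers who prefer to check the group-theoretic results directly, the three-site Hamiltonian of Eq. (10.99) is only an 8 × 8 matrix and can be diagonalized numerically. The sketch below is an illustration, not part of the original derivation: the helper names are chosen here, the basis ordering (site 1 as the most significant index, ↑ before ↓) is a convention of this example, and J = 1 is an arbitrary choice of units. It lists the energies with their degeneracies, for comparison with the values quoted above, and tests whether one of the doublet wavefunctions is an eigenstate.

```python
import numpy as np

# Spin-1/2 operators (hbar = 1); basis order on each site is (up, down).
sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
sy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
identity = np.eye(2, dtype=complex)

def site_operator(op, site, n_sites=3):
    """Embed a single-site operator into the n-site Hilbert space."""
    factors = [identity] * n_sites
    factors[site] = op
    out = factors[0]
    for f in factors[1:]:
        out = np.kron(out, f)
    return out

def heisenberg_triangle(J=1.0):
    """H = J * sum over the three bonds of S_i . S_j."""
    H = np.zeros((8, 8), dtype=complex)
    for i, j in [(0, 1), (1, 2), (0, 2)]:
        for s in (sx, sy, sz):
            H += J * site_operator(s, i) @ site_operator(s, j)
    return H

H = heisenberg_triangle(J=1.0)
energies = np.linalg.eigvalsh(H)
levels, degeneracies = np.unique(np.round(energies, 10), return_counts=True)
for E, g in zip(levels, degeneracies):
    print(f"E = {E:+.4f} J   degeneracy = {g}")

# Doublet state (|dn,up,up> + w |up,dn,up> + w* |up,up,dn>) / sqrt(3), w = exp(2 pi i / 3).
w = np.exp(2j * np.pi / 3)
psi = np.zeros(8, dtype=complex)
psi[0b100], psi[0b010], psi[0b001] = 1.0, w, np.conj(w)
psi /= np.sqrt(3)
E_psi = np.vdot(psi, H @ psi).real
is_eigenstate = np.allclose(H @ psi, E_psi * psi)
print(f"doublet state: <H> = {E_psi:+.4f} J, eigenstate: {is_eigenstate}")
```

The same construction extends to other small clusters by changing the list of bonds, although the matrix dimension grows as 2^N.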


10.4.6 Heisenberg Model on Infinite Lattices

The Heisenberg model can be solved exactly in one dimension, and we discuss this further below, but not in any other finite dimension. However, in more than one dimension, the physics of the Heisenberg model is typically very different from that in one dimension, so we will begin by discussing, qualitatively, the semiclassical spin-wave approximation for the Heisenberg model, which captures many important aspects of magnetism. A quantitative formulation of this theory can be found in many textbooks (e.g., Ashcroft and Mermin10 or Rössler29). In inelastic neutron scattering experiments a neutron may have its spin flipped by its interaction with the magnet; this causes a spin-1 excitation in the material. The conceptually simplest spin-1 excitation would be to flip one (spin-1/2) spin; in a one-dimensional ferromagnetic Heisenberg model, this state has energy 2|J| greater than the ground state. However, a much lower energy excitation is a “spin wave,” where each spin is rotated a small amount from its nearest neighbors (see Fig. 10.12). In a one-dimensional ferromagnetic Heisenberg model, spin waves have excitation energies of ħωk = 2|J|(1 − cos ka), where a is the lattice constant.29 Note, in particular, that the excitation energy vanishes for long-wavelength (small-k) spin waves. This spin-wave spectrum can indeed be observed directly in neutron-scattering experiments from suitable materials,47 and the spectrum is found to be in good agreement with the predictions of the semiclassical theory in many materials. One can also quantize the semiclassical theory by making a Holstein–Primakoff transformation.29 This yields a description of the low-energy physics of the Heisenberg model in terms of noninteracting bosons, known as magnons, which have the same dispersion relation as the classical spin waves.
Fig. 10.12 (color online) (a) Classical ground state of a ferromagnetic Heisenberg chain; (b) spin-wave excitation with wavelength λ = 2π/k in the same model.

Similar spin-wave and magnon descriptions can be constructed straightforwardly for the antiferromagnetic Heisenberg model.29 The effective low-energy physics of the one-dimensional Heisenberg model is, as noted above, rather different from the semiclassical approximation. To understand this, it is helpful to think of the Heisenberg model as a special case of the XXZ model:

\[ H_{XXZ} = J_{xy} \sum_i \left(S_i^x S_{i+1}^x + S_i^y S_{i+1}^y\right) + J_z \sum_i S_i^z S_{i+1}^z \tag{10.100} \]

which reduces to the Heisenberg model for Jxy = Jz = J. For Jz < Jxy < 0, the model displays an exotic quantum phase known as a Luttinger liquid. (At Jxy = Jz the model undergoes a quantum phase transition from the Luttinger liquid to an ordered phase.48) On the energy scales relevant to chemistry, one does not need to worry about the fact that protons and neutrons are made up of smaller particles (quarks). This is because the quarks are confined within the proton or neutron.49 Similarly, in a normal magnet it does not matter that the material is made up of spin-1/2 particles (electrons). As described above, on the energy scales relevant to magnets, the spins are confined into spin-1 particles, magnons. However, magnons can be described in terms of two spin-1/2 spinons, which are confined inside the magnon. In the Luttinger liquid the spinons are deconfined; that is, the spinons can move independently of one another (see Fig. 10.13). As the magnon is a composite particle made from two spinons, this is often referred to as fractionalization. A key prediction of this theory is that the spinons display a continuum of excitations in neutron-scattering experiments (as opposed to the sharp dispersion predicted for magnons). The two-spinon continuum has indeed been observed in a number of quasi-one-dimensional materials.50

Fig. 10.13 (color online) Spinons in a one-dimensional spin chain. (a) Local antiferromagnetic correlations. (b) A neutron scattering off the chain causes one spin (circled) to flip. (c,d) Spontaneous flips of adjacent pairs of spins due to quantum fluctuations allow the spinons (circled) to propagate independently. A key open question is: Can this free propagation occur in two dimensions, or do interactions confine the spinons? (Modified from Ref. 81.)

An open research question is: Does fractionalization occur in higher dimensions? Because of the success of spin-wave theory (implying confined spinons) in describing magnetically ordered materials, one does not expect fractionalization in materials with magnetic order. Therefore, one would like to investigate quasi-two- or three-dimensional materials whose low-energy physics is described by spin Hamiltonians (such as the Heisenberg model) but that do not order magnetically even at the lowest temperatures. Such materials are collectively referred to as spin liquids. There is a long history of theoretical contemplation of spin liquids, which suggests that frustrated magnets and insulating systems near to the Mott transition are strong candidates to display spin-liquid physics. However, evidence for real materials with spin-liquid ground states has been scarce until very recently,51 but there is now evidence for spin liquids in the triangular lattice compound κ-(BEDT-TTF)2 Cu(CN)3 ,24,52 the kagome lattice (see Fig. 10.4) compound ZnCu3 (OH)6 Cl2 ,53 and the hyperkagome lattice compound Na4 Ir3 O8 .54 It remains to be seen whether any of these materials support fractionalized excitations.

10.5 OTHER EFFECTIVE LOW-ENERGY HAMILTONIANS FOR CORRELATED ELECTRONS

10.5.1 Complete Neglect of Differential Overlap, the Pariser–Parr–Pople Model, and Extended Hubbard Models

We now consider another model for which the quantum chemistry and condensed matter physics communities have different names. These models belong to a class of models known as complete neglect of differential overlap (CNDO). For a pair of orthogonal states, φ(x) and ψ(x), the integral over all space of the overlap of the two wavefunctions vanishes [i.e., $\int_{-\infty}^{\infty} \phi(x)\psi(x)\,dx = 0$]. If the differential overlap vanishes, the overlap of the two wavefunctions vanishes at every point in space [i.e., $\lim_{\delta\to 0} \int_{x_0}^{x_0+\delta} \phi(x)\psi(x)\,dx = 0$ for all x0]. The CNDO approximation is simply to assume that the differential overlap between all basis states is negligible. Thus CNDO implies that $V_{ijkl} = V_{iikk}\,\delta_{ij}\delta_{kl}$ (cf. Section 10.1.2) and the general CNDO Hamiltonian is

\[ \hat H_{\mathrm{CNDO}} = -\sum_{ij\sigma} t_{ij}\, \hat c_{i\sigma}^\dagger \hat c_{j\sigma} + \sum_{ij\sigma\sigma'} V_{ij}\, \hat n_{i\sigma} \hat n_{j\sigma'} \tag{10.101} \]

where $V_{ij} \equiv V_{iijj}$ and the number operator $\hat n_{i\sigma} \equiv \hat c_{i\sigma}^\dagger \hat c_{i\sigma}$. The Pariser–Parr–Pople (PPP) model is the CNDO approximation in a basis that includes only the π-electrons. Often, a Hückel-like notation is used with Vij = γij; thus,

\[ \hat H_{\mathrm{PPP}} = \sum_{i\sigma} \alpha_i\, \hat c_{i\sigma}^\dagger \hat c_{i\sigma} + \sum_{ij\sigma} \beta_{ij}\, \hat c_{i\sigma}^\dagger \hat c_{j\sigma} + \sum_{ij\sigma\sigma'} \gamma_{ij}\, \hat n_{i\sigma} \hat n_{j\sigma'} \tag{10.102} \]


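The extended Hubbard model, as with the plain Hubbard model, is typically studied in a basis with one orbital per site. Further, one often makes the approximation that Vii = U, Vij = V if i and j are nearest neighbors, and Vij = 0 otherwise. This yields

\[ \hat H_{\mathrm{eH}} = -\sum_{ij\sigma} t_{ij}\, \hat c_{i\sigma}^\dagger \hat c_{j\sigma} + U \sum_i \hat n_{i\uparrow} \hat n_{i\downarrow} + V \sum_{\langle ij\rangle\sigma\sigma'} \hat n_{i\sigma} \hat n_{j\sigma'} \tag{10.103} \]

One can, of course, go beyond CNDO. The most general possible model for two identical sites with a single orbital per site is

\[ \hat H_{\mathrm{eH2}} = -\sum_\sigma \left[t - X\left(\hat n_{1\bar\sigma} + \hat n_{2\bar\sigma}\right)\right]\left(\hat c_{1\sigma}^\dagger \hat c_{2\sigma} + \hat c_{2\sigma}^\dagger \hat c_{1\sigma}\right) + U \sum_i \hat n_{i\uparrow} \hat n_{i\downarrow} + V \hat n_1 \hat n_2 + J\, \hat{\mathbf S}_1 \cdot \hat{\mathbf S}_2 + P\left(\hat c_{1\uparrow}^\dagger \hat c_{1\downarrow}^\dagger \hat c_{2\uparrow} \hat c_{2\downarrow} + \hat c_{2\uparrow}^\dagger \hat c_{2\downarrow}^\dagger \hat c_{1\uparrow} \hat c_{1\downarrow}\right) \tag{10.104} \]

where $\hat n_i = \sum_\sigma \hat n_{i\sigma}$, $\bar\sigma$ denotes the spin opposite to σ, $\hat{\mathbf S}_i = \frac{1}{2}\sum_{\alpha\beta} \hat c_{i\alpha}^\dagger \boldsymbol\sigma_{\alpha\beta} \hat c_{i\beta}$, $\boldsymbol\sigma_{\alpha\beta}$ is the vector of Pauli matrices, J is the direct exchange interaction, X is the correlated hopping amplitude, and P is the pair hopping amplitude.

10.5.2 Larger Basis Sets and Hund’s Rules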

Thus far we have focused mainly on models with one orbital per site. Often, this is not appropriate: for example, if one were interested in chemical bonding or materials containing transition metals. Many of the models discussed in this chapter can be extended straightforwardly to include more than one orbital per site. However, while writing down models with more than one orbital per site is not difficult, these models do contain significant additional physics. Some of the most important effects are known as Hund’s rules.1 These rules have important experimental consequences, from atomic physics to biology. To examine Hund’s rules, let us consider the atomic limit (t = 0) of an extended Hubbard model with two electrons in two orbitals per site:

\[ \hat H_{\mathrm{eH1s2o}} = U \sum_\mu \hat n_{\mu\uparrow} \hat n_{\mu\downarrow} + V_0\, \hat n_1 \hat n_2 + J_H\, \hat{\mathbf S}_1 \cdot \hat{\mathbf S}_2 \tag{10.105} \]

where μ = 1 or 2 labels the orbitals, $\hat n_{\mu\sigma} = \hat c_{\mu\sigma}^\dagger \hat c_{\mu\sigma}$, $\hat n_\mu = \sum_\sigma \hat n_{\mu\sigma}$, $\hat{\mathbf S}_\mu = \frac{1}{2}\sum_{\alpha\beta} \hat c_{\mu\alpha}^\dagger \boldsymbol\sigma_{\alpha\beta} \hat c_{\mu\beta}$, U is the Coulomb repulsion between two electrons in the same orbital, V0 is the Coulomb repulsion between two electrons in different orbitals, and JH is the Hund’s rule coupling between electrons in different orbitals. Notice that the Hund’s rule coupling is an exchange interaction between orbitals.


Further, if we compare the Hamiltonian with the definition given in Eq. (10.28), we find that

\[ -J_H = \int d^3r_1 \int d^3r_2\, \phi_1^*(\mathbf r_1)\phi_2(\mathbf r_1)\, V(\mathbf r_1 - \mathbf r_2)\, \phi_2^*(\mathbf r_2)\phi_1(\mathbf r_2) \sim \int d^3r_1 \int d^3r_2\, |\phi_1(\mathbf r_1)|^2\, V(\mathbf r_1 - \mathbf r_2)\, |\phi_2(\mathbf r_2)|^2 \ge 0 \tag{10.106} \]

as V(r1 − r2) is positive semidefinite. Therefore, typically, JH < 0; that is, the Hund’s rule coupling favors the parallel alignment of the spins in a half-filled system. U is the largest energy scale in the problem, so, for simplicity, let us consider the case U → ∞. For JH = 0 there are four degenerate ground states: a singlet, (1/√2)(|↑↓⟩ − |↓↑⟩) (where the first arrow refers to the spin of the electron in orbital 1 and the second arrow refers to the spin in orbital 2), and a triplet: |↑↑⟩, |↓↓⟩, and (1/√2)(|↑↓⟩ + |↓↑⟩). But for JH < 0 the energy of the triplet states is |JH| lower than that of the singlet state. Indeed, even if we relax the condition U → ∞, the triplet state remains lower in energy than the singlet state, as physically we require that U > |JH|. One can repeat this argument for any number of electrons in any number of orbitals, and one always finds that the highest spin state has the lowest energy. However, if one studies models with more than one site and moves away from the atomic limit (t = 0), one finds that there is a subtle competition between the kinetic (hopping) term and the Hund’s rule coupling which means that the high spin state is not always the lowest-energy state. Many such interesting effects can be understood on the basis of a two-site generalization of this two-orbital model.55

10.5.3 Ionic Hubbard Model

Thus far we have assumed that all sites are identical. Of course, this is not always true in real materials. In a compound, more than one species of atom may contribute to the low-energy physics,56 or different atoms of the same species may be found at crystallographically distinct sites.43,57 A simple model that describes this situation is the ionic Hubbard model:

\[ \hat H_{\mathrm{iH}} = -t \sum_{\langle ij\rangle\sigma} \hat c_{i\sigma}^\dagger \hat c_{j\sigma} + U \sum_i \hat n_{i\uparrow} \hat n_{i\downarrow} + \sum_{i\sigma} \varepsilon_i\, \hat n_{i\sigma} \tag{10.107} \]

where εi = tii is the site energy, which will be taken to be different on different sites. Note that in the standard form of the ionic Hubbard model, all sites are assumed to have the same U. An important application of the ionic Hubbard model is in describing transition metal oxides.56 Typically, εi is larger on the transition metal site than on the oxygen site; therefore, the oxygen orbitals are nearly filled. This means that there is a low hole density in the oxygen orbitals and hence that electronic correlations are less important for the electrons in the oxygen orbitals than for electrons in transition metal orbitals. If the difference between εi on the oxygen sites and εi on the transition metal sites is large enough, the oxygen orbitals are completely filled in all low-energy states and therefore need not feature in the low-energy description of the material. However, just because the oxygen orbitals do not appear explicitly in the effective low-energy Hamiltonian of the material does not mean that the oxygen does not have a profound effect on low-energy physics. To see this, consider a toy model with two metal sites (labeled 1 and 2) and one oxygen site (labeled O), whose Hamiltonian is

\[ \hat H_{\mathrm{iH3}} = -t \sum_\sigma \left(\hat c_{1\sigma}^\dagger \hat c_{O\sigma} + \hat c_{O\sigma}^\dagger \hat c_{1\sigma} + \hat c_{2\sigma}^\dagger \hat c_{O\sigma} + \hat c_{O\sigma}^\dagger \hat c_{2\sigma}\right) + \frac{\Delta}{2} \sum_\sigma \left(\hat n_{1\sigma} + \hat n_{2\sigma} - \hat n_{O\sigma}\right) \tag{10.108} \]

as sketched in Fig. 10.14, which is just the ionic Hubbard model with U = 0 and Δ = ε1 − εO = ε2 − εO > 0. With three electrons in the system and t = 0, the ground state is fourfold degenerate; the ground states have two electrons on the O atom and the other electron on one of the metal atoms. If we now consider finite, but small, t, we can construct a perturbation theory in t/Δ. One finds that there is a splitting between the bonding, $(1/\sqrt{2})(\hat c_{1\sigma}^\dagger + \hat c_{2\sigma}^\dagger)\hat c_{O\uparrow}^\dagger \hat c_{O\downarrow}^\dagger |0\rangle$, and antibonding, $(1/\sqrt{2})(\hat c_{1\sigma}^\dagger - \hat c_{2\sigma}^\dagger)\hat c_{O\uparrow}^\dagger \hat c_{O\downarrow}^\dagger |0\rangle$, states. The processes that lead to this splitting are sketched in Fig. 10.15. Therefore, our effective low-energy Hamiltonian is a tight-binding model involving just the metal atoms:

\[ \hat H_{\mathrm{eff}} = -t^* \sum_\sigma \left(\hat c_{1\sigma}^\dagger \hat c_{2\sigma} + \hat c_{2\sigma}^\dagger \hat c_{1\sigma}\right) \tag{10.109} \]

where, to second order in t/Δ, the effective metal-to-metal hopping integral is given by

\[ t^* = -\frac{t^2}{\Delta} \tag{10.110} \]

Fig. 10.14 (color online) Toy model for a transition metal oxide, Hamiltonian equation (10.108), with two transition metal sites (1 and 2) and a single oxygen site (O).

Fig. 10.15 (color online) Processes described by Hamiltonian equation (10.108) that give rise to the effective hopping integral between the two transition metal atom sites. (Panel labels: initial state, E = E0; hop t; intermediate state, E − E0 = Δ; hop t; final state, E = E0.)

Note that even though t is positive, t ∗ < 0 (or, equivalently, β∗ > 0), in contrast to our naive expectation that hopping integrals are positive (β < 0; cf. Section 10.2).
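Because the toy model of Eq. (10.108) has U = 0, the sign and size of t* can be checked from the single-particle levels alone: with the oxygen-like level doubly occupied, the third electron sits in one of the two metal-like levels, and Eq. (10.109) places the symmetric combination at −t* and the antisymmetric one at +t* relative to their mean. The sketch below uses this observation; the function names and the parameter values t = 0.1, Δ = 1 are illustrative choices made here, not values from the text.

```python
import numpy as np

def metal_like_levels(t, delta):
    """Single-particle energies of the metal-like symmetric/antisymmetric orbitals.

    Basis order: metal 1, oxygen, metal 2; site energies +delta/2, -delta/2, +delta/2.
    """
    h = np.array([[delta / 2, -t, 0.0],
                  [-t, -delta / 2, -t],
                  [0.0, -t, delta / 2]])
    energies, vectors = np.linalg.eigh(h)
    e_sym = e_anti = None
    for k in (1, 2):  # index 0 is the oxygen-like level
        if vectors[0, k] * vectors[2, k] > 0:
            e_sym = energies[k]    # equal signs on the two metal sites
        else:
            e_anti = energies[k]   # opposite signs on the two metal sites
    return e_sym, e_anti

def effective_hopping(t, delta):
    """t* defined through H_eff = -t* (c1^dag c2 + h.c.): E_anti - E_sym = 2 t*."""
    e_sym, e_anti = metal_like_levels(t, delta)
    return 0.5 * (e_anti - e_sym)

t, delta = 0.1, 1.0
t_star = effective_hopping(t, delta)
print(f"exact t* = {t_star:+.6f},  second-order estimate -t^2/Delta = {-t**2 / delta:+.6f}")
```

For these parameters the exact splitting gives a negative t* within a few percent of −t²/Δ, the difference being of higher order in t/Δ.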

10.6 HOLSTEIN MODEL

So far we have assumed that the nuclei or ions form a passive background through which the electrons move. However, in many situations this is not the case. Atoms move and these lattice/molecular vibrations interact with the electrons via the electron–phonon/vibronic interaction. One of the simplest models of such effects is the Holstein model, which we discuss below. Electron–vibration interactions play important roles across science. In physics, electron–phonon interactions can give rise to superconductivity,58 spin and charge density waves,59 polaron formation,60 and piezoelectricity.58 In chemistry, vibronic interactions affect electron-transfer processes,61 Jahn–Teller effects, spectroscopy, stereochemistry, activation of chemical reactions, and catalysis.62 In biology the vibronic interactions play important roles in photoprotection,63 photosynthesis,64 and vision.65 It is therefore clear that one of the central tasks for condensed matter theory and theoretical chemistry is to describe electron–vibration interactions.


In general, one may write the Hamiltonian of a system of electrons and nuclei as

$$\hat H = \hat H_e + \hat H_n + \hat H_{en} \tag{10.111}$$

where $\hat H_e$ contains those terms that affect only the electrons, $\hat H_n$ contains those terms that affect only the nuclei, and $\hat H_{en}$ describes the interactions between the electrons and the nuclei. $\hat H_e$ might be any of the Hamiltonians we have discussed above. However, for the Holstein model one assumes a tight-binding form for $\hat H_e$. In the normal-mode approximation,62 which we will make, one treats molecular and lattice vibrations as harmonic oscillators (cf. Section 10.1.1). As the ions carry a charge, any displacement of the ions from their equilibrium positions will change the potential felt by the electrons. The Holstein model assumes that each vibrational mode is localized on a single site. For this to be the case, the site must have some internal structure (i.e., the site cannot correspond to a single atom). Therefore, the Holstein model is more appropriate for molecular solids than for simple crystals. For small displacements, $x_{i\mu}$, of the $\mu$th mode of the $i$th lattice site, we can perform a Taylor expansion in the dimensionless normal coordinate of the vibration, $Q_{i\mu} = x_{i\mu}\sqrt{m_{i\mu}\omega_{i\mu}/\hbar}$, where $m_{i\mu}$ and $\omega_{i\mu}$ are, respectively, the mass and the frequency of the $\mu$th mode on the $i$th site, and we find that

$$\hat H_{en} = \sum_{ij\sigma\mu} \frac{\partial t_{ij}}{\partial Q_{i\mu}}\, Q_{i\mu}\,(\hat c^\dagger_{i\sigma}\hat c_{j\sigma} + \hat c^\dagger_{j\sigma}\hat c_{i\sigma}) + \cdots \tag{10.112}$$

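In the Holstein model one assumes that the derivative vanishes for $i \neq j$. We may quantize the vibrations in the usual way (cf. Section 10.1.1), which yields

$$\hat H_{en} = \sum_{i\sigma\mu} g_{i\mu}\,(\hat a^\dagger_{i\mu} + \hat a_{i\mu})\,\hat c^\dagger_{i\sigma}\hat c_{i\sigma} \tag{10.113}$$

where $\hat a^{(\dagger)}_{i\mu}$ destroys (creates) a quantized vibration in the $\mu$th mode on the $i$th site, $g_{i\mu} = 2^{-1/2}\,\partial t_{ii}/\partial Q_{i\mu}$, and $\hat H_n = \sum_{i\mu} \hbar\omega_{i\mu}\hat a^\dagger_{i\mu}\hat a_{i\mu}$. Thus,

$$\hat H_{\text{Holstein}} = -t\sum_{\langle ij\rangle\sigma} \hat c^\dagger_{i\sigma}\hat c_{j\sigma} + \sum_{i\mu} \hbar\omega_{i\mu}\hat a^\dagger_{i\mu}\hat a_{i\mu} + \sum_{i\sigma\mu} g_{i\mu}\,(\hat a^\dagger_{i\mu} + \hat a_{i\mu})\,\hat c^\dagger_{i\sigma}\hat c_{i\sigma} \tag{10.114}$$

10.6.1 Two-Site Holstein Model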

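If we assume that there is only one electron and one mode per site, the Holstein model simplifies to

$$\hat H_{\text{Holstein}} = -t\sum_\sigma (\hat c^\dagger_{1\sigma}\hat c_{2\sigma} + \hat c^\dagger_{2\sigma}\hat c_{1\sigma}) + \hbar\omega\sum_i \hat a^\dagger_i\hat a_i + g\sum_i (\hat a^\dagger_i + \hat a_i)\,\hat n_i \tag{10.115}$$

on two symmetric sites, where $\hat n_i = \sum_\sigma \hat n_{i\sigma} = \sum_\sigma \hat c^\dagger_{i\sigma}\hat c_{i\sigma}$. It is useful to change the basis in which we consider the phonons to that of in-phase (symmetric), $\hat s = (\hat a_1 + \hat a_2)/\sqrt{2}$, and out-of-phase (antisymmetric), $\hat b = (\hat a_1 - \hat a_2)/\sqrt{2}$, vibrations. In this basis one finds that

$$\hat H_{\text{Holstein}} = \hat H_s + \hat H_{be} \tag{10.116}$$

where

$$\hat H_s = \hbar\omega\,\hat s^\dagger\hat s + \frac{g}{\sqrt{2}}(\hat s^\dagger + \hat s)(\hat n_1 + \hat n_2) \tag{10.117}$$

and

$$\hat H_{be} = -t\sum_\sigma (\hat c^\dagger_{1\sigma}\hat c_{2\sigma} + \hat c^\dagger_{2\sigma}\hat c_{1\sigma}) + \hbar\omega\,\hat b^\dagger\hat b + \frac{g}{\sqrt{2}}(\hat b^\dagger + \hat b)(\hat n_1 - \hat n_2) \tag{10.118}$$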
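The decomposition in Eqs. (10.116) to (10.118) is an operator identity, so it can be verified with finite matrices. The sketch below is illustrative only. It assumes a single spinless electron (the text later argues that spin is a trivial degeneracy here), units with $\hbar = 1$, and a truncated boson space of `n_max` levels per mode. All names and parameter values are hypothetical.

```python
import numpy as np

def boson(n_max):
    """Annihilation operator truncated to n_max levels."""
    return np.diag(np.sqrt(np.arange(1, n_max)), k=1)

def two_site_holstein_pieces(t=1.0, omega=0.5, g=0.3, n_max=6):
    """Return (H_Holstein, H_s, H_be) for one spinless electron, hbar = 1."""
    a = boson(n_max)
    eye_mode = np.eye(n_max)
    a1 = np.kron(a, eye_mode)            # local mode on site 1
    a2 = np.kron(eye_mode, a)            # local mode on site 2
    s = (a1 + a2) / np.sqrt(2)           # in-phase mode
    b = (a1 - a2) / np.sqrt(2)           # out-of-phase mode

    # Electron operators in the basis (electron on site 1, electron on site 2).
    n1 = np.diag([1.0, 0.0])
    n2 = np.diag([0.0, 1.0])
    hop = np.array([[0.0, 1.0], [1.0, 0.0]])
    eye_el = np.eye(2)
    eye_bos = np.eye(n_max**2)

    # All matrices are real, so the transpose is the Hermitian conjugate.
    h_full = (-t * np.kron(hop, eye_bos)
              + omega * np.kron(eye_el, a1.T @ a1 + a2.T @ a2)
              + g * (np.kron(n1, a1.T + a1) + np.kron(n2, a2.T + a2)))
    h_s = (omega * np.kron(eye_el, s.T @ s)
           + g / np.sqrt(2) * np.kron(n1 + n2, s.T + s))
    h_be = (-t * np.kron(hop, eye_bos)
            + omega * np.kron(eye_el, b.T @ b)
            + g / np.sqrt(2) * np.kron(n1 - n2, b.T + b))
    return h_full, h_s, h_be

h_full, h_s, h_be = two_site_holstein_pieces()
print(np.allclose(h_full, h_s + h_be))
```

The identity holds exactly even in the truncated space, because it relies only on the linear change of variables from $(\hat a_1, \hat a_2)$ to $(\hat s, \hat b)$ and not on the boson commutation relations.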

Note that $\hat n_1 + \hat n_2 = N$, the total number of electrons in the problem. As $N$ is a constant of the motion, the dynamics of the electrons cannot affect the symmetric vibrations, and vice versa. Hence all of the interesting effects are contained in $\hat H_{be}$ and we need only study this Hamiltonian below.

10.6.1.1 Diabatic Limit, $\hbar\omega \gg t$

In the diabatic limit the vibrational modes are assumed to adapt themselves instantaneously to the particle’s position. Thus,

$$\hbar\omega\,\hat b^\dagger\hat b + \frac{g}{\sqrt{2}}(\hat b^\dagger + \hat b)(\hat n_1 - \hat n_2) = \hbar\omega\,\hat b^\dagger\hat b \pm \frac{g}{\sqrt{2}}(\hat b^\dagger + \hat b) \tag{10.119}$$

The plus sign is relevant when the electron is located on site 1 and the minus sign is relevant when the electron is on site 2. We now introduce the displaced oscillator transformation,

$$\hat b^\dagger_\pm = \hat b^\dagger \pm \frac{1}{\sqrt{2}}\frac{g}{\hbar\omega} \tag{10.120}$$

Therefore, we find that

$$\hat H_{be} = -t\sum_\sigma (\hat c^\dagger_{1\sigma}\hat c_{2\sigma} + \hat c^\dagger_{2\sigma}\hat c_{1\sigma}) + \hbar\omega\,(\hat b^\dagger_+\hat b_+ + \hat b^\dagger_-\hat b_-) - \frac{g^2}{2\hbar\omega} \tag{10.121}$$

where $\hat b_+$ applies when the electron is on site 1 and $\hat b_-$ when it is on site 2. It is important to note that the operators $\hat b_+$ and $\hat b_-$ satisfy the same commutation relations as the $\hat b$ operator; therefore, they describe bosonic excitations. We define the ground states of the displaced oscillators by $\hat b_-|0_-\rangle = 0$ and $\hat b_+|0_+\rangle = 0$. Therefore,

$$\hat b\,|0_+\rangle = -\frac{1}{\sqrt{2}}\frac{g}{\hbar\omega}\,|0_+\rangle \tag{10.122}$$

and hence

$$\hat b_-|0_+\rangle = -\frac{\sqrt{2}\,g}{\hbar\omega}\,|0_+\rangle \tag{10.123}$$

Similarly,

$$\hat b_+|0_-\rangle = \frac{\sqrt{2}\,g}{\hbar\omega}\,|0_-\rangle \tag{10.124}$$

that is, $|0_\pm\rangle$ is an eigenstate of $\hat b_\mp$ with eigenvalue $\mp\sqrt{2}\,g/\hbar\omega$. The eigenstates of bosonic annihilation operators are known as coherent states.66 Equations (10.122) to (10.124) therefore show that the ground state of one of the $\hat b_\pm$ operators may be written as a coherent state of the other operator67:

$$|0_\pm\rangle = \exp\left[-\frac{1}{2}\left(\frac{\sqrt{2}\,g}{\hbar\omega}\right)^2 \mp \frac{\sqrt{2}\,g}{\hbar\omega}\,\hat b^\dagger_\mp\right]|0_\mp\rangle \tag{10.125}$$

Therefore,

$$\langle 0_+|0_-\rangle = \exp\left(-\frac{g^2}{\hbar^2\omega^2}\right) \tag{10.126}$$

which is known as the Franck–Condon factor. The Franck–Condon factor describes the fact that in the diabatic limit, the bosons cause a “drag” on the electronic hopping. That is, we can describe the solution of the diabatic limit in terms of an effective two-site tight-binding model if we replace $t$ by

$$t^* = t\,\langle 0_+|0_-\rangle = t\exp\left(-\frac{g^2}{\hbar^2\omega^2}\right) \tag{10.127}$$
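The renormalized hopping (10.127) can be compared with a direct numerical diagonalization of $\hat H_{be}$, Eq. (10.118). The following is a rough sketch under stated assumptions: one spinless electron, units with $\hbar = 1$, and a boson space truncated at `n_max` levels. In the effective two-site model the lowest two levels are split by $2t^*$, so the code compares the numerical splitting with $2t\exp(-g^2/\omega^2)$. The function name and parameter values are illustrative and are not taken from the chapter.

```python
import numpy as np

def holstein_splitting(t, omega, g, n_max=40):
    """Splitting of the two lowest levels of H_be for one electron (hbar = 1)."""
    b = np.diag(np.sqrt(np.arange(1, n_max)), k=1)   # truncated annihilation operator
    number = b.T @ b
    displacement = b.T + b
    sigma_x = np.array([[0.0, 1.0], [1.0, 0.0]])     # hopping between sites 1 and 2
    sigma_z = np.array([[1.0, 0.0], [0.0, -1.0]])    # n_1 - n_2 for one electron
    h = (-t * np.kron(sigma_x, np.eye(n_max))
         + omega * np.kron(np.eye(2), number)
         + g / np.sqrt(2) * np.kron(sigma_z, displacement))
    levels = np.linalg.eigvalsh(h)
    return levels[1] - levels[0]

t, omega, g = 0.02, 1.0, 0.8      # diabatic regime: omega much larger than t
numerical = holstein_splitting(t, omega, g)
franck_condon = 2.0 * t * np.exp(-(g / omega) ** 2)
print(numerical, franck_condon)
```

Agreement should improve as $t/\hbar\omega$ is reduced, since Eq. (10.127) is the leading-order result in $t$.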

Thus, the hopping integral is renormalized by the interactions of the electron with the vibrational modes (cf. Section 10.7). This renormalization is also found in the solution for an electron moving on a lattice in the diabatic limit. In this context the exponential factor is known as polaronic band narrowing.60 The exponential factor results from the small overlap of the ground states of the two displaced oscillators and may be thought of as an increase in the effective mass of the electron.

10.6.1.2 Adiabatic Limit, $\hbar\omega \ll t$

We begin by noting that as there is only one electron, the spin of the electron only leads to a trivial twofold degeneracy and therefore can be neglected without loss of generality. A useful notational change is to introduce a pseudospin notation where we define $\hat\sigma_z = \hat c^\dagger_{1\sigma}\hat c_{1\sigma} - \hat c^\dagger_{2\sigma}\hat c_{2\sigma}$ and $\hat\sigma_x = \hat c^\dagger_{1\sigma}\hat c_{2\sigma} + \hat c^\dagger_{2\sigma}\hat c_{1\sigma}$. Therefore, the one-electron two-site Holstein model Hamiltonian becomes

$$\hat H_{sb} = -t\,\hat\sigma_x + \hbar\omega\,\hat b^\dagger\hat b + \frac{g}{\sqrt{2}}(\hat b^\dagger + \hat b)\,\hat\sigma_z \tag{10.128}$$

which is often referred to as the spin-boson model. Let us now replace the bosonic operators by position and momentum operators for the harmonic oscillator defined as

$$\hat x = \sqrt{\frac{\hbar}{2m\omega}}\,(\hat b^\dagger + \hat b) \tag{10.129}$$

and

$$\hat p = i\sqrt{\frac{\hbar m\omega}{2}}\,(\hat b^\dagger - \hat b) \tag{10.130}$$

Therefore, up to the constant zero-point energy $\hbar\omega/2$,

$$\hat H_{sb} = -t\,\hat\sigma_x + \frac{\hat p^2}{2m} + \frac{1}{2}m\omega^2\hat x^2 + g\sqrt{\frac{m\omega}{\hbar}}\,\hat x\,\hat\sigma_z \tag{10.131}$$

The adiabatic limit is characterized by a sluggish bosonic bath that responds only very slowly to the motion of the electron (i.e., $\hat p^2/2m \to 0$), which it is often helpful to think of as the $m \to \infty$ limit. Further, in the adiabatic limit the Born–Oppenheimer approximation2,67 holds, which implies that the total wavefunction of the system, $|\Psi\rangle$, is a product of an electronic (pseudospin) wavefunction, $|\phi_e\rangle$, and a vibrational (bosonic) wavefunction, $|\psi_v\rangle$ (i.e., $|\Psi\rangle = |\phi_e\rangle \otimes |\psi_v\rangle$). Therefore, the harmonic oscillator will be in a position eigenstate and we may replace the position operator, $\hat x$, by a classical position $x$, yielding

$$\hat H_{sb} = -t\,\hat\sigma_x + g\sqrt{\frac{m\omega}{\hbar}}\,x\,\hat\sigma_z + \frac{1}{2}m\omega^2x^2 \tag{10.132}$$

$$\phantom{\hat H_{sb}} = \begin{pmatrix} g\sqrt{m\omega/\hbar}\,x & -t \\ -t & -g\sqrt{m\omega/\hbar}\,x \end{pmatrix} + \frac{1}{2}m\omega^2x^2 \tag{10.133}$$

where in the second line we have simply switched to the matrix representation of the Pauli matrices. This is easily solved and one finds that the eigenvalues are

$$E_\pm = \frac{1}{2}m\omega^2x^2 \pm \sqrt{t^2 + \frac{m\omega}{\hbar}g^2x^2} \tag{10.134}$$

$$\phantom{E_\pm} \approx \frac{1}{2}m\omega^2x^2 \pm t \pm \frac{m\omega g^2x^2}{2\hbar t} \tag{10.135}$$

where Eq. (10.135) holds in the weak-coupling limit, $g\sqrt{m\omega/\hbar}\,x \ll t$. We plot the variation of these eigenvalues with $x$ in this limit in Fig. 10.16. Notice that for the electronic ground state, $E_-$, the lowest-energy states have $x \neq 0$. This is an example of spontaneous symmetry breaking,68 as the ground state of a system has a lower symmetry than the Hamiltonian of the system. Thus, the system must “choose” either the left well or the right well (but not both) in order to minimize its energy.

Fig. 10.16 (color online) Energies of the ground and excited states for a single electron in the two-site Holstein model in the adiabatic weak coupling limit ($t \gg g \gg \hbar\omega$), calculated from Eq. (10.134). $x$ is the position of the harmonic oscillator describing out-of-phase vibrations.
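The double-well structure of $E_-$ can be made explicit numerically. The sketch below evaluates Eq. (10.134) on a grid and locates the minimum of the lower surface. It also compares the result with the stationary point obtained by setting $dE_-/dx = 0$, which gives $x_{\min}^2 = \hbar\,[g^4/(\hbar\omega)^2 - t^2]/(m\omega g^2)$ whenever $g^2 > \hbar\omega t$. This closed form and the condition on $g$ are derived here from Eq. (10.134) rather than quoted from the text. The parameter values and function names are hypothetical, and the default units set $\hbar = m = 1$.

```python
import numpy as np

def adiabatic_energies(x, t, omega, g, m=1.0, hbar=1.0):
    """Lower and upper adiabatic surfaces E_-(x), E_+(x) of Eq. (10.134)."""
    elastic = 0.5 * m * omega**2 * x**2
    gap = np.sqrt(t**2 + (m * omega / hbar) * g**2 * x**2)
    return elastic - gap, elastic + gap

def analytic_minimum(t, omega, g, m=1.0, hbar=1.0):
    """Position |x| of the minima of E_-, from dE_-/dx = 0."""
    if g**2 <= hbar * omega * t:
        return 0.0                      # single well: minimum stays at x = 0
    return np.sqrt(hbar * ((g**2 / (hbar * omega))**2 - t**2) / (m * omega * g**2))

t, omega, g = 1.0, 0.1, 0.5             # illustrative values with g**2 > omega * t
x = np.linspace(-30.0, 30.0, 6001)
e_minus, e_plus = adiabatic_energies(x, t, omega, g)
x_numeric = abs(x[np.argmin(e_minus)])
x_analytic = analytic_minimum(t, omega, g)
print(x_numeric, x_analytic)
```

With these values the minimum of $E_-$ sits away from the origin, which is the broken-symmetry situation described above. For $g^2 \le \hbar\omega t$ the same function returns a single minimum at $x = 0$.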

10.7 EFFECTIVE HAMILTONIAN OR SEMIEMPIRICAL MODEL?

The models discussed in this chapter are generally known as either empirical or semiempirical models in a chemical context and as effective Hamiltonians in the physics community. Here the difference is not just nomenclature but is also indicative of an important difference in the epistemological status awarded to these models by the two communities. In this section I describe two different attitudes toward semiempirical models and effective Hamiltonians and discuss the epistemological views embodied in the work of two of the greatest physicists of the twentieth century.

10.7.1 Diracian Worldview

Paul Dirac famously wrote69 that “the fundamental laws necessary for the mathematical treatment of a large part of physics and the whole of chemistry are thus completely known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved.” There is clearly a great deal of truth in the statement. In solid-state physics and chemistry we know that the Schrödinger equation provides an extraordinarily accurate description of the phenomena observed. Gravity, the weak and strong nuclear forces, and relativistic corrections are typically unimportant; thus, all of the interactions boil down to nonrelativistic electromagnetic effects. Dirac’s worldview is realized in the ab initio approach to electronic structure, wherein one starts from the Hartree–Fock solution to the full Schrödinger equation in some small basis set. One then adds in correlations via increasingly complex approximation schemes and increases the size of the basis set, in the hope that with a sufficiently large computer one will find an answer that is “sufficiently close” to the exact solution (full CI in an infinite complete basis set). In the last few decades rapid progress has been made in ab initio methods due to an exponential improvement in computing technology, methodological progress, and the widespread availability of implementations of these methods.70 However, this progress is unsustainable: The complexity recognized by Dirac eventually limits the accuracy possible from ab initio calculations. Indeed, solving the Hamiltonian given in Eq. (10.24) is known to be computationally difficult.

Feynman proposed building a computer that uses the full power of quantum mechanics to carry out quantum simulations.71 Indeed, the simplest of all quantum chemical problems, the H2 molecule in a minimal basis set, has been solved on a prototype quantum computer.72 But while even a rather small scale quantum computer (containing just a few hundred qubits72) would provide a speed-up over classical computation, it is believed that the solution of Hamiltonian (10.24) remains difficult even on a quantum computer [i.e., it is believed that even a quantum computer could not solve Hamiltonian equation (10.24) in a time that grows only polynomially with the size of the system73]. Further, simple extensions of these arguments provide strong reasons to believe that there is no efficiently computable approximation to the exact functional in density functional theory.73 Therefore, it appears that the equations will always remain “too complex to be solved” directly. This suggests that semiempirical models will always be required for large systems.

10.7.2 Wilsonian Project

Typically, one is only interested in a few low-energy states of a system, perhaps the ground state and the first few excited states. Therefore, as long as our model gives the correct energies for these low-energy states, we should regard it as successful. This apparently simple realization, particularly as embodied by Wilson’s renormalization group,74 has had profound implications throughout modern physics from high-energy particle physics to condensed matter physics. The basic idea of renormalization is remarkably simple. Imagine starting with some system that has a large number of degrees of freedom. As we have noted, for practical purposes we care only about the lowest-energy states. Therefore, one might be tempted to simplify the description of the system by discarding the highest-energy states. However, simply discarding such states will cause a shift in the low-energy spectrum. Therefore, one must remove the high-energy states that complicate the description and render the problem computationally intractable in such a way as to preserve the low-energy spectrum. This is often referred to as “integrating out” the high-energy degrees of freedom (because of the way this process is carried out in the path-integral formulation of quantum mechanics75). Typically, integrating out the high-energy degrees of freedom causes the parameters of the Hamiltonian to “flow” or “run” (i.e., change their values). When this happens, one says that the parameters are renormalized.

A simple example is the Coulomb interaction between the two electrons in a neutral helium atom. For simplicity, let’s imagine trying to calculate just the ground-state energy. We begin by analyzing the problem in the absence of a Coulomb interaction between the two electrons. In the ground state both electrons occupy the 1s orbital. We would like to work in as small a basis set as possible. The simplest approach is just to work in the minimal basis set, which in this case is just the two 1s spin-orbitals, $\phi_{1s\sigma}(\mathbf r)$. The total energy of a He atom neglecting the interelectron Coulomb interaction is −108.8 eV (relative to the completely ionized state). Now we restore the Coulomb repulsion between electrons. A simple question is: How much does this change the total energy of the He atom? In the minimal basis set the solution seems straightforward:

$$\langle 1s^2|V|1s^2\rangle = \int_{-\infty}^{\infty} d^3r_1 \int_{-\infty}^{\infty} d^3r_2\, \frac{e^2\,|\phi_{1s\uparrow}(\mathbf r_1)|^2\,|\phi_{1s\downarrow}(\mathbf r_2)|^2}{4\pi\varepsilon_0\,|\mathbf r_1 - \mathbf r_2|} \simeq 34.0\ \text{eV} \tag{10.136}$$

Therefore, it is tempting to conclude that we can model the He atom by a one-site Hubbard model with $U = \langle 1s^2|V|1s^2\rangle$. However, this yields a total energy for the He atom of −74.8 eV, which is not particularly close to the experimental value of −78.975 eV.7 Let us then continue to consider the problem in the basis set of the hydrogenic atom, which is complete due to the spherical symmetry of the Hamiltonian. One can now straightforwardly carry out a perturbation theory around the noninteracting electron solution, where we take

$$H_0 = \sum_{i=1}^{2}\left(-\frac{\hbar^2\nabla_i^2}{2m} - \frac{e^2}{2\pi\varepsilon_0\,|\mathbf r_i|}\right) \tag{10.137}$$

and

$$H_1 = \frac{e^2}{4\pi\varepsilon_0\,|\mathbf r_1 - \mathbf r_2|} \tag{10.138}$$

A detailed description of this perturbation theory is given in Chapter 18 of Gasiorowicz.7 However, for our discussion, the key point is that in this perturbation theory, the term $\langle 1s^2|V|1s^2\rangle$ is simply the first-order correction to the ground-state energy. It is therefore clear why the minimal basis set gives such a poor result: It ignores all the higher-order corrections to the total energy. The failure of the simple minimal basis set calculation does not, however, mean that the effective Hamiltonian approach also fails, despite the fact that the effective Hamiltonian is also in an extremely small basis set. Rather, one must realize that as well as the first-order contributions, $U$ also contains contributions from higher orders in perturbation theory. It is therefore possible, although extremely computationally demanding, to calculate the parameters for effective Hamiltonians from this type of perturbation theory.76

A more promising approach, which has been applied to a number of molecular crystals,77,78 is to use atomistic calculations to parameterize an effective Hamiltonian. For example, density functional theory gives quite reasonable values for the total energy of the ground state of many molecules. Therefore, one approach to calculating the Hubbard $U$ is to calculate the ionization energy, $I = E_0(N-1) - E_0(N)$, and the electron affinity, $A = E_0(N) - E_0(N+1)$, of the molecule, where $E_0(n)$ is the ground-state energy of the molecule when it contains $n$ electrons and $N$ is the filling corresponding to a half-filled band. One finds that $U = I - A = E_0(N+1) + E_0(N-1) - 2E_0(N)$. A simple way to see this is that if we assume the molecule is neutral when it contains $N$ electrons, then $U$ corresponds to the energy difference in the charge disproportionation reaction $2\mathrm{M} \rightarrow \mathrm{M}^+ + \mathrm{M}^-$ for two well-separated molecules, M.
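The helium numbers quoted above, and the relation $U = I - A$, can be reproduced with a few lines of arithmetic. The sketch below is illustrative only. It uses the standard hydrogenic results $E_0 = -2Z^2\,\mathrm{Ry}$ and $\langle 1s^2|V|1s^2\rangle = \tfrac{5}{4}Z\,\mathrm{Ry}$, which are textbook values and are not derived in this chapter. It then asks what value of $U$ a one-site Hubbard model would need in order to reproduce the experimental energy, which is one simple way to picture a renormalized parameter. The molecular energies passed to `hubbard_u` are hypothetical placeholders, not data.

```python
RYDBERG_EV = 13.6057        # hydrogen ground-state binding energy in eV
E_HE_EXPERIMENT = -78.975   # experimental He ground-state energy quoted in the text, eV

def helium_first_order(z=2):
    """Noninteracting energy, first-order Coulomb term, and their sum (eV)."""
    e0 = -2 * z**2 * RYDBERG_EV     # two electrons in hydrogenic 1s orbitals
    v = 1.25 * z * RYDBERG_EV       # <1s^2|V|1s^2> = (5/4) Z Ry
    return e0, v, e0 + v

def effective_u(e_target, e0):
    """U that makes the one-site Hubbard energy e0 + U equal to e_target."""
    return e_target - e0

def hubbard_u(e_minus, e_neutral, e_plus):
    """U = I - A from ground-state energies with N-1, N, and N+1 electrons."""
    ionization = e_minus - e_neutral
    affinity = e_neutral - e_plus
    return ionization - affinity

e0, v, e_first_order = helium_first_order()
u_eff = effective_u(E_HE_EXPERIMENT, e0)
print(e0, v, e_first_order, u_eff)

# Hypothetical molecular energies (eV), for illustration only.
u_molecule = hubbard_u(e_minus=-9.0, e_neutral=-10.0, e_plus=-10.5)
print(u_molecule)
```

The value of `u_eff` comes out a few eV below the bare first-order integral, which is the sense in which higher-order corrections are absorbed into $U$.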
A more extensive discussion of this approach is given by Scriven et al.77

It is worth noting that we have actually carried out this program of parameterizing effective Hamiltonians three times in the discussion above. In Section 10.4.3 we showed that the Heisenberg model is an effective low-energy model for the half-filled Hubbard model in the limit $t/U \to 0$. In Section 10.5.3 we derived an effective tight-binding model that involved only the metal sites from an ionic Hubbard model of a transition metal oxide. Finally, in Section 10.6.1.1 we showed that vibronic interactions lead to an effective tight-binding model describing the low-energy physics of the Holstein model in the diabatic limit, and that in this model the quasiparticles (electron-like excitations) are polarons, a bound state of electrons and vibrational excitations with a mass enhanced over that of the bare electron.

However, to date, the most important method for parameterizing effective Hamiltonians has been to fit the parameters to a range of experimental data, whence the name semiempirical. Of course, experimental data contain all corrections to all orders; therefore, this is indeed an extremely sensible thing to do. But it is important to understand that empiricism is not a dirty word. Indeed, empiricism is what distinguishes science from other belief systems. Further, this empirical approach is exactly the approach that the mathematics tells one to take. It is also important to know that no quantum chemical or solid-state calculation is truly ab initio: the nuclear and electronic masses and the charge on the electron are all measured rather than calculated. Indeed, the modern view of the “standard model” of particle physics is that it, too, is an effective low-energy model.49 For example, in quantum electrodynamics (QED), the quantum field theory of light and matter, the bare charge on the electron is, for all practical purposes, infinite. But the charge is renormalized to the value seen experimentally in a manner analogous to the renormalization of the Hubbard $U$ of He discussed above. Therefore, as we do not at the time of writing know the correct mathematical description of processes at higher energies, all of theoretical science should, perhaps, be viewed as the study of semiempirical effective low-energy Hamiltonians.79

Finally, the most important point about effective Hamiltonians is that they promote understanding. Ultimately, the point of science is to understand the phenomena we observe in the world around us. Although the ability to perform accurate numerical calculations is important, we should not allow this to become our main goal. The models discussed above provide important insights into the chemical bond, magnetism, polarons, the Mott transition, electronic correlations, the failure of mean-field theories, and so on. All of these effects are much more difficult to understand simply on the basis of atomistic calculations. Further, many important effects seen in crystals, such as the Mott insulator phase, are not found in methods such as density functional theory or Hartree–Fock theory, while post-Hartree–Fock methods are not practical in infinite systems. Thus effective Hamiltonians have a vital role to play in developing the new concepts that are required if we are to understand the emergent phenomena found in molecules and solids.80

Acknowledgments

I would like to thank Balazs Györffy, who taught me that “you can’t not know” many of the things discussed above. I also thank James Annett, Greg Freebairn, Noel Hush, Anthony Jacko, Bernie Mostert, Seth Olsen, Jeff Reimers, Edan Scriven, Mike Smith, Eddy Yusuf, and particularly, Ross McKenzie, for many enlightening conversations about the topics discussed and for showing me that chemistry is a beautiful and rich subject with many simplifying principles. I would also like to thank Bernd Braunecker, Karl Chan, Sergio Di Matteo, Anthony Jacko, Ross McKenzie, Seth Olsen, Eddie Ross, and Kristian Weegink for their insightful comments on an early draft of the chapter. I am supported by a Queen Elizabeth II fellowship from the Australian Research Council (project DP0878523).

REFERENCES

1. Fulde, P. Electron Correlations in Molecules and Solids, Springer-Verlag, Berlin, 1995.
2. Schatz, G. C.; Ratner, M. A. Quantum Mechanics in Chemistry, Prentice Hall, Englewoods Cliffs, NJ, 1993.
3. Mahan, G. D. Many-Particle Physics, Kluwer Academic, New York, 2000.
4. Goldstein, H.; Poole, C.; Safko, J. Classical Mechanics, Addison-Wesley, Reading, MA, 2002.
5. Atkins, P.; de Paula, J. Atkins’ Physical Chemistry, Oxford University Press, Oxford, UK, 2006.
6. See, e.g., Rae, A. I. M. Quantum Mechanics, Institute of Physics Publishing, Bristol, UK, 1996.
7. See, e.g., Gasiorowicz, S. Quantum Physics, Wiley, Hoboken, NJ, 2003.
8. Jordan, P.; Wigner, E. Z. Phys. 1928, 47, 631–651.
9. Lowe, J. P.; Peterson, K. A. Quantum Chemistry, Elsevier, Amsterdam, 2006.
10. Ashcroft, N. W.; Mermin, N. D. Solid State Physics, Holt, Rinehart and Winston, New York, 1976.
11. Tinkham, M. Group Theory and Quantum Mechanics, McGraw-Hill, New York, 1964.
12. Lax, M. Symmetry Principles in Solid State and Molecular Physics, Wiley, New York, 1974.
13. McWeeny, R. Coulson’s Valence, Oxford University Press, Oxford, UK, 1979.
14. Brogli, F.; Heilbronner, E. Theor. Chim. Acta 1972, 26, 289–299.
15. See, e.g., Arfken, G. Mathematical Methods for Physicists, 3rd ed., Academic Press, Orlando, FL, 1985.
16. Mandl, F. Statistical Physics, Wiley, Chichester, UK, 1998.
17. See pp. 799–800 in Ref. 15.
18. (a) Castro Neto, A. H.; Guinea, F.; Peres, N. M. R.; Novoselov, K. S.; Geim, A. K. Rev. Mod. Phys. 2009, 81, 109–162. (b) Castro Neto, A. H.; Guinea, F.; Peres, N. M. R. Phys. World 2006, 19, 33–37.
19. (a) Novoselov, K. S.; Geim, A. K.; Morozov, S. V.; Jiang, D.; Zhang, Y.; Dubonos, S. V.; Gregorieva, I. V.; Firsov, A. A. Science 2004, 306, 666–669. (b) Choucair, M.; Thordarson, P.; Stride, J. A. Nature Nanotechnol. 2009, 4, 30–33.
20. Schrödinger, E. Ann. Phys. 1926, 79, 361–428.
21. Heitler, W.; London, F. Z. Phys. 1927, 44, 455–472.
22. Pauling, L. The Nature of the Chemical Bond and the Structure of Molecules and Crystals, Cornell University Press, Ithaca, NY, 1960.
23. Mott, N. F. Proc. R. Soc. A 1949, 62, 416–422.
24. Powell, B. J.; McKenzie, R. H. J. Phys. Condens. Matter 2006, 18, R827–R865.
25. Cohen, A. J.; Mori-Sanchez, P.; Yang, W. T. Science 2008, 321, 792–794.
26. (a) Anderson, P. W. Science 1987, 235, 1196–1198. (b) Zhang, F. C.; Gross, C.; Rice, T. M.; Shiba, H. Supercond. Sci. Technol. 1988, 1, 36–46.
27. Anderson, P. W. Phys. Today 2008, 61 (4), 8–9.
28. Powell, B. J.; McKenzie, R. H. Phys. Rev. Lett. 2005, 94, 047004; Gan, J. Y.; Chen, Y.; Su, Z. B.; Zhang, F. C. Phys. Rev. Lett. 2005, 94, 067005; Liu, J.; Schmalian, J.; Trivedi, N. Phys. Rev. Lett. 2005, 94, 127003.
29. Rössler, U. Solid State Theory, Springer-Verlag, Berlin, 2004.
30. Mohn, P.; Wohlfarth, E. P. J. Magn. Magn. Mater. 1987, 68, L283–L285.
31. Jacko, A. C.; Fjærestad, J. O.; Powell, B. J. Nature Phys. 2009, 5, 422–425.
32. Gutzwiller, M. C. Phys. Rev. Lett. 1963, 10, 159–162.
33. Brinkmann, W. F.; Rice, T. M. Phys. Rev. B 1970, 2, 4302–4304.
34. Lieb, E. H.; Wu, F. Y. Phys. Rev. Lett. 1968, 20, 1445–1448.
35. Essler, F. H. L.; Frahm, H.; Göhmann, F.; Klümper, A.; Korepin, V. E. The One-Dimensional Hubbard Model, Cambridge University Press, Cambridge, UK, 2005.
36. Tsvelik, A. M. Quantum Field Theory in Condensed Matter Physics, Cambridge University Press, Cambridge, UK, 1996.
37. Kotliar, G.; Vollhardt, D. Phys. Today 2004, 57 (3), 53–59.
38. Kollar, M.; Strack, R.; Vollhardt, D. Phys. Rev. B 1996, 53, 9225–9231.
39. Maier, T.; Jarrell, M.; Pruschke, T.; Hettler, M. H. Rev. Mod. Phys. 2005, 77, 1027–1080.
40. Kotliar, G.; Savrasov, S. Y.; Haule, K.; Oudovenko, V. S.; Parcollet, O.; Marianetti, C. A. Rev. Mod. Phys. 2006, 78, 865–951.
41. Nagaoka, Y. Phys. Rev. 1966, 145, 392–405.
42. Tian, G. J. Phys. A 1990, 23, 2231–2236.
43. Merino, J.; Powell, B. J.; McKenzie, R. H. Phys. Rev. B 2006, 73, 235107.
44. Shaik, S.; Hiberty, P. C. Valence bond theory: its history, fundamentals, and applications: a primer. In Reviews in Computational Chemistry, Lipkowitz, K. B., Larter, R., and Cundari, T. R., Eds., Wiley-VCH, Hoboken, NJ, 2004, pp. 1–100.
45. Sakurai, J. J. Modern Quantum Mechanics, Addison-Wesley, Reading, MA, 1994.
46. Chao, K. A.; Spałek, J.; Oleś, A. M. J. Phys. C 1977, 10, L271–L276.
47. Brockhouse, B. N. Slow neutron spectroscopy and the grand atlas of the physical world. In Nobel Lectures in Physics, 1991–1995, Ekspong, G., Ed.; World Scientific, Singapore, 1997. Also available at http://nobelprize.org/nobel_prizes/physics/laureates/1994/brockhouse-lecture.html.
48. Zaliznyak, I. A. Nature Mater. 2005, 4, 273–275.
49. Griffiths, D. Introduction to Elementary Particles, Wiley-VCH, Weinheim, Germany, 2008.
50. (a) Coldea, R.; Tennant, D. A.; Tylczynski, Z. Phys. Rev. B 2003, 68, 134424. (b) Lake, B.; Tennant, D. A.; Frost, C. D.; Nagler, S. E. Nature Mater. 2005, 4, 329–334.
51. Lee, P. A. Science 2008, 321, 1306–1307.
52. Shimizu, Y.; et al. Phys. Rev. Lett. 2003, 91, 107001.
53. Helton, J.; et al. Phys. Rev. Lett. 2007, 98, 107204.
54. Okamoton, Y.; et al. Phys. Rev. Lett. 2007, 99, 137207.
55. Raczkowski, M.; Frésard, R.; Oleś, A. M. J. Phys. Condens. Matter 2006, 18, 7449–7469.
56. Sarma, D. D. J. Solid State Chem. 1990, 88, 45–52.
57. (a) Merino, J.; Powell, B. J.; McKenzie, R. H. Phys. Rev. B 2009, 79, 161103(R). (b) Merino, J.; McKenzie, R. H.; Powell, B. J. Phys. Rev. B 2009, 80, 045116. (c) Powell, B. J.; Merino, J.; McKenzie, R. H. Phys. Rev. B 2009, 80, 085113.
58. See, e.g., Ziman, J. M. Electrons and Phonons, Oxford University Press, Oxford, UK, 1960.
59. For a review, see Grüner, G. Density Waves in Solids, Perseus Publishing, Cambridge, UK, 1994.
60. See, e.g., Alexandrov, A. S.; Mott, N. F. Polarons and Bipolarons, World Scientific, Singapore, 1995.
61. For a review, see Marcus, R. A. Rev. Mod. Phys. 1993, 65, 599–610.
62. See, e.g., Bersuker, I. B. The Jahn–Teller Effect and Vibronic Interactions in Modern Chemistry, Plenum Press, New York, 1984.
63. (a) Olsen, S.; Riesz, J.; Mahadevan, I.; Coutts, A.; Bothma, J. P.; Powell, B. J.; McKenzie, R. H.; Smith, S. C.; Meredith, P. J. Am. Chem. Soc. 2007, 129, 6672–6673. (b) Meredith, P.; Powell, B. J.; Riesz, J.; Nighswander-Rempel, S.; Pederson, M. R.; Moore, E. Soft Matter 2006, 2, 37–44.
64. Reimers, J. R.; Hush, N. S. J. Am. Chem. Soc. 2004, 126, 4132–4144.
65. Hahn, S.; Stock, G. J. Phys. Chem. B 2000, 104, 1146–1149.
66. Walls, D. F.; Milburn, G. J. Quantum Optics, Springer-Verlag, Berlin, 2006.
67. Weiss, U. Quantum Dissipative Systems, World Scientific, Singapore, 2008.
68. For an introductory discussion of broken symmetry, see, e.g., Blundell, S. J. Magnetism in Condensed Matter, Oxford University Press, Oxford, UK, 2001. For a more advanced discussion, see, e.g., Anderson, P. W. Basic Notions of Condensed Matter Physics, Benjamin-Cummings, Menlo Park, CA, 1984.
69. Dirac, P. Proc. R. Soc. A 1929, 123, 714–733.
70. (a) Pople, J. A. Rev. Mod. Phys. 1999, 71, 1267–1274. (b) Truhlar, D. G. J. Am. Chem. Soc. 2008, 130, 16824–16827.
71. Feynman, R. P. Int. J. Theor. Phys. 1982, 21, 467–488.
72. Lanyon, B. P.; Whitfield, J. D.; Gillet, G. G.; Goggin, M. E.; Almeida, M. P.; Kassal, I.; Biamonte, J. D.; Mohseni, M.; Powell, B. J.; Barbieri, M.; Aspuru-Guzik, A.; White, A. G. Nature Chem. 2010, 2, 106–111.
73. Schuch, N.; Verstraete, F. Nature Phys. 2009, 5, 732–735.
74. Goldenfeld, N. D. Lectures on Phase Transitions and the Renormalisation Group, Addison-Wesley, Reading, MA, 1992.
75. See, e.g., Wen, X.-G. Quantum Field Theory of Many-Body Systems, Oxford University Press, Oxford, UK, 2004.
76. (a) Freed, K. F. Acc. Chem. Res. 1983, 16, 137–144. (b) Gunnarsson, O. Phys. Rev. B 1990, 41, 514–518. (c) Iwata, S.; Freed, K. F. J. Chem. Phys. 1976, 65, 1071–1088. (d) Graham, R. L.; Freed, K. F. J. Chem. Phys. 1992, 96, 1304–1316. (e) Martin, C. M.; Freed, K. F. J. Chem. Phys. 1994, 100, 7454–7470. (f) Stevens, J. E.; Freed, K. F.; Arendt, F.; Graham, R. L. J. Chem. Phys. 1994, 101, 4832–4841. (g) Finley, J. P.; Freed, K. F. J. Chem. Phys. 1995, 102, 1306–1333. (h) Stevens, J. E.; Chaudhuri, R. K.; Freed, K. F. J. Chem. Phys. 1996, 105, 8754–8768. (i) Chaudhuri, R. K.; Freed, K. F. J. Chem. Phys. 2003, 119, 5995–6002. (j) Chaudhuri, R. K.; Freed, K. F. J. Chem. Phys. 2005, 122, 204111.
77. (a) Scriven, E.; Powell, B. J. J. Chem. Phys. 2009, 130, 104508. (b) Phys. Rev. B 2009, 80, 205107.
78. (a) Martin, R. L.; Ritchie, J. P. Phys. Rev. B 1993, 48, 4845–4849. (b) Antropov, V. P.; Gunnarsson, O.; Jepsen, O. Phys. Rev. B 1992, 46, 13647–13650. (c) Pederson, M. R.; Quong, A. A. Phys. Rev. B 1992, 46, 13584–13591. (d) Brocks, G.; van den Brink, J.; Morpurgo, A. F. Phys. Rev. Lett. 2004, 93, 146405. (e) Cano-Cortés, L.; Dolfen, A.; Merino, J.; Behler, J.; Delley, B.; Reuter, K.; Koch, E. Eur. Phys. J. B 2007, 56, 173–176.
79. For an accessible and highly outspoken discussion of these ideas, see Laughlin, R. B.; Pines, D. Proc. Natl. Acad. Sci. USA 2000, 97, 28–31; Laughlin, R. B. A Different Universe, Basic Books, New York, 2005.
80. Anderson, P. W. Science 1972, 177, 393–396.
81. Powell, B. J. Chem. Aust. 2009, 76, 18–21.

PART D Advanced Applications

11

SIESTA: Properties and Applications

MICHAEL J. FORD
School of Physics and Advanced Materials, University of Technology, Sydney, NSW, Australia

SIESTA provides access to the usual set of properties common to most DFT implementations:

• Total energy, charge densities, and potentials
• Atomic forces and unit cell stresses
• Geometry specification in Cartesian and/or internal z-matrix coordinates
• Geometry optimization using the conjugate gradient, modified Broyden and Fire algorithms, and simulated annealing
• Total and partial densities of states
• Band dispersions
• Constant energy, temperature, or pressure molecular dynamics
• Simulation of scanning tunneling microscope images according to the Tersoff–Hamann approximation
• Electron transport properties using the nonequilibrium Green’s function approach
• Optical properties and the frequency-dependent dielectric function within the random phase approximation and using first-order time-dependent perturbation theory
• Phonon spectrum and vibrational frequencies
• Mulliken population analysis
• Born charges

In this chapter a number of these properties are discussed through examples relevant to nanoscience and technology. The SIESTA methodology is described in detail in Chapter 2; the present chapter is intended as an accompaniment. The first three examples illustrate the general capabilities of the SIESTA code for problems containing relatively small numbers of atoms and that are amenable to standard diagonalization to solve the self-consistent problem. The last example illustrates the divide-and-conquer linear-scaling capabilities to tackle problems containing large numbers of atoms.

11.1 ETHYNYLBENZENE ADSORPTION ON AU(111)

There has been considerable interest for some time in self-assembled monolayers (SAMs) in nanotechnology. They are relatively easy to prepare on a variety of surfaces, gold being the most common, with a wide range of molecules forming ordered molecular layers.1–3 They are a useful platform for controlling surface properties and providing functionality with applications in, for example, molecular electronics.4,5 The alkynyl group is a promising candidate as a method of anchoring SAMs to gold surfaces. It should provide an unbroken conjugated pathway to the gold surface, unlike thiol linkers, and a wide range of terminal alkynes can be synthesized.6 Ethynylbenzene is a simple representative example of this class of molecule; there is some experimental evidence that it binds to gold surfaces and nanoparticles, although these studies are inconclusive about the nature of the bond.7,8 The calculations described below attempt to answer the question of whether this molecule is likely to form SAMs and to identify the likely adsorption geometries and energetics.9,10

The computational conditions first have to be established, and an appropriate representation of the semi-infinite surface in terms of a multilayer slab needs to be determined. The slab needs to contain sufficient layers that the center of the slab is relatively bulklike, or in this particular case so that a molecule adsorbed on one side of the slab is not influenced by the other surface. Conversely, the slab should not be so big that the calculations become prohibitively large. Figure 11.1 shows the convergence of the surface charge density above the slab and of the workfunction for an Au(111) slab as a function of the number of layers. Convergence of the workfunction with two computational parameters, the reciprocal space grid (k-grid) and the orbital confinement (energy shift), is also shown in Fig. 11.1B.
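As a minimal sketch of how such a convergence test might be automated, the helper below picks the smallest parameter value (for example, the number of slab layers) beyond which a property stays within a chosen tolerance of its value at the largest parameter tested. The function name, the tolerance, and the sample numbers are all hypothetical; the numbers are invented for illustration and are not read from Fig. 11.1.

```python
def first_converged(results, tol):
    """Return the smallest parameter from which all values stay within tol
    of the value obtained at the largest parameter tested.

    results: dict mapping a parameter (e.g., number of layers) to a property.
    tol:     allowed absolute deviation from the reference value.
    """
    params = sorted(results)
    reference = results[params[-1]]  # value at the largest parameter
    for i, p in enumerate(params):
        # Require this point and every later point to be within tolerance,
        # so that a later oscillation is not mistaken for convergence.
        if all(abs(results[q] - reference) <= tol for q in params[i:]):
            return p
    return params[-1]


# Hypothetical workfunction values (eV) versus number of slab layers.
workfunction = {2: 5.30, 3: 5.20, 4: 5.15, 5: 5.12, 7: 5.14, 10: 5.13, 13: 5.13}
print(first_converged(workfunction, tol=0.03))
```

Checking all later points, rather than only the next one, is a deliberate choice: a quantity that oscillates with slab thickness can otherwise appear converged too early.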
The workfunction is calculated as the difference between the electrostatic potential in vacuum (i.e., at a position in the unit cell far above the surface) and the Fermi level. The charge density and density difference are extracted from the density matrix (saved to file at each SCF step) using the DENCHAR utility at the points of a user-specified plane or volume. Charge densities can then be visualized using standard plotting packages. Alternatively, the charge densities and potentials evaluated over the real-space grid used to represent the density matrix can be written to file directly from SIESTA by setting the appropriate input flags. These are written unformatted and need to be processed for plotting. The GRID2CUBE utility will generate formatted output from these files in the format of a GAUSSIAN cube file.

[Figure 11.1: panel (A) plots dq (e− Bohr−3), MAX and RMS, against the number of layers; panel (B) plots the Au(111) work function (eV) against the number of layers, energy shift, and k-points.]

Fig. 11.1 (A) Charge density 1 Å above the Au(111) slab surface. Values are the maximum and RMS differences with respect to a 13-layer slab. (B) Convergence of workfunction with number of slab layers, energy shift parameter (mRy), and k-point grid. [From Ref. 13 and R. C. Hoft, N. Armstrong, M. J. Ford, and M. B. Cortie, J. Phys. Condens. Matter, 19 215206 (2007), with permission. Copyright © IOP Publishing.]

The calculations in Fig. 11.1 are for a 1 × 1 unit cell in the plane of the surface, that is, one atom per layer. The equivalent of a double-zeta plus polarization

(DZP) basis set is used. A generalized-gradient approximation to the exchange-correlation functional according to Perdew–Burke–Ernzerhof (GGA-PBE)11 and a real-space integration grid with a 300-Ry cutoff are employed (1 Ry = 0.5 atomic unit of energy ≈ 13.6 eV). It is often advisable to use a fine real-space grid to avoid numerical errors; the time penalty for such a grid is not generally a limiting factor. A cutoff of 300 Ry is well converged. A Troullier–Martins pseudopotential12 with scalar relativistic corrections is used to represent the core Au electrons, with a valence of 5d^10 6s^1. Cutoff radii for each of the angular momentum channels of the pseudopotential are 2.32 Å for s and p, 1.48 Å for d and f. The quality of these pseudopotentials has been checked in the usual way by comparing against all-electron calculations for the atom; they reproduce well the bulk properties of gold (lattice parameter, cohesive energy, and bulk modulus).13 It is interesting to note that values for the total and cohesive energies of bulk gold do not vary much between a single-zeta plus polarization (SZP) and a DZP basis set, while DZ is considerably worse. Where computational cost is a limiting factor, an SZP basis may be acceptable, although for adsorption energies DZP is probably necessary.

The Au(111) surface is unusual in that it reconstructs to form a √3 × 22 structure with a period of about 63 Å,14 although there is evidence that this reconstruction is lifted in the presence of adsorbed molecules.14,15 More recently, experimental measurements and calculations suggest that thiolate adsorption drives an alternative gold adatom structure and that these adatoms are an integral part of the adsorption motif.16–18 A detailed analysis of these points is beyond the scope of the present chapter, where we are more interested in demonstrating the utility of the SIESTA methodology. Accordingly, a bulk-terminated Au(111) surface is assumed.
Temperature smearing of the electron occupation is employed in these calculations to assist convergence of the SCF steps. Both the standard Fermi–Dirac function and the function proposed by Methfessel and Paxton19 are implemented in SIESTA. In this case it is the free energy F(T) that is minimized during self-consistency. The total energy in the athermal limit is then approximated by the expression

$$E_{\mathrm{tot}}(T = 0) = \tfrac{1}{2}\left[E_{\mathrm{tot}}(T) + F(T)\right] \qquad (11.1)$$
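The extrapolation in Eq. (11.1) is a simple average and can be sketched in a few lines. The function name and the energies below are hypothetical placeholders chosen only to show the arithmetic; they are not values from the calculations described here.

```python
def athermal_energy(e_tot_t, free_energy_t):
    """Approximate E_tot(T = 0) as the mean of E_tot(T) and F(T), Eq. (11.1)."""
    return 0.5 * (e_tot_t + free_energy_t)


# Hypothetical smeared-occupation results in eV. F(T) = E_tot(T) - T*S lies
# below E_tot(T) because the electronic entropy term is positive.
e_tot_smeared = -1234.560
free_energy = -1234.580

e_zero = athermal_energy(e_tot_smeared, free_energy)
print(f"E_tot(T = 0) is approximately {e_zero:.3f} eV")
```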

The degree of smearing is determined by specifying a fictitious temperature for the electron distribution; in this case, a temperature corresponding to 25 meV is used. Charge density close to the slab surface has converged by four layers and thereafter oscillates slightly. The charge density should be a reasonable indicator of how the adsorption properties will converge. The workfunction is less sensitive to the number of slab layers and the k-grid. Again, four layers and a 15 × 15 k-grid are reasonably converged. Only one k-point is required perpendicular to the surface because there is no periodicity in this direction. The workfunction is very sensitive to the energy shift, with values as small as 0.1 mRy required for good

convergence. This level is impractical for realistic surface adsorption calculations, as it is extremely time intensive. It is worth noting that the converged value of the workfunction calculated here is 5.13 eV, compared with an experimental value of 5.31 eV.20 The conclusion from the data in Fig. 11.1 is that a four-layer slab is the minimum for obtaining reasonably converged results. Calculations of ethynylbenzene adsorption support this conclusion: the binding energy is converged to within about 0.05 eV for four layers and is essentially fully converged at seven layers.

Two additional factors need to be considered when assessing adsorption calculations: basis set superposition errors (BSSE) and dipole corrections. BSSE is inherent in the use of atom-centered basis sets. The binding energy, E_B, is determined from calculations of the total energies of slab + adsorbate, E_T, slab alone, E_S, and adsorbate alone, E_A, according to

$$E_B = E_T - (E_S + E_A) \qquad (11.2)$$

The numbers of basis functions used to describe the two fragments, slab and adsorbate, are smaller than for the total system, leading to fewer variational degrees of freedom and hence overestimates of the fragment total energies. Although this error is small for the total energies, it can amount to about 10% of the binding energy calculated from the difference of total energies according to Eq. (11.2). Here the established method of counterpoise correction is used to remove this effect.21 The same set of basis functions is used in the two fragment calculations, with zero charge assigned to those basis functions associated with the missing atoms, a procedure commonly referred to as ghosting. This is implemented in SIESTA by assigning the corresponding negative atomic number to ghosted atoms. The efficacy of counterpoise corrections has been debated in the literature, and they have been demonstrated to “correct” the binding energy in the wrong direction in certain circumstances22; the method is, however, a well-established and widely used technique.

Dipole corrections are needed because of an artifact of periodic boundary conditions that arises in situations where an asymmetric geometry is used.23 Periodicity perpendicular to the slab surface imposes the condition that the potential must be identical at the cell boundary above and below the slab. However, if the slab is asymmetric, as is the case where adsorption occurs on only one slab surface, physically the potential is not identical and approaches different asymptotic values above and below. This leads to the presence of an additional unphysical potential that can distort optimized geometries and binding energies. One solution to this problem is to introduce a fictitious dipole charge layer in the vacuum portion of the unit cell, parallel to the slab surface, that can be included in the self-consistent field. This is not implemented in SIESTA. The problem can obviously be avoided by always using symmetric geometries, at the expense of requiring more atoms.
In the present application this dipole layer is neglected, having little effect on optimized geometries and contributing less than 1% to binding energies. For more polar bonds between surface and adsorbate, one might expect the situation to be considerably worse.
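The effect of the counterpoise correction on Eq. (11.2) can be illustrated with a short sketch. All energies below are hypothetical and chosen only so that the BSSE comes out near the 10% level mentioned in the text; the function and variable names are likewise illustrative. The "ghost" fragment energies stand for fragment calculations repeated in the full slab + adsorbate basis.

```python
def binding_energy(e_total, e_slab, e_adsorbate):
    """Eq. (11.2): E_B = E_T - (E_S + E_A); negative values indicate binding."""
    return e_total - (e_slab + e_adsorbate)


# Hypothetical total energies in eV.
e_total = -5000.00                                        # slab + adsorbate
plain = {"slab": -4200.00, "adsorbate": -797.00}          # fragment-only basis
with_ghosts = {"slab": -4200.10, "adsorbate": -797.15}    # full (ghosted) basis

e_b_raw = binding_energy(e_total, plain["slab"], plain["adsorbate"])
e_b_cp = binding_energy(e_total, with_ghosts["slab"], with_ghosts["adsorbate"])
bsse = e_b_cp - e_b_raw  # positive: the uncorrected binding is too strong

print(f"uncorrected E_B = {e_b_raw:.2f} eV")
print(f"counterpoise-corrected E_B = {e_b_cp:.2f} eV")
print(f"BSSE = {bsse:.2f} eV ({100 * bsse / abs(e_b_raw):.0f}% of |E_B|)")
```

Because the ghosted fragments have more variational freedom, their energies are lower, and the corrected binding energy is less negative than the uncorrected one.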

Figure 11.2 shows the convergence of binding energy for ethynylbenzene on Au(111) against the number of k-points and energy shift. An energy shift of 5 mRy and 15 k-points gives well-converged values, with binding energies reliable to better than 0.05 eV. The number of k-points corresponds to a 5 × 5 grid giving 15 symmetry-unique points. SIESTA uses inversion symmetry in the reciprocal cell to generate the k-grid. Fewer k-points (by a factor of 3) are needed here compared with the previous analysis because the unit cell is now a 3 × 3 supercell in order to accommodate the adsorbate and reduce interactions between periodic images. The use of strictly localized orbitals is an advantage in this regard because multipole interactions between periodic images of the molecule tend to zero quite rapidly with increasing unit cell size. The interaction here is essentially zero.

[Figure 11.2: panel (A) plots relative binding energy (eV) against the number of k-points; panel (B) plots relative binding energy (eV) against the energy shift (mRy).]

Fig. 11.2 Convergence of binding energy with (A) the number of k-points and (B) the energy shift. Binding energies are relative to the value at the largest k-point grid and smallest energy shift.

The likely adsorption motifs for ethynylbenzene on the gold surface are shown in Fig. 11.3. For the ethynylbenzene radical (Fig. 11.3A) the terminal C–H bond has been cleaved and the H atom removed. One might expect this to be the

most promising candidate for SAM formation. Two additional configurations are also possible, one where a 1,2 hydrogen shift has occurred to give vinylidene (Fig. 11.3B) and a second where the C–C triple bond opens up to give the flat configuration (Fig. 11.3C). The latter two configurations are potential intermediates to the final state of the strongly bound radical by removal of the hydrogen atom. Reactions of metals with ethynylbenzene are known to proceed via a 1,2 hydrogen shift to form metal vinylidenes.24

Fig. 11.3 Potential configurations of surface-bound ethynylbenzene molecule: (A) ethynylbenzene radical with terminal H atom removed; (B) vinylidene; (C) flat configuration. (From Ref. 10.)

The likely adsorption sites are first identified by scanning the adsorbate across the surface with the adsorbate geometry held rigid. This involves a large number of single-point energy calculations and is therefore carried out at a low computational level. Once the potential energy surface has been mapped out roughly in this way, full geometry optimizations are carried out at a higher level using a four-layer slab, a 3 × 3 × 1 k-grid, and a 5-mRy energy shift. Both the adsorbate and the first layer of Au surface atoms are optimized to 0.04 eV/Å. Although this is a relatively weak force tolerance, binding energies do not change appreciably when the tolerance is tightened to 0.01 eV/Å. Final binding energies are calculated using the optimum geometries from the previous step, at a higher level (seven slab layers, 5 × 5 × 1 k-grid), and are converged to better than 0.05 eV. Further relaxation at the final step is not necessary, as it does not affect the binding energies or geometries appreciably. Table 11.1 gives the final binding energies and adsorption sites for the three motifs in Fig. 11.3. All three motifs form strong covalent bonds to the surface, in contrast to thiol molecules, where the interaction is weaker if the terminal hydrogen is not removed.
Mulliken overlap populations give an indication of the character of the bond, and for both the ethynylbenzene radical and vinylidene there is considerable overlap (greater than 0.12) between three of the surface Au atoms and the nearest C atom. Adsorption heights, optimum adsorption sites, and binding energies are also nearly the same for these two motifs, suggesting they both interact with the surface in a similar manner. The flat geometry is bound through two C atoms, each forming a single bond with a surface Au atom. Again, Mulliken overlap populations suggest a covalent bond. Overall energies in going from the gas-phase molecule in its relaxed geometry to the surface-bound species are exothermic for vinylidene and energy neutral for the flat geometry. The latter value is below the reliability of the calculations.
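To make the notion of a Mulliken overlap population concrete, the sketch below evaluates it for the simplest possible case: one doubly occupied bonding orbital built from two identical, normalized basis functions with overlap S. This toy model and its function name are assumptions introduced for illustration only; it is not the SIESTA implementation, and conventions for the factor of two in the overlap population differ between codes, so the numbers are not directly comparable with the 0.12 quoted above.

```python
import math


def mulliken_two_center(overlap, n_electrons=2.0):
    """Mulliken analysis of a single bonding orbital on two equivalent centres.

    Returns (net population per centre, overlap population between centres).
    """
    # Normalized bonding orbital: psi = c * (phi_1 + phi_2).
    c = 1.0 / math.sqrt(2.0 * (1.0 + overlap))
    # Every density-matrix element P_ij equals n_electrons * c * c.
    p = n_electrons * c * c
    net_per_centre = p                        # P_ii * S_ii with S_ii = 1
    overlap_population = 2.0 * p * overlap    # P_12 * S_12 + P_21 * S_21
    return net_per_centre, overlap_population


net, shared = mulliken_two_center(overlap=0.5)
print(f"net population per centre: {net:.4f}")
print(f"overlap population:        {shared:.4f}")
print(f"total electrons:           {2 * net + shared:.4f}")
```

The net and overlap populations always sum to the number of electrons, and the overlap population vanishes when the basis functions do not overlap, which is why a sizeable value is read as a sign of covalent character.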

TABLE 11.1 Binding Energies and Adsorption Sites

                              Energy (eV)
Motif             Site    Binding    Overall(a)
Vinylidene        fcc     −2.45      −0.24
Flat geometry     atop    −1.84       0.03
Ethynylbenzene    fcc     −2.99       2.54

(a) Overall energies are energies of the surface-bound species relative to the relaxed, isolated molecule and slab.

This is despite a relatively large geometry change upon adsorption. These two configurations are therefore likely intermediates to the formation of a SAM. Indeed, previous surface-enhanced Raman (SERS) experiments suggest the possibility that ethynylbenzene can adsorb onto a gold surface in the flat geometry.7 For ethynylbenzene, C–H bond cleavage is calculated for the gas-phase molecule and leads to a very endothermic overall energy upon adsorption.

Reaction energies for formation of a SAM can be estimated from the calculations described above:

$$\mathrm{C_6H_5C_2H} + \mathrm{Au}_n \rightarrow \mathrm{C_6H_5C_2{-}Au}_n + \tfrac{1}{2}\,\mathrm{H_2}$$

$$\mathrm{C_6H_5C_2H} + \mathrm{Au}_n \rightarrow [\mathrm{C_6H_5C_2{-}Au}_n]^{-} + \mathrm{H^{+}}$$

As well as C–H bond cleavage (first reaction), deprotonation (second reaction) also needs to be considered. Either of the two reactions can proceed directly or through the vinylidene or flat intermediates. Thus, calculating reaction energies for all three pathways gives a check on the reliability of the estimates, since they should all give the same value. The first reaction is slightly endothermic, with an energy of about 0.5 eV; the range of values for the three pathways is 0.4 eV. Using a value of 11.4 eV for the proton solvation energy25 gives a more endothermic reaction in the second case, with a value of 1.7 eV, but with more consistent values for the three pathways, varying only by 0.1 eV.

These calculations demonstrate that the ethynylbenzene moiety is indeed a promising alternative to thiols for formation of SAMs on Au(111). It is strongly bound to the surface, yet has a small diffusion barrier, less than 0.2 eV,9 between hollow sites, which will allow ordering of the molecules. This linkage scheme may be more oxidatively stable than sulfur, and preparation of monolayers with double-ended molecules should be possible without the problem of forming multilayers.
The vinylidene intermediate is a candidate pathway, although from these calculations it is difficult to determine whether subsequent C—H bond cleavage or deprotonation will lead to the surface-bound radical. The latter is known to be the case in the synthesis of metal complexes of ethynylbenzene.24

11.2 DIMERIZATION OF THIOLS ON AU(111)

This example serves to illustrate the advantage of internal coordinates in surface adsorption studies. Geometries can be specified in the z-matrix format in SIESTA,26 where one atom is specified in Cartesian coordinates and the remaining molecule is specified in terms of bond lengths, bond angles, and torsion angles relative to this atom. The objective in this example is to map out the potential energy surface (PES) for adsorption of methanethiolate and benzenethiolate on the Au(111) surface in detail and to estimate the dissociation barrier of the dimer, dimethyldisulfide, on this surface.27 Previous computational studies have already reported the energetics28,29 of dimerization, but not the dynamics. They find that

dissociation of the surface-bound disulfide is favored, although agreement with available experimental data is limited.

Even for these relatively simple molecules there are sufficient degrees of freedom that mapping out the complete PES is not trivial. Generally, PES maps have been limited to a small subset of degrees of freedom and have been created by scanning rigid molecules across the surface.30,31 Using internal coordinates to describe the molecule, it is possible to perform constrained optimizations at each point on the PES and hence map this surface more completely. Figure 11.4A shows the two thiolate molecules calculated here; note that the terminal hydrogen has been removed, and as a consequence, the sulfur is strongly chemisorbed to the surface. It has been pointed out in the literature that the term thiolate is misleading, as it implies an ionic bond to the surface, whereas it is actually closer to a covalently bound “thiyl.”31 Here we use the nomenclature prevalent in the literature. Mixed coordinates are used, with a z-matrix to specify

the adsorbate and Cartesian coordinates for the Au slab. For each adsorbate the PES is mapped along the atop–bridge–atop path shown in Fig. 11.4B. At each step in the PES a constrained optimization is performed with the lateral position of the sulfur atom fixed relative to the Au surface while its height above the surface is allowed to vary. The rest of the molecule and the surface layer of Au atoms are fully relaxed. Mapping the PES in this much detail using Cartesian coordinates is not practicable.

Fig. 11.4 (A) Adsorption of benzenethiolate (left) and methanethiolate onto the Au(111) surface; (B) path for the PES scan relative to surface Au atoms. Second and third layers of gold atoms are depicted by successively smaller spheres. (From Ref. 27.)

It is also possible to decouple optimization of the bond lengths and bond angles with the z-matrix approach and to specify different force tolerances for each. This is particularly advantageous where the PES is very flat in one coordinate compared to the other. This is the case for many molecular adsorption problems, where the PES is quite flat with respect to tilting of the molecular axis relative to the surface. With Cartesian coordinates it can be difficult to find the minimum of such a surface. Provided that there is little or no coupling between the coordinates (coupling that does arise, for example, in cyclic molecules), internal coordinates can also lead to efficiency gains in the optimization process, as they lead to better preconditioning of the optimization algorithm.

Table 11.2 compares geometry optimizations using z-matrix and Cartesian coordinates within the SIESTA code for some simple molecules.26 The conjugate gradient algorithm is used in all cases, with the optimization being performed to three levels of force convergence and with different numbers of degrees of freedom. In the z-matrix optimization for N atoms, an unconstrained optimization can be achieved with 3N − 6 variables, whereas 3N − 3 are required for Cartesian coordinates. This is because, in addition to fixing the coordinates of one atom (the reference atom), in the z-matrix approach it is also possible to fix the three rotational degrees of freedom for the entire molecule.
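The variable counting above can be made concrete with a small sketch that builds Cartesian coordinates for a water molecule from its three internal coordinates (two bond lengths and one angle, so 3N − 6 = 3 for N = 3). The function names, the placement convention, and the approximate water geometry used as input are assumptions made for illustration; this is not the SIESTA z-matrix input format.

```python
import math


def water_from_internal(r_oh1, r_oh2, angle_deg):
    """Return [O, H1, H2] positions from two O-H lengths and the H-O-H angle.

    The reference atom (O) sits at the origin, H1 lies on the +x axis, and
    H2 lies in the xy-plane, which removes overall translation and rotation.
    """
    theta = math.radians(angle_deg)
    oxygen = (0.0, 0.0, 0.0)
    h1 = (r_oh1, 0.0, 0.0)
    h2 = (r_oh2 * math.cos(theta), r_oh2 * math.sin(theta), 0.0)
    return [oxygen, h1, h2]


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def n_internal_variables(n_atoms):
    """Unconstrained z-matrix optimization: 3N - 6 variables."""
    return 3 * n_atoms - 6


def n_cartesian_variables(n_atoms):
    """Cartesian optimization with one reference atom fixed: 3N - 3 variables."""
    return 3 * n_atoms - 3


# Illustrative input: roughly 0.96 Angstrom bonds and a 104.5 degree angle.
oxygen, h1, h2 = water_from_internal(0.96, 0.96, 104.5)
print("O-H2 distance:", round(distance(oxygen, h2), 3))
print("water:", n_internal_variables(3), "internal vs", n_cartesian_variables(3), "Cartesian")
print("22 atoms:", n_internal_variables(22), "internal vs", n_cartesian_variables(22), "Cartesian")
```

For a 22-atom molecule such as hexanedithiol the same counting gives 60 internal and 63 Cartesian variables.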
The z-matrix approach performs better for both the simple water molecule and the acyclic hexanedithiol molecule. In the latter case, fixing either three or six degrees of freedom reduces the number of CG steps for z-matrix optimization very considerably. Conversely, fixing degrees of freedom in Cartesian coordinates increases the number of steps. This is because the method used (there is no Hessian matrix) is not sensitive to the translational invariance. For the cyclic benzene molecule, Cartesian coordinates improve optimization because the internal coordinates are coupled to each other. The same final geometries are obtained irrespective of the coordinates used and the number of degrees of freedom in the optimization.

The computational conditions used here are essentially the same as those used for the geometry optimizations of ethynylbenzene described above. The force tolerances are set to 0.04 eV/Å for bond lengths and 0.0009 eV/deg for angles. Optimizations are performed using the conjugate gradient (CG) method. The forces are calculated by direct differentiation of the energy and are generated in the same section of code within SIESTA. The CG method is a variant of steepest descent but avoids its pitfall of successive steps being perpendicular to each other. Instead, the search directions are constructed to be conjugate to the previous gradient and, as far as possible, to all previous steps. In this method it is only necessary to store information from the last CG step rather than building up the full

Hessian matrix for the entire optimization. SIESTA writes the previous step to disk at every CG step, allowing for easy restarts of optimizations. In principle, for M nuclei, the CG method should converge in less than 3M steps. However, due to numerical errors and the fact that the potential energy surface does not necessarily have the assumed quadratic form, more steps are often required. Both Fletcher–Reeves and Polak–Ribière CG algorithms are implemented in SIESTA, although the latter is the default and preferred option, as it reportedly performs better where the minimum is not quadratic (details of the implementations are given elsewhere32).

TABLE 11.2 Number of Conjugate Gradient Steps Required to Optimize the Geometry of Three Molecules in Z-Matrix and Cartesian Coordinates

                                                                No. of CG Steps(a)
Molecule         No. of Atoms   Coordinates   No. of Variables     I      II     III
Water            3              Cartesian      6                  15      15      15
                                               9                  35      37      40
                                z-matrix       2                   6       8       8
                                               3                   3       6       9
                                               6                   3       6       9
                                               9                   4      19      21
Benzene          12             Cartesian     33                  25      33      36
                                              36                   7       9       9
                                z-matrix       2                   7      11      18
                                              11                  12      14      20
                                              30                  47      57      69
                                              33                  45      58      63
                                              36                  44      55      66
Hexanedithiol    22             Cartesian     63                  76     108     171
                                              66                  44      46      81
                                z-matrix      60                  20      33      44
                                              63                  24      39     115
                                              66                  32     397

Source: Ref. 26.
(a) Columns I, II, and III represent progressively stricter convergence criteria for lengths and angles: namely, I (0.04 eV/Å, 0.0009 eV/deg); II (0.02 eV/Å, 0.0004 eV/deg); and III (0.01 eV/Å, 0.0002 eV/deg). For the Cartesian coordinate optimizations the angle tolerance is to be ignored.

The modified Broyden33 method is also available in SIESTA. In principle, the modified Broyden method, a quasi-Newton–Raphson method, would be extremely efficient if the Jacobian were known and could easily be inverted. However, this is not the case in practice; rather, the Jacobian is updated over successive steps. It is also possible to find optimum geometries using molecular dynamics (MD), and SIESTA has implemented both simulated annealing, where the temperature of the MD simulation is gradually reduced to a target temperature, and quenching, where the velocity components of the nuclei are set to zero if they are opposite the corresponding force. Although relatively easy to implement, these MD-based schemes are often not competitive compared

with the sophisticated line search–based algorithms mentioned previously. More recently, FIRE (fast inertial relaxation engine),34 a new MD-based optimization method, has been reported that is competitive and can be used easily for systems containing millions of degrees of freedom.

The PESs for the two monomers are shown in Fig. 11.5. It is interesting to note that with the current z-matrix constrained optimization, the hexagonal close-packed (hcp) and face-centered cubic (fcc) hollow adsorption sites are local maxima for both PESs. By contrast, a Cartesian coordinate–based scan will yield local minima at these two sites; previous studies find this result.28 Bilić et al.31 also find the hollow sites to be saddle points for two-layer slab calculations, but minima for a four-layer calculation. There is also no barrier to diffusion at the bridge site, in contrast to some previous calculations where the PES is mapped by scanning a rigid molecule.28,30 The PES in this region is sensitive to the tilt angle of the molecule and also its orientation. The minimum on both sides of the bridge site is with the tail group tilted back over the bridge (i.e., as the bridge is traversed from one side to the other, the tail of the molecule swings around rather than remaining fixed in orientation). Adsorption energies of 1.85 and 1.43 eV are calculated for the optimum sites for methanethiolate and benzenethiolate, respectively. These values are in good agreement with previous calculations.28,31

Optimum geometries for adsorption of the dimers are shown in Fig. 11.6; here the entire dimer and surface layer are relaxed. The SIESTA implementation of z-matrix coordinates is particularly convenient for this example. Multiple z-matrix blocks can be defined, making it possible to have separate sets of internal

coordinates centered around each S atom. Adsorption occurs through the sulfur atoms, with each S atom in the dimer adsorbed near the atop site and displaced slightly toward the bridge site. The two S atoms are at similar heights above the surface. Previous studies using Cartesian coordinates find a different optimum geometry with S atoms nearer the bridge sites and at different heights above the surface.28,35,36 If the calculations here are repeated using Cartesian coordinates, this previously reported minimum appears to become a local minimum. This result further illustrates the robustness of internal coordinate descriptions for molecular adsorption.

[Figure 11.5: relative energy (eV) of methanethiolate and benzenethiolate plotted against the coordinate relative to the bridge site (Å), with the atop, fcc, hcp, and bridge sites marked.]

Fig. 11.5 PES for methanethiolate and benzenethiolate along the atop–bridge–atop path on the Au(111) surface.

Fig. 11.6 Relaxed geometries for the two thiol dimers diphenyldisulfide (left) and dimethyldisulfide (right). Two different perspectives for each are shown in (A) and (B). (From Ref. 27.)

Both dimers are energetically unfavorable on the surface relative to two isolated monomers, by 0.41 and 0.62 eV for dimethyldisulfide and diphenyldisulfide, respectively. This is despite the fact that geometry optimizations find a local minimum and do not dissociate the dimer. This would suggest that there is an activation barrier to dissociation. To explore this point the PES for dissociation of dimethyldisulfide was mapped and is shown in Fig. 11.7A. One S atom is fixed at its optimum site while the other is scanned over the surface with a constrained optimization of the molecule performed at each point. The PES in Fig. 11.7A

was mapped using spin-restricted calculations for computational efficiency. This will give a reasonable idea of the PES shape and help identify the dissociation path. A spin-unrestricted scan along this path is then performed, with the results shown in Fig. 11.7B.

Fig. 11.7 (A) Spin-restricted PES for dissociation of dimethyldisulfide. Contours are in 0.05-eV intervals relative to the energy minimum; positions of surface atop and bridge sites are shown; one S atom is fixed at x = 1.05 and y = 2.27 Å. (B) Spin-unrestricted PES along the dissociation path shown in (A). Units of spin are number of electrons. (From Ref. 27.)

As expected, DFT does not describe the region where the bond is dissociating very well; Fig. 11.7B shows that there is significant spin contamination around the saddle point. Away from this point, where the spin is zero, the DFT energies are presumably quite reliable and allow us to estimate the height of the dissociation barrier to lie between 0.3 and 0.35 eV. The barrier for formation of the dimer from two surface-bound isolated monomers is estimated to lie between 0.71 and 0.76 eV.
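The two barrier estimates appear to be consistent with the 0.41 eV by which the adsorbed dimethyldisulfide dimer lies above two isolated surface-bound monomers, if one assumes that both barriers refer to the same saddle point. The symbols below are introduced here for illustration and do not appear in the original:

```latex
E_a^{\mathrm{form}} = E_a^{\mathrm{diss}} + \Delta E,
\qquad
\Delta E = E_{\mathrm{dimer}} - E_{2\,\mathrm{monomers}} \approx 0.41\ \mathrm{eV}

0.30\ \mathrm{eV} + 0.41\ \mathrm{eV} = 0.71\ \mathrm{eV},
\qquad
0.35\ \mathrm{eV} + 0.41\ \mathrm{eV} = 0.76\ \mathrm{eV}
```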

11.3 MOLECULAR DYNAMICS OF NANOPARTICLES

So far, only ground-state properties at 0 K have been discussed. Molecular dynamics (MD) is the standard method of introducing the motion of the atomic nuclei into the problem and hence simulating various temperature-dependent properties, such as phonon spectra or melting behavior. The MD capabilities implemented in SIESTA will be illustrated in this section, where the melting behavior of the 20-atom gold cluster is examined.37,38 This particular size of cluster is interesting because its optimum geometry is an ordered tetrahedral pyramid and is isolated by about 1 eV from its nearest-lying isomer, at least as determined in 0 K DFT calculations.39–42 There is experimental evidence that this structure is indeed the optimum.43

The standard Verlet algorithm44,45 is implemented in SIESTA to propagate the MD trajectory in time. A detailed description of this algorithm and other established components of MD is given in many textbooks (see, for example, Ref. 46). Here the initial velocities are chosen from the Maxwell–Boltzmann distribution corresponding to a specified temperature. The total energy of the system is then kept constant throughout the trajectory: the microcanonical ensemble. Motion of the center of mass of the system is frozen out initially, although rotational motion currently is not. Nonperiodic systems such as clusters and molecules can pick up slight center-of-mass kinetic energy over a long trajectory due to numerical errors. Rotational motion is generally very small to start with but can become appreciable over a long trajectory. Specifying a fine integration grid can help prevent these problems.

In this example, thermal behavior in the canonical ensemble is calculated using the Nosé–Hoover47,48 thermostat to maintain constant temperature. Briefly, in this method the system is connected to a heat bath that can transfer energy into or out of the system to attempt to maintain constant temperature.
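The constant-energy propagation described above can be sketched for a toy system. The example below uses the velocity form of the Verlet algorithm, which is an assumption here, since the text says only that the standard Verlet algorithm is used. It integrates a one-dimensional harmonic oscillator in reduced units; the oscillator, the function names, and the time steps are illustrative and unrelated to the gold cluster.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate one particle in one dimension and return the final (x, v)."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
    return x, v


SPRING = 1.0  # force constant (reduced units)
MASS = 1.0    # particle mass (reduced units)


def harmonic_force(x):
    return -SPRING * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * SPRING * x * x


def max_energy_deviation(dt, total_time=50.0):
    """Largest deviation of the total energy from its initial value."""
    x, v = 1.0, 0.0
    e_start = total_energy(x, v)
    worst = 0.0
    for _ in range(int(round(total_time / dt))):
        x, v = velocity_verlet(x, v, harmonic_force, MASS, dt, 1)
        worst = max(worst, abs(total_energy(x, v) - e_start))
    return worst


for dt in (0.01, 0.1, 0.5):
    print(f"dt = {dt:4.2f}  max |dE| = {max_energy_deviation(dt):.2e}")
```

Monitoring the energy deviation for several time steps in this way mirrors, on a toy scale, the practice of choosing the largest time step that still conserves the total energy acceptably.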
The heat bath is realized by coupling a fictitious degree of freedom to the system. The degree of coupling is determined by the Nosé mass, which controls quite sensitively the dynamics of the simulation. Constant-pressure simulations are also implemented in SIESTA using the Parrinello–Rahman method,49–51 where again an effective mass must be set carefully in order to thermostat the trajectory

MOLECULAR DYNAMICS OF NANOPARTICLES

385

correctly. Constant-temperature and constant-pressure methods can be combined into a single simulation. The critical parameter to optimize is the time step; this must be small enough to capture the atomic motion but not too small that only short total times can be sampled. The MD time step is traditionally determined according to the following rule of thumb: dt =

1 1 10 cωmax

(11.3)

where c is the speed of light and ω_max is the highest vibrational frequency, expressed as a wavenumber. The vibrational frequencies are determined by calculating the force matrix in SIESTA and then finding the eigenvalues of this matrix using the VIBRA utility supplied with SIESTA. The energy-shift parameter needs to be set to a small value, typically smaller than 5 mRy, to avoid negative frequencies for the optimized structure. For the present 20-atom gold cluster the maximum frequency is 221 cm−1, corresponding to a time step of 15 fs.52 The time step can be analyzed more rigorously by monitoring the conservation of total energy of the extended system (i.e., the 20-atom cluster plus the Nosé thermostat). In the present example time steps up to about 3 fs conserve this total energy well during the MD trajectory, but significant variations occur above this value. The time step is set to 2.5 fs for all the simulations presented here.

A large value of the Nosé mass results in low coupling to the reservoir and leads to large temperature fluctuations and relatively constant total energy; thermostatting is ineffective in this case. A low value, on the other hand, restrains the temperature oscillations and can lead to poor equilibration and overdamping of the dynamics. One way to assess the appropriate Nosé mass value is to observe temperature fluctuations over a number of MD steps and decide on a suitable level of temperature fluctuation. Alternatively, the statistical convergence of the trajectory can be examined by observing the average values of the temperature (or, equivalently, the kinetic energy of the ions) and higher moments of these quantities. While the average is a good indicator that the ensemble is converging to the correct temperature, higher moments are a more sensitive indicator of the temperature fluctuations and statistical quality.53

The average kinetic energy of the ions, ⟨KE_ion⟩, and its second moment, ⟨(KE_ion − ⟨KE_ion⟩)²⟩, are shown in Fig. 11.8 for a thermostat temperature of 900 K over 45,000 MD steps (112.5 ps) and a Nosé mass of 50 Ry·fs². The energy shift is set to 20 mRy, the real-space grid cutoff is 100 Ry, and the LDA exchange-correlation functional is used. Both quantities converge reasonably well over the entire trajectory but require about 10,000 steps to equilibrate. The average kinetic energy and its second moment converge to values corresponding to temperatures of 900 and 821 K, respectively.53 The second moment gives a slightly different ensemble-average temperature because it is more sensitive to temperature fluctuations. Higher moments can be calculated to give an indication of statistical quality. These results indicate that the current number of MD steps is sufficient to provide a good statistical ensemble and that
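Two of the numerical checks described above can be reproduced with a short sketch: the rule-of-thumb time step of Eq. (11.3), and the conversion of kinetic-energy moments into effective temperatures. The second function assumes the textbook canonical-ensemble relations ⟨KE⟩ = (f/2)kT and ⟨(KE − ⟨KE⟩)²⟩ = (f/2)(kT)², which may differ in detail from the expressions used in ref. 53. The choice f = 3N − 3 = 57 degrees of freedom for the 20-atom cluster is likewise an assumption, and the function names are hypothetical.

```python
import math

C_CM_PER_S = 2.99792458e10   # speed of light in cm/s
K_B_EV = 8.617333262e-5      # Boltzmann constant in eV/K


def rule_of_thumb_timestep_fs(omega_max_cm1):
    """Eq. (11.3): dt = 1 / (10 c omega_max), with omega_max in cm^-1."""
    dt_seconds = 1.0 / (10.0 * C_CM_PER_S * omega_max_cm1)
    return dt_seconds * 1.0e15


def temperatures_from_ke(ke_samples_ev, n_dof):
    """Effective temperatures implied by the mean and the variance of the
    ionic kinetic energy, assuming canonical-ensemble statistics."""
    n = len(ke_samples_ev)
    mean = sum(ke_samples_ev) / n
    variance = sum((ke - mean) ** 2 for ke in ke_samples_ev) / n
    t_from_mean = 2.0 * mean / (n_dof * K_B_EV)
    t_from_variance = math.sqrt(2.0 * variance / n_dof) / K_B_EV
    return t_from_mean, t_from_variance


# Highest frequency quoted in the text for the 20-atom gold cluster.
print(f"{rule_of_thumb_timestep_fs(221.0):.1f} fs")   # about 15.1 fs

# Mean kinetic energy expected at 900 K for 57 degrees of freedom (assumed).
print(f"{0.5 * 57 * K_B_EV * 900.0:.3f} eV")
```

In practice `ke_samples_ev` would be the post-equilibration kinetic energies read from the MD output; agreement between the two returned temperatures is one indication that the thermostat is sampling the intended ensemble.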


300 cm−1), the contribution of these motions to the overall partition function is negligible at room temperature (i.e., Q_vib,i ≈ 1), and thus the error incurred in treating these modes as harmonic oscillators is not significant. However, for the low-frequency torsional modes, these errors can be significant and a more rigorous treatment is often necessary; this is especially the case for the reactions of relevance to free-radical polymerization.8,11,12b,c,l Ideally, one should solve the Schrödinger equation for the full multidimensional potential energy surface representing all active modes of a molecule, and use the resulting energy levels in Eq. (13.7) to obtain the partition functions; however, this is impractical for larger molecules. Instead, the approach that is usually adopted is to apply the harmonic oscillator approximation to all 3N − 6 internal modes of a molecule (as in the standard formulas above), but then multiply the resulting vibrational partition function by a correction factor for each internal hindered rotor partition function. This factor is calculated as the ratio of the 1D-HR partition function to the corresponding "pure" vibrational partition function, as calculated from the second derivative of the rotational potential at the minimum-energy structure.
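The correction factor described above can be sketched numerically for a model torsion. The example below is illustrative only and rests on several assumptions not taken from the source: the potential is a pure n-fold cosine, V(θ) = (V0/2)(1 − cos nθ); the reduced moment of inertia enters through a rotational constant B = ħ²/(2I_r) given in cm−1; a symmetry number `sigma` divides the hindered-rotor sum; and the one-dimensional torsional Schrödinger equation is solved by diagonalizing the Hamiltonian in a free-rotor basis, a generic textbook route that is not necessarily the numerical method used by the authors. Both partition functions are referenced to the bottom of the torsional well. All names are hypothetical.

```python
import numpy as np

K_B_CM1 = 0.6950348  # Boltzmann constant in cm^-1 per K


def hindered_rotor_levels(b_rot, v0, n_fold, m_max=200):
    """Eigenvalues (cm^-1) of H = -B d2/dtheta2 + (V0/2)(1 - cos(n theta)),
    built in the free-rotor basis exp(i m theta), m = -m_max..m_max."""
    m = np.arange(-m_max, m_max + 1)
    h = np.diag(b_rot * m.astype(float) ** 2 + 0.5 * v0)
    # <m|cos(n theta)|m'> = 1/2 when |m - m'| = n, so the coupling is -V0/4.
    off = np.full(len(m) - n_fold, -0.25 * v0)
    h += np.diag(off, k=n_fold) + np.diag(off, k=-n_fold)
    return np.linalg.eigvalsh(h)


def q_hindered_rotor(b_rot, v0, n_fold, temperature, sigma=1, m_max=200):
    """Direct Boltzmann sum over the hindered-rotor levels."""
    levels = hindered_rotor_levels(b_rot, v0, n_fold, m_max)
    return float(np.sum(np.exp(-levels / (K_B_CM1 * temperature)))) / sigma


def q_harmonic(b_rot, v0, n_fold, temperature):
    """Harmonic-oscillator partition function for the same mode, with the
    wavenumber n*sqrt(V0*B) that follows from V''(0) = V0 n^2 / 2."""
    x = n_fold * np.sqrt(v0 * b_rot) / (K_B_CM1 * temperature)
    return float(np.exp(-0.5 * x) / (1.0 - np.exp(-x)))


def hr_correction_factor(b_rot, v0, n_fold, temperature, sigma=1):
    """Ratio Q(1D-HR) / Q(harmonic) by which the harmonic result is scaled."""
    return (q_hindered_rotor(b_rot, v0, n_fold, temperature, sigma)
            / q_harmonic(b_rot, v0, n_fold, temperature))


# Illustrative (not fitted) parameters for a threefold rotor at 298.15 K.
print(hr_correction_factor(b_rot=5.0, v0=500.0, n_fold=3,
                           temperature=298.15, sigma=3))
```

For a deep well at low temperature the ratio approaches 1, which is the regime in which the harmonic treatment is adequate; for shallow torsional wells it can deviate substantially.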
Using approximations such as this, the 1D-HR model has been shown to provide reasonable results in situations where testing against more sophisticated treatments is possible.61 To obtain the 1D-HR partition function for any given low-frequency torsional mode, we first need to compile the full rotational potential V(θ) for the mode in question; studies have shown that a resolution of 60° is sufficient for accurate results.62 The potential should be compiled as a relaxed scan (i.e., at each point the dihedral angle is frozen but the rest of the molecule is fully optimized) and, as in ordinary geometry optimizations, low levels of theory, such as B3LYP/6-31G(d), are usually sufficiently accurate. Once obtained, the potential is used to solve the one-dimensional Schrödinger equation for a rigid rotor:

$$-\frac{\hbar^{2}}{2I_r}\,\frac{d^{2}\Psi}{d\theta^{2}} + V(\theta)\,\Psi = \varepsilon_i\,\Psi \tag{13.33}$$

In this equation Ψ is the wavefunction, ε is the energy, I_r is the reduced moment of inertia, and V(θ) is the rotational potential, which for this purpose should be supplied at a high resolution. To this end, the 60° resolution potential is fitted with a Fourier series of up to 18 terms and then reevaluated at a resolution of


1.2°. The reduced moment of inertia (I_r) is assumed to be independent of θ and is calculated from the optimized geometry using the equation for I^(2,3), as defined by East and Radom.63 There is no analytical solution to this Schrödinger equation; however, it can be solved numerically for the eigenvalues, ε, by converting it into the Hill differential equation. Once obtained, the energy levels are used in Eq. (13.7) to obtain the partition function in the usual manner. A program called T-CHEM for performing these calculations is freely available at http://rsc.anu.edu.au/~cylin/scripts.html. Finally, in addition to the approach described above, there are a number of lower-cost methods available for calculating hindered-rotor partition functions; some of these (such as the Pitzer tables64) are applicable only for potentials that can be described by a pure cosine function; others are approximations designed for use with any type of partition function. It is beyond the scope of this chapter to detail these here, but a description and evaluation of these methods is found in the literature.62,65,66

13.4.5 Solvent Effects

The methodology described thus far is designed to reproduce chemically accurate values of the rate and equilibrium constants for gas-phase systems, and the vast majority of computational studies of radical polymerization in the literature have indeed been performed in the gas phase. In many situations, the effects of solvents on radical reactions are relatively minor and the gas-phase calculations are indicative of solution-phase behavior. For example, gas-phase calculations of the propagation rate coefficients of vinyl chloride and acrylonitrile were able to reproduce the experimental (solution-phase) rate coefficients for these monomers to within a factor of 2, and solvation effects (as calculated using simple continuum models) were minor.8 Gas-phase studies of the equilibrium constants in certain RAFT polymerizations have also reproduced experimental data to within chemical accuracy, for both small model reactions16a and polymeric systems.11e,18

Nonetheless, there are free-radical polymerizations, such as those of monomers that are capable of undergoing hydrogen bonding or other specific interactions with the solvent, where strong solvent effects have been well documented experimentally.6c,67 Not unexpectedly, in such cases there can be very large differences between the calculated gas-phase rate coefficients and the corresponding solution-phase values. For example, in a recent computational study68 of the propagation rate coefficient of ethyl-α-hydroxymethacrylate (EHMA), the calculated gas-phase rate coefficient differed from the corresponding solution-phase experimental values69 by more than five orders of magnitude. In such cases, the correct treatment of solvent effects is therefore crucial. Unfortunately, the development of cost-effective methods for treating the solvent in chemical reactions is an ongoing area of research and there have been relatively few benchmarking studies for the specific case of radical polymerization.
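To put the quoted discrepancies on an energy scale, the sketch below converts a multiplicative error in a rate coefficient into the equivalent error in the free energy of activation. It assumes the exponential dependence k ∝ exp(−ΔG‡/RT) of transition state theory and a temperature of 298.15 K; the temperature and the function names are assumptions made for illustration, not values taken from the studies cited.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def barrier_error_for_rate_factor(factor, temperature=298.15):
    """Free-energy error (kJ/mol) that changes a rate coefficient by `factor`,
    assuming k is proportional to exp(-dG/RT)."""
    return R_KJ * temperature * math.log(factor)


def rate_factor_for_barrier_error(delta_g_kj, temperature=298.15):
    """Multiplicative change in k caused by a free-energy error (kJ/mol)."""
    return math.exp(delta_g_kj / (R_KJ * temperature))


print(f"{barrier_error_for_rate_factor(2.0):.1f} kJ/mol")    # about 1.7
print(f"{barrier_error_for_rate_factor(1.0e5):.1f} kJ/mol")  # about 28.5
```

On this scale, agreement to within a factor of 2 corresponds to an error of under 2 kJ mol−1 in the barrier, whereas a discrepancy of five orders of magnitude corresponds to nearly 30 kJ mol−1.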
Nonetheless, it is worth making a few general comments on the main strategies that are available for modeling solvation effects.


The simplest and most computationally efficient methods are continuum models, in which each solute molecule is embedded in a cavity surrounded by a dielectric continuum of permittivity ε.70 Most models, of which the ab initio conductor-like solvation model (COSMO)71 and the polarizable continuum model (PCM)72 are prominent examples, also include terms for the nonelectrostatic contributions of the solvent, such as dispersion, repulsion, and cavitation. Some of the more recent models also incorporate more sophisticated treatments of the solvent itself. For example, COSMO-RS73 is a variant of the COSMO model that describes the interactions in a fluid as local interactions of molecular surfaces, the interaction energies being quantified by the values of the two screening charge densities that form a molecular contact. SM6 (Solvent Model 6)74 is based on a generalized Born approach, which uses a long-range dielectric continuum to treat bulk electrostatic effects combined with short-range atomic surface tensions to account for first-shell solvent effects.

Continuum solvation models can be invoked in most of the leading computational chemistry software packages, and the reader is referred to their respective manuals for specific implementation details. However, the following general points should be noted. First, continuum solvation models rely upon empirically optimized parameters, and it is important to choose radii and levels of theory that are optimized for the specific method in use. As always, the choice of solvation method for any particular system should be determined through assessment studies. Second, the specification of a particular solvent depends on several parameters in addition to the dielectric constant, including the volume, density, and solvent radius. If using a nondefault solvent model, care must be taken to set all of these parameters appropriately.
Third, since the levels of theory used for solvation energy calculations (typically small-basis-set HF or B3LYP calculations) are not usually sufficiently accurate for gas-phase energetics, the total free energies in solution should be calculated via a simple thermodynamic cycle as follows:

$$\Delta G_{\mathrm{soln}} = \Delta G_{\mathrm{gas}} + \Delta G_{\mathrm{solv}} + \Delta G_{1\,\mathrm{atm}\rightarrow 1\,\mathrm{M}} \tag{13.34}$$

In this equation, ΔG_gas is the gas-phase free energy of reaction, which is calculated separately at a high level of theory, and ΔG_solv, the free energy of solvation, should not be confused with the total free energy of reaction in solution. In some software packages, additional keyword(s) are required for the solvation free energy (the difference of the gas- and solution-phase free energies at the same level of theory) to be calculated. In GAUSSIAN, the SCFVAC keyword is used for this purpose. The final term in Eq. (13.34), ΔG_1atm→1M, is required for converting from the gas-phase standard state for an ideal gas (typically, 1 atm) to 1 M in solution, and is given by

$$\Delta G_{1\,\mathrm{atm}\rightarrow 1\,\mathrm{M}} = \Delta n\,RT\ln(V) = \Delta n\,RT\ln\!\left(\frac{RT}{P}\right) \tag{13.35}$$


where Δn is the change in the number of moles of gas from reactants to products. As an example, at room temperature (298.15 K) and standard pressure (1 atm), this term has a value of 7.9 kJ mol−1 for Δn = 1. Finally, having made the correction for the change in state, ΔG_1atm→1M, the standard unit of concentration in the rate and equilibrium constant expressions [Eqs. (13.5) and (13.6)] becomes c° = 1 mol L−1, rather than its value for an ideal gas (e.g., 0.0408 mol L−1 at room temperature and standard pressure).

Continuum models are designed to reproduce bulk or macroscopic behavior and can fare extremely well in certain applications, not least the prediction of solvation energies of stable organic molecules.74,75 Continuum models have been applied to radical polymerization processes with mixed results. In an early study, Thickett and Gilbert12g used a simple PCM model to study the effect of solvent on acrylic acid propagation, confirming experimental observations76 that aqueous solvation substantially lowers the reaction barrier. However, it was noted in this work that the levels of theory used in the gas- and solution-phase calculations were not accurate enough for quantitative predictions of the reaction rate. As noted above, in our study of vinyl chloride and acrylonitrile propagation, we found that continuum models slightly improved the agreement between theory and experiment; however, in those systems the solvation effects were very small and well within the uncertainty of the experimental and theoretical data.8 More encouragingly, we have found that the combination of high-level ab initio calculations with continuum solvation models can reproduce one- and two-electron redox potentials of a wide range of open- and closed-shell systems,77 including systems directly relevant to atom transfer radical polymerization.9e,f In such systems, the solvation effects are very large, due to the presence of charged species.
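As a numerical check on the thermodynamic cycle of Eq. (13.34) and the standard-state term of Eq. (13.35) introduced earlier in this section, the following sketch evaluates the 1 atm to 1 M correction and the corresponding ideal-gas concentration. The function names and the example energies passed to `delta_g_solution` are hypothetical and purely illustrative; the molar volume is taken in L mol−1 so that the logarithm refers to a 1 mol L−1 standard state.

```python
import math

R_J = 8.314462618        # gas constant in J mol^-1 K^-1
R_L_ATM = 0.082057366    # gas constant in L atm mol^-1 K^-1


def standard_state_correction_kj(delta_n, temperature=298.15, pressure_atm=1.0):
    """Eq. (13.35): delta_n * R * T * ln(RT/P), with RT/P in L/mol; in kJ/mol."""
    molar_volume_l = R_L_ATM * temperature / pressure_atm
    return delta_n * R_J * temperature * math.log(molar_volume_l) / 1000.0


def ideal_gas_concentration(temperature=298.15, pressure_atm=1.0):
    """Concentration (mol/L) of an ideal gas at the given T and P."""
    return pressure_atm / (R_L_ATM * temperature)


def delta_g_solution(dg_gas_kj, dg_solv_kj, delta_n, temperature=298.15):
    """Eq. (13.34): gas-phase term + solvation term + standard-state term."""
    return dg_gas_kj + dg_solv_kj + standard_state_correction_kj(delta_n, temperature)


print(f"{standard_state_correction_kj(1):.1f} kJ/mol")   # about 7.9
print(f"{ideal_gas_concentration():.4f} mol/L")          # about 0.0409

# Hypothetical association reaction A + B -> C (delta_n = -1), energies in kJ/mol.
print(f"{delta_g_solution(-40.0, 5.0, -1):.1f} kJ/mol")
```

Note that the sign of the correction follows Δn: for an addition step such as radical propagation (Δn = −1) the term lowers the free energy of reaction in solution by about 7.9 kJ mol−1 at room temperature.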
Nonetheless, in other systems, the continuum solvation models have failed to redress the deviations of theory and experiment. For the problematic EHMA system described above, the use of PCM solvation energies actually increased the deviation between theory and experiment from five orders of magnitude to as much as eight orders of magnitude, depending on the solvent.68 This is presumably because continuum models do not take into account the hydrogen-bonding interactions, expected to be important in this system. Indeed, similar failures have been noted in other (non-polymer-related) systems where hydrogen bonding is important.78 Moreover, even where explicit solute–solvent interactions can be neglected, the use of continuum models to study polymerization kinetics is likely to be problematic. This is because the results obtained using continuum models are highly sensitive to the choice of cavities, and these are typically parameterized to reproduce the free energies of solvation for a set of small stable organic molecules. As a result, the choice of appropriate cavities for weakly bound species such as transition structures can be difficult.75 For problematic systems where strong explicit solute–solvent interactions are important, the inclusion of explicit solvent molecules in the ab initio calculation is necessary. Ideally, one should include many explicit solvent molecules in the calculation and try to reproduce bulk behavior via molecular dynamics or Monte Carlo simulations, combined with the imposition of periodic boundary


conditions.79 However, such calculations are hampered by problems such as the lack of potentials that can adequately describe both cluster and bulk behavior and the rapid increase in the conformational possibilities as the number of individual components increases. As a result, such approaches are not currently practical for polymerization systems. A less computationally demanding approach, known as a cluster-continuum model,80 is to include a small number of explicit solvent molecules in the calculation (effectively treating them as additional reactants), while modeling the remaining solvation effects via a continuum model. However, choosing an appropriate number of explicit solvent molecules and their location, without testing all possibilities exhaustively, is always problematic, particularly for larger molecules. Further work is required to design practical guidelines for applying these methods to polymerization systems. In the meantime, it is worth noting that very promising results have recently been obtained without the need for explicit solvent molecules using COSMO-RS solvation energies in conjunction with the standard high-level gas-phase methodology.8b To date, this approach has been evaluated only for the propagation kinetics of methyl acrylate and vinyl acetate, two systems where simple continuum models fail.8b If its excellent performance can be maintained for other problematic systems, this methodology will further expand the scope of computational radical polymerization.

13.5 CONCLUSIONS

Computational quantum chemistry has much to offer the experimental polymer chemist. At the microscopic level, it can be used to clarify the reaction mechanism and explain the effects of substituents on the individual reactions, thereby facilitating the rational design of optimal control agents. At the macroscopic level, it can be used to build accurate kinetic models for simulating the outcome of polymerization processes as a function of the reaction conditions, for use in process optimization and control. However, the success of computational chemistry is crucially dependent on choosing realistic model reactions and applying accurate computational procedures; simultaneously satisfying these competing demands has, until recently, been difficult. Nonetheless, in recent years the development of new cost-effective computational methods, along with concurrent increases in computing power, has at last brought chemical accuracy within reach. Although the treatment of solvent effects remains problematic, even here, computational quantum chemistry has now proven itself a reliable and useful tool and an important complement to experiment.

REFERENCES

1. For more information on the chemistry and kinetics of free-radical polymerization, see, e.g., (a) Matyjaszewski, K.; Davis, T. P. Handbook of Radical Polymerization, Wiley, Hoboken, NJ, 2002. (b) Moad, G.; Solomon, D. H. The Chemistry of Free-Radical Polymerization, Pergamon Press, Oxford, UK, 1995. (c) Odian, G. Principles of Polymerization, Wiley-Interscience, New York, 1991.
2. Kamigaito, M.; Satoh, K. Macromolecules 2008, 41, 269–276.
3. Moad, G.; Rizzardo, E.; Thang, S. H. Aust. J. Chem. 2005, 58, 379–410.
4. Matyjaszewski, K. Prog. Polym. Sci. 2005, 30, 858–875.
5. Hawker, C. J.; Bosman, A. W.; Harth, E. Chem. Rev. 2001, 101, 3661–3688.
6. (a) Coote, M. L.; Zammit, M. D.; Davis, T. P. Trends Polym. Sci. 1996, 4, 189–196. (b) van Herk, A. M. Macromol. Theory Simul. 2000, 9, 433–441. (c) Beuermann, S.; Buback, M. Prog. Polym. Sci. 2002, 27, 191–254. (d) Barner-Kowollik, C.; Buback, M.; Egorov, M.; Fukuda, T.; Goto, A.; Olaj, O. F.; Russell, G. T.; Vana, P.; Yamada, B.; Zetterlund, P. B. Prog. Polym. Sci. 2005, 30, 605–643.
7. Barner-Kowollik, C.; Buback, M.; Charleux, B.; Coote, M. L.; Drache, M.; Fukuda, T.; Goto, A.; Klumperman, B.; Lowe, A. B.; McLeary, J. B.; Moad, G.; Monteiro, M. J.; Sanderson, R. D.; Tonge, M. P.; Vana, P. J. Polym. Sci. A 2006, 44, 5809–5831.
8. See, e.g., (a) Izgorodina, E. I.; Coote, M. L. Chem. Phys. 2006, 324, 96–110. (b) Lin, C. Y.; Izgorodina, E. I.; Coote, M. L. Macromolecules 2010, 43, 533–560.
9. (a) Gillies, M. B.; Matyjaszewski, K.; Norrby, P.-O.; Pintauer, T.; Poli, R.; Richard, P. Macromolecules 2003, 36, 8551–8559. (b) Singleton, D. A.; Nowlan, D. T., III; Jahed, N.; Matyjaszewski, K. Macromolecules 2003, 36, 8609–8616. (c) Matyjaszewski, K.; Poli, R. Macromolecules 2005, 38, 8093–8100. (d) Lin, C. Y.; Coote, M. L.; Petit, A.; Richard, P.; Poli, R.; Matyjaszewski, K. Macromolecules 2007, 40, 5985–5994. (e) Tang, W.; Kwak, Y.; Braunecker, W.; Tsarevsky, N. V.; Coote, M. L.; Matyjaszewski, K. J. Am. Chem. Soc. 2008, 130, 10702–10713. (f) Lin, C. Y.; Coote, M. L.; Gennaro, A.; Matyjaszewski, K. J. Am. Chem. Soc. 2008, 130, 12762–12774.
10. (a) Marsal, P.; Roche, M.; Tordo, P.; de Sainte Claire, P. J. Phys. Chem. A 1999, 103, 2899–2905. (b) Gigmes, D.; Gaudel-Siri, A.; Marque, S. R. A.; Bertin, D.; Tordo, P.; Astolfi, P.; Greci, L.; Rizzoli, C. Helv. Chim. Acta 2006, 89, 2312–2326. (c) Kaim, A.; Megiel, E. J. Polym. Sci. A 2005, 44, 914–927. (d) Kaim, A. J. Polym. Sci. A 2006, 45, 232–241. (e) Megiel, E.; Kaim, A. J. Polym. Sci. A 2008, 46, 1165–1177.
11. (a) Farmer, S. C.; Patten, T. E. J. Polym. Sci. A 2002, A40, 555–563. (b) Coote, M. L.; Radom, L. J. Am. Chem. Soc. 2003, 125, 1490–1491. (c) Coote, M. L.; Radom, L. Macromolecules 2004, 37, 590–596. (d) Coote, M. L. Macromolecules 2004, 37, 5023–5031. (e) Feldermann, A.; Coote, M. L.; Stenzel, M. H.; Davis, T. P.; Barner-Kowollik, C. J. Am. Chem. Soc. 2004, 126, 15915–15923. (f) Coote, M. L.; Henry, D. J. Macromolecules 2005, 38, 1415–1433. (g) Coote, M. L. J. Phys. Chem. A 2005, 109, 1230–1239. (h) Coote, M. L.; Krenske, E. H.; Izgorodina, E. I. Macromol. Rapid Commun. 2006, 27, 473–497. (i) Izgorodina, E. I.; Coote, M. L. Macromol. Theory Simul. 2006, 15, 394–403. (j) Lin, C. Y.; Coote, M. L. Aust. J. Chem. 2009, 62, 1479–1483.
12. (a) Leroy, G.; Dewispelaere, J.-P.; Benkadour, H.; Wilante, C. Macromol. Theory Simul. 1996, 5, 269–289. (b) Heuts, J. P. A.; Gilbert, R. G.; Radom, L. J. Phys. Chem. 1996, 100, 18997–19006. (c) Huang, D. M.; Monteiro, M. J.; Gilbert, R. G. Macromolecules 1998, 31, 5175–5187. (d) Toh, J. S.-S.; Huang, D. M.; Lovell, P. A.; Gilbert, R. G. Polymer 2001, 42, 1915–1920. (e) Filley, J.; McKinnon, J. T.; Wu, D. T.; Ko, G. H. Macromolecules 2002, 35, 3731–3738. (f) Zhan, C.-G.; Dixon, D. A. J. Phys. Chem. A 2002, 106, 10311–10325. (g) Thickett, S. C.; Gilbert, R. G. Polymer 2004, 45, 6993–6999. (h) Van Cauter, K.; Hemelsoet, K.; Van Speybroeck, V.; Reyniers, M. F.; Waroquier, M. Int. J. Quantum Chem. 2004, 102, 454–460. (i) Salman, S.; Albayrak, A. Z.; Avci, D.; Aviyente, V. J. Polym. Sci. A 2005, 43, 2574–2583. (j) Günaydin, H.; Salman, S.; Tüzün, N. S.; Avci, D.; Aviyente, V. Int. J. Quantum Chem. 2005, 103, 176–189. (k) Van Cauter, K.; Van Speybroeck, V.; Vansteenkiste, P.; Reyniers, M.-F.; Waroquier, M. ChemPhysChem 2006, 7, 131–140. (l) Degirmenci, I.; Avci, D.; Aviyente, V.; Van Cauter, K.; Van Speybroeck, V.; Waroquier, M. Macromolecules 2007, 40, 9590–9602.
13. (a) Purmova, J.; Pauwels, K. F. D.; van Zoelen, W.; Vorenkamp, E. J.; Schouten, A. J.; Coote, M. L. Macromolecules 2005, 38, 6352–6366. (b) Van Cauter, K.; Van Den Bossche, B. j.; Van Speybroeck, V.; Waroquier, M. Macromolecules 2007, 40, 1321–1331. (c) Purmová, J.; Pauwels, K. F. D.; Agostini, M.; Bruinsma, M.; Vorenkamp, E. J.; Schouten, A. J.; Coote, M. L. Macromolecules 2008, 41, 5527–5539.
14. (a) Heuts, J. P. A.; Sudarko; Gilbert, R. G. Macromol. Symp. 1996, 111, 147–157. (b) Heuts, J. P. A.; Gilbert, R. G.; Maxwell, I. A. Macromolecules 1997, 30, 726–736. (c) Coote, M. L.; Davis, T. P.; Radom, L. Theochem 1999, 461–462, 91–96. (d) Coote, M. L.; Davis, T. P.; Radom, L. Macromolecules 1999, 32, 5270–5276. (e) Coote, M. L.; Davis, T. P.; Radom, L. Macromolecules 1999, 32, 2935–2940. (f) Cieplak, P.; Kaim, A. J. Polym. Sci. A 2004, 42, 1557–1565.
15. Barner-Kowollik, C. W.; Coote, M. L.; Davis, T. P.; Stenzel, M. H.; Theis, A. Polymerization agent, International Patent WO2006122344 A1, 2006. http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=WO2006122344&F=0.
16. (a) Ah Toy, A.; Chaffey-Millar, H.; Davis, T. P.; Stenzel, M. H.; Izgorodina, E. I.; Coote, M. L.; Barner-Kowollik, C. Chem. Commun. 2006, 835–837. (b) Chaffey-Millar, H.; Izgorodina, E. I.; Barner-Kowollik, C.; Coote, M. L. J. Chem. Theory Comput. 2006, 2, 1632–1645.
17. (a) Hodgson, J. L.; Coote, M. L. Macromolecules 2005, 38, 8902. (b) Coote, M. L.; Hodgson, J. L.; Krenske, E. H.; Namazian, M.; Wild, S. B. Aust. J. Chem. 2007, 60, 744–753.
18. Coote, M. L.; Izgorodina, E. I.; Krenske, E. H.; Busch, M.; Barner-Kowollik, C. Macromol. Rapid Commun. 2006, 27, 1015–1022.
19. McLeary, J. B.; Calitz, F. M.; McKenzie, J. M.; Tonge, M. P.; Sanderson, R. D.; Klumperman, B. Macromolecules 2004, 37, 2382–2394.
20. Coote, M. L. Macromol. Theory Simul. 2009, 18, 388–400.
21. See, e.g., Heuts, J. P. A.; Russell, G. T. Eur. Polym. J. 2006, 42, 3–20.
22. Coote, M. L.; Davis, T. P. Prog. Polym. Sci. 1999, 24, 1217–1251.
23. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, J. A., Jr.; Vreven, T.; Kudin, K. N.; Burant, J. C.; Millam, J. M.; Iyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N. P.; Petersson, G. A.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M. N.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B. A.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.; Morokuma, K.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Zakrzewski, V. G.; Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas, O.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui, Q.; Baboul, A. G.; Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng, C. Y.; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Gonzalez, C.; Pople, J. A. Gaussian 03, Revision B.03, Gaussian Inc., Pittsburgh, PA, 2003.
24. Werner, H.-J.; Knowles, P. J.; Lindh, R.; Manby, F. R.; Schütz, M.; Celani, P.; Korona, T.; Rauhut, G.; Amos, R. D.; Bernhardsson, A.; Berning, A.; Cooper, D. L.; Deegan, M. J. O.; Dobbyn, A. J.; Eckert, F.; Hampel, C.; Hetzer, G.; Lloyd, A. W.; McNicholas, S. J.; Meyer, W.; Mura, M. E.; Nicklass, A.; Palmieri, P.; Pitzer, R.; Schumann, U.; Stoll, H.; Stone, A. J.; Tarroni, R.; Thorsteinsson, T. MOLPRO, Version 2006.1, a package of ab initio programs, http://www.molpro.net.
25. Schmidt, M. W.; Baldridge, K. K.; Boatz, J. A.; Elbert, S. T.; Gordon, M. S.; Jensen, J. H.; Koseki, S.; Matsunaga, N.; Nguyen, K. A.; Su, S. J.; Windus, T. L.; Dupuis, M.; Montgomery, J. A. J. Comput. Chem. 1993, 14, 1347.
26. Shao, Y.; Molnar, L. F.; Jung, Y.; Kussmann, J.; Ochsenfeld, C.; Brown, S. T.; Gilbert, A. T. B.; Slipchenko, L. V.; Levchenko, S. V.; O'Neill, D. P.; DiStasio, R. A.; Lochan, R. C.; Wang, T.; Beran, G. J. O.; Besley, N. A.; Herbert, J. M.; Lin, C. Y.; Van Voorhis, T.; Chien, S. H.; Sodt, A.; Steele, R. P.; Rassolov, V. A.; Maslen, P. E.; Korambath, P. P.; Adamson, R. D.; Austin, B.; Baker, J.; Byrd, E. F. C.; Dachsel, H.; Doerksen, R. J.; Dreuw, A.; Dunietz, B. D.; Dutoi, A. D.; Furlani, T. R.; Gwaltney, S. R.; Heyden, A.; Hirata, S.; Hsu, C. P.; Kedziora, G.; Khalliulin, R. Z.; Klunzinger, P.; Lee, A. M.; Lee, M. S.; Liang, W.; Lotan, I.; Nair, N.; Peters, B.; Proynov, E. I.; Pieniazek, P. A.; Rhee, Y. M.; Ritchie, J.; Rosta, E.; Sherrill, C. D.; Simmonett, A. C.; Subotnik, J. E.; Woodcock, H. L.; Zhang, W.; Bell, A. T.; Chakraborty, A. K.; Chipman, D. M.; Keil, F. J.; Warshel, A.; Hehre, W. J.; Schaefer, H. F.; Kong, J.; Krylov, A. I.; Gill, P. M. W.; Head-Gordon, M. Phys. Chem. Chem. Phys. 2006, 8, 3172.
27. Bylaska, E. J.; de Jong, W. A.; Govind, N.; Kowalski, K.; Straatsma, T. P.; Valiev, M.; Wang, D.; Apra, E.; Windus, T. L.; Hammond, J.; Nichols, P.; Hirata, S.; Hackler, M. T.; Zhao, Y.; Fan, P.-D.; Harrison, R. J.; Dupuis, M.; Smith, D. M. A.; Nieplocha, J.; Tipparaju, V.; Krishnan, M.; Wu, Q.; Voorhis, T. V.; Auer, A. A.; Nooijen, M.; Brown, E.; Cisneros, G.; Fann, G. I.; Fruchtl, H.; Garza, J.; Hirao, K.; Kendall, R.; Nichols, J. A.; Tsemekhman, K.; Wolinski, K.; Anchell, J.; Bernholdt, D.; Borowski, P.; Clark, T.; Clerc, D.; Dachsel, H.; Deegan, M.; Dyall, K.; Elwood, D.; Glendening, E.; Gutowski, M.; Hess, A.; Jaffe, J.; Johnson, B.; Ju, J.; Kobayashi, R.; Kutteh, R.; Lin, Z.; Littlefield, R.; Long, X.; Meng, B.; Nakajima, T.; Niu, S.; Pollack, L.; Rosing, M.; Sandrone, G.; Stave, M.; Taylor, H.; Thomas, G.; von Lenthe, J.; Wong, A.; Zhang, Z. NWChem: A Computational Chemistry Package for Parallel Computers, Version 5.1, Pacific Northwest National Laboratory, Richland, WA, 2007.
28. Velde, G. T.; Bickelhaupt, F. M.; Baerends, E. J.; Guerra, C. F.; Van Gisbergen, S. J. A.; Snijders, J. G.; Ziegler, T. J. Comput. Chem. 2001, 22, 931.
29. Stanton, J. F.; Gauss, J.; Perera, S. A.; Watts, J. D.; Yau, A. D.; Nooijen, M.; Oliphant, N.; Szalay, P. G.; Lauderdale, W. J.; Gwaltney, S. R.; Beck, S.; Balková, A.; Bernholdt, D. E.; Baeck, K. K.; Rozyczko, P.; Sekino, H.; Huber, C.; Pittner, J.; Cencek, W.; Taylor, D.; Bartlett, R. J. ACES II is a program product of the Quantum Theory Project, University of Florida. Integral packages included are VMOL (J. Almlöf and P. R. Taylor); VPROPS (P. Taylor); ABACUS (T. Helgaker, H. J. Aa. Jensen, P. Jørgensen, J. Olsen, and P. R. Taylor); HONDO/GAMESS (M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. J. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su, T. L. Windus, M. Dupuis, J. A. Montgomery).


30. (a) Choi, C. C.; Kertesz, M.; Karpfen, A. Chem. Phys. Lett. 1997, 276, 266. (b) Curtiss, L. A.; Raghavachari, K.; Redfern, P. C.; Pople, J. A. J. Chem. Phys. 2000, 112, 7374. (c) Woodcock, H. L.; Schaefer, H. F., III; Schreiner, P. R. J. Phys. Chem. A 2002, 106, 11923. (d) Izgorodina, E. I.; Coote, M. L.; Radom, L. J. Phys. Chem. A 2005, 109, 7558. (e) Check, C. E.; Gilbert, T. M. J. Org. Chem. 2005, 70, 9828. (f) Izgorodina, E. I.; Coote, M. L. J. Phys. Chem. A 2006, 110, 2486. (g) Grimme, S. Angew. Chem. Int. Ed. 2006, 45, 4460. (h) Schreiner, P. R.; Fokin, A. A.; Pascal, R. A., Jr.; de Meijere, A. Org. Lett. 2006, 8, 3635. (i) Wodrich, M. D.; Corminboeuf, C.; von Ragué Schleyer, P. Org. Lett. 2006, 8, 3631. (j) Wodrich, M. D.; Corminboeuf, C.; Schreiner, P. R.; Fokin, A. A.; von Ragué Schleyer, P. Org. Lett. 2007, 9, 1851. (k) Grimme, S.; Steinmetz, M.; Korth, M. J. Chem. Theory Comput. 2007, 3, 42. (l) Schreiner, P. R. Angew. Chem. Int. Ed. 2007, 46, 4217. (m) Izgorodina, E. I.; Brittain, D. R. B.; Hodgson, J. L.; Krenske, E. H.; Lin, C. Y.; Namazian, M.; Coote, M. L. J. Phys. Chem. A 2007, 111, 10754. (n) Brittain, D. R. B.; Lin, C. Y.; Gilbert, A. T. B.; Izgorodina, E. I.; Gill, P. M. W.; Coote, M. L. Phys. Chem. Chem. Phys. 2009, 11, 1138–1142.
31. Buback, M.; Hippler, H.; Schweer, J.; Vogele, H.-P. Makromol. Chem. Rapid Commun. 1986, 7, 261–265.
32. (a) Kajiwara, A.; Kamachi, M. Macromol. Chem. Phys. 2000, 201, 2165–2169. (b) Burnett, G. M.; Wright, W. W. Proc. R. Soc. (Lond.) A 1954, 211, 41.
33. For a review of the early work in this field, see Fischer, H.; Radom, L. Angew. Chem. Int. Ed. 2001, 40, 1340–1371.
34. For more recent studies, see, e.g., (a) Henry, D. J.; Parkinson, C. J.; Mayer, P. M.; Radom, L. J. Phys. Chem. A 2001, 105, 6750. (b) Coote, M. L.; Wood, G. P. F.; Radom, L. J. Phys. Chem. A 2002, 106, 12124–12138. (c) Coote, M. L. J. Phys. Chem. A 2004, 108, 3865–3872. (d) Gómez-Balderas, R.; Coote, M. L.; Henry, D. J.; Radom, L. J. Phys. Chem. A 2004, 108, 2874–2883. (e) Lin, C. Y.; Hodgson, J. L.; Namazian, M.; Coote, M. L. J. Phys. Chem. A 2009, 113, 3690–3697.
35. Malick, D. K.; Petersson, G. A.; Montgomery, J. A. J. Chem. Phys. 1998, 108, 5704.
36. Scott, A. P.; Radom, L. J. Phys. Chem. 1996, 100, 16502.
37. (a) Pople, J. A.; Head-Gordon, M.; Fox, D. J.; Raghavachari, K.; Curtiss, L. A. J. Chem. Phys. 1989, 90, 5622. (b) Curtiss, L. A.; Jones, C.; Trucks, G. W.; Raghavachari, K.; Pople, J. A. J. Chem. Phys. 1990, 93, 2537. (c) Curtiss, L. A.; Raghavachari, K.; Trucks, G. W.; Pople, J. A. J. Chem. Phys. 1991, 94, 7221. (d) Curtiss, L. A.; Raghavachari, K.; Redfern, P. C.; Rassolov, V.; Pople, J. A. J. Chem. Phys. 1998, 109, 7764. (e) Curtiss, L. A.; Redfern, P. C.; Raghavachari, K.; Pople, J. A. J. Chem. Phys. 2001, 114, 108. (f) Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. J. Chem. Phys. 2007, 126, 084108.
38. Henry, D. J.; Sullivan, M. B.; Radom, L. J. Chem. Phys. 2003, 118, 4849.
39. Montgomery, J. A.; Frisch, M. J.; Ochterski, J. W.; Petersson, G. A. J. Chem. Phys. 1999, 110, 2822.
40. Martin, J. M. L.; Parthiban, S. In Quantum Mechanical Prediction of Thermochemical Data, Cioslowski, J., Ed., Kluwer-Academic, Dordrecht, The Netherlands, 2001, pp. 31–65.
41. (a) Vreven, T.; Morokuma, K. J. Chem. Phys. 1999, 111, 8799–8803. (b) Vreven, T.; Morokuma, K. J. Comput. Chem. 2000, 21, 1419–1432.
42. Lipton, M.; Still, W. C. J. Comput. Chem. 1988, 9, 343–355.


43. Izgorodina, E. I.; Lin, C. Y.; Coote, M. L. Phys. Chem. Chem. Phys. 2007, 9, 2507–2516.
44. (a) Kirkpatrick, S.; Gelatt, C. D., Jr.; Vecchi, M. P. Science 1983, 220, 671. (b) Wilson, S. R.; Cui, W.; Moskowitz, J. W.; Schmidt, K. E. Tetrahedron Lett. 1988, 4343.
45. (a) Gibson, K. D.; Scheraga, H. A. J. Comput. Chem. 1987, 8, 826. (b) Pincus, M. R.; Klausner, R. D.; Scheraga, H. A. Proc. Natl. Acad. Sci. USA 1982, 79, 5107. (c) Hingerty, B. E.; Figueroa, S.; Hayden, T. L.; Broyde, S. Biopolymers 1989, 28, 1195.
46. Malick, D. K.; Petersson, G. A.; Montgomery, J. A. J. Chem. Phys. 1998, 108, 5704–5713.
47. (a) Knyazev, V. D.; Slagle, I. R. J. Phys. Chem. 1996, 100, 16899–16911. (b) Knyazev, V. D.; Bencsura, A.; Stoliarov, S. I.; Slagle, I. R. J. Phys. Chem. 1996, 100, 11346–11354.
48. Schwartz, M.; Marshall, P.; Berry, R. J.; Ehlers, C. J.; Petersson, G. A. J. Phys. Chem. A 1998, 102, 10074–10081.
49. Coote, M. L.; Collins, M. A.; Radom, L. Mol. Phys. 2003, 101, 1329–1338.
50. Coote, M. L. In Encyclopaedia of Polymer Science and Technology, 3rd ed., Vol. 9, Kroschwitz, J. I., Ed., Wiley, Hoboken, NJ, 2004, pp. 319–371.
51. See, e.g., (a) Benson, S. W. Thermochemical Kinetics, Wiley, New York, 1976. (b) McQuarrie, D. A. Statistical Mechanics, Harper & Row, New York, 1976. (c) Gilbert, R. G.; Smith, S. C. Theory of Unimolecular and Recombination Reactions, Blackwell Scientific, Oxford, UK, 1990. (d) Steinfeld, J. I.; Francisco, J. S.; Hase, W. L. Chemical Kinetics and Dynamics, 2nd ed., Prentice Hall, Englewood Cliffs, NJ, 1999. (e) Atkins, P. W. Physical Chemistry, 6th ed., W.H. Freeman, San Francisco, 2000.
52. Eyring, H. J. Chem. Phys. 1935, 3, 107.
53. For a more detailed definition of this term, see, e.g., Karas, A. J.; Gilbert, R. G.; Collins, M. A. Chem. Phys. Lett. 1992, 193, 181–184.
54. Skodje, R. T.; Truhlar, D. G.; Garrett, B. C. J. Phys. Chem. 1981, 85, 3019.
55. Garrett, B. C.; Truhlar, D. G.; Wagner, A. F.; Dunning, T. H., Jr. J. Chem. Phys. 1983, 78, 4400.
56. Liu, Y. P.; Lu, D. H.; Gonzalez-Lafont, A.; Truhlar, D. G.; Garrett, B. C. J. Am. Chem. Soc. 1993, 115, 7806.
57. Corchado, J. C.; Chuang, Y.-Y.; Fast, P. L.; Villà, J.; Hu, W.-P.; Liu, Y.-P.; Lynch, G. C.; Nguyen, K. A.; Jackels, C. F.; Melissas, V. S.; Lynch, B. J.; Rossi, I.; Coitiño, E. L.; Fernandez-Ramos, A.; Pu, J.; Albu, T. V.; Steckler, R.; Garrett, B. C.; Isaacson, A. D.; Truhlar, D. G. POLYRATE 9.1, University of Minnesota, Minneapolis, MN, 2002, http://comp.chem.umn.edu/polyrate/.
58. (a) Kuppermann, A.; Truhlar, D. G. J. Am. Chem. Soc. 1971, 93, 1840. (b) Garrett, B. C.; Truhlar, D. G.; Grev, R. S.; Magnuson, A. W. J. Phys. Chem. 1980, 84, 1730.
59. Bell, R. P. The Tunnel Effect in Chemistry, Chapman & Hall, New York, 1980.
60. Eckart, C. Phys. Rev. 1930, 35, 1303.
61. See, e.g., Vansteenkiste, P.; Van Neck, D.; Van Speybroeck, V.; Waroquier, M. J. Chem. Phys. 2006, 124, 044314.
62. Lin, C. Y.; Izgorodina, E. I.; Coote, M. L. J. Phys. Chem. A 2008, 112, 1956–1964.
63. East, A. L. L.; Radom, L. J. Chem. Phys. 1997, 106, 6655.


64. (a) Pitzer, K. S.; Gwinn, W. D. J. Chem. Phys. 1942, 10, 428–440. (b) Pitzer, K. S. J. Chem. Phys. 1946, 14, 239–243. (c) Li, J. C. M.; Pitzer, K. S. J. Phys. Chem. 1956, 60, 466–474. (d) Kilpatrick, K. E.; Pitzer, K. S. J. Chem. Phys. 1949, 17, 1064–1075.
65. Ellingson, B. A.; Lynch, V. A.; Mielke, S. L.; Truhlar, D. G. J. Chem. Phys. 2006, 125, 084305.
66. Ayala, P. Y.; Schlegel, H. B. J. Chem. Phys. 1998, 108, 7560.
67. Coote, M. L.; Davis, T. P.; Klumperman, B.; Monteiro, M. J. J. Macromol. Sci. Rev. Macromol. Chem. Phys. 1998, C38, 567–593.
68. Degirmenci, I.; Aviyente, V.; Van Speybroeck, V.; Waroquier, M. Macromolecules 2009, 42, 3033–3041.
69. Morrison, D. A.; Davis, T. P. Macromol. Chem. Phys. 2000, 201, 2128–2137.
70. Tomasi, J. Theor. Chem. Acc. 2004, 112, 184.
71. (a) Klamt, A.; Schueuermann, G. J. Chem. Soc. Perkin Trans. 2 1993, 799. (b) Cossi, M.; Rega, N.; Scalmani, G.; Barone, V. J. Comput. Chem. 2003, 24, 669.
72. Miertus, S.; Scrocco, E.; Tomasi, J. J. Chem. Phys. 1981, 55, 117.
73. (a) Klamt, A. J. Phys. Chem. 1995, 99, 2224. (b) Klamt, A. COSMO-RS: From Quantum Chemistry to Fluid Phase Thermodynamics and Drug Design, Elsevier Science, Amsterdam, 2005. (c) Klamt, A.; Jonas, V.; Burger, T.; Lohrenz, J. C. W. J. Phys. Chem. A 1998, 102, 5074.
74. Kelly, C. P.; Cramer, C. J.; Truhlar, D. G. J. Chem. Theory Comput. 2005, 1, 1133.
75. See, e.g., Takano, Y.; Houk, K. N. J. Chem. Theory Comput. 2005, 1, 70–77.
76. Beuermann, S.; Buback, M.; Hesse, P.; Kuchta, F.-D.; Lacik, I.; Van Herk, A. M. Pure Appl. Chem. 2007, 79 (8), 1463–1469.
77. See, e.g., (a) Namazian, M.; Coote, M. L. J. Phys. Chem. A 2007, 111, 7227–7232. (b) Hodgson, J. L.; Namazian, M.; Bottle, S. E.; Coote, M. L. J. Phys. Chem. A 2007, 111, 13595–13605. (c) Namazian, M.; Zare, H. R.; Coote, M. L. Biophys. Chem. 2008, 132, 64–68. (d) Namazian, M.; Siahrostami, S.; Coote, M. L. J. Fluorine Chem. 2008, 129, 222–225. (e) Blinco, J. P.; Hodgson, J. L.; Morrow, B. J.; Walker, J. R.; Will, G. D.; Coote, M. L.; Bottle, S. E. J. Org. Chem. 2008, 73, 6763–6771. (f) Zare, H.; Eslami, M.; Namazian, M.; Coote, M. L. J. Phys. Chem. B 2009, 113, 8080–8085.
78. See, e.g., Ho, J.; Coote, M. L. J. Chem. Theory Comput. 2009, 5, 295–306.
79. Levy, R. M.; Kitchen, D. B.; Blair, J. T.; Krogh-Jespersen, K. J. Phys. Chem. 1990, 94, 4470–4476.
80. Pliego, J. R., Jr.; Riveros, J. M. J. Phys. Chem. A 2001, 105, 7241–7247.

14

Evaluation of Nonlinear Optical Properties of Large Conjugated Molecular Systems by Long-Range-Corrected Density Functional Theory

HIDEO SEKINO and AKIHIDE MIYAZAKI
Toyohashi University of Technology, Toyohashi, Japan

JONG-WON SONG and KIMIHIKO HIRAO
Advanced Science Institute, RIKEN, Saitama, Japan

Advantages and problems of quantum chemical methods for nonlinear optical (NLO) property evaluation are discussed. Density functional theory (DFT) is the best quantum chemical tool for quantitative evaluation of the properties of NLO materials that have no absorption in the response frequency region. We introduce a practical DFT method with long-range correction (LC) for this purpose. We discuss a strategy for realistic evaluation of large conjugated systems and find that the classical hypothesis, that only the π-electron system needs to be considered in conjugated molecules, is sufficient. The errors arising from this approximation are much smaller than those caused by the deficiency in traditional DFT functionals. We examine the LC-DFT method further by comparing the length dependence of polyynes and polyenes. From a comparison with rigorous ab initio correlated methods, we conclude that the LC-DFT method can calculate NLO properties successfully, without the catastrophic overestimation shown by conventional DFT functionals, and can provide basic information for the systematic fabrication of new organic NLO materials.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


14.1 INTRODUCTION

The nonlinear optical (NLO) response of materials under an intense optical electromagnetic field is an important yet challenging subject in theoretical and computational materials science, and it can arise through a variety of processes. However, it is evident that the nonlinear electronic response plays the most significant role, making rigorous quantum chemical calculations essential for molecular systems. While the importance of contributions from vibrational processes in determining the hyperpolarizabilities of small conjugated molecules has recently been highlighted,1 the pure high-order electronic response is paramount, especially for the evaluation of hyperpolarizabilities of large conjugated systems. Quantum chemical methods are the most reliable means of describing the electronic response in molecules quantitatively. Although no exact analytical solution is available for the Schrödinger equation of many-electron systems, advances in quantum chemical theories and computational technologies have pushed the methods to the stage where near-equilibrium molecular electronic states can be described to chemical precision.

When the intensity of the incident light field is low, the electronic response arises from states whose parity difference corresponds to that of a single photon. For high-intensity incident light fields, however, two or more photons arrive at the material within a short time and interact simultaneously with the electron. The process that describes such situations must therefore involve states whose parity differences correspond to those of multiple photons, and many higher-lying states are accessed. This nonlinear optical response involves a complex analytical formalism, and several practical methodologies have been developed based on the energy or dipole response properties. We can further adapt these methods to consider the system as being initially in a single molecular state, typically the ground state.
However, to describe the electronic response of extended materials quantitatively, we need knowledge of this initial state in the presence of the light field. Therefore, care must be taken to introduce extra flexibility into calculations to allow for this effect. Large delocalized electronic systems are good candidates for NLO materials because they contain many low-lying states that can temporarily be occupied by electrons, perhaps introducing charge-transfer character to the ground state. The nonlinear response of electrons to external fields is often described using such states as intermediates. Therefore, the computational requirements for describing the NLO processes are much more demanding than those for computing just the total energy of the system. Although ab initio correlated methods have been quite successful in providing chemical descriptions of molecules, they are not feasible at present for the evaluation of nonlinear response properties of large systems. Density functional theory (DFT) methods have been shown capable of reproducing and predicting a variety of chemical properties, such as atomization energies, bond lengths, and vibrational frequencies, while requiring much less computational effort than do rigorous ab initio correlated methods.2 Despite their manifold successes in predicting a wide range of chemical properties, DFT has been found to give poor results for some properties, including weakly bound systems and


charge-transfer systems, as well as for the electronic response in large conjugated systems.3 The latter aspect is the subject that we discuss in this chapter, demonstrating how these problems can be overcome to yield effective and practical computational methods for the NLO properties of materials. Traditionally, DFT catastrophically overestimates the rate of increase in the polarizability of a long molecule as its length increases.4,5 This well-known deficiency in evaluating polarizabilities comes from inadequacies in the conventional exchange functionals used in DFT. Conventional exchange–correlation functionals are local and cannot represent correctly the response of the electrons at long distance. The effects are modest in small molecular systems but become nonnegligible in large molecules. Conventional exchange functionals thus fail to evaluate correctly such properties as the polarizability and hyperpolarizability of large molecules. The gradient correction for nonlocality that is commonly applied through the generalized-gradient approximation (GGA) is ineffective in relieving the problem, which instead needs to be solved as a many-body interaction involving different energy levels. Conventional hybrid methods such as B3LYP6 do not improve the situation either, making the search for new functionals a key focus.

A variety of approaches have been developed. The optimized effective potential (OEP) method has been advanced as a solution that seems to provide useful results,7 at least when it is implemented appropriately.8 Unfortunately, the OEP method is rather complicated in that an extra equation must be solved to obtain the optimized potential,9 and this equation is also technically difficult to solve. Care must be taken in the choice of appropriate auxiliary basis functions to represent the extra equation properly; in particular, the use of large basis sets leads to a deterioration of the solution.
Other methods include the Krieger–Li–Iafrate (KLI) approximation10 and the common-energy-denominator approximation (CEDA)11 for large-molecule applications. Unfortunately, these approximations adversely influence calculated response properties even when the ground-state energy is well represented.12 The current density functional theory (CDFT)13,14 provides another alternative for the evaluation of NLO response properties. It predicts reasonable polarizabilities and hyperpolarizabilities15,16 for long molecules (except for hydrogen chains). There has also been a study on the optical properties of molecules using a many-body fxc kernel that yielded good polarizabilities and optical spectra.17 Although such approaches provide deep insights into the origin and evaluation of the NLO properties, their implementation is also rather complicated. Their heavier computational demand also makes these methods less accessible for the large molecules that appear in nano- or biosystems.

Recently, we introduced a simple hybrid method with long-range correction (LC) using an Ewald partitioning technique on the electron repulsion operator to account for the nonlocal effect of long-distance interactions.18,19 The use of this method to evaluate the hyperpolarizabilities of long conjugated systems has been successful.20–24 In this chapter we explain briefly the basic theory for the evaluation of molecular hyperpolarizabilities and describe the LC-DFT method. We also discuss the classic π-electron-only hypothesis and its validity for


conjugated systems in the context of NLO property evaluation. Finally, we discuss the effects of different types of conjugation.

14.2 NONLINEAR OPTICAL RESPONSE THEORY

The response of the electrons in matter to applied optical fields can be measured in terms of the energy $W(E)$ of the electronic system as a function of the electric field $E$ caused by the incident light:

\[ W(E) = W_0 + W^{(1)} E + W^{(2)} E^2 + W^{(3)} E^3 + \cdots \tag{14.1} \]

Here, $W^{(n)}$ is the nth-order energy in the expansion with respect to the applied field. $W^{(1)}$ is related to the permanent dipole moment, and $W^{(2)}, W^{(3)}, \ldots$ are related to the linear and nonlinear polarizabilities, respectively. The total energy of the electronic state in equilibrium with the optical field is well defined and can be computed by solving the time-independent Schrödinger equation. In most approximation schemes employed in quantum chemistry, the solutions are upper bounded and well behaved as the level of theory and basis set are improved. An alternative finite-field expression based on the analogous expansion of the induced dipole,

\[ \mu(E) = \mu_0 + \alpha E + \beta \frac{E^2}{2} + \gamma \frac{E^3}{2 \cdot 3} + \cdots \tag{14.2} \]

is also in common use. Here, the zeroth-order term, $\mu_0$, is the permanent dipole, while α is the linear polarizability and β, γ, . . . are the hyperpolarizabilities. The advantage of the latter approach is that the dipole moment is a physical observable that can be compared directly with experimental results. However, computation of the induced dipole generally involves more elaborate computations.

The direction of the induced moment does not necessarily coincide with that of the applied field, and therefore the expansion coefficients (α, β, γ, . . .) are, in fact, tensors. The key observables, the macroscopic polarization projected against the molecular orientation vectors, are obtained from the ensemble average of the microscopic polarization tensors over the time scale of resolution for the experiments. The shape of the mobile electron cloud is intimately related to the polarization tensor. While all the tensor components are needed, in principle, to evaluate the macroscopic polarization, many NLO materials consist of molecules whose dimension is enlarged in one direction, and thus the corresponding components of the tensors dominate. Since we focus on such a case, that of linearly prolonged conjugated systems, we are concerned primarily with the absolute values of the longitudinal component of α, β, γ, . . . in the expansion above.

To achieve intense electric fields, optical laser beams of specific frequency ω are used. This is modeled using the frequency-dependent Hamiltonian

\[ H_{\mathrm{int}}(\omega) = \mu \cdot \tfrac{1}{2}\left(e^{+i\omega t} + e^{-i\omega t}\right) E \tag{14.3} \]
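Returning briefly to the static expansions, Eqs. (14.1) and (14.2) can be tied together term by term. The short worked step below assumes a static, uniform field and the usual sign convention that the dipole is minus the field derivative of the energy; the fourth-order coefficient $W^{(4)}$ is the next term in the series of Eq. (14.1) and is written out here only for illustration.

```latex
\mu(E) = -\frac{\partial W(E)}{\partial E}
\quad\Longrightarrow\quad
W^{(1)} = -\mu_0, \qquad
W^{(2)} = -\frac{\alpha}{2}, \qquad
W^{(3)} = -\frac{\beta}{2 \cdot 3}, \qquad
W^{(4)} = -\frac{\gamma}{2 \cdot 3 \cdot 4}.
```

Under this convention, the second hyperpolarizability is minus the fourth field derivative of the energy at zero field, which is the quantity a finite-field energy calculation has to differentiate numerically.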


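The induced moment is observed at the frequency of the corresponding NLO process. For example, the induced moment from second-harmonic generation (SHG) is observed at the doubled frequency 2ω, that from third-harmonic generation (THG) is observed at the tripled frequency 3ω, and so on. Therefore, expressions involving only a static electric field E, such as Eqs. (14.1) and (14.2), are inappropriate for a specific NLO process and need to be extended as

\[
\begin{aligned}
\mu(E) ={}& \mu_0 + \alpha_0 E_0 + \alpha(-\omega;\omega) E_\omega e^{i\omega t} \\
&+ \beta_0 \frac{E_0^2}{2} + \beta(-2\omega;\omega,\omega) \frac{E_\omega^2}{2} e^{2i\omega t} + \beta(-\omega;\omega,0) E_0 E_\omega e^{i\omega t} + \cdots \\
&+ \gamma_0 \frac{E_0^3}{2 \cdot 3} + \gamma(-3\omega;\omega,\omega,\omega) \frac{E_\omega^3}{2 \cdot 3} e^{3i\omega t} + \gamma(-2\omega;\omega,\omega,0) \frac{E_\omega^2}{2} E_0 e^{2i\omega t} \\
&+ \gamma(-\omega;\omega,\omega,-\omega) \frac{E_\omega^2 E_{-\omega}}{2} e^{i\omega t} + \cdots
\end{aligned}
\tag{14.4}
\]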
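To see why each process in Eq. (14.4) appears at its own frequency, the following minimal sketch drives a model dipole with a monochromatic field and projects the response onto the first three harmonics. It assumes frequency-independent (static-limit) coefficients and a real field E cos ωt with no static component, so it only illustrates where the 2ω and 3ω signals come from, not the frequency-dependent coefficients of Eq. (14.4). The function names and all numerical values are hypothetical.

```python
import math

def induced_dipole(field, alpha, beta, gamma):
    """Static-limit expansion of Eq. (14.2), without the permanent dipole."""
    return alpha * field + beta * field**2 / 2 + gamma * field**3 / 6

def harmonic_amplitude(signal, k):
    """Cosine amplitude of harmonic k from samples covering one full period."""
    n = len(signal)
    return 2.0 / n * sum(s * math.cos(2 * math.pi * k * j / n)
                         for j, s in enumerate(signal))

# Hypothetical model parameters (atomic units), chosen only for illustration.
ALPHA, BETA, GAMMA, E0 = 100.0, 500.0, 2.0e4, 0.01
N = 64  # samples per optical period

# Dipole sampled over one period of the driving field E0 * cos(omega * t).
samples = [induced_dipole(E0 * math.cos(2 * math.pi * j / N), ALPHA, BETA, GAMMA)
           for j in range(N)]

amp_1w = harmonic_amplitude(samples, 1)  # response at the fundamental
amp_2w = harmonic_amplitude(samples, 2)  # second-harmonic generation
amp_3w = harmonic_amplitude(samples, 3)  # third-harmonic generation
```

In this model the 2ω amplitude equals βE²/4 and the 3ω amplitude equals γE³/24, so a centrosymmetric molecule (β = 0) gives no second-harmonic signal while the third-harmonic signal survives.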

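Typically, the frequency-dependent expansion coefficients, α(−ω; ω), β(−2ω; ω, ω), β(−ω; ω, 0), γ(−3ω; ω, ω, ω), γ(−2ω; ω, ω, 0), γ(−ω; ω, ω, −ω), . . . are formulated in the sum-over-states (SOS) representation using time-dependent perturbation theory as

\[
\alpha(-\omega;\omega) = 2 P_{-\omega,\omega} \sum_{k}^{\prime}
\frac{\langle n|\mu|k\rangle \langle k|H_{\mathrm{int}}(\omega)|n\rangle}{\Omega_{kn} - \omega}
\tag{14.5a}
\]

\[
\beta(-\omega_\sigma;\omega_1,\omega_2) = 3K(-\omega_\sigma;\omega_1,\omega_2)\, P_{-\sigma,1,2} \sum_{k,l}^{\prime}
\frac{\langle n|\mu|l\rangle \langle l|\bar{H}_{\mathrm{int}}(\omega_2)|k\rangle \langle k|H_{\mathrm{int}}(\omega_1)|n\rangle}
{(\Omega_{ln} - \omega_\sigma)(\Omega_{kn} - \omega_1)}
\tag{14.5b}
\]

\[
\begin{aligned}
\gamma(-\omega_\sigma;\omega_1,\omega_2,\omega_3) ={}& 4K(-\omega_\sigma;\omega_1,\omega_2,\omega_3)\, P_{-\sigma,1,2,3} \\
&\cdot \Biggl[ \sum_{k,l,m}^{\prime}
\frac{\langle n|\mu|m\rangle \langle m|\bar{H}_{\mathrm{int}}(\omega_3)|l\rangle \langle l|\bar{H}_{\mathrm{int}}(\omega_2)|k\rangle \langle k|H_{\mathrm{int}}(\omega_1)|n\rangle}
{(\Omega_{mn} - \omega_\sigma)(\Omega_{ln} - \omega_1 - \omega_2)(\Omega_{kn} - \omega_1)} \\
&\quad - \sum_{k,l}^{\prime}
\frac{\langle n|\mu|l\rangle \langle l|H_{\mathrm{int}}(\omega_3)|n\rangle \langle n|H_{\mathrm{int}}(\omega_2)|k\rangle \langle k|H_{\mathrm{int}}(\omega_1)|n\rangle}
{(\Omega_{ln} - \omega_\sigma)(\Omega_{ln} - \omega_1)(\Omega_{kn} + \omega_2)} \Biggr]
\end{aligned}
\tag{14.5c}
\]

Here, $P_{-\sigma,1,2,3,\ldots}$ denotes the average of all terms generated by simultaneous permutations of the frequencies $\omega_\sigma, \omega_1, \omega_2, \omega_3, \ldots$ and the corresponding operators $\mu, H(\omega_1), H(\omega_2), H(\omega_3), \ldots$, while the notation $\sum^{\prime}$ means a summation over all states except the initial state n. The overbar on the intermediate operators is taken here to denote the usual fluctuation operator, $\bar{H}_{\mathrm{int}} = H_{\mathrm{int}} - \langle n|H_{\mathrm{int}}|n\rangle$. The quantity $\Omega_{kl} = \omega_k - \omega_l - \tfrac{1}{2} i \Gamma_{kl}$ is defined by the energy difference of states k and l corrected by a radiative damping factor, a complex number that plays an important role in resonant situations. Also, K(−ωσ; ω1, ω2), K(−ωσ; ω1, ω2, ω3), . . . are the numerical prefactors that depend on the NLO process of interest. The prefactors typically are chosen so that the expressions corresponding to different NLO processes give an identical hyperpolarizability value in the zero-frequency limit. However, care must be taken when the theoretical values thus evaluated are compared with experimental values, since ensemble averaging of microscopic tensor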


components contributes to the observable differently for each experimental setting. As is seen in the equations above, the properties are expressed as a summation of products of transition moments between ground and intermediate excited states, divided by energy denominators. The latter are the energy differences between the electronic states shifted by multiples of the applied frequency, ω, 2ω, 3ω, and so on, in the nonresonant situation. For the resonant-frequency region, the damping factors $\Gamma_{kl}$ play an important role in evaluating the lifetime and lineshape, but in this chapter we are concerned primarily with NLO properties in the nonresonant region where no significant absorption occurs. However, it should be noted that the effects of dispersion on a nonlinear process are enhanced compared to those in a linear process because of the multipliers in the denominators. When the energy difference between the electronic states (the excitation energy) approaches the doubled or tripled frequency of the applied field, the dispersion becomes nontrivial.

While the SOS representation provides a wealth of information concerning the NLO process of interest, it involves an infinite number of intermediate states whose evaluation is impossible in practice. Unfortunately, truncation of the intermediate states is not a successful strategy because the expansion is poorly convergent.25 Of course, it is possible to compute the dynamic properties by directly solving the perturbed equation of appropriate order for the corresponding NLO process at a given frequency, and the frequency-dependent NLO property has been evaluated by the time-dependent coupled Hartree–Fock (TDCHF) method.26 LC-DFT implementation of such an algorithm for NLO property evaluation is in progress.20 We here compute hyperpolarizabilities of long conjugated molecules at zero frequency in order to evaluate their dependence on the length of the molecule.
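The growth of dispersion with the order of the process can be made concrete with a small sum-over-states model. The sketch below assumes the undamped limit (no damping factors), replaces the interaction operator by the dipole operator, and uses a hypothetical set of excitation energies and transition moments. Under those assumptions `sos_alpha` is the permutation-averaged form of Eq. (14.5a), and `detuning_factor` simply tracks a single energy denominator shifted by ω, 2ω, or 3ω.

```python
def sos_alpha(omega, states):
    """Undamped SOS polarizability: sum_k mu_k^2 [1/(W_k - w) + 1/(W_k + w)]."""
    return sum(mu**2 * (1.0 / (w_k - omega) + 1.0 / (w_k + omega))
               for w_k, mu in states)

def detuning_factor(omega, w_k, multiple):
    """Ratio of the static denominator W_k to the shifted one, W_k - multiple * omega."""
    return w_k / (w_k - multiple * omega)

# Hypothetical excited states as (excitation energy, transition moment), atomic units.
STATES = [(0.20, 1.5), (0.30, 0.8), (0.45, 0.4)]
OMEGA = 0.05  # applied frequency, well below the lowest excitation energy

alpha_static = sos_alpha(0.0, STATES)
alpha_dynamic = sos_alpha(OMEGA, STATES)

# Enhancement of one denominator of the lowest state for shifts of w, 2w, and 3w.
factors = [detuning_factor(OMEGA, STATES[0][0], m) for m in (1, 2, 3)]
```

For these illustrative numbers the denominator shifted by 3ω is enhanced by a factor of 4, against roughly 1.33 for the shift by ω, although the applied frequency is only a quarter of the lowest excitation energy. This is the sense in which dispersion matters more for third-harmonic processes than for the linear response.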
In the zero-frequency limit, we can use finite-field techniques based on Eq. (14.1), and therefore almost all quantum chemical methods can be employed. While a property evaluated in the zero-frequency limit may be quite different from that observed at the specific frequency of a given experimental setting, this approach provides much information concerning NLO materials. In the next section we explain our recently developed hybrid DFT method, which introduces a range-dependent partition of the Coulomb force known as the range separation hybrid (RSH) scheme.

14.3 LONG-RANGE-CORRECTED DENSITY FUNCTIONAL THEORY

As explained in the Introduction, DFT is the most appropriate quantum chemical method for large molecular systems such as long conjugated molecules but suffers from a few pertinent problems. To correct for the long-range deficiencies of traditional exchange functionals, a partitioning technique is introduced. Following the original idea of Savin,27 we partition the Coulomb force into short- and long-range parts using the error function,

\[ \frac{1}{r_{12}} = \frac{1 - \operatorname{erf}(\mu r_{12})}{r_{12}} + \frac{\operatorname{erf}(\mu r_{12})}{r_{12}} \tag{14.6} \]

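where μ is a parameter that determines the ratio of the partition. The short-range exchange energy $E_x^{sr}$ is computed by modifying the usual exchange energy expression from

\[ E_x = -\frac{1}{2} \sum_\sigma \int \rho_\sigma^{4/3} K_\sigma \, d^3R \tag{14.7} \]

into

\[
E_x^{sr} = -\frac{1}{2} \sum_\sigma \int \rho_\sigma^{4/3} K_\sigma
\left\{ 1 - \frac{8}{3} a_\sigma \left[ \sqrt{\pi}\, \operatorname{erf}\!\left( \frac{1}{2a_\sigma} \right) + 2a_\sigma (b_\sigma - c_\sigma) \right] \right\} d^3R
\tag{14.8}
\]

where $a_\sigma$, $b_\sigma$, and $c_\sigma$ are

\[ a_\sigma = \frac{\mu K_\sigma^{1/2}}{6 \sqrt{\pi}\, \rho_\sigma^{1/3}} \tag{14.9} \]

\[ b_\sigma = \exp\!\left( -\frac{1}{4a_\sigma^2} \right) - 1 \tag{14.10} \]

\[ c_\sigma = 2a_\sigma^2 b_\sigma + \frac{1}{2} \tag{14.11} \]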

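and $K_\sigma$ is called the enhancement factor. The use of $K_\sigma$ allows the modification of GGA functionals. The long-range part of the exchange energy $E_x^{lr}$ is evaluated using Hartree–Fock (HF) exchange integrals as

\[ E_x^{lr} = -\sum_{i}^{\mathrm{occ}} \sum_{j}^{\mathrm{occ}} (ij|ji)_{lr} \tag{14.12} \]

and

\[
(pq|rs)_{lr} = \left( \psi_p \psi_q \left| \frac{\operatorname{erf}(\mu r_{12})}{r_{12}} \right| \psi_r \psi_s \right)
\tag{14.13}
\]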

where ψiσ is the ith molecular orbital (MO). In contrast to density partitioning schemes such as B3LYP, the proportion of the nonlocal HF contribution varies according to the range of the interaction in the present LC scheme. The ratio of the nonlocal HF part to the local DFT part becomes larger at greater distances, thus including the nonlocal effect more efficiently. In all the DFT calculations using the LC scheme, Becke's exchange and one-parameter functional (BOP) is used with a parameter of μ = 0.47 (ref. 28), except for one example discussed in Section 14.4.1, and all the calculations are performed using the development version of GAUSSIAN03.29
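The way the LC scheme shifts weight from the local functional to HF exchange with distance can be checked numerically. The sketch below is a minimal illustration, not the production implementation: it evaluates the two terms of Eq. (14.6) and the bracketed attenuation factor of Eq. (14.8) from Eqs. (14.9) to (14.11). The function names, the sample distances, and the sample values of the variable a are hypothetical choices made for this example; only μ = 0.47 is taken from the text.

```python
import math

def coulomb_split(r12, mu):
    """Short- and long-range parts of 1/r12 from the erf partition of Eq. (14.6)."""
    short = math.erfc(mu * r12) / r12   # erfc(x) = 1 - erf(x)
    long_ = math.erf(mu * r12) / r12
    return short, long_

def a_sigma(mu, rho, k_sigma):
    """Eq. (14.9): a = mu * K^(1/2) / (6 * sqrt(pi) * rho^(1/3))."""
    return mu * math.sqrt(k_sigma) / (6.0 * math.sqrt(math.pi) * rho ** (1.0 / 3.0))

def short_range_factor(a):
    """Factor in braces in Eq. (14.8), with b and c from Eqs. (14.10) and (14.11)."""
    b = math.exp(-1.0 / (4.0 * a * a)) - 1.0
    c = 2.0 * a * a * b + 0.5
    return 1.0 - (8.0 / 3.0) * a * (
        math.sqrt(math.pi) * math.erf(1.0 / (2.0 * a)) + 2.0 * a * (b - c))

MU = 0.47  # range parameter quoted in the text for LC-BOP

# Share of 1/r12 assigned to the long-range (HF exchange) part at three distances (bohr).
long_fraction = []
for r in (0.5, 2.0, 8.0):
    sr, lr = coulomb_split(r, MU)
    long_fraction.append(lr * r)

# Attenuation of the local exchange energy density for increasing a
# (larger mu or lower density both increase a).
attenuation = [short_range_factor(a) for a in (0.1, 0.5, 1.0, 5.0)]
```

The long-range share rises monotonically with distance, while the attenuation factor falls from about 0.61 at a = 0.1 to nearly zero at a = 5, so low-density regions and large μ both hand the exchange interaction over to the HF part.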


14.4 EVALUATION OF HYPERPOLARIZABILITY FOR LONG CONJUGATED SYSTEMS

14.4.1 Examination of the Classical Hypothesis: The Role of π-Conjugation in Determining NLO Properties

Chemists have categorized conjugated molecules quite differently from other hydrocarbons because of their distinctive reactivities and spectroscopic properties; it is apparent that these molecules have enhanced responses to light irradiation. The reason for this sensitivity has been attributed to their mobile π-electrons, which can move freely through the conjugation pathway in the system. Since the early theoretical development of spectroscopic quantum chemistry, it has been recognized that it is a good approximation to ignore the effects of the many σ-electrons in a large molecule and focus purely on the contribution from the much fewer π-electrons, which provides an enormous computational simplification. With modern software and hardware, such ideas may seem to have become obsolete. However, as is evident from the SOS representation of NLO properties given above, π-electrons do play a role of paramount importance in the nonlinear optical process. An important question is therefore the quantitative reliability of the π-electron approximation in practical NLO applications.

We show in Table 14.1 the longitudinal polarizabilities of polyenes with different lengths evaluated using the π-electron approximation together with those obtained using all electrons. All properties are evaluated by a finite-field method using energies computed by the HF, BLYP, B3LYP, and LC-BLYP (μ = 0.33)19 methods as a function of the applied field, but for the π-electron approximation, the finite field is applied only on the π-space of the Hamiltonian. A systematic difference is found in the absolute values of the polarizabilities; in particular, the π-electron approximation significantly underestimates the property. This comes from the omission of the σ-electron response, with the error increasing as the size of the system increases.

TABLE 14.1 Longitudinal Polarizabilities α (a.u.) of Polyenes Computed by the HF, BLYP, B3LYP, and LC-BLYP Methods Using the 6-31G Basis Set

Molecule     Electrons    HF       BLYP     B3LYP    LC-BLYP
Ethylene     Total        33.66    30.90    26.84    31.06
             π only       21.94    17.31    18.27    17.96
Butadiene    Total        80.91    78.09    70.32    75.08
             π only       63.33    58.81    59.63    55.15
C20H22       Total        1328     2046     1609     1253
             π only       1225     1995     1548     1147

However, the neglected contribution does not increase

with length as much as the contribution from the π-electron part. Consequently, for the longer polyenes, the relative error of the approximation becomes more acceptable. Indeed, for the longer molecules, the variation in the computed value with computational method significantly exceeds the error introduced by the π-electron approximation. It is interesting to note that, for C20H22, even the error caused by the crude STO-3G representation of the space (921 for total and 858 for π-only, compared with 1253 and 1147 for 6-31G LC-BLYP) seems to be similar to or even less than that arising from the deficiency of the conventional DFT functional (1633 for total and 1603 for π-only, compared with 2046 and 1995 for 6-31G BLYP). In Table 14.2 we summarize the longitudinal hyperpolarizabilities of C20H22. For this molecule, the π-electron approximation results in an overestimation, indicating a more complicated mechanism for this NLO process than for the linear response process, even in the interplay between σ- and π-electrons responding to the applied field. The error in the π-electron approximation remains less than the variation with computational methods, however.

TABLE 14.2 Longitudinal Second Hyperpolarizabilities γ (10^7 a.u.) of C20H22 Computed Using the 6-31G Basis Set(a)

Electrons    HF           BLYP         B3LYP        LC-BLYP
Total        2.0 (2.0)    5.8 (5.6)    5.6 (5.5)    2.8 (3.1)
π only       2.3          6.6          6.4          3.2

(a) The values in parentheses were obtained using cc-pVDZ.

14.4.2 Double- and Triple-Bonded Systems

We calculate the hyperpolarizabilities (γ) of polyyne and polyene to examine the NLO properties of different conjugated systems using DFT, HF, and ab initio electron correlation methods such as Møller–Plesset MP2, MP3, and MP4(SDQ) theory30,31 and coupled-cluster CCSD and CCSD(T) theory.32,33 For the geometries of the polyynes H—(C≡C)n—H, a single (C—C) bond length of 1.3650 Å and a triple (C≡C) bond length of 1.2050 Å are used, taken from the averaged experimental values obtained from x-ray diffraction data of the i-Pr3Si—(C≡C)n—Sii-Pr3 (n = 4, 5, 6, and 8) molecules.34 For the polyenes H—(HC=CH)n—H, we used the geometries obtained from B3LYP/6-311G geometry optimizations.4 In all calculations, the cc-pVDZ basis set35 is used. Hyperpolarizabilities γ are computed by the finite-field (FF) method using Eq. (14.1) by numerical Romberg iteration.36 Figure 14.1 and Table 14.3 show, respectively, the γ-values of polyynes obtained using DFT and several wavefunction methods. As reported by other researchers,3 the pure functional [BOP (B88x exchange37 and the one-parameter progressive correlation functional38)] and the hybrid functional, B3LYP,39,40 which do not have long-range correction, overestimate γ-values. The tendency becomes more pronounced as the chain length, n, increases. The LC-DFT (LC-BOP) functional provides γ-values reasonably close to those from the

CCSD and CCSD(T). On the other hand, HF shows the lowest value and MP2 shows the highest value among the wavefunction methods. Figure 14.2 and Table 14.4 show, respectively, the γ-values of polyenes obtained with the DFT and wavefunction methods. Although the complete set of γ-values as a function of chain length n is not presented, key features can be identified. Similar to the results obtained for the polyynes, MP2 predicts the highest and HF the lowest values among the wavefunction methods. The conventional functionals (BOP and B3LYP) also predict large values, while the LC-DFT (LC-BOP) functional again predicts γ-values surprisingly close to those from CCSD and CCSD(T). Overall, MP2 predicts the largest γ-values of all the methods over the entire range of the polyynes and the polyenes, except for the conventional DFT methods, whose hyperpolarizabilities gradually diverge as the chain length increases.

Fig. 14.1 (color online) Longitudinal second hyperpolarizabilities (γ) of the polyynes, H—(C≡C)n—H.

TABLE 14.3 Calculated Longitudinal Second Hyperpolarizabilities (γ/a.u.) of the Polyynes, H—(C≡C)n—H(a)

n (scale)    1 (×10^2)   2 (×10^3)   3 (×10^4)   4 (×10^5)   5 (×10^5)   6 (×10^6)   7 (×10^6)   8 (×10^6)
BOP          6.04        6.84        4.86        2.14        7.13        2.04        4.56        9.91
B3LYP        5.11        7.11        5.03        2.14        6.79        1.76        3.91        7.74
LC-BOP       4.96        8.32        5.42        1.96        5.40        1.20        2.28        3.86
HF           2.42        6.63        4.20        1.50        3.87        0.81        1.42        2.31
MP2          6.10        10.8        6.86        2.56        6.97        1.53        2.84        4.81
MP3          5.84        9.16        5.50        1.94        5.00        1.04        1.87        3.03
MP4          5.32        9.44        5.77        2.07        5.39        1.14        2.05        3.36
CCSD         5.74        9.91        5.82        2.03        5.27        1.10        1.95        3.16
CCSD(T)      5.11        10.2        6.45        2.32        6.29        1.36        2.51        4.17

(a) The numbers in the first row are the unit number n; the factor in parentheses scales the values in that column.
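The finite-field route to γ combines a central-difference estimate of the fourth field derivative of the energy in Eq. (14.1) with Romberg (Richardson) extrapolation over a set of field strengths. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `model_energy` is a hypothetical centrosymmetric stand-in for a quantum chemical energy calculation, its coefficients are invented, the field is doubled between Romberg levels, and γ is taken as minus the fourth derivative of the energy at zero field, consistent with the dipole expansion of Eq. (14.2).

```python
def fourth_difference(energy, h):
    """Central-difference estimate of d^4 W / dE^4 at E = 0 (error of order h^2)."""
    return (energy(2 * h) - 4 * energy(h) + 6 * energy(0.0)
            - 4 * energy(-h) + energy(-2 * h)) / h**4

def romberg_gamma(energy, h0=0.002, levels=3):
    """Extrapolate gamma = -d^4 W / dE^4 to zero field from fields h0 * 2**k."""
    table = [[-fourth_difference(energy, h0 * 2**k)] for k in range(levels)]
    for j in range(1, levels):
        for k in range(levels - j):
            # Each step removes the next even power of h from the error series.
            refined = (4**j * table[k][j - 1] - table[k + 1][j - 1]) / (4**j - 1)
            table[k].append(refined)
    return table[0][-1]

# Hypothetical model coefficients (atomic units), chosen only for illustration.
ALPHA, GAMMA, DELTA = 1.0e3, 2.0e7, 1.0e12

def model_energy(field):
    """Energy of a centrosymmetric model: only even powers of the field appear."""
    return (-ALPHA * field**2 / 2
            - GAMMA * field**4 / 24
            - DELTA * field**6 / 720)

gamma_raw = -fourth_difference(model_energy, 0.002)  # single field strength
gamma_ff = romberg_gamma(model_energy)               # Romberg-extrapolated value
```

For this model the single-field estimate is contaminated by the sixth-order term at the level of a few percent, whereas the extrapolated value recovers the input γ essentially to rounding error. In a real calculation the usable field range is further limited by the numerical precision of the computed energies.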

Fig. 14.2 (color online) Longitudinal second hyperpolarizabilities (γ) of the polyenes, H—(HC=CH)n—H.

TABLE 14.4 Longitudinal Second Hyperpolarizabilities (γ/a.u.) of the Polyenes, H—(HC=CH)n—H(a)

n (scale)    2 (×10^4)   3 (×10^5)   4 (×10^5)   5 (×10^6)   6 (×10^6)   7 (×10^6)   8 (×10^7)   9 (×10^7)   10 (×10^7)
BOP          0.31        0.77        3.43        1.12        3.06        7.27        1.58        3.11        5.75
B3LYP        1.07        0.83        3.71        1.21        3.25        7.54        1.58        3.04        5.50
LC-BOP       1.09        1.05        4.01        1.16        2.67        5.34        0.93        1.50        2.28
HF           0.75        0.68        2.82        0.85        2.04        4.14        0.75        1.27        1.97
MP2          4.40        1.69        7.88        1.82        4.30        8.84        1.55        2.55        3.93
MP3          2.19        1.62        6.16        1.82        3.95        7.12        1.27
MP4          1.81        1.49        5.74        1.72        3.73        6.86        1.21
CCSD         1.52        1.25        4.54        1.38        2.88        5.47        0.90
CCSD(T)      1.35        1.16        4.16        1.32        2.80        5.45        0.93

(a) The numbers in the first row are the unit number n; the factor in parentheses scales the values in that column.
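One way to condense the chain-length dependence in these tables into a single number is a power-law fit, γ = b n^c. The sketch below fits the polyyne BOP, LC-BOP, and CCSD(T) rows of Table 14.3 by least squares on the unscaled γ-values, scanning the exponent c on a grid and solving for the prefactor b in closed form at each c. The fitting procedure and the function name are assumptions made here for illustration, so the exponents it yields need not coincide with fitted values reported elsewhere.

```python
def fit_power_law(ns, gammas, c_min=1.0, c_max=10.0, step=0.001):
    """Least-squares fit of gamma = b * n**c on a grid of exponents; returns (b, c)."""
    best = None
    steps = int(round((c_max - c_min) / step))
    for i in range(steps + 1):
        c = c_min + i * step
        powers = [n**c for n in ns]
        # For a fixed exponent, the optimal prefactor follows from linear least squares.
        b = sum(g * p for g, p in zip(gammas, powers)) / sum(p * p for p in powers)
        sse = sum((g - b * p) ** 2 for g, p in zip(gammas, powers))
        if best is None or sse < best[0]:
            best = (sse, b, c)
    return best[1], best[2]

NS = list(range(1, 9))  # number of repeat units, n = 1 to 8

# Polyyne gamma-values (a.u.) transcribed from Table 14.3.
POLYYNE = {
    "BOP":     [6.04e2, 6.84e3, 4.86e4, 2.14e5, 7.13e5, 2.04e6, 4.56e6, 9.91e6],
    "LC-BOP":  [4.96e2, 8.32e3, 5.42e4, 1.96e5, 5.40e5, 1.20e6, 2.28e6, 3.86e6],
    "CCSD(T)": [5.11e2, 1.02e4, 6.45e4, 2.32e5, 6.29e5, 1.36e6, 2.51e6, 4.17e6],
}

exponents = {name: fit_power_law(NS, values)[1] for name, values in POLYYNE.items()}
```

With this fitting choice the conventional BOP functional gives an exponent above 5, while the LC-BOP exponent lies close to the CCSD(T) one. A fit in log-log space would weight the short chains more heavily and give different numbers, so the fitting procedure should always be stated alongside the exponent.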

On moving from CCSD to CCSD(T), the γ-values of the polyynes change significantly, suggesting that even the CCSD(T) hyperpolarizabilities are not converged with respect to the inclusion of correlation effects (see Fig. 14.3). The calculation of γ for polyynes appears to be a challenging case problem for conventional correlated methods.41 – 44 On the other hand, the differences between the hyperpolarizabilities calculated for the polyenes by CCSD and CCSD(T) are small, perhaps suggesting that the values for the polyenes are nearly converged. Although direct comparison to the experimental values of the absolute values evaluated theoretically should be the final goal for theorists, it is well known that the absolute value of third-order hyperpolarizabilities in the condensed phase is strongly pronounced through intermolecular interactions.43 Some of those effects can be taken conveniently in local field correction, which assumes a continuous medium, but the large deviation of absolute molecular hyperpolarizability values


Fig. 14.3 (color online) Variation in the longitudinal second hyperpolarizabilities (γ) of the polyenes, H—(HC=CH)n —H (n = 6, 7, and 8), and the polyynes, H—(C≡C)n —H (n = 6, 7, and 8) with the calculation method used: HF, MP2, MP3, MP4, CCSD, and CCSD(T).

computed by rigorous ab initio methods from the third-order NLO coefficients observed for those systems suggests that intermolecular interaction effects are paramount under the experimental conditions, and it is clear that much more sophisticated and/or computationally demanding methods must be used to account for them.

To identify the length dependence of the molecular properties, we use a simple power-law function. We fit the calculated γ-values for polyynes n = 1 to 8 and polyenes n = 2 to 10 to the power-law function $\gamma = b n^c$, and the results are given in Table 14.5 [only n = 1 to 8 are used for MP2, MP3, MP4, CCSD, and CCSD(T)]. For both the polyynes and polyenes, the exponents c calculated by the pure and hybrid functionals exceed 5 and are very much larger than those obtained using wavefunction methods. This is consistent with the fact that the γ-values for large molecules calculated using conventional DFT are overestimated;4,5,21 hence, these methods cannot provide reliable information on the length dependence of NLO properties. On the other hand, the exponents evaluated using LC-DFT are rather close to those from CCSD(T) for the polyynes. It is notable that the HF exponent for the polyenes is larger than that from the other wavefunction methods, whereas that for the polyynes is smaller. The hyperpolarizability exponent c observed experimentally for the polyynes, 4.3 (ref. 34), is higher than that for the polyenes, 2.3 to 3.6 (refs. 45, 46). Contrary to the experimental findings, all values computed for the polyenes exceed those for the polyynes.

It is well known that for a reliable comparison with experiment, vibrational NLO effects should be considered. To estimate these contributions for the polyenes and polyynes, we use RHF/6-31G calculated values44 for the ratio of

TABLE 14.5 Values of the γ Power Law (γ = b n^c) for the Polyynes [H—(C≡C)n—H] and the Polyenes [H—(HC=CH)n—H](a)

Polyyne
              γ                                γvib(b)
Method        b              c                 b              c
BOP           79 (±11)       5.64 (±0.066)     145 (±20)      5.29 (±0.071)
B3LYP         176 (±4)       5.14 (±0.011)     171 (±6)       5.13 (±0.018)
LC-BOP        778 (±79)      4.09 (±0.050)     620 (±49)      4.19 (±0.042)
HF            971 (±99)      3.74 (±0.050)     882 (±123)     3.78 (±0.070)
MP2           1086 (±93)     4.04 (±0.042)     937 (±89)      4.09 (±0.050)
MP3           1196 (±125)    3.77 (±0.052)     968 (±85)      3.85 (±0.051)
MP4           1186 (±117)    3.82 (±0.049)     980 (±93)      3.90 (±0.051)
CCSD          1314 (±140)    3.75 (±0.053)     1132 (±148)    3.84 (±0.052)
CCSD(T)       1142 (±119)    3.95 (±0.052)     917 (±82)      4.04 (±0.047)

Polyene
              γ                                γvib
Method        b              c                 b              c
BOP           92 (±3)        5.80 (±0.013)     102 (±12)      5.89 (±0.052)
B3LYP         142 (±4)       5.59 (±0.013)     163 (±18)      5.66 (±0.049)
LC-BOP        1812 (±67)     4.10 (±0.041)     2271 (±109)    4.14 (±0.021)
HF            881 (±86)      4.35 (±0.043)     1129 (±40)     4.38 (±0.015)
MP2           2397 (±198)    4.22 (±0.037)     2994 (±214)    4.25 (±0.032)
MP3(c)        2158 (±282)    4.17 (±0.065)     3711 (±658)    4.06 (±0.088)
MP4(c)        2052 (±201)    4.17 (±0.048)     3655 (±468)    4.04 (±0.063)
CCSD(c)       2345 (±361)    3.97 (±0.076)     4081 (±569)    3.85 (±0.069)
CCSD(T)(c)    1697 (±225)    4.14 (±0.066)     2950 (±382)    4.02 (±0.064)

(a) The values in parentheses are estimates of the fitting error in each method. The cc-pVDZ basis set is used in all calculations.
(b) For polyyne, we included data only for n = 1 to 7 as Ref. 44 does not give values for n = 8.
(c) For polyene, MP3, MP4, CCSD, and CCSD(T) data are used only for n = 1 to 8.

vibrational γ-values (γvib) and the electronic γ-values. In Table 14.5 we also present the exponents c and prefactors b for the polyynes and polyenes calculated using this correction. While more sophisticated vibrational corrections are typically required,1,41 their explicit determination remains impractical for large molecular systems. The vibrational corrections used here do not appear to change the exponents much, but it is noticeable that for the CCSD(T) and LC-DFT methods, which are thought to predict hyperpolarizability values more reliably than the other methods, the exponents for the polyynes become slightly larger than those for the polyenes. This suggests that vibrational corrections may be a key to explaining why the hyperpolarizability exponent observed for the polyynes is higher than that for the polyenes. We expect that more sophisticated vibrational corrections will be able to address this problem.

Besides the vibrational effect, considering the geometries of the polyenes, we also note that the molecules used in the hyperpolarizability measurements have varying conformations,46,47 whereas all-trans —C=C— conformations are used in the calculations.48 Further, the geometries of the polyenes used in the calculations were optimized using B3LYP, a method that underestimates bond-length alternation and hence is expected to overestimate hyperpolarizabilities.6,21 For the polyynes, only one experimental configuration is possible, but various end-group cappings were used in the experiments.45 Another possible explanation for the difference between the observed and calculated length dependences of the hyperpolarizabilities is that the longitudinal second hyperpolarizability, γzzzz, is calculated, whereas the experimental values refer to the isotropic second hyperpolarizability γ.33 Finally, we must keep in mind that the experimental γ-values are also affected by solvent effects that can significantly alter the energies of excited charge-transfer states, effects absent in the calculations.49

14.5 CONCLUSIONS

We revisited the basic response theory for NLO property evaluation of materials using time-dependent perturbation theory to present a basic strategy for the theoretical investigation of NLO materials. Although the SOS representation is intuitive and may be useful for predicting the behavior of NLO properties in the vicinity of a resonance, it is not practical for the nonresonant situations important for the NLO materials of interest. Direct evaluation of dynamic NLO properties by solving the perturbed equation at the frequency of the applied field also involves considerable computational effort. Finite-field studies of the static hyperpolarizability can provide reliable information about NLO materials far from resonance; they are limited, however, in that they cannot provide information about a specific NLO process at the frequency of the applied oscillating field. Because of the deficiencies in conventional DFT functionals, these functionals are not applicable to NLO studies of large conjugated molecules. We introduced a practical method that incorporates long-range corrections into conventional DFT methods. It is based on the simple idea of range-dependent partitioning of the Coulomb interaction. We find that this method provides a


qualitatively correct description of the NLO properties of large molecules without requiring prohibitive computational effort.

We investigated further the validity of the π-electron approximation. This approach is found to be inadequate for evaluating the response properties of small molecules, but for larger systems the dominant terms are properly included, so that the error diminishes in relative magnitude. Indeed, the error from this approximation becomes much smaller than the variation in the results associated with the choice of the computational method. These results provide an optimistic perspective for the theoretical prediction of the properties of NLO materials, since this approximation considerably reduces the computational resources required.

We further investigated the influence of different types of π-conjugation on NLO properties by contrasting polyynes with polyenes. For both systems, LC-DFT gives γ-values close to those predicted by CCSD and CCSD(T), whereas conventional DFT methods such as BOP, as well as hybrid DFT methods such as B3LYP, considerably overestimate the response. MP2 predicts the highest and HF predicts the lowest γ-values among all the wavefunction methods tested. The CCSD and CCSD(T) methods predict similar hyperpolarizabilities for the polyenes but not for the polyynes, indicating that electron correlation may not be described properly in the dense π-electron polyynes. For the power-law exponent c (from the fit $\gamma = b n^c$), LC-DFT also predicts results similar to those of CCSD(T). The theoretical prediction that hyperpolarizabilities increase much faster with increasing length for polyenes than for polyynes is, however, inconsistent with experimental observations. This could arise from the differences in the chemical structures considered, solvent effects, or the approximation that the diagonal hyperpolarizability component dominates the values observed. Even though the vibrational effect considered here has only a small influence on the γ-value and on the power-law exponent, more sophisticated vibrational corrections may resolve the inconsistency between theory and experiment.

Acknowledgments

J.-W.S. is indebted to the postdoctoral fellowship for a foreign researcher of the Japan Society for the Promotion of Science (JSPS). H.S. is grateful for support from the Next Generation Supercomputer Project, Nanoscience Program, MEXT, Japan.

REFERENCES

1. Torrent-Sucarrat, M.; Solà, M.; Duran, M.; Luis, J. M.; Kirtman, B. J. Chem. Phys. 2004, 120, 6346.
2. Koch, W.; Holthausen, M. C. A Chemist’s Guide to Density Functional Theory, Wiley-VCH, New York, 2000.
3. (a) Reimers, J. R.; Cai, Z.-L.; Bilić, A.; Hush, N. S. Ann. N.Y. Acad. Sci. 2003, 1006, 235. (b) Cai, Z.-L.; Sendt, K.; Reimers, J. R. J. Chem. Phys. 2002, 117, 5543.


4. Champagne, B.; Perpète, E. A.; Jacquemin, D.; van Gisbergen, S. J. A.; Baerends, E.-J.; Soubra-Ghaoui, C.; Robins, K. A.; Kirtman, B. J. Phys. Chem. A 2000, 104, 4755.
5. Champagne, B.; Perpète, E. A.; van Gisbergen, S. J. A.; Baerends, E.-J.; Snijders, J. G.; Soubra-Ghaoui, C.; Robins, K. A.; Kirtman, B. J. Chem. Phys. 1998, 109, 10489.
6. Stephens, P. J.; Devlin, F. J.; Chabalowski, C. F.; Frisch, M. J. J. Phys. Chem. 1994, 98, 11623.
7. Sahni, V.; Gruenebaum, J.; Perdew, J. P. Phys. Rev. B 1982, 26, 4371.
8. Mori-Sánchez, P.; Wu, Q.; Yang, W. J. Chem. Phys. 2003, 119, 11001.
9. (a) Kümmel, S.; Perdew, J. P. Phys. Rev. B 2003, 68, 035103. (b) Kümmel, S.; Perdew, J. P. Phys. Rev. Lett. 2003, 90, 043004.
10. Krieger, J. B.; Li, Y.; Iafrate, G. J. Phys. Rev. A 1992, 46, 5453.
11. Gritsenko, O. V.; Baerends, E. J. Phys. Rev. A 2001, 64, 042506.
12. Kümmel, S.; Kronik, L.; Perdew, J. P. Phys. Rev. Lett. 2004, 93, 213002.
13. van Faassen, M.; de Boeij, P. L.; van Leeuwen, R.; Berger, J. A.; Snijders, J. G. J. Chem. Phys. 2003, 118, 1044.
14. van Faassen, M.; Jensen, L.; Berger, J. A.; de Boeij, P. L. Chem. Phys. Lett. 2004, 395, 274.
15. van Faassen, M.; de Boeij, P. L.; van Leeuwen, R.; Berger, J. A.; Snijders, J. G. Phys. Rev. Lett. 2002, 88, 186401.
16. van Faassen, M. Int. J. Mod. Phys. B 2006, 20, 3419.
17. Marini, A.; Del Sole, R.; Rubio, A. In Time-Dependent Density Functional Theory, Lecture Notes in Physics, Vol. 706, Marques, M. A. L., Ullrich, C. A., Nogueira, F., Rubio, A., Burke, K., and Gross, E. K. U., Eds., Springer-Verlag, Berlin, 2006, Chap. 20.
18. Iikura, H.; Tsuneda, T.; Yanai, T.; Hirao, K. J. Chem. Phys. 2001, 115, 3540.
19. Tawada, Y.; Tsuneda, T.; Yanagisawa, S.; Yanai, T.; Hirao, K. J. Chem. Phys. 2004, 120, 8425.
20. Kamiya, M.; Sekino, H.; Tsuneda, T.; Hirao, K. J. Chem. Phys. 2005, 122, 234111.
21. Sekino, H.; Maeda, Y.; Kamiya, M.; Hirao, K. J. Chem. Phys. 2007, 126, 014107.
22. Kirtman, B.; Bonness, S.; Ramirez-Solis, A.; Champagne, B.; Matsumoto, H.; Sekino, H. J. Chem. Phys. 2008, 128, 114108.
23. Song, J.-W.; Watson, M. A.; Sekino, H.; Hirao, K. J. Chem. Phys. 2008, 129, 024117.
24. Song, J.-W.; Watson, M. A.; Sekino, H.; Hirao, K. Int. J. Quantum Chem. 2009, 109, 2012.
25. Sekino, H.; Bartlett, R. J. Theoretical and Computational Modeling of NLO and Electronic Materials, Karna, S. P., and Yeates, A. T., Eds., ACS Symposium Series, 1994, pp. 79–101.
26. Sekino, H.; Bartlett, R. J. J. Chem. Phys. 1986, 85, 976.
27. Savin, A. In Recent Developments and Applications of Modern Density Functional Theory, Seminario, J. J., Ed., Elsevier, Amsterdam, 1996, Chap. 9.
28. Song, J.-W.; Hirosawa, T.; Tsuneda, T.; Hirao, K. J. Chem. Phys. 2007, 126, 154105.
29. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; et al. Gaussian 03, Revision D.02, Gaussian Inc., Wallingford CT, 2004.


30. Sekino, H.; Maeda, Y.; Kamiya, M. Mol. Phys. (Bartlett Special Issue) 2005, 103, 2183.
31. Møller, C.; Plesset, M. S. Phys. Rev. 1934, 46, 618.
32. Bartlett, R. J.; Purvis, G. D., III. Int. J. Quantum Chem. 1978, 14, 561.
33. Pople, J. A.; Krishnan, R.; Schlegel, H. B.; Binkley, J. S. Int. J. Quantum Chem. 1978, 14, 545.
34. Eisler, S.; Slepkov, A. D.; Elliott, E.; Luu, T.; McDonald, R.; Hegmann, F. A.; Tykwinski, R. R. J. Am. Chem. Soc. 2005, 127, 2666.
35. Dunning, T. H., Jr. J. Chem. Phys. 1989, 90, 1007.
36. Jacquemin, D.; Champagne, B.; André, J.-M. Int. J. Quantum Chem. 1997, 65, 679.
37. Becke, A. D. Phys. Rev. A 1988, 38, 3098.
38. Tsuneda, T.; Suzumura, T.; Hirao, K. J. Chem. Phys. 1999, 110, 10664.
39. Lee, C.; Yang, W.; Parr, R. G. Phys. Rev. B 1988, 37, 785.
40. Becke, A. D. J. Chem. Phys. 1993, 98, 5648.
41. Torrent-Sucarrat, M.; Solà, M.; Duran, M.; Luis, J. M.; Kirtman, B. J. Chem. Phys. 2003, 118, 711.
42. Toto, J. L.; Toto, T. T.; de Melo, C. P. Chem. Phys. Lett. 1996, 104, 8586.
43. Bredas, J. L.; Adant, C.; Tackx, P.; Persoons, A.; Pierce, B. M. Chem. Rev. 1994, 94, 243.
44. Kirtman, B.; Champagne, B. Int. Rev. Phys. Chem. 1997, 16, 389.
45. Luu, T.; Elliott, E.; Slepkov, A. D.; Eisler, S.; McDonald, R.; Hegmann, F. A.; Tykwinski, R. R. Org. Lett. 2005, 7, 51.
46. Samuel, I. D. W.; Ledoux, I.; Dhenaut, C.; Zyss, J.; Fox, H. H.; Schrock, R. R.; Silbey, R. J. Science 1994, 265, 1070.
47. Craig, G. S. W.; Cohen, R. E.; Schrock, R. R.; Silbey, R. J.; Puccetti, G.; Ledoux, I.; Zyss, J. J. Am. Chem. Soc. 1993, 115, 860.
48. Rossi, G.; Chance, R. R.; Silbey, R. J. Chem. Phys. 1989, 90, 7594.
49. Ray, P. C. Chem. Phys. Lett. 2004, 395, 269.

15

Calculating the Raman and HyperRaman Spectra of Large Molecules and Molecules Interacting with Nanoparticles

NICHOLAS VALLEY Northwestern University, Evanston, Illinois

LASSE JENSEN Pennsylvania State University, University Park, Pennsylvania

JOCHEN AUTSCHBACH University at Buffalo–SUNY, Buffalo, New York

GEORGE C. SCHATZ Northwestern University, Evanston, Illinois

This chapter describes calculations of the Raman and hyperRaman spectra of large molecules and molecules interacting with nanoparticles using time-dependent density functional theory with the Amsterdam density functional (ADF) program package. The ADF code uses Slater basis functions, which provide a very efficient basis set for optical property calculations using density functional theory (DFT). In addition, ADF has special capabilities for determining resonant Raman spectra. These are enabled by the inclusion of excited-state lifetimes in the calculations, so that polarizabilities and polarizability derivatives for wavelengths close to resonance can be determined. Specific details of the theory are described, and examples of applications to pyridine (for nonresonant properties) and uracil (for resonant properties) are provided.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


15.1 INTRODUCTION

Raman spectroscopy is an inelastic linear light-scattering method that provides a vibrational fingerprint of a molecule. This fingerprint can be used to identify molecules, so there has been increasing interest in using Raman in analytical chemistry applications and medical diagnostics,1–6 particularly with the development of lasers and detectors that allow Raman measurements to be made over a wide range of wavelengths from the near-infrared to the ultraviolet (UV). HyperRaman spectroscopy is an analogous optical technique that involves inelastic light scattering relative to the second harmonic of the incident light. This nonlinear optical technique also provides a vibrational fingerprint, but for an incident frequency that is only half of the frequency needed to produce the same scattered photon in Raman scattering.7–10 In addition, the selection rules for hyperRaman are different from those for Raman, as the latter involves two photons (incident and scattered) while the former involves three photons (two incident plus one scattered). This means that vibrations that are silent in Raman can become active in hyperRaman.

Both techniques are intrinsically weak processes; however, both can be amplified by placing the molecule next to a silver or gold nanoparticle, as plasmon excitation in the particle can produce enhanced electromagnetic fields near the particle surface, leading to surface-enhanced Raman and hyperRaman spectroscopies (SERS and SEHRS, respectively).11,12 Enhancement can also arise if the molecule has a resonant electronic state at the excitation wavelength, leading to resonance Raman and resonance hyperRaman spectroscopy. Under favorable conditions it is possible to combine resonance and surface enhancement effects, leading to surface-enhanced resonance Raman spectroscopy (SERRS) and surface-enhanced resonance hyperRaman spectroscopy (SERHRS).13,14

Raman intensities are proportional to the square of the derivative of the polarizability of the molecule with respect to vibrational normal coordinates,15 so the calculation of Raman intensities requires a determination of the frequency-dependent polarizabilities, usually by determining the first-order response of the molecule to the applied electromagnetic field. Many electronic structure codes have the ability to produce Raman spectra in the static limit (low frequency) through analytical determination of the polarizability derivative. This works well for small molecules that do not have important electronic transitions in the visible. However, for larger systems, especially for molecules with transitions at optical frequencies, or for molecules interacting with metal particles (as in SERS), this approximation is not appropriate.

In this chapter we describe calculations of Raman intensities based on the Amsterdam density functional (ADF) code,16–18 a code specifically developed to determine response properties using time-dependent density functional theory (TDDFT). The basics of density functional theory (DFT) and TDDFT are described in detail in Chapter 1. ADF and a recently developed local version of ADF have some unique features for calculating Raman, resonance Raman, and SERS intensities at finite frequencies.19–21 ADF can also determine hyperRaman intensities, but


in an automated fashion only in the static limit at this point. The capability of calculating dynamic hyperpolarizabilities is available22–24 and will soon be combined with near-resonance damping functionality. In either case, ADF provides an efficient approach to studying large molecular systems due to the use of Slater orbital basis functions in the calculations. These functions mimic the slow fall-off of atomic orbitals, a property that is especially important for response properties, much better than Gaussian orbitals do. Hence, they provide a more efficient representation of the change in density that arises in response to an applied electromagnetic field. As such, ADF enables the determination of Raman intensities for a number of challenging problems,25 including studies of the resonance Raman scattering for molecules with multiple excited states,20 and the study of SERS intensities for molecules interacting with silver and gold metal clusters.26,27

In all these SERS calculations, the atoms in the molecule and in the metal cluster are described using basis sets of comparable quality and the same density functional [the same combination of exchange–correlation (XC) potential and XC response kernel]. This has the advantage of providing a completely balanced electronic structure description of the entire system, but a limitation of this approach is that the calculations are restricted to a total system size on the order of 100 to 200 atoms. To go beyond this requires methods that partition the system into components that are described with quantum mechanics and components that are described using classical electrodynamics. The formal theory of such calculations was recently developed28 but has not yet been implemented.
The Raman intensity calculation begins with a determination of the harmonic frequencies and normal coordinates of vibration for the molecule of interest by using density functional theory to calculate the Hessian matrix (second derivative of the energy with respect to the nuclear positions). Diagonalization of the mass-weighted Hessian determines the vibrational frequencies, and the eigenvectors define the normal coordinates. Subsequently, the polarizabilities (second derivative of the energy with respect to the applied field) are determined from TDDFT. For the Raman intensity, the polarizability calculations are performed for geometries that are displaced from equilibrium so that the derivatives of the polarizability with respect to each normal mode vibration can be calculated by finite differencing. Both normal Raman differential cross sections and relative surface-enhanced Raman intensities can be calculated from combinations of the polarizability derivatives. This approach can also be expanded to allow for the calculation of resonance Raman spectra. HyperRaman and surface-enhanced Raman spectra can also be calculated using ADF.

While the use of finite differencing may seem to be inefficient relative to the analytical evaluation of the polarizability derivatives, for large molecules one often does not want or need derivatives with respect to all the modes. Indeed, for applications in SERS, where the system of interest is a molecule plus a large metal cluster, only a small fraction of the possible modes, those referring to vibrations of the molecule, is of interest, and in any case the finite-difference procedure is trivially parallelized.
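The first step of this workflow, diagonalizing the mass-weighted Hessian, can be sketched in a few lines. The example below is only an illustration on a hypothetical one-dimensional, CO2-like chain of three masses joined by two identical springs; the function name `normal_modes`, the force constant, and the masses are assumptions chosen for the sketch and are not taken from ADF or from this chapter.

```python
import numpy as np

AMU_TO_ME = 1822.888486209        # electron masses per atomic mass unit
HARTREE_TO_CM1 = 219474.6313632   # wavenumbers (cm^-1) per hartree


def normal_modes(hessian, masses_amu):
    """Diagonalize the mass-weighted Hessian.

    hessian    : (M, M) second derivatives of the energy, hartree/bohr^2
    masses_amu : length-M masses, one entry per Cartesian coordinate
    Returns (wavenumbers in cm^-1, Cartesian displacements as columns).
    """
    m = np.asarray(masses_amu, dtype=float) * AMU_TO_ME
    inv_sqrt_m = 1.0 / np.sqrt(m)
    # H_mw[i, j] = H[i, j] / sqrt(m_i * m_j)
    mw_hessian = np.asarray(hessian, dtype=float) * np.outer(inv_sqrt_m, inv_sqrt_m)
    eigvals, eigvecs = np.linalg.eigh(mw_hessian)
    # Eigenvalues are squared angular frequencies (atomic units); negative
    # eigenvalues are reported as negative wavenumbers.
    omega = np.sign(eigvals) * np.sqrt(np.abs(eigvals))
    # Convert mass-weighted eigenvectors back to Cartesian displacements.
    cartesian = eigvecs * inv_sqrt_m[:, None]
    return omega * HARTREE_TO_CM1, cartesian


# Hypothetical 1D O-C-O chain with force constant k (hartree/bohr^2)
k = 1.0
hess = k * np.array([[1.0, -1.0, 0.0],
                     [-1.0, 2.0, -1.0],
                     [0.0, -1.0, 1.0]])
masses = [15.995, 12.0, 15.995]
wn, modes = normal_modes(hess, masses)
print(wn)  # translation (about 0), symmetric stretch, asymmetric stretch
```

For a real molecule the Hessian has dimension 3N, each atomic mass is repeated for the x, y, and z coordinates, and the translational and rotational modes appear as near-zero eigenvalues.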


In the following sections we describe the underlying theory of Raman/hyperRaman intensity calculations, and specific details of these calculations based on the ADF code.

15.2 DISPLACEMENT OF COORDINATES ALONG NORMAL MODES

To construct the Raman or hyperRaman spectrum of a computationally large molecule, it is first necessary to calculate the vibrational normal modes of an optimized local minimum structure. Details of these calculations, which involve diagonalization of the mass-weighted Hessian matrix (second derivative of energy with respect to atomic coordinates), are described in standard textbooks, so we omit the steps here. The Raman intensity is easily calculated by making the double harmonic approximation. This approximation is composed of two parts.29 First, each vibration is assumed to be described by a harmonic potential (i.e., a linear expression for the intramolecular forces). Second, the dipole moment function $\mu(r)$ is assumed to vary linearly with the normal mode coordinate $r$ in the region where $r$ is close to the equilibrium structure, denoted by $r_e$. In ADF it is possible to calculate the energies (in wavenumbers, cm⁻¹) and Cartesian displacements (in bohr) of the vibrational normal modes of a molecule by using the FREQUENCIES keyword under the GEOMETRY block.

To calculate the polarizability derivatives, the components of the polarizability tensor are calculated at two structures that have been displaced in opposite directions along a vibrational mode. Starting with the equilibrium geometry, the coordinates $R_{\mathrm{eq},i}$ of each atom are changed by a small amount $\pm s_R R_{k,i}$, where $R_{k,i}$ is the Cartesian displacement of the ith coordinate in the kth vibrational normal mode and $s_R$ is the step size. Ideally, $s_R$ should be mode specific, so that a shallower potential (low harmonic frequency) is treated with a somewhat larger displacement.30 The step size $s_R$ should be chosen so that the norm (root of the sum of squares) of $s_R R_{k,i}$ for each k is on the order of a few hundredths of a bohr.31 If $s_R$ is too large, the double harmonic approximation breaks down, while if it is too small, there will not be an appreciable change in the polarizability tensor.
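The displacement step can be illustrated with a short sketch. It scales the Cartesian displacement vector of one mode so that the norm of the applied shift equals a chosen target and returns the two displaced geometries (referred to in this chapter as the minus and plus structures). The function name `displaced_structures` and the default target of 0.02 bohr are assumptions of this sketch, picked only to fall within "a few hundredths of a bohr"; the two-atom geometry is a made-up example.

```python
import numpy as np


def displaced_structures(r_eq, mode, target_norm=0.02):
    """Build minus and plus structures along one normal mode.

    r_eq        : (N, 3) equilibrium Cartesian coordinates in bohr
    mode        : (N, 3) Cartesian displacements R_k of mode k in bohr
    target_norm : desired norm of s_R * R_k in bohr (illustrative default)
    Returns (minus, plus, s_R).
    """
    r_eq = np.asarray(r_eq, dtype=float)
    mode = np.asarray(mode, dtype=float)
    mode_norm = np.linalg.norm(mode)  # root of the sum of squares
    if mode_norm == 0.0:
        raise ValueError("mode displacement vector is zero")
    s_r = target_norm / mode_norm
    minus = r_eq - s_r * mode
    plus = r_eq + s_r * mode
    return minus, plus, s_r


# Hypothetical diatomic along z with a stretching mode
r_eq = np.array([[0.0, 0.0, 0.0],
                 [0.0, 0.0, 2.1]])
mode = np.array([[0.0, 0.0, -1.0],
                 [0.0, 0.0, 1.0]])
minus, plus, s_r = displaced_structures(r_eq, mode)
print(s_r)
print(plus - minus)  # equals 2 * s_R * R_k
```

Making `target_norm` an argument keeps the step mode specific: a softer mode can simply be given a somewhat larger value.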
A step size that is either too large or too small therefore leads to errors in the polarizability derivatives and thus in the calculated Raman intensities. Once a suitable $s_R$ has been chosen, the equilibrium coordinates are displaced to obtain two sets of coordinates. The set created by using $R_{\mathrm{eq},i} - s_R R_{k,i}$ will be denoted as the minus structure, and the set created by using $R_{\mathrm{eq},i} + s_R R_{k,i}$ will be denoted as the plus structure. Polarizability derivatives are then calculated by finite differencing.

15.3 CALCULATION OF POLARIZABILITIES USING TDDFT

Polarizabilities can be calculated using time-dependent DFT (TDDFT) response theory. In the ADF program, this functionality can be reached by specifying


the input “block” keyword RESPONSE or AORESPONSE [conveniently, also via the graphical user interface (GUI)]. ADF input files consist of a list of keywords (e.g., BASIS, ATOMS, GEOMETRY) which provide the program with specifics of the chemical system (e.g., charge, atomic positions), the type of calculation desired (e.g., geometry optimization, Hessian matrix diagonalization), and specifics of the calculation (e.g., basis set, level of theory). Many keywords have specific options that can be enumerated on the lines following the keyword, forming a block that is ended with the line END. For more details on using ADF, refer to the documentation, including a user guide and input examples, available at http://www.scm.com.

The RESPONSE keyword triggers the original implementation of TDDFT response theory by van Gisbergen et al.,16,22 which is capable of using symmetry. AORESPONSE triggers a more recently developed code32,33 that offers additional functionality, such as the near-resonance dynamic response capability19,34 or enhanced analysis features,23,24 but lacks symmetry. Both blocks allow calculation of frequency-dependent polarizabilities, but the AORESPONSE block is needed to calculate resonance Raman spectra. For the examples in this chapter the RESPONSE key was used to calculate hyperpolarizabilities, from which hyperRaman spectra can be predicted. In our explanations of how to calculate Raman spectra, use of the AORESPONSE block will be assumed. Any specifics for calculating hyperRaman spectra will assume use of the RESPONSE block. In an upcoming version of the program the hyperRaman and resonance hyperRaman functionality will be combined with the AORESPONSE functionality.
An example of an input (more example inputs can be found in the supporting information) to calculate the static polarizability tensor of a displaced structure of pyridine using the AORESPONSE block is as follows:

    BASIS
      C /share/apps/adf2007.01/atomicdata/ET/DIFFUSE/ET-QZ3P-polar/C
      H /share/apps/adf2007.01/atomicdata/ET/DIFFUSE/ET-QZ3P-polar/H
      N /share/apps/adf2007.01/atomicdata/ET/DIFFUSE/ET-QZ3P-polar/N
    END

    DEPENDENCY

    XC
      model SAOP
    END

    ATOMS
      N    0.000000    0.000002    0.043787
      C    0.000000   -0.000002    2.855245
      C    0.000000    1.197479    2.143113
      C    0.000000   -1.197480    2.143111
      C    0.000000    1.141525    0.748759
      C    0.000000   -1.141523    0.748756
      H    0.000000    2.064563    0.165373
      H    0.000000   -2.064560    0.165368
      H    0.000000    2.159648    2.654667
      H    0.000000   -2.159650    2.654661
      H    0.000000   -0.000003    3.944999
    END

    SYMMETRY NOSYM

    AORESPONSE
      ALDA
    END

    END INPUT

The real parts of the nine Cartesian components of the frequency-dependent polarizability tensor of an input structure are calculated if the AORESPONSE block is included in the input. When an external potential $\nu_{\mathrm{ext},i}(\mathbf{r}, t) = E r_i \cos\omega t$ is applied to a molecule, the components of the polarizability, $\alpha_{ij}(\omega)$, can be determined from the change in the electron density using

$$\alpha_{ij}(\omega) = -\int d^3r\, \rho_i^{(1)}(\mathbf{r}, \omega)\, r_j \qquad (15.1)$$

where i and j are the Cartesian directions and $\rho_i^{(1)}(\mathbf{r}, \omega)$ is the linear change in density (linear response) due to the external potential.35 In TDDFT, the density change is found using the linear response function of the noninteracting Kohn–Sham system $\chi_s(\mathbf{r}, \mathbf{r}', \omega)$ and the linear change in the effective potential $\nu_{\mathrm{eff}}^{(1)}(\mathbf{r}, \omega)$ with the relation35

$$\rho^{(1)}(\mathbf{r}, \omega) = \int d\mathbf{r}'\, \chi_s(\mathbf{r}, \mathbf{r}', \omega)\, \nu_{\mathrm{eff}}^{(1)}(\mathbf{r}', \omega) \qquad (15.2)$$

where for the potential given above, the external field part of the linear perturbation operator ($\nu_{\mathrm{ext}}^{(1)}$ below) is obtained through division by the field amplitude. In the absence of finite-lifetime or other damping terms, the expression for the Kohn–Sham response function, constructed from the occupied and virtual Kohn–Sham orbitals ($\varphi$), energies ($\varepsilon$), and occupation numbers ($n$), is35

$$\chi_s(\mathbf{r}, \mathbf{r}', \omega) = \sum_{i}^{\mathrm{occ.}} \sum_{m}^{\mathrm{virt.}} n_i\, \varphi_i(\mathbf{r})\, \varphi_m(\mathbf{r})\, \varphi_m(\mathbf{r}')\, \varphi_i(\mathbf{r}') \times \left[ \frac{1}{(\varepsilon_i - \varepsilon_m) + \omega} + \frac{1}{(\varepsilon_i - \varepsilon_m) - \omega} \right] \qquad (15.3)$$
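To see how Eqs. (15.1) to (15.3) combine, it can help to evaluate them in the simplest possible limit, where the effective potential is replaced by the external one (the uncoupled approximation, $\nu_{\mathrm{eff}}^{(1)} = \nu_{\mathrm{ext}}^{(1)} = r_i$). Inserting (15.3) into (15.2) and (15.1) then gives $\alpha_{ij}(\omega) = -\sum_i \sum_m n_i \langle i|r_i|m\rangle \langle i|r_j|m\rangle [1/((\varepsilon_i - \varepsilon_m) + \omega) + 1/((\varepsilon_i - \varepsilon_m) - \omega)]$ for real orbitals. The sketch below evaluates this sum for a hypothetical two-level model; the function name `uncoupled_polarizability`, the orbital energies, and the transition dipole are illustrative assumptions, and the uncoupled limit is a simplification rather than the coupled TDDFT procedure used in ADF.

```python
import numpy as np


def uncoupled_polarizability(eps_occ, eps_virt, dip, omega=0.0, occ=2.0):
    """Uncoupled polarizability tensor from Eqs. (15.1)-(15.3).

    eps_occ  : (n_occ,) occupied orbital energies (hartree)
    eps_virt : (n_virt,) virtual orbital energies (hartree)
    dip      : (n_occ, n_virt, 3) transition dipoles <i|r|m> (atomic units)
    omega    : frequency in hartree (must stay away from the poles)
    occ      : occupation number n_i, taken equal for all occupied orbitals
    """
    eps_occ = np.asarray(eps_occ, dtype=float)
    eps_virt = np.asarray(eps_virt, dtype=float)
    dip = np.asarray(dip, dtype=float)
    gap = eps_occ[:, None] - eps_virt[None, :]   # (eps_i - eps_m), negative
    weight = occ * (1.0 / (gap + omega) + 1.0 / (gap - omega))
    # alpha_ab = - sum_{i,m} weight_im * <i|r_a|m> * <i|r_b|m>
    return -np.einsum("im,ima,imb->ab", weight, dip, dip)


# Hypothetical two-level model: one occupied and one virtual orbital,
# excitation energy 0.5 hartree, transition dipole 1 a.u. along z.
dipoles = np.zeros((1, 1, 3))
dipoles[0, 0, 2] = 1.0
alpha_static = uncoupled_polarizability([-0.5], [0.0], dipoles)
alpha_dynamic = uncoupled_polarizability([-0.5], [0.0], dipoles, omega=0.25)
print(alpha_static[2, 2], alpha_dynamic[2, 2])
```

In this model the zz component grows as the frequency approaches the excitation energy, which is the divergence that the finite-lifetime damping is designed to remove.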


When adopting the finite lifetime damping technique, the frequencies are formally replaced by $\omega \rightarrow \omega + i\gamma$, where $\gamma$ is a common damping parameter, and thus the response function as well as the linear density response become complex. This allows calculation of both the real and imaginary parts of the polarizability. The change in effective potential is35

$$\nu_{\mathrm{eff}}^{(1)}(\mathbf{r}, \omega) = \nu_{\mathrm{ext}}^{(1)}(\mathbf{r}, \omega) + \int d\mathbf{r}'\, \frac{\rho^{(1)}(\mathbf{r}', \omega)}{|\mathbf{r} - \mathbf{r}'|} + \int d\mathbf{r}'\, f_{xc}(\mathbf{r}, \mathbf{r}', \omega)\, \rho^{(1)}(\mathbf{r}', \omega) \qquad (15.4)$$

and contains terms for the external field as noted above, the linear response of the Coulomb potential, and the linear response of the exchange-correlation potential; $f_{xc}$ is called the exchange-correlation kernel. The change in the effective potential is constructed in such a way that it would result in the correct change in density for the fully interacting system even though the noninteracting response function is being used, provided that the exact expression for the XC kernel were known. Of course, in practice, this is the term that gets approximated. In most cases an adiabatic approximation is used (i.e., one uses a frequency-independent $f_{xc}$, which neglects all memory effects). With the adiabatic approximation, XC kernels can be obtained simply by taking functional derivatives of the XC potential used for the ground-state calculation, based on popular functionals such as VWN, LYP, BP86, B3LYP, and PBE0. It is particularly efficient to use an XC kernel based on a local-density approximation (LDA) such as the VWN or Xalpha functional (ALDA keyword in the AORESPONSE block, default in RESPONSE). The adiabatic LDA (ALDA) exchange-correlation kernel $f_{xc}$ used in the examples is local in space and time.35 With a hybrid functional the kernel contains some nonlocal Hartree–Fock exchange. An implementation based on ADF’s Slater-type basis and density-fitting approach has been reported by Ye et al.23

The last two terms in the expression for the change in effective potential depend on the change in the density. Calculation of the density change must therefore be done in a self-consistent manner. The initial density change is calculated using $\nu_{\mathrm{eff}}^{(1)} = \nu_{\mathrm{ext}}^{(1)}$. Then the new effective field is determined using the updated density change. A new density change is calculated using the new effective field, and the cycle continues until the change in the density change is below a set threshold.
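The self-consistent cycle described above can be sketched on a small discretized model in which the response function and the combined Coulomb plus XC kernel are plain matrices, so that Eqs. (15.2) and (15.4) read ρ = χ_s (ν_ext + K ρ). The function name `solve_density_response`, the 2×2 matrices, and the plain fixed-point iteration without any convergence acceleration are assumptions of this illustration and do not represent the algorithm implemented in ADF.

```python
import numpy as np


def solve_density_response(chi_s, kernel, v_ext, tol=1e-10, max_iter=500):
    """Iterate rho = chi_s @ (v_ext + kernel @ rho) to self-consistency.

    chi_s  : (M, M) noninteracting response matrix
    kernel : (M, M) Coulomb plus XC kernel matrix
    v_ext  : (M,) external perturbation
    Returns (rho, number of iterations used).
    """
    rho = chi_s @ v_ext                      # first guess: v_eff = v_ext
    for iteration in range(1, max_iter + 1):
        v_eff = v_ext + kernel @ rho         # update the effective field
        rho_new = chi_s @ v_eff              # new density change
        if np.linalg.norm(rho_new - rho) < tol:
            return rho_new, iteration
        rho = rho_new
    raise RuntimeError("density response did not converge")


# Hypothetical two-point model (all numbers are made up)
chi_s = np.array([[-0.3, 0.1],
                  [0.1, -0.2]])
kernel = np.array([[1.0, 0.5],
                   [0.5, 1.0]])
v_ext = np.array([1.0, -1.0])

rho_scf, n_iter = solve_density_response(chi_s, kernel, v_ext)
# Reference: direct solution of (1 - chi_s K) rho = chi_s v_ext
rho_direct = np.linalg.solve(np.eye(2) - chi_s @ kernel, chi_s @ v_ext)
print(n_iter, rho_scf, rho_direct)
```

This simple iteration converges only when the spectral radius of χ_s K is below one, which is one reason production codes add acceleration and stabilization schemes.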
As in other self-consistent field codes, the iterations incorporate procedures to accelerate and stabilize the solution such that convergence is virtually guaranteed.36,37 The number of iterations and the convergence threshold can be set in the SCF block. With the change in density converged, the polarizability components are calculated. Similar procedures are adopted for calculating electric hyperpolarizabilities; see articles by van Gisbergen et al.16,22 and Ye et al.23,24 for further details regarding implementations in the ADF package and benchmark data.

To calculate Raman spectra for nonresonant molecules where the frequency dependence of the polarizability derivatives is weak, it is often sufficient to calculate the static polarizabilities (polarizabilities at zero frequency: ω = 0) and include a nonzero frequency only when calculating cross sections.38,39 (There are prefactors in the cross sections that cause the cross sections to be zero at zero frequency, so a finite-frequency estimate of the cross sections requires inputting a frequency other than zero.) If the frequency-dependent polarizabilities are desired, the FREQUENCY keyword can be put into AORESPONSE, followed by a list of frequencies and the units (EV, HARTREE, or ANGSTROM) used for the frequencies. The necessary components of the static hyperpolarizability are obtained by adding the ALLCOMPONENTS and HYPERPOL keywords to the RESPONSE block. To obtain frequency-dependent hyperpolarizabilities, the DYNAHYP keyword must also be added to the block and the frequency in hartrees must be specified after the HYPERPOL keyword. Use of the DEPENDENCY keyword in the input as well as specifying SYMMETRY NOSYM is suggested for both types of calculations.

15.4 DERIVATIVES OF THE POLARIZABILITIES WITH RESPECT TO NORMAL MODES

With either the static or frequency-dependent polarizabilities at hand for both the plus and minus structures of a normal mode, the polarizability derivatives can be calculated as the change in polarizability divided by twice the normal-mode step size. This step size, $s_{Q_k}$, is different from the step size $s_R$ used earlier to make the displaced structures, and must be calculated separately for each normal mode. Note that this step size is not provided by ADF. The two step sizes are related by the equation

\[
s_{Q_k} = \frac{s_R}{\sqrt{\displaystyle\sum_{i}^{3N}\left(\frac{R_i}{\sqrt{\sum_{j}^{3N}\left(R_j\sqrt{m_j}\right)^{2}}}\right)^{2}}} = \frac{s_R}{R^{\mathrm{norm}}/Q^{\mathrm{norm}}}
\tag{15.5}
\]

where $m_i$ is the mass of the atom being displaced by $R_i$, and $Q^{\mathrm{norm}}$ is the square root of the sum of the squares of the mass-weighted displacements.31 The coordinates were displaced both backward and forward along the vibration, so the change in the polarizabilities must be divided by twice the $s_{Q_k}$ step size. The polarizability derivatives, $\alpha'_{ij}$, are therefore given as

\[
\alpha'_{ij} = \frac{\alpha_{ij}(\mathrm{plus}) - \alpha_{ij}(\mathrm{minus})}{2 s_{Q_k}}
\tag{15.6}
\]
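As a minimal sketch of Eqs. (15.5) and (15.6), the following Python functions compute the normal-mode step size from the Cartesian displacements and then form the central-difference polarizability derivative. The function names and the input layout (flat lists of 3N displacement components and matching masses, 3x3 nested lists for the tensors) are assumptions made for this illustration; none of this is part of ADF.

```python
import math


def mode_step_size(s_r, displacements, masses):
    """Eq. (15.5): s_Q = s_R / (R_norm / Q_norm) for one normal mode.

    s_r           : Cartesian step size used to build the displaced structures
    displacements : the 3N Cartesian displacement components R_i of the mode
    masses        : the mass belonging to each component (3N values, amu)
    """
    r_norm = math.sqrt(sum(r * r for r in displacements))
    # Q_norm: root of the summed squares of the mass-weighted displacements.
    q_norm = math.sqrt(sum(m * r * r for r, m in zip(displacements, masses)))
    return s_r * q_norm / r_norm


def polarizability_derivative(alpha_plus, alpha_minus, s_q):
    """Eq. (15.6): central difference over the plus and minus structures."""
    return [[(p - m) / (2.0 * s_q) for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(alpha_plus, alpha_minus)]


# Illustrative numbers only: two atoms of mass 4 amu moving against each other.
s_q = mode_step_size(0.01, [0.0, 0.0, 1.0, 0.0, 0.0, -1.0], [4.0] * 6)
alpha_plus = [[10.2, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 12.4]]
alpha_minus = [[9.8, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 11.6]]
d_alpha = polarizability_derivative(alpha_plus, alpha_minus, s_q)
```

With displacements in bohr and masses in amu, `s_q` carries units of bohr times the square root of amu, so the derivative comes out in square bohr per square root of amu, as stated in the text.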

Polarizabilities in ADF are reported as polarizability volumes in atomic units and so have units of cubic bohr. By calculating $s_{Q_k}$ using the displacements in bohr and the masses in atomic mass units, the polarizability derivatives will have units of square bohr per square root of amu. Hyperpolarizabilities are also given in atomic units (quintic bohr per electron charge), which can be converted to quintic angstroms per electrostatic unit and then to quartic angstroms per statvolt.40 The components of the polarizability are also given with respect to a molecule-fixed coordinate frame. Results for the specific components will therefore vary if the molecule coordinates are transformed with respect to this frame. Although there are times when the molecular orientation is important, most manipulations to produce spectra are invariant to orientation as they involve orientation averaging.
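The length conversion implied by these units is easy to get wrong by a power of the bohr radius, so a short sketch may help. It assumes the CODATA value of the bohr radius in angstroms; the function names are illustrative only.

```python
BOHR_TO_ANGSTROM = 0.529177210903  # CODATA 2018 bohr radius in angstroms


def derivative_au_to_angstrom(d_alpha_au):
    """Convert a polarizability derivative from bohr^2/sqrt(amu) to angstrom^2/sqrt(amu)."""
    return d_alpha_au * BOHR_TO_ANGSTROM ** 2


def squared_derivative_au_to_angstrom(value_au):
    """Convert a squared derivative from bohr^4/amu to angstrom^4/amu."""
    return value_au * BOHR_TO_ANGSTROM ** 4
```

Quantities built from squared polarizability derivatives, such as Raman scattering factors reported in quartic angstroms per atomic mass unit, pick up the fourth power of the conversion factor.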

15.5 ORIENTATION AVERAGING

Certain combinations of the polarizability derivatives give values that accurately predict the relative Raman peak intensities. When trying to reproduce spectra of systems that sample all orientations of the molecule, the intensity of Raman scattered light will be

\[
I_{\mathrm{Raman}} = \frac{\omega^4}{c^4}\, I_0 \sum_{ij} \left\langle \tilde{\alpha}'^{2}_{ij}(\omega, Q) \right\rangle
\tag{15.7}
\]

where ω is the frequency of the scattered light. The tilde denotes that the components of the polarizability derivatives are defined relative to a space-fixed coordinate system, and the angle brackets denote that the value within is orientation averaged. For hyperRaman scattering, the expression for the intensity is

\[
I_{\mathrm{hyperRaman}} = \frac{8\pi\omega^4}{c^4}\, I_0 \left\langle \tilde{\beta}'^{2}_{ijj}(\omega, Q) \right\rangle
\tag{15.8}
\]

If a common experimental setup is assumed where the scattering is observed at 90° relative to the direction of the incident light and the polarization of the scattered beam is not resolved, the expression for the Raman intensity for a normal mode k becomes41

\[
I_k^{\mathrm{Raman}} = \frac{\omega^4}{c^4}\, I_0 \left( a_k^2 + \frac{7}{45}\gamma_k^2 \right)
\tag{15.9}
\]

The value $45a_k^2 + 7\gamma_k^2$ is called the Raman scattering factor, $S_k$, and depends on the polarizability derivatives through $a_k$, the mean of the diagonal elements (one-third of the trace), and $\gamma_k$, the anisotropy, of the polarizability derivative tensor. In terms of the polarizability derivatives in the molecule-fixed coordinate system, these are31

\[
\begin{aligned}
a_k &= \tfrac{1}{3}\left[(\alpha'_{xx})_k + (\alpha'_{yy})_k + (\alpha'_{zz})_k\right] \\
\gamma_k^2 &= \tfrac{1}{2}\left\{\left[(\alpha'_{xx})_k - (\alpha'_{yy})_k\right]^2 + \left[(\alpha'_{yy})_k - (\alpha'_{zz})_k\right]^2 + \left[(\alpha'_{zz})_k - (\alpha'_{xx})_k\right]^2 + 6\left[(\alpha'_{xy})_k^2 + (\alpha'_{yz})_k^2 + (\alpha'_{zx})_k^2\right]\right\}
\end{aligned}
\tag{15.10}
\]

Raman scattering factors are generally reported in quartic angstroms per atomic mass unit.
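The invariants in Eq. (15.10) and the scattering factor $S_k = 45a_k^2 + 7\gamma_k^2$ translate directly into code. The sketch below assumes the polarizability derivative tensor for one mode is supplied as a symmetric 3x3 nested list; the function name is illustrative.

```python
def raman_scattering_factor(d_alpha):
    """Return (a_k, gamma_k^2, S_k) for one mode from its 3x3 derivative tensor."""
    xx, yy, zz = d_alpha[0][0], d_alpha[1][1], d_alpha[2][2]
    xy, yz, zx = d_alpha[0][1], d_alpha[1][2], d_alpha[2][0]
    # Mean of the diagonal elements (one-third of the trace).
    a = (xx + yy + zz) / 3.0
    # Squared anisotropy of the derivative tensor.
    gamma_sq = 0.5 * ((xx - yy) ** 2 + (yy - zz) ** 2 + (zz - xx) ** 2
                      + 6.0 * (xy ** 2 + yz ** 2 + zx ** 2))
    return a, gamma_sq, 45.0 * a * a + 7.0 * gamma_sq
```

A purely isotropic derivative tensor has zero anisotropy, so its scattering factor is carried entirely by the $45a_k^2$ term, while a traceless tensor contributes only through $7\gamma_k^2$.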


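Under the same experimental conditions as those outlined for Raman scattering, the hyperRaman intensity expression becomes9

\[
I_{\mathrm{hyperRaman}} = \frac{8\pi\omega^4}{c^5}\, I_0^2 \left( \left\langle\tilde{\beta}'^{2}_{iii}\right\rangle + \left\langle\tilde{\beta}'^{2}_{ijj}\right\rangle \right)
\tag{15.11}
\]

In terms of the molecule-fixed hyperpolarizability derivatives, $\langle\tilde{\beta}'^{2}_{iii}\rangle$ and $\langle\tilde{\beta}'^{2}_{ijj}\rangle$ are

\[
\begin{aligned}
\left\langle\tilde{\beta}'^{2}_{iii}\right\rangle ={}& \frac{1}{7}\sum_{i}\beta'^{2}_{iii} + \frac{4}{35}\sum_{i\neq j}\beta'^{2}_{iij} + \frac{2}{35}\sum_{i\neq j}\beta'_{iii}\beta'_{ijj} + \frac{4}{35}\sum_{i\neq j}\beta'_{jii}\beta'_{iij} \\
&+ \frac{4}{35}\sum_{i\neq j}\beta'_{iii}\beta'_{jji} + \frac{1}{35}\sum_{i\neq j}\beta'^{2}_{jii} + \frac{4}{105}\sum_{i\neq j\neq k}\beta'_{iij}\beta'_{jkk} \\
&+ \frac{1}{105}\sum_{i\neq j\neq k}\beta'_{jii}\beta'_{jkk} + \frac{4}{105}\sum_{i\neq j\neq k}\beta'_{iij}\beta'_{kkj} \\
&+ \frac{2}{105}\sum_{i\neq j\neq k}\beta'^{2}_{ijk} + \frac{4}{105}\sum_{i\neq j\neq k}\beta'_{ijk}\beta'_{jik}
\end{aligned}
\tag{15.12}
\]

\[
\begin{aligned}
\left\langle\tilde{\beta}'^{2}_{ijj}\right\rangle ={}& \frac{1}{35}\sum_{i}\beta'^{2}_{iii} + \frac{4}{105}\sum_{i\neq j}\beta'_{iii}\beta'_{ijj} - \frac{4}{70}\sum_{i\neq j}\beta'_{iii}\beta'_{jji} + \frac{8}{105}\sum_{i\neq j}\beta'^{2}_{iij} \\
&+ \frac{3}{35}\sum_{i\neq j}\beta'^{2}_{ijj} - \frac{4}{70}\sum_{i\neq j}\beta'_{iij}\beta'_{jii} + \frac{1}{35}\sum_{i\neq j\neq k}\beta'_{ijj}\beta'_{ikk} \\
&- \frac{4}{210}\sum_{i\neq j\neq k}\beta'_{iik}\beta'_{jjk} - \frac{4}{210}\sum_{i\neq j\neq k}\beta'_{iij}\beta'_{jkk} \\
&+ \frac{2}{35}\sum_{i\neq j\neq k}\beta'^{2}_{ijk} - \frac{4}{210}\sum_{i\neq j\neq k}\beta'_{ijk}\beta'_{jik}
\end{aligned}
\tag{15.13}
\]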
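The two orientation averages in Eqs. (15.12) and (15.13) can be evaluated with explicit loops over index pairs and triples. The sketch below assumes the hyperpolarizability derivative tensor is given as a 3x3x3 nested list `b[i][j][k]` that is symmetric in its last two indices, and that the restricted sums run over ordered sets of mutually distinct indices. The second cross term of the XZZ-type average is coded as $\beta'_{iii}\beta'_{ijj}$, the index pairing that makes the average independent of the choice of molecular axes. The function name is illustrative.

```python
import itertools


def hyper_raman_averages(b):
    """Return (<beta_iii^2>, <beta_ijj^2>) orientation averages for tensor b[i][j][k]."""
    idx = range(3)
    pairs = [(i, j) for i in idx for j in idx if i != j]
    triples = list(itertools.permutations(idx, 3))

    # Eq. (15.12): parallel (ZZZ-type) average.
    zzz = sum(b[i][i][i] ** 2 for i in idx) / 7.0
    zzz += sum(4 * b[i][i][j] ** 2 + 2 * b[i][i][i] * b[i][j][j]
               + 4 * b[j][i][i] * b[i][i][j] + 4 * b[i][i][i] * b[j][j][i]
               + b[j][i][i] ** 2 for i, j in pairs) / 35.0
    zzz += sum(4 * b[i][i][j] * b[j][k][k] + b[j][i][i] * b[j][k][k]
               + 4 * b[i][i][j] * b[k][k][j] + 2 * b[i][j][k] ** 2
               + 4 * b[i][j][k] * b[j][i][k] for i, j, k in triples) / 105.0

    # Eq. (15.13): perpendicular (XZZ-type) average, coefficients over 105.
    xzz = 3 * sum(b[i][i][i] ** 2 for i in idx) / 105.0
    xzz += sum(4 * b[i][i][i] * b[i][j][j] - 6 * b[i][i][i] * b[j][j][i]
               + 8 * b[i][i][j] ** 2 + 9 * b[i][j][j] ** 2
               - 6 * b[i][i][j] * b[j][i][i] for i, j in pairs) / 105.0
    xzz += sum(3 * b[i][j][j] * b[i][k][k] - 2 * b[i][i][k] * b[j][j][k]
               - 2 * b[i][i][j] * b[j][k][k] + 6 * b[i][j][k] ** 2
               - 2 * b[i][j][k] * b[j][i][k] for i, j, k in triples) / 105.0
    return zzz, xzz
```

A useful sanity check is a tensor with a single nonzero element $\beta'_{zzz}$: the parallel average is then one-seventh and the perpendicular average one thirty-fifth of its square. Because both quantities are orientation averages, they should also be unchanged when the input tensor is rotated.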

15.6 DIFFERENTIAL CROSS SECTIONS

Although Raman scattering factors will give a good idea of relative intensities, it is the differential cross sections that are directly comparable to experimental measurements. The frequency of the incident light is part of the expression of the differential cross section which allows normal Raman spectra for a specific wavelength of incident light to be calculated even while using static polarizabilities.39 This approach should give reasonable estimates for any off-resonance situation as long as the dispersion of the polarizability is relatively small. The computational effort to calculate the scattering factors using dynamic polarizabilities is higher, but is recommended for improved accuracy.


For the Q branch in an experiment where the scattering angle is 90° and the incident light is plane polarized perpendicular to the scattering plane, the differential cross section is31,39

\[
\frac{d\sigma}{d\Omega} = (\tilde{\nu}_{\mathrm{in}} - \tilde{\nu}_k)^4\, \frac{h}{8\varepsilon_0^2 c \tilde{\nu}_k}\, \frac{S_k}{45}\, \frac{1}{1 - \exp(-hc\tilde{\nu}_k/k_B T)}
\tag{15.14}
\]
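A sketch of how Eq. (15.14) might be evaluated in SI units follows. It assumes that the scattering factor is supplied in quartic angstroms per atomic mass unit and that polarizability volumes are related to SI polarizabilities through a factor of $4\pi\varepsilon_0$, so the scattering factor, being quadratic in the derivatives, is scaled by $(4\pi\varepsilon_0)^2$. The function names and the default temperature are choices made for this illustration.

```python
import math

H = 6.62607015e-34        # Planck constant, J s
C = 2.99792458e8          # speed of light, m/s
KB = 1.380649e-23         # Boltzmann constant, J/K
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
AMU = 1.66053906660e-27   # atomic mass unit, kg


def scattering_factor_to_si(s_k):
    """Convert S_k from angstrom^4/amu to C^2 m^2 V^-2 kg^-1."""
    return s_k * (4.0 * math.pi * EPS0) ** 2 * 1.0e-40 / AMU


def raman_cross_section(s_k, nu_k_cm, lambda_in_nm, temperature=298.15):
    """Differential cross section of Eq. (15.14) in cm^2/sr.

    s_k          : Raman scattering factor in angstrom^4/amu
    nu_k_cm      : normal-mode wavenumber in cm^-1
    lambda_in_nm : incident wavelength in nm
    """
    nu_in = 1.0 / (lambda_in_nm * 1.0e-9)  # incident wavenumber in m^-1
    nu_k = nu_k_cm * 100.0                 # mode wavenumber in m^-1
    thermal = 1.0 / (1.0 - math.exp(-H * C * nu_k / (KB * temperature)))
    sigma_m2 = ((nu_in - nu_k) ** 4 * H / (8.0 * EPS0 ** 2 * C * nu_k)
                * scattering_factor_to_si(s_k) / 45.0 * thermal)
    return sigma_m2 * 1.0e4                # m^2/sr -> cm^2/sr


# Illustrative input: S_k = 20 angstrom^4/amu for a 1000 cm^-1 mode at 514.5 nm.
example = raman_cross_section(20.0, 1000.0, 514.5)
```

Because the incident wavelength enters only through the prefactor, the same function can be applied to scattering factors obtained from static polarizabilities, which is the approach described in the text for off-resonance spectra.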

where $\tilde{\nu}_{\mathrm{in}}$ is the frequency of the incident light and $\tilde{\nu}_k$ is the frequency of the kth normal mode, both in wavenumbers. If the Raman scattering factors in quartic angstroms per atomic mass unit are converted to $\mathrm{C^2\,m^2\,V^{-2}\,kg^{-1}}$ using the factor $4\pi\varepsilon_0$ that relates polarizability volumes to SI polarizabilities, along with the appropriate length and mass conversions, the differential cross section can be made to have units of cm$^2$/sr (sr is the abbreviation for steradians). These are the standard units for reporting Raman scattering differential cross sections.

Example 1: Raman Spectra of Pyridine and Pyridine on a Silver Cluster

As an example of the results that can be expected using the method described above, simulated Raman spectra for pyridine and pyridine on the surface of a tetrahedral 20-silver-atom cluster will be shown. The orientationally averaged off-resonance spectra calculated are referred to as normal or bulk Raman spectra, and are comparable to those obtained in experiments performed on solutions of the species modeled. Geometry optimization and normal-mode frequency calculations were performed using the PW91 functional and a polarized triple-zeta Slater-type basis (TZP) for all atoms. Relativistic effects, which have been shown to be important in the modeling of optical properties of silver clusters,42 are included with the use of the zeroth-order regular approximation (ZORA)43,44 in its spin-free (scalar relativistic) version. An extension of AORESPONSE to include spin-orbit coupling has also been developed recently,45 but for an Ag cluster, such effects can be considered negligible. The normal-mode frequencies calculated were compared to those from experiment to ensure reasonable agreement. Normal-mode frequencies and atomic coordinates for the optimized geometries are available in the supporting information.
Polarizability calculations used an asymptotically correct XC potential, SAOP,46 and the larger ET-QZ3P-polar basis set for the carbon, hydrogen, and nitrogen atoms (still using TZP for the silver atoms). Use of the SAOP model potential gives the correct long-distance behavior, which is important for obtaining accurate polarizabilities (although for the systems at hand, BP86 and TZP give similar results) and even more so for hyperpolarizabilities.47 The normal Raman spectrum for pyridine, calculated from static polarizabilities and using an incident wavelength of 514.5 nm in the equation for the cross section, is shown in Fig. 15.1 (the differential cross section is given in units of $10^{-30}$ cm$^2$ sr$^{-1}$ and wavenumbers are given in cm$^{-1}$). The stick spectrum (note: it has been scaled) obtained from calculation of intensities at each normal-mode frequency is overlaid by the spectrum where each peak has been convoluted with a Lorentzian with a width of 20 cm$^{-1}$. Peaks and intensities seen in the experimental spectrum48,49 are reproduced well by the calculations. The minor peaks, however, are relatively more intense and the intensity ordering of the peaks at 983 and 1026 cm$^{-1}$ is the opposite of what is seen experimentally.

[Figure 15.1: two panels plotting dσ/dΩ ($10^{-30}$ cm$^2$/sr) against wavenumber (cm$^{-1}$); labeled peaks at 1581, 1472, 1209, 1146, 1026, 983, 651, and 599 cm$^{-1}$ (top) and at 1580, 1472, 1208, 1146, 1026, 982, 651, and 599 cm$^{-1}$ (bottom).]

Fig. 15.1 (color online) Simulated normal Raman spectrum of pyridine at an incident wavelength of 514.5 nm using static (top) and frequency-dependent (bottom) polarizability derivatives. The spectrum is broadened by a Lorentzian with a full width at half-maximum of 20 cm$^{-1}$. The scale is for a broadened spectrum. Inset: Experimental spectrum from Golab et al.48

Adding a tetrahedral 20-silver-atom cluster allows the investigation of phenomena such as the chemical enhancements observed in SERS.39 Though the pyridine–Ag$_{20}$ system has a large number of normal modes, only those in the range 300 to 1600 cm$^{-1}$, which correspond primarily to motions of the atoms in pyridine, are of interest. Figure 15.2 shows the optimized pyridine–Ag$_{20}$ complex geometry (where the pyridine is perpendicular to a face of the cluster and binds through the N atom to the Ag atom at the center of the face) and the calculated normal Raman spectrum for the structure, with the cross section once again assuming an excitation wavelength of 514.5 nm. Comparing the intensities of the peaks in this spectrum to those in the pyridine spectrum, the chemical enhancement is approximately one order of magnitude. These results are comparable to results presented by Zhao et al.,39 where it was also found that the corresponding spectra at wavelengths that are on-resonance for the Ag$_{20}$ are enhanced by $10^5$ or greater. This provides a model for understanding SERS.

Fig. 15.2 (color online) Optimized geometry and simulated normal Raman spectrum of the surface pyridine–Ag$_{20}$ complex at an incident wavelength of 514.5 nm using static polarizability derivatives. The spectrum is broadened by a Lorentzian with a full width at half-maximum of 20 cm$^{-1}$. The scale is for a broadened spectrum.
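The Lorentzian broadening applied to the stick spectra in this example can be sketched as follows. The area normalization of the line shape is an assumption made here (the text does not state how the published spectra were scaled), and the stick positions and heights in the usage lines are placeholders rather than computed values.

```python
import math


def lorentzian(x, center, fwhm):
    """Area-normalized Lorentzian with full width at half-maximum fwhm."""
    half = fwhm / 2.0
    return (half / math.pi) / ((x - center) ** 2 + half ** 2)


def broaden(sticks, grid, fwhm=20.0):
    """Sum one Lorentzian per (position, intensity) stick on a wavenumber grid."""
    return [sum(intensity * lorentzian(x, position, fwhm)
                for position, intensity in sticks)
            for x in grid]


# Placeholder sticks: (wavenumber in cm^-1, intensity in arbitrary units).
sticks = [(983.0, 1.0), (1026.0, 1.4)]
grid = [300.0 + 0.5 * n for n in range(2601)]  # 300 to 1600 cm^-1
spectrum = broaden(sticks, grid)
```

With a width of 20 cm$^{-1}$, neighboring peaks separated by a few tens of wavenumbers overlap noticeably, which is why the scale of a broadened spectrum differs from that of the underlying sticks.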

Example 2: HyperRaman Spectrum of Pyridine

Using the same geometry and frequencies for pyridine as in the normal Raman example, the hyperRaman spectra can also be simulated. The hyperpolarizability calculations at the displaced geometries were run with the SAOP model potential and an ET-QZ3P-polar basis set for all atoms. The orientationally averaged hyperRaman spectrum is shown in Fig. 15.3 [intensities are given in angstrom$^6$/(amu · statvolt$^2$)]. The differential cross section is not calculated because the equation outlined is only applicable to Raman spectroscopy with a specific experimental setup.31,39 Although an effective excitation wavelength cannot be added into the spectrum, the relative intensities of the peaks can still be compared to experimental spectra. In general, experimental hyperRaman spectra are rarely determined because the hyperRaman signal is even weaker than the already weak Raman signal. Fortunately, for pyridine there are experimental measurements, which are matched rather well by the calculated spectrum.49 Not all the peaks calculated can be verified due to noise in the experiment, but the relative intensities of those that are observed match well.


Fig. 15.3 (color online) Simulated normal hyperRaman spectrum of pyridine using static hyperpolarizability derivatives. The spectrum is broadened by a Lorentzian with a full width at half-maximum of 20 cm$^{-1}$. The scale is for a broadened spectrum. Inset: Experimental spectrum from Neddersen et al.49

15.7 SURFACE-ENHANCED RAMAN AND HYPERRAMAN SPECTRA

The previous discussion is applicable to the Raman and hyperRaman spectra of any system where orientation averaging applies. The intensities are generally small, but they can be greatly enhanced by placing the molecules on a surface. Molecules adsorbed to a surface are generally restricted to a finite set of orientations relative to the surface, so the expressions based on orientation averaging no longer apply. If a specific orientation to the surface is assumed, the Raman intensities are determined by the polarizability component perpendicular to the surface, as the plasmon-enhanced electromagnetic field near the surface is dominated by this component. For calculations which assume that the z-direction is normal to the surface, $\alpha'^{2}_{zz}$ ($\beta'^{2}_{zzz}$ for hyperRaman intensities) will give the relative peak intensities. As the interest is only in one of the components of the polarizability tensor of the molecule, the orientation of the molecule in the input becomes important. For example, to calculate the surface-enhanced Raman spectrum of a molecule standing straight up on a surface, the molecule should be appropriately oriented along the z-axis (as determined by its adsorption behavior) in all of the inputs. The frequency calculation and the polarizability calculations at the displaced coordinates are performed as for a normal Raman calculation. The difference is that it is only necessary to calculate the polarizability derivative for the $\alpha_{zz}$ component.


15.8 APPLICATION OF TENSOR ROTATIONS TO RAMAN SPECTRA FOR SPECIFIC SURFACE ORIENTATIONS

In cases where the molecular orientation is uncertain, the comparison of simulated spectra with experiment can be used to infer the correct orientation. In this case the complete polarizability tensor is determined for an arbitrary orientation and then rotated to the desired orientation. A second-order tensor (the polarizability tensor $[\alpha_{lm}]$) or third-order tensor (the hyperpolarizability tensor $[\beta_{ijk}]$) can be rotated into a new coordinate frame by applying a rotation matrix $[R]$ and its inverse. For the polarizability, the tensor in the new coordinate frame is given by

\[
[\alpha^{*}_{ij}] = [R][\alpha_{lm}][R]^{-1}
\tag{15.15}
\]

Here $R$ is an orthogonal matrix ($[R]^{-1} = [R]^{T}$) whose components $r_{il}$ are the cosines of the angle between the ith axis of the original coordinate frame and the lth axis of the target coordinate frame:

\[
r_{il} = \cos(i, l)
\tag{15.16}
\]

For surface-enhanced Raman, only the perpendicular component of the polarizability tensor is of interest. This can easily be calculated using the formula

\[
\alpha^{*}_{ij} = \sum_{lm} r_{il}\, r_{jm}\, \alpha_{lm}
\tag{15.17}
\]

for polarizabilities, and

\[
\beta^{*}_{ijk} = \sum_{lmn} r_{il}\, r_{jm}\, r_{kn}\, \beta_{lmn}
\tag{15.18}
\]

for hyperpolarizabilities. Of course, this work can be avoided completely if the molecular structure is defined in coordinates where one axis is along the surface normal.

Example 3: Surface-Enhanced Raman Spectrum of Pyridine

If a normal Raman spectrum has already been calculated for the molecule of interest, it takes only minor modifications to obtain a surface-enhanced Raman spectrum. For the example molecule pyridine, the results of the polarizability calculations from the pyridine normal Raman example will be used. To model the surface-enhanced spectrum using only the polarizability derivatives of the molecule (so that plasmon enhancement effects are left out), an orientation relative to a fictional surface must be assumed. For pyridine, it will be assumed that the nitrogen atom binds to the surface and that the molecule stands straight up. This orientation places the $C_2$-axis of pyridine along the surface normal.
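The index sums in Eqs. (15.17) and (15.18) can be written out directly. In the sketch below, the convention that row i of the matrix `r` holds the direction cosines of target axis i with respect to the original axes is an assumption of this illustration, as are the function names; tensors are plain nested lists.

```python
def rotate_rank2(r, alpha):
    """Eq. (15.17): alpha*_ij = sum_lm r_il r_jm alpha_lm."""
    axes = range(3)
    return [[sum(r[i][l] * r[j][m] * alpha[l][m] for l in axes for m in axes)
             for j in axes] for i in axes]


def rotate_rank3(r, beta):
    """Eq. (15.18): beta*_ijk = sum_lmn r_il r_jm r_kn beta_lmn."""
    axes = range(3)
    return [[[sum(r[i][l] * r[j][m] * r[k][n] * beta[l][m][n]
                  for l in axes for m in axes for n in axes)
              for k in axes] for j in axes] for i in axes]


# Relabel the axes so that the new z-axis lies along the old x-axis.
r = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
alpha = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
alpha_star = rotate_rank2(r, alpha)
```

For a surface-enhanced spectrum only the zz (or zzz) element of the rotated tensor is needed, so in practice the sums can be restricted to the row of `r` that describes the surface normal.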


Fig. 15.4 (color online) SERS spectrum of pyridine standing straight up on a fictional surface. The spectrum is broadened by a Lorentzian with a full width at half-maximum of 20 cm$^{-1}$. The scale is for a broadened spectrum. Inset: Experimental spectrum from Golab et al.48

The equilibrium structure of pyridine used for the polarizability calculations has its $C_2$-axis along the z-axis. This means that the surface normal is along the z-axis in the calculations and that the squares of the derivatives of the $\alpha_{zz}$ components of the polarizabilities will be proportional to the experimental SERS intensities. The SERS spectrum for pyridine obtained in this manner is shown in Fig. 15.4 (intensities are given in angstrom$^4$/amu). Once again, the differential cross-section equation does not apply to what is being modeled. The surface is fictional, so only the relative intensities have any real significance. The calculated spectrum compares well with experimental data,48 except for the peak at 1026 cm$^{-1}$, which should be only slightly more intense than the peaks at 1581 and 1209 cm$^{-1}$. While the correct intensity ordering is not observed, the peak at 983 cm$^{-1}$ does increase in intensity relative to the peak at 1026 cm$^{-1}$ going from the nonresonant Raman spectrum to the SERS spectrum, which is seen experimentally.48 It is possible that the differences observed occur because the orientation of pyridine relative to the surface is not what has been assumed in the calculations. A rigorous study would consider other orientations and possibly average over a range of orientations to see if better agreement can be achieved.
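One way to explore other orientations without rerunning any polarizability calculations is to project the existing derivative tensor onto a tilted surface normal. The sketch below assumes a tilt about the molecular x-axis and takes the relative intensity as the square of the projected component; the function name and the example tensor are hypothetical and not taken from the pyridine calculations.

```python
import math


def sers_relative_intensity(d_alpha, tilt_deg):
    """Square of the derivative component along a surface normal tilted by tilt_deg.

    d_alpha  : 3x3 polarizability derivative tensor in the molecular frame
    tilt_deg : angle between the molecular z-axis and the surface normal,
               with the tilt taken about the molecular x-axis (an assumption)
    """
    t = math.radians(tilt_deg)
    normal = (0.0, math.sin(t), math.cos(t))  # surface normal in molecular frame
    zz = sum(normal[l] * normal[m] * d_alpha[l][m]
             for l in range(3) for m in range(3))
    return zz * zz


# Placeholder derivative tensor for one mode, scanned over a few tilt angles.
d_alpha = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
scan = {angle: sers_relative_intensity(d_alpha, angle) for angle in (0, 15, 30, 45)}
```

Repeating such a scan for every mode, and comparing the resulting relative intensities with experiment, is one simple way to test whether an orientation other than the upright one gives better agreement.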

15.9 RESONANCE RAMAN

Another phenomenon used to increase Raman intensities in experiments is the resonance Raman effect. Resonance Raman involves using incident light with an energy that matches the energy needed to put the molecule in an electronically excited state.50 In the expression for the Kohn–Sham response function, this would mean $\omega = \varepsilon_i - \varepsilon_m$, which leads to division by zero in the response function described above.19 The zero occurs because it was assumed that the excited state has an infinite lifetime. However, the excited states of molecules in a condensed phase always have a significant width, due to dephasing of the excited state through interaction with the environment. The AORESPONSE functionality in ADF allows calculation of polarizabilities at resonant wavelengths by adding in an effective lifetime by way of a damping parameter in the response function. This is not a perfect fix, though, because it assumes that all excited states have the same lifetime, which is generally not true. Damping parameters are best obtained by fitting experimental absorption data for the molecule of interest.19 If there are no available data, it is possible to use the value for a similar molecule if the short-time approximation is valid. A value of 0.004 atomic unit (0.1 eV) has been found to be reasonable for many large organics, as well as pyridine interacting with silver clusters.39 In the AORESPONSE block, the keyword LIFETIME followed by the lifetime in atomic units will tell the program to account for the excited-state lifetime provided. With a lifetime specified, ADF will be able to calculate both the real and imaginary parts of the polarizability. The imaginary polarizabilities should be treated like their real counterparts until the scattering factors are calculated. At that point, the real and imaginary scattering factors can be summed to give the total scattering factor.

15.10 DETERMINATION OF RESONANT WAVELENGTH

Using the AORESPONSE lifetime functionality, it is possible to calculate polarizabilities for the displaced structures at resonant wavelengths, but it is important to have an idea of where the resonance is located before doing the calculations. Experimental resonance Raman literature or absorption maximum data for the system provide a good place to start. Using the optimized geometry for the system, polarizability calculations should then be run for a range of incident light frequencies close to where the resonant frequency is believed to be. The polarizability calculations should also use the finite lifetime that was found to be appropriate for the system. The absorption maximum for the system occurs where the imaginary polarizability has its maximum and is an appropriate frequency to choose for the resonance Raman calculations.39 Of course, another way to determine the excitation energies of the system for a given combination of basis set, XC potential, and XC kernel would simply be to run a calculation of the excitation spectrum using TDDFT. This can be accomplished using the EXCITATIONS keyword in ADF. The equivalence of the two approaches, Im[α] versus TDDFT excitation spectra, was demonstrated explicitly by Jensen et al.,19 Devarajan et al.,45 and Krykunov et al.51 for the closely related case of optical rotatory dispersion versus TDDFT circular dichroism spectra.

Example 4: Resonance Raman Spectrum of Uracil

To detail the steps necessary to calculate a resonance Raman spectrum, the molecule uracil will be used as an example. An excitation in uracil that can be used to study RRS corresponds to the lowest-energy $\pi \rightarrow \pi^{*}$ transition.20 This excitation is found experimentally at 5.08 eV (244 nm) in the gas phase and 4.77 eV (260 nm) in the aqueous phase.52 To discern what excitation energy to use in the calculation of resonance polarizabilities, the real and imaginary polarizabilities of the equilibrium geometry were calculated at discrete points between incident light wavelengths of 240 and 280 nm. For all the calculations in this example, a value of 0.004 a.u. was chosen for the damping parameter, the BP86 functional was used, and all atoms were treated with a TZP basis set. The real and imaginary polarizabilities as a function of the wavelength of the incident light are shown in Fig. 15.5. A maximum is seen in the imaginary polarizability of the system at 263 nm. For the polarizability derivative calculations, it is reasonable to use 263 nm for the incident light wavelength in the input to the displaced geometry calculations, or a nearby wavelength that was used in experiments.

Using an incident light wavelength of 263 nm, the spectrum displayed in Fig. 15.6 can be obtained. The spectrum assumes an average over all molecular orientations, and the stick spectrum has been broadened by a Lorentzian as in the pyridine nonresonant Raman example. Close agreement with experiment is seen except for the peak at 1737 cm$^{-1}$, which is much too intense in the calculations, and the peaks at 1448 and 1353 cm$^{-1}$, which are seen as a single peak at 1401 cm$^{-1}$ in experiments. The second issue appears to be due to solvent effects, since adding two water molecules to the calculations shifts the two peaks together around 1400 cm$^{-1}$.20 This does not, however, correct the peak at 1737 cm$^{-1}$. This error probably arises due to Fermi resonance (not included in the calculations) between the C-O and N-H bending modes.53,54 Fermi resonances and overtones are not accounted for in the harmonic approximation that has been made in the calculation of the vibrations.20 Raman spectra calculated for molecules in which these processes play a visible role will not accurately reproduce all the peak intensities.

Fig. 15.5 Real (squares) and imaginary (circles) polarizabilities of uracil as a function of the wavelength of incident light between 240 and 280 nm.

Fig. 15.6 (color online) Simulated resonance Raman spectrum of uracil at an incident wavelength of 263 nm. The spectrum is broadened by a Lorentzian with a full width at half-maximum of 20 cm$^{-1}$. The scale is for a broadened spectrum. Inset: Experimental spectrum from Jensen et al.20
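The wavelength scan described in Section 15.10 can be mimicked with a one-excitation model of the damped polarizability. This is only a toy illustration, not the AORESPONSE implementation: the single-pole form, the names `model_polarizability` and `find_absorption_maximum`, and the unit oscillator strength are all assumptions, with the damping set to the 0.004 a.u. value quoted in the text.

```python
HARTREE_NM = 45.5633525  # wavelength (nm) of a photon with an energy of 1 hartree


def model_polarizability(wavelength_nm, resonance_nm, strength=1.0, damping=0.004):
    """Damped single-excitation polarizability (toy model, atomic units).

    The finite damping keeps the denominator nonzero when the incident
    frequency equals the excitation energy.
    """
    omega = HARTREE_NM / wavelength_nm
    omega0 = HARTREE_NM / resonance_nm
    return strength * (1.0 / complex(omega0 - omega, -damping)
                       + 1.0 / complex(omega0 + omega, damping))


def find_absorption_maximum(wavelengths_nm, resonance_nm, damping=0.004):
    """Return the scanned wavelength at which Im[alpha] is largest."""
    return max(wavelengths_nm,
               key=lambda w: model_polarizability(w, resonance_nm,
                                                  damping=damping).imag)


# Scan 240-280 nm in 1 nm steps for a model excitation placed at 263 nm.
best = find_absorption_maximum(range(240, 281), 263.0)
on_resonance = model_polarizability(263.0, 263.0)
```

In the model, the imaginary part peaks at the excitation wavelength while the real part changes sign there, which is the qualitative behavior one looks for when choosing the incident wavelength for the displaced-geometry calculations.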

15.11 SUMMARY

In this chapter we have provided a detailed discussion of the calculation of Raman and hyperRaman spectra for large molecules and molecules interacting with metal clusters using the ADF computer program and time-dependent density functional theory. Both static and frequency-dependent Raman spectra are considered, and the frequency-dependent spectra include the possibility of excitation on resonance through the input of an empirical width factor in the resonant optical response. In addition, we describe the calculation of spectra for specific molecular orientations and for an average over orientations. Specific examples are presented for pyridine in vacuum, for pyridine interacting with a silver cluster, and for pyridine oriented on a fictitious surface to mimic orientation effects that can occur in SERS. In addition, we examined the resonance polarizability and resonance Raman spectrum of uracil as an example of a resonance Raman calculation.


Although these examples reveal important capabilities that are now available using TDDFT, there remain important limitations in the use of this method for large systems. The current technology can handle up to 100 to 200 atoms but becomes impractical for much larger systems. Even for 100 to 200 atoms, it can be quite challenging to calculate spectra for a large number of normal modes. In addition, the excited-state widths are purely empirical factors in the current version of the code and are assumed not to depend on the nature of the excited state. Finally, we note that the models of SERS which replace the metal particles by silver clusters that have fewer than 100 atoms make important approximations whose validity is still uncertain. Plasmon resonances are size dependent for small clusters, so the resonance wavelengths do not match the observations, and the cluster-size dependence of the widths is unknown. In addition, the behavior of the electromagnetic fields around the cluster is unlikely to match the fields associated with large particles, so the field enhancements that lead to SERS are not likely to be described accurately.

Supporting Information

Supporting information including atomic coordinates and vibrational frequencies for all example species may be found on the book Web site.

Acknowledgments

This research was supported by AFOSR/DARPA project BAA07-61 (FA9550-08-1-0221) and the National Science Foundation Network for Computational Nanotechnology. We thank our many collaborators, including Stephen Gray, Richard Van Duyne, Chad Mirkin, and Teri Odom.

REFERENCES

1. Camden, J. P.; Dieringer, J. A.; Zhao, J.; Van Duyne, R. P. Acc. Chem. Res. 2008, 41, 1653.
2. LaFratta, C. N.; Walt, D. R. Chem. Rev. 2008, 108, 614.
3. Jain, P. K.; Huang, X.; El-Sayed, I. H.; El-Sayed, M. A. Plasmonics 2007, 2, 107.
4. Lal, S.; Link, S.; Halas, N. J. Nat. Photon. 2007, 1, 641.
5. Murphy, C. J.; Gole, A. M.; Hunyadi, S. E.; Stone, J. W.; Sisco, P. N.; Alkilany, A.; Kinard, B. E.; Hankins, P. Chem. Commun. 2008, 544.
6. Willets, K. A.; Van Duyne, R. P. Annu. Rev. Phys. Chem. 2007, 58, 267.
7. Kneipp, J.; Kneipp, H.; Kneipp, K. Chem. Soc. Rev. 2008, 37, 1052.
8. Kelley, A. M. J. Phys. Chem. A 2008, 112, 11975.
9. Yang, W. H.; Schatz, G. C. J. Chem. Phys. 1992, 97, 3831.
10. Yang, W.-H.; Hulteen, J.; Schatz, G. C.; Van Duyne, R. P. J. Chem. Phys. 1996, 104, 4313.
11. Jeanmaire, D. L.; Van Duyne, R. P. J. Electroanal. Chem. 1977, 84, 1.
12. Hulteen, J. C.; Young, M. A.; Van Duyne, R. P. Langmuir 2006, 22, 10354.
13. Schatz, G. C. Acc. Chem. Res. 1984, 17, 370.
14. Moskovits, M. Rev. Mod. Phys. 1985, 57, 783.
15. Schatz, G. C.; Ratner, M. A. Quantum Mechanics in Chemistry, Dover, Mineola, NY, 2002.
16. van Gisbergen, S. J. A.; Snijders, J. G.; Baerends, E. J. Comput. Phys. Commun. 1999, 118, 119.
17. Velde, G. T.; Bickelhaupt, F. M.; Baerends, E. J.; Guerra, C. F.; Van Gisbergen, S. J. A.; Snijders, J. G.; Ziegler, T. J. Comput. Chem. 2001, 22, 931.
18. ADF2008.01; SCM: Theoretical Chemistry, Vrije Universiteit, Amsterdam, http://www.scm.com, click on “Theoretical Chemistry.”
19. Jensen, L.; Autschbach, J.; Schatz, G. C. J. Chem. Phys. 2005, 122, 224115/1.
20. Jensen, L.; Zhao, L. L.; Autschbach, J.; Schatz, G. C. J. Chem. Phys. 2005, 123, 174110/1.
21. Mort, B. C.; Autschbach, J. J. Phys. Chem. A 2006, 110, 11381.
22. van Gisbergen, S.; Snijders, J. G.; Baerends, E. J. J. Chem. Phys. 1998, 109, 10644.
23. Ye, A.; Patchkovskii, S.; Autschbach, J. J. Chem. Phys. 2007, 127, 074104.
24. Ye, A.; Autschbach, J. J. Chem. Phys. 2006, 125, 234101.
25. Jensen, L.; Aikens, C. M.; Schatz, G. C. Chem. Soc. Rev. 2008, 37, 1061.
26. Jensen, L.; Zhao, L. L.; Schatz, G. C. J. Phys. Chem. C 2007, 111, 4756.
27. Aikens, C. M.; Schatz, G. C. J. Phys. Chem. A 2006, 110, 13317.
28. Masiello, D. J.; Schatz, G. C. Phys. Rev. A 2008, 78, 042505/1.
29. Bernath, P. F. Spectra of Atoms and Molecules, 2nd ed., Oxford University Press, New York, 2005.
30. Mort, B. C.; Autschbach, J. J. Phys. Chem. A 2005, 109, 8617.
31. Reiher, M.; Neugebauer, J.; Hess, B. A. Z. Phys. Chem. 2003, 217, 91.
32. Krykunov, M.; Autschbach, J. J. Chem. Phys. 2005, 123, 114103.
33. Krykunov, M.; Autschbach, J. J. Chem. Phys. 2007, 126, 024101.
34. Autschbach, J.; Jensen, L.; Schatz, G. C.; Tse, Y. C. E.; Krykunov, M. J. Phys. Chem. A 2006, 110, 2461.
35. van Gisbergen, S. J. A.; Snijders, J. G.; Baerends, E. J. J. Chem. Phys. 1995, 103, 9347.
36. Pulay, P. Analytical derivative techniques and the calculation of vibrational spectra. In Modern Electronic Structure Theory, Part II, Vol. 2, Yarkony, D. R., Ed., World Scientific, Singapore, 1995, p. 1191.
37. Pople, J. A.; Raghavachari, K.; Schlegel, H. B.; Binkley, J. S. Int. J. Quantum Chem. 1979, S13, 225.
38. Neugebauer, J.; Reiher, M.; Kind, C.; Hess, B. A. J. Comput. Chem. 2002, 23, 895.
39. Zhao, L.; Jensen, L.; Schatz, G. C. J. Am. Chem. Soc. 2006, 128, 2911.
40. Kanis, D. R.; Ratner, M. A.; Marks, T. J. Chem. Rev. 1994, 94, 195.
41. Califano, S. Vibrational States, Wiley, New York, 1976.
42. Aikens, C. M.; Li, S. Z.; Schatz, G. C. J. Phys. Chem. C 2008, 112, 11272.
43. van Lenthe, E.; Baerends, E. J.; Snijders, J. G. J. Chem. Phys. 1993, 99, 4597.
44. van Lenthe, E.; Baerends, E. J.; Snijders, J. G. J. Chem. Phys. 1994, 101, 9783.
45. Devarajan, A.; Gaenko, A.; Autschbach, J. J. Chem. Phys. 2009, 130, 194102.
46. Gritsenko, O. V.; Schipper, P. R. T.; Baerends, E. J. Chem. Phys. Lett. 1999, 302, 199.
47. Schipper, P. R. T.; Gritsenko, O. V.; van Gisbergen, S. J. A.; Baerends, E. J. J. Chem. Phys. 2000, 112, 1344.
48. Golab, J. T.; Sprague, J. R.; Carron, K. T.; Schatz, G. C.; Van Duyne, R. P. J. Chem. Phys. 1988, 88, 7942.
49. Neddersen, J. P.; Mounter, S. A.; Bostick, J. M.; Johnson, C. K. J. Chem. Phys. 1989, 90, 4719.
50. Albrecht, A. C. J. Chem. Phys. 1961, 34, 1476.
51. Krykunov, M.; Kundrat, M. D.; Autschbach, J. J. Chem. Phys. 2006, 125, 194110.
52. Clark, L. B.; Peschel, G. G.; Tinoco, I. J. Phys. Chem. 1965, 69, 3615.
53. Peticolas, W. L.; Rush, T. J. Comput. Chem. 1995, 16, 1261.
54. Szczesniak, M.; Nowak, M. J.; Rostkowska, H.; Szczepaniak, K.; Person, W. B.; Shugar, D. J. Am. Chem. Soc. 1983, 105, 5969.

16

Metal Surfaces and Interfaces: Properties from Density Functional Theory

IRENE YAROVSKY, MICHELLE J. S. SPENCER, and IAN K. SNOOK
Applied Physics, School of Applied Sciences, RMIT University, Victoria, Australia

In this chapter we describe comprehensive theoretical studies of metallic surfaces and interfaces using density functional theory (DFT) calculations. First, we provide a general introduction and background, then describe the methodology used and validation studies performed. Calculations performed on Fe(100), Fe(110), and Fe(111) surfaces to investigate their structure, energetics, electronic, magnetic, and adsorption properties are then discussed. Interfaces between these surfaces and, specifically, adhesion and the associated electronic and magnetic properties are then presented. Adhesion is studied between the surfaces in match (in registry) and mismatch (out of registry), ideal and relaxed, and clean and sulfur-contaminated states. Finally, we provide summaries, conclusions, and suggestions for future work.

16.1 BACKGROUND, GOALS, AND OUTLINE

Iron surfaces have been of interest to both pure and applied sciences since the Iron Age. Despite their crucial importance for many industries,1–3 from crude heavy industry to refined electronics, there is a gap in the fundamental understanding of many important properties of iron surfaces, such as magnetic properties and adhesion, which may slow their application in new and innovative technologies. This gap in understanding arises partly because of the inherent difficulty of studying the material both experimentally, due to its high susceptibility to corrosion,4 and theoretically, due to its transition metal nature and hence complex electronic properties.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


Specifically, adhesion between metallic iron surfaces plays an important role in many industrial processes.5,6 For example, the extraction of metallic iron (Fe) from powdered ores via the fluidized-bed iron ore reduction process often suffers from the buildup of deposits, known as accretions, in various parts of the reactors, and component particles may adhere strongly, forming large clumps that result in defluidization of the bed.7 As iron forms a major constituent of the accretions, a fundamental understanding of the mechanism by which the metal particles adhere, as well as identification of the species capable of preventing severe adhesion, is of vital importance.

Previous investigations of Fe surfaces and interfaces have looked at a number of their properties, including structural and magnetic features. However, generally speaking, previous studies have not provided a systematic fundamental description of the material, particularly of the dynamic properties associated with thermal and impurity-induced transformations and their effects, which is crucial for the ability to design and manipulate its properties at the macro- and nanoscale.

Here we present an account of our theoretical work on Fe, which includes new results and those obtained previously. Specifically, after describing the methodology in Section 16.2, in Section 16.3.1 we review results on the computed relaxations and energies of the three low-index surfaces, (100), (110), and (111), of body-centered cubic (bcc) Fe and compare the computational results with experimental observations. In Section 16.3.2 we describe new results on the magnetic properties of the Fe(100), Fe(110), and Fe(111) surfaces, such as changes in the magnetic and electronic properties after relaxation and the layer-resolved magnetic moment values, as well as the up- and down-spin-resolved density of states.

In Sections 16.3.3 and 16.3.4 we present results on the adsorption of atomic S on the atop, bridge, and hollow sites of Fe(100) and Fe(110) surfaces at 1/2 and 1/4 monolayer (ML) coverages. The most stable site, the effects of S adsorption on surface reconstruction, and magnetic and electronic properties are considered. A summary of the effect of higher S coverages on these properties is also presented. In Sections 16.3.5 and 16.3.6 we discuss our calculations on the dynamic behavior of S and H2S adsorbed on Fe(100) and Fe(110), including ab initio molecular dynamics (AIMD) simulations to examine the effect of elevated temperatures.

In Section 16.4.1 we review our studies on adhesion between clean, bulk-terminated bcc Fe(100), Fe(110), and Fe(111) matched and mismatched interfaces. The parameters obtained from this work allowed the behavior of the work of separation (Wsep) to be determined and examined. In Section 16.4.2 we examine newly obtained results on the relationship between magnetic and electronic properties and adhesion of the Fe(100), Fe(110), and Fe(111) surfaces in match and mismatch. In Section 16.4.3 we discuss the avalanche effect in adhesion between Fe(100) surfaces, in match and mismatch, focusing specifically on the role of model constraints. In Section 16.4.4 we give a brief summary of our study of the effect of adsorbed S on the adhesion of Fe(100) and Fe(110) surfaces in the atop, bridge, and hollow sites at 1/2 and 1/4 ML coverages


in match and mismatch interfaces. The effect of adsorbed S on the charge-density distribution and magnetic properties of the interface is also examined and related to the interfacial geometry. Also discussed is the effect of relaxation of the interfaces and of different coverages of S at the interface. We conclude this chapter with a summary and outline of future work in Section 16.5.

16.2 METHODOLOGY

Density functional theory (see Chapter 1) is a technique that can provide fundamental understanding of the structural, electronic, magnetic, and adhesion properties of materials and their surfaces and interfaces at the electronic level.8–16 Theoretically, it is possible to construct a model interfacial system of any two surfaces with any degree of lattice match or mismatch and to make arbitrary alterations to the surfaces: for example, to introduce atomic and molecular impurities and then systematically investigate their effects on the system's properties. A range of atomic simulation methods, including DFT, have already been applied successfully to the investigation of various metallic and ceramic interfaces (e.g., MgO/Ag,17 Mo/MoSi2,18 NiAl/Cr,19 and Fe20,21) and of the effects of impurities (S, C, N, O, P, etc.) on adhesion between surfaces.18,22–29 A fairly comprehensive review of applications of various theoretical simulation techniques to the study of material interfaces has been given by Finnis30 and is beyond the scope of this chapter. We have developed a number of methods using classical empirical potentials based on the embedded-atom method (EAM) to study Fe surfaces and interfaces31–34; the advantage of this approach is that it is significantly less computationally expensive, so that it is possible to estimate the system free energies for much larger models and hence simulate a wider variety of surface structures and defects. However, in this chapter we describe our investigations of the surfaces and interfaces using the DFT approach.

16.2.1 Choice and Validation of the Computational Method: Bulk Iron Studies

All calculations were performed using the Vienna ab initio simulation package (VASP),35–37 which performs fully self-consistent DFT calculations to solve the Kohn–Sham equations38 within the local spin density approximation (LSDA) using the functional of Perdew and Zunger39 (PZ) or the generalized-gradient spin approximation (GGSA), using the functional of Perdew and Wang40 (PW91). The electronic wavefunctions are expanded as linear combinations of plane waves (see Chapter 3), truncated to include only plane waves with kinetic energies below a prescribed cutoff energy, Ecut. Due to the delocalized nature of conduction electrons in metals, a delocalized plane-wave basis provides a good representation of metallic systems. The core electrons are replaced by the ultrasoft pseudopotentials of Vanderbilt,41 and k-space sampling was performed using the scheme of Monkhorst and Pack.42
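The Monkhorst and Pack sampling mentioned above can be illustrated with a minimal sketch. It assumes the standard form of the scheme, in which the fractional coordinates along each reciprocal axis are u_r = (2r − q − 1)/(2q) for r = 1, ..., q. The function name `monkhorst_pack` is hypothetical, and no symmetry reduction or grid shifting (as a production code such as VASP may apply) is attempted.

```python
from itertools import product


def monkhorst_pack(q1, q2, q3):
    """Return the fractional k-points of a q1 x q2 x q3 Monkhorst-Pack mesh.

    Illustrative helper only (not part of any DFT package): the full,
    unreduced grid is generated without using crystal symmetry.
    """
    def axis(q):
        # u_r = (2r - q - 1) / (2q), r = 1..q
        return [(2 * r - q - 1) / (2 * q) for r in range(1, q + 1)]

    return list(product(axis(q1), axis(q2), axis(q3)))


# Mesh sizes quoted in the chapter: 12 x 12 x 12 for bulk Fe and
# 12 x 12 x 1 for the [1 x 1] slabs (one k-point along the surface normal).
bulk_mesh = monkhorst_pack(12, 12, 12)
slab_mesh = monkhorst_pack(12, 12, 1)
print(len(bulk_mesh), len(slab_mesh))
```

A single division along z is the usual choice for slab supercells, since the vacuum layer makes the electronic bands essentially flat in that direction.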


TABLE 16.1 Calculated Structure and Properties of Bulk Fe, Using Both LSDA and GGSA Functionalsa

Property                      LSDA             GGSA              Experimental
Lattice parameter, a0 (Å)     2.767 (−3.5%)    2.869 (+0.11%)    2.866
Bulk modulus, B (GPa)         195 (+16%)       140 (−16%)        168
Magnetic moment/atom (μB)     1.98 (−11%)      2.37 (+6.8%)      2.22

Source: Ref. 20.
a The percent deviation from known experimental values43 is shown in parentheses.

The bulk, surfaces, and interfaces of Fe are modeled using the supercell approach, where periodic boundary conditions are applied to the central supercell so that it is reproduced periodically throughout space. Tests were performed on the bulk bcc phase of Fe, using both LSDA and GGSA functionals as well as different Ecut and k-space sampling values, to ensure that the bulk properties were converged.20 The optimized bulk structure was then used to create different surface and interface models.

The total energy, Etot, and lattice parameter, a0, of bulk bcc Fe were calculated using different plane-wave cutoff energy values and k-point sampling sets to ensure the reliability of the calculations. It was found that an Ecut of 300 eV and a k-point mesh of 12 × 12 × 12 gave convergence of Etot and a0 to 10^−4 eV/atom and 0.001 Å, respectively. The lattice parameter, bulk modulus, and magnetic moment values calculated using these converged parameters with both LSDA and GGSA functionals are presented in Table 16.1, along with the experimental values.43 The values calculated using GGSA were found to give better agreement with the known experimental values than those calculated with LSDA. In particular, the LSDA functional was shown to predict the face-centered-cubic (fcc) Fe phase to be more energetically stable at 0 K than the bcc phase, while the GGSA functional correctly predicted the order of stability, consistent with previous findings (see, e.g., Jansen and Peng44).

16.2.2 Surface and Interface Models

The relaxed-bulk bcc Fe cell (with the lattice parameter determined using the GGSA PW91 functional) was cut along the (100), (110), and (111) Miller planes to form the three low-index Fe surface models (see Fig. 16.1). These models also served as our interface models. Using the supercell approach, the interfacial separation distance was defined by the vacuum layer thickness between image cells adjacent to each other in the z -direction (Fig. 16.1). Interfaces were modeled in two different orientations corresponding to a perfect lattice match between the two surfaces (i.e., epitaxial interfaces) and maximum lattice mismatch (i.e., where surface atoms of the two surfaces share the same coordinates in the x,y-plane within the supercell). An even number of

[Figure 16.1: side views of the (100), (110), and (111) supercells for (a) match interfaces and (b) mismatch interfaces, with the vacuum (interfacial) separation d marked; the unit cell top views are labeled 2.866 Å, 2.48 Å, and 4.057 Å.]

Fig. 16.1 (color online) Surface/interface models: (a) (100), (110), and (111) match interfaces; (b) (100), (110), and (111) mismatch interfaces. Profiles of the surface unit cells are displayed below each surface supercell model. The interfacial separation, d, is indicated.

atomic layers was used to model the match interfaces, while an odd number of layers was used to model the mismatch interfaces (Fig. 16.1). To determine the number of layers in the surface model required for convergence, the surface energies of the unrelaxed surfaces (Esurf) were calculated as a function of slab thickness, using a vacuum layer separation of 10 Å. Esurf was calculated using the expression:

\[ E_{\mathrm{surf}} = \frac{E_{\mathrm{tot}}(\mathrm{slab}) - n\,E_{\mathrm{tot}}(\mathrm{bulk})}{2A} \tag{16.1} \]

where Etot (slab) and Etot (bulk) are the total energies of the slab and bulk, respectively; n is the number of bcc Fe unit cells present in the slab; and A is the cross-sectional surface area of the slab.
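As a minimal sketch of how Eq. (16.1) might be evaluated in practice, the function below takes total energies in eV and the cell cross-section in Å² and converts the result to J m−2. The function name and the numerical inputs are hypothetical placeholders chosen only to exercise the formula; they are not values from the calculations described in this chapter.

```python
# 1 eV/Å^2 = 1.602177e-19 J / 1e-20 m^2 = 16.0218 J/m^2
EV_PER_A2_TO_J_PER_M2 = 16.0218


def surface_energy(e_slab, e_bulk, n, area):
    """Eq. (16.1): surface energy of a two-sided slab.

    e_slab : total energy of the slab supercell (eV)
    e_bulk : total energy of one bulk bcc unit cell (eV)
    n      : number of bulk unit cells contained in the slab
    area   : cross-sectional area of the slab (Å^2)
    Returns the surface energy in J/m^2.
    """
    # The factor of 2 accounts for the two free surfaces of the slab.
    return (e_slab - n * e_bulk) / (2.0 * area) * EV_PER_A2_TO_J_PER_M2


# Hypothetical illustration: a (100) slab with a0 = 2.869 Å and made-up energies.
area_100 = 2.869 ** 2
e_surf = surface_energy(e_slab=-93.6, e_bulk=-16.0, n=6, area=area_100)
print(f"{e_surf:.2f} J/m^2")
```

Repeating such an evaluation for slabs of increasing thickness gives the convergence test described in the text.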


All surface and interface calculations also used the PW91 functional and GGSA approach. Further specific computational details for each case are given in relevant sections, as appropriate.

16.2.3 Interfacial Adhesion: Work of Separation and UBER

Most of the calculations we report here (except those described in Section 16.4.3) have been performed for interfaces between ideal Fe surfaces; namely, we calculate the work of separation (Wsep). The concept of the work of separation versus the work of adhesion has been introduced by Finnis30 and was discussed by us previously in detail.20 In terms of the surface and interfacial excess free energies of the materials, the ideal Wsep is given by the Dupré equation45:

\[ W_{\mathrm{sep}} = \sigma_1 + \sigma_2 - \sigma_{12} \tag{16.2} \]

where σ1 and σ2 are ideal surface free energies of materials 1 and 2, and σ12 is the interfacial free energy. This quantity should be distinguished from the work of adhesion, which is defined as the energy required to separate two surfaces from the equilibrium separation to infinity, taking full account of all relaxation and diffusion processes. Wsep can be calculated directly from the molecular simulation of isolated surfaces and of these surfaces when brought into close contact to form an interface.30 By calculating the single-point energy at discrete separation distances, d, one can obtain an interaction energy curve Ead(d):

\[ E_{\mathrm{ad}}(d) = \frac{E(d) - E(\infty)}{A} \tag{16.3} \]

where E(d) is the total computed energy at separation distance d, E(∞) is the total energy at infinite separation, and A is the cross-sectional area of interaction. The well depth of this curve, E0, is equivalent to Wsep. The adhesion curves calculated can be fitted to the universal binding-energy relation (UBER),46 which is given by a Rydberg-type function adapted for the case of interfacial adhesion and is considered to give a valid representation of binding in situations where bonding results mainly from overlap of the tails of wavefunctions47:

\[ E_{\mathrm{ad}}(d) = -E_0\,(1 + d^{*})\,e^{-d^{*}} \tag{16.4} \]

where Ead(d) is the fitted adhesion interaction energy, d* = (d − d0)/l is the scaled distance, E0 is the depth of the adhesion energy well at equilibrium interfacial separation (equivalent to the work of separation Wsep), d0 is the interfacial separation at the adhesion energy minimum, and l is the scale factor, which for transition metals may be interpreted as the surface scaling length and sets the approximate scale for the distance over which electronic forces can act. The value of E0 represents the work of separation for a particular interface.
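The relations in Eqs. (16.3) and (16.4) can be sketched in a few lines of code. The example below evaluates the UBER curve on a grid of separations and recovers the well depth and equilibrium separation numerically. The parameter values (E0 = 5.0 J m−2, d0 = 1.43 Å, l = 0.5 Å) and the function names are hypothetical and serve only to show the shape of the function; in practice the three parameters would be obtained by a nonlinear least-squares fit to the computed Ead(d) points.

```python
import math


def adhesion_energy(e_d, e_inf, area):
    """Eq. (16.3): interaction energy per unit area at one separation."""
    return (e_d - e_inf) / area


def uber(d, e0, d0, scale):
    """Eq. (16.4): UBER adhesion energy at interfacial separation d."""
    d_star = (d - d0) / scale          # scaled distance d*
    return -e0 * (1.0 + d_star) * math.exp(-d_star)


# Hypothetical parameters, for illustration only.
E0, D0, SCALE = 5.0, 1.43, 0.5

# Scan separations from 0.5 to 6.0 Å in steps of 0.01 Å.
separations = [0.5 + 0.01 * i for i in range(551)]
curve = [uber(d, E0, D0, SCALE) for d in separations]

d_min = separations[curve.index(min(curve))]   # location of the minimum
w_sep = -min(curve)                            # well depth, i.e., Wsep
print(d_min, w_sep)
```

The curve is repulsive at short separations, passes through its minimum of −E0 at d = d0, and decays to zero as d becomes large, which is the behavior expected of the work-of-separation well described above.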


16.2.4 Calculation of Binding Energies for Surface Impurities

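We have computed the binding energies of a sulfur impurity adsorbed in various adsorption sites according to the reaction

\[ \mathrm{S(g)} + \mathrm{Fe(s)} \rightarrow \mathrm{S{\cdot}Fe(s)} \tag{16.5} \]

The binding energy is the total energy of the products minus that of the reactants:

\[ \mathrm{BE} = E_{\mathrm{tot}}(\mathrm{products}) - E_{\mathrm{tot}}(\mathrm{reactants}) = E_{\mathrm{tot}}(\mathrm{S{\cdot}Fe}) - [E_{\mathrm{tot}}(\mathrm{S}) + E_{\mathrm{tot}}(\mathrm{Fe})] \tag{16.6} \]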

where Etot(S) is the total energy of an isolated S atom, and Etot(S·Fe) and Etot(Fe) are the total energies of the S-adsorbed Fe(110) surface and the relaxed clean Fe surface, respectively.

16.3 STRUCTURE AND PROPERTIES OF IRON SURFACES

16.3.1 Structural Relaxation and Stability of Fe(100), Fe(110), and Fe(111) Surfaces

16.3.1.1 Introduction and Previous Studies

Relaxation of metal surfaces after cleavage from the bulk is a well-known phenomenon. The reduction in atomic interactions perpendicular to the surface can cause the topmost surface layers to contract toward the bulk or expand away from it. In addition, movements of the surface atoms within the plane of the surface can lead to surface reconstructions. Some previous theoretical findings have differed from those obtained experimentally.48 For the low-index surfaces of Fe in particular, there is also some conflict; however, it has been shown that the surfaces do not reconstruct, while they do relax.49–51 We have already reviewed the findings of the experimental studies,52 which used low-energy electron diffraction (LEED)49–51,53 and medium-energy ion scattering (MEIS)54–56 to examine relaxation of the Fe(100), Fe(110), and Fe(111) surfaces, and found that there is some conflict between the reported surface relaxations.

The relaxations that occur on cleavage of a bulk structure to yield a surface result from the drive to minimize the energy of the surface. The experimental measurement of surface energy values, however, can be very difficult for a number of reasons, one being the difficulty of controlling the presence or absence of contaminants. In calculations, the state of the surface and the level of impurities can be examined systematically. Theoretical studies that have determined surface energy values of the low-index Fe surfaces have included mainly molecular mechanics (MM) techniques,34,57–68 with fewer studies using a quantum mechanical (QM) approach.20,69,70 In particular, the latter studies did not


take into account the effect of surface relaxation on the calculated surface energy values. Furthermore, there were conflicting trends obtained for the stability of the three low-index surfaces. Hence, we performed DFT calculations52 to model these properties and to try to clarify the situation.

16.3.1.2 Surface Models

The Fe surfaces were modeled using the supercell approximation as described in Section 16.2.2. All models used a [1 × 1] crystal unit cell; however, a number of [2 × 2] unit cell slab calculations were performed as well in order to test for convergence. k-Space sampling was performed using the scheme of Monkhorst and Pack.42 A k-point mesh of 12 × 12 × 1 for the [1 × 1] unit cells and a 6 × 6 × 1 mesh for the [2 × 2] unit cell calculations were employed. A lattice constant value of 2.869 Å was used, as this was the optimized value obtained in our previous study20 of bulk bcc Fe using the same computational parameters. Models with different numbers of layers (ranging from 7 to 17 layers) were constructed to determine the size of slab needed to converge the surface geometry and energy values. Either one middle layer (for an odd-number layered model) or two middle layers (for an even-number layered model) were fixed, to provide a reference point for comparing the relaxed Fe positions, while all other atoms were allowed to relax in the x-, y-, and z-directions. The models selected are described in Section 16.3.1.3.

16.3.1.3 Surface Relaxation

Our calculations of the relaxed surface models52 indicated that only relaxations perpendicular to the surface (in the z-direction) occurred and that these surfaces do not reconstruct (showing no atomic displacements in the x,y-directions), in agreement with experimental studies.49–51 For each layer in our model we calculated the values of δzn, which is a measure of the distance the nth layer of the surface moves as a percentage of the interlayer spacing. A positive value indicates an expansion or upward movement (toward the surface), whereas a negative value indicates a contraction or downward displacement. The relaxation values for the (100), (110), and (111) surfaces were found to be converged to within 0.01, 0.0005, and 0.005 Å for 13-, 7-, and 12-layer models, respectively. The [2 × 2] surface models showed close agreement with the [1 × 1] slabs. These models are employed in our further work.

The relaxation values obtained (Table 16.2) showed good agreement with experiment, with the more open surfaces relaxing more, in the order (110) < (100) < (111). The magnitude of the relaxations was found to be smaller as the bulk layers were approached. For all surfaces, the topmost layer contracted toward the bulk, with the (111) surface showing the largest relaxation, followed by the (100), then the (110) surface. The relaxation of the (110) surface layer was essentially zero, indicating that it is basically bulk cleaved. The second layer was found to relax outward for the (100) and (110) surfaces, while it contracted toward the bulk for the (111) surface. Again, the relaxations were largest for the (111) surface and smallest for the (110) surface.
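To give a feel for the size of these relaxations, the sketch below converts the top-layer percentages of Table 16.2 into displacements in Å. It assumes that δzn is expressed relative to the ideal bulk interlayer spacing of each surface and uses the standard bcc values a0/2, a0/√2, and a0/(2√3) for the (100), (110), and (111) planes, with a0 = 2.869 Å. The helper names are hypothetical, and the reading of δzn as a simple layer displacement is an assumption rather than a definition confirmed by the text.

```python
import math

A0 = 2.869  # Å, GGSA lattice constant used for the slab models

# Ideal bulk interlayer spacings of bcc planes (standard geometry).
SPACING = {
    "(100)": A0 / 2.0,
    "(110)": A0 / math.sqrt(2.0),
    "(111)": A0 / (2.0 * math.sqrt(3.0)),
}


def percent_to_angstrom(delta_percent, surface):
    """Convert a relaxation given in % of the interlayer spacing to Å."""
    return delta_percent / 100.0 * SPACING[surface]


def displacement_to_percent(dz, surface):
    """Convert a layer displacement in Å to % of the interlayer spacing."""
    return 100.0 * dz / SPACING[surface]


# Top-layer relaxations (δz1) from Table 16.2, in percent.
top_layer = {"(100)": -1.89, "(110)": -0.13, "(111)": -13.3}

for surface, percent in top_layer.items():
    print(surface, f"{percent_to_angstrom(percent, surface):+.4f} Å")
```

Because the (111) planes are the most closely spaced, a percentage on that surface corresponds to a smaller absolute distance than the same percentage on (100) or (110), which is worth keeping in mind when comparing the columns of Table 16.2.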


TABLE 16.2 Calculated Relaxation Measurements, δzn (n = 1, 2, ...), as a Percentage of the Bulk Interlayer Spacing for the First Five Layers of Fe(100), Fe(110), and Fe(111)a

           Surface Relaxation (%)                         Surface Energy (J m−2)
Surface    δz1      δz2       δz3      δz4      δz5       Relaxed    Unrelaxed
(100)      −1.89    +2.59     +0.21    −0.56    −0.14     2.29       2.32
(110)      −0.13    +0.197    −0.06    –        –         2.27       2.27
(111)      −13.3    −3.6      +13.3    −1.2     +0.35     2.52       2.62

Source: Ref. 52.
a Calculated surface energy values.

For the (100) and (110) surfaces, the magnitude of our calculated surface relaxations agreed well with the experimentally determined values50,51,55 and fell within the error of these measurements. For the (111) surface, there was a discrepancy between the relaxation values measured experimentally using MEIS54,56 and LEED.49,53 The MEIS measurements54,56 indicated that the first layer contracted and the second expanded, whereas the LEED study53 indicated that the first two layers contracted and the third expanded. Our calculations agreed with the LEED measurements. The magnitude of the surface relaxations can be related to the openness of the surface, with the more open (111) surface showing larger relaxation and the most close-packed (110) surface being almost bulk cleaved.

16.3.1.4 Surface Energy

The calculated surface energy values (Table 16.2) for all three surfaces were found to be converged to at least 0.01 J m−2 by nine layers, with the unrelaxed models having slightly higher or the same surface energy values. Experimentally, the surface energy of Fe has been determined using liquid surface tension measurements by extrapolating the data to 0 K to give a numerical value for the solid of 2.41 J m−2 (ref. 71). As this value does not represent a particular surface of Fe, we cannot make a direct comparison; however, our values were generally in line with this value, especially if the average for all three surfaces was calculated. It was also found that the results obtained from previous MM calculations are dependent on the quality of the potentials employed, while the QM calculations, including our work, all give values that are close to experiment.

The surface energy values that were calculated increase in the order (110) < (100) < (111), both before and after relaxation, so that (110) is the most stable surface and (111) the least stable. This relative order could be explained in terms of bond-cutting arguments as well as the openness of the surface.52 In summary, our models provide a good approximation of the surface energy values and are therefore used in subsequent studies; the extent of the decrease in surface energy after relaxation is related to the magnitude of the relaxation. Our calculations described above provided the first fully converged study of the relaxation and surface energies of the three low-index Fe surfaces.


16.3.2 Electronic and Magnetic Properties of Fe(100), Fe(110), and Fe(111) Surfaces

16.3.2.1 Introduction and Previous Studies

It is well known that the magnetic properties of metals at a surface are different from those in the bulk, and the magnetic moments of Fe surfaces have been studied both theoretically69,72–80 and experimentally.81 Table 16.3 summarizes available computational results. It is well established that the magnetic moment (μB) at the surface is enhanced compared to the bulk, due to loss of coordination upon formation of the surface. However, only a few of the studies that have investigated this effect theoretically consider surface relaxations,72,73,75 with most only examining bulk-terminated surfaces.69,73,74,76–80 Despite the number of studies that have investigated magnetic properties of surfaces, we are unaware of any published computational studies of how the magnetic properties of Fe surfaces are related to Fe adhesion and interface formation. At an interface, the magnetic properties can differ from those of the surface or the bulk. Understanding this is particularly important for magnetic device technology.82

TABLE 16.3 Computed Magnetic Moments (μB) of the Relaxed and (Unrelaxed) Fe(100), Fe(110), and Fe(111) Surfaces, Along with the Values Determined Previouslya

Magnetic Moment, μB

Surface

Year

[Ref]

(100)

[this work]

(110)

199673 199574 199469 199276 199277 198778 198379 198180 [this work]

(111)

200272 199469 /199277 198778 [this work] 199375

S

S-1

S-2

S-3

S-4

3.03 (3.06) 2.74 (3.01) (2.97) (2.87) (2.97) (2.98) (2.98) (3.01) 2.75 (2.75) 2.47 (2.57) (2.65) 2.96 (3.01) 2.62

2.47 (2.50) 2.62 (2.36)

2.59 (2.55) — (2.42)

2.48 (2.47) — —

2.45 (2.46) — —

(2.34) (2.30)

(2.33) (2.37)

(2.25)

(2.24)

(2.35) (1.68) 2.53 (2.53) 2.29 (2.35) (2.37) 2.50 (2.57) 2.25

(2.39) (2.13) 2.40 (2.48) 2.32 (2.25) (2.28) 2.66 (2.66) 2.34

— — 2.43 (2.44) 2.26 (2.24) (2.25) 2.56 (2.54) 2.15

— — 2.41 (2.41) — (2.24) — 2.55 (2.56) 2.17

C 2.42 (2.43) 2.60 (2.32)

(2.25) (1.84) 2.39 (2.39) 2.22 (2.22) 2.56 (2.53) 2.11/2.00b

a S is the surface layer, S-n (n = 1 to 4) are the second to fifth layers, and C is the center of the slab.
b The calculation also included an S-5 value; hence, the values indicated are S-5/C.


16.3.2.2 Magnetic Moments and Density of States of Fe Surfaces

To relate magnetic and electronic properties to adhesion, we first examined the properties of the isolated relaxed and unrelaxed Fe(100), Fe(110), and Fe(111) surfaces. The layer-resolved magnetic moment values obtained from our calculations for the three low-index surfaces before and after relaxation are shown in Table 16.3, together with a summary of previously determined values for the same Fe surfaces. It can be seen that the magnetic moment values are enhanced at the surface, due to the loss in coordination at the surface resulting in localized surface states (see, e.g., Alden et al.77 and Freeman and Fu78). For the surfaces studied, the enhancement is 25%, 15%, and 16% for the (100), (110), and (111) surfaces, respectively, using the relaxed surface models. The difference in surface layer magnetic moment enhancement can be attributed to the coordination of Fe atoms at each surface, where the (110) surface atoms have a higher surface coordination number and hence the lowest surface enhancement. However, the difference between the (100) and (111) surfaces, which both have a surface Fe coordination number of 4, indicates that additional features of the surface atomic arrangement, such as packing, affect the magnetism of Fe, as seen previously.75

The enhancement of the surface magnetic moment value observed for all Fe(100), Fe(110), and Fe(111) surfaces has been attributed by Wu and Freeman83 to the difference in density of surface layer up- and down-spin states at the Fermi level (EF) as compared to the bulk. They showed that for the bulk (or center layer) density of states (DOS), the Fermi level lies on an up-spin peak and in the valley of the down-spin DOS. At the surface layer, however, the DOS are significantly narrowed due to a loss in coordination. As a result, there is a decrease in up-spin states at EF and an increase of down-spin states due to surface states and resonances. It is this increased number of down-spin states relative to the up-spin states that gives rise to the surface magnetic moment enhancement. The total DOS resolved to up- and down-spin states of each of the unrelaxed and relaxed surfaces is shown in Fig. 16.2 (dashed and solid lines, respectively). The bulk DOS are shown in Fig. 16.3. As can be seen from Fig. 16.2, there is an increased density of down-spin states compared to up-spin states present at EF for all three surfaces, leading to the enhanced surface magnetic moment. Comparison of the DOS for the three surfaces with those obtained previously shows good agreement.

The atoms in the lower layers of our surface models show magnetic moment values that generally decrease and are identical within 1.2% for the S-4 and C layers, indicating that the surface models are large enough to achieve convergence. The magnetic moment values for the center layer of the (100) and (110) surfaces are similar, within 1.6%, and are converged to less than 1.25% compared to the bulk value, calculated to be 2.40 μB using the same computational parameters. They are, however, up to 7% different when compared to the (111) surface value, where the central layer μB is 5% larger than the bulk value, indicating that the surface model may not be large enough for convergence of this property. It is important to note, though, that other properties, including surface energy and relaxation, do converge for models of the same size.52 We therefore consider the models to be appropriate for this study and for comparison with previous work.
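The quoted enhancements can be checked with a short calculation. The sketch below assumes that the enhancement is measured relative to the center layer (C) of the same relaxed slab, using the "this work" surface (S) and center values as read from Table 16.3; with that assumption the percentages round to the 25%, 15%, and 16% stated above. The function name is hypothetical.

```python
# Relaxed "this work" values (μB) as read from Table 16.3.
MU_SURFACE = {"(100)": 3.03, "(110)": 2.75, "(111)": 2.96}
MU_CENTER = {"(100)": 2.42, "(110)": 2.39, "(111)": 2.56}


def enhancement(mu_surface, mu_reference):
    """Percentage increase of the surface moment over a reference moment."""
    return 100.0 * (mu_surface - mu_reference) / mu_reference


# Assumption: the reference is the center layer of the same slab.
percent = {
    s: enhancement(MU_SURFACE[s], MU_CENTER[s]) for s in MU_SURFACE
}
for surface, value in percent.items():
    print(surface, f"{value:.1f}%")
```

If the calculated bulk moment of 2.40 μB were used as the reference instead, the (100) and (110) percentages would change only slightly, whereas the (111) value would be noticeably larger, reflecting the slower convergence of the (111) center layer noted in the text.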

[Figure 16.2: three panels, Fe(100), Fe(110), and Fe(111), each plotting the up- and down-spin n(E) (states/eV atom) against energy (eV), with EF marked.]

Fig. 16.2 Total density of states (DOS) resolved to up- and down-spin for the surface/top layer of the unrelaxed (dashed line) and relaxed (solid line) Fe (100), (110), and (111) surfaces. The DOS values have not been smoothed.

[Figure 16.3: up- and down-spin n(E) (states/eV atom) against energy (eV), with EF marked; curves labeled Bulk and Eq. (1.39 Å).]

Fig. 16.3 Total density of states (resolved to up- and down-spins) for the surface layer of the (100) matching interface at equilibrium separation compared to the bulk. The DOS values have not been smoothed.

Even though our calculated layer-resolved magnetic moment values decrease toward the bulk (i.e., away from the surface), the (100) and (111) surfaces show some small oscillations. As can be seen from Table 16.3, most previous studies show an oscillation as well, which has been explained by rearrangements in the electron density (i.e., Friedel oscillations). The two exceptions are given by Kishi and Itoh,73 whose surface model is not large enough to observe oscillations, and Eriksson et al.,76 who incorporate spin-orbit coupling into their calculations. We do not observe such oscillations for the (110) surface, similar to previous calculations by Freeman and Fu,78 Alden et al.,69,77 and Braun et al.72 We do see a 1.2% increase in the magnetic moment value at the S-3 layer, similar to the results of Braun et al.72; however, this change is probably within computational uncertainty.

Comparison of the magnetic moment values after relaxation shows that the μB of the surface atom decreases for the (100) and (111) surfaces, while it remains the same for the (110) surface. This appears to be related directly to the magnitude of the surface relaxation of the outermost layer.52 The DOS of the outermost layer shows little change after relaxation (Fig. 16.2, solid versus dashed lines). Thus, surface relaxation does not affect the surface magnetic moments or DOS to a significant extent, and therefore the "frozen surface" adhesion model we employ in Section 16.4.2 is justified.


METAL SURFACES AND INTERFACES

16.3.3 Sulfur Adsorption on Fe(110)

16.3.3.1 Introduction and Previous Studies
The presence of S on Fe surfaces has been shown to affect adhesion, corrosion, and catalysis and is thus of importance in industrial processes. Impurities, in general, can either increase or decrease the strength of adhesion, depending on conditions. Prior to studying the effect of impurities on adhesion we needed to examine the adsorption of these impurities on the clean Fe surfaces. The experimental S adsorption data on Fe(110)84–86 has concentrated primarily on the 1/4 ML coverage with S adsorbed in a p(2 × 2) arrangement. Below we summarize our findings on adsorption of S on Fe(110) in three different high-symmetry adsorption sites: atop, bridge, and four-fold hollow at 1/4 ML coverage, followed by the effect of different S coverages on the foregoing properties of the Fe(110) surface87,88 (Section 16.3.3.8).

16.3.3.2 Adsorption Models and Computational Details
The Fe surfaces were modeled using the supercell approach (Section 16.2.2). S adsorption at the experimentally observed coverage of 1/4 ML and p(2 × 2) arrangement84,85 was modeled by placing an S atom on one side of the slab (see Fig. 16.4). S was adsorbed in atop, bridge, or four-fold hollow sites. The S atom and only the three top Fe layers were allowed to relax. A k-point mesh of 6 × 6 × 1 was employed, as this gives a good description of FeS2 89,90 and clean Fe(110).52
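The 6 × 6 × 1 k-point mesh quoted above can be illustrated with a short sketch. It assumes a Monkhorst–Pack-type grid, with fractional coordinates u_r = (2r − q − 1)/(2q) for r = 1, ..., q along each reciprocal axis. The passage specifies only the mesh dimensions, so the grid type and the function name monkhorst_pack are assumptions made for illustration.

```python
from itertools import product


def monkhorst_pack(size):
    """Fractional k-points of a Monkhorst-Pack-type grid (assumed grid type).

    Along an axis with q divisions: u_r = (2r - q - 1) / (2q), r = 1..q.
    """
    axes = [[(2 * r - q - 1) / (2 * q) for r in range(1, q + 1)] for q in size]
    return list(product(*axes))


# Mesh dimensions quoted in the text for the S/Fe(110) slab: 6 x 6 x 1
kpoints = monkhorst_pack((6, 6, 1))

print(len(kpoints))   # number of k-points before any symmetry reduction
print(kpoints[0])     # first point, in fractional reciprocal coordinates
```

With a single division along the surface normal, every point has a zero third component, which reflects the usual choice for slab supercells where the vacuum spacer suppresses dispersion in that direction.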

[Figure 16.4 panels (a), (b), and (c): supercells with the S adsorbate above Fe layers Fe1 to Fe5 and a vacuum spacing of ~10 Å.]

Fig. 16.4 (color online) Top and side views of the supercells used to model sulfur adsorbed in a p(2 × 2) arrangement ( 1/4 ML coverage) in (a) atop, (b) bridge, and (c) four-fold hollow sites.


To determine the workfunction (defined as the energy required to remove an electron from the Fermi level, EF, to the vacuum) of the foregoing systems, a dipole correction was added in the direction perpendicular to the surface. As we have an asymmetric slab with the adsorbate placed on only one side of the slab, the electrostatic potential in the vacuum region will show a clear distinction between the two sides of the slab, representing the adsorbed surface and the clean surface. The workfunction value, Φ, is represented as

$$\Phi = E_{\mathrm{vac}} - E_F \qquad (16.7)$$

where Evac is the electrostatic potential in the vacuum region of the supercell on the adsorbate side of the slab and EF is the energy of the Fermi level. The change in workfunction value, ΔΦ, is calculated by subtracting the workfunction of the clean surface from that of the adsorbed surface.

16.3.3.3 Binding Energy and Workfunction Measurements
The calculated binding energy values (Table 16.4) indicated that the hollow site is the most favored, in agreement with experimental data,84 followed by the bridge and then the atop sites. The calculated workfunction values and workfunction changes for S/Fe(110) in the three adsorption sites are shown in Table 16.4; our calculated values compared well to the experimental segregation energy value of 5.2 eV,91 as does the calculated clean surface value with the experimental value of 5.12 ± 0.06 eV.92 The change in sign of the workfunction values after S adsorption was similar to other atomic adsorbates, such as oxygen, which also show a negative workfunction change,93 indicating a negatively charged surface species. As the magnitude of the workfunction change was only very small, it suggested that there is little transfer of charge from the Fe to the S. The change in workfunction after S adsorption was largest for the atop site, followed by the bridge and then the four-fold hollow site.

TABLE 16.4 Parameters Calculated for S Adsorbed on Fe(110) in the Atop, Bridge, and Four-fold Hollow Sites a

                 Adsorption Site
Parameter    Atop     Bridge    Hollow
BE (eV)      4.52     5.32      5.82
Φ (eV)       5.08     4.999     4.98
ΔΦ (eV)      0.24     0.15      0.14

Source: Ref. 87.
a BE, binding energy; Φ, workfunction; ΔΦ, change in workfunction after S adsorption. The workfunction for the clean Fe(110) surface was calculated to be 4.84 eV, using a five-layer slab.

TABLE 16.5 Calculated Distances for S Adsorbed on Fe(110) in a p(2 × 2) Arrangement in Atop, Bridge, and Four-Fold Hollow Sites a

                                  Adsorption Site
Distance (Å)    Atop87          Bridge87    Four-fold Hollow87    Four-fold Hollow84
d⊥(S–FeS)       2.06 (1.797)    1.70        1.49                  1.43
d(S–Fe)         2.06            2.15        2.19                  2.17

Source: Ref. 87.
a Included are the corresponding values determined from LEED measurements84 for the four-fold hollow site: the perpendicular height of S above the highest atom in the topmost Fe layer, d⊥(S–FeS), and the shortest S–Fe distance, d(S–Fe).

16.3.3.4 Adsorption Geometry
After adsorption of S, the calculations showed that both relaxation and surface reconstruction occurred.87 Table 16.5 shows the

calculated distances between the adsorbed S and closest Fe atom, the height of S above the surface and the experimental LEED84 values for the 4-fold hollow site. The perpendicular height of the adsorbed S above the top Fe layer (Table 16.5) increases going from the four-fold hollow to the bridge and atop sites, as the S lies closer to the surface for the more highly coordinated adsorption sites. The shortest S–Fe bond distances were again related to the coordination number of the adsorption site; the S–Fe bond distance is shorter for the atop site, where it is bonding directly to one atom but is longer for the bridge and four-fold hollow sites, where the bonding is distributed over more atoms. Interestingly, some buckling of the surface layers was observed after S adsorption. For the four-fold hollow site all Fe atoms in the top layer relax upward slightly, opposite to the clean Fe(110) surface. In addition, the two Fe atoms lying farther from the S moved upward, while the two atoms closest to the S only moved upward, which resulted in the S–Fe distances to these four surface atoms being equalized, maximizing the S–Fe coordination. The second-layer Fe atoms were less buckled and the third-layer Fe atoms were bulklike, in good agreement with experimental data.84 For the bridge site, there was also some buckling of the surface layer, similar to the four-fold hollow site; for the second layer there was some small buckling, while the third layer was bulklike. For the atop site, all surface layer Fe atoms relaxed upward slightly, except for the atom directly below the adsorbed S, which moved downward. The atoms next closest to the S in the top layer relaxed upward, with the farthest ones also relaxing upward, but only slightly. 
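As a worked check of Eq. (16.7) and the tabulated workfunction changes, the sketch below recomputes the change in workfunction (adsorbed minus clean) from the Table 16.4 workfunctions and the clean-surface value of 4.84 eV given in the table footnote. The helper workfunction and all variable names are illustrative. No vacuum-level or Fermi energies are quoted in the text, so the helper is defined but the arithmetic uses only tabulated workfunction values.

```python
def workfunction(e_vac, e_fermi):
    """Eq. (16.7): vacuum-level potential minus Fermi energy, both in eV."""
    return e_vac - e_fermi


PHI_CLEAN = 4.84  # eV, clean Fe(110), five-layer slab (Table 16.4 footnote)

# Workfunctions (eV) for S/Fe(110) at 1/4 ML coverage, from Table 16.4
phi_adsorbed = {"atop": 5.08, "bridge": 4.999, "hollow": 4.98}

# Change in workfunction: adsorbed surface minus clean surface
delta_phi = {site: phi - PHI_CLEAN for site, phi in phi_adsorbed.items()}

for site, dphi in delta_phi.items():
    print(f"{site:>6}: change in workfunction = {dphi:+.3f} eV")
```

The recomputed differences agree with the tabulated 0.24, 0.15, and 0.14 eV to within about 0.01 eV, and they reproduce the stated ordering (atop, then bridge, then four-fold hollow).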
The small displacement in the x- or y-direction indicated that the four-fold hollow site reconstructs the most and the atop site the least.52 For the four-fold hollow site, the second-layer Fe atoms showed no reconstruction, while those in the third layer reconstructed slightly but the movement was negligible [...]

[...] > (111). This order is the same as that calculated for the isolated surfaces in Section 16.3.1.4. As a result of the relative stability of the surfaces, despite the (111) matching interface having the largest Wsep of all the low-index interfaces, the lower stability of the surface indicates that it is less likely to exist as the clean bulk-terminated face. The d0 values calculated (Table 16.7) were found to be smaller for the matching interfaces than for the mismatching interfaces. In fact, the d0 values for the matching interfaces indicate that the interface forms the bulk structure at the equilibrium separation. For the mismatching interfaces, the d0 values were found to be approximately equal to the Fe–Fe bond distance of 2.482 Å,133 as the topmost Fe atoms on each surface forming the interface directly face each other. The l values (Table 16.7) calculated for the matching and mismatching interfaces were all close to each other and agreed with the empirically estimated average screening length value of 0.56 Å for several Fe surfaces,46 except for the (111) mismatching interface, suggesting again that this interface is unlikely to form. The l values were slightly larger for the matching interfaces, indicating that the electronic interactions between the approaching surfaces forming the interface begin at a larger separation. The ideal peak interfacial stress values (Table 16.7), which give a measure of the maximum tensile stress that the interfaces can withstand without spontaneous cleavage, were shown to be in the same order as the Wsep values.

16.4.2 Relationship Between Adhesion and Electronic and Magnetic Properties

In this section we present new results investigating the relationship between adhesive energy and interfacial separation for the body-centered cubic (bcc) Fe(100), Fe(110), and Fe(111) interfaces. Both ideally matching and mismatching interfaces were considered in order to cover the endpoints of the range of adhesion of real surfaces.

16.4.2.1 Magnetic Properties and Adhesion of Fe Interfaces
The computed layer-by-layer local atomic magnetic moments for the Fe(100), Fe(110), and Fe(111) interfaces in match and mismatch at three interfacial separation distances (d)20,21: 10 Å separation, the interfaces at approximately infinite separation; 4 Å, the approximate distance at which metallic interactions begin to dominate; and the equilibrium separation (Eq.) are shown in Fig. 16.9. Figure 16.9 shows that for the (100) match interface, the top surface layer μB changes considerably as the surfaces approach, while the second and third layers change only slightly and the lower layers, hardly at all. At the equilibrium interfacial separation, the μB values differ very little from layer to layer, consistent with the fact that at this separation the system is essentially bulk Fe. For the mismatch interface, it is again the surface μB that is most changed upon


STRUCTURE AND PROPERTIES OF IRON INTERFACES

[Figure 16.9 panels: Fe(100) Match, Fe(100) Mismatch, Fe(110) Match, Fe(110) Mismatch, Fe(111) Match, and Fe(111) Mismatch; axes: magnetic moment (μB) versus layer number, with one curve per interfacial separation from 10 Å down to equilibrium (Eq.).]

Fig. 16.9 (color online) Calculated layer-by-layer magnetic moment values (μB) for the match and mismatch Fe(100), (110), and (111) interfaces at the interfacial separations indicated; Eq. is the equilibrium separation.

formation of the interface, while the lower layers stay almost constant. At the equilibrium interfacial separation the surface μB is still enhanced, as the bulk crystal is not formed when the surfaces are out of epitaxy. The (110) match and mismatch interfaces display similar trends to the (100) interfaces where the second- and third-layer μB values stay almost the same as those of the lower layers. The third layers of both (110) interfaces, however, appear to be less affected than they are on the (100) interface. This surface is more closely packed than the (100) surface, and hence it would be expected that the lower layers would be less affected by changes occurring at the surface layer. The (111) match and mismatch interfaces also show a surface layer magnetic moment enhancement; however, in addition to the surface layer, the second- and third-layer μB values are clearly altered as the interfacial separation is decreased. For this less close-packed surface, the second and third atomic layers are more exposed. It can therefore be suggested that there are surface states localized on

[Figure 16.10 panels: Ead (kJ/mol) versus ΔμB for the (100), (110), and (111) interfaces; match and mismatch series, with the equilibrium separation (Eq.) marked.]

Fig. 16.10 (color online) Adhesion energy values, Ead,20,21 plotted against surface layer magnetic moment enhancements, ΔμB = μBsurface − μBbulk, corresponding to the same interfacial separations for the (100), (110), and (111) interfaces (from left to right) in match and mismatch (triangles).

these “lower-layer” atoms, and as the surfaces are brought together, the lower-layer surface states also begin to interact, resulting in changes in their computed magnetic moments. This is in contrast to the (100) and (110) surfaces, where atoms below the topmost layer are fully (i.e., bulk) coordinated; their magnetic moment values are therefore close to those computed for the bulk, and changes in interfacial separations have negligible influence. This observation is consistent with the (111) surface being more open. The relation between the surface μB changes and the adhesion energy can be seen from Fig. 16.10, where the values for the surface μB enhancement, ΔμB (the difference between the surface atomic layer μBsurface and the computed bulk μBbulk), and the adhesion energy, Ead, for the interfaces have been plotted. For all three matching interfaces the adhesion energy decreases with decreasing ΔμB until the adhesion energy reaches a minimum when the interface is most stable (bulklike), and the enhancement is essentially zero. For the mismatching interfaces, the adhesion energy decreases as ΔμB decreases, but ΔμB does not reach zero at the minimum adhesion energy because the bulk crystal structure is not formed.

16.4.2.2 Density of States
DOS of Matching Interfaces
The surface layer density of states (S-DOS), resolved to up- and down-spin states, for all interfaces were calculated at four interfacial separations: 10 Å, 4 Å, Eq., and a separation between 4 Å and Eq. As the difference in magnitude of the up- and down-spin states at the Fermi level affects the surface μB enhancement, we examine how these states change as a function of interfacial separation. The S-DOS for the matching (100) interface are shown in Fig. 16.11. At 10 Å separation, the S-DOS are identical to those seen earlier for the unrelaxed surface (Fig. 16.2), as this separation represents the isolated surfaces.20 The values calculated for the up- and down-spin DOS at EF (Table 16.8) show the presence of more down-spin states at EF, which gives rise to the surface μB enhancement.

TABLE 16.8 Number of Up- and Down-Spin States at the Fermi Energy (in States/eV Atom) for Match and Mismatch Interfaces at Interfacial Separations of 10 Å and Equilibrium (Eq.)

                                        Match            Mismatch
Interface    Interfacial Separation    Up      Down     Up      Down
(100)        10 Å                      0.09    1.02     0.10    0.81
             Eq.                       0.79    0.23     0.13    1.24
(110)        10 Å                      0.16    0.86     0.14    0.87
             Eq.                       1.00    0.52     0.20    0.45
(111)        10 Å                      0.08    1.45     0.07    1.33
             Eq.                       0.45    0.33     0.15    2.03
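To make the trends in Table 16.8 easier to read, the following sketch tabulates the difference between the down- and up-spin states at EF for each interface and separation. The numbers are copied from Table 16.8; the dictionary layout and the helper name down_minus_up are illustrative choices and not part of the original analysis.

```python
# (up, down) states at E_F in states/eV atom, copied from Table 16.8
dos_at_ef = {
    ("(100)", "match"):    {"10 A": (0.09, 1.02), "Eq.": (0.79, 0.23)},
    ("(110)", "match"):    {"10 A": (0.16, 0.86), "Eq.": (1.00, 0.52)},
    ("(111)", "match"):    {"10 A": (0.08, 1.45), "Eq.": (0.45, 0.33)},
    ("(100)", "mismatch"): {"10 A": (0.10, 0.81), "Eq.": (0.13, 1.24)},
    ("(110)", "mismatch"): {"10 A": (0.14, 0.87), "Eq.": (0.20, 0.45)},
    ("(111)", "mismatch"): {"10 A": (0.07, 1.33), "Eq.": (0.15, 2.03)},
}


def down_minus_up(up, down):
    """Excess of down-spin over up-spin states at E_F (states/eV atom)."""
    return down - up


for (face, kind), by_separation in dos_at_ef.items():
    far = down_minus_up(*by_separation["10 A"])
    eq = down_minus_up(*by_separation["Eq."])
    print(f"{face} {kind:<8}  10 A: {far:+.2f}   Eq.: {eq:+.2f}")
```

In this tabulation the down-spin excess is positive for every interface at 10 Å, changes sign at equilibrium for all three matching interfaces, and remains positive at equilibrium for all three mismatching interfaces.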

That there is little chemical interaction between the surfaces for d > 4 Å is shown by the similarity in the S-DOS, consistent with the similarity of the adhesion energy curves20 and the values of the surface μB enhancements. At the equilibrium interfacial separation, the number of down-spin states at EF has decreased significantly (see Table 16.8), the overall features of the S-DOS are those of bulk Fe (Fig. 16.3), and the up-spin S-DOS change significantly at EF, with an increased number of states at EF. As a result of these changes, there is a larger number of up-spin states at EF, as compared to larger separation distances, leading to a significant decrease in the surface μB at this separation. For the (110) matching interface, similar behavior is observed as the interfacial separation is decreased, but at the equilibrium separation there is a decrease in the down-spin states, whereas there is an increase in the up-spin states at EF, and the DOS resemble those of the bulk crystal structure (Fig. 16.3). The up- and down-spin S-DOS of the matching (111) interface (Fig. 16.11) show behavior similar to that of the other two interfaces, with the down-spin states dominating at larger interfacial separations. At the equilibrium separation the up-spin states dominate at EF and the S-DOS resemble those of the bulk. This is consistent with the very small value computed for the μB enhancement.

DOS of Mismatching Interfaces
The resolved surface layer DOS values for the three mismatching interfaces were calculated and the up- and down-spin states at EF are shown in Table 16.8. For the (100) interface at 10 Å separation, the S-DOS represents the isolated noninteracting surface. As the interfacial separation is decreased, the down-spin states present near EF vary slightly in number, but unlike the matching interface, they are still present at the equilibrium separation, still having an increased number of down-spin states, indicating an enhanced surface μB value. Similar behavior is seen for the DOS of the (110) and (111) mismatching interfaces.

16.4.2.3 Charge Density
The charge-density distribution of the (100), (110), and (111) matching and mismatching interfaces was examined at two different interfacial separations: equilibrium separation and a separation greater than

[Figure 16.11 panels: Fe(100) at 10 Å, 3.95 Å, 2 Å, and 1.39 Å (Eq.); Fe(110) at 10 Å, 4 Å, and 1.99 Å (Eq.); Fe(111) at 10 Å, 4 Å, 1.5 Å, and 0.8 Å (Eq.); axes: n(E) (states/eV atom) versus energy (eV) for up- and down-spin states, with EF marked.]

Fig. 16.11 Surface layer density of states (resolved to up- and down-spin states) for the (100), (110), and (111) matching interfaces at the interfacial separation indicated, including equilibrium (Eq.). The DOS values have not been smoothed.

[Figure 16.12 panels: (a) match interface at d = 2 Å and 1.39 Å (equil.); (b) mismatch interface at d = 4 Å and 2.43 Å (equil.); color scale from low to high charge density.]

Fig. 16.12 (color online) Charge-density plots of (a) matching and (b) mismatching Fe(100) interfaces at the interfacial separation d indicated.

equilibrium. The plots shown in Fig. 16.12 correspond to a slice taken perpendicular to the (100) match and mismatch interfaces.

For the (100) matching interface (Fig. 16.12a) at a separation of 2 Å (greater than equilibrium), the plot shows a region of low charge density between the two surfaces forming the interface, indicating that negligible metallic bond formation occurs. At the equilibrium separation (1.39 Å) there is a uniform distribution of the charge density between the atoms at the interface and the bulk, signifying bond formation has occurred and the bulk material formed. The (110) and (111) matching interfaces (not shown) show identical behavior at the corresponding interfacial separations. Hence, irrespective of the crystal face forming the interface in epitaxy, the interface is most stable when the charge density is evenly distributed between the atoms at the interface and those within the bulk.

The charge-density plot for the corresponding (100) mismatching interface (Fig. 16.12b) shows that at an interfacial separation greater than equilibrium separation (4 Å), there is a region of very low charge density at the interface, similar to the matching interface. At the equilibrium separation (2.43 Å), an increase in the charge density between the closest surface atoms forming the interface indicates that some bonding occurs. However, there are large areas of low charge density between the directional bonds, which result in a much weaker interfacial energy than that in the epitaxial arrangement.20,21 The mismatching (110) and (111) interfaces show similar behavior.

16.4.2.4 Conclusions
For all three surfaces studied, there is an enhanced magnetic moment at the surface due to an increased number of down-spin states as opposed to up-spin states at the Fermi level in the DOS, consistent with previous studies. The inclusion of surface relaxation in the calculations had little effect on the magnetic moment values and DOS. The magnetic moments calculated for the interfaces at a number of special interfacial separation distances were found to be related and were consistent with


the adhesion properties obtained previously. The surface layer magnetic moment is most affected upon formation of the interface, with lower layers being less affected but most altered for more open surfaces. For the matching interfaces the surface layer magnetic moment enhancement decreases as the interfacial separation is reduced, until it reaches zero at the equilibrium separation. In contrast, for mismatching interfaces an enhanced surface magnetic moment is still present at the equilibrium separation, as manifested by the increased number of down-spin states at EF. The charge-density plots for different interfacial separations show rearrangement of the electron density as the surfaces are brought into contact in and out of epitaxy. There is little interaction between the surfaces at large interfacial separations, in agreement with the DOS and magnetic moment enhancement values, but for shorter separations they indicate bond formation.

16.4.3 Effect of Relaxation on Adhesion of Fe(100) Surfaces: Avalanche

16.4.3.1 Introduction and Previous Studies
Avalanche is a process whereby the mutual attraction between two surfaces, at a critical interfacial separation, causes the surface atoms to displace toward the opposing surface, resulting in a collapse of the two slabs to form a single slab. A number of studies have examined this effect using a range of computational methods.134–138 Good and Banerjea139 performed Monte Carlo simulations at room temperature on bcc Fe and W140,141 and found that avalanche still occurred for Fe(110) interfaces that were out of registry; however, it was inhibited when the surfaces were far out of registry and when only a few layers near the surface were allowed to relax. Also, the energy released in the avalanche decreased as the loss of registry increased. A study of the avalanche effect for silicon (111) surfaces142 showed covalent bond effects, indicating the importance of using quantum mechanical methods. None of these studies, however, employed quantum mechanical techniques to examine avalanche in adhesion between metallic surfaces. Furthermore, no lateral displacements were allowed during the simulations, preventing the study of avalanche formation, or avalanche of a mismatching interface into a matching one.

16.4.3.2 Interface Models
The Fe interfaces were modeled using the supercell approximation, described in Section 16.2.2. Surfaces were cleaved from a crystal structure of bcc Fe, corresponding to the (100) Miller plane; the specific details of the individual models and their graphical representations have been explained by Spencer et al.131 In model I131 the sandwich approach was used to represent the match and mismatch interfaces, which means that only one vacuum spacer was positioned between the surfaces, comprising six layers each for the match interface and six and five layers for the mismatch interface. The three-dimensional periodic boundary conditions (PBCs) were then applied to the cell. For the match interface, the two middle-layer atom positions were fixed; for the mismatch interface the


middle layer of atoms was fixed. All other atoms were allowed to relax. We defined the initial and final interfacial separations as the distance between the boundary layers of the original and relaxed separated surfaces, respectively. The total energies were calculated for separations from approximately 1 to 10 Å.

Model II131 was identical to model I except that no surface layers were fixed and an additional vacuum spacer of >30 Å was added in the z-direction to allow the entire slab to move in the z-direction during relaxation. The initial interfacial separation was approximately 3 Å for both match and mismatch interfaces. The systems were then subject to the full geometry optimization, keeping the total volume of the supercell fixed. The energy at the final interfacial separation was calculated.

In model III,131 vacuum spacers of approximately 8 Å were introduced in the x-, y-, and z-directions, creating a periodic cluster-type model. The number of layers was similar to those of models I and II, but only a mismatch initial configuration was used for the geometry optimization. One surface (i.e., cluster) was fixed during the geometry optimization, while another one was free to move in all three directions. The initial interfacial separation was 4.8 Å, and the final geometry was examined.

16.4.3.3 Summary of Findings
In model I, the relaxation resulted in increasing the interlayer spacing throughout the surfaces. For the relaxed system, the interlayer spacing was approximately 1.58 Å and for the unrelaxed surface it was 1.4345 Å, making the relaxed interlayer spacing approximately 10% larger than the unrelaxed spacing. Further detailed analyses131 indicated that in such a system setup, a proper avalanche effect cannot occur because of the additional constraint on the fixed layers of the slabs as well as the periodic boundary conditions in all three dimensions, which cause unrealistic stretching of the interlayer spacing and formation of a highly strained crystal region.

In model II, relaxation of the periodic boundary condition in one (z-) dimension resulted in the two surfaces jumping together. The equilibrium interfacial separation of 1.437 and 2.4996 Å was achieved for the match and mismatch interfaces, respectively. The match interface value was approximately equal to the bulk interlayer spacing (1.4345 Å), as was expected. Similarly, the mismatch interface was close to the bulk Fe–Fe distance of 2.47 Å. The overall geometry at the center of the interface formed upon avalanche was bulklike, as opposed to the strained model I. The adhesion energy for the match interface after relaxation compared well with that obtained for the minimum-energy structure with the same interfacial separation using model I, but as the outer layers of model II were allowed to move, this resulted in surface relaxation and hence in slightly lower energy.

In our model III, the two clusters were found to approach each other, forming a nearly matching interface with some minor structural imperfections due to a limited simulation time. However, the calculation clearly illustrated that if no constraints are imposed on the system, it will undergo avalanche and relax toward perfect registry.
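The bulk distances quoted in this section can be cross-checked against ideal bcc geometry. The sketch below assumes a cubic lattice constant a equal to twice the unrelaxed (100) interlayer spacing of 1.4345 Å (the lattice constant itself is not stated in this passage) and uses the standard bcc relations d(100) = a/2, d(110) = a/√2, d(111) = a/(2√3), and a nearest-neighbor distance of a√3/2. It also recomputes the relative expansion of the model I interlayer spacing. Variable names are illustrative.

```python
import math

D100_BULK = 1.4345       # Å, unrelaxed (100) interlayer spacing quoted in the text
a = 2.0 * D100_BULK      # Å, bcc lattice constant implied by d(100) = a/2 (assumption)

# Ideal bcc interlayer spacings for the three low-index faces
spacing = {
    "(100)": a / 2.0,
    "(110)": a / math.sqrt(2.0),
    "(111)": a / (2.0 * math.sqrt(3.0)),
}

# Nearest-neighbor distance in bcc: half the body diagonal of the cube
nearest_neighbor = a * math.sqrt(3.0) / 2.0

for face, d in spacing.items():
    print(f"d{face} = {d:.3f} Å")
print(f"nearest-neighbor Fe-Fe distance = {nearest_neighbor:.3f} Å")

# Model I: relaxed (1.58 Å) versus unrelaxed (1.4345 Å) interlayer spacing
expansion = (1.58 - D100_BULK) / D100_BULK
print(f"model I interlayer expansion = {100.0 * expansion:.1f}%")
```

Under these assumptions the nearest-neighbor distance comes out near 2.48 Å, close to the Fe–Fe distances quoted in the text, and the model I expansion is about 10%, matching the stated figure.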


16.4.4 Effect of Sulfur Impurity on Fe(110) Adhesion

16.4.4.1 Introduction and Previous Studies
In Section 16.3.3 we discussed the effects that S impurity can have on the properties of Fe surfaces. Experimentally, the presence of S contamination affects the adhesive strength of the interface compared to the clean surfaces143–145 but there are some conflicting findings. Also, the effect that S has on the structural, electronic, and magnetic properties has not been examined. Below we summarize our findings on the effect of the experimentally observed 1/4 ML coverage of S adsorbed in atop, bridge, and four-fold hollow sites on the adhesion properties of Fe(110) surfaces132 and how they compared to the clean interfaces. We also provide a brief summary of the effect of different S coverages on the properties of Fe(110) in Section 16.4.4.3.

16.4.4.2 Interface Models and Computational Parameters
Adhesion between a relaxed S/Fe(110) surface and an unrelaxed clean Fe(110) surface was investigated in order to make a comparison with our previous study of adhesion between unrelaxed clean Fe(110) surfaces.20 Our S/Fe(110) surface models obtained previously87 and described in Section 16.4.4.1 were used to model the S-contaminated interfaces. The relaxed five-layer model with a S atom adsorbed in either an atop, bridge, or four-fold hollow site on one side of the slab in a p(2 × 2) arrangement represented a mismatch interface, where insertion of the vacuum spacer in the z-direction resulted in formation of the interface. An additional layer was added to the relaxed five-layer model to form the match interfaces. The definitions of match and mismatch are described according to the geometry of the interface formed when the S is removed. By adjusting the size or thickness of the vacuum spacer, different interfacial separations were modeled.

The two surfaces forming the interfaces were defined as surface A [the relaxed S/Fe(110) surface] and surface B [the unrelaxed clean Fe(110) surface]. The interfacial separation was defined as the distance between the topmost Fe atoms on each surface. A diagram of the models employed can be found elsewhere.132 For all three matching interfaces the S atom lies between two different adsorption sites, one on surface A and the other on surface B. On surface A, the S atom lies above an atop, bridge, or four-fold hollow site, whereas on surface B, the S atom lies above a four-fold hollow, bridge, and atop site, respectively. For the bridge-site interface, the two Fe atoms forming the bridge site on surface B are oriented at right angles to those forming the bridge site on surface A. As the topmost Fe atoms and S atoms on surface A were relaxed, they showed some buckling (described previously by Spencer et al.87). The Fe atoms on surface B represented a clean bulk-terminated surface which did not show any buckling. The interfaces were described as atop, bridge, or hollow, depending on the site to which the S atom was adsorbed on surface A.

As the work of separation, by definition, disregards the effect of plastic or diffusional processes, we performed further calculations to remove some of the constraints applied to our interface models and to examine the effect of relaxation of the interface at equilibrium. These calculations were performed on the


interfaces at the equilibrium separation and allowed all S and Fe atoms to relax while also allowing the cell volume to change.

16.4.4.3 Results
Adhesion Energetics
The adhesion energy values calculated for each interface132 are presented in Fig. 16.13, along with the fitted UBER parameters in Table 16.9.132 In all adsorption sites and for both match and mismatch interfaces, the UBER provides a good description of the adhesion values. The S was found to decrease the adhesion energy compared to the clean interface20 in all adsorption sites and alignments of match and mismatch. The strongest interface was with S adsorbed in atop sites in a matching orientation. For all interfaces, except the hollow interface, the match interfaces were stronger than the corresponding mismatching interfaces. Relaxation of the interfaces at the equilibrium separation led to an increase in the adhesion energy, but the interfaces were still weaker than the corresponding clean ones.

For all interfaces, the S was found to increase the equilibrium interfacial separation, with the S–Fe distances to different adsorption sites on the two surfaces being consistent with the distances on the same sites on the isolated surface. The shortest S–Fe distances to surfaces A and B were found to be smaller than on the isolated surface, due to the attraction between the Fe atoms across the interface, bringing the two surfaces closer together. The relaxation introduced surface buckling of the clean surface due to the presence of S, as it did on the isolated surface, but of larger magnitude. A comparison of the S–Fe distances at the interface with those found in naturally occurring iron sulfide minerals indicated the presence of chemical bonds across the interface. Similar to the Wsep values, the screening length, l (Table 16.9), for each interface was reduced by the presence of S-contamination, showing that the attraction

[Figure 16.13, panels (a) and (b): adhesion energy, Ead (J m−2), plotted against interfacial separation for the hollow, bridge, atop, and clean interfaces, each with its UBER fit.]

Fig. 16.13 (color online) Adhesion energy data calculated and fitted UBER curves for the 1/4-ML S-contaminated Fe(110) match (a) and mismatch (b) interfaces with S adsorbed in atop, bridge, and hollow sites. The clean Fe(110) interface data20 are shown for comparison. (From Ref. 132.)


TABLE 16.9 UBER Parameters Calculated for the S-Contaminated Match and Mismatch Interfaces132 and Values for Clean Interfaces^a

Adsorption Site              Atop          Bridge        Hollow        Clean

Match Interface
E0 = Wsep (Ead) (J m−2)      1.79 (2.41)   1.30 (1.95)   0.88 (1.50)   4.494
d0 (Å)                       3.30 (2.30)   3.30 (2.25)   3.55 (2.29)   1.991
l (Å)                        0.47          0.43          0.37          0.590
R²                           0.998         0.998         1.000         0.99

Mismatch Interface
E0 = Wsep (Ead) (J m−2)      1.02 (1.16)   1.19 (1.42)   1.32 (1.72)   2.795
d0 (Å)                       3.86 (3.10)   3.33 (2.78)   3.03 (2.60)   2.427
l (Å)                        0.37          0.43          0.45          0.588
R²                           0.999         1.000         0.995         0.99

Source: Ref. 20.
^a The adhesion energy, Ead, and d0 values calculated for the relaxed S-contaminated interfaces are shown in parentheses.

between the contaminated surfaces occurs over a shorter separation distance than with a clean interface. The relative order of the l values is correlated with the distance of the S atom from the underlying surface. In particular, at separations from 6 to ∼3.5 Å the attraction was greater than that between the clean surfaces at the same separation, indicating that the contaminated surfaces are more likely to adhere.

Charge Density

Charge-density plots taken along the directions that cut the shortest S–Fe bonds across the interface were examined and compared for each interface (see Ref. 132). For both match and mismatch interfaces at the equilibrium separation, they showed that the S bonds to both surfaces A and B, bonding to the same atoms as on the isolated surface as well as to the closest Fe atoms on the other surface. They also further supported the chemical, as opposed to physical, nature of the bonds formed at the interface. Bonding across the interface was in line with the interfacial geometry, being symmetrical for the mismatching interfaces. For each interface, however, there were regions of low charge density between adjacent S atoms which were not seen for the clean interfaces, as the S atom prevents the Fe atoms from getting close enough to interact as strongly across the interfacial boundary. After relaxation of these interfaces, these large regions of low charge density were reduced, owing to structural changes that lead to a more even distribution of charge at the interface.

Magnetic Moments

The magnetic moment enhancements, μB, of the Fe atoms most strongly bonded to the S atom on surfaces A and B were calculated as a function of interfacial separation. At an interfacial separation of 12 Å for both match and mismatch interfaces, the magnetic moment enhancements of Fe atoms on surfaces A and B were the same as seen on the isolated S-contaminated


surfaces87 and clean surface (see Section 16.3.2.2), respectively, in line with the adhesion energy curves. Hence, for the clean surface B, the enhancements were positive, as seen on the clean isolated surface, whereas they were negative for the S-contaminated surface A, as S quenches the enhancement seen on the clean surface. At smaller separations, the enhancements remained unchanged until the separation at which the surfaces began to attract each other. The values then generally decreased significantly by the equilibrium separation, with the values for surface A being largest for the hollow site and smaller for the bridge and then atop sites. For surface B they were in the opposite order. After relaxation, the enhancements for all interfaces were found to decrease, becoming more negative as a result of the stronger interaction between the surfaces, which gives rise to more spin pairing. Also, the magnetic moment enhancements for S bonding to the same sites on the different surfaces became identical, in line with the changes in geometry and charge density.

Effect of Sulfur Coverage on Adhesion

To determine how other coverages of S affect the interfacial properties of Fe, we performed density functional theory calculations of S adsorbed in three adsorption sites (atop, bridge, and four-fold hollow) in two different arrangements, c(2 × 2) and p(1 × 1), corresponding to coverages of 1/2 and 1 ML, respectively. We examined the same parameters as calculated for the 1/4 ML coverage, for interfaces both in and out of epitaxy. Experimental studies of the effect of S impurity coverage on the adhesion of various Fe surfaces143–145 have led to some conflict as to whether S increases or decreases Fe adhesion. Buckley144 found that S appreciably decreased the adhesive strength of the Fe(110) interface formed through S segregation at 1/4 ML coverage in a c(2 × 4) arrangement. In contrast, later studies by Hartweck and Grabke143,145 found that segregated S increased the strength of adhesion of polycrystalline surfaces at submonolayer coverages, showing a maximum in the adhesive force at an estimated S coverage of 0.6 ML. S reduced the strength of adhesion compared to that of the clean surfaces at coverages greater than 1 ML. The differences have been suggested to be due to grain boundary effects. The adhesion energy curves and the UBER parameters calculated from the fitted curves146 indicate that S reduces the adhesive strength of Fe(110) surfaces in match and mismatch orientations at all coverages examined (1/4, 1/2, and 1 ML). The largest work of separation was for the matching atop interface with 1/2 ML S coverage. Among the mismatching configurations, the bridge 1/2 ML interface has the largest work of separation; however, it is still weaker than the strongest matching interface. The mismatching four-fold hollow 1 ML interface has such a low work of separation that it is unlikely to form. Charge-density slices of the strongest match and mismatch interfaces examined are presented in Fig. 16.14. The magnetic moment enhancement values, μB, calculated for the Fe atoms closest to the S atoms on either side of the interface are also indicated.
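The fitted UBER curves discussed above can be evaluated directly from their three parameters. The following is a minimal sketch, assuming the standard Rose–Smith–Ferrante form of the universal binding energy relation, Ead(d) = −E0 (1 + d*) exp(−d*) with scaled separation d* = (d − d0)/l. The names `uber_energy`, `PARAMS`, and `curve` are illustrative only, and the parameter values are the unrelaxed 1/4 ML match-interface entries for the atop and clean interfaces as read from Table 16.9.

```python
import math


def uber_energy(d, e0, d0, l):
    """Adhesion energy (J m^-2) at separation d (angstrom), assuming the UBER form."""
    d_star = (d - d0) / l  # separation scaled by the screening length
    return -e0 * (1.0 + d_star) * math.exp(-d_star)


# (E0 in J m^-2, d0 in angstrom, l in angstrom); unrelaxed values as read
# from Table 16.9 (an assumption of this sketch, not a new calculation)
PARAMS = {
    "atop_match": (1.79, 3.30, 0.47),
    "clean_match": (4.494, 1.991, 0.590),
}


def curve(name, separations):
    """Evaluate the fitted curve of one interface on a list of separations."""
    e0, d0, l = PARAMS[name]
    return [uber_energy(d, e0, d0, l) for d in separations]


if __name__ == "__main__":
    separations = [2.0 + 0.5 * i for i in range(13)]  # 2.0 to 8.0 angstrom
    for name in PARAMS:
        for d, e in zip(separations, curve(name, separations)):
            print(f"{name:12s} d = {d:4.1f}  Ead = {e:8.3f}")
```

By construction the curve has its minimum, −E0, at d = d0. With these parameters the atop curve lies below the clean curve at 4 Å, which is in line with the stronger attraction at intermediate separations reported for the contaminated interfaces.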


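[Figure 16.14: charge-density slices showing surface B, the S atom, and surface A, with atoms labeled Fe1 to Fe6, the separation d0 marked, and the annotated magnetic moment enhancement values −0.41, 0.02, 0.19, and 0.03.]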

Fig. 16.14 (color online) Charge-density plots of the atop match and bridge mismatch interfaces with 1/2-ML S coverage. Slices are taken through the azimuths indicated. The calculated magnetic moment enhancement values, μB , of the Fe atoms closest to the S atoms on either side of the interface are also indicated.

Overall, compared to the results for the clean interface, we found that the interfacial separation was increased by the presence of S. The distance of S from the two surfaces was also found to be related directly to the type of adsorption site in which S sits at the two surfaces.

16.4.5 Effect of Sulfur Impurity on Fe(100) Adhesion: A Brief Summary

We have performed a detailed study of the effects of S on the adhesion of the (100) surface of Fe using methodology similar to that employed for Fe(110), described in Section 16.4.4 and in the literature.119 Adhesion energy calculations show that at 1/2 ML coverage, S decreases the adhesive energy between the Fe(100) surfaces in both match and mismatch orientations, as was also seen for the Fe(110) match and mismatch interfaces with 1/4 ML coverage of adsorbed S. The strongest S-contaminated Fe(100) interface was found to be the atop match interface. The difference between the Wsep values calculated for the clean and S-contaminated atop and bridge mismatch interfaces, however, was only 6.5%, which is smaller than the difference for the corresponding Fe(110) interfaces. In particular, for these two interfaces (as well as for their matching counterparts), the adhesive attraction was found to be stronger at larger interfacial separations than it was for the corresponding clean interface. Hence,


this indicates that the S-contaminated interfaces can be more prone to adhesion. A complete report of the effects of 1/2 ML coverage of S on the adhesion properties of Fe(100) surfaces has been published elsewhere.119

16.5 SUMMARY, CONCLUSIONS, AND FUTURE WORK

The results above show that the (100) and (110) surfaces have almost identical surface energies, with the (110) being slightly lower, while the (111) surface has the highest energy. The surface relaxation results demonstrate that for the (100) surface a contraction of the outer layer is observed while the second and third layers expand perpendicular to the surface plane; for the (110) surface, little relaxation occurs, indicating that it is essentially bulk cleaved; and for the (111) surface, the first two layers contract while the third expands, with the magnitude of the relaxations being much larger than for the other surfaces. The layer-resolved magnetic moment values, as well as the up- and down-spin-resolved density of states, indicate the presence of an enhanced magnetic moment at the surface which is only slightly affected by relaxation, with the more open (111) surface showing larger changes and the most closely packed (110) surface showing little change. The adsorption of atomic S on the Fe(100) and Fe(110) surfaces at different adsorbate surface densities at the atop, bridge, and hollow sites shows that for both the Fe(100) and Fe(110) surfaces, the hollow site is the most stable, followed by the bridge and atop sites. At all three sites, S adsorption results in minor surface reconstruction, the most significant being for the hollow site. All three adsorption configurations affect the underlying surface geometry, with S causing a buckling of the top Fe layer when adsorbed in an atop site. Comparisons between S-adsorbed and clean Fe surfaces revealed a reduction in the magnetic moments of surface layer Fe atoms in the vicinity of the S. At the hollow site, the presence of S causes an increase in the surface Fe d-orbital density of states but has no significant effect on the structure and magnetic properties of lower substrate layers.
We have also modeled adhesion energy as a function of surface separation between clean, bulk-terminated Fe(100), Fe(110), and Fe(111) matched and mismatched surfaces. The values of the adhesion parameters obtained suggested that the (110) interface was slightly more stable than the (100) interface. However, the order of stability is reversed if the effects of both matching and mismatching interfaces are taken into consideration, in agreement with experimental findings. The (111) interface in epitaxy is much stronger than the mismatch interface. Compared to the (100) and (110) interfaces, the (111) match interface is strongest, whereas the (111) mismatch interface is the weakest. In addition, we have examined the relationship between magnetic and electronic properties and adhesion of the Fe(100), Fe(110), and Fe(111) surfaces and found that for matching interfaces, the surface layer magnetic moment is enhanced for larger interfacial separations and decreases to the bulk value as the surfaces are brought together. The enhancement approaches zero at the minimum


adhesion energy, where the bulk solid is formed. The lower layers show smaller enhancements, with little or no enhancement at the center of the slab. The mismatch interfaces show similar behavior, but the enhancement does not reach zero at the equilibrium separation, as the bulk structure is not formed. To consider the dynamics of interface formation, we have studied the avalanche effect between Fe(100) surfaces, in match and mismatch, and the role of model constraints on the results. When the central layers of the two surfaces are constrained, the surface layers are attracted toward each other, forming a strained crystal region at intermediate interfacial separations, but if the constraints in the z-direction are lifted, the surfaces avalanche together. When the surfaces are allowed to move sideways, an interface initially out of registry (mismatch) will tend to avalanche toward an interface that is in registry (match). The effects of adsorbed S on the adhesion of Fe(100) and Fe(110) surfaces have been studied by introducing S impurity in atop, bridge, and hollow sites at a range of coverages in match and mismatch interfaces. The calculated minima of the adhesion energy curves show that the presence of S on the surface reduces the strength of the interface. However, the contaminated interfaces can be more prone to adhesion, as the increased adhesive energy values at larger separations show. The effects of adsorbed S on the charge-density distribution and magnetic properties of the interface have also been examined and related to the interfacial geometry. The effect of relaxation of the interfaces at equilibrium was also investigated and was shown to increase the strength of the interface while reducing the equilibrium interfacial separation.
Some recent studies have included modeling of the surface properties of the three low-index faces of Fe33,147–150; experiments and modeling of various properties of Fe nanoparticles,151,152 nanowires,153 and nanosized clusters154; adhesion and other properties of high-toughness steels155,156; and the behavior of segregated S at an Fe grain boundary.157 Finally, it must be emphasized that, having developed several approaches to model Fe substrate structures, we can now create various surface defects and impurities as well as controlled modified surface models, with modifications ranging from individual atoms and molecules to nanoclusters and thin layers, in order to study their effects on the surface and interface properties and the effects of temperature and pressure on the structure and properties of surfaces and interfaces. With the current focus on miniaturization, the ability to modify surfaces atomically for specific applications opens up enormous possibilities for theoretical experimentation with various conditions, surface modifications, and resultant properties, which has great potential to aid laboratory synthesis and fabrication.

Acknowledgments

We thank BHP Billiton and, specifically, their (now retired) chief scientist and vice president for technology, Robert O. Watts, for providing the initial motivation for this work and financial support. Useful discussions with Mike Finnis (Imperial College London) are gratefully acknowledged. This research was undertaken


on the Victorian Partnership for Advanced Computing and the NCI Facility, Australia, which is supported by the Australian Commonwealth Government.

REFERENCES

1. Baddoo, N. R. J. Constr. Steel Res. 2008, 64, 1199.
2. Kuziak, R.; Kawalla, R.; Waengler, S. Arch. Civ. Mech. Eng. 2008, 8, 103.
3. Camley, R. E.; Celinski, Z.; Fal, T.; Glushchenko, A. V.; Hutchison, A. J.; Khivintsev, Y.; Kuanr, B.; Harward, I. R.; Veerakumar, V.; Zagorodnii, V. V. J. Magn. Magn. Mater. 2009, 321, 2048.
4. Grabke, H. J. Mater. Corros. 2003, 54, 736.
5. Georg, D. Eng. Aus. 2000, 72, 30.
6. Castle, J. E. J. Adhes. 2008, 84, 368.
7. Hayashi, S.; Sawai, S.; Iguchi, Y. ISIJ Int. 1993, 33, 1078.
8. Payne, M. C.; Teter, M. P.; Allan, D. C.; Arias, T. A.; Joannopoulos, J. D. Rev. Mod. Phys. 1992, 64, 1045.
9. Greeley, J.; Norskov, J. K.; Mavrikakis, M. Annu. Rev. Phys. Chem. 2002, 53, 319.
10. Gross, A. Surf. Sci. 2002, 500, 347.
11. Segall, M. D.; Lindan, P. J. D.; Probert, M. J.; Pickard, C. J.; Hasnip, P. J.; Clark, S. J.; Payne, M. C. J. Phys. Condens. Matter 2002, 14, 2717.
12. Velde, G. T.; Bickelhaupt, F. M.; Baerends, E. J.; Guerra, C. F.; Van Gisbergen, S. J. A.; Snijders, J. G.; Ziegler, T. J. Comput. Chem. 2001, 22, 931.
13. Nagy, A. Phys. Rep. Rev. Sec. Phys. Lett. 1998, 298, 2.
14. Ordejon, P. Phys. Status Solidi B 2000, 217, 335.
15. Schwarz, K.; Blaha, P. Comput. Mater. Sci. 2003, 28, 259.
16. Pisani, C. J. Mol. Struct. (Theochem) 1999, 463, 125.
17. Hong, T.; Smith, J. R.; Srolovitz, D. J. J. Adhes. Sci. Technol. 1994, 8, 837.
18. Hong, T.; Smith, J. R.; Srolovitz, D. J. Phys. Rev. B 1993, 47, 13615.
19. Raynolds, J. E.; Smith, J. R.; Zhao, G.-L.; Srolovitz, D. J. Phys. Rev. B 1996, 53, 13883.
20. Hung, A.; Yarovsky, I.; Muscat, J.; Russo, S.; Snook, I.; Watts, R. O. Surf. Sci. 2002, 501, 261.
21. Spencer, M. J. S.; Hung, A.; Snook, I. K.; Yarovsky, I. Surf. Sci. 2002, 515, L464.
22. Hong, S. Y.; Anderson, A. B.; Smialek, J. L. Surf. Sci. 1990, 230, 175.
23. Hong, T.; Smith, J. R.; Srolovitz, D. J. Phys. Rev. Lett. 1993, 70, 615.
24. Hong, T.; Smith, J. R.; Srolovitz, D. J. Acta Metall. Mater. 1995, 43, 2721.
25. Raynolds, J. E.; Roddick, E. R.; Smith, J. R.; Srolovitz, D. J. Acta Mater. 1999, 47, 3281.
26. Smith, J. R.; Cianciolo, T. V. Surf. Sci. 1989, 210, L229.
27. Smith, J. R.; Hong, T.; Srolovitz, D. J. Phys. Rev. Lett. 1994, 72, 4021.
28. Smith, J. R.; Raynolds, J. E.; Roddick, E. R.; Srolovitz, D. J. J. Comput. Aided Mater. Des. 1996, 3, 169.


29. Smith, J. R.; Raynolds, J. E.; Roddick, E. R.; Srolovitz, D. J. Processing and Design Issues in High Temperature Materials: Proceedings of the Engineering Foundation Conference, 1997, p. 37.
30. Finnis, M. W. J. Phys. Condens. Matter 1996, 8, 5811.
31. Grochola, G.; Russo, S. P.; Snook, I. K.; Yarovsky, I. J. Chem. Phys. 2002, 117, 7685.
32. Grochola, G.; Russo, S. P.; Snook, I. K.; Yarovsky, I. J. Chem. Phys. 2002, 117, 7676.
33. Grochola, G.; Russo, S. P.; Yarovsky, I.; Snook, I. K. J. Chem. Phys. 2004, 120, 3425.
34. Grochola, G.; Russo, S. P.; Snook, I. K.; Yarovsky, I. J. Chem. Phys. 2002, 116, 8547.
35. Kresse, G.; Furthmuller, J. Phys. Rev. B 1996, 54, 11169.
36. Kresse, G.; Furthmuller, J. Comput. Mater. Sci. 1996, 6, 15.
37. Kresse, G.; Hafner, J. Phys. Rev. B 1993, 48, 13115.
38. Kohn, W.; Sham, L. J. Phys. Rev. 1965, 140, 1133.
39. Perdew, J. P.; Zunger, A. Phys. Rev. B 1981, 23, 5048.
40. Perdew, J. P.; Yue, W. Phys. Rev. B 1992, 45, 13244.
41. Vanderbilt, D. Phys. Rev. B 1990, 41, 7892.
42. Monkhorst, H. J.; Pack, J. D. Phys. Rev. B 1976, 13, 5188.
43. Herper, H. C.; Hoffmann, E.; Entel, P. Phys. Rev. B 1999, 60, 3839.
44. Jansen, H. J. F.; Peng, S. S. Phys. Rev. B 1988, 37, 2689.
45. Dupre, A. Théorie mécanique de la chaleur, Gauthier-Villars, Paris, 1869.
46. Rose, J. H.; Smith, J. R.; Ferrante, J. Phys. Rev. B 1983, 28, 1835.
47. Banerjea, A.; Smith, J. R. Phys. Rev. B 1988, 37, 6632.
48. Feibelman, P. J. Surf. Sci. 1996, 360, 297.
49. Shih, H. D.; Jona, F.; Jepsen, D. W.; Marcus, P. M. Surf. Sci. 1981, 104, 39.
50. Shih, H. D.; Jona, F.; Bardi, U.; Marcus, P. M. J. Phys. C 1980, 13, 3801.
51. Legg, K. O.; Jona, F.; Jepsen, D. W.; Marcus, P. M. J. Phys. C 1977, 10, 937.
52. Spencer, M. J. S.; Hung, A.; Snook, I. K.; Yarovsky, I. Surf. Sci. 2002, 513, 389.
53. Sokolov, J.; Jona, F.; Marcus, P. M. Phys. Rev. B 1986, 33, 1397.
54. Xu, C.; O’Connor, D. J. Nucl. Instrum. Methods Phys. Res. 1990, 51, 278.
55. Xu, C.; O’Connor, D. J. Nucl. Instrum. Methods Phys. Res. 1991, 53, 315.
56. Yalisove, S. M.; Graham, W. R. J. Vac. Sci. Technol. A 1988, 6, 588.
57. Rodriguez, A. M.; Bozzolo, G.; Ferrante, J. Surf. Sci. 1993, 289, 100.
58. Johnson, R. A.; White, P. J. Phys. Rev. B 1976, 13, 5293.
59. Kato, S. Jpn. J. Appl. Phys. 1974, 13, 218.
60. Tyson, W. R. J. Appl. Phys. 1976, 47, 459.
61. Tyson, W. R.; Ayres, R. A.; Stein, D. F. Acta Metall. 1973, 21, 621.
62. Haftel, M. I.; Andreadis, T. D.; Lill, J. V.; Eridon, J. M. Phys. Rev. B 1990, 42, 11540.
63. Linford, R. G.; Mitchell, L. A. Surf. Sci. 1971, 27, 142.
64. Schweitz, J. A.; Vingsbo, O. Mater. Sci. Eng. 1971, 8, 275.


65. Gvozdev, A. G.; Gvozdeva, L. I. Fiz. Met. Metalloved. 1971, 31, 640.
66. Avraamov, Y. S.; Gvozdev, A. G. Fiz. Met. Metalloved. 1967, 23, 405.
67. Gilman, J. J. Cleavage, ductility and tenacity in crystals. In Fracture in Solids, Averbach, B. L., Felbeck, D. K., Hahn, G. T., and Thomas, B. L., Eds., Wiley, New York, 1959, p. 193.
68. Nicholas, J. F. Aust. J. Phys. 1968, 21, 21.
69. Alden, M.; Skriver, H. L.; Mirbt, S.; Johansson, B. Surf. Sci. 1994, 315, 157.
70. Vitos, L.; Ruban, A. V.; Skriver, H. L.; Kollar, J. Surf. Sci. 1998, 411, 186.
71. Tyson, W. R.; Miller, W. A. Surf. Sci. 1977, 62, 267.
72. Braun, J.; Math, C.; Postnikov, A.; Donath, M. Phys. Rev. B 2002, 65, 184412.
73. Kishi, T.; Itoh, S. Surf. Sci. 1996, 358, 186.
74. Ostroukhov, A. A.; Floka, V. M.; Cherepin, V. T. Surf. Sci. 1995, 333, 1388.
75. Wu, R. Q.; Freeman, A. J. Phys. Rev. B 1993, 47, 3904.
76. Eriksson, O.; Boring, A. M.; Albers, R. C.; Fernando, G. W.; Cooper, B. R. Phys. Rev. B 1992, 45, 2868.
77. Alden, M.; Mirbt, S.; Skriver, H. L.; Rosengaard, N. M.; Johansson, B. Phys. Rev. B 1992, 46, 6303.
78. Freeman, A. J.; Fu, C. L. J. Appl. Phys. 1987, 61, 3356.
79. Ohnishi, S.; Freeman, A. J. Phys. Rev. B 1983, 28, 6741.
80. Wang, C. S.; Freeman, A. J. Phys. Rev. B 1981, 24, 4364.
81. Danan, H.; Herr, A.; Meyer, A. J. J. Appl. Phys. 1968, 39, 669.
82. Binns, C.; Baker, S. H.; Demangeat, C.; Parlebas, J. C. Surf. Sci. Rep. 1999, 34, 107.
83. Wu, R. Q.; Freeman, A. J. Phys. Rev. Lett. 1992, 69, 2867.
84. Shih, H. D.; Jona, F.; Jepsen, D. W.; Marcus, P. M. Phys. Rev. Lett. 1981, 46, 731.
85. Kelemen, S. R.; Kaldor, A. J. Chem. Phys. 1981, 75, 1530.
86. Oudar, J. Bull. Soc. Fr. Mineral. Cristallogr. 1971, 94, 225.
87. Spencer, M. J. S.; Hung, A.; Snook, I.; Yarovsky, I. Surf. Sci. 2003, 540, 420.
88. Spencer, M. J. S.; Snook, I. K.; Yarovsky, I. J. Phys. Chem. B 2005, 109, 9604.
89. Hung, A.; Muscat, J.; Yarovsky, I.; Russo, S. P. Surf. Sci. 2002, 513, 511.
90. Hung, A.; Muscat, J.; Yarovsky, I.; Russo, S. P. Surf. Sci. 2002, 520, 111.
91. Broden, G.; Gafner, G.; Bonzel, H. P. Appl. Phys. 1977, 13, 333.
92. Fischer, R.; Fischer, N.; Schuppler, S.; Fauster, T.; Himpsel, F. J. Phys. Rev. B 1992, 46, 9691.
93. Delchar, T. A. Surf. Sci. 1971, 27, 11.
94. Schonhense, G.; Getzlaff, M.; Westphal, C.; Heidemann, B.; Bansmann, J. J. Phys. 1988, C8, 1643.
95. Weissenrieder, J.; Gothelid, M.; Le Lay, G.; Karlsson, U. O. Surf. Sci. 2002, 515, 135.
96. Berbil-Bautista, L.; Krause, S.; Hanke, T.; Bode, M.; Wiesendanger, R. Surf. Sci. 2006, 600, L20.
97. Taga, Y.; Isogai, A.; Nakajima, K. Trans. Jpn. Inst. Met. 1976, 17, 201.
98. Spencer, M. J. S.; Snook, I.; Yarovsky, I. J. Phys. Chem. B 2006, 110, 956.


99. Sinkovic, B.; Johnson, P. D.; Brookes, N. B.; Clarke, A.; Smith, N. V. Phys. Rev. B 1995, 52, R6955.
100. Sinkovic, B.; Johnson, P. D.; Brookes, N. B.; Clarke, A.; Smith, N. V. Phys. Rev. Lett. 1989, 62, 2740.
101. Johnson, P. D.; Clarke, A.; Brookes, N. B.; Hulbert, S. L.; Sinkovic, B.; Smith, N. V. Phys. Rev. Lett. 1988, 61, 2257.
102. Clarke, A.; Brookes, N. B.; Johnson, P. D.; Weinert, M.; Sinkovic, B.; Smith, N. V. Phys. Rev. B 1990, 41, 9659.
103. Fujita, D.; Ohgi, T.; Homma, T. Appl. Surf. Sci. 2002, 200, 55.
104. Zhang, X. S.; Terminello, L. J.; Kim, S.; Huang, Z. Q.; Vonwittenau, A. E. S.; Shirley, D. A. J. Chem. Phys. 1988, 89, 6538.
105. Didio, R. A.; Plummer, E. W.; Graham, W. R. Phys. Rev. Lett. 1984, 52, 683.
106. Legg, K. O.; Jona, F.; Jepsen, D. W.; Marcus, P. M. Surf. Sci. 1977, 66, 25.
107. Grabke, H. J.; Paulitschke, W.; Tauber, G.; Viefhaus, H. Surf. Sci. 1977, 63, 377.
108. Grabke, H. J.; Petersen, E. M.; Srinivasan, S. R. Surf. Sci. 1977, 67, 501.
109. Didio, R. A.; Plummer, E. W.; Graham, W. R. J. Vac. Sci. Technol. A 1984, 2, 983.
110. Fernando, G. W.; Wilkins, J. W. Phys. Rev. B 1986, 33, 3709.
111. Fernando, G. W.; Wilkins, J. W. Phys. Rev. B 1987, 35, 2995.
112. Kishi, T.; Itoh, S. Surf. Sci. 1996, 363, 100.
113. Huff, W. R. A.; Chen, Y.; Zhang, X. S.; Terminello, L. J.; Tao, F. M.; Pan, Y. K.; Kellar, S. A.; Moler, E. J.; Hussain, Z.; Wu, H.; Zheng, Y.; Zhou, X.; von Wittenau, A. E. S.; Kim, S.; Huang, Z. Q.; Yang, Z. Z.; Shirley, D. A. Phys. Rev. B 1997, 55, 10830.
114. Chubb, S. R.; Pickett, W. E. J. Appl. Phys. 1988, 63, 3493.
115. Chubb, S. R.; Pickett, W. E. Phys. Rev. B 1988, 38, 10227.
116. Chubb, S. R.; Pickett, W. E. Phys. Rev. B 1988, 38, 12700.
117. Anderson, A. B.; Hong, S. Y. Surf. Sci. 1988, 204, L708.
118. Hong, S. Y.; Anderson, A. B. Phys. Rev. B 1988, 38, 9417.
119. Nelson, S. G.; Spencer, M. J. S.; Snook, I.; Yarovsky, I. Surf. Sci. 2005, 590, 63.
120. Todorova, N.; Spencer, M. J. S.; Yarovsky, I. Dynamic properties of the sulfur-contaminated Fe(110) surface. In Proceedings of the Australian Institute of Physics 16th Biennial Congress, Canberra, Australia, 2005.
121. Todorova, N.; Spencer, M. J. S.; Yarovsky, I. Surf. Sci. 2007, 601, 665.
122. Verlet, L. Phys. Rev. 1967, 159, 98.
123. Nose, S. Prog. Theor. Phys. Suppl. 1991, 1.
124. Jiang, D. E.; Carter, E. A. J. Phys. Chem. B 2004, 108, 19140.
125. Kamakoti, P.; Sholl, D. S. J. Membr. Sci. 2003, 225, 145.
126. Haug, K.; Jenkins, T. J. Phys. Chem. B 2000, 104, 10017.
127. Spencer, M. J. S.; Todorova, N.; Yarovsky, I. Surf. Sci. 2008, 602, 1547.
128. Spencer, M. J. S.; Yarovsky, I. J. Phys. Chem. C 2007, 111, 16372.
129. Narayan, P. B. V.; Anderegg, J. W.; Chen, C. W. J. Electron Spectrosc. Relat. Phenom. 1982, 27, 233.
130. Shanabarger, M. R. A comparison of adsorption kinetics on iron of H2 and H2S. In Hydrogen Effects in Metals, Bernstein, J. M., and Thompson, A. W., Eds., The Metallurgical Society of AIME, Warrendale, PA, 1981, p. 135.


131. Spencer, M. J. S.; Hung, A.; Snook, I.; Yarovsky, I. Surf. Rev. Lett. 2003, 10, 169.
132. Spencer, M. J. S.; Snook, I. K.; Yarovsky, I. J. Phys. Chem. B 2004, 108, 10965.
133. Handbook of Chemistry and Physics, 70th ed., CRC Press, Metals Park, OH, 1989–1990.
134. Taylor, P. A.; Nelson, J. S.; Dodson, B. W. Phys. Rev. B 1991, 44, 5834.
135. Taylor, P. A. Phys. Rev. B 1991, 44, 13026.
136. Smith, J. R.; Bozzolo, G.; Banerjea, A.; Ferrante, J. Phys. Rev. Lett. 1989, 63, 1269.
137. Good, B. S.; Banerjea, A.; Smith, J. R.; Bozzolo, G.; Ferrante, J. Mater. Res. Soc. Symp. Proc. 1990, 193, 313.
138. Lynden-Bell, R. M. Surf. Sci. 1991, 244, 266.
139. Good, B. S.; Banerjea, A. J. Phys. Condens. Matter 1996, 8, 1325.
140. Banerjea, A.; Good, B. S. Int. J. Mod. Phys. B 1997, 11, 315.
141. Banerjea, A.; Good, B. S. Indian J. Phys. 1995, 69A, 105.
142. Nelson, J. S.; Dodson, B. W.; Taylor, P. A. Phys. Rev. B 1992, 45, 4439.
143. Hartweck, W.; Grabke, H. J. Surf. Sci. 1979, 89, 174.
144. Buckley, D. H. Int. J. Nondestructive Test. 1970, 2, 171.
145. Hartweck, W. G.; Grabke, H. J. Acta Metall. 1981, 29, 1237.
146. Spencer, M. J. S.; Snook, I. K.; Yarovsky, I. J. Phys. Chem. B 2005, 109, 10204.
147. Jiang, D. E.; Carter, E. A. Surf. Sci. 2003, 547, 85.
148. Zhang, J. M.; Ma, F.; Xu, K. W. Surf. Interface Anal. 2003, 35, 662.
149. Blonski, P.; Kiejna, A. Vacuum 2004, 74, 179.
150. Wang, X. C.; Jia, Y.; Qiankai, Y.; Wang, F.; Ma, J. X.; Hu, X. Surf. Sci. 2004, 551, 179.
151. Postnikov, A. V.; Entel, P.; Soler, J. M. Eur. Phys. J. D 2003, 25, 261.
152. Postnikov, A. V. Surface relaxation in solids and nanoparticles. In Computational Materials Science, Vol. 187, Catlow, R., and Kotomin, E., Eds., IOS Press, Amsterdam, 2003, p. 245.
153. Mohaddes-Ardabili, L.; Zheng, H.; Ogale, S. B.; Hannoyer, B.; Tian, W.; Wang, J.; Lofland, S. E.; Shinde, S. R.; Zhao, T.; Jia, Y.; Salamanca-Riba, L.; Schlom, D. G.; Wuttig, M.; Ramesh, R. Nat. Mater. 2004, 3, 533.
154. De Hosson, J. T. M.; Palasantzas, G.; Vystavel, T.; Koch, S. JOM 2004, 56, 40.
155. Hao, S.; Moran, B.; Liu, W. K.; Olson, G. B. J. Comput. Aided Mater. Des. 2003, 10, 99.
156. Hao, S.; Liu, W. K.; Moran, B.; Vernerey, F.; Olson, G. B. Comput. Methods Appl. Mech. Eng. 2004, 193, 1865.
157. Gesari, S. B.; Pronsato, M. E.; Juan, A. J. Phys. Chem. Solids 2004, 65, 1337.

17

Surface Chemistry and Catalysis from Ab Initio–Based Multiscale Approaches

CATHERINE STAMPFL
School of Physics, The University of Sydney, Sydney, Australia

SIMONE PICCININ
CNR-INFM DEMOCRITOS National Simulation Center, [email protected] Group, Trieste, Italy

Chemical problems involving heterogeneous catalysis, diffusion, and related processes occur in systems that are too large to simulate directly using electronic structure methods, as this would require prohibitively large samples, prohibitively long simulation times, or both. However, methods such as density functional theory, augmented by statistical mechanics techniques such as kinetic Monte Carlo, can address the critical issues directly using multiscale techniques. As a result, phase diagrams for catalytic processes can be calculated and used to model real-time catalytic processes. Significant applications considered include CO catalytic conversion, hydrogen storage, and fuel cell operation.

17.1 INTRODUCTION

Theory, computation, and simulation have been identified repeatedly in international reports and technology road maps as key components of a successful strategy toward the implementation of new energy technologies.1,2 Indeed, they play a crucial role in the advancement and development of all new technologies that require knowledge and understanding on the atomic level as well as on the nanoscale. Materials by design and the growing, exciting role of computation/simulation are making impacts across multidisciplinary fields such as physics, chemistry, engineering, and biology.

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


Advances in catalytic science laid the foundation for the rapid development of the petroleum and chemical industries in the twentieth century, which contributed directly to the substantial increase in the standard of living in industrialized countries. Traditionally, catalytic science has progressed through trial and error, requiring many thousands of experiments involving complex combinations of metals, metal compounds, promoters, and inhibitors.3 With increased awareness of the need for new and improved green energy technologies and processes for an environmentally clean and sustainable future, catalysis researchers are focusing on ways to improve existing applications and develop new ones. Control and understanding on the atomic level of surface and material properties is crucial for the development of cutting-edge technologies. Lack of such knowledge presently hinders further progress in already established applications and prevents real advances in promising ones which are still at the conceptual level. Modern imaging and spectroscopic techniques are being extended to operate under increasingly realistic conditions (e.g., high pressures, high temperatures),4 and can provide quantitative information at an unprecedented level. However, determination of important properties such as adsorption and reaction energetics, the structure of surface species, and the nature of transient intermediates and transition states is still highly challenging. Increasingly, accurate quantum mechanical calculations are being used to investigate such quantities and to predict new materials and structures that may lead to improved efficiencies and selectivities. Indeed, an ultimate goal of catalysis and materials research is to control chemical reactions and materials properties so that one can synthesize any desired molecule or material.
Understanding the mechanisms and dynamics of such transformations has been identified as a grand challenge for catalysis and advanced materials research.5 Calculation methods derived from advanced theoretical models and implemented in efficient algorithms are crucial for fundamental understanding and ultimately for steps toward first-principles design. By combining density functional theory (DFT) calculations with statistical mechanical approaches, phenomena and properties occurring on macroscopic length scales and long time scales can be described, affording accurate predictions of surface structures, phase transitions, diffusion, and increasingly, heterogeneous catalysis.6–10 The present chapter contains some recent applications of first-principles-based multiscale modeling approaches for describing and predicting surface structures, phase transitions, and catalysis. In particular, through specific applications, these approaches are highlighted: (1) ab initio atomistic thermodynamics, which predicts stable (and metastable) phases, from a pool of considered structures, in equilibrium with a gas-phase environment; (2) the ab initio lattice-gas Hamiltonian plus equilibrium Monte Carlo method, which can predict stable surface structures (without their explicit consideration), including order–disorder phase transitions; and (3) ab initio kinetic Monte Carlo simulations, which in addition to the above can describe the kinetics of a system (e.g., reaction rates).


17.2 PREDICTING SURFACE STRUCTURES AND PHASE TRANSITIONS

17.2.1 Oxygen on Pd(111): The Lattice-Gas Plus Monte Carlo Approach

The surface structures that form when species adsorb on a solid surface are dictated by the lateral interactions between them. Such interactions can also significantly affect the stability of the adsorption phase and thus affect the surface function and properties. This has important consequences, for example, for heterogeneous catalysis, which involves surface processes such as adsorption, diffusion, desorption, and chemical reactions. In particular, the carbon monoxide oxidation reaction has long served as a prototypical “simple” chemical reaction for experimental study, with the aim of achieving a deeper understanding on the microscopic level.11 This reaction is the basic reaction step in many industrial reactions and is also an important reaction in its own right, as illustrated, for example, by the fact that it is one of the main reactions that the three-way automotive catalytic converter catalyzes for pollution control and environmental protection. If atomic oxygen, adsorbed on transition metal surfaces, is exposed to CO gas, the metal catalyzes the formation of carbon dioxide through a Langmuir–Hinshelwood mechanism, in which both reactants are adsorbed on the surface prior to product formation, in this case CO2.12 The activation energy of this reaction depends on the coverage of adsorbates, indicating that the lateral interactions are significant.13 In particular, for the O/Pd(111) system, it was found that upon exposure to CO, the p(2 × 2) islands, which initially form on adsorption of oxygen, compress into (√3 × √3)R30° (hereafter denoted by “√3”) domains and finally into p(2 × 1) domains.14 These structural rearrangements have profound effects on the reactivity of CO2 formation: While the p(2 × 2) phase is unreactive for temperatures in the range 190 to 320 K, the √3 phase displays half-order kinetics with respect to oxygen coverage, suggesting that the reaction site is at the periphery of the O islands.
For the p(2 × 1) phase, the reaction is first order, implying that the reaction proceeds uniformly over the O islands. As an initial step toward a detailed understanding of the role played by lateral interactions in the CO oxidation reaction over Pd(111), it is appropriate to investigate the behavior of the system in the presence of just the oxygen adsorbate. In the following, the lattice-gas Hamiltonian (LGH) plus Monte Carlo (MC) approach15,16 will be used to describe the O/Pd(111) system and to predict order–disorder phase transition temperatures for varying oxygen coverages.17 Such an approach affords identification of unanticipated geometries and stoichiometries and can be used to describe the coexistence of phases and disordered phases, as well as associated order–order and order–disorder phase transitions. The first step is to create a sufficiently accurate LGH,


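which can be written as

$$ H^{\mathrm{LGH}} = V^{1} \sum_{i} n_i + \sum_{m=1}^{r} V_m^{2} \sum_{\langle ij \rangle_m} n_i n_j + \sum_{m=1}^{q} V_m^{3} \sum_{\langle ijk \rangle_m} n_i n_j n_k + \cdots \qquad (17.1) $$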

where $n_i$ indicates the occupation of site i, which is 0 if the site is empty or 1 if it is occupied; $V^1$ is the one-body term, which represents the adsorption energy of the isolated adsorbate; $V_m^2$ are the two-body, or pair, interactions (where r pair interactions are considered, with m = 1 corresponding to nearest-neighbor interactions, m = 2 second nearest-neighbor interactions, and so on); $V_m^3$ are the three-body, or trio, interactions (where q trio interactions are considered); and so on. The LGH [Eq. (17.1)] contains an infinite number of terms, but in practice it can be truncated, since higher-order interactions become negligible compared to the lower-order terms. The interactions considered to describe the O/Pd(111) system are illustrated in Fig. 17.1. The values of the interactions are determined from least-squares fits of energies for structures calculated using density functional theory, with oxygen coverages ranging from 1/9 monolayer (ML) to 1 ML. To determine which interactions to include in the expansion, and to evaluate the accuracy of the LGH, we use the leave-one-out cross-validation (LOO-CV) scheme (see Refs. 18–21). It is found for this system that the set of interaction

Fig. 17.1 (color online) Top view of the oxygen adsorbates on Pd(111), where the lateral interactions between O atoms considered in the lattice-gas Hamiltonian are shown. Light gray spheres represent Pd atoms, and small dark spheres, O atoms. (From Ref. 17.)


parameters which yields a high accuracy consists of six lateral interactions: three two-body interactions ($V_1^2$, $V_2^2$, $V_3^2$, with respective values of 244, 39, and −6 meV; see Fig. 17.1) and three three-body interactions ($V_1^3$, $V_2^3$, $V_3^3$, with values 31, 30–49 meV).17 It is interesting to see that the values of the two-body interactions are remarkably similar to what has been reported for O/Pt(111) (238, 39, −6 meV)18 and for the O/Ru(0001) system (265, 44, −25 meV).22 Once the LGH has been constructed, its reliability can be tested by calculating the ground-state line (or convex hull), which identifies the lowest-energy surface structures for a given coverage. In particular, it can be observed whether it correctly reproduces that obtained directly from DFT. The formation energies (from DFT or the LGH) are calculated as

$$ E_f = \Theta \left[ E_b^{\mathrm{O/Pd}} - E_b^{\mathrm{O}(1\times1)/\mathrm{Pd}} \right] \qquad (17.2) $$

which shows the stability of a structure with respect to phase separation into a fraction Θ of the full monolayer O(1 × 1)/Pd and a fraction, 1 − Θ, of the clean slab. In Eq. (17.2), $E_b$ represents the binding energy per oxygen atom of a given oxygen adsorption structure on the Pd(111) surface. For example, the binding energy of oxygen on a surface with 1 ML coverage is given by $E_b^{\mathrm{O}(1\times1)/\mathrm{Pd}} = E_{tot}^{\mathrm{O}(1\times1)/\mathrm{Pd}} - E_{tot}^{\mathrm{Pd}} - \tfrac{1}{2} E_{tot}^{\mathrm{O}_2}$, where $E_{tot}^{\mathrm{O}(1\times1)/\mathrm{Pd}}$, $E_{tot}^{\mathrm{Pd}}$, and $E_{tot}^{\mathrm{O}_2}$ are the total energies of the O(1 × 1)/Pd(111) structure, the clean Pd(111) surface, and an oxygen molecule, respectively. In Fig. 17.2, the formation energy as a function of oxygen coverage is shown. From it, the structures belonging to the convex hull (lowest-energy line) can be identified. All structures with a formation energy higher than that of the convex hull at the same coverage are unstable against phase

Fig. 17.2 (color online) Formation energy, $E_f$, versus coverage, Θ, of the twenty-two structures calculated directly from density-functional theory (DFT) (large pale dots) and those obtained from the lattice-gas Hamiltonian (LGH). The continuous (lowest energy) line represents the convex hull. (From Ref. 17.)


separation into the two closest structures belonging to the convex hull. It can be seen that there is an excellent agreement between the DFT and the LGH formation energies, except for very high coverages, where there are large atomic relaxations which are difficult to capture in the LGH. The ground-state geometries lying on the convex hull are the p(2 × 2), √3, and p(2 × 1) structures. The former two agree with experimental results.23 The p(2 × 1) structure is also observed experimentally, but only, for example, when the O/Pd(111) system is exposed to CO gas.14,23 Importantly, both DFT and the LGH calculations predict the same ground-state structures, indicating that the LGH is sufficiently accurate to describe the correct ordering of the adsorbates on the surface. Having constructed the LGH, it can be used, for example, to predict temperature-driven phase transitions. Although there are no experimental results for the O/Pd(111) system published to date, it can be expected, for example, that configurational entropy will drive a phase transition to a disordered phase at elevated temperatures. Such phase transitions have been reported for O/Ru(0001),15,24 where it was shown that the transition temperature depends strongly on the oxygen coverage. For this latter system, two peaks occur, one at 0.25 ML (800 K) and the other at 0.50 ML (600 K), which correspond to the stable p(2 × 2) and p(2 × 1) phases. Qualitatively, the same behavior was found for the O/Pt(111) system through similar theoretical simulations.18 Also, the O/Ni(111) system forms a stable p(2 × 2) structure, which exhibits a pronounced peak in the order–disorder transition temperature versus coverage curve.25 To investigate order–disorder phase transitions, Monte Carlo (MC) simulations can be carried out.
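As a concrete illustration of Eq. (17.1), the following minimal sketch evaluates the pair part of the LGH for a given occupation pattern on a periodic triangular lattice, which is the operation an MC simulation has to repeat for every trial configuration. It is not the code used in Ref. 17: the function names, the axial-coordinate convention (lattice vectors at 60°), and the 6 × 6 cell are assumptions made here for illustration, the one-body term defaults to zero, and the trio terms are omitted for brevity. Only the three pair values (244, 39, and −6 meV) are taken from the text.

```python
import numpy as np

# Pair interactions for O/Pd(111) quoted in the text (meV): 1st, 2nd, 3rd neighbors.
V_PAIR_MEV = (244.0, 39.0, -6.0)

# Assumed convention: axial coordinates (i, j) with lattice vectors at 60 degrees,
# so the squared distance of an offset (di, dj) is di^2 + di*dj + dj^2.
# One representative offset per pair direction, so each pair is counted once.
PAIR_OFFSETS = (
    ((1, 0), (0, 1), (-1, 1)),    # nearest neighbors, d^2 = 1
    ((1, 1), (-1, 2), (-2, 1)),   # second neighbors,  d^2 = 3
    ((2, 0), (0, 2), (-2, 2)),    # third neighbors,   d^2 = 4
)


def lgh_energy(occ, v_one=0.0, v_pair=V_PAIR_MEV):
    """Energy (meV) of Eq. (17.1) truncated after the pair terms.

    occ is a 2D array of 0/1 site occupations with periodic boundaries.
    """
    occ = np.asarray(occ, dtype=float)
    energy = v_one * occ.sum()
    for v_m, offsets in zip(v_pair, PAIR_OFFSETS):
        for shift in offsets:
            # n_i * n_j summed over all pairs separated by this offset
            energy += v_m * np.sum(occ * np.roll(occ, shift, axis=(0, 1)))
    return float(energy)


def ordered_phase(name, size=6):
    """Occupation pattern of a few ordered phases on a size x size cell."""
    i, j = np.indices((size, size))
    if name == "p(2x2)":
        return ((i % 2 == 0) & (j % 2 == 0)).astype(int)
    if name == "sqrt3":
        return ((i - j) % 3 == 0).astype(int)
    if name == "(1x1)":
        return np.ones((size, size), dtype=int)
    raise ValueError(f"unknown phase: {name}")


if __name__ == "__main__":
    for name in ("p(2x2)", "sqrt3", "(1x1)"):
        occ = ordered_phase(name)
        print(name, lgh_energy(occ) / occ.sum(), "meV per O atom")
```

With the pair terms alone, the lateral energy per O atom is −18 meV for p(2 × 2), 117 meV for √3, and 831 meV for the full monolayer, which reflects the strongly repulsive nearest-neighbor interaction and the weakly attractive third-neighbor one.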
In particular, we employ the Wang–Landau scheme, which affords an efficient evaluation of the configurational density of states, g(E) (i.e., the number of system configurations with a certain energy, E).26–29 From this, all major thermodynamic functions can be directly calculated, including the free energy,

$$ F(T) = -k_B T \ln \sum_E g(E)\, e^{-E/k_B T} = -k_B T \ln(Z) \qquad (17.3) $$

where Z is the partition function, $k_B$ is the Boltzmann constant, and T is the temperature. The internal energy is given as

$$ U(T) = \langle E \rangle_T = \frac{\sum_E E\, g(E)\, e^{-E/k_B T}}{Z} \qquad (17.4) $$

the specific heat as

$$ C_v(T) = \frac{\langle E^2 \rangle_T - \langle E \rangle_T^2}{T^2} \qquad (17.5) $$


Fig. 17.3 Order–disorder transition temperature, Tc , as a function of the oxygen coverage. (From Ref. 17.)

and the entropy as

$$ S = \frac{U - F}{T} \qquad (17.6) $$

Using the Wang–Landau scheme for a given coverage, a single simulation yields g(E) and hence the transition temperature, $T_c$, while in traditional MC studies based on the Metropolis algorithm, one needs to perform a series of simulations at various temperatures to check the variations of a properly defined order parameter. From the divergence of the specific heat at the order–disorder transition temperature, the dependence on coverage of the transition temperature is obtained as shown in Fig. 17.3. In this figure two pronounced peaks occur, corresponding to the p(2 × 2) and p(2 × 1) phases. As noted above, to date, no experimental results have been reported for order–disorder phase-transition temperatures as a function of coverage for this system; thus, the predictions in Fig. 17.3 await experimental confirmation. A similar theoretical approach has been used to study the O/Pd(100) system.19 This study was limited to low oxygen coverages (i.e., 0 to 0.35 ML), but a similar peak of $T_c$ at 0.25 ML was observed. Zhang et al.,19 through comparison with experiment and from investigation of different theoretical treatments, found that the main source of uncertainty in the lateral interactions is the exchange-correlation functional employed, whereas other approximations, such as a finite number of lateral interactions, neglect of vibrational contributions, and neglect of population of other sites besides the most favorable one, have relatively negligible effects.
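Once g(E) is available, Eqs. (17.3) to (17.6) reduce to a few weighted sums. The sketch below shows one way to evaluate them; it is an illustration under stated assumptions, not the implementation behind Fig. 17.3. The function names and the dictionary input format are hypothetical, the density of states is passed as ln g(E) (the quantity a Wang–Landau run typically accumulates), and the specific heat is written with an explicit factor of $k_B$, i.e., Eq. (17.5) read in units where $k_B = 1$. The sums are shifted by their largest exponent to avoid overflow.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def thermodynamics(log_g, temperature):
    """Return (F, U, C_v, S) at one temperature from ln g(E).

    log_g maps each energy level E (in eV) to ln g(E).
    F and U are in eV; C_v and S are in eV/K.
    """
    beta = 1.0 / (K_B * temperature)
    # ln of each Boltzmann-weighted term, shifted for numerical stability
    exponents = {e: lg - beta * e for e, lg in log_g.items()}
    shift = max(exponents.values())
    weights = {e: math.exp(x - shift) for e, x in exponents.items()}
    z_scaled = sum(weights.values())                       # Z * exp(-shift)

    free_energy = -K_B * temperature * (shift + math.log(z_scaled))   # Eq. (17.3)
    mean_e = sum(e * w for e, w in weights.items()) / z_scaled        # Eq. (17.4)
    mean_e2 = sum(e * e * w for e, w in weights.items()) / z_scaled
    heat_capacity = (mean_e2 - mean_e ** 2) / (K_B * temperature ** 2)  # Eq. (17.5)
    entropy = (mean_e - free_energy) / temperature                    # Eq. (17.6)
    return free_energy, mean_e, heat_capacity, entropy


def specific_heat_peak(log_g, temperatures):
    """Temperature on the given grid at which C_v is largest."""
    return max(temperatures, key=lambda t: thermodynamics(log_g, t)[2])


if __name__ == "__main__":
    # Hypothetical two-level toy system, used only to exercise the functions.
    toy = {0.0: 0.0, 0.1: 0.0}
    print(thermodynamics(toy, 500.0))
    print(specific_heat_peak(toy, range(100, 1001, 10)))
```

For a finite lattice the specific heat shows a peak rather than a true divergence, so in practice the transition temperature is read off from the position of that peak.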


17.3 SURFACE PHASE DIAGRAMS FROM AB INITIO ATOMISTIC THERMODYNAMICS

17.3.1 Ag–Cu Alloy Surface and Chemical Reactions in an Oxygen and Ethylene Atmosphere

The ab initio atomistic thermodynamics approach describes systems in thermodynamic equilibrium, taking into account the effect of the atmosphere or “environment” (e.g., a gas phase of one or more species) through the chemical potential.30–35 This method uses results from first-principles electronic structure theory to calculate the Gibbs free energy. Various surface structures can be compared to determine which is the most stable for given temperature and gas pressure conditions, which enter through the chemical potential. It is an indirect approach in that its reliability depends on the structures explicitly considered. These structures are restricted to being ordered, due to the periodic boundary conditions employed in the supercell approach which most modern density functional theory codes use. Despite these restrictions, it represents a very valuable first step in the study of surfaces under realistic conditions. In the following, this approach is used for the study of ethylene epoxidation over an Ag–Cu alloy catalyst. On the basis of experiments and first-principles calculations, it has been proposed that if an Ag–Cu alloy is used instead of the traditional Ag catalyst, the selectivity toward ethylene oxide is improved. Experimentally, it was shown through ex situ x-ray photoelectron spectroscopy (XPS) measurements that the copper surface content is much higher than the overall content of the alloy, indicating copper segregation to the surface.36 This led to the theoretical consideration of a model in which one out of four silver atoms is replaced by a copper atom (i.e., representing a two-dimensional surface alloy).37,38 At the temperatures and pressures used in the experiments (e.g., ∼530 K, 0.1 atm), however, copper oxidizes to CuO, and at higher temperatures or lower pressures, to Cu2O. Therefore, it is possible that more complex structures are present on the catalyst surface.
Indeed, our recent studies show that a two-dimensional Ag–Cu surface alloy is not stable in an environment containing oxygen and ethylene at temperatures and pressures relevant for industrial applications, as explained below. Rather, the results show that thin surface copper oxide–like films form. These predictions are supported by recent XPS measurements and high-resolution transmission electron microscopy results.39 As a first step in the theoretical study of this system, the Ag–Cu alloy surfaces are considered in contact with a pure oxygen environment. As a second step, the effect of the ethylene gas phase is investigated. The most stable surface structures are those that minimize the change in the Gibbs surface free energy,

$$ \Delta G(\mu_{\mathrm{O}}) = \frac{1}{A} \left( G^{\mathrm{O/Cu/Ag}} - G^{\mathrm{slab}} - \Delta N_{\mathrm{Ag}}\, \mu_{\mathrm{Ag}} - N_{\mathrm{Cu}}\, \mu_{\mathrm{Cu}} - N_{\mathrm{O}}\, \mu_{\mathrm{O}} \right) \qquad (17.7) $$

where $\Delta N_{\mathrm{Ag}}$ is the difference in the number of Ag atoms between the adsorption system and the clean Ag slab, and $N_{\mathrm{Cu}}$ and $N_{\mathrm{O}}$ are the numbers of Cu and O atoms. $\mu_{\mathrm{Cu}}$, $\mu_{\mathrm{Ag}}$,


and $\mu_{\mathrm{O}}$ are the copper, silver, and oxygen chemical potentials, respectively. The Ag and Cu chemical potentials are taken to be those of an Ag and a Cu atom in the respective bulk material. This assumes that the system is in equilibrium with bulk Ag, which acts as the reservoir. $G^{\mathrm{O/Cu/Ag}}$ and $G^{\mathrm{slab}}$ are the free energies of the adsorbate structure and the clean Ag slab, respectively. Normalization to the surface area, A, allows comparison of structures with different unit cells. The temperature and pressure dependence enters through the oxygen chemical potential,31

$$ \mu_{\mathrm{O}}(T, p) = \frac{1}{2} \left[ E_{\mathrm{O}_2}^{\mathrm{total}} + \tilde{\mu}_{\mathrm{O}_2}(T, p^0) + k_B T \ln\left( \frac{p_{\mathrm{O}_2}}{p^0} \right) \right] \qquad (17.8) $$

Here $p^0$ is the standard pressure (1 atm) and $\tilde{\mu}_{\mathrm{O}_2}(T, p^0)$ is the chemical potential at the standard pressure. This can be obtained either from thermochemical tables40 (as done in this case) or calculated directly. Contributions to the free energy due to vibrations should be taken into account. For O/Ag34 and O/Cu41 systems studied in the literature, such contributions have been shown to be sufficiently small (e.g., ˚ 2 ) as not to play an important role. This was also found for two −1.23 eV. Figures 17.5a and 17.5b show the atomic geometry of the p2 and p4-OCu3 structures, as well as a CuO-like structure CuO(1L) (Fig. 17.5c), which is like a layer of bulk CuO forced to match the (2 × 2) lattice of the underlying Ag(111) surface. Also shown is a structure with 1 ML of Cu and 1 ML of O on top of the Cu layer, labeled O1ML (Fig. 17.5d). It is worth noting that in the absence of oxygen, Cu prefers to be located in the subsurface layer, that is, beneath the outermost Ag layer, but when there is oxygen in the atmosphere, the copper atoms segregate to the surface and form thin surface oxide–like structures. Moreover, a two-dimensional surface Ag–Cu alloy is not stable anywhere in the range of chemical potential considered. On the other hand, there is a narrow region in which two-dimensional O–Cu surface oxides are stable. This is indicated in Fig. 17.4 by the region labeled “surface oxides.” In this region thin O–Cu structures have the lowest Gibbs surface free energy. The results presented in Fig. 17.4 correspond to the situation where there is no limit to the Cu concentration. For the Ag–Cu alloy catalysts, however, there is only ≈2.5% Cu. At the surface, in an oxygen and reaction atmosphere, it is estimated from experiment that the Cu content is around 50 times higher than the nominal bulk content. Moreover, from XPS studies, the Cu content on the surface is suggested to be in the range 0.1 to 0.75 ML.42
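The way temperature and pressure enter through Eq. (17.8), and the way Eq. (17.7) is then used to rank candidate structures, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of Ref. 30. The oxygen chemical potential is measured relative to half the total energy of O2, so that only the tabulated term and the pressure term remain; the tabulated half-value at 600 K is taken as −0.61 eV, the figure this chapter quotes for T = 600 K and p = 1 atm. The function names, the `formation_energy_ev` argument (all terms of Eq. (17.7) other than the oxygen chemical potential, referenced to half the O2 total energy), and the candidate structures in the demo are hypothetical.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def delta_mu_oxygen(temperature, pressure_atm, half_mu_tilde_ev, p0_atm=1.0):
    """Oxygen chemical potential relative to (1/2) E_O2 (eV), after Eq. (17.8).

    half_mu_tilde_ev is (1/2) * mu_tilde_O2(T, p0), e.g. from thermochemical tables.
    """
    return half_mu_tilde_ev + 0.5 * K_B * temperature * math.log(pressure_atm / p0_atm)


def surface_free_energy(formation_energy_ev, n_oxygen, area, d_mu_o):
    """Eq. (17.7) with the oxygen term written relative to (1/2) E_O2."""
    return (formation_energy_ev - n_oxygen * d_mu_o) / area


def most_stable(candidates, area, d_mu_o):
    """Name of the candidate with the lowest surface free energy.

    candidates maps a name to (formation_energy_ev, n_oxygen) for the same cell area.
    """
    return min(
        candidates,
        key=lambda name: surface_free_energy(*candidates[name], area, d_mu_o),
    )


if __name__ == "__main__":
    # Tabulated half-value at 600 K, as quoted in this chapter for p = 1 atm.
    for pressure in (1.0, 1.0e-4):
        print(pressure, "atm:", delta_mu_oxygen(600.0, pressure, -0.61), "eV")

    # Hypothetical candidates (made-up numbers, for illustration only).
    toy = {"clean": (0.0, 0), "one O": (-1.0, 1), "two O": (-1.5, 2)}
    for d_mu in (-2.0, -0.8, -0.2):
        print(d_mu, most_stable(toy, area=1.0, d_mu_o=d_mu))
```

Lowering the pressure makes the chemical potential more negative, which favors oxygen-poor structures; repeating the comparison over a range of chemical potentials yields a stability plot of the kind described for Fig. 17.4.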


Fig. 17.5 (color online) Top view of four surface structures considered: (a) p2; (b) p4-OCu3 ; (c) CuO(1L); (d) O1ML/Cu1ML. The gray spheres represent the underlying Ag(111) substrate. Copper atoms are shown as large dark circles, and oxygen atoms are the small dark circles. The black lines represent the surface unit cells. (From Ref. 30.)

To consider explicit Cu concentrations in the theory, we can use the results of Fig. 17.4 to determine the structures that will be present on the surface as a function of copper content and the oxygen chemical potential. In doing this, published results for many O–Ag structures were also utilized for the system in the absence of copper. To construct such a surface phase diagram, for a given value of the oxygen chemical potential, the surface free energy is plotted versus the copper content in the various considered structures. From this, the convex hull of the stable structures can be identified. By repeating this for the other values of the oxygen chemical potential in the range considered, the phase diagram as a function of the oxygen chemical potential and Cu content can be constructed. This is shown in Fig. 17.6. It can be seen that for a value of ΔμO = −0.61 eV, which


Fig. 17.6 Surface phase diagram showing structures belonging to the convex hull as a function of the Cu surface content and the change in oxygen chemical potential, ΔμO. (From Ref. 30.)

corresponds to conditions typical of industrial applications (p = 1 atm, T = 600 K) and for Cu content below 0.5 ML, the results predict that there will be patches of one-layer oxidic structures (i.e., p4-OCu3) which coexist with the clean Ag surface. For higher values of ΔμO, O–Ag structures are predicted in coexistence with the p4-OCu3 structure. For higher Cu contents, the CuO(1L) and p2 structures are predicted to be present above and below ΔμO = −0.75 eV, respectively. For even higher Cu contents, bulk CuO is predicted to form on the surface. These predictions are consistent with recent experiments performed on the Ag–Cu system under catalytic conditions,43 where through a combination of in situ XPS and near-edge x-ray absorption fine structure measurements, thin layers of CuO are found to be present on the surface. Areas of clean Ag are also present on the surface, in agreement with theory. Analogous calculations have been carried out for the other two low-index surfaces, (100) and (110).44 A scenario similar to that of the (111) surface is found; that is, the presence of oxygen leads to copper segregation to the surface, and thin copper oxide–like layers are predicted on top of the silver surface, as well as copper-free structures. Having studied Ag–Cu alloy surfaces in a pure oxygen environment, it is important to consider the effect of the (reducing) reactant ethylene. This is discussed below for the (111) surface. To do this, a “constrained thermodynamic equilibrium” approach is assumed, which considers the stability of the thin oxide-like layers toward the oxidation of ethylene to acetaldehyde


(thermodynamically favored reaction product). For a surface with stoichiometry $\mathrm{Ag}_x\mathrm{Cu}_y\mathrm{O}_z$, the condition of stability is

$$ \Delta\mu_{\mathrm{C_2H_4}} - \Delta\mu_{\mathrm{O}} \le \frac{-2\, \Delta H_f(T = 0\ \mathrm{K})}{z} + \Delta E^{\mathrm{mol}} \qquad (17.11) $$

where $\Delta\mu_{\mathrm{C_2H_4}}$ is the ethylene chemical potential with respect to its zero-temperature value, $\Delta H_f(T = 0\ \mathrm{K})$ is the zero-temperature formation energy of the surface structure, and

$$ \Delta E^{\mathrm{mol}} = E_{\mathrm{CH_3CHO}} - E_{\mathrm{C_2H_4}} - \tfrac{1}{2} E_{\mathrm{O_2}} \qquad (17.12) $$

is calculated to be −2.18 eV. Considering a Cu surface coverage of 0.5 ML, the surface phase diagram as a function of the oxygen and ethylene chemical potentials is shown in Fig. 17.7. The region corresponding to typical experimental conditions is indicated as that enclosed by the black dashed lines. It can be seen that

Fig. 17.7 Surface phase diagram for the (111) surface of the Ag–Cu alloy under constrained thermodynamic equilibrium with an atmosphere of oxygen and ethylene. The shaded areas represent the region of stability of a combination of two surface structures giving a Cu coverage of 0.5 ML. The white area corresponds to the clean Ag(111) surface, where Cu is assumed to be in a bulk reservoir, and ethylene is oxidized to acetaldehyde. The dashed polygon encloses the region that corresponds to typical values of temperature and pressure used in experiments (T = 300 to 600 K and pO2, pC2H4 = 10⁻⁴ to 1 atm). (From Ref. 39.)


several structures can be present, all stable with respect to reduction by ethylene. Neglecting the effect of ethylene, therefore, the relative stability of the structures from all the low-index surfaces can be investigated as a function of the Cu surface content for a representative oxygen chemical potential (ΔμO = −0.61 eV). Here the chemical potential of Cu is used as a parameter to control the Cu content. The results are shown in Fig. 17.8, where for several values of μCu the shapes predicted for the particles are shown, obtained by minimizing the surface free energy according to the Wulff construction.45 For the selected value of ΔμO, the value of μCu above which Cu oxidizes to bulk copper oxide is −0.62 eV. The values of μCu compatible with the experimentally indicated Cu coverages (0.1 to 0.75 ML) are those close to the formation of bulk copper oxide. Around this region, both the (100) and (110) surfaces are covered with

Fig. 17.8 (color online) (Top) Atomic geometry of four of the most stable oxidelike structures on the surface of Ag–Cu particles in an oxidizing atmosphere. Large light gray spheres represent Ag atoms; small spheres, O atoms; and dark spheres, Cu atoms. (Bottom) Surface energy versus the Cu chemical potential for ΔμO of −0.61 eV (corresponding to T = 600 K and pO2 = 1 atm). At selected values of μCu, the predicted particle shape, as obtained through the Wulff construction, is presented. (From Ref. 39.)


a one-layer oxidelike structure with a ratio of Cu to O of 1, denoted “CuO/Ag.” For values of μCu < −0.65 eV, all facets are covered with Cu-free structures. Having predicted the equilibrium shape and surface structures of the Ag–Cu catalyst under conditions of practical interest, the adsorption of ethylene and the two competing chemical reactions leading to the formation of acetaldehyde (Ac) and ethylene oxide (EO) (see Fig. 17.9) can be investigated. For the (2 × 2)-O/Ag(111) and (2 × 2)-O/Ag(100) surfaces, both reactions are known to proceed through a common oxametallacycle (OMC)37,38,46,47 intermediate, in which ethylene is bonded through one C atom to a surface metal atom and through the other C atom to oxygen. The OMC is shown in Fig. 17.9 (leftmost panel). Similar findings have also been reported for Ag oxides.48 From calculations of the reaction pathways for Ac and EO formation over the predicted stable surface structures, it is found that the behavior can be quite varied,49 depending on the surface structure; in particular, for the (111) surface, formation of EO does not involve the formation of any intermediate for the p2/Ag(111), p4-OCu3/Ag(111), and CuO/Ag(111) structures. For formation of Ac over the CuO/Ag(111) surface, the reaction does, however, proceed by an OMC, but this is a metastable state. Ac formation over the p2/Ag(111) surface involves the formation of a different stable intermediate in which ethylene is bound to one oxygen on each carbon. The OMC, on the other hand, is a common intermediate for both Ac and EO formation over the (2 × 2)-O/Ag(111), CuO/Ag(100), and CuO/Ag(110) surfaces. In Fig. 17.10 the transition states for Ac and EO formation over the (2 × 2)-O/Ag(111) and CuO/Ag(111) surfaces are shown as an example. The activation barrier for EO formation is lower than that of Ac for the CuO/Ag(111) structure, while the trend is the opposite for the (2 × 2)-O/Ag(111) surface.
This is consistent with, and possibly partially explains, the greater selectivity reported experimentally for the Ag–Cu catalysts compared to pure silver. As mentioned above, the reaction pathways for the surface structures identified to be potentially catalytically relevant for

Fig. 17.9 (color online) Atomic geometry of the oxametallacycle (OMC) intermediate (left) and final states acetaldehyde (Ac) (center) and ethylene oxide (EO) (right) on (2 × 2)-O/Ag(111). (From Ref. 49.)


Fig. 17.10 (color online) Transition-state geometries for the formation of acetaldehyde (top panels) and ethylene oxide (central panels) and top view of the surface for the reaction over (2×2)-O/Ag(111) and for the CuO/Ag(111) structure (bottom panels). The large light gray spheres represent Ag atoms; the large dark ones, Cu; the medium dark ones, O; and the very small spheres, H atoms. (From Ref. 49.)

the low-index surfaces are quite varied, but the preliminary results point to the Cu-containing structures providing better selectivity toward EO formation, consistent with experimental measurements. For more details, see Ref. 49.

17.4 CATALYSIS AND DIFFUSION FROM AB INITIO KINETIC MONTE CARLO SIMULATIONS

17.4.1 CO Oxidation Reaction over Pd(100)

The importance of molecular-level mechanisms and their interplay for determining observable macroscopic (and microscopic) material phenomena is without


question. Often, as, for example, in the study of order–disorder phase transition temperatures discussed in Section 17.2, there is no direct link between the microscopic (electronic) theory and experimental measurables, and appropriate “hierarchical” approaches have to be developed that link the physics across all relevant length and time scales into one multiscale simulation.50 A particularly successful approach is that of ab initio kinetic Monte Carlo (kMC). Considering, for example, the study of heterogeneous catalysis, for given gas-phase conditions, such calculations can determine the detailed surface composition and the occurrence of each individual elementary process at any time. From the latter, the catalytic activity (i.e., product formation) per surface area can also be obtained, either time-resolved (e.g., during induction, when the catalyst surface is being restructured to its active form) or time-averaged, during steady state. A recent comprehensive description of the kMC approach using microscopic parameters obtained from ab initio electronic structure total energy calculations for heterogeneous catalysis is given in Ref. 7. First-principles-based kMC involves, first, a determination of the elementary steps involved in the particular process to be studied, and their evaluation by electronic structure total energy calculations (most typically using density functional theory). For catalysis, these would include adsorption and desorption of reactants and reaction intermediates, as well as surface diffusion and surface reactions. The second step concerns describing the statistical interplay of the elementary processes as achieved by kinetic Monte Carlo simulations.51 In kMC the relationship between “MC time” and “real time” is obtained by regarding the MC process as providing a numerical solution to the Markovian master equation describing the dynamic system evolution.52–56 A sequence of configurations is generated using random numbers.
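A single step of a rejection-free kMC simulation of this kind can be sketched in a few lines: one random number selects a process with probability proportional to its rate, and a second one advances the clock by an exponentially distributed waiting time. The sketch below is illustrative only and is not the code of Refs. 6 or 7; the function names are hypothetical, and the demo system (independent adsorption and desorption on a row of sites with made-up rate constants) was chosen because its steady-state coverage, k_ads/(k_ads + k_des), is known analytically. In a first-principles study the rates would instead come from DFT barriers and the process list from the actual lattice model.

```python
import math
import random


def kmc_step(rates, rng):
    """Pick one process and a time increment.

    rates: list of rates (1/s) of all processes possible in the current configuration.
    Returns (index of the selected process, time increment in s).
    """
    total = sum(rates)
    # First random number: select a process with probability rate / total.
    target = rng.random() * total
    cumulative = 0.0
    for index, rate in enumerate(rates):
        cumulative += rate
        if target < cumulative:
            break
    # Second random number: waiting time of a Poisson process with rate `total`.
    dt = -math.log(1.0 - rng.random()) / total
    return index, dt


def simulate_coverage(n_sites=50, k_ads=3.0, k_des=1.0, n_steps=20000, seed=0):
    """Time-averaged coverage for a toy adsorption/desorption model.

    Each empty site can adsorb (rate k_ads) and each occupied site can desorb
    (rate k_des); both rate constants are hypothetical.
    """
    rng = random.Random(seed)
    occupied = [False] * n_sites
    elapsed = 0.0
    weighted_coverage = 0.0
    for _ in range(n_steps):
        # One possible process per site, depending on its current occupation.
        rates = [k_des if occ else k_ads for occ in occupied]
        site, dt = kmc_step(rates, rng)
        # The current configuration persists for dt before the selected event occurs.
        weighted_coverage += dt * sum(occupied) / n_sites
        elapsed += dt
        occupied[site] = not occupied[site]
    return weighted_coverage / elapsed


if __name__ == "__main__":
    print("simulated coverage:", simulate_coverage())
    print("analytic coverage: ", 3.0 / (3.0 + 1.0))
```

Because every step executes an event, no trial moves are rejected, which is what makes the method practical when the elementary processes are rare on the time scale of atomic vibrations.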
For each step (new configuration), all possible elementary processes and the rates with which they occur are calculated. These processes are weighted by the rates, and one of the processes is executed randomly to achieve the new system configuration. In this way the kMC algorithm effectively simulates stochastic processes, and a direct relationship between kMC time and real time is established. The flow diagram for the kMC process is shown in Fig. 17.11. Properly evaluating the time evolution requires simulation cells that are large enough to capture the effects of correlation and spatial distribution of the species at the surface. Most processes considered in kMC are highly activated and occur on time scales orders of magnitude longer than, for example, a typical vibration (10⁻¹² s). Due to these “rare events,” the statistical interplay of the elementary processes needs to be evaluated over time scales that can reach seconds and more. A recent application demonstrating the power of this approach is the study of the CO oxidation reaction over the Pd(100) surface. The motivation for this study is related to the increasing awareness that for oxidation catalysis (i.e., under atmospheric oxygen conditions) the surface of a transition metal (TM) catalyst may be oxidized, and instead of the pure TM surface, which is often the subject of quantitative ultrahigh-vacuum (UHV) surface science studies, the oxidized material may be active for the catalysis. This has recently


Fig. 17.11 (color online) Flow diagram showing the basic steps in a kinetic Monte Carlo simulation. First, loop over all the lattice sites and determine the elementary atomic processes that are possible for the current system configuration. Then generate two random numbers and advance the system configuration according to the process selected by the first random number. Then, increment the clock according to the rates and the second random number as prescribed by an ensemble of Poisson processes, and then start all over again or stop if the simulation time is sufficiently long. (From Ref. 6.)

been revealed for CO oxidation employing Ru catalysts. In this case, the bulk oxide RuO2 is, in fact, the stable phase under reactive conditions.57,58 For TMs farther to the right in the periodic table, the late TM and noble metals, which are also used in oxidation catalysis, the situation is different; thus, it is of great interest to consider the analogous reaction of CO oxidation over the more noble metal, Pd. Briefly, from the kMC simulations described below, it was found that oxide formation in the reactive environment also plays a significant role, but a difference is that this oxide is not a bulklike film that, once it becomes stable, actuates the catalysis; rather, the study indicates the relevance of a subnanometer surface oxide structure which is probably formed continuously and reacted away in the sustained catalytic operation. As a first step in this study, using the approach of ab initio atomistic thermodynamics described in Section 17.3, the surface structure and stability of the Pd(100) surface in an atmosphere containing oxygen and carbon monoxide are studied for a wide range of partial pressures and temperatures. The resulting phase diagram is shown in Fig. 17.12.59,60 Here, a constrained atomistic thermodynamics approach was employed,61,62 as for the Ag–Cu alloy catalysts described in Section 17.3 for ethylene oxidation, in which it is assumed that the surface is in equilibrium with i separate reservoirs representing the i gas-phase species, each characterized by the chemical potential μi(T, pi) with partial pressure pi and temperature T. The character of the surface phase diagram can be described in terms of three regions: first, a region where bulklike thick oxide films are stable (crosshatched region); then a region consisting of adsorption


phases on a (√5 × √5)R27° (hereafter denoted “√5”) surface oxide (hatched area), which has recently been characterized and resembles a layer of PdO(101) on the surface63; and finally, a region with different CO and O adsorption phases on Pd(100). Gas-phase conditions representative of technological CO oxidation catalysis ($p_i \sim 1$ atm, $T \sim 300$ to 600 K) correspond to the phase boundary between the regions of adsorption on the √5 surface oxide and that of CO-covered Pd(100). Thus, unlike for Ru, the presence of bulk oxides in the reactive environment can be ruled out, while the stability region of the thin √5 surface oxide structure extends into such conditions.
To investigate the reactivity of the √5 phase, and to see whether its stability region changes when the kinetic effects of the catalytic reaction on the surface are taken into account, kinetic Monte Carlo calculations are carried out. These simulations consider hollow and bridge sites and all nonconcerted adsorption, desorption, diffusion, and Langmuir–Hinshelwood reaction processes (in which both reactants are adsorbed on the surface before reacting to form the product) involving these sites: in all, 26 elementary processes. Nearest-neighbor lateral interactions are also taken into account in the elementary process rates. The 14 required interaction parameters are determined from DFT calculations of 29 ordered configurations with O and/or CO in bridge and hollow sites of the √5 surface unit cell. The resulting adsorption energies are expressed in terms of the LGH expansion. The kMC simulations are performed on a lattice comprising (50 × 50) surface unit cells for fixed ($T$, $p_{\mathrm{O_2}}$, $p_{\mathrm{CO}}$) conditions, in particular for $p_{\mathrm{O_2}} = 1$ atm and temperatures in the range 300 to 600 K.
Initially, the CO partial pressure was chosen to be low, $10^{-5}$ atm, corresponding to the middle of the stability region of the √5 phase, and was subsequently increased, moving closer and closer to the boundary of the stability region of the √5 phase. This is indicated by the vertical arrows in Fig. 17.12. When the surface reaction consumes surface oxygen faster than it is replenished, the √5 phase becomes destabilized. To determine the onset of the structural destabilization from the kMC simulations, the percentage occupation of O atoms in hollow sites is monitored as a function of CO partial pressure. Full occupation of these sites corresponds to the intact √5 phase. The results are shown in Fig. 17.13. Interpreting a reduction to 95% occupation as the onset of decomposition, the results predict critical CO pressures of $5 \times 10^{-2}$, $10^{-1}$, and 10 atm at 300, 400, and 600 K, respectively. These results are rather similar to those obtained from the constrained atomistic thermodynamics approach, which are shown in Fig. 17.13 as the vertical lines. The critical pressures obtained (e.g., at 400 K, $p_{\mathrm{O_2}}/p_{\mathrm{CO}} \approx 10:1$) are in good accord with reactor scanning tunneling microscopy (STM) experiments64 performed under such gas-phase conditions. Importantly, the theoretical results show that for relevant $p_{\mathrm{O_2}}/p_{\mathrm{CO}}$ ratios, the turnover frequencies (number of CO2 molecules produced per site per second) for the intact √5 surface oxide alone are already of a similar order of magnitude to those reported experimentally65 for the Pd(100) surface under comparable gas-phase conditions. This shows that this particular surface oxide is certainly not “inactive” with respect to the oxidation of CO, contrary to earlier prevailing preconceptions.
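The 95% criterion described above amounts to locating where a sampled occupation curve first falls below a threshold. A minimal sketch follows, assuming the occupation has been tabulated at a series of increasing CO pressures; the function name `critical_pressure` and the choice of linear interpolation in $\log_{10} p$ are illustrative assumptions, not the procedure of Refs. 59 and 60.

```python
import math


def critical_pressure(pressures_atm, occupations_pct, threshold_pct=95.0):
    """Pressure at which the occupation first drops below the threshold.

    pressures_atm:   increasing CO pressures (atm)
    occupations_pct: hollow-site O occupation (%) at each pressure
    Interpolates linearly in log10(p) between the two bracketing points
    (an assumption made for this sketch). Returns None if the occupation
    never falls below the threshold.
    """
    points = list(zip(pressures_atm, occupations_pct))
    for (p_lo, occ_lo), (p_hi, occ_hi) in zip(points, points[1:]):
        if occ_lo >= threshold_pct > occ_hi:
            frac = (occ_lo - threshold_pct) / (occ_lo - occ_hi)
            log_lo, log_hi = math.log10(p_lo), math.log10(p_hi)
            return 10.0 ** (log_lo + frac * (log_hi - log_lo))
    return None
```

For example, with made-up occupations of 100%, 100%, and 90% at $10^{-3}$, $10^{-2}$, and $10^{-1}$ atm, the function returns the pressure halfway between the last two points on the logarithmic axis.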

[Figure 17.12: surface phase diagram plotted against ΔμO (eV) and ΔμCO (eV), with the corresponding pO2 and pCO (atm) scales at 300 and 600 K and vertical lines labeled 300 K, 400 K, and 600 K. Labeled regions: PdO bulk; surface oxide (√5 × √5)R27°; surface oxide + O bridge; surface oxide + CO bridge; surface oxide + 2CO bridge; p(2 × 2)-O/Pd(100); clean Pd(100); (2√2 × √2)R45° CO/Pd(100); (3√2 × √2)R45° CO/Pd(100); (4√2 × √2)R45° CO/Pd(100); (1 × 1)-CO bridge/Pd(100).]

Fig. 17.12 (color online) Surface phase diagram for the Pd(100) surface in constrained thermodynamic equilibrium with an environment containing O2 and CO. The various surface structures corresponding to the regions in the phase diagram are illustrated. The pressures corresponding to the O2 and CO chemical potentials are shown for temperatures of 300 and 600 K. The thick black line marks gas-phase conditions representative of those employed for technological CO oxidation catalysis (i.e., partial pressures of 1 atm and temperatures between 300 and 600 K). The three vertical lines correspond to the gas-phase conditions employed in the kinetic Monte Carlo simulations shown in Fig. 17.13. (From Ref. 60.)
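The pressure scales in Fig. 17.12 follow from the ideal-gas dependence of the chemical potential on partial pressure, $k_B T \ln(p/p^0)$ per molecule. The sketch below evaluates only this pressure-dependent shift; the temperature-dependent standard-pressure contribution, which would come from thermochemical tables or a partition-function calculation, is not included. The function and parameter names are hypothetical.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def pressure_shift_ev(temperature_k, pressure_atm, molecules_divisor=1,
                      reference_atm=1.0):
    """Ideal-gas shift k_B T ln(p/p0) of the chemical potential, in eV.

    molecules_divisor: 1 for a species referenced per molecule (e.g., CO),
                       2 for atomic O referenced to half an O2 molecule.
    Only the pressure-dependent term is returned (assumption of this sketch).
    """
    shift = K_B_EV * temperature_k * math.log(pressure_atm / reference_atm)
    return shift / molecules_divisor
```

At 300 K, lowering the O2 pressure from 1 atm to $10^{-10}$ atm shifts the O chemical potential by about −0.30 eV in this approximation, which illustrates why the same chemical potential axis maps onto very different pressure scales at 300 and 600 K.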


[Figure 17.13: coverage of O in hollow sites (%), from 0 to 100%, versus CO pressure (atm) from $10^{-5}$ to $10^{5}$, at $p_{\mathrm{O_2}} = 1$ atm; curves for T = 300, 400, and 600 K.]

Fig. 17.13 (color online) Average coverage (occupation) of oxygen atoms in hollow sites of the (√5 × √5)R27° (√5) surface oxide-like structure as obtained from kinetic Monte Carlo simulations. 100% corresponds to the intact √5 structure. The reduction of the √5 surface oxide-like phase occurs at CO pressures close to those corresponding to the stability boundary (transition from the hatched to the plain areas in Fig. 17.12) and indicated by the vertical lines in Fig. 17.12. (From Ref. 59.)

17.4.2 Permeability of Hydrogen in Amorphous Materials

In a new application, first-principles kinetic Monte Carlo–based simulations have recently been used to study the permeability of hydrogen through crystalline and amorphous membranes.9,66,67 Metal membranes can potentially play an important role in the large-scale production of high-purity hydrogen, which is required for its use as a fuel in (polymer electrolyte) fuel cell technologies.68 In these membranes, hydrogen permeates through the film by dissociation of molecular hydrogen, diffusion of atomic H through interstitial sites, and then recombination to H2. Hydrogen permeates at much greater rates than other elements; thus, the membranes can deliver high-purity H2 from gas mixtures containing large concentrations of other species. There has been a recent focus on exploring the possibility that amorphous metals may represent a promising new class of membranes, which to date are relatively unexplored compared to crystalline metals and alloys. Hao and Sholl9 have recently investigated hydrogen permeability through amorphous and crystalline Fe3B metal films. The scheme involves kinetic Monte Carlo simulations, and the goal is for this approach to identify materials with high potential for improved performance through efficient screening of candidate structures. The structure of crystalline Fe3B is shown in Fig. 17.14b, while an amorphous structure obtained from molecular dynamics simulations is shown in Fig. 17.14a. For H2 transport through a film, the rate is often limited by interstitial diffusion of H through the bulk material. In this case, the flux can be related to the operating conditions if the solubility and diffusion coefficient of interstitial H are known. The latter quantity can be accurately calculated for crystalline materials from first-principles-based approaches. For amorphous solids the situation is, however, more complex.
In this case a detailed model for the atomic structure must first be generated. Once this is established, the sites can


[Figure 17.14: panels (a) and (b), with B and Fe atoms labeled.]

Fig. 17.14 (color online) Atomic structure of crystalline Fe3B (b) and an example of an amorphous structure of Fe3B (a) as generated from a molecular dynamics simulation. (From S. Hao, private communication.)

be occupied with interstitial hydrogen and the transition states for diffusion of H atoms between sites can be identified. For amorphous materials, the solubility is typically higher than in the crystalline counterpart, due to the greater range of interstitial binding sites, some of which bind H notably more strongly. As a result, the effects of H concentration are greater for amorphous systems, and this must be taken into account. To investigate this, Hao and Sholl9 carried out simulations at various concentrations for both crystalline c-Fe3B and amorphous a-Fe3B. As the first step, the amorphous geometry was created through an ab initio molecular dynamics simulation of a representative liquidlike sample of 100 atoms, which was rapidly quenched and then energy-minimized. Subsequently, the interstitial sites were identified. Because of their great number, this was done for the amorphous structure using an automatic procedure.


The binding energies and the interactions between H atoms in the interstitial sites were then calculated using density functional theory. From the site energies and the H–H interaction energies, the solubility of H in a-Fe3B and c-Fe3B was obtained using grand canonical Monte Carlo calculations.69 The result is shown in Fig. 17.15, plotted as a function of temperature and H2 pressure. An important finding (Fig. 17.15) is that the H solubility is far larger in the amorphous material than in the crystalline material (e.g., by two to three orders of magnitude at 600 K). The qualitative dependence of the solubility on temperature also differs between the amorphous and crystalline materials, which is attributed to the broad distribution of site energies in the amorphous material.9
Calculation of H diffusion requires the transition states between adjacent H sites. Initially, Hao and Sholl employed an approximation for the positions of the transition states before carrying out the more computationally expensive DFT calculations. For a-Fe3B, this involved determining a large number (462) of transition states, highlighting the complexity of treating the amorphous structure. Once these are determined, the rates and the H diffusion can be calculated using kinetic MC. On investigating the concentration dependence of the diffusion coefficient for amorphous Fe3B, it was found that with increasing concentration the diffusion coefficient first increases (e.g., at 600 K by around three orders of magnitude for H concentrations varying from 0 to 0.2 H/M) and then begins to decrease again. This behavior was explained by the fact that at low concentrations the strongest binding sites are occupied, and these have large associated diffusion barriers. At higher concentrations, these sites are already filled, and less favored sites with smaller diffusion barriers become populated. At even higher concentrations, the diffusion coefficient decreases due to blocking effects by the interstitial H atoms.
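The statement that the strongest binding sites fill first can be illustrated with a deliberately simplified model in which the interstitial sites do not interact. The source used grand canonical Monte Carlo with H–H interactions; the sketch below neglects those interactions, so each site simply follows a Fermi–Dirac-like occupancy at a given hydrogen chemical potential. The function name and any site energies passed to it are hypothetical.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def site_occupancies(site_energies_ev, mu_h_ev, temperature_k):
    """Equilibrium occupancy of each interstitial site (0 to 1).

    Assumes non-interacting sites that hold at most one H atom each,
    which is a simplification of the treatment described in the text.
    site_energies_ev: H binding energy of each site (eV)
    mu_h_ev:          chemical potential of atomic H (eV), same reference
    """
    beta = 1.0 / (K_B_EV * temperature_k)
    return [1.0 / (1.0 + math.exp(beta * (energy - mu_h_ev)))
            for energy in site_energies_ev]
```

Averaging the returned occupancies over all sites gives the H content per site at that chemical potential. With a broad spread of site energies, the lowest-energy sites are nearly full while weaker sites are still almost empty, which is the situation invoked above to explain the initial rise of the diffusion coefficient with concentration.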

[Figure 17.15: solubility (H/M), logarithmic scale from $10^{-5}$ to $10^{-1}$, versus temperature (K) from 200 to 1000; curves for a-Fe3B and c-Fe3B at H2 pressures of 10, 1, and 0.01 atm.]

Fig. 17.15 (color online) Calculated H solubility in a-Fe3B (solid curves) and c-Fe3B (dashed curves) as a function of temperature for several H2 pressures. Lines are guides to the eye. (From Ref. 9.)


[Figure 17.16: H2 permeability ($\mathrm{mol\,m^{-1}\,s^{-1}\,Pa^{-0.5}}$), logarithmic scale from $10^{-13}$ to $10^{-7}$, versus temperature (K) from 600 to 1000; curves for Pd, a-Fe3B, and c-Fe3B.]

Fig. 17.16 (color online) Calculated permeability of H2 in a-Fe3B and c-Fe3B at different temperatures. The “feed pressure” was 10 atm and the permeate pressure was 1 atm. The permeability of pure Pd is also shown for comparison. (From Ref. 9.)
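The units of the permeability in Fig. 17.16 (mol per m per s per Pa to the power 0.5) suggest a square-root (Sieverts-type) dependence of the flux on pressure. Under that assumption, a permeability value can be converted to a flux for a given feed pressure, permeate pressure, and membrane thickness as sketched below. The function name and the thickness are hypothetical, and the square-root law is an assumption that may be less accurate at the high H concentrations found in amorphous materials.

```python
import math

ATM_TO_PA = 101325.0  # pascals per standard atmosphere


def hydrogen_flux(permeability, feed_atm, permeate_atm, thickness_m):
    """H2 flux (mol m^-2 s^-1) through a membrane, assuming the flux
    scales with the difference of the square roots of the pressures.

    permeability: in mol m^-1 s^-1 Pa^-0.5
    thickness_m:  membrane thickness in m (value chosen by the user)
    """
    driving_force = (math.sqrt(feed_atm * ATM_TO_PA)
                     - math.sqrt(permeate_atm * ATM_TO_PA))
    return permeability * driving_force / thickness_m
```

For the pressures quoted in the caption (10 atm feed, 1 atm permeate), the driving force is about 688 Pa to the power 0.5, so the flux in this model is simply proportional to the plotted permeability divided by the membrane thickness.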

To make contact with the experimental results, the more relevant quantity is H permeation through these materials, which involves calculating the flux through the membrane. Here it was assumed that the net transport is dominated by diffusion through the bulk of the membrane. The results obtained are shown in Fig. 17.16 for particular pressures. The permeability of the amorphous material is about 1.5 to 2 orders of magnitude larger than that of the crystalline material, supporting the notion that amorphous structures can indeed have higher permeabilities. The permeability of pure Pd is greater than that of both a-Fe3B and c-Fe3B; however, Fe3B was chosen not because it was expected to yield greater permeabilities than Pd, but because it is a system in which the behavior of a crystalline and an amorphous structure can be compared in detail.

17.5 SUMMARY

In this chapter, recent applications and results of first-principles-based approaches to describing and predicting surface properties, such as structures, stoichiometry, phase transitions, and heterogeneous catalysis, and also bulk properties, including solubility, diffusivity, and permeability, were discussed. Three particular calculation approaches were highlighted which are often described under the label “multiscale modeling.” First, using the lattice-gas Hamiltonian (LGH) in combination with equilibrium Monte Carlo (MC) simulations, order–disorder phase transitions for the O/Pd(111) system were presented. This approach is truly predictive in nature in that completely unanticipated structures can be found. It can, in principle, also describe the coexistence of phases and configurational


entropy. For the case of O/Pd(111) the recently introduced MC scheme of Wang and Landau was used. This algorithm enables direct evaluation of the density of (configurational) states, and thus straightforward determination of the main thermodynamic functions. Using the ab initio atomistic thermodynamics approach, the alloy catalyst Ag–Cu was investigated regarding its surface structure and activity for the ethylene epoxidation reaction. In this approach the free energies of surface structures are calculated, from which the stability ranges of the various identified low-energy phases are predicted. The main limitation of this method is that its predictive power extends only to the explicitly considered surface structures, and that, owing to the supercell approach used in most modern first-principles calculations, the structures investigated are restricted to be periodic. From investigation of the chemical reactions over the surface phases identified, the calculations showed, first, that under reaction conditions the catalyst surface is very different from the hitherto assumed AgCu surface alloy. In particular, the results point to a dynamical coexistence of thin CuO and AgO–CuO films on the Ag substrate. This is likely to have important consequences for the mechanism by which Cu enhances the catalyst selectivity, since the active O species will be part of the oxide layer rather than O atoms adsorbed on a metal surface. Preliminary investigations indicate that some reaction pathways for ethylene oxidation over such Cu-oxide layers have a lower activation energy than that of the (undesired) competing reaction to acetaldehyde. These findings may also be highly relevant for understanding the activity of other dilute alloy catalysts.
The most complex approach discussed, kinetic MC, links an accurate description of the elementary processes, which have a clear microscopic meaning (obtained through use of first-principles calculations) with a proper evaluation of their statistical interplay. Important to the success of this approach is the identification of all relevant elementary processes, which can be nontrivial. Further, for increasingly complex systems, the number of elementary processes can virtually explode. In the literature there have been some attempts to generate the list of elementary reactions “on the fly” (see, e.g., Refs. 70 and 71, where this approach is discussed in more detail and distributed). Typically, ab initio kMC studies have been carried out with “home-grown” codes written around a particular application. In the present chapter, two recent examples were described: the first, the carbon monoxide oxidation reaction over Pd(100) in which the importance of the formation of a thin surface-oxide-like film was identified, and the second, the permeability of hydrogen through amorphous and crystalline films of Fe3 B. In the latter study, the calculations predicted a greater permeability for the amorphous membrane, pointing to amorphous structures possibly representing a new class of higher-efficiency membranes for hydrogen purification. Over the years there has been a considerable increase in the atomic-level understanding of material systems, which has arisen primarily due to the synergy between experiment and first-principles-based studies. It is envisaged that this trend will continue, with the theoretical methods described here, as well as new


approaches that will be developed together with the seemingly ever-increasing computer power, proving very valuable for advancing the performance of technological applications right across the multidisciplinary fields of physics, chemistry, biology, engineering, and materials science, yielding many exciting discoveries along the way.

REFERENCES

1. Basic research needs for the hydrogen economy. Presented at the Workshop on Production, Storage and Use, U.S. Department of Energy, Office of Basic Energy Sciences, Washington, DC, 2003.
2. Basic research needs for solar energy utilization. Report of the Basic Energy Sciences Workshop on Solar Energy Utilization, 2005.
3. Satterfield, C. N. Heterogeneous Catalysis in Industrial Practice, McGraw-Hill, New York, 1991.
4. Lundgren, E.; Over, H. J. Phys. Condens. Matter 2008, 20, 180302, and references therein.
5. Basic research needs: catalysis for energy. Presented at the Workshop on Production, Storage and Use, U.S. Department of Energy, Office of Basic Energy Sciences, Washington, DC, 2007.
6. Reuter, K.; Stampfl, C.; Scheffler, M. Ab initio atomistic thermodynamics and statistical mechanics of surface properties and functions. In Handbook of Materials Modeling, Vol. 1, Yip, S., Ed., Springer-Verlag, Berlin, 2005, pp. 149–194.
7. Reuter, K. First-principles kinetic Monte Carlo simulations for heterogeneous catalysis: Concepts, status and frontiers. In Modeling Heterogeneous Catalytic Reactions: From the Molecular Process to the Technical System, Deutschmann, O., Ed., Wiley-VCH, Weinheim, Germany, 2009.
8. Stampfl, C. Catal. Today 2005, 105, 17.
9. Hao, S.; Sholl, D. S. Energy Environ. Sci. 2008, 1, 175.
10. Sholl, D. S.; Steckel, J. A. Density Functional Theory: A Practical Introduction, Wiley, New York, 2009.
11. Engel, T.; Ertl, G. J. Chem. Phys. 1978, 69, 1267; Adv. Catal. 1979, 28, 1; The Chemical Physics of Solid Surfaces and Heterogeneous Catalysis, Vol. 4, King, D. A. and Woodruff, D. P., Eds., Elsevier, Amsterdam, 1982.
12. Campbell, C. T.; Ertl, G.; Kuipers, H.; Segner, J. J. Chem. Phys. 1980, 73, 5862.
13. Zaera, F. Prog. Surf. Sci. 2002, 69, 1.
14. Nakai, I.; Kondoh, H.; Shimada, T.; Resta, A.; Andersen, J.; Ohta, T. J. Chem. Phys. 2006, 124, 224712.
15. McEwen, J.-S.; Payne, S. H.; Stampfl, C. Chem. Phys. Lett. 2002, 361, 317.
16. Borg, M.; Stampfl, C.; Mikkelsen, A.; Gustafson, J.; Lundgren, E.; Scheffler, M.; Andersen, J. N. ChemPhysChem 2005, 6, 1923.
17. Piccinin, S.; Stampfl, C. Phys. Rev. B 2010, 81, 155427.
18. Tang, H.; Van der Ven, A.; Trout, B. L. Phys. Rev. B 2004, 70, 045420.
19. Zhang, Y.; Blum, V.; Reuter, K. Phys. Rev. B 2007, 75, 235406.
20. Shao, J. J. Am. Stat. Assoc. 1993, 88, 486.
21. Zhang, P. Ann. Math. Stat. 1993, 21, 299.
22. Stampfl, C.; Kreuzer, H. J.; Payne, S. H.; Pfnür, H.; Scheffler, M. Phys. Rev. Lett. 1999, 83, 2993.
23. Mendez, J.; Kim, S. H.; Cerdá, J.; Wintterlin, J.; Ertl, G. Phys. Rev. B 2005, 71, 085409.
24. Piercy, P.; De’Bell, K.; Pfnür, H. Phys. Rev. B 1992, 45, 1869.
25. Kortan, A. R.; Park, R. L. Phys. Rev. B 1981, 23, 6340.
26. Wang, F.; Landau, D. P. Phys. Rev. Lett. 2001, 86, 2050.
27. Wang, F.; Landau, D. P. Phys. Rev. E 2001, 64, 056101.
28. Schulz, B. J.; Binder, K.; Müller, M.; Landau, D. P. Phys. Rev. E 2003, 67, 067102.
29. Keil, F. J. J. Univ. Chem. Technol. Metall. 2008, 43, 19.
30. Piccinin, S.; Stampfl, C.; Scheffler, M. Phys. Rev. B 2008, 77, 075426.
31. Reuter, K.; Scheffler, M. Phys. Rev. B 2002, 65, 035406.
32. Weinert, C.; Scheffler, M. Mater. Sci. Forum 1986, 10–12, 25.
33. Scheffler, M.; Dabrowski, J. Phil. Mag. A 1988, 58, 107.
34. Li, W.-X.; Stampfl, C.; Scheffler, M. Phys. Rev. B 2003, 67, 045408.
35. Stampfl, C. Catal. Today 2005, 105, 17.
36. Linic, S.; Jankowiak, J.; Barteau, M. A. J. Catal. 2004, 224, 489.
37. Linic, S.; Barteau, M. A. J. Am. Chem. Soc. 2002, 124, 310.
38. Linic, S.; Barteau, M. A. J. Am. Chem. Soc. 2004, 125, 4034.
39. Piccinin, S.; Zafeiratos, S.; Stampfl, C.; Hansen, T.; Hävecker, M.; Teschner, D.; Knop-Gericke, A.; Schlögl, R.; Scheffler, M. Phys. Rev. Lett. 2010, 104, 035503.
40. Stull, D. R.; Prophet, H. JANAF Thermochemical Tables, 2nd ed., U.S. National Bureau of Standards, Washington, DC, 1971.
41. Soon, A.; Todorova, M.; Delley, B.; Stampfl, C. Phys. Rev. B 2006, 73, 165424.
42. Jankowiak, J. T.; Barteau, M. A. J. Catal. 2005, 236, 366.
43. Zafeiratos, S.; Hävecker, M.; Teschner, D.; Vass, E.; Schnörch, P.; Girgsdies, F.; Hansen, T.; Knop-Gericke, A.; Schlögl, R.; Bukhiyarov, V. Unpublished.
44. Piccinin, S.; Stampfl, C.; Scheffler, M. Surf. Sci. 2009, 603, 1467.
45. Wulff, G. Z. Kristallogr. 1901, 34, 449.
46. Kokalj, A.; Gava, P.; de Gironcoli, S.; Baroni, S. J. Catal. 2008, 254, 304.
47. Torres, D.; Lopes, N.; Illas, F.; Lambert, R. J. Am. Chem. Soc. 2005, 127, 10774.
48. Bocquet, F.; Loffreda, D. J. Am. Chem. Soc. 2005, 127, 17207.
49. Piccinin, S.; Nguyen, N. L.; Stampfl, C.; Scheffler, M. J. Mater. Chem. 2010, 20, 10521.
50. Yip, S., Ed. Handbook of Materials Modeling, Springer-Verlag, Berlin, 2005.
51. Voter, A. F. Introduction to the kinetic Monte Carlo method. In Radiation Effects in Solids, Sickafus, K. E., Kotomin, E. A., and Uberuaga, B. P., Eds., Springer-Verlag, Berlin, 2007.
52. Bortz, A. B.; Kalos, M. H.; Lebowitz, J. L. J. Comput. Phys. 1975, 17, 10.
53. Gillespie, D. T. J. Comput. Phys. 1976, 22, 403.
54. Voter, A. F. Phys. Rev. B 1986, 34, 6819.
55. Kang, H. C.; Weinberg, W. H. J. Chem. Phys. 1989, 90, 2824.
56. Fichthorn, K. A.; Weinberg, W. H. J. Chem. Phys. 1991, 95, 1090.
57. Reuter, K.; Scheffler, M. Appl. Phys. A 2004, 78, 793.
58. Over, H.; Muhler, M. Prog. Surf. Sci. 2003, 72, 3.
59. Rogal, J.; Reuter, K.; Scheffler, M. Phys. Rev. B 2008, 77, 155410.
60. Rogal, J.; Reuter, K.; Scheffler, M. Phys. Rev. Lett. 2007, 98, 046101.
61. Reuter, K.; Scheffler, M. Phys. Rev. B 2003, 68, 045407.
62. Reuter, K.; Scheffler, M. Phys. Rev. Lett. 2003, 90, 046103.
63. Todorova, M.; Lundgren, E.; Blum, V.; Mikkelsen, A.; Gray, S.; Gustafson, J.; Borg, M.; Rogal, J.; Reuter, K.; Andersen, J. N.; Scheffler, M. Surf. Sci. 2003, 541, 101.
64. Hendriksen, B. L. M.; Bobaru, S. C.; Frenken, J. W. M. Surf. Sci. 2004, 552, 229.
65. Szanyi, J.; Goodman, D. W. J. Phys. Chem. 1994, 98, 2972.
66. Semidey-Flecha, L.; Sholl, D. S. J. Chem. Phys. 2008, 128, 144701.
67. Hao, S.; Sholl, D. S. J. Chem. Phys. 2009, 130, 244705.
68. Schlapbach, L.; Züttel, A. Nature 2001, 414, 353.
69. Ling, C.; Sholl, D. S. J. Membr. Sci. 2007, 303, 162.
70. Henkelman, G.; Jónsson, H. J. Chem. Phys. 2001, 115, 9657.
71. Pedersen, A.; Jónsson, H. Math. Comput. Simul. 2010, 10, 1487.

18 Molecular Spintronics

WOO YOUN KIM and KWANG S. KIM
Center for Superfunctional Materials, Pohang University of Science and Technology, Pohang, Korea

Molecular spintronics is an emerging field at the intersection of spintronics and molecular electronics. This chapter offers a pedagogical introduction to theoretical work on molecular spintronics. The theoretical backgrounds of both spintronics and molecular electronics are reviewed, and issues in their numerical implementation are discussed in detail. In particular, we review molecular analogs of conventional spin-valve devices and graphene nanoribbon–based super magnetoresistance.

18.1 INTRODUCTION

Spintronics is a promising research field in which electronic devices use the spin of the electron, rather than its charge as in conventional electronics, as the transport carrier. Manipulation of the spin using external magnetic fields enables us to store information with high density in an electronic device.1 In addition, the nonvolatility of the spin allows the device to retain information without electric power. This idea, triggered by the discovery of the giant magnetoresistance (GMR) effect in 1988, has led to innovation in information storage techniques, with successful application of the GMR device to the read-head sensor in hard disk drives.2,3 It eventually helped usher in an information-oriented era. As a result, the 2007 Nobel Prize in Physics was awarded to A. Fert and P. Grünberg for their discovery of the GMR effect. In the meantime, the popularization of small, portable electronic devices has increased the demand for memory devices that are not only nonvolatile but also offer low power consumption, high-speed access, and high density. The emergence of tunneling magnetoresistance (TMR) has opened a new way to develop high-performance magnetoresistive random access memory (MRAM), which has attracted great attention as a next-generation information storage technology.4

Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, First Edition. Edited by Jeffrey R. Reimers. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.


On the other hand, molecular electronics is a rapidly growing field in which a single molecule or a few molecules are used as an individual electronic device.5–9 Such a bottom-up approach would provide an ideal means of constructing nanoscale devices, complementing or even replacing conventional top-down approaches.8,9 In addition, organic molecules have intrinsic advantages for use in spintronics. There are two intrinsic mechanisms that destroy spin coherence in materials: spin-orbit coupling and hyperfine interactions. The strength of the spin-orbit coupling increases with the atomic number Z (in proportion to $Z^4$ in the case of atoms), and organic molecules are composed of low-mass atoms, so their spin-orbit coupling is weak. Carbon-12 ($^{12}$C), the most abundant isotope of carbon and the main component of organic molecules, has zero nuclear spin and therefore no hyperfine interaction. Moreover, the delocalized orbitals of conjugated molecules have small hyperfine interactions. These properties promise long spin-relaxation lengths, which are vital for fabricating high-performance spintronic devices. In this regard, combining spintronics and molecular electronics is a natural step toward molecular-scale spintronic devices. This emerging field, molecular spintronics, has already shown the feasibility of real applications with successful measurements of spin-dependent electrical currents in molecule-based devices.10–15 The first experiment was carried out using a multiwall carbon nanotube (CNT) sandwiched between cobalt electrodes.11 CNTs have attracted much interest because of their superior properties, such as high carrier mobility, ballistic electron transport, and mechanical robustness. Furthermore, they are composed only of carbon atoms, so they have negligible spin-orbit coupling and hyperfine interactions.
Indeed, CNTs have shown very long spin-relaxation lengths, reaching over micrometers.14 Subsequently, organic molecules and graphene (a single graphite layer) have been used in spintronic devices.12–15 In addition, a new type of spintronic device can be made by exploiting a magnetic molecule.16–20 Particular molecules containing transition metals show internal spin ordering whose orientation can be controlled by an external magnetic field. Electron transport through such a magnetic molecule shows nontrivial spin-dependent effects due to the internal spin dynamics of the molecule. All this experimental evidence points to a bright future for molecular spintronics. Alongside experimental work, theoretical studies have also been active.8 Since quantum chemistry, including density functional theory (DFT), the Hartree–Fock (HF) method, and post-HF methods, offers versatile tools for studying the electronic structure of a variety of materials, theoretical modeling should be a powerful means of investigating transport properties in molecular spintronic devices. However, it is not straightforward to use conventional quantum chemistry for this purpose, since we are dealing not only with nonequilibrium states driven by a bias voltage (for which the variational principle is not valid) but also with open-boundary systems formed by the contact between two semi-infinite metallic electrodes and a finite molecule. A general way to study such a system is to use the nonequilibrium Green’s function (NEGF) method.21,22 At present, several schemes based


on the NEGF method that describe quantum transport quantitatively as well as qualitatively are available23–33 (see also Chapters 1 and 19). Some of them are also used for spin-polarized transport.29–33 In particular, parameter-free methods enable us to design novel spintronic devices as well as to interpret experimental observations. The goal of this chapter is to offer a pedagogical introduction to molecular spintronics based on theoretical work. In the following sections we discuss the theoretical background of spintronics and molecular electronics, practical schemes for numerical implementation, and interesting example studies.

18.2 THEORETICAL BACKGROUND

18.2.1 Magnetoresistance

A representative spintronic device is the spin valve that is composed of two ferromagnetic (FM) electrodes connected by a spacer as shown in Figs. 18.1 and 18.2. The resistance in the spin-valve device depends on the relative spin orientation between the two FM electrodes. In general, the resistance is smaller for the parallel spin orientation than for the antiparallel spin orientation. Consequently, the resistance in a spin-valve device is tuned by an external magnetic

[Figure 18.1: panels (a) to (f).]

Fig. 18.1 (color online) (a,b) Schematic structure of a GMR device with parallel and antiparallel spin alignments; (c,d) corresponding density of states (with respect to energy) and spin-transfer paths (from the left to the right electrode through a spacer); (e,f) schematic representation of the resistance for the spin-transfer paths.


[Figure 18.2: panels (a) to (f).]

Fig. 18.2 (color online) Same as Fig. 18.1 for a TMR device.

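field. Magnetoresistance (MR), the quantity measuring the effectiveness of a spin-valve device, is typically defined as follows:

$$\mathrm{MR} = \frac{R_{\mathrm{AP}} - R_{\mathrm{P}}}{R_{\mathrm{P}}} = \frac{G_{\mathrm{P}} - G_{\mathrm{AP}}}{G_{\mathrm{AP}}} \quad \text{(optimistic)} \tag{18.1}$$

or

$$\mathrm{MR} = \frac{R_{\mathrm{AP}} - R_{\mathrm{P}}}{R_{\mathrm{AP}}} = \frac{G_{\mathrm{P}} - G_{\mathrm{AP}}}{G_{\mathrm{P}}} \quad \text{(pessimistic)} \tag{18.2}$$
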

where R and G denote resistance and conductance, and P and AP denote the parallel and antiparallel configurations. The optimistic version is most commonly used. However, the pessimistic MR is useful when a system has a vanishing $G_{\mathrm{AP}}$, because in this case the pessimistic MR is bounded by 1, while the optimistic MR is unbounded. The type of MR is determined by the spacer material, since the mechanism of spin transport differs according to the spacer material. Figures 18.1 and 18.2 show schematic structures of two conventional spin-valve devices. As shown in Fig. 18.1, a GMR device adopts a nonmagnetic metal (NM) as a spacer, so that spins injected from one of the FM electrodes travel through conducting channels of the NM spacer to the other FM electrode. Figure 18.1c and d show configurations of the density of states (DOS) and spin-transfer paths from

593

THEORETICAL BACKGROUND

FM to NM and from NM to FM for the parallel and antiparallel spin cases. Spins of the left FM electrode transfer to the nonmagnetic metal and then to the right FM electrode, which has the same spin DOS as that of the left FM electrode. In this process, spin-up and spin-down carriers have different resistance due to the asymmetric spin DOS at both electrodes as described in Fig. 18.1e and f. Resistances for the parallel and antiparallel spin configurations are as follows: RP =

2(Rlarge Rsmall ) ≈ 2Rsmall (Rlarge + Rsmall )

and RAP =

Rlarge + Rsmall Rlarge ≈ 2 2

Thus, the GMR device gives a substantial MR value.

When an insulator is used as the spacer, the spin transfer between the two FM electrodes is achieved by quantum mechanical tunneling through the potential barrier of the insulator, as shown in Fig. 18.2. The magnetoresistance arising from this mechanism is called TMR. As in the GMR device, the two spin carriers have different resistances, as depicted in Fig. 18.2e and f. The resistances for the two relative spin configurations are given by

$$R_{\mathrm{P}} = \frac{R_{\mathrm{large}}R_{\mathrm{small}}}{R_{\mathrm{large}} + R_{\mathrm{small}}} \approx R_{\mathrm{small}} \quad \text{and} \quad R_{\mathrm{AP}} = \frac{R_{\mathrm{large}}}{2}$$
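The resistor expressions above, together with the two MR definitions of Eqs. (18.1) and (18.2), can be checked with a few lines of Python. The following is only a minimal sketch: the function names and the resistance values used in the usage note are illustrative assumptions, not taken from the text, and the two spin channels are simply treated as resistors connected in parallel.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors connected in parallel."""
    return r1 * r2 / (r1 + r2)


def gmr_resistances(r_small, r_large):
    """Return (R_P, R_AP) for the GMR resistor model quoted in the text.

    Each spin channel is modeled as two resistances in series, and the two
    spin channels are combined in parallel.
    """
    r_p = parallel(2.0 * r_small, 2.0 * r_large)
    r_ap = parallel(r_small + r_large, r_small + r_large)
    return r_p, r_ap


def tmr_resistances(r_small, r_large):
    """Return (R_P, R_AP) for the TMR resistor model quoted in the text."""
    r_p = parallel(r_small, r_large)
    r_ap = parallel(r_large, r_large)  # equals r_large / 2
    return r_p, r_ap


def mr_optimistic(r_p, r_ap):
    """Eq. (18.1): unbounded when R_P becomes small compared with R_AP."""
    return (r_ap - r_p) / r_p


def mr_pessimistic(r_p, r_ap):
    """Eq. (18.2): bounded by 1."""
    return (r_ap - r_p) / r_ap
```

For example, calling `gmr_resistances(1.0, 10.0)` and passing the result to `mr_optimistic` and `mr_pessimistic` shows how the two definitions rate the same device differently; the hypothetical ratio of 10 between the two resistances is chosen only for illustration.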

The spin flip during the tunneling process is negligible, so that the TMR can be expressed directly in terms of the spin polarization of the two FM contacts, as derived by Jullière34:

$$\mathrm{TMR} = \frac{R_{\mathrm{AP}} - R_{\mathrm{P}}}{R_{\mathrm{P}}} = \frac{2P_1P_2}{1 - P_1P_2} \tag{18.3}$$

Here $P_{1(2)}$ is the polarization of the first (second) FM electrode:

$$P_i = \frac{N_{i\uparrow}(E_F) - N_{i\downarrow}(E_F)}{N_{i\uparrow}(E_F) + N_{i\downarrow}(E_F)} \tag{18.4}$$

with the number of spin-up electrons $N_{i\uparrow}(E_F)$ and the number of spin-down electrons $N_{i\downarrow}(E_F)$ at the Fermi level ($E_F$). Typical TMR values (∼100%) are larger than typical GMR values (∼10%). The relatively low MR values of GMR devices may originate from spin flips that occur during the diffusion of the injected spins through the NM spacer.

18.2.2 Molecular Electronics

Figure 18.3 is a schematic of a two-terminal molecular electronic device. Under an applied bias voltage, electrical current is driven through the molecule (center) from the source (left) to the drain (right) electrode. For small molecules whose spatial extension is smaller than the mean free path of the system, electron transport is ballistic if the device has continuum bands, whereas it shows resonant or nonresonant tunneling behavior if the device has discrete energy levels.8 Molecular orbitals (MOs) of the device provide channels for electron transport. Therefore, an accurate description of the molecular energy levels in the junction is vital to understanding the transport properties.

Fig. 18.3 Two-terminal molecular electronic device.

When a molecule is bonded to metal electrodes, two effects must be taken into account. First, significant charge transfer can occur between the electrodes and the molecule because of the dissimilarity of their electronic structures, which shifts the MO energy levels. Second, the molecular states are coupled to the continuum states of the electrodes, and this coupling results in a finite broadening ($\Gamma$) of the molecular energy levels. Consequently, the MO energy levels are renormalized by the contact effects in the junction, as depicted in Fig. 18.4. Here, we discuss how to calculate the renormalized molecular energy levels and the electrical current through them.

Fig. 18.4 Renormalization of the molecular energy levels in the metal–molecule contact. (From Ref. 8, with permission of RSC Publishing.)
[Diagram labels: isolated and contacted levels $E_{\mathrm{LUMO}}$ and $E_{\mathrm{HOMO}}$ relative to $E_F$, with broadening $\Gamma$.]

Before going into the detailed discussion, we describe how the electrical current is determined by the alignment of the molecular energy levels with respect to the energy bands of both leads. When an external bias voltage is applied, the chemical potentials of the two electrodes are split by the bias voltage, giving rise to two different Fermi functions at the electrodes. The two Fermi functions determine the energy range in which electrons can be transmitted, which is called the bias window. The incoming electrons are transmitted through the broadened energy levels, as depicted in Fig. 18.5. Some of them are transmitted with high probability, especially at the resonance energy level, whereas others are reflected. In this way, the transmission probability as a function of energy, $T(\varepsilon)$, is determined by the renormalized molecular energy levels. Finally, we can calculate the current ($I$) by integrating this function over the energy range of the bias window set by the two Fermi functions, $f_L(\varepsilon)$ and $f_R(\varepsilon)$:

$$I = \frac{2e}{h}\int_{-\infty}^{\infty} T(\varepsilon)\,[f_L(\varepsilon) - f_R(\varepsilon)]\,d\varepsilon \tag{18.5}$$

where $h$ is the Planck constant and $e$ is the electron charge. It should be emphasized that the energy-level shift and broadening are very important in determining the transmission probability and the electrical current.

Let us consider the simplest system, which has a single energy level. In this case, one can intuitively derive the explicit form of the transmission function. The energy broadening factor $\Gamma$ is related to the electron hopping rate between the energy states of the molecule and one of the electrodes by the energy–time uncertainty principle:

$$\Delta E\,\Delta t = \Gamma\tau \sim h \tag{18.6}$$

where $\tau$ is the lifetime of an electron in the molecular state, and thus the hopping rate is given by $1/\tau$ ($\sim \Gamma/h$). Using the definition of the current, we obtain the following formula for the current ($I_L$) from the left electrode to the molecule:

$$I_L = \frac{e(N - N_L)}{\tau} = e\,\frac{\Gamma_L}{h}\,(N - N_L) \tag{18.7}$$

where $\Gamma_L$ is the broadening factor due to the left contact, and $N$ and $N_L$ [$= 2f_L(\varepsilon)$] are the number of electrons in the molecule and the left electrode, respectively. In the same way, the current at the right contact is given by

$$I_R = \frac{e(N - N_R)}{\tau} = e\,\frac{\Gamma_R}{h}\,(N - N_R) \tag{18.8}$$

where $N_R = 2f_R(\varepsilon)$. Assuming that $I = I_L = -I_R$, we calculate the number of electrons in the molecular energy level at the steady state. Setting $I_L + I_R = 0$ gives

$$N = \frac{2\,[\Gamma_L f_L(\varepsilon) + \Gamma_R f_R(\varepsilon)]}{\Gamma_L + \Gamma_R} \tag{18.9}$$

and, up to the sign convention for the direction of the current,

$$I(\varepsilon) = \frac{2e}{h}\,\frac{\Gamma_L\Gamma_R}{\Gamma_L + \Gamma_R}\,[f_L(\varepsilon) - f_R(\varepsilon)] \tag{18.10}$$

Fig. 18.5 (color online) Transmission probability in a molecular junction. $R(E)$ and $T(E)$ are the reflection and transmission probabilities as functions of energy, with $T(E) + R(E) = 1$. $\mu_{L/R}$ is the chemical potential of the left/right electrode, and $\mu_L - \mu_R = eV$, where $V$ is the applied bias voltage.

On the other hand, the molecular energy level is broadened by a factor $\Gamma$ ($= \Gamma_L + \Gamma_R$) due to the contact effect, as shown in Fig. 18.5. To take this effect into account, the total current should be obtained by integrating the energy-dependent current of Eq. (18.10) over the entire energy range with a weighting factor $D(\varepsilon)$, which represents the energy-dependent distribution of the broadened molecular energy level:

$$I = \frac{2e}{h}\int_{-\infty}^{\infty} D(\varepsilon)\,\frac{\Gamma_L\Gamma_R}{\Gamma_L + \Gamma_R}\,[f_L(\varepsilon) - f_R(\varepsilon)]\,d\varepsilon \tag{18.11}$$

By comparing Eq. (18.11) with Eq. (18.5), we find that the transmission function for the single energy level is

$$T(\varepsilon) = D(\varepsilon)\,\frac{\Gamma_L\Gamma_R}{\Gamma_L + \Gamma_R} \tag{18.12}$$
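To make Eqs. (18.5) and (18.12) concrete, the sketch below evaluates the current through a single broadened level numerically. Several ingredients are assumptions that the text does not specify: $D(\varepsilon)$ is taken to be a Lorentzian of width $\Gamma$ centered at the level energy `eps0`, normalized so that $T(\varepsilon)$ reaches 1 at resonance for symmetric couplings; the bias is assumed to drop symmetrically, $\mu_{L/R} = \pm eV/2$; and the energy grid, temperature, and function names are illustrative choices.

```python
import numpy as np

E_CHARGE = 1.602176634e-19  # elementary charge in C
PLANCK_H = 6.62607015e-34   # Planck constant in J s
K_B = 8.617333262e-5        # Boltzmann constant in eV/K


def fermi(eps, mu, temperature):
    """Fermi function for chemical potential mu (all energies in eV)."""
    x = (eps - mu) / (K_B * temperature)
    return 0.5 * (1.0 - np.tanh(0.5 * x))  # same as 1/(exp(x)+1), overflow-safe


def transmission(eps, eps0, gamma_l, gamma_r):
    """Eq. (18.12) with an ASSUMED Lorentzian D(eps) of width gamma_l + gamma_r."""
    gamma = gamma_l + gamma_r
    d = gamma / ((eps - eps0) ** 2 + (gamma / 2.0) ** 2)
    return d * gamma_l * gamma_r / gamma


def current(bias, eps0, gamma_l, gamma_r, temperature=300.0):
    """Eq. (18.5) in amperes; bias in volts, energies in eV relative to E_F."""
    mu_l, mu_r = 0.5 * bias, -0.5 * bias  # assumed symmetric bias drop
    eps = np.linspace(-5.0, 5.0, 200001)  # assumed integration grid in eV
    integrand = transmission(eps, eps0, gamma_l, gamma_r) * (
        fermi(eps, mu_l, temperature) - fermi(eps, mu_r, temperature)
    )
    # Trapezoidal rule; the integral carries units of eV.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(eps))
    # (2e/h) * integral[J] = (2e^2/h) * integral[eV]
    return 2.0 * E_CHARGE**2 / PLANCK_H * integral
```

Sweeping `bias` for a fixed hypothetical level, for instance `current(v, 0.3, 0.05, 0.05)` for `v` between 0 and 1 V, shows the current rising once the level enters the bias window, which is the behavior described qualitatively above.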

To extend formula (18.12) to realistic cases comprising multiple energy levels, we need the Keldysh NEGF method.22

18.2.3 Nonequilibrium Green’s Function Method for Quantum Transport

The target system that we want to describe with the NEGF method is composed of the device molecule and the left and right electrodes (Fig. 18.3). To establish the Hamiltonian of the system, we start from an uncoupled state in which each part is independently in its own equilibrium state; the interaction terms between the parts are turned on later as a perturbation. Assuming that both electrodes are noninteracting systems, the Hamiltonian of electrode $\alpha$ is

$$H_\alpha = \sum_k \varepsilon_{k\alpha}\, c_{k\alpha}^{+} c_{k\alpha} \tag{18.13}$$

where $c_{k\alpha}^{+}$ ($c_{k\alpha}$) is the creation (annihilation) operator of an electron with momentum $k$ and kinetic energy $\varepsilon_{k\alpha}$ in the $\alpha$ ($= L, R$) electrode region. For the device region, the form of the Hamiltonian depends on how electron–electron and electron–phonon interactions are treated. For the sake of simplicity, we concentrate on the noninteracting case. Then the Hamiltonian of the device part ($H_{\mathrm{dev}}$) is

$$H_{\mathrm{dev}} = \sum_n \varepsilon_n\, d_n^{+} d_n \tag{18.14}$$

where $d_n^{+}$ ($d_n$) is the creation (annihilation) operator of an electron in the state $|n\rangle$ with energy $\varepsilon_n$. We refer readers to the more specialized literature for the generalization of the formalism to interacting systems.22,35 In most practical calculations, the electron–electron interaction is treated effectively through the noninteracting Kohn–Sham potential of DFT. The coupling is taken into account by turning on the interaction potential $V_{\mathrm{int},\alpha}$ between the device and electrode $\alpha$:

$$V_{\mathrm{int},\alpha} = \sum_{k,n} \left[\tau_{k\alpha,n}\, c_{k\alpha}^{+} d_n + \tau_{k\alpha,n}^{*}\, d_n^{+} c_{k\alpha}\right] \tag{18.15}$$

where $\tau_{k\alpha,n}$ denotes the hopping term from state $|n\rangle$ to state $|k\rangle$. Finally, the total Hamiltonian is given by

$$H = H_{\mathrm{dev}} + H_L + H_R + V_{\mathrm{int},L} + V_{\mathrm{int},R} \tag{18.16}$$
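Because every term in Eq. (18.16) is quadratic in the operators, the noninteracting Hamiltonian can be represented by a single-particle matrix with a block structure. The sketch below assembles such a matrix for a finite, hypothetical set of electrode states. The truncation of each electrode to a few discrete $k$ states, the basis ordering (left, device, right), the function name, and all numerical values are assumptions made only for illustration.

```python
import numpy as np


def build_hamiltonian(eps_left, eps_dev, eps_right, tau_left, tau_right):
    """Single-particle matrix of Eq. (18.16) in the basis (left, device, right).

    tau_left[k, n] and tau_right[k, n] hold the hopping terms of Eq. (18.15)
    between electrode state k and device state n.
    """
    eps_left = np.asarray(eps_left, dtype=float)
    eps_dev = np.asarray(eps_dev, dtype=float)
    eps_right = np.asarray(eps_right, dtype=float)
    n_l, n_d, n_r = eps_left.size, eps_dev.size, eps_right.size
    tau_left = np.asarray(tau_left, dtype=complex).reshape(n_l, n_d)
    tau_right = np.asarray(tau_right, dtype=complex).reshape(n_r, n_d)
    return np.block([
        [np.diag(eps_left), tau_left, np.zeros((n_l, n_r))],
        [tau_left.conj().T, np.diag(eps_dev), tau_right.conj().T],
        [np.zeros((n_r, n_l)), tau_right, np.diag(eps_right)],
    ])


# Hypothetical example: three states per electrode and two device levels (eV).
h_total = build_hamiltonian(
    eps_left=[-1.0, 0.0, 1.0],
    eps_dev=[-0.4, 0.6],
    eps_right=[-1.0, 0.0, 1.0],
    tau_left=0.1 * np.ones((3, 2)),
    tau_right=0.1j * np.ones((3, 2)),
)
```

The zero blocks in the corners reflect the fact that Eq. (18.16) contains no direct coupling between the two electrodes; they communicate only through the device.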

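By definition, the electrical current from the left electrode to the device part ($I_L$) can be calculated from Heisenberg’s equation of motion22,35:

$$I_L = -\frac{d}{dt}\langle eN_L(t)\rangle = -\frac{ie}{\hbar}\langle [H, N_L(t)]\rangle \tag{18.17}$$

where $N_L(t) \equiv \sum_k c_{kL}^{+}(t)\,c_{kL}(t)$ is the number operator of electrons in the left electrode. Since $H_{L/R}$, $H_{\mathrm{dev}}$, and $V_{\mathrm{int},R}$ commute with this number operator, Eq. (18.17) simplifies to

$$I_L = -\frac{ie}{\hbar}\langle [V_{\mathrm{int},L}, N_L(t)]\rangle = \frac{ie}{\hbar}\sum_{k,n}\left[\tau_{kL,n}\langle c_{kL}^{+}(t)\,d_n(t)\rangle - \tau_{kL,n}^{*}\langle d_n^{+}(t)\,c_{kL}(t)\rangle\right] \tag{18.18}$$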


TABLE 18.1 Definition of Various Green’s Functions^a

Definition                                                                                  Name                                 Physical Meaning
$G^{r}_{ij}(t,t') = -i\theta(t-t')\langle\{c_i(t), c_j^{+}(t')\}\rangle$                     Retarded Green’s function
$G^{a}_{ij}(t,t') = i\theta(t'-t)\langle\{c_i(t), c_j^{+}(t')\}\rangle$                      Advanced Green’s function
$G^{<}_{ij}(t,t') = i\langle c_j^{+}(t')\,c_i(t)\rangle$                                     Lesser Green’s function              Particle propagator
$G^{>}_{ij}(t,t') = -i\langle c_i(t)\,c_j^{+}(t')\rangle$                                    Greater Green’s function             Hole propagator
$G^{t}_{ij}(t,t') = -i\langle T\{c_i(t)\,c_j^{+}(t')\}\rangle$                               Time-ordered Green’s function
$G^{\tilde{t}}_{ij}(t,t') = -i\langle \tilde{T}\{c_i(t)\,c_j^{+}(t')\}\rangle$               Anti-time-ordered Green’s function

Source: Ref. 22.
^a $c_i^{+}$ ($c_i$) denotes the particle creation (annihilation) operator for state $|i\rangle$. $T$ ($\tilde{T}$) is the time-ordering (anti-time-ordering) operator. The symbol $\langle \hat{A} \rangle$ denotes the thermal average of the operator $\hat{A}$ over the grand canonical ensemble.

By introducing the lesser Green’s function defined in Table 18.1, Eq. (18.18) becomes

$$I_L = \frac{e}{\hbar}\sum_{k,n}\left[\tau_{kL,n}\,G^{<}_{n,kL}(t,t) - \tau_{kL,n}^{*}\,G^{<}_{kL,n}(t,t)\right] \tag{18.19}$$

Equation (18.19) can be rewritten in the energy domain by Fourier transformation:

$$I_L = \frac{e}{\hbar}\sum_{k,n}\int_{-\infty}^{\infty}\frac{d\varepsilon}{2\pi}\left[\tau_{kL,n}\,G^{<}_{n,kL}(\varepsilon) - \tau_{kL,n}^{*}\,G^{<}_{kL,n}(\varepsilon)\right] \tag{18.20}$$

Equation (18.20) indicates that the current at the left contact equals the sum of all possible contributions of particle (electron) propagation from an arbitrary state $|n\rangle$ in the device part to an arbitrary state $|k\rangle$ in the left electrode, or vice versa. According to the Keldysh nonequilibrium Green’s function formalism, the lesser Green’s function in Eq. (18.20) is decomposed into a propagation part in the electrode and a propagation part in the device molecule, connected by the corresponding hopping term22:

$$G^{<}_{kL,n}(\varepsilon) = \sum_m \tau_{kL,m}\left[g^{t}_{kL,kL}(\varepsilon)\,G^{<}_{m,n}(\varepsilon) - g^{<}_{kL,kL}(\varepsilon)\,G^{\tilde{t}}_{m,n}(\varepsilon)\right] \tag{18.21}$$

$$G^{<}_{n,kL}(\varepsilon) = \sum_m \tau_{kL,m}^{*}\left[g^{<}_{kL,kL}(\varepsilon)\,G^{t}_{n,m}(\varepsilon) - g^{\tilde{t}}_{kL,kL}(\varepsilon)\,G^{<}_{n,m}(\varepsilon)\right] \tag{18.22}$$


Here we introduced the time-ordered and anti-time-ordered Green’s functions from Table 18.1. In Eqs. (18.21) and (18.22), $G_{n,m}(\varepsilon)$ represents particle propagation between states $|n\rangle$ and $|m\rangle$ in the device part, and $g_{kL,kL}(\varepsilon)$ denotes the Green’s function of the noninteracting left electrode:

$$g^{<}_{kL,kL}(\varepsilon) = 2\pi i\, f_L(\varepsilon)\,\delta(\varepsilon - \varepsilon_{kL}) \tag{18.23}$$

$$g^{>}_{kL,kL}(\varepsilon) = -2\pi i\,[1 - f_L(\varepsilon)]\,\delta(\varepsilon - \varepsilon_{kL}) \tag{18.24}$$

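By inserting Eqs. (18.21) and (18.22) into Eq. (18.20), one finally arrives at

$$I_L = \frac{ie}{\hbar}\sum_{n,m}\int_{-\infty}^{\infty} d\varepsilon\;\tau_{L,n}\,\tau_{L,m}^{*}\,\rho_L(\varepsilon)\left\{G^{<}_{n,m}(\varepsilon) + f_L(\varepsilon)\left[G^{r}_{n,m}(\varepsilon) - G^{a}_{n,m}(\varepsilon)\right]\right\} \tag{18.25}$$

where $\rho_L(\varepsilon)$ is the density of states of the left electrode and we have used the following relations22: $G^{t}(\varepsilon) + G^{\tilde{t}}(\varepsilon) = G^{>}(\varepsilon) + G^{<}(\varepsilon)$ and $G^{>}(\varepsilon) - G^{<}(\varepsilon) = G^{r}(\varepsilon) - G^{a}(\varepsilon)$. In Eq. (18.25), $G^{r}(\varepsilon)$ and $G^{a}(\varepsilon)$ denote the retarded and advanced Green’s functions of the device part, respectively, which are obtained by Fourier transformation of the retarded and advanced Green’s functions defined in Table 18.1 to the energy domain. We can evaluate the current at the right contact, $I_R$, in the same way. For a steady state, in which $I = I_L = -I_R$, the current in matrix form is

$$I = \frac{ie}{2\hbar}\int_{-\infty}^{\infty}\frac{d\varepsilon}{2\pi}\Big(\mathrm{Tr}\{[f_L(\varepsilon)\Gamma_L(\varepsilon) - f_R(\varepsilon)\Gamma_R(\varepsilon)][G^{r}(\varepsilon) - G^{a}(\varepsilon)]\} + \mathrm{Tr}\{[\Gamma_L(\varepsilon) - \Gamma_R(\varepsilon)]\,G^{<}(\varepsilon)\}\Big) \tag{18.26}$$

where

$$\Gamma_{L/R}(\varepsilon) = 2\pi\,\tau_{L/R}^{+}\,\rho_{L/R}(\varepsilon)\,\tau_{L/R} = -2\,\mathrm{Im}\left[\tau_{L/R}^{+}\,g^{r}_{L/R}(\varepsilon)\,\tau_{L/R}\right] = -2\,\mathrm{Im}\left[\Sigma^{r}_{L/R}(\varepsilon)\right] \tag{18.27}$$

That is, $\Gamma_{L/R}(\varepsilon)$ is minus twice the imaginary part of the retarded self-energy of the left/right electrode, $\Sigma^{r}_{L/R}(\varepsilon)$. The lesser Green’s function of the device part for the noninteracting system is given by35

$$G^{<}(\varepsilon) \equiv i f_L(\varepsilon)\,G^{r}(\varepsilon)\,\Gamma_L(\varepsilon)\,G^{a}(\varepsilon) + i f_R(\varepsilon)\,G^{r}(\varepsilon)\,\Gamma_R(\varepsilon)\,G^{a}(\varepsilon) \tag{18.28}$$

Finally, one obtains the electrical current:

$$I = \frac{e}{h}\int \mathrm{Tr}\left[G^{a}(\varepsilon)\,\Gamma_R(\varepsilon)\,G^{r}(\varepsilon)\,\Gamma_L(\varepsilon)\right][f_L(\varepsilon) - f_R(\varepsilon)]\,d\varepsilon \tag{18.29}$$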
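The algebra that leads from Eqs. (18.26) and (18.28) to Eq. (18.29) can be spot-checked numerically at a single energy. The sketch below is such a check under stated assumptions: the device Hamiltonian and the coupling matrices are random test matrices, the self-energies are taken to be purely imaginary and energy independent, $\Sigma^{r}_{L/R} = -i\Gamma_{L/R}/2$, and the retarded Green’s function is assumed to have the standard form $G^{r} = [\varepsilon - H - \Sigma^{r}_L - \Sigma^{r}_R]^{-1}$, which is not derived in this section. Only the integrands are compared, so the constant prefactors play no role.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_psd(n):
    """Random Hermitian positive semidefinite matrix, used as a test Gamma."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return a @ a.conj().T


def compare_integrands(n=4, eps=0.3, f_l=0.8, f_r=0.1):
    """Return the integrands of Eq. (18.26) (with its factor i/2) and Eq. (18.29)."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    h_dev = 0.5 * (a + a.conj().T)            # random Hermitian device Hamiltonian
    gamma_l, gamma_r = random_psd(n), random_psd(n)
    sigma = -0.5j * (gamma_l + gamma_r)       # assumed energy-independent self-energy
    g_r = np.linalg.inv(eps * np.eye(n) - h_dev - sigma)
    g_a = g_r.conj().T
    g_less = 1j * g_r @ (f_l * gamma_l + f_r * gamma_r) @ g_a   # Eq. (18.28)
    lhs = 0.5j * (
        np.trace((f_l * gamma_l - f_r * gamma_r) @ (g_r - g_a))
        + np.trace((gamma_l - gamma_r) @ g_less)
    )
    rhs = np.trace(g_a @ gamma_r @ g_r @ gamma_l) * (f_l - f_r)
    return lhs, rhs
```

For any choice of the test matrices, the two returned numbers should agree to rounding error and be real, which is the content of the step from Eq. (18.26) to Eq. (18.29).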

The final expression for the noninteracting system is exactly the same as Eq. (18.5) if Eq. (18.29) is multiplied by 2 to take into account the spin degeneracy. Thus, the transmission in the noninteracting regime is given by

$$T(\varepsilon) \equiv \mathrm{Tr}\left[G^{a}(\varepsilon)\,\Gamma_R(\varepsilon)\,G^{r}(\varepsilon)\,\Gamma_L(\varepsilon)\right] \tag{18.30}$$

The next step is to calculate the retarded/advanced Green’s function and the left/right coupling (i.e., self-energy) terms.
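As a preview of that step, the following sketch evaluates Eq. (18.30) for a small tight-binding chain. It is a toy model rather than the DFT-based procedure of this chapter: the retarded Green’s function is assumed to take the standard form $G^{r} = [\varepsilon - H_{\mathrm{dev}} - \Sigma^{r}_L - \Sigma^{r}_R]^{-1}$, the self-energies are assumed to be energy-independent constants acting on the two end sites (a wide-band approximation), and the chain parameters and function names are hypothetical.

```python
import numpy as np


def chain_hamiltonian(n_sites, onsite, hopping):
    """Nearest-neighbor tight-binding chain (energies in eV)."""
    off_diag = np.eye(n_sites, k=1) + np.eye(n_sites, k=-1)
    return onsite * np.eye(n_sites) + hopping * off_diag


def contact_self_energy(n_sites, site, gamma):
    """Assumed wide-band self-energy -i*gamma/2 acting on a single site."""
    sigma = np.zeros((n_sites, n_sites), dtype=complex)
    sigma[site, site] = -0.5j * gamma
    return sigma


def transmission(eps, h_dev, sigma_l, sigma_r):
    """Eq. (18.30): T = Tr[G^a Gamma_R G^r Gamma_L]."""
    n = h_dev.shape[0]
    g_r = np.linalg.inv(eps * np.eye(n) - h_dev - sigma_l - sigma_r)
    g_a = g_r.conj().T
    # Gamma = i (Sigma - Sigma^+), which equals -2 Im(Sigma) as in Eq. (18.27)
    gamma_l = 1j * (sigma_l - sigma_l.conj().T)
    gamma_r = 1j * (sigma_r - sigma_r.conj().T)
    return np.trace(g_a @ gamma_r @ g_r @ gamma_l).real


# Hypothetical four-site chain coupled to the electrodes at its two ends.
h_chain = chain_hamiltonian(4, onsite=0.0, hopping=-0.5)
sig_l = contact_self_energy(4, site=0, gamma=0.2)
sig_r = contact_self_energy(4, site=3, gamma=0.2)
energies = np.linspace(-1.5, 1.5, 301)
t_chain = np.array([transmission(e, h_chain, sig_l, sig_r) for e in energies])
```

For a single site, this expression reduces to the Lorentzian form of the single-level model of Section 18.2.2, while for the chain `t_chain` shows one transmission peak near each molecular level, with widths set by the assumed couplings.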

18.3 NUMERICAL IMPLEMENTATION

A theoretical description of quantum transport requires sophisticated calculations for a metal–molecule junction composed of a large number of atoms. Density functional theory (DFT), as reviewed in Chapters 1 to 3, enables us to perform accurate electronic structure calculations for such a system at the first-principles level with computational efficiency. In addition, the NEGF method can easily be implemented in a standard DFT code, since the electron density, the main ingredient of DFT, can be obtained directly from the NEGF method for an open system. In this section we discuss the numerical implementation of the NEGF method based on DFT in detail.

18.3.1 Green’s Function

Accurate description of the metal–molecu