Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications


About the Book

Learn to build cost-effective apps using large language models.

In Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications, Shreyas Subramanian, Principal Data Scientist at Amazon Web Services, delivers a practical guide for developers and data scientists who wish to build and deploy cost-effective large language model (LLM)-based solutions. The book covers a wide range of key topics, including how to select a model, pre- and post-processing of data, prompt engineering, and instruction fine-tuning. The author sheds light on techniques for optimizing inference, such as model quantization and pruning, as well as different and affordable architectures for typical generative AI (GenAI) applications, including search systems, agent assists, and autonomous agents.

You'll also find:
  • Effective strategies to address the challenge of the high computational cost associated with LLMs
  • Assistance with the complexities of building and deploying affordable generative AI apps, including tuning and inference techniques
  • Selection criteria for choosing a model, with particular consideration given to compact, nimble, and domain-specific models

Perfect for developers and data scientists interested in deploying foundational models, or business leaders planning to scale out their use of GenAI, Large Language Model-Based Solutions will also benefit project leaders and managers, technical support staff, and administrators with an interest or stake in the subject.

Table of Contents:
Introduction xix
Chapter 1: Introduction 1
  Overview of GenAI Applications and Large Language Models 1
  The Rise of Large Language Models 1
  Neural Networks, Transformers, and Beyond 2
  GenAI vs. LLMs: What's the Difference? 5
  The Three-Layer GenAI Application Stack 6
  The Infrastructure Layer 6
  The Model Layer 7
  The Application Layer 8
  Paths to Productionizing GenAI Applications 9
  Sample LLM-Powered Chat Application 11
  The Importance of Cost Optimization 12
  Cost Assessment of the Model Inference Component 12
  Cost Assessment of the Vector Database Component 19
  Benchmarking Setup and Results 20
  Other Factors to Consider 23
  Cost Assessment of the Large Language Model Component 24
  Summary 27
Chapter 2: Tuning Techniques for Cost Optimization 29
  Fine-Tuning and Customizability 29
  Basic Scaling Laws You Should Know 30
  Parameter-Efficient Fine-Tuning Methods 32
  Adapters Under the Hood 33
  Prompt Tuning 34
  Prefix Tuning 36
  P-tuning 39
  IA3 40
  Low-Rank Adaptation 44
  Cost and Performance Implications of PEFT Methods 46
  Summary 48
Chapter 3: Inference Techniques for Cost Optimization 49
  Introduction to Inference Techniques 49
  Prompt Engineering 50
  Impact of Prompt Engineering on Cost 50
  Estimating Costs for Other Models 52
  Clear and Direct Prompts 53
  Adding Qualifying Words for Brief Responses 53
  Breaking Down the Request 54
  Example of Using Claude for PII Removal 55
  Conclusion 59
  Providing Context 59
  Examples of Providing Context 60
  RAG and Long Context Models 60
  Recent Work Comparing RAG with Long Context Models 61
  Conclusion 62
  Context and Model Limitations 62
  Indicating a Desired Format 63
  Example of Formatted Extraction with Claude 63
  Trade-Off Between Verbosity and Clarity 66
  Caching with Vector Stores 66
  What Is a Vector Store? 66
  How to Implement Caching Using Vector Stores 66
  Conclusion 69
  Chains for Long Documents 69
  What Is Chaining? 69
  Implementing Chains 69
  Example Use Case 70
  Common Components 70
  Tools That Implement Chains 72
  Comparing Results 76
  Conclusion 76
  Summarization 77
  Summarization in the Context of Cost and Performance 77
  Efficiency in Data Processing 77
  Cost-Effective Storage 77
  Enhanced Downstream Applications 77
  Improved Cache Utilization 77
  Summarization as a Preprocessing Step 77
  Enhanced User Experience 77
  Conclusion 77
  Batch Prompting for Efficient Inference 78
  Batch Inference 78
  Experimental Results 80
  Using the accelerate Library 81
  Using the DeepSpeed Library 81
  Batch Prompting 82
  Example of Using Batch Prompting 83
  Model Optimization Methods 83
  Quantization 83
  Code Example 84
  Recent Advancements: GPTQ 85
  Parameter-Efficient Fine-Tuning Methods 85
  Recap of PEFT Methods 85
  Code Example 86
  Cost and Performance Implications 87
  Summary 88
  References 88
Chapter 4: Model Selection and Alternatives 89
  Introduction to Model Selection 89
  Motivating Example: The Tale of Two Models 89
  The Role of Compact and Nimble Models 90
  Examples of Successful Smaller Models 91
  Quantization for Powerful but Smaller Models 91
  Text Generation with Mistral 7B 93
  Zephyr 7B and Aligned Smaller Models 94
  CogVLM for Language-Vision Multimodality 95
  Prometheus for Fine-Grained Text Evaluation 96
  Orca 2 and Teaching Smaller Models to Reason 98
  Breaking Traditional Scaling Laws with Gemini and Phi 99
  Phi 1, 1.5, and 2 B Models 100
  Gemini Models 102
  Domain-Specific Models 104
  Step 1 - Training Your Own Tokenizer 105
  Step 2 - Training Your Own Domain-Specific Model 107
  More References for Fine-Tuning 114
  Evaluating Domain-Specific Models vs. Generic Models 115
  The Power of Prompting with General-Purpose Models 120
  Summary 122
Chapter 5: Infrastructure and Deployment Tuning Strategies 123
  Introduction to Tuning Strategies 123
  Hardware Utilization and Batch Tuning 124
  Memory Occupancy 126
  Strategies to Fit Larger Models in Memory 128
  KV Caching 130
  PagedAttention 131
  How Does PagedAttention Work? 131
  Comparisons, Limitations, and Cost Considerations 131
  AlphaServe 133
  How Does AlphaServe Work? 133
  Impact of Batching 134
  Cost and Performance Considerations 134
  S3: Scheduling Sequences with Speculation 134
  How Does S3 Work? 135
  Performance and Cost 135
  Streaming LLMs with Attention Sinks 136
  Fixed to Sliding Window Attention 137
  Extending the Context Length 137
  Working with Infinite Length Context 137
  How Does StreamingLLM Work? 138
  Performance and Results 139
  Cost Considerations 139
  Batch Size Tuning 140
  Frameworks for Deployment Configuration Testing 141
  Cloud-Native Inference Frameworks 142
  Deep Dive into Serving Stack Choices 142
  Batching Options 143
  Options in DJL Serving 144
  High-Level Guidance for Selecting Serving Parameters 146
  Automatically Finding Good Inference Configurations 146
  Creating a Generic Template 148
  Defining a HPO Space 149
  Searching the Space for Optimal Configurations 151
  Results of Inference HPO 153
  Inference Acceleration Tools 155
  TensorRT and GPU Acceleration Tools 156
  CPU Acceleration Tools 156
  Monitoring and Observability 157
  LLMOps and Monitoring 157
  Why Is Monitoring Important for LLMs? 159
  Monitoring and Updating Guardrails 160
  Summary 161
Conclusion 163
Index 181


Product Details
  • ISBN-13: 9781394240722
  • ISBN-10: 1394240724
  • Publisher: John Wiley & Sons Inc
  • Publisher Imprint: John Wiley & Sons Inc
  • Publisher Date: 29 Apr 2024
  • Sub Title: How to Deliver Value with Cost-Effective Generative AI Applications
  • Binding: Paperback
  • Language: English
  • No of Pages: 224
  • Height: 234 mm
  • Width: 185 mm
  • Spine Width: 15 mm
  • Weight: 494 g
  • Returnable: N

