Project Name | Stars | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description
---|---|---|---|---|---|---|---|---
Netdata | 65,264 | 13 hours ago | | | 370 | gpl-3.0 | C | Monitor your servers, containers, and applications, in high-resolution and in real-time!
Jina | 19,120 | a day ago | 2,421 | July 30, 2023 | 17 | apache-2.0 | Python | ☁️ Build multimodal AI applications with cloud-native stack
Awesome Kubernetes | 14,249 | 12 days ago | | | 7 | other | Shell | A curated list for awesome kubernetes sources :ship::tada:
Mlcourse.ai | 8,803 | 4 months ago | | | 4 | other | Python | Open Machine Learning Course
Cog | 5,631 | 2 days ago | 106 | August 07, 2023 | 300 | apache-2.0 | Python | Containers for machine learning
Fate | 5,200 | 10 days ago | 30 | April 18, 2022 | 786 | apache-2.0 | Python | An Industrial Grade Federated Learning Framework
Pipeline | 4,159 | a year ago | 85 | July 18, 2017 | 1 | apache-2.0 | Jsonnet | PipelineAI Kubeflow Distribution
Deeplearningproject | 4,043 | 3 years ago | | | 3 | mit | HTML | An in-depth machine learning tutorial introducing readers to a whole machine learning pipeline from scratch.
Orchest | 3,876 | 4 months ago | 19 | December 13, 2022 | 125 | apache-2.0 | TypeScript | Build data pipelines, the easy way 🛠️
Serve | 3,664 | 15 hours ago | 19 | June 14, 2023 | 327 | apache-2.0 | Java | Serve, optimize and scale PyTorch models in production
Build multimodal AI applications with cloud-native technologies
Jina lets you build multimodal AI services and pipelines that communicate via gRPC, HTTP and WebSockets, then scale them up and deploy to production. You can focus on your logic and algorithms, without worrying about the infrastructure complexity.
Jina provides a smooth Pythonic experience for serving ML models, transitioning from local deployment to advanced orchestration frameworks like Docker-Compose, Kubernetes, or Jina AI Cloud. Jina makes advanced solution engineering and cloud-native technologies accessible to every developer.
- Data structure and communication protocols
- Advanced orchestration and scaling capabilities
- Journey to the cloud
```bash
pip install jina
```
Find more install options on Apple Silicon/Windows.
Jina has three fundamental layers:

- Data layer: BaseDoc and DocList (from DocArray) are the input/output formats in Jina.
- Serving layer: An Executor is a Python class that serves logic and models, and the Gateway connects Executors into a pipeline.
- Orchestration layer: A Deployment serves a single Executor, while a Flow serves Executors chained into a pipeline.

The full glossary is explained here.
Let's build a fast, reliable and scalable gRPC-based AI service. In Jina we call this an Executor. Our simple Executor will wrap the StableLM LLM from Stability AI. We'll then use a Deployment to serve it.
Note A Deployment serves just one Executor. To combine multiple Executors into a pipeline and serve that, use a Flow.
Let's implement the service's logic:
`executor.py`:
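The original listing is not reproduced above, so here is a minimal sketch of what `executor.py` could look like; the Hugging Face `transformers` text-generation pipeline and the `stabilityai/stablelm-base-alpha-3b` checkpoint are assumptions for illustration, not taken from the original:

```python
from jina import Executor, requests
from docarray import BaseDoc, DocList
from transformers import pipeline


class Prompt(BaseDoc):
    text: str


class Generation(BaseDoc):
    prompt: str
    text: str


class StableLM(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Assumption: load a StableLM checkpoint via the transformers text-generation pipeline
        self.generator = pipeline(
            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
        )

    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        # Generate a completion for each prompt and wrap it in a Generation document
        generations = DocList[Generation]()
        prompts = docs.text
        for prompt, outputs in zip(prompts, self.generator(prompts, max_new_tokens=64)):
            generations.append(
                Generation(prompt=prompt, text=outputs[0]['generated_text'])
            )
        return generations
```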
Then we deploy it with either the Python API or YAML:
Python API: `deployment.py`, or YAML: `deployment.yml`. The YAML Deployment is then run with the CLI, as sketched below.
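A minimal sketch of both variants plus the CLI invocation, assuming the hypothetical `StableLM` Executor from the `executor.py` sketch above; the exact YAML keys and the `jina deployment` subcommand are assumptions:

`deployment.py`:

```python
from jina import Deployment
from executor import StableLM  # the Executor sketched above

dep = Deployment(uses=StableLM, timeout_ready=-1, port=12345)

with dep:
    dep.block()
```

`deployment.yml`:

```yaml
jtype: Deployment
with:
  uses: StableLM
  py_modules:
    - executor.py
  timeout_ready: -1
  port: 12345
```

Run the YAML Deployment with the CLI:

```bash
jina deployment --uses deployment.yml
```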
Use Jina Client to make requests to the service:
```python
from jina import Client
from docarray import DocList, BaseDoc


class Prompt(BaseDoc):
    text: str


class Generation(BaseDoc):
    prompt: str
    text: str


prompt = Prompt(
    text='suggest an interesting image generation prompt for a mona lisa variant'
)

client = Client(port=12345)  # use port from output above
response = client.post(on='/', inputs=[prompt], return_type=DocList[Generation])

print(response[0].text)
```
```text
a steampunk version of the Mona Lisa, incorporating mechanical gears, brass elements, and Victorian era clothing details
```
Note: In a notebook, you can't use `deployment.block()` and then make requests to the client. Please refer to the Colab link above for reproducible Jupyter Notebook code snippets.
Sometimes you want to chain microservices together into a pipeline. That's where a Flow comes in.
A Flow is a DAG pipeline composed of a set of steps. It orchestrates a set of Executors and a Gateway to offer an end-to-end service.
Note If you just want to serve a single Executor, you can use a Deployment.
For instance, let's combine our StableLM language model with a Stable Diffusion image generation model. Chaining these services together into a Flow will give us a service that will generate images based on a prompt generated by the LLM.
`text_to_image.py`:
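The original listing is not reproduced above; the following is a minimal sketch of what `text_to_image.py` could contain. The `diffusers` Stable Diffusion pipeline, the `CompVis/stable-diffusion-v1-4` checkpoint, a CUDA GPU, and reuse of the `Generation` schema from the StableLM sketch are all assumptions:

```python
import numpy as np
import torch
from jina import Executor, requests
from docarray import BaseDoc, DocList
from docarray.documents import ImageDoc
from diffusers import StableDiffusionPipeline


class Generation(BaseDoc):
    prompt: str
    text: str


class TextToImage(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Assumption: generate images with a Stable Diffusion checkpoint on GPU
        self.pipe = StableDiffusionPipeline.from_pretrained(
            'CompVis/stable-diffusion-v1-4', torch_dtype=torch.float16
        ).to('cuda')

    @requests
    def generate_image(self, docs: DocList[Generation], **kwargs) -> DocList[ImageDoc]:
        # Use the text produced by the upstream LLM step as the image prompt
        images = self.pipe(docs.text).images  # list of PIL images
        return DocList[ImageDoc](
            [ImageDoc(tensor=np.array(image)) for image in images]
        )
```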
Build the Flow with either Python or YAML:
Python API: `flow.py`, or YAML: `flow.yml`. The YAML Flow is then run with the CLI, as sketched below.
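A minimal sketch of both variants and the CLI invocation, assuming the hypothetical `StableLM` and `TextToImage` Executors sketched above live in `executor.py` and `text_to_image.py`; the exact YAML keys and the `jina flow` subcommand are assumptions:

`flow.py`:

```python
from jina import Flow
from executor import StableLM
from text_to_image import TextToImage

flow = (
    Flow(port=12345)
    .add(uses=StableLM, timeout_ready=-1)
    .add(uses=TextToImage, timeout_ready=-1)
)

with flow:
    flow.block()
```

`flow.yml`:

```yaml
jtype: Flow
with:
  port: 12345
executors:
  - uses: StableLM
    timeout_ready: -1
    py_modules:
      - executor.py
  - uses: TextToImage
    timeout_ready: -1
    py_modules:
      - text_to_image.py
```

Run the YAML Flow with the CLI:

```bash
jina flow --uses flow.yml
```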
Then, use Jina Client to make requests to the Flow:
```python
from jina import Client
from docarray import DocList, BaseDoc
from docarray.documents import ImageDoc


class Prompt(BaseDoc):
    text: str


prompt = Prompt(
    text='suggest an interesting image generation prompt for a mona lisa variant'
)

client = Client(port=12345)  # use port from output above
response = client.post(on='/', inputs=[prompt], return_type=DocList[ImageDoc])

response[0].display()
```
Why not just use standard Python to build that service and pipeline? Jina accelerates your application's time to market by making it more scalable and cloud-native. Jina also handles the infrastructure complexity in production and other Day-2 operations so that you can focus on the data application itself.
Increase your application's throughput with scalability features out of the box, like replicas, shards and dynamic batching.
Let's scale a Stable Diffusion Executor deployment with replicas and dynamic batching:
Normal Deployment vs. scaled Deployment, sketched below.
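A sketch of what the two Deployment YAML files could look like, assuming the hypothetical `TextToImage` Executor from above; the replica count, dynamic batching parameters, and the round-robin GPU assignment via `CUDA_VISIBLE_DEVICES: RR` are illustrative assumptions:

Normal Deployment:

```yaml
jtype: Deployment
with:
  uses: TextToImage
  timeout_ready: -1
  py_modules:
    - text_to_image.py
```

Scaled Deployment:

```yaml
jtype: Deployment
with:
  uses: TextToImage
  timeout_ready: -1
  py_modules:
    - text_to_image.py
  env:
    CUDA_VISIBLE_DEVICES: RR  # round-robin GPU assignment across replicas
  replicas: 2
  uses_dynamic_batching:  # aggregate requests into batches before calling the Executor
    /default:
      preferred_batch_size: 10
      timeout: 200
```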
Assuming your machine has two GPUs, using the scaled deployment YAML will give better throughput compared to the normal deployment.
These features apply to both Deployment YAML and Flow YAML. Thanks to the YAML syntax, you can inject deployment configurations regardless of Executor code.
In order to deploy your solutions to the cloud, you need to containerize your services. Jina provides the Executor Hub, which streamlines this process and takes much of the hassle off your hands. It also lets you share these Executors publicly or privately.
You just need to structure your Executor in a folder:
```text
TextToImage/
├── executor.py
├── config.yml
├── requirements.txt
```
`config.yml` and `requirements.txt` are sketched below.
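A sketch of what these two files could contain for the hypothetical `TextToImage` Executor above; the metadata values and the dependency list are illustrative assumptions:

`config.yml`:

```yaml
jtype: TextToImage
py_modules:
  - executor.py
metas:
  name: TextToImage
  description: Text-to-image generation Executor based on Stable Diffusion
```

`requirements.txt`:

```text
diffusers
transformers
accelerate
torch
```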
Then push the Executor to the Hub by running `jina hub push TextToImage`. This will give you a URL that you can use in your Deployment and Flow to use the pushed Executor's container:
```yaml
jtype: Flow
with:
  port: 12345
executors:
  - uses: jinaai+docker://<user-id>/StableLM
  - uses: jinaai+docker://<user-id>/TextToImage
```
Using Kubernetes with Jina is easy:
```bash
jina export kubernetes flow.yml ./my-k8s
kubectl apply -R -f my-k8s
```
And so is Docker Compose:
```bash
jina export docker-compose flow.yml docker-compose.yml
docker-compose up
```
Note You can also export Deployment YAML to Kubernetes and Docker Compose.
That's not all. We also support OpenTelemetry, Prometheus, and Jaeger.
What cloud-native technology is still challenging to you? Tell us and we'll handle the complexity and make it easy for you.
You can also deploy a Flow to JCloud, where you can easily enjoy autoscaling, monitoring and more with a single command.
First, turn the `flow.yml` file into a JCloud-compatible YAML by specifying resource requirements and using containerized Hub Executors.
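A rough sketch of what such a JCloud-compatible Flow YAML could look like; the `jcloud` resource block and its keys are assumptions and should be checked against the JCloud documentation:

```yaml
jtype: Flow
with:
  port: 12345
executors:
  - uses: jinaai+docker://<user-id>/StableLM
    jcloud:
      resources:
        memory: 8G   # assumption: per-Executor memory request
  - uses: jinaai+docker://<user-id>/TextToImage
    jcloud:
      resources:
        memory: 8G
        gpu: 1       # assumption: request one GPU for image generation
```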
Then, use the `jina cloud deploy` command to deploy to the cloud:
```bash
wget https://raw.githubusercontent.com/jina-ai/jina/master/.github/getting-started/jcloud-flow.yml
jina cloud deploy jcloud-flow.yml
```
Warning: Make sure to delete/clean up the Flow once you are done with this tutorial to save resources and credits.
Read more about deploying Flows to JCloud.
Large Language Models can power a wide range of applications, from chatbots to assistants and intelligent systems. However, these models can be heavy and slow, and your users want systems that are both intelligent and fast!
Large language models work by turning your questions into tokens and then generating new tokens one at a time until they decide that generation should stop. This means you want to stream the output tokens generated by a large language model to the client. In this tutorial, we will discuss how to achieve this with Streaming Endpoints in Jina.
The first step is to define the streaming service schemas, as you would do in any other service framework. The input to the service is the prompt and the maximum number of tokens to generate, while the output is simply the token ID:
```python
from docarray import BaseDoc


class PromptDocument(BaseDoc):
    prompt: str
    max_tokens: int


class ModelOutputDocument(BaseDoc):
    token_id: int
    generated_text: str
```
Our service depends on a large language model. As an example, we will use the `gpt2` model. This is how you would load such a model in your Executor:
```python
from jina import Executor, requests
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')


class TokenStreamingExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GPT2LMHeadModel.from_pretrained('gpt2')
```
Our streaming endpoint accepts a `PromptDocument` as input and streams `ModelOutputDocument`s. To stream a document back to the client, use the `yield` keyword in the endpoint implementation. Therefore, we use the model to generate up to `max_tokens` tokens and yield them until the generation stops:
```python
class TokenStreamingExecutor(Executor):
    ...

    @requests(on='/stream')
    async def task(self, doc: PromptDocument, **kwargs) -> ModelOutputDocument:
        input = tokenizer(doc.prompt, return_tensors='pt')
        input_len = input['input_ids'].shape[1]

        for _ in range(doc.max_tokens):
            output = self.model.generate(**input, max_new_tokens=1)
            if output[0][-1] == tokenizer.eos_token_id:
                break
            yield ModelOutputDocument(
                token_id=output[0][-1],
                generated_text=tokenizer.decode(
                    output[0][input_len:], skip_special_tokens=True
                ),
            )
            input = {
                'input_ids': output,
                'attention_mask': torch.ones(1, len(output[0])),
            }
```
Learn more about streaming endpoints from the Executor documentation.
The final step is to serve the Executor and send requests using the client. To serve the Executor using gRPC:
```python
from jina import Deployment

with Deployment(uses=TokenStreamingExecutor, port=12345, protocol='grpc') as dep:
    dep.block()
```
To send requests from a client:
```python
import asyncio
from jina import Client


async def main():
    client = Client(port=12345, protocol='grpc', asyncio=True)
    async for doc in client.stream_doc(
        on='/stream',
        inputs=PromptDocument(prompt='what is the capital of France ?', max_tokens=10),
        return_type=ModelOutputDocument,
    ):
        print(doc.generated_text)


asyncio.run(main())
```
```text
The
The capital
The capital of
The capital of France
The capital of France is
The capital of France is Paris
The capital of France is Paris.
```
Jina is backed by Jina AI and licensed under Apache-2.0.