Google researchers have developed a new robotic navigation system that combines natural language processing and computer vision. The system lets robots understand and respond to both verbal and visual instructions, so they can navigate complex environments more accurately. It builds on Google’s Gemini 1.5 Pro model and has shown strong results in real-world tests.
Despite the initial excitement, ChatGPT has limitations, such as hallucinating false information. However, its ability to generate coherent text in more than 50 languages, including lower-resource languages with limited training data, is a significant achievement. This opens up possibilities for overcoming language barriers and democratizing access to language technologies.
In a test of ChatGPT’s knowledge of the Sonos Ace headphones, the AI model made several errors: it misidentified them as speakers, suggested color options that do not exist, and gave generic recommendations that ignored the headphones’ distinctive design and features. This highlights potential limitations in ChatGPT’s understanding and reasoning when dealing with specific products and concepts.
The free version of ChatGPT has received a significant upgrade and now offers features that were previously exclusive to the paid ChatGPT Plus subscription. The most notable addition is custom GPTs, which let users create personalized versions of ChatGPT tailored to specific tasks. These GPTs can be extended with additional knowledge and custom prompts, making them highly versatile tools. The update also gives free users access to GPT-4o features such as image understanding, file uploads for analysis, and web information retrieval. However, free users face usage limits, and once those limits are reached, ChatGPT falls back to GPT-3.5. These enhancements bring the free tier closer to ChatGPT Plus while making it accessible and useful to a wider range of users.
India has the potential to become a leader in artificial intelligence (AI) for non-English markets by developing new multilingual models that can handle the complexity of the country’s diverse languages. This was the vision shared by Pranav Mistry, founder and CEO of Two Platforms Inc., at the Mint Digital Innovation Summit 2024 in Mumbai. According to Mistry, Indian startups can drive this change by fine-tuning existing models and by developing new ones that do not need to be trained from scratch. Two Platforms and South Korea’s Naver Corp. have already released Sutra, a multilingual large language model designed specifically for the Indian market. Sutra uses its own tokenizer, which represents all Indian languages in a balanced way, and it has outperformed other Indian LLMs as well as models such as GPT-3.5, GPT-4, and Llama.
OpenAI has introduced GPT-4o, a major update to the model behind ChatGPT that can interpret facial expressions, mimic human speech patterns, and hold near-real-time conversations while picking up on subtle emotional cues. In demonstrations, the chatbot translated between languages, solved math problems, and guided a visually impaired person through the streets of London.
OpenAI has released a significant update to ChatGPT, its popular AI chatbot, and is making the new model freely available to all users. The new model, GPT-4o, offers better performance and efficiency and introduces features such as voice and video modes. OpenAI CEO Sam Altman emphasized the company’s commitment to providing free, accessible AI tools to the public so that others can build on OpenAI’s technology and benefit from it in many different ways.
OpenAI co-founder and CEO Sam Altman has shared his thoughts on the newly released GPT-4o model, highlighting its capabilities and potential. Altman described the model’s performance as something out of a movie, particularly praising its voice and video modes for providing a natural, intuitive interface. GPT-4o integrates voice, video, and text, and its response times have dropped to a few hundred milliseconds, which Altman said makes interactions feel more like real conversations. He also discussed how OpenAI’s vision has evolved and how the company can monetize its products while giving billions of people free access. The updated model will be available to both premium and basic tier users.
Google’s Gemini is a suite of generative AI models, apps, and services that Google positions as a major shift across many areas of computing. The model family comes in three tiers, Nano, Pro, and Ultra, each with its own capabilities and use cases. Gemini models are trained to be multimodal, meaning they can work with images, audio, and video as well as text. This sets them apart from earlier models such as LaMDA, which was trained only on text. Gemini has applications in many fields, including help with physics homework, scientific research, image generation, and language processing. It is available through the Gemini apps, Vertex AI, AI Studio, and Google products such as Gboard, Recorder, and Magic Compose. Although some early impressions raised concerns, Google claims that Gemini outperforms current state-of-the-art models on benchmarks. The company continues to update and improve Gemini, with further advancements and integrations planned.
Apple has released eight open-source large language models (LLMs) called OpenELM, designed to run on-device rather than on cloud servers. The models are available on the Hugging Face Hub along with code, training logs, and multiple versions, so researchers and developers can study them and adapt them for a range of applications.