{"id":20654,"date":"2025-01-23T10:00:15","date_gmt":"2025-01-23T11:00:15","guid":{"rendered":"http:\/\/medexperts.pro\/?p=20654"},"modified":"2025-01-23T11:24:22","modified_gmt":"2025-01-23T11:24:22","slug":"a-test-so-hard-no-ai-system-can-pass-it-yet","status":"publish","type":"post","link":"https:\/\/medexperts.pro\/?p=20654","title":{"rendered":"A Test So Hard No AI System Can Pass It \u2014 Yet"},"content":{"rendered":"<div><\/div>\n<div class=\"css-s99gbd StoryBodyCompanionColumn\" data-testid=\"companionColumn-0\">\n<div class=\"css-53u6y8\">\n<p class=\"css-at9mc1 evys1bk0\">If you\u2019re looking for a new reason to be nervous about artificial intelligence, try this: Some of the smartest humans in the world are struggling to create tests that A.I. systems can\u2019t pass.<\/p>\n<p class=\"css-at9mc1 evys1bk0\">For years, A.I. systems were measured by giving new models a variety of standardized benchmark tests. Many of these tests consisted of challenging, S.A.T.-caliber problems in areas like math, science and logic. Comparing the models\u2019 scores over time served as a rough measure of A.I. progress.<\/p>\n<p class=\"css-at9mc1 evys1bk0\">But A.I. systems eventually got too good at those tests, so new, harder tests were created \u2014 often with the types of questions graduate students might encounter on their exams.<\/p>\n<p class=\"css-at9mc1 evys1bk0\">Those tests aren\u2019t in good shape, either. New models from companies like OpenAI, Google and Anthropic have been getting high scores on many Ph.D.-level challenges, limiting those tests\u2019 usefulness and leading to a chilling question: Are A.I. 
systems getting too smart for us to measure?<\/p>\n<\/div>\n<\/div>\n<div class=\"css-s99gbd StoryBodyCompanionColumn\" data-testid=\"companionColumn-1\">\n<div class=\"css-53u6y8\">\n<p class=\"css-at9mc1 evys1bk0\">This week, researchers at the Center for AI Safety and Scale AI are releasing a possible answer to that question: A new evaluation, called \u201cHumanity\u2019s Last Exam,\u201d that they claim is the hardest test ever administered to A.I. systems.<\/p>\n<p class=\"css-at9mc1 evys1bk0\">Humanity\u2019s Last Exam is the brainchild of Dan Hendrycks, a well-known A.I. safety researcher and director of the Center for AI Safety. (The test\u2019s original name, \u201cHumanity\u2019s Last Stand,\u201d was discarded for being overly dramatic.)<\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>If you\u2019re looking for a new reason to be nervous about artificial intelligence, try this: Some of the smartest humans in the world are struggling to create tests that A.I. systems can\u2019t pass. For years, A.I. systems were measured by giving new models a variety of standardized benchmark tests. Many of these tests consisted of challenging, S.A.T.-caliber problems in areas like math, science and logic. 
Comparing the models\u2019 scores over time served as a rough measure of A.I. progress. But A.I. systems eventually got too good at those tests, so new, harder tests were created \u2014 often with the types of questions graduate students might encounter on their exams. Those tests aren\u2019t in good shape, either. New models from companies like OpenAI, Google and Anthropic have been getting high scores on many Ph.D.-level challenges, limiting those tests\u2019 usefulness and leading to a chilling question: Are A.I. systems getting too smart for us to measure? This week, researchers at the Center for AI Safety and Scale AI are releasing a possible answer to that question: A new evaluation, called \u201cHumanity\u2019s Last Exam,\u201d that they claim is the hardest test ever administered to A.I. systems. Humanity\u2019s Last Exam is the brainchild of Dan Hendrycks, a well-known A.I. safety researcher and director of the Center for AI Safety. (The test\u2019s original name, \u201cHumanity\u2019s Last Stand,\u201d was discarded for being overly dramatic.)<\/p>\n","protected":false},"author":1,"featured_media":20656,"comment_status":"close","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-20654","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/medexperts.pro\/index.php?rest_route=\/wp\/v2\/posts\/20654","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/medexperts.pro\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/medexperts.pro\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/medexperts.pro\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/medexperts.pro\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=20654"}],"version-history":[{"count":2,"href":"https:\/\/medexperts.pro\/index.php?rest_route=\/wp\/v2\/posts\/20654\/revisions"}],"predecessor-version":[{"id":20657,"href":"https:\/\/medexperts.pro\/index.php?rest_route=\/wp\/v2\/posts\/20654\/revisions\/20657"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/medexperts.pro\/index.php?rest_route=\/wp\/v2\/media\/20656"}],"wp:attachment":[{"href":"https:\/\/medexperts.pro\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=20654"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/medexperts.pro\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=20654"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/medexperts.pro\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=20654"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}