
Six Gotchas with Running Docker Containers on Hadoop


Despite the potential value of containerizing workloads on Hadoop, Cloudera’s Daniel Templeton recommends waiting for Hadoop 3.0 before deploying Docker containers, citing security issues and other caveats.

“I thought of titling this, ‘It’s cool, but you can’t use it.’ There’s a lot of potential here, but until 3.0 — [it’s] not going to solve your problems,” he told those attending ApacheCon North America in Miami last week.

Templeton, who is a software engineer on the YARN development team at Cloudera, delved into the Docker support provided by the Hadoop LinuxContainerExecutor and discussed when there might be better alternatives. He stipulated that he was talking about Docker on Hadoop, not Hadoop on Docker, which he called “an entirely different story.”

“I’ve got a Hadoop cluster. I want to execute my workloads in Docker containers,” he explained.

Hadoop’s YARN scheduler supports Docker as an execution engine for submitted applications, but there are a handful of things you should understand before you enter this brave new world of Docker on YARN, he said, explaining:

1. The application owner must exist in the Docker container

Currently, when you run a Docker container, you specify a user to run it as. If you specify a UID rather than a username, and that UID doesn’t exist in the image, Docker will create it for you on the fly. This remapping doesn’t work well across large numbers of images, where the user has to be specified beforehand. Otherwise, you can’t access anything: you can’t read your launch script and you can’t write your logs, so the job is broken.

“There is no good way to deal with this. The discussion is YARN-4266. If you have a brilliant idea how to fix this, jump in on it,” he said. The approach taken by YARN-4266 “might not get exactly what you wanted, but it’s the least destructive thing we could think of doing. …This is one I don’t see resolving soon until Docker extends what they let you do,” he said.

From Daniel Templeton’s presentation.
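To make the first gotcha concrete, here is a minimal sketch of a pre-flight check an administrator could run before allowing an image on the cluster: it reads the image’s /etc/passwd and confirms that the submitting user has an entry. This is not part of YARN; the function names are hypothetical, and the helper assumes the image ships a `cat` binary.

```python
import subprocess


def user_in_passwd(passwd_text: str, username: str) -> bool:
    """Return True if `username` has an entry in /etc/passwd-style text."""
    for line in passwd_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # passwd format: name:password:UID:GID:GECOS:home:shell
        if line.split(":", 1)[0] == username:
            return True
    return False


def image_has_user(image: str, username: str) -> bool:
    """Hypothetical helper: dump /etc/passwd from `image` and look for `username`."""
    result = subprocess.run(
        ["docker", "run", "--rm", "--entrypoint", "cat", image, "/etc/passwd"],
        capture_output=True, text=True, check=True,
    )
    return user_in_passwd(result.stdout, username)
```

Running such a check once per image and user is cheaper than discovering the problem when a job fails to read its launch script.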

2. Docker containers won’t be independent of the environment they run in

One of the chief benefits of Docker containers is their portability. Guess what? They won’t be very portable in Hadoop. If you want HDFS access, if you need to be able to deserialize your tokens, if you need a framework like MapReduce, if you’re doing Spark — you’ve got to have those binaries or those jars in your image. And versions have to line up.

There is a patch posted for this. It allows whitelisted volume mounts: as an administrator, you can say, “These directories are allowed to be mounted into Docker containers,” and users can then ask for those directories to be mounted when they submit a job. Problem solved, he said, as long as administrators keep in mind that the job could be running as root inside the container, so don’t let users mount anything they could screw up.
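The whitelisting idea can be sketched as a path check. This is a minimal illustration assuming an administrator-supplied list of allowed host directories; it is not the code from the patch, and the function name is hypothetical. Normalizing the requested path first stops a request like `/usr/lib/hadoop/../../../etc/shadow` from escaping the whitelist.

```python
import os.path
from typing import Iterable


def is_mount_allowed(requested: str, allowed_dirs: Iterable[str]) -> bool:
    """True if `requested` is an allowed directory or lies beneath one."""
    # Collapse "..", duplicate slashes and trailing slashes.
    req = os.path.normpath(requested)
    if not os.path.isabs(req):
        # Only absolute host paths make sense as mount sources.
        return False
    for allowed in allowed_dirs:
        base = os.path.normpath(allowed)
        # Match the directory itself or anything under it, but not a
        # sibling that merely shares a prefix (e.g. /usr/lib/hadoop-evil).
        if req == base or req.startswith(base.rstrip("/") + "/"):
            return True
    return False
```

`os.path.normpath` works lexically and does not resolve symlinks; a real check on the host would probably want `os.path.realpath` as well.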

3. Large images may cause failures

There is currently nothing in YARN for Docker image caching. When you execute your job, docker run implicitly pulls the image from the repo. Spark and MapReduce both have a 10-minute timeout, so if your image takes more than 10 minutes to download over the network, your job will fail. If you persistently resubmit, the job will eventually land on a node it has already tried, presumably where the earlier attempt already pulled the image, and it will run. But that’s not the greatest solution.

YARN-3854 is a first step, not a solution. It lets YARN localize images in the same way it localizes data. In YARN, you can say, “I’m submitting this application, and this is the data, the ancillary libraries, whatever the heck it is, that this job is going to need. Please distribute it to all the nodes where my job will run.” And YARN will do that. The problem is that localization will not save you from the 10-minute timeout, so there’s more work to do there.
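A back-of-the-envelope calculation shows how quickly the 10-minute timeout bites. This sketch assumes pull time is simply image size divided by sustained network throughput, ignoring registry latency, layer decompression and any layers already cached on the node; the sizes and bandwidth figures are illustrative.

```python
TIMEOUT_SECONDS = 10 * 60  # the 10-minute timeout described above


def pull_seconds(image_mb: float, bandwidth_mb_per_s: float) -> float:
    """Estimated time to pull an image at a sustained throughput."""
    return image_mb / bandwidth_mb_per_s


def fits_in_timeout(image_mb: float, bandwidth_mb_per_s: float,
                    timeout_s: float = TIMEOUT_SECONDS) -> bool:
    """True if the estimated pull finishes within the timeout."""
    return pull_seconds(image_mb, bandwidth_mb_per_s) <= timeout_s


def max_image_mb(bandwidth_mb_per_s: float,
                 timeout_s: float = TIMEOUT_SECONDS) -> float:
    """Largest image that can be pulled before the timeout expires."""
    return bandwidth_mb_per_s * timeout_s


if __name__ == "__main__":
    # A 4,000 MB image at 5 MB/s needs 800 s, well past the 600 s limit.
    print(pull_seconds(4000, 5), fits_in_timeout(4000, 5))
    print(max_image_mb(5))
```

At an effective 5 MB/s, anything much over 3 GB is at risk, which is why localizing the image ahead of time, by itself, does not remove the timeout problem.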

4. There is no real support for secure repos

Docker stores its credentials for accessing a secure repo in its client configuration, which is always read from .docker/config.json, and you have no way from YARN to change that. That means when you access a secure repo, you’re subject to whatever .docker/config.json file sits in the user’s home directory on whichever node manager you land on. That’s probably not what you want. There is a JIRA for that, however, YARN-5428, which will make it configurable.
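For reference, the Docker client keeps registry logins in the `auths` section of `config.json`, where each entry’s `auth` field is the base64 encoding of `username:password`. The sketch below, with a hypothetical function name, decodes which account a given node’s config would use for a registry. Because the file lives in a home directory, the answer depends on which node and which user the container lands on. Configs that rely on a credential helper (`credsStore`) have no inline `auth` field, which this sketch treats as “no credentials.”

```python
import base64
import json
from typing import Optional


def registry_username(config_json: str, registry: str) -> Optional[str]:
    """Return the username stored for `registry` in a Docker config.json, if any."""
    config = json.loads(config_json)
    entry = config.get("auths", {}).get(registry, {})
    if "auth" not in entry:
        # No inline credentials (for example, a credential helper is in use).
        return None
    # The "auth" field is base64("username:password").
    decoded = base64.b64decode(entry["auth"]).decode("utf-8")
    username, _, _password = decoded.partition(":")
    return username
```

Only the username is returned so the sketch never surfaces the stored password.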

5. There is only basic support for networks

“When you’re thinking Docker on YARN, you start thinking about Kubernetes, Mesos, that type of thing. Kubernetes gives you this really nice facility for doing network management, right? You submit jobs and you say, ‘This is part of the network and that’s part of the network.’ And networks magically materialize and CNS routing is handled, and the world’s a wonderful place with puppies and unicorns,” he said.

YARN does not offer you that. It does not offer the notion of pods where you can say, “These applications are all part of the same pod. Go run them together and share the network.” There’s no notion of port mapping built in. There’s no real automated management over the network. Instead, you can explicitly create networks in Docker on all your node manager machines, then you can request those networks. But that’s it.
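Because YARN leaves network setup to you, creating a named Docker network on every node manager host is a manual, repeated step. Here is a minimal sketch that generates one `docker network create` command per host, assuming SSH access to each node and a hypothetical host list; it only builds the commands and does not run them.

```python
import shlex
from typing import List


def network_create_commands(hosts: List[str], network: str,
                            driver: str = "bridge") -> List[List[str]]:
    """Build one `ssh <host> docker network create ...` command per node manager host."""
    return [
        ["ssh", host, "docker", "network", "create", "--driver", driver, network]
        for host in hosts
    ]


if __name__ == "__main__":
    # Hypothetical node manager hostnames.
    for cmd in network_create_commands(["nm1.example.com", "nm2.example.com"], "appnet"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

A `bridge` network is local to a single Docker host, so creating the same name everywhere gives containers a consistent network to request on each node, not connectivity between nodes. Cross-host networking would need a multi-host driver such as `overlay`, which brings its own setup requirements.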

6. There are massive security implications

Some people are paranoid about this, though he says he’s not: you can execute privileged containers. A privileged container in Docker gets to peek into the underlying operating system, with access to things like /proc and devices. You can turn that off or limit it to a certain set of users, so it is controlled, but you have to be aware of it.

The other side of the coin is that you can only do terrible things to the underlying OS if you’re running as root in the container. At this point, YARN provides no way to specify your user, though in the future it likely will. “There are security implications with Docker on Hadoop that you really have to think through,” he said.
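The controls described for privileged containers amount to a two-level gate: a cluster-wide on/off switch plus a list of users who are allowed to use it. Here is a minimal sketch of that logic; the function and parameter names are hypothetical and are not Hadoop configuration properties.

```python
from typing import AbstractSet


def may_launch(user: str, privileged: bool, privileged_enabled: bool,
               privileged_users: AbstractSet[str]) -> bool:
    """Decide whether `user` may launch a container with the requested privilege level."""
    if not privileged:
        # Ordinary containers are not subject to this gate.
        return True
    # Privileged containers can see host resources such as /proc and devices,
    # so require both the cluster-wide switch and an explicit allow-list entry.
    return privileged_enabled and user in privileged_users
```

Defaulting the switch to off and keeping the allow-list short matches the “controlled, but you have to be aware of it” posture described above.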

Hadoop 3.0

While some Docker fixes are in Hadoop 2.8, they’re not enough to be useful, according to Templeton. Among the 3.0 features not in 2.8:

  • Mounting localized file directories as volumes
  • cgroups support
  • Support for different networking options
  • Documentation

The Hadoop 3.0 release is scheduled for the end of the year, according to release manager Andrew Wang, also a software engineer at Cloudera. It has gone through two alphas, and a third alpha is planned before it goes to beta.

Its major feature will be HDFS erasure coding, which needs 1.5 bytes of disk for every byte stored rather than the 3 bytes required by HDFS’s default three-way replication, meaning users can save about half the cost of hard disks. This reworking of storage will have a massive impact on users of YARN and MapReduce, Wang said in a separate interview.
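The disk savings follow from comparing how much raw storage each scheme consumes per byte of data. Three-way replication writes every block three times. A Reed-Solomon layout with 6 data units and 3 parity units writes 9 units for every 6 units of data, or 1.5 times. The RS(6,3) parameters are an assumption for illustration; HDFS supports more than one erasure coding policy.

```python
def raw_storage_replication(logical_tb: float, replication: int = 3) -> float:
    """Raw disk needed when every block is stored `replication` times."""
    return logical_tb * replication


def raw_storage_erasure(logical_tb: float, data_units: int = 6,
                        parity_units: int = 3) -> float:
    """Raw disk needed when each stripe of `data_units` carries `parity_units` parity units."""
    return logical_tb * (data_units + parity_units) / data_units


if __name__ == "__main__":
    replicated = raw_storage_replication(100)  # 300 TB for 100 TB of data
    erasure = raw_storage_erasure(100)         # 150 TB for 100 TB of data
    print(replicated, erasure, erasure / replicated)
```

For 100 TB of data, that is 300 TB of raw disk versus 150 TB, which is where the “half the cost” figure comes from.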

The project has been working with major users including Yahoo, Twitter and Microsoft to ensure compatibility with existing systems and enable rolling upgrades without pain, Wang said.

Feature image via Pixabay.

The post Six Gotchas with Running Docker Containers on Hadoop appeared first on The New Stack.


Can faux meat produce meaty profits? Entrepreneurs survey the food frontier

Josh Balk, a co-founder of Hampton Creek Foods, grins over a spread of cookies made with Hampton Creek’s vegan cookie dough. (GeekWire Photo / Alan Boyle)

Is there money to be made by going meatless? Substitutes for meat, dairy and eggs have been around for decades, as demonstrated by the success of Seattle-based Field Roast Grain Meat Co., but new technologies may well give what’s now known as “clean meat” a boost.

“I don’t know of any companies that are true innovators in this space that are flailing,” said Chris Kerr, investment manager at New Crop Capital, a D.C.-based venture capital firm that specializes in the food frontier.

Kerr was among the experts speaking at a survey of the marketplace for clean meat – that is, meat products that are essentially grown from cells in a vat rather than animals in a feedlot – as well as for plant-based proteins like Field Roast. Monday’s presentation was organized by the University of Washington’s CoMotion Labs in collaboration with the Good Food Institute, a clean-meat advocacy group.

Clean meat made a splash in 2013 when Dutch researcher Mark Post served up a hamburger built from lab-grown stem cells, at a cost of $330,000 for the burger.

Since then, a number of startups have been working to bring that cost down. Post formed a company called Mosa Meat to commercialize the technology. Other cultured-meat ventures include Memphis Meats in the U.S. and SuperMeat in Israel.

Meanwhile, other ventures are working to make plant-based proteins more palatable for fans of meat, dairy and egg products. Hampton Creek Foods, for example, offers mayonnaise, salad dressings, cookies and cookie dough that should pass muster with the strictest vegans. Other entrants – including Impossible Foods, Beyond Meat, New Wave Foods and Miyoko’s Kitchen – are working on newfangled plant-based versions of burgers, seafood, cheese and butter.

An event on the food frontier, presented by UW CoMotion Labs, featured Christie Lagally, a senior scientist at the Good Food Institute; Josh Balk, a co-founder of Hampton Creek Foods; and Amy Webster of the Humane Society of the United States. (GeekWire Photo / Alan Boyle)

One of the motivations for marketing (and eating) meat substitutes is to make a dent in the billions of animals and sea creatures that are killed every year to fuel humanity’s appetite.

Another is a realization that livestock agriculture will be too inefficient to feed the estimated 9.7 billion people who will be living on Earth by 2050. By one measure, it takes 40 calories of energy to produce each calorie of food output from beef.

“We actually see our food system as being kind of a disaster,” Kerr said.

Then there’s the profit angle: The next frontier for clean meat and plant-based protein is to produce products that are trendier and more affordable. That’s what it’ll take to expand the market from those who are committed to a meatless lifestyle or sustainable agriculture to the price-conscious mass market.

“What I don’t think has been made is a super-cheap nugget that can displace chicken,” said Josh Balk, a co-founder of Hampton Creek who is now vice president for farm animal protection at the Humane Society of the United States. “If someone would create that, I guarantee that is going to be a coup for this business.”

Washington state could be well-placed to play a role in the faux meat industry: Eastern Washington ranks among the nation’s biggest producers of pulse crops – dry peas, lentils and chickpeas – which happen to be the readiest sources for plant-based meat substitutes.

“Right now, peas are a pretty hot item,” Kerr said.

Kerr said Western Washington’s ports could provide the channels for sending those frontier foods out to the rest of the world. “I think it’d be great,” he said.

But David Lee, president and founder of Field Roast, said that over the past 20 years, he’s learned a lesson that newer entrants would do well to emulate.

“Our fundamental innovation is, we set out not to imitate animal meat,” Lee said. Instead, the company focused on a process that takes natural ingredients and transforms them into meat substitutes that can stand on their own – for example, smoked apple sage, Field Roast’s top-selling sausage.

“People want to know where their food comes from,” Lee said. “If you bite into a Boca Burger, where’s your field of reference?”

Clean meat has yet to face a true market test. So is the food frontier poised for a bloodless revolution?

“There are two answers to that question,” Lee told GeekWire. “On the one hand, there’s opportunity. But on the other hand, there’s a lot of established companies, there’s only so much room on the shelf, and when you’re the first, second or third out in the market, that’s a good position to be in.

“We’re lucky to have been in that position. … But I think it’s more difficult now for companies coming up. There’s a lot of pressure on the shelf right now.”


Build your own web search service with Bing Custom Search


Today at Build 2017, we announced the release of our latest addition to the Microsoft Cognitive Services portfolio – Bing Custom Search. Coming at a time when there is a demand for tailored search experiences, Bing Custom Search is an exciting new development.

Bing Custom Search is a commercial-grade solution that allows you to create a highly customized web search experience that delivers dramatically better and more relevant results from a targeted web space. It is now available as a free trial on the Microsoft Cognitive Services website, with additional availability planned for later this year.

In many ways, web search engines are now the gateways to information. By making it possible for you to create a custom web search service, Bing Custom Search opens up new possibilities for you to find knowledge about the things you deeply care about in many different ways. Our goal is to democratize access to information tailored to your area of interest and focused on a particular subset of the web.

While the Bing Web Search API lets you search over the entire web, Bing Custom Search lets you select the slices of the web you want to search over and control the ranking within that targeted web space. You can programmatically retrieve your custom search results with the Bing Web Search API, using an additional query parameter.
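Programmatic access comes down to an ordinary HTTPS request with one extra query parameter identifying your custom search instance. The sketch below only builds the request. The endpoint URL and the `customconfig` parameter name are placeholders and assumptions, so check the service documentation for the real values. The `Ocp-Apim-Subscription-Key` header is the usual way Cognitive Services APIs accept a subscription key.

```python
from typing import Dict
from urllib.parse import urlencode

# ASSUMPTION: placeholder endpoint and parameter name, not the documented values.
ENDPOINT = "https://api.example.com/bing/search"
CUSTOM_CONFIG_PARAM = "customconfig"


def build_custom_search_url(query: str, config_id: str, count: int = 10) -> str:
    """Build a search URL scoped to one custom search configuration."""
    params = {"q": query, CUSTOM_CONFIG_PARAM: config_id, "count": count}
    return ENDPOINT + "?" + urlencode(params)


def build_headers(subscription_key: str) -> Dict[str, str]:
    """Authentication header used by Cognitive Services APIs."""
    return {"Ocp-Apim-Subscription-Key": subscription_key}
```

Any HTTP client can then send a GET to the returned URL with these headers.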

Build it quickly


With a straightforward UI, Bing Custom Search lets you create your own web search engine without writing a line of code. Setting up a web search becomes easy, fast and enjoyable.

The core technology works in three steps: it identifies on-topic sites, applies the Bing ranker, and delivers relevant search results, while letting you adjust the parameters at any time. You can specify the slices of the web to draw from and explore site suggestions to intelligently expand the scope of your search domain. You can also pin the websites you care about most to the top, which delivers dramatically better and more relevant search results for your area of interest.
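As a toy illustration of the two controls described here, restricting results to chosen slices of the web and pinning favorite sites to the top, the sketch below post-processes a scored result list. It is not Bing’s ranker: the scores stand in for whatever relevance ranking the backend produces, and the function names and domain names are hypothetical.

```python
from typing import List, Sequence, Tuple
from urllib.parse import urlparse


def _host(url: str) -> str:
    return (urlparse(url).hostname or "").lower()


def _in_domain(host: str, domain: str) -> bool:
    """True if `host` is `domain` or one of its subdomains."""
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def custom_rank(results: Sequence[Tuple[str, float]],
                slices: Sequence[str],
                pinned: Sequence[str]) -> List[str]:
    """Keep results inside the chosen slices, then put pinned sites first."""
    # Step 1: drop anything outside the targeted web space.
    in_scope = [
        (url, score) for url, score in results
        if any(_in_domain(_host(url), d) for d in slices)
    ]
    # Step 2: pinned sites first, then by descending relevance score.
    # sorted() is stable, so exact ties keep their original order.
    ranked = sorted(
        in_scope,
        key=lambda r: (not any(_in_domain(_host(r[0]), p) for p in pinned), -r[1]),
    )
    return [url for url, _ in ranked]
```

Matching on subdomains means pinning `touring.example.net` also covers `www.touring.example.net`.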

Bing Custom Search Diagram

Ultimately, Bing Custom Search allows you to leverage the power of Bing’s globally operating search backend (i.e., index, ranking and document processing) to build a search that fits your needs.

For example, if you are an enthusiastic bike-touring blogger, you might want an awesome bike-touring search integrated into your blog. Bing Custom Search lets you build such a targeted search in just a few steps.

Bing Custom Search - Bike Tours example

It is very easy to plug the custom search solution into your blog and share it with like-minded people.

Ad free and commercial grade


Displaying the results retrieved via Custom Search is totally ad free – no matter how much or how little of the service you use. It empowers businesses of any size, hobbyists or entrepreneurs to design and deploy web search applications for any possible scenario. For example, enthusiasts can plug it into their private websites to create a web search for fellow enthusiasts, and businesses can leverage it to set up a high-coverage web search quickly and affordably.

Bing Custom Search Preview webpage

As a commercial-grade solution, Bing Custom Search empowers you to design and deploy applicable search experiences for unlimited scenarios. Also, you have API access to your search results – giving you the capability to present the results as your customers want to receive them.

Get started


To get started with Bing Custom Search, go to, or Bing Custom Search on Azure to sign up for a trial key.

We are excited to introduce Bing Custom Search to the developer community and are eager to get feedback about how you are using custom search and what you would like to see in the service. The team is steadily working to make our APIs even better with each release, so we want to hear from you.

You can contact the team at

- The Bing Team


SQL Server 2017 on Linux surpasses 1 million Docker pulls as the next preview version rolls out


This post was authored by Rohan Kumar, General Manager, Database Systems Group

SQL Server 2017 makes it easier and simpler to work with data, with more deployment options than before and monthly preview releases offering regular innovation and improvements. The momentum behind these new options is clear. We are excited to mark a new milestone: Last week, SQL Server on Linux passed 1 million pulls of its container image! The image has been on Docker Hub for the six months since we first launched the SQL Server on Linux public preview in November 2016, with steadily growing customer use. In fact, we now have customers like dv01 going into production with SQL Server 2017 in Docker containers using the production support agreement from our Early Adoption Program (EAP). The container image is also available in the Docker Store, where it’s currently one of the featured images.

Customer interest in containers is high because of their benefits for production, and especially for development and test: consistent and reliable behavior across environments, in a lightweight and easy-to-use format. Containers are fast to set up, can easily be stopped and started, and let users spin up multiple containers together using tools like docker-compose to start and interconnect database, application, and other service containers in a microservices architecture.

SQL Server on Linux containers have been tested extensively in our test lab over the course of the SQL Server 2017 public previews. We have been deploying SQL Server on a 150-node Kubernetes cluster in Azure to test each successive monthly Community Technology Preview (CTP). For each test pass, we automatically deploy 750 containers and run over a million tests. In addition to Kubernetes, we are testing on other container platforms with our partners and the community, including Red Hat OpenShift, Docker Swarm, and Mesosphere DC/OS.

Financial technology startup cuts database management time by 90 percent

Customers are already adopting SQL Server in containers. dv01 is a Wall Street startup, offering a reporting and analytics platform to institutional investors interested in greater insight into consumer lending markets. dv01 had initially based its solution on PostgreSQL and Amazon Redshift, but moved to SQL Server 2016 in Windows Azure Virtual Machines for faster query response times and scalability as its data grew. Because the firm runs all its other workloads on Linux, dv01 signed up for the Early Adoption Program for SQL Server 2017 to get Microsoft advice and assistance on migrating its solution to SQL Server on Linux. This move will help the company avoid managing multiple operating systems within its environment. It opted to deploy the application to production on Docker Engine, using a SQL Server 2017 on Linux image. Its choice to implement SQL Server and Docker containers has cut database management time by 90 percent, freeing its development team to focus on adding new capabilities to the product. To learn more about dv01’s SQL Server 2017 journey, you can read its story here.

“SQL Server 2016 offered the combination of performance and scalability that we needed,” said Dean Chen, VP of Engineering, dv01. “Expensive queries that were taking 30 seconds or more with our previous system now take 1-2 seconds, which means we’re able to do analytics queries in close to real time for our users.”

Making SQL Server on a Linux Docker container easy

With SQL Server 2017 CTP 2.1, available today, we continue to add manageability features for SQL Server on Linux Docker containers. We have introduced the ability to set SQL Server configuration options through environment variables passed as parameters to docker run. This enables some of the most common SQL Server configuration scenarios in Docker containers, such as setting the server collation when creating a new SQL Server instance in a container. If you’d like to learn more about the SQL Server 2017 CTP 2.1 release, read our detailed blog for information on the other enhancements and how to get started with the preview.
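Here is a sketch of what configuration through environment variables looks like: it builds the `docker run` command line for a container with a non-default server collation. `ACCEPT_EULA`, `SA_PASSWORD` and `MSSQL_COLLATION` are environment variable names used by the SQL Server on Linux images, but treat the exact set supported by CTP 2.1, the image name, and the collation value as assumptions to verify against the release documentation. The function name is hypothetical.

```python
import shlex
from typing import List

# ASSUMPTION: repository name of the 2017 preview image on Docker Hub.
IMAGE = "microsoft/mssql-server-linux"


def docker_run_args(sa_password: str, collation: str,
                    name: str = "sql1", host_port: int = 1433) -> List[str]:
    """Build a `docker run` command that configures SQL Server via environment variables."""
    env = {
        "ACCEPT_EULA": "Y",            # accept the license terms non-interactively
        "SA_PASSWORD": sa_password,    # password for the sa login
        "MSSQL_COLLATION": collation,  # server collation for the new instance
    }
    args = ["docker", "run", "-d", "--name", name, "-p", f"{host_port}:1433"]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    args.append(IMAGE)
    return args


if __name__ == "__main__":
    # Placeholder password; use a real secret in practice.
    cmd = docker_run_args("<YourStrong!Passw0rd>", "Latin1_General_CI_AS")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the arguments as a list, rather than as one shell string, keeps passwords with special characters intact if the list is later handed to `subprocess.run`.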

We want to make it as easy as possible to get started with this technology. If you’d like to learn about how to get started with building a data-centric CI/CD pipeline using SQL Server on Linux containers, join SQL Server engineers Travis Wright and Tobias Ternstrom for this how-to video from the Microsoft Build event for developers.

Reasons to consider running SQL Server in containers

In many ways, container technology is at an inflection point much like hypervisors were 15 years ago. The benefits are immense and increasing every day and include the following:

  • Reduced size on disk for better hardware utilization
  • Reduced CPU/memory consumption, which also results in better hardware utilization
  • Reduced deployment size for faster deployments and scale up/down
  • Reduced patching for less effort, less vulnerability, less down time
  • Better composability, using layers of images and applications defined as multiple containers
  • Easier sharing with Docker Hub and Registry

But in some cases, there are still areas for improvement. For example, configuring high availability in a container platform is not well defined yet. Persistence to local and remote storage is still relatively new and is a complex area of any container platform. Because containers are still new, finding people that are experienced in working with containers can be a challenge. We look forward to working with the community to expand on and refine the capabilities of container platforms in the months to come.

The road ahead for SQL Server in containers

We are targeting support for SQL Server on Linux containers by General Availability of SQL Server 2017 later this year. Customers in our Early Adoption Program can deploy into production on containers right now with full support of our support and engineering teams. We have created a GitHub repository called mssql-docker where you can get Dockerfiles, example entrypoint scripts, and provide us with feedback and feature requests. It’s also a great place to engage with other people running SQL Server in containers.

We are also working on testing SQL Server in Windows containers, including SQL Server 2016 SP1 Developer and Express editions and SQL Server 2017 Evaluation edition. The Windows container images are available now on Docker Hub for testing and experimentation as well.

Thanks again to our community for your interest in and support for SQL Server in containers. We look forward to your continued feedback.

–Rohan Kumar, General Manager, Database Systems Group


Making search conversational: Finding and chatting with bots on Bing


We believe the future of search will be more conversational, moving away from the traditional query-to-document approach that searchers have become accustomed to. While having a conversation with your search engine may still be years away, chatbots can be a great way to get your questions answered, and today we're making Bing even more powerful by adding relevant chatbots to your search results.

Find chatbots for Skype, Facebook Messenger and more

Bing is now the best place to find bots across multiple messaging platforms. Just run a search such as 'travel bots' and you can add relevant bots directly from the Bing search results page to your favorite messaging platform including Skype, Facebook Messenger, Slack and Telegram.

Travel Bots

Plan dinner out

Chatbots on Bing can help you plan a dinner out. Starting in Seattle, when you search on Bing, many of your favorite restaurants now have bots you can chat with to learn things such as whether the restaurant has vegetarian or gluten-free options, where you can park, and whether it accepts certain credit cards. This information is often unavailable or hard to find on restaurant websites. In the coming months, we will be expanding this program to more US metropolitan areas.

Monsoon Seattle search with chatbot

Coming soon!

At Microsoft Build 2017 we demonstrated additional bot innovations that are currently being tested with our users. We expect these bots to be broadly available in the near future.

Learn something new

The Bing team is applying cutting-edge AI techniques to solve problems in Machine Reading Comprehension (MRC) and conversational understanding at web scale. Built on Bot Framework, Bing InfoBot applies deep learning techniques to Bing’s knowledge of the web to automatically build chatbots from existing web content. This means you can converse with an InfoBot to get answers to your questions from any website. For example, you can chat with the Bing InfoBot to get answers from sites like Wikipedia, as you can see in the screenshot below. Bing InfoBot will enable us to “botify” the web without site owners needing to do custom development, which will move search towards a more conversational model at scale. Bing InfoBot is currently being evaluated with a small group of users on multiple domains including,,, and more.

Singapore demographics search with chatbot

Build your own bot for Bing

If you are a developer, you can build your own chatbot for Bing. Just build your bot using Microsoft Bot Framework and then publish your bot to the Bing channel. After your bot has been reviewed and approved, users can discover and chat with your bot in their Bing search results. We showed bots from several partners integrated in Bing at Build. To learn more about developing bots for Bing, check out the Microsoft Bot Framework documentation and the blog post about the Bot Framework announcements at Build 2017.

Build Contoso bot

Happy chatting!

- The Bing Team


Introducing OneDrive Files On-Demand and other features making it easy to access files


Today’s post was written by Jeff Teper, corporate vice president for the Office, OneDrive and SharePoint teams.

As people create and collaborate on more files, take more photos and work across multiple devices, it’s increasingly important to access your important content, both from your work and personal life—all in one place. You shouldn’t have to worry about whether there is enough storage on your device or if you can access your files on an airplane.

Today, we are excited to share a set of new features that will allow you to see and access all your files on Windows 10, be more productive offline on your mobile devices and quickly share files on iOS.

OneDrive Files On-Demand—access all your files without using up your device storage

At Microsoft Build 2017, Joe Belfiore announced that the new OneDrive Files On-Demand feature will be delivered with the Windows 10 Fall Creators Update. With Files On-Demand, you can access all your files in the cloud without having to download them and use storage space on your device. You don’t have to change the way you work, because all your files—even online files—can be seen in File Explorer and work just like every other file on your device.

Files On-Demand also allows you to open online files from within desktop or Windows store apps using the Windows file picker. Simply select the file you want to open in the file picker, and the file will automatically download and open in your app. Furthermore, you’re covered in both your home and professional life since it works with your personal and work OneDrive, as well as your SharePoint Online team sites.

This has been the #1 requested feature for OneDrive on UserVoice, and we’re excited to deliver it in a simple and powerful new way.

You can see that the folder selected in the SharePoint Online team site contains 1.37 TB of content but takes 0 bytes of storage on the disk.

New status icons in File Explorer make it easy to know whether your files are locally available or online files. For files that you need to access when you don’t have an internet connection, you can easily make files or folders always available by right-clicking and selecting Always keep on this device.

Right-click and select Always keep on this device to make files and folders accessible when you do not have an internet connection.

Online files will automatically download and become locally available when you need them. Simply double-click a file in File Explorer or open it from within an app. Your online files will always be visible even if you are offline. Now you won’t have to make tough decisions about which files to sync to your PC.

Double-click an online file and it will automatically download and open.

In addition to users, Files On-Demand benefits organizations and IT admins. Today, when someone syncs a SharePoint Online team site, files are re-downloaded on all synced devices when anyone makes a change. Files On-Demand will reduce network bandwidth by eliminating the need to continuously sync shared files on every synced device as teams collaborate.

Files On-Demand is coming to Windows Insider Preview early this summer and will be publicly available with the Windows 10 Fall Creators Update. Tune in to the SharePoint Virtual Summit on May 16, 2017 to learn more about Files On-Demand and how to create a connected workplace in Office 365 with OneDrive and SharePoint.

OneDrive Offline Folders—save entire folders for offline access on Android and iOS

In addition to Files On-Demand, we want to share a new feature with you to help you stay productive on your mobile device when you don’t have an internet connection, like on those long flights or weekends up at the cabin. OneDrive Offline Folders lets you save folders to your mobile device and open them when you don’t have an internet connection. Changes made by other users to the files while you’re offline will automatically be updated when you have an internet connection again. This new feature is now available on Android devices to Office 365 Personal and Home subscribers and OneDrive business accounts. We expect to roll it out to iOS in the next few months.

Select a folder and click the Parachute icon to make a folder and its contents available offline.

OneDrive for iMessage—quickly share OneDrive files on iOS devices

With OneDrive for iMessage, we made it even easier to share files on your iOS devices by allowing you to quickly share documents and photos with friends and family without leaving your iMessage conversation. Choose to share an entire folder or only a file and instantly preview documents and photos shared with you in iMessage. Update to the latest version of OneDrive and enable OneDrive for iMessage on your device to try today.

Open OneDrive in iMessage and click a file to share it in your conversation.

Let us know what you think. Please share your thoughts and ideas through the Microsoft Technical Community and UserVoice. There’s so much more to come with OneDrive!

—Jeff Teper

The post Introducing OneDrive Files On-Demand and other features making it easy to access files appeared first on Office Blogs.
